LAVIS - A One-stop Library for Language-Vision Intelligence
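A minimal sketch of how the library is typically used for image captioning, based on LAVIS's documented load_model_and_preprocess entry point; the model name ("blip_caption"), checkpoint ("base_coco"), and image path are assumptions drawn from the project's published examples, not part of this listing:

    import torch
    from PIL import Image
    from lavis.models import load_model_and_preprocess

    # Pick a device; LAVIS models are plain PyTorch modules.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Load a pretrained BLIP captioning model together with its matching
    # image preprocessors (names follow the project's published examples).
    model, vis_processors, _ = load_model_and_preprocess(
        name="blip_caption", model_type="base_coco", is_eval=True, device=device
    )

    raw_image = Image.open("example.jpg").convert("RGB")  # placeholder path
    image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

    # Generate a natural-language caption for the image.
    print(model.generate({"image": image}))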
Compose multimodal datasets 🎹
This repository was built in association with our position paper, "Multimodality for NLP-Centered Applications: Resources, Advances and Frontiers". As part of this release we share information about recent multimodal datasets that are available for research purposes. We found that although 100+ multimodal language resources are available…
PyTorch implementation of Multimodal Fusion Transformer for Remote Sensing Image Classification.
[NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.
A dataset of 500,000 multimodal short videos with baseline models (TensorFlow 2.0).
Code from the paper "Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models"
This repository provides a comprehensive collection of research papers on multimodal representation learning, all of which are cited and discussed in the recently accepted survey: https://dl.acm.org/doi/abs/10.1145/3617833.
Code and data to evaluate LLMs on the ENEM, the main standardized Brazilian university admission exam.
[Paperlist] A curated list of papers on multimodal dialogue, covering methods, datasets, and metrics
Real-world photo sequence question answering system (MemexQA). CVPR'18 and TPAMI'19
[ICCV 2025] Official repository of "Mitigating Object Hallucinations via Sentence-Level Early Intervention".
Million-scale face and human-scene image-text datasets
Collects a multimodal dataset of Wikipedia articles and their images
Official evaluation scripts and baseline prompts for the DocVQA 2026 (ICDAR 2026) Competition on Multimodal Reasoning over Documents.
Data and code of the Findings of EMNLP'23 paper MuG: A Multimodal Classification Benchmark on Game Data with Tabular, Textual, and Visual Fields
Vision-Language Models Toolbox: Your all-in-one solution for multimodal research and experimentation
Towards Explainable Multimodal Depression Recognition for Clinical Interviews
Pre-Processing of Annotated Music Video Corpora (COGNIMUSE and DEAP)
Official Git repository for "Hakimov, S., and Schlangen, D., (2023). Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks. Findings of the Association for Computational Linguistics (ACL 2023 Findings)"