A Unified and Comprehensive Multimodal Fusion Benchmarking Across Specialized Domains
Standardized evaluation across diverse multimodal tasks, including emotion recognition, healthcare, and remote sensing.
Coverage of 37 datasets spanning 15+ modalities and 20+ predictive tasks across specialized domains.
Modular codebase designed for easy extension, allowing researchers to plug in new models and datasets effortlessly.
Task-specific evaluation with accuracy, macro-F1, AUPRC, and MSE, reported under a consistent protocol with multiple random seeds (a minimal metric sketch follows below).
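As a rough illustration of this metric protocol, here is a minimal sketch assuming scikit-learn and NumPy; the helper names and averaging choices are illustrative assumptions, not the benchmark's actual API. Per-task scores would then be aggregated over the repeated seeds.

# Hypothetical sketch of the per-task metric protocol; assumes scikit-learn.
import numpy as np
from sklearn.metrics import (accuracy_score, average_precision_score,
                             f1_score, mean_squared_error)
from sklearn.preprocessing import label_binarize

def classification_metrics(y_true, y_score):
    """Accuracy, macro-F1, and macro-averaged AUPRC from class scores."""
    y_pred = y_score.argmax(axis=1)
    # Binarize labels for AUPRC (assumes >= 3 classes; binary tasks
    # would score y_score[:, 1] against y_true directly).
    y_bin = label_binarize(y_true, classes=np.arange(y_score.shape[1]))
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "macro_f1": f1_score(y_true, y_pred, average="macro"),
        "auprc": average_precision_score(y_bin, y_score, average="macro"),
    }

def regression_metrics(y_true, y_pred):
    """MSE for regression-style tasks."""
    return {"mse": mean_squared_error(y_true, y_pred)}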
Explore the MULTIBENCH++ suite of 37 multimodal datasets across specialized domains.
Following the paper setup, MULTIBENCH++ evaluates 11 fusion baselines and advanced methods, including Transformer-centric feature fusion and decision-level logit fusion; one worked example is sketched after the list below.
Concat
ConcatEarly
TensorFusion
LowRankTensorFusion
MultiplicativeInteractions2Modal
MultiplicativeInteractions3Modal
EarlyFusionTransformer
LateFusionTransformer
CrossAttentionFusion
CrossAttentionConcatFusion
MultiModalCrossAttentionFusion
MultiModalCrossAttentionConcatFusion
HierarchicalAttentionMultiToOne
HierarchicalAttentionOneToMulti
NLgate
MultimodalLateFusionClf
TMC (Trusted Multi-view Classification)
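To make the feature-fusion family concrete, below is a minimal PyTorch sketch of bimodal tensor fusion in the spirit of TensorFusion: each modality vector is padded with a constant 1, and a batched outer product captures all unimodal and bimodal interaction terms. This is an illustrative reconstruction, not the repository's exact module.

# Minimal sketch of bimodal tensor fusion (TensorFusion-style);
# illustrative only, not the benchmark's exact implementation.
import torch

def tensor_fusion_2modal(za: torch.Tensor, zb: torch.Tensor) -> torch.Tensor:
    """za: (B, Da), zb: (B, Db) -> fused features of shape (B, (Da+1)*(Db+1))."""
    ones = torch.ones(za.size(0), 1, device=za.device)
    za1 = torch.cat([za, ones], dim=1)            # (B, Da+1)
    zb1 = torch.cat([zb, ones], dim=1)            # (B, Db+1)
    outer = torch.einsum("bi,bj->bij", za1, zb1)  # all pairwise products
    return outer.flatten(start_dim=1)

fused = tensor_fusion_2modal(torch.randn(8, 32), torch.randn(8, 16))
print(fused.shape)  # torch.Size([8, 561])

LowRankTensorFusion avoids materializing this (Da+1) x (Db+1) interaction tensor by factorizing the fusion weights per modality, which is why it scales better as modalities are added.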
The implementation follows the benchmark design in the paper: unified data access, modular encoders and fusion blocks, and a standardized training/evaluation workflow. Source code is available on GitHub.
MULTIBENCH++ is designed as a reproducible evaluation stack rather than a single fixed model. Data handling, representation learning, and fusion logic are decoupled into reusable modules for cross-domain, repeatable method comparison.
This structure supports fair method comparison under a shared protocol while keeping new model integration straightforward.
Dataset-specific preprocessing, split rules, and modality alignment are handled in each loader, while a unified training entry keeps all tasks on the same interface so comparisons stay fair and reruns remain reproducible.
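A minimal sketch of what this unified loader contract might look like in PyTorch follows; the class and function names are hypothetical, not the repository's actual API.

# Hypothetical loader contract; names are illustrative, not the repo's API.
from typing import Dict, Tuple
import torch
from torch.utils.data import DataLoader, Dataset

class MultimodalDataset(Dataset):
    """Each item is a dict of pre-aligned modality tensors plus a label."""
    def __init__(self, samples, labels):
        self.samples, self.labels = samples, labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx) -> Tuple[Dict[str, torch.Tensor], torch.Tensor]:
        return self.samples[idx], self.labels[idx]

def get_dataloaders(train_set, val_set, test_set, batch_size=32):
    """Unified entry point: every task exposes the same (train, val, test) triple."""
    def make(ds, shuffle):
        return DataLoader(ds, batch_size=batch_size, shuffle=shuffle)
    return make(train_set, True), make(val_set, False), make(test_set, False)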
Modality-specific encoders convert heterogeneous inputs into compatible latent features for the downstream fusion blocks and task heads. Because encoding is separated from fusion, backbones can be replaced or ablated in a controlled way without changing the benchmark evaluation protocol.
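For example, a text encoder and an image encoder can both map to the same latent width, so any fusion block can consume them interchangeably. The sketch below is a simplified illustration, not the benchmark's actual backbones (which include models such as BERT and ResNet).

# Simplified encoder sketch: heterogeneous inputs -> one shared latent width.
# Illustrative only; the benchmark uses stronger backbones such as BERT/ResNet.
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    def __init__(self, vocab_size: int, d_latent: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 128)
        self.rnn = nn.GRU(128, d_latent, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        _, h = self.rnn(self.embed(tokens))      # tokens: (B, T)
        return h[-1]                             # (B, d_latent)

class ImageEncoder(nn.Module):
    def __init__(self, d_latent: int = 256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.proj = nn.Linear(32, d_latent)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        return self.proj(self.features(images))  # images: (B, 3, H, W)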
The benchmark includes early-fusion baselines, Transformer-centric feature interaction, and decision-level logit fusion modules reported in the paper. All are organized for direct side-by-side evaluation, so performance differences can be attributed to the choice of fusion strategy.
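The contrast between feature-level and decision-level fusion can be sketched in a few lines; these classes are illustrative stand-ins, not the repository's modules.

# Illustrative contrast between feature-level and decision-level fusion;
# not the repository's exact Concat or logit-fusion modules.
import torch
import torch.nn as nn

class ConcatFusion(nn.Module):
    """Feature fusion: concatenate latents, then one shared head."""
    def __init__(self, dims, n_classes):
        super().__init__()
        self.head = nn.Linear(sum(dims), n_classes)

    def forward(self, feats):                    # feats: list of (B, d_i)
        return self.head(torch.cat(feats, dim=1))

class LogitFusion(nn.Module):
    """Decision-level fusion: one head per modality, then average the logits."""
    def __init__(self, dims, n_classes):
        super().__init__()
        self.heads = nn.ModuleList([nn.Linear(d, n_classes) for d in dims])

    def forward(self, feats):
        return torch.stack([h(f) for h, f in zip(self.heads, feats)]).mean(dim=0)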
Training and evaluation follow a standardized routine with task-specific metrics, repeated runs, and fixed hardware and seed settings. An automated tuning workflow further improves reproducibility and reduces manual hyperparameter search overhead.
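A minimal sketch of such a seed-controlled wrapper is shown below; the helper names are hypothetical, and the real routines in training_structures/ may differ.

# Hypothetical multi-seed evaluation wrapper; the actual routines in
# training_structures/ may differ.
import random
import numpy as np
import torch

def set_seed(seed: int) -> None:
    """Fix all relevant RNGs so a run can be repeated exactly."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

def run_with_seeds(train_and_eval, seeds=(0, 1, 2)):
    """Repeat a full train/eval run per seed; report mean and std."""
    scores = []
    for seed in seeds:
        set_seed(seed)
        scores.append(train_and_eval(seed))      # user-supplied run function
    return float(np.mean(scores)), float(np.std(scores))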
[Results table] Comprehensive experimental results of multimodal fusion methods across specialized domains; the best result in each row is highlighted.
MultiBench++/
|-- datasets/ # Loaders spanning 37 datasets
|-- encoders/ # Unimodal backbones (BERT, ResNet)
|-- fusions/ # 11 benchmarked fusion methods
| |-- feature_fusion.py # Concat / TF / Transformer-centric methods
| `-- logit_fusion.py # LS / TMC
|-- baseline/ # Standardized experiment scripts
| `-- MAMI/baseline.py
|-- training_structures/ # Standardized training routines
|-- objective_functions/ # Loss functions & metrics
|-- configs/ # Hyperparameter configs
`-- requirements.txt # Dependencies
# 1. Environment Setup
conda create -n multibench python=3.9
conda activate multibench
pip install -r requirements.txt
# 2. Data Preparation
# Option A: follow MultiBenchplus-main/DATASET.md
# Option B: Baidu Netdisk: https://pan.baidu.com/s/11ITMTGO4KCnTLr05dnmThg?pwd=8rc9
# Place datasets under ./data/<DATASET_NAME>/
# 3. Run Experiments
cd baseline/MAMI
python baseline.py
@inproceedings{liang2021multibench,
  title={MultiBench: Multiscale Benchmarks for Multimodal Representation Learning},
  author={Liang, Paul Pu and Lyu, Yiwei and Fan, Xiang and Wu, Zetian and Cheng, Yun and Wu, Jason and Chen, Leslie Yufan and Wu, Peter and Lee, Michelle A and Zhu, Yuke and Salakhutdinov, Ruslan and Morency, Louis-Philippe},
  booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)},
  year={2021}
}

@inproceedings{xue2026multibench,
  title={MULTIBENCH++: A Unified and Comprehensive Multimodal Fusion Benchmarking Across Specialized Domains},
  author={Xue, Leyan and Zhang, Changqing and Xue, Kecheng and Liu, Xiaohong and Wang, Guangyu and Han, Zongbo},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2026}
}