The MPDD-AVG Challenge 2026 comprises two age-specific datasets — MPDD-Young and MPDD-Elderly — each featuring three complementary sub-tracks that explore different combinations of behavioral modalities and personality modeling. The challenge uniquely integrates semi-structured interview behavioral data with continuous gait monitoring from wearable sensors, enabling holistic assessment spanning cognitive-linguistic, affective-paralinguistic, and psychomotor domains.
This challenge is an updated version of MPDD2025 @ ACM MM 2025. Compared to the previous edition, MPDD-AVG introduces a new gait modality (IMU-based ambulatory monitoring), the G+P and A-V-G+P sub-tracks, and an extended annotation scheme including health condition labels.
The MPDD-Young dataset comprises data from 110 college students, investigating how academic stress, social environment, and personality traits contribute to depression in young adults. Participants underwent semi-structured interviews designed to assess academic stress, social functioning, and emotional well-being. Subsequently, participants walked naturally within a designated area while equipped with wearable IMU sensors.
Annotations: PHQ-9 Scale Scores · Big Five-10 personality traits · Demographics (gender, age, birth region)
Classification tasks: Binary (normal / depressed) · Ternary (normal / mild / severe)
The MPDD-Elderly dataset comprises data from 110 older adults, examining how chronic illnesses, living conditions, and personality traits influence late-life depression manifestation. Participants engaged in semi-structured interviews and then walked freely within a designated area while wearing IMU sensors.
Annotations: PHQ-9 Scale Scores · Big Five-10 personality traits · Demographics (age, gender, family situation, economic status) · Disease labels (endocrine, circulatory, nervous system)
Classification tasks: Binary · Ternary
Compared with existing datasets, MPDD-AVG broadens the set of behavioral modalities by adding gait, and deepens the individual-difference annotations with personality, birth region, and disease labels:
| Dataset | Audio-Visual | Gait | Depression | Personality | Gender | Age | Region | Disease |
|---|---|---|---|---|---|---|---|---|
| AVEC | ✓ | — | ✓ | — | ✓ | ✓ | — | — |
| DAIC-WoZ | ✓ | — | ✓ | — | ✓ | — | — | — |
| Pittsburgh | ✓ | — | ✓ | — | ✓ | ✓ | — | — |
| D-Vlog | ✓ | — | ✓ | — | ✓ | — | — | — |
| MMDA | ✓ | — | ✓ | — | ✓ | ✓ | — | — |
| EATD-Corpus | ✓ | — | ✓ | — | — | — | — | — |
| CMDC | ✓ | — | ✓ | — | ✓ | ✓ | — | — |
| MODMA | ✓ | — | ✓ | — | ✓ | ✓ | — | — |
| MPDD-AVG (Ours) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
80% of the data is used for training and validation, and the remaining 20% for testing. Standardized splits are provided.
Each of the two age-specific datasets (MPDD-Young and MPDD-Elderly) features three complementary sub-tracks:
Young adult depression detection focusing on 110 college students.
Elderly depression detection focusing on 110 older adults.
The Challenge employs comprehensive metrics to evaluate multimodal depression detection models across classification and regression tasks.
The final evaluation score for each track is calculated as:
$$\mathrm{Score}_{\mathrm{track}} = \alpha \cdot \text{Macro-F1} + \beta \cdot \mathrm{CCC} + \gamma \cdot \kappa$$
where α + β + γ = 1 and α = β = γ (that is, each weight equals 1/3), so that classification performance, continuous score prediction, and diagnostic consistency contribute equally to the final score.
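The weighted sum above can be sketched in a few lines of Python. This is a minimal illustration, not the official scoring script. It assumes that κ denotes Cohen's kappa computed on the class labels, that CCC is the concordance correlation coefficient computed on predicted versus reference PHQ-9 scores, and that Macro-F1 is the unweighted mean of per-class F1. The function names and the argument layout are hypothetical.

```python
from collections import Counter
from statistics import fmean


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return fmean(scores)


def ccc(x, y):
    """Concordance correlation coefficient using population (1/n) moments."""
    mx, my = fmean(x), fmean(y)
    vx = fmean([(a - mx) ** 2 for a in x])
    vy = fmean([(b - my) ** 2 for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    denom = vx + vy + (mx - my) ** 2
    # Degenerate case (both inputs constant and equal): return 0.0 by convention.
    return 2 * cov / denom if denom else 0.0


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    p_obs = sum(t == p for t, p in zip(y_true, y_pred)) / n
    count_true, count_pred = Counter(y_true), Counter(y_pred)
    p_exp = sum(count_true[k] * count_pred[k] for k in count_true) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp) if p_exp != 1 else 0.0


def track_score(labels_true, labels_pred, phq_true, phq_pred):
    """Equal-weight combination (alpha = beta = gamma = 1/3) of the three metrics."""
    w = 1 / 3
    return (w * macro_f1(labels_true, labels_pred)
            + w * ccc(phq_true, phq_pred)
            + w * cohen_kappa(labels_true, labels_pred))
```

Under these assumptions, a system whose class labels and PHQ-9 scores match the references exactly reaches a score of 1, and any of the three metrics falling short lowers the total by one third of its deficit.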
MPDD-AVG 2026 adopts a two-tier evaluation framework: per-sub-track independent rankings that fairly compare systems on the same modality and population, and a cross-sub-track Generalization Award that recognises methods whose core design transfers robustly across different input modalities and age groups.
Each of the 6 sub-tracks has a dedicated submission channel on CodaLab and an independent leaderboard. Rankings within each sub-track are determined solely by the Track-Level Score defined above. A team may rank first in multiple sub-tracks.
Young-AV+P · Audio-Visual + Personality
Young-AVG+P · Full Multimodal + Personality
Young-G+P · Gait + Personality
Elderly-AV+P · Audio-Visual + Personality
Elderly-AVG+P · Full Multimodal + Personality
Elderly-G+P · Gait + Personality
Beyond individual sub-track winners, we present a special Generalization Award to the team whose method demonstrates the most consistent and robust performance across different modality combinations (A-V, A-V-G, G) and population groups (Young, Elderly). This award is designed to encourage the development of transferable, principled approaches rather than sub-track-specific tuning.
The G-Score is computed over all sub-tracks a team has participated in. It rewards high average performance while penalising inconsistency across sub-tracks, and grants a progressive coverage bonus for each additional sub-track entered beyond the eligibility threshold.
| Award | Ranking Basis | Eligibility |
|---|---|---|
| 🥇 Best System (× 6, one per sub-track) | Highest Track-Level Score within the sub-track | ≥ 1 sub-track paper submitted |
| 🌐 Generalization Award | Highest G-Score across all participated sub-tracks | ≥ 2 sub-tracks paper submitted |
| 🔬 Best Personality Modeling | Largest average Track-Level Score gain when personality features (+P) are used vs. a personality-agnostic ablation, measured across all +P sub-tracks entered | Ablation results reported in the paper for ≥ 2 +P sub-tracks |
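The ranking basis for the Best Personality Modeling award can be illustrated with a short sketch. It assumes the gain is the plain arithmetic mean of per-sub-track differences between the Track-Level Score with personality features and the score of the personality-agnostic ablation. The function name, the dictionary layout, and the example scores are hypothetical and are not taken from the challenge.

```python
from statistics import fmean


def average_personality_gain(with_p, without_p, min_subtracks=2):
    """Mean Track-Level Score gain from personality features.

    with_p / without_p map a sub-track name to its Track-Level Score
    with personality features and for the personality-agnostic ablation.
    """
    shared = sorted(set(with_p) & set(without_p))
    if len(shared) < min_subtracks:
        # Mirrors the eligibility column: ablations on at least 2 +P sub-tracks.
        raise ValueError(f"need ablation results on >= {min_subtracks} sub-tracks")
    return fmean([with_p[name] - without_p[name] for name in shared])


# Illustrative numbers only, not real results.
gain = average_personality_gain(
    {"Young-AV+P": 0.60, "Elderly-G+P": 0.50},
    {"Young-AV+P": 0.55, "Elderly-G+P": 0.40},
)
```

Only sub-tracks that appear in both dictionaries are counted, so a sub-track without a matching ablation does not contribute to the average.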
We provide the following materials to all registered participants:
Submission access requests are now open. Each team is restricted to a maximum of 5 submission attempts per sub-track per day.
To be eligible for the final evaluation, each team must submit a system description paper via OpenReview (venue opens July 1, 2026). Each paper must be accompanied by well-documented source code, the trained models, and the associated checkpoints. All submissions undergo peer review by the challenge technical program committee.