Challenge — MPDD-AVG 2026

Overview

The MPDD-AVG Challenge 2026 comprises two age-specific datasets — MPDD-Young and MPDD-Elderly — each featuring three complementary sub-tracks that explore different combinations of behavioral modalities and personality modeling. The challenge uniquely integrates semi-structured interview behavioral data with continuous gait monitoring from wearable sensors, enabling holistic assessment spanning cognitive-linguistic, affective-paralinguistic, and psychomotor domains.

This challenge is an updated version of MPDD2025 @ ACM MM 2025. Compared to the previous edition, MPDD-AVG introduces a new gait modality (IMU-based ambulatory monitoring), the G+P and A-V-G+P sub-tracks, and an extended annotation scheme including health condition labels.

Dataset

MPDD-Young

The MPDD-Young dataset comprises data from 110 college students, investigating how academic stress, social environment, and personality traits contribute to depression in young adults. Participants underwent semi-structured interviews designed to assess academic stress, social functioning, and emotional well-being. Subsequently, participants walked naturally within a designated area while equipped with wearable IMU sensors.

Annotations: PHQ-9 Scale Scores · Big Five-10 personality traits · Demographics (gender, age, birth region)

Classification tasks: Binary (normal / depressed) · Ternary (normal / mild / severe)

MPDD-Elderly

The MPDD-Elderly dataset comprises data from 110 older adults, examining how chronic illnesses, living conditions, and personality traits influence late-life depression manifestation. Participants engaged in semi-structured interviews and then walked freely within a designated area while wearing IMU sensors.

Annotations: PHQ-9 Scale Scores · Big Five-10 personality traits · Demographics (age, gender, family situation, economic status) · Disease labels (endocrine, circulatory, nervous system)

Classification tasks: Binary · Ternary

Dataset Comparison

Compared with existing datasets, MPDD-AVG covers a broader set of behavioral modalities and provides richer individual-difference annotations:

Dataset	Audio-Visual	Gait	Depression	Personality	Gender	Age	Region	Disease
AVEC	✓	—	✓	—	✓	✓	—	—
DAIC-WoZ	✓	—	✓	—	✓	—	—	—
Pittsburgh	✓	—	✓	—	✓	✓	—	—
D-Vlog	✓	—	✓	—	✓	—	—	—
MMDA	✓	—	✓	—	✓	✓	—	—
EATD-Corpus	✓	—	✓	—	—	—	—	—
CMDC	✓	—	✓	—	✓	✓	—	—
MODMA	✓	—	✓	—	✓	✓	—	—
MPDD-AVG (Ours)	✓	✓	✓	✓	✓	✓	✓	✓

80% of the data is used for training and validation, and 20% for testing. Standardized splits are provided.

Tracks & Sub-tracks

Each of the two age-specific datasets (MPDD-Young and MPDD-Elderly) features three complementary sub-tracks:

Track 1 — MPDD-Young

Young adult depression detection focusing on 110 college students.

A-V+P Audio-Visual with Personality Modeling
A-V-G+P Audio-Visual-Gait with Personality
G+P Gait with Personality Factors

Track 2 — MPDD-Elderly

Elderly depression detection focusing on 110 older adults.

A-V+P Audio-Visual with Personality Modeling
A-V-G+P Audio-Visual-Gait with Personality
G+P Gait with Personality Factors

Sub-track Descriptions

A-V+P: Audio-Visual interview features (acoustic + visual cues) conditioned on Big Five-10 personality traits. Participants may leverage the provided personality annotations to build personality-aware depression detection models.
A-V-G+P: Full multimodal fusion integrating conversational interview cues, IMU-based ambulatory gait patterns, and Big Five-10 personality information. This is the primary flagship sub-track.
G+P: Gait-only detection from IMU sensor data, conditioned on personality traits. Explores psychomotor symptoms of depression through ambulatory behavior analysis.

Evaluation Metrics

The Challenge employs comprehensive metrics to evaluate multimodal depression detection models across classification and regression tasks.

Classification Metrics

Accuracy (ACC) — proportion of correct predictions
Macro F1-Score — F1 averaged equally across all depression severity classes, accounting for class imbalance
Cohen's Kappa (κ) — inter-rater agreement accounting for chance, particularly valuable for assessing clinical diagnostic consistency
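As an illustration, the three classification metrics above can be sketched in plain Python. This is a minimal sketch, not the official scoring script: the function names are hypothetical, and averaging Macro F1 over the union of labels seen in the ground truth and the predictions is an assumption (the organizers may average over a fixed class list instead).

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Proportion of predictions that match the ground-truth label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumption: classes = labels seen)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_per_class = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_per_class.append(2 * tp / denom if denom else 0.0)
    return sum(f1_per_class) / len(f1_per_class)


def cohen_kappa(y_true, y_pred):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n**2
    if p_expected == 1.0:  # degenerate case: a single class everywhere
        return 0.0
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Toy ternary labels: 0 = normal, 1 = mild, 2 = severe (made-up values)
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0]
    print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred), cohen_kappa(y_true, y_pred))
```

Because every class contributes equally to Macro F1, a model that ignores a rare severity class is penalised even when its overall accuracy looks high.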

Regression Metrics

Root Mean Squared Error (RMSE) — quantifies prediction error magnitude on continuous PHQ-9 scores
Mean Absolute Error (MAE) — interpretable average prediction error
Concordance Correlation Coefficient (CCC) — assesses agreement between predicted and actual PHQ-9 scores, combining precision and accuracy
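The regression metrics above can be sketched the same way. This is a hedged illustration with hypothetical function names; it assumes the common formulation of CCC with population (1/n) variances and covariance, which may differ in small details from the official evaluation code.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted PHQ-9 scores."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted PHQ-9 scores."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def ccc(y_true, y_pred):
    """Concordance correlation coefficient (population moments assumed)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    denom = var_t + var_p + (mean_t - mean_p) ** 2
    # CCC is undefined when both series are constant and equal;
    # this sketch returns 0.0 in that case.
    return 2 * cov / denom if denom else 0.0


if __name__ == "__main__":
    # Made-up PHQ-9 scores for illustration only
    y_true = [0, 5, 10, 15]
    y_pred = [1, 4, 12, 13]
    print(rmse(y_true, y_pred), mae(y_true, y_pred), ccc(y_true, y_pred))
```

Unlike Pearson correlation, CCC drops when predictions are shifted or rescaled relative to the true scores, because the mean difference and both variances appear in the denominator.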

Track-Level Score

The final evaluation score for each track is calculated as:

Score_track = α · Macro-F1 + β · CCC + γ · κ

where α + β + γ = 1 and α = β = γ = 1/3, so classification performance, continuous score prediction, and diagnostic consistency are weighted equally.
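A minimal sketch of the Track-Level Score with the equal weights stated above. The function name and the weight-validation step are illustrative assumptions, not part of the official scoring code.

```python
def track_score(macro_f1, ccc, kappa, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of Macro-F1, CCC and Cohen's kappa (weights must sum to 1)."""
    alpha, beta, gamma = weights
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return alpha * macro_f1 + beta * ccc + gamma * kappa


if __name__ == "__main__":
    # Made-up metric values for illustration only
    print(track_score(macro_f1=0.6, ccc=0.3, kappa=0.3))
```

With equal weights, the score is simply the arithmetic mean of the three metrics, so a system cannot compensate for poor PHQ-9 regression with classification performance alone.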

Rankings & Awards

MPDD-AVG 2026 adopts a two-tier evaluation framework: per-sub-track independent rankings that fairly compare systems on the same modality and population, and a cross-sub-track Generalization Award that recognises methods whose core design transfers robustly across different input modalities and age groups.

Per-Sub-track Leaderboards

Each of the 6 sub-tracks has a dedicated submission channel on Codabench and an independent leaderboard. Rankings within each sub-track are determined solely by the Track-Level Score defined above. A team may rank first in multiple sub-tracks.

Track 1 — MPDD-Young

Young-AV+P · Audio-Visual + Personality
Young-AVG+P · Full Multimodal + Personality
Young-G+P · Gait + Personality

Track 2 — MPDD-Elderly

Elderly-AV+P · Audio-Visual + Personality
Elderly-AVG+P · Full Multimodal + Personality
Elderly-G+P · Gait + Personality

Submission policy: Each team is limited to 5 submission attempts per sub-track per day. To be eligible for final evaluation, a team must additionally submit a system description paper via OpenReview by the paper deadline.

Cross-Sub-track Generalization Award

Beyond individual sub-track winners, we present a special Generalization Award to the team whose method demonstrates the most consistent and robust performance across different modality combinations (A-V, A-V-G, G) and population groups (Young, Elderly). This award is designed to encourage the development of transferable, principled approaches rather than sub-track-specific tuning.

Generalization Score (G-Score)

The G-Score is computed over all sub-tracks a team has participated in. It rewards high average performance while penalising inconsistency across sub-tracks, and grants a progressive coverage bonus for each additional sub-track entered beyond the eligibility threshold.

G-Score Computation — Three Steps

1. Normalise each sub-track score
For sub-track $i$, compute the team's z-score relative to all competing teams on that sub-track: $$z_i = \frac{S_i - \mu_i}{\sigma_i}$$ where $S_i$ is the team's Track-Level Score, and $\mu_i$, $\sigma_i$ are the mean and standard deviation of all valid teams' scores on sub-track $i$. Normalisation removes score-scale differences across sub-tracks so that a gain on a hard gait-only sub-track is treated comparably to the same gain on an audio-visual sub-track.
2. Compute mean performance and cross-sub-track variance
Over the $N$ sub-tracks the team has entered: $$\bar{z} = \frac{1}{N}\sum_{i=1}^{N} z_i, \qquad \sigma_z = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(z_i - \bar{z})^2}$$ $\bar{z}$ measures average superiority; $\sigma_z$ measures how unevenly that performance is distributed: a large $\sigma_z$ indicates a team that excels on some sub-tracks but underperforms on others.
3. Compute G-Score with consistency penalty and progressive coverage bonus
$$\text{G-Score} = \bar{z} - \lambda \cdot \sigma_z + \delta \cdot (N - N_{\min})$$ The penalty term $\lambda \cdot \sigma_z$ discourages cherry-picking sub-tracks. The coverage bonus $\delta \cdot (N - N_{\min})$ grows linearly with each additional sub-track entered, rewarding broad participation: every extra sub-track contributes $+\delta$ to the G-Score, whether or not a team joins all six.

Parameter Settings

$\lambda$ = 0.5 (consistency penalty weight)

$\delta$ = 0.05 (coverage bonus per extra sub-track)

$N_{\min}$ = 2 (eligibility threshold)

Max coverage bonus = +0.20 (all 6 sub-tracks)
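The three steps and the parameter settings above can be sketched as follows. This is an illustrative sketch with hypothetical function names and input layout; using the population (1/n) standard deviation for $\sigma_i$ and returning $z_i = 0$ when all teams tie are assumptions, since the text does not specify either.

```python
import math

LAMBDA = 0.5   # consistency penalty weight
DELTA = 0.05   # coverage bonus per extra sub-track
N_MIN = 2      # eligibility threshold


def z_scores(team_scores, field_scores):
    """Step 1: z-score of the team's Track-Level Score on each sub-track.

    team_scores:  {sub_track: team's score}
    field_scores: {sub_track: list of all valid teams' scores}
    """
    z = {}
    for name, score in team_scores.items():
        scores = field_scores[name]
        mu = sum(scores) / len(scores)
        # Assumption: population standard deviation over all valid teams
        sigma = math.sqrt(sum((s - mu) ** 2 for s in scores) / len(scores))
        z[name] = (score - mu) / sigma if sigma else 0.0
    return z


def g_score(z_values, lam=LAMBDA, delta=DELTA, n_min=N_MIN):
    """Steps 2 and 3: mean z, cross-sub-track spread, penalty and bonus."""
    n = len(z_values)
    if n < n_min:
        raise ValueError(f"at least {n_min} sub-tracks are required")
    z_bar = sum(z_values) / n
    sigma_z = math.sqrt(sum((z - z_bar) ** 2 for z in z_values) / n)
    return z_bar - lam * sigma_z + delta * (n - n_min)


if __name__ == "__main__":
    # Made-up z-scores: a consistent team versus an uneven one
    print(g_score([1.0, 1.0, 1.0]))
    print(g_score([2.0, 0.0, 1.0]))
```

In this toy comparison both teams have the same mean z-score, but the uneven team receives a lower G-Score because of the $\lambda \cdot \sigma_z$ penalty.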

Calibration note. Based on baseline results, Track-Level Scores lie in the range 0.14–0.41 (mean ≈ 0.26, std ≈ 0.07). With a typical field of 10+ teams, inter-team z-score differences per sub-track are approximately 0.3–0.5. Because the G-Score is expressed in z-score units, the per-sub-track bonus of $\delta = 0.05$ equals 0.05 standard deviations, or roughly one-tenth to one-sixth of a typical inter-team gap. This is meaningful, but not sufficient to let participation volume override genuine performance.

Award Categories (all awarded teams are encouraged to submit a paper)

Award	Ranking Basis	Eligibility
🥇 Best System (× 6, one per sub-track)	Highest Track-Level Score within the sub-track	Paper submitted for ≥ 1 sub-track
🌐 Generalization Award	Highest G-Score across all participated sub-tracks	Paper submitted covering ≥ 2 sub-tracks
🔬 Best Personality Modeling	Largest average Track-Level Score gain when personality features (+P) are used vs. a personality-agnostic ablation, measured across all +P sub-tracks entered	Paper reports ablation results on ≥ 2 +P sub-tracks
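The ranking basis for the Best Personality Modeling award can be illustrated with a short sketch. The function name and dictionary layout are hypothetical; the sketch assumes the gain is the plain mean of per-sub-track differences between the +P system and its personality-agnostic ablation, and it enforces the two-sub-track minimum from the eligibility column.

```python
def personality_gain(with_p, without_p, min_sub_tracks=2):
    """Mean Track-Level Score gain from personality features.

    with_p / without_p: {sub_track: Track-Level Score} for the +P system
    and for its personality-agnostic ablation, respectively.
    """
    shared = sorted(set(with_p) & set(without_p))
    if len(shared) < min_sub_tracks:
        raise ValueError(f"ablations on at least {min_sub_tracks} sub-tracks are required")
    return sum(with_p[k] - without_p[k] for k in shared) / len(shared)


if __name__ == "__main__":
    # Made-up scores for illustration only
    with_p = {"Young-AV+P": 0.30, "Young-G+P": 0.20}
    without_p = {"Young-AV+P": 0.25, "Young-G+P": 0.18}
    print(personality_gain(with_p, without_p))
```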

Baseline & Resources

We provide the following materials to all registered participants:

Baseline code for all 6 sub-tracks (A-V+P, A-V-G+P, G+P for both Young and Elderly) — available on GitHub
Baseline paper describing the challenge background, dataset details, baseline systems, and results — available ~April 10, 2026
MPDD-AVG dataset feature set and privacy-constrained raw data
Standardized audio-visual and gait feature sets
Raw individual difference annotations including Big Five-10 personality traits, demographics, and health conditions

Dataset access: HuggingFace

Codabench Submission Platform

Submission access requests are now open. Each team is restricted to a maximum of 5 submission attempts per sub-track per day.

MPDD-Young

A-V+P — https://www.codabench.org/competitions/16033/
A-V-G+P — https://www.codabench.org/competitions/16034/
G+P — https://www.codabench.org/competitions/16035/

MPDD-Elderly

A-V-G+P — https://www.codabench.org/competitions/16077/
A-V+P — https://www.codabench.org/competitions/16078/
G+P — https://www.codabench.org/competitions/16079/

Paper Submission

To be eligible for the final evaluation, each team must submit a system description paper via OpenReview (venue opens July 1, 2026). Papers must be accompanied by well-documented source code, trained models, and the associated checkpoints. All submissions undergo peer review by the challenge technical program committee.

If you have any questions, feel free to join our WeChat group.

GET IN TOUCH: sstcneu@163.com | fuchangzeng@qhd.neu.edu.cn | shangming@mails.neu.edu.cn | zhangyiming1@mails.neu.edu.cn