MPDD-AVG 2026

Overview

The MPDD-AVG Challenge 2026 comprises two age-specific datasets — MPDD-Young and MPDD-Elderly — each featuring three complementary sub-tracks that explore different combinations of behavioral modalities and personality modeling. The challenge uniquely integrates semi-structured interview behavioral data with continuous gait monitoring from wearable sensors, enabling holistic assessment spanning cognitive-linguistic, affective-paralinguistic, and psychomotor domains.

This challenge is an updated version of MPDD2025 @ ACM MM 2025. Compared to the previous edition, MPDD-AVG introduces a new gait modality (IMU-based ambulatory monitoring), the G+P and A-V-G+P sub-tracks, and an extended annotation scheme including health condition labels.

Dataset

MPDD-Young

The MPDD-Young dataset comprises data from 110 college students, investigating how academic stress, social environment, and personality traits contribute to depression in young adults. Participants underwent semi-structured interviews designed to assess academic stress, social functioning, and emotional well-being. Subsequently, participants walked naturally within a designated area while equipped with wearable IMU sensors.

Annotations: PHQ-9 Scale Scores · Big Five-10 personality traits · Demographics (gender, age, birth region)

Classification tasks: Binary (normal / depressed)  ·  Ternary (normal / mild / severe)

MPDD-Elderly

The MPDD-Elderly dataset comprises data from 110 older adults, examining how chronic illnesses, living conditions, and personality traits influence late-life depression manifestation. Participants engaged in semi-structured interviews and then walked freely within a designated area while wearing IMU sensors.

Annotations: PHQ-9 Scale Scores · Big Five-10 personality traits · Demographics (age, gender, family situation, economic status) · Disease labels (endocrine, circulatory, nervous system)

Classification tasks: Binary · Ternary

Dataset Comparison

Compared with existing datasets, MPDD-AVG extends both the breadth of behavioral modalities and the depth of individual-difference annotations:

Dataset Audio-Visual Gait Depression Personality Gender Age Region Disease
AVEC
DAIC-WoZ
Pittsburgh
D-Vlog
MMDA
EATD-Corpus
CMDC
MODMA
MPDD-AVG (Ours)

For each dataset, 80% of the data is used for training and validation and the remaining 20% for testing. Standardized splits are provided.

Tracks & Sub-tracks

Each of the two age-specific datasets (MPDD-Young and MPDD-Elderly) features three complementary sub-tracks:

Track 1 — MPDD-Young

Young adult depression detection focusing on 110 college students.

  • A-V+P Audio-Visual with Personality Modeling
  • A-V-G+P Audio-Visual-Gait with Personality
  • G+P Gait with Personality Factors

Track 2 — MPDD-Elderly

Elderly depression detection focusing on 110 older adults.

  • A-V+P Audio-Visual with Personality Modeling
  • A-V-G+P Audio-Visual-Gait with Personality
  • G+P Gait with Personality Factors

Sub-track Descriptions

Evaluation Metrics

The Challenge employs comprehensive metrics to evaluate multimodal depression detection models across classification and regression tasks.

Classification Metrics

Regression Metrics

Track-Level Score

The final evaluation score for each track is calculated as:

$$\text{Score}_{\text{track}} = \alpha \cdot \text{Macro-F1} + \beta \cdot \text{CCC} + \gamma \cdot \kappa$$

where $\alpha + \beta + \gamma = 1$ and $\alpha = \beta = \gamma = 1/3$, so classification performance (Macro-F1), continuous score prediction (CCC), and diagnostic consistency ($\kappa$) are weighted equally.
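As a rough illustration of the formula above, the following Python sketch combines the three metrics with equal weights. Several details are assumptions, not part of the official evaluation code: $\kappa$ is taken to be unweighted Cohen's kappa, CCC is computed on predicted versus true PHQ-9 scores using population (1/n) moments, and all function names are hypothetical.

```python
from collections import Counter
from statistics import fmean


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return fmean(scores)


def ccc(x, y):
    """Concordance correlation coefficient with population (1/n) moments.

    Undefined (division by zero) if both inputs are the same constant.
    """
    mx, my = fmean(x), fmean(y)
    vx = fmean((a - mx) ** 2 for a in x)
    vy = fmean((b - my) ** 2 for b in y)
    cov = fmean((a - mx) * (b - my) for a, b in zip(x, y))
    return 2 * cov / (vx + vy + (mx - my) ** 2)


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa (assumed reading of the kappa term).

    Undefined (division by zero) if chance agreement equals 1.
    """
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    count_true, count_pred = Counter(y_true), Counter(y_pred)
    expected = sum(count_true[c] * count_pred[c] for c in count_true) / (n * n)
    return (observed - expected) / (1 - expected)


def track_score(y_true, y_pred, phq_true, phq_pred, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of Macro-F1, CCC, and kappa; defaults to equal weights."""
    alpha, beta, gamma = weights
    return (alpha * macro_f1(y_true, y_pred)
            + beta * ccc(phq_true, phq_pred)
            + gamma * cohen_kappa(y_true, y_pred))
```

A system that predicts every class label and every PHQ-9 score exactly would obtain a score of 1 under this sketch; the official scorer may differ in how it handles degenerate cases such as constant predictions.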

Rankings & Awards

MPDD-AVG 2026 adopts a two-tier evaluation framework: per-sub-track independent rankings that fairly compare systems on the same modality and population, and a cross-sub-track Generalization Award that recognises methods whose core design transfers robustly across different input modalities and age groups.

Per-Sub-track Leaderboards

Each of the 6 sub-tracks has a dedicated submission channel on CodaLab and an independent leaderboard. Rankings within each sub-track are determined solely by the Track-Level Score defined above. A team may rank first in multiple sub-tracks.

Track 1 — MPDD-Young

Young-AV+P · Audio-Visual + Personality
Young-AVG+P · Full Multimodal + Personality
Young-G+P · Gait + Personality

Track 2 — MPDD-Elderly

Elderly-AV+P · Audio-Visual + Personality
Elderly-AVG+P · Full Multimodal + Personality
Elderly-G+P · Gait + Personality

Submission policy: Each team is limited to 5 submission attempts per sub-track per day. To be eligible for final evaluation, a team must additionally submit a system description paper via OpenReview by the paper deadline.

Cross-Sub-track Generalization Award

Beyond individual sub-track winners, we present a special Generalization Award to the team whose method demonstrates the most consistent and robust performance across different modality combinations (A-V, A-V-G, G) and population groups (Young, Elderly). This award is designed to encourage the development of transferable, principled approaches rather than sub-track-specific tuning.

Generalization Score (G-Score)

The G-Score is computed over all sub-tracks a team has participated in. It rewards high average performance while penalising inconsistency across sub-tracks, and grants a progressive coverage bonus for each additional sub-track entered beyond the eligibility threshold.

G-Score Computation — Three Steps

  1. Normalise each sub-track score
    For sub-track $i$, compute the team's z-score relative to all competing teams on that sub-track: $$z_i = \frac{S_i - \mu_i}{\sigma_i}$$ where $S_i$ is the team's Track-Level Score, and $\mu_i$, $\sigma_i$ are the mean and standard deviation of all valid teams' scores on sub-track $i$. Normalisation removes score-scale differences across sub-tracks so that a gain on a hard gait-only sub-track is treated comparably to the same gain on an audio-visual sub-track.
  2. Compute mean performance and cross-sub-track variance
    Over the $N$ sub-tracks the team has entered: $$\bar{z} = \frac{1}{N}\sum_{i=1}^{N} z_i, \qquad \sigma_z = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(z_i - \bar{z})^2}$$ $\bar{z}$ measures average superiority; $\sigma_z$ measures how unevenly that performance is distributed: a large $\sigma_z$ indicates a team that excels on some sub-tracks but underperforms on others.
  3. Compute G-Score with consistency penalty and progressive coverage bonus
    $$\text{G-Score} = \bar{z} - \lambda \cdot \sigma_z + \delta \cdot (N - N_{\min})$$ The penalty term $\lambda \cdot \sigma_z$ discourages cherry-picking sub-tracks. The coverage bonus $\delta \cdot (N - N_{\min})$ grows linearly with each additional sub-track entered, rewarding broad participation — every extra sub-track contributes $+\delta$ to the G-Score regardless of whether a team joins all six.
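The three steps above can be sketched in Python as follows. This is a minimal illustration, not the organisers' scoring script: the function names and the dictionary layout are hypothetical, and $\sigma_i$ is assumed to be the population (1/n) standard deviation because the text does not say which form is used. The defaults for `lam`, `delta`, and `n_min` are the values listed under Parameter Settings.

```python
from statistics import fmean, pstdev


def z_scores(team_scores, field_scores):
    """Step 1: normalise each of a team's sub-track scores.

    team_scores:  {sub_track: S_i} for the team being scored.
    field_scores: {sub_track: [Track-Level Scores of all valid teams]}.
    Assumes the field on each sub-track is not constant (sigma_i > 0).
    """
    z = []
    for name, s in team_scores.items():
        mu = fmean(field_scores[name])
        sigma = pstdev(field_scores[name])  # population std: an assumption
        z.append((s - mu) / sigma)
    return z


def g_score(z, lam=0.5, delta=0.05, n_min=2):
    """Steps 2 and 3: mean, 1/N standard deviation, penalty, coverage bonus."""
    n = len(z)
    if n < n_min:
        raise ValueError(f"at least {n_min} sub-tracks are required, got {n}")
    z_bar = fmean(z)
    sigma_z = pstdev(z)  # 1/N form, matching the formula in step 2
    return z_bar - lam * sigma_z + delta * (n - n_min)
```

For example, `g_score([0.0, 2.0])` has $\bar{z} = 1$ and $\sigma_z = 1$, so the consistency penalty alone halves the result to 0.5.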

Parameter Settings

$\lambda$ = 0.5   (consistency penalty weight)
$\delta$ = 0.05   (coverage bonus per extra sub-track)
$N_{\min}$ = 2   (eligibility threshold)
Max coverage bonus = +0.20   (all 6 sub-tracks)
Calibration note. Based on baseline results, Track-Level Scores lie in the range 0.14–0.41 (mean ≈ 0.26, std ≈ 0.07). With a typical field of 10+ teams, inter-team z-score differences per sub-track are approximately 0.3–0.5. Because z-scores are expressed in standard-deviation units, the per-sub-track bonus of $\delta = 0.05$ corresponds to 0.05 standard deviations, or roughly one-tenth to one-sixth of a typical inter-team gap: meaningful but not sufficient to let participation volume override genuine performance.
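A short worked example may help show how the three terms interact under these settings. The z-scores below are invented for illustration only. Suppose a specialist team enters $N = 2$ sub-tracks with $z = (1.2, 0.2)$, and a generalist team enters $N = 4$ sub-tracks with $z_i = 0.5$ on each:

$$\text{Specialist: } \bar{z} = 0.7,\ \sigma_z = 0.5 \ \Rightarrow\ \text{G-Score} = 0.7 - 0.5 \cdot 0.5 + 0.05 \cdot (2 - 2) = 0.45$$

$$\text{Generalist: } \bar{z} = 0.5,\ \sigma_z = 0 \ \Rightarrow\ \text{G-Score} = 0.5 - 0.5 \cdot 0 + 0.05 \cdot (4 - 2) = 0.60$$

In this hypothetical case the generalist ranks higher despite a lower mean z-score. Most of the difference comes from the consistency penalty (0.25) rather than from the coverage bonus (0.10).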

Award Categories (all awarded teams are encouraged to submit a paper)

🥇 Best System (× 6, one per sub-track)
  • Ranking basis: Highest Track-Level Score within the sub-track
  • Eligibility: ≥ 1 sub-track; paper submitted
🌐 Generalization Award
  • Ranking basis: Highest G-Score across all participated sub-tracks
  • Eligibility: ≥ 2 sub-tracks; paper submitted
🔬 Best Personality Modeling
  • Ranking basis: Largest average Track-Level Score gain when personality features (+P) are used vs. a personality-agnostic ablation, measured across all +P sub-tracks entered
  • Eligibility: ablation results in paper; ≥ 2 +P sub-tracks
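The ranking basis for Best Personality Modeling can be read as a mean of per-sub-track score differences. The sketch below shows one such reading, assuming each team reports a Track-Level Score with and without personality features for every +P sub-track it entered. The function name and input layout are hypothetical, and the organisers may compute the gain differently.

```python
from statistics import fmean


def personality_gain(with_p, without_p, min_subtracks=2):
    """Average Track-Level Score gain from adding personality features.

    with_p / without_p: {sub_track: Track-Level Score} for the +P system
    and for its personality-agnostic ablation. Only sub-tracks present in
    both dictionaries are compared.
    """
    shared = sorted(set(with_p) & set(without_p))
    if len(shared) < min_subtracks:
        raise ValueError(f"need ablations on at least {min_subtracks} +P sub-tracks")
    return fmean(with_p[name] - without_p[name] for name in shared)
```

With illustrative scores of 0.30 and 0.25 for the +P systems against 0.26 and 0.23 for their ablations, the per-sub-track gains are 0.04 and 0.02, giving an average gain of 0.03.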

Baseline & Resources

We provide the following materials to all registered participants:

Dataset access: HuggingFace

Codabench Submission Platform

Submission access requests are now open. Each team is restricted to a maximum of 5 submission attempts per sub-track per day.

MPDD-Young

MPDD-Elderly

Paper Submission

To be eligible for the final evaluation, each team must submit a system description paper via OpenReview (venue opens July 1, 2026). Papers must include thoroughly explained source code, well-trained models, and associated checkpoints. All submissions undergo peer review by the challenge technical program committee.

If you have any questions, feel free to join our WeChat group.
MPDD-AVG 2026 WeChat Group QR Code
GET IN TOUCH: sstcneu@163.com | fuchangzeng@qhd.neu.edu.cn | shangming@mails.neu.edu.cn | zhangyiming1@mails.neu.edu.cn