Shared Tasks | GroundLM 2026

Overview

Task Format

The shared tasks complement the workshop call for papers (CFP) by giving participants a common evaluation setup for grounded language-model behavior. Each task will provide public development material, submission-format guidance, and held-out evaluation data for final ranking.

Shared Task 1

GoldenViewVQA

Multi-view visual question answering with evidence-source identification in driving scenes.

Input: Six synchronized camera views + question

Output: Supporting view + answer choice

Public dev set: 55 labeled examples

Primary metric: Joint accuracy

Shared Task 2

LitTraceQA

Literature-grounded question answering with paper retrieval, evidence grounding, and answer generation.

Input: Research question + paper metadata pool

Output: Paper IDs + evidence + answer

Public dev set: 55 labeled examples

Evaluation: Retrieval, grounding, and answer accuracy

Timeline

Shared Task Dates

The schedule below applies to the GroundLM 2026 shared tasks.

Activity	Date
Dataset (dev) release	June 8, 2026 (Shared Task 1); June 15, 2026 (Shared Task 2)
Dataset (test) release	August 3, 2026
Submission of test output, code, and system papers	August 19, 2026
Acceptance and winner announcement	September 1, 2026
Camera-ready papers	September 10, 2026

Shared Task 1

GoldenViewVQA: Multi-View Grounded VQA for Driving Scenes

GoldenViewVQA evaluates whether multimodal systems can identify the visual evidence needed to answer a question. Given six synchronized NuScenes camera views and a question, a model must select the supporting camera view and answer the multiple-choice question.

Input: Six synchronized NuScenes camera views + one question

Output: Supporting view and answer choice

Public dev set: 55 labeled examples, 330 image references

Primary metric: Joint accuracy

Evaluation

Submissions should contain one JSON object per question with a question identifier, predicted supporting view, and predicted answer choice.

{"question_id": "sfall_0001_causality", "predicted_view": "CAM_FRONT", "predicted_answer_id": "A"}
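
A local format check can catch malformed lines before submission. The following is a minimal sketch, not the official schema or evaluator. It assumes one JSON object per line (JSON Lines) and assumes the allowed `predicted_view` values are the six standard NuScenes camera channels; the page itself only shows `CAM_FRONT`. The function name `check_submission_lines` is hypothetical.

```python
import json

# Assumption: allowed views are the six standard NuScenes camera channels.
# The task page only shows "CAM_FRONT"; check the released schema for the real list.
NUSCENES_CAMERAS = {
    "CAM_FRONT", "CAM_FRONT_LEFT", "CAM_FRONT_RIGHT",
    "CAM_BACK", "CAM_BACK_LEFT", "CAM_BACK_RIGHT",
}
# Keys taken from the example submission line on this page.
REQUIRED_KEYS = ("question_id", "predicted_view", "predicted_answer_id")


def check_submission_lines(lines):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    seen_ids = set()
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {line_no}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {line_no}: expected a JSON object")
            continue
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            problems.append(f"line {line_no}: missing keys {missing}")
            continue
        qid = record["question_id"]
        if qid in seen_ids:
            problems.append(f"line {line_no}: duplicate question_id {qid!r}")
        seen_ids.add(qid)
        if record["predicted_view"] not in NUSCENES_CAMERAS:
            problems.append(f"line {line_no}: unknown view {record['predicted_view']!r}")
    return problems
```

Passing an open file handle works as well as a list of strings, since the function only iterates over lines. The released schema remains the authoritative definition of a valid submission.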

The development evaluator reports view accuracy, answer accuracy, and joint accuracy. Final ranking will use a held-out input-only test set with private gold labels.
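
The three reported numbers can be sketched as follows. This is an illustration rather than the released evaluator: it assumes joint accuracy counts a question as correct only when both the view and the answer match, and that a question with no prediction counts as wrong on all three metrics. The function name `score_predictions` and the gold-label layout are hypothetical.

```python
def score_predictions(gold, predictions):
    """Compute view, answer, and joint accuracy.

    gold: dict mapping question_id -> (gold_view, gold_answer_id)  (hypothetical layout)
    predictions: list of dicts in the submission format shown above
    """
    if not gold:
        raise ValueError("gold must contain at least one question")
    pred_by_id = {p["question_id"]: p for p in predictions}
    view_hits = answer_hits = joint_hits = 0
    for qid, (gold_view, gold_answer) in gold.items():
        pred = pred_by_id.get(qid)  # assumption: a missing prediction scores as wrong
        view_ok = pred is not None and pred.get("predicted_view") == gold_view
        answer_ok = pred is not None and pred.get("predicted_answer_id") == gold_answer
        view_hits += view_ok
        answer_hits += answer_ok
        joint_hits += view_ok and answer_ok  # assumption: joint means both are correct
    n = len(gold)
    return {
        "view_accuracy": view_hits / n,
        "answer_accuracy": answer_hits / n,
        "joint_accuracy": joint_hits / n,
    }
```

Under these assumptions joint accuracy can never exceed the lower of the two component accuracies, so a system that answers well from the wrong view is penalized.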

Development Release

The current Hugging Face release is the public development set. It is intended for task familiarization, data loading, prompt and model development, local evaluation, and submission-format checks. It should not be treated as the final leaderboard test set.

Resources

Submit to Shared Tasks | Dataset on Hugging Face | Schema | Evaluator

The dataset provides benchmark annotations and NuScenes-relative image paths. Participants must obtain NuScenes through the official access process and comply with its terms of use.

Shared Task 2

LitTraceQA: Literature-Grounded Question Answering

LitTraceQA evaluates whether systems can answer research questions with grounded evidence from scientific papers. Given a question and requested answer types, a system must retrieve the relevant paper or papers from the released metadata pool, identify coarse evidence locations, and return the final answer in the requested format.

Input: Research question, answer types, and paper metadata pool

Output: Paper IDs, evidence locations, and final answer

Public dev set: 55 labeled validation examples

Primary components: Retrieval, evidence grounding, and answer accuracy

Evaluation

Submissions should contain one JSON object per question with a stable query identifier, retrieved paper IDs, supporting evidence locations, and the answer object.

{"query_id": "q_001", "gold_papers": [{"paper_id": "acl2025_00005"}], "evidence": [{"paper_id": "acl2025_00005", "source_type": "table", "locator": {"page": 6, "table_id": "Table 4"}}], "answer": {"freeform": {"text": "14.70"}}}
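
A quick structural check against the example above might look like the sketch below. It is not the released schema: the key names are copied from the example record, the function name `check_record` is hypothetical, and the rule that every evidence item should cite a paper from the record's own paper list is a self-consistency heuristic assumed here, not a documented requirement.

```python
import json

# Top-level and evidence keys as they appear in the example record on this page.
EXAMPLE_KEYS = ("query_id", "gold_papers", "evidence", "answer")
EVIDENCE_KEYS = ("paper_id", "source_type", "locator")


def check_record(record):
    """Return a list of problems found in one submission record."""
    problems = [f"missing key: {key}" for key in EXAMPLE_KEYS if key not in record]
    if problems:
        return problems
    paper_ids = {paper.get("paper_id") for paper in record["gold_papers"]}
    for i, item in enumerate(record["evidence"]):
        for field in EVIDENCE_KEYS:
            if field not in item:
                problems.append(f"evidence[{i}] missing {field}")
        # Assumed heuristic: evidence should point at a paper listed in the record.
        if item.get("paper_id") not in paper_ids:
            problems.append(f"evidence[{i}] cites a paper not in the paper list")
    if not isinstance(record["answer"], dict) or not record["answer"]:
        problems.append("answer must be a non-empty object")
    return problems


def check_jsonl(lines):
    """Apply check_record to each non-blank line; map line number -> problems."""
    report = {}
    for line_no, line in enumerate(lines, start=1):
        if line.strip():
            found = check_record(json.loads(line))
            if found:
                report[line_no] = found
    return report
```

The machine-readable schema in the development release is the authoritative reference; a check like this only guards against obvious slips while building a loader.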

The public evaluator reports macro paper retrieval precision, recall, and F1; macro evidence precision, recall, and F1 using coarse evidence matches; and answer metrics for multiple-choice, freeform, and table outputs. Final ranking will use a hidden evaluation set.
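
Macro-averaged retrieval scores can be sketched as below. This is an illustration under stated assumptions, not the released evaluator: precision, recall, and F1 are computed per query over sets of paper IDs and then averaged across queries, macro F1 is taken as the mean of per-query F1 values, and an empty prediction or gold set scores 0.0. The function names `prf` and `macro_prf` are hypothetical.

```python
def prf(predicted_ids, gold_ids):
    """Set-based precision, recall, and F1 for a single query."""
    predicted, gold = set(predicted_ids), set(gold_ids)
    true_positives = len(predicted & gold)
    # Assumption: an empty set on either side yields 0.0 rather than an error.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def macro_prf(pairs):
    """Average per-query scores; pairs is a list of (predicted_ids, gold_ids)."""
    scores = [prf(predicted, gold) for predicted, gold in pairs]
    if not scores:
        raise ValueError("need at least one query")
    n = len(scores)
    # Assumption: macro F1 is the mean of per-query F1, not F1 of the mean P and R.
    return tuple(sum(score[i] for score in scores) / n for i in range(3))


# Usage with made-up paper IDs:
precision, recall, f1 = macro_prf([
    (["p1", "p2"], ["p1"]),   # one extra paper retrieved
    (["p3"], ["p3", "p4"]),   # one gold paper missed
])
```

The same per-query averaging would apply to evidence scoring once a rule for coarse evidence matches is fixed; that matching rule is defined by the released evaluator and is not reproduced here.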

Development Release

The current Hugging Face release is the public development set. It includes gold validation records, input-only validation records, a sample submission, a searchable paper metadata pool, a machine-readable schema, and a local evaluator. It is intended for format inspection, loader development, prompt tuning, and local validation, not as the hidden leaderboard test set.

Resources

Submit to Shared Tasks | Dataset on Hugging Face | Format | Evaluation | Evaluator

LitTraceQA annotations and benchmark files are released under CC BY-NC 4.0. Paper metadata remains subject to the original publishers' terms, and PDFs are not redistributed in the dataset.