GroundLM 2026 Shared Tasks

Shared Tasks

GroundLM 2026 will host shared tasks that turn grounding challenges into concrete, reproducible evaluations. Paper submissions remain the primary focus of the main workshop; this page collects shared-task information, datasets, and participation details.

Task Format

Shared tasks are intended to complement the workshop CFP by giving participants a common evaluation setup for grounded language-model behavior. Each task will provide public development material, submission-format guidance, and held-out evaluation data for final ranking.

Shared Task 1

GoldenViewVQA

Multi-view visual question answering with evidence-source identification in driving scenes.

Input: Six synchronized camera views + question
Output: Supporting view + answer choice
Public dev set: 55 labeled examples
Primary metric: Joint accuracy

Shared Task 2

LitTraceQA

Literature-grounded question answering with paper retrieval, evidence grounding, and answer generation.

Input: Research question + paper metadata pool
Output: Paper IDs + evidence + answer
Public dev set: 55 labeled examples
Evaluation: Retrieval, grounding, and answer accuracy

Shared Task Dates

The schedule below applies to the GroundLM 2026 shared tasks.

Activity: Date
Dataset (dev) release: June 8, 2026 (Shared Task 1); June 15, 2026 (Shared Task 2)
Dataset (test) release: August 3, 2026
Submission of test output, code, and system papers: August 19, 2026
Acceptance and winner announcement: September 1, 2026
Camera-ready papers: September 10, 2026

GoldenViewVQA: Multi-View Grounded VQA for Driving Scenes

GoldenViewVQA evaluates whether multimodal systems can identify the visual evidence needed to answer a question. Given six synchronized NuScenes camera views and a question, a model must select the supporting camera view and answer the multiple-choice question.

Input: Six synchronized NuScenes camera views + one question
Output: Supporting view and answer choice
Public dev set: 55 labeled examples, 330 image references
Primary metric: Joint accuracy

Evaluation

Submissions should contain one JSON object per question with a question identifier, predicted supporting view, and predicted answer choice.

{"question_id": "sfall_0001_causality", "predicted_view": "CAM_FRONT", "predicted_answer_id": "A"}
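
A local format check can catch malformed predictions before submission. The sketch below is illustrative only and is not the official evaluator: the function name check_prediction is hypothetical, the required keys are taken from the example line above, and the accepted view names are assumed to be the six standard NuScenes camera channels.

```python
import json

# Assumption: valid views are the six standard NuScenes camera channels.
NUSCENES_CAMERAS = (
    "CAM_FRONT", "CAM_FRONT_LEFT", "CAM_FRONT_RIGHT",
    "CAM_BACK", "CAM_BACK_LEFT", "CAM_BACK_RIGHT",
)
# Keys taken from the example submission line in the task description.
REQUIRED_KEYS = ("question_id", "predicted_view", "predicted_answer_id")


def check_prediction(line: str) -> list[str]:
    """Return the problems found in one serialized prediction (empty if none)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["record is not a JSON object"]
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in record]
    if "predicted_view" in record and record["predicted_view"] not in NUSCENES_CAMERAS:
        problems.append(f"unknown camera view: {record['predicted_view']}")
    return problems


example = (
    '{"question_id": "sfall_0001_causality", '
    '"predicted_view": "CAM_FRONT", "predicted_answer_id": "A"}'
)
print(check_prediction(example))  # prints []
```

The check deliberately does not validate answer IDs, since the set of answer choices per question is not specified here.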

The development evaluator reports view accuracy, answer accuracy, and joint accuracy. Final ranking will use a held-out input-only test set with private gold labels.
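
The three reported numbers can be approximated locally with a few lines of Python. This is a minimal sketch, not the released evaluator. It assumes that joint accuracy counts a question as correct only when both the supporting view and the answer choice match, and that questions without a prediction count as wrong. The gold field names gold_view and gold_answer_id are hypothetical placeholders for whatever the development set actually uses.

```python
def score(predictions, gold):
    """Compute view, answer, and joint accuracy over the gold question set."""
    pred_by_id = {p["question_id"]: p for p in predictions}
    view_hits = answer_hits = joint_hits = 0
    for g in gold:
        # A missing prediction yields an empty dict, so both checks fail.
        p = pred_by_id.get(g["question_id"], {})
        view_ok = p.get("predicted_view") == g["gold_view"]
        answer_ok = p.get("predicted_answer_id") == g["gold_answer_id"]
        view_hits += view_ok
        answer_hits += answer_ok
        joint_hits += view_ok and answer_ok  # both must be correct
    n = len(gold)
    if n == 0:
        return {"view_accuracy": 0.0, "answer_accuracy": 0.0, "joint_accuracy": 0.0}
    return {
        "view_accuracy": view_hits / n,
        "answer_accuracy": answer_hits / n,
        "joint_accuracy": joint_hits / n,
    }


# Toy data: q1 fully correct, q2 wrong answer, q3 wrong view, q4 unanswered.
gold = [
    {"question_id": "q1", "gold_view": "CAM_FRONT", "gold_answer_id": "A"},
    {"question_id": "q2", "gold_view": "CAM_BACK", "gold_answer_id": "B"},
    {"question_id": "q3", "gold_view": "CAM_FRONT_LEFT", "gold_answer_id": "C"},
    {"question_id": "q4", "gold_view": "CAM_BACK_RIGHT", "gold_answer_id": "D"},
]
predictions = [
    {"question_id": "q1", "predicted_view": "CAM_FRONT", "predicted_answer_id": "A"},
    {"question_id": "q2", "predicted_view": "CAM_BACK", "predicted_answer_id": "C"},
    {"question_id": "q3", "predicted_view": "CAM_FRONT", "predicted_answer_id": "C"},
]
result = score(predictions, gold)
print(result)
```

On this toy data, view and answer accuracy are each 0.5 while joint accuracy is 0.25, which illustrates why a joint metric is stricter than either component alone.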

Development Release

The current Hugging Face release is the public development set. It is intended for task familiarization, data loading, prompt and model development, local evaluation, and submission-format checks. It should not be treated as the final leaderboard test set.

Resources

The dataset provides benchmark annotations and NuScenes-relative image paths. Participants must obtain NuScenes through the official access process and comply with its terms of use.

LitTraceQA: Literature-Grounded Question Answering

LitTraceQA evaluates whether systems can answer research questions with grounded evidence from scientific papers. Given a question and requested answer types, a system must retrieve the relevant paper or papers from the released metadata pool, identify coarse evidence locations, and return the final answer in the requested format.

Input: Research question, answer types, and paper metadata pool
Output: Paper IDs, evidence locations, and final answer
Public dev set: 55 labeled validation examples
Primary components: Retrieval, evidence grounding, and answer accuracy

Evaluation

Submissions should contain one JSON object per question with a stable query identifier, retrieved paper IDs, supporting evidence locations, and the answer object.

{"query_id": "q_001", "gold_papers": [{"paper_id": "acl2025_00005"}], "evidence": [{"paper_id": "acl2025_00005", "source_type": "table", "locator": {"page": 6, "table_id": "Table 4"}}], "answer": {"freeform": {"text": "14.70"}}}

The public evaluator reports macro paper retrieval precision, recall, and F1; macro evidence precision, recall, and F1 using coarse evidence matches; and answer metrics for multiple-choice, freeform, and table outputs. Final ranking will use a hidden evaluation set.
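
To make the retrieval component concrete, the following sketch computes macro-averaged paper retrieval precision, recall, and F1. It is not the released evaluator. It assumes that "macro" means averaging per-query scores (so macro F1 is the mean of per-query F1 values), and that empty predicted or gold sets score 0. Predicted paper IDs are read from the gold_papers key because that is the key shown in the sample submission line above; the gold mapping from query ID to a set of paper IDs is a hypothetical structure.

```python
def prf(predicted: set, gold: set):
    """Set-based precision, recall, and F1 for a single query."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def macro_retrieval(submissions, gold_by_query):
    """Average per-query precision, recall, and F1 over all gold queries."""
    # Key name "gold_papers" follows the sample submission line.
    predicted_by_query = {
        s["query_id"]: {p["paper_id"] for p in s.get("gold_papers", [])}
        for s in submissions
    }
    scores = [
        prf(predicted_by_query.get(query_id, set()), gold_ids)
        for query_id, gold_ids in gold_by_query.items()
    ]
    if not scores:
        return 0.0, 0.0, 0.0
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


# Toy data with hypothetical paper IDs.
gold_by_query = {"q_001": {"paper_a"}, "q_002": {"paper_c", "paper_d"}}
submissions = [
    {"query_id": "q_001", "gold_papers": [{"paper_id": "paper_a"}, {"paper_id": "paper_b"}]},
    {"query_id": "q_002", "gold_papers": [{"paper_id": "paper_c"}]},
]
macro_p, macro_r, macro_f1 = macro_retrieval(submissions, gold_by_query)
print(round(macro_p, 3), round(macro_r, 3), round(macro_f1, 3))  # prints 0.75 0.75 0.667
```

The same per-query set comparison could be applied to evidence locations once a coarse matching rule is fixed, for example matching on paper ID and source type; the exact rule used by the public evaluator is not stated here.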

Development Release

The current Hugging Face release is the public development set. It includes gold validation records, input-only validation records, a sample submission, a searchable paper metadata pool, a machine-readable schema, and a local evaluator. It is intended for format inspection, loader development, prompt tuning, and local validation, not as the hidden leaderboard test set.

Resources

LitTraceQA annotations and benchmark files are released under CC BY-NC 4.0. Paper metadata remains subject to the original publishers' terms, and PDFs are not redistributed in the dataset.