Task Format
GoldenViewVQA
Multi-view visual question answering with evidence-source identification in driving scenes.
LitTraceQA
Literature-grounded question answering with paper retrieval, evidence grounding, and answer generation.
GoldenViewVQA: Multi-View Grounded VQA for Driving Scenes
GoldenViewVQA evaluates whether multimodal systems can identify the visual evidence needed to answer a question. Given six synchronized nuScenes camera views and a question, a model must select the camera view that supports the answer and answer the multiple-choice question.
Evaluation
Submissions should contain one JSON object per question with a question identifier, predicted supporting view, and predicted answer choice.
{"question_id": "sfall_0001_causality", "predicted_view": "CAM_FRONT", "predicted_answer_id": "A"}
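A format check along the following lines may help catch malformed records before submission. It is a minimal sketch, not the official validator: the helper names `check_record` and `to_jsonl` are hypothetical, the one-object-per-line (JSON Lines) layout is an assumption about how "one JSON object per question" is stored, and the camera list is the standard set of six nuScenes camera channels rather than something this page enumerates.

```python
import json

# The six nuScenes camera channels (the example above only shows CAM_FRONT).
CAMERA_VIEWS = {
    "CAM_FRONT", "CAM_FRONT_LEFT", "CAM_FRONT_RIGHT",
    "CAM_BACK", "CAM_BACK_LEFT", "CAM_BACK_RIGHT",
}
# Keys taken from the example record above.
REQUIRED_KEYS = ("question_id", "predicted_view", "predicted_answer_id")


def check_record(record):
    """Return a list of problems found in one submission record (hypothetical helper)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in record]
    view = record.get("predicted_view")
    if view is not None and view not in CAMERA_VIEWS:
        problems.append(f"unknown camera view: {view}")
    return problems


def to_jsonl(records):
    """Serialize records as one JSON object per line (assumed file layout)."""
    return "\n".join(json.dumps(record) for record in records) + "\n"


example = {
    "question_id": "sfall_0001_causality",
    "predicted_view": "CAM_FRONT",
    "predicted_answer_id": "A",
}
```

Calling `check_record(example)` returns an empty list, and `to_jsonl([example])` yields a single line that parses back to the same object.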
The development evaluator reports view accuracy, answer accuracy, and joint accuracy. Final ranking will use a held-out input-only test set with private gold labels.
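The three reported metrics can be approximated locally with a sketch like the one below. It assumes that joint accuracy counts a question as correct only when both the view and the answer are correct, and that unanswered questions count as wrong; neither convention is confirmed here. The gold field names `gold_view` and `gold_answer_id` are hypothetical, so adjust them to the released annotation schema.

```python
def score(predictions, gold):
    """Compute view, answer, and joint accuracy over the gold questions.

    Assumptions: gold records carry hypothetical `gold_view` and
    `gold_answer_id` fields, and a missing prediction counts as incorrect.
    """
    pred_by_id = {p["question_id"]: p for p in predictions}
    view_hits = answer_hits = joint_hits = 0
    for g in gold:
        p = pred_by_id.get(g["question_id"], {})
        view_ok = p.get("predicted_view") == g["gold_view"]
        answer_ok = p.get("predicted_answer_id") == g["gold_answer_id"]
        view_hits += view_ok
        answer_hits += answer_ok
        # Joint credit requires both the view and the answer to be right.
        joint_hits += view_ok and answer_ok
    n = len(gold)
    if n == 0:
        return {"view_accuracy": 0.0, "answer_accuracy": 0.0, "joint_accuracy": 0.0}
    return {
        "view_accuracy": view_hits / n,
        "answer_accuracy": answer_hits / n,
        "joint_accuracy": joint_hits / n,
    }
```

Under these assumptions joint accuracy can never exceed either of the other two scores, which makes it a quick sanity check on local results.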
Development Release
The current Hugging Face release is the public development set. It is intended for task familiarization, data loading, prompt and model development, local evaluation, and submission-format checks. It should not be treated as the final leaderboard test set.
Resources
The dataset provides benchmark annotations and nuScenes-relative image paths; the images themselves are not included. Participants must obtain nuScenes through the official access process and comply with its terms of use.
LitTraceQA: Literature-Grounded Question Answering
LitTraceQA evaluates whether systems can answer research questions with grounded evidence from scientific papers. Given a question and requested answer types, a system must retrieve the relevant paper or papers from the released metadata pool, identify coarse evidence locations, and return the final answer in the requested format.
Evaluation
Submissions should contain one JSON object per question with a stable query identifier, retrieved paper IDs, supporting evidence locations, and the answer object.
{"query_id": "q_001", "gold_papers": [{"paper_id": "acl2025_00005"}], "evidence": [{"paper_id": "acl2025_00005", "source_type": "table", "locator": {"page": 6, "table_id": "Table 4"}}], "answer": {"freeform": {"text": "14.70"}}}
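A record shaped like the example above could be assembled as in the following sketch. The helper names `make_record` and `table_evidence` are hypothetical, and only the freeform answer and table evidence shapes shown in the example are covered; the released machine-readable schema remains the authority for other answer and evidence types.

```python
import json


def table_evidence(paper_id, page, table_id):
    """Build one table-type evidence entry, mirroring the example record."""
    return {
        "paper_id": paper_id,
        "source_type": "table",
        "locator": {"page": page, "table_id": table_id},
    }


def make_record(query_id, paper_ids, evidence, freeform_text):
    """Build a submission record with a freeform answer (hypothetical helper).

    Key names are copied from the example record, not from the official schema.
    """
    return {
        "query_id": query_id,
        "gold_papers": [{"paper_id": pid} for pid in paper_ids],
        "evidence": list(evidence),
        "answer": {"freeform": {"text": freeform_text}},
    }


record = make_record(
    "q_001",
    ["acl2025_00005"],
    [table_evidence("acl2025_00005", 6, "Table 4")],
    "14.70",
)
line = json.dumps(record)  # one JSON object, ready to write as a single line
```

Keeping the answer as a string, as in the example, avoids losing trailing zeros such as the one in "14.70".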
The public evaluator reports macro paper retrieval precision, recall, and F1; macro evidence precision, recall, and F1 using coarse evidence matches; and answer metrics for multiple-choice, freeform, and table outputs. Final ranking will use a hidden evaluation set.
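Macro-averaged paper retrieval scores can be sketched as follows. This is an illustration under stated assumptions, not the released evaluator: precision, recall, and F1 are computed per question on sets of paper IDs and then averaged across questions, macro F1 is the mean of per-question F1 values, and an empty predicted or gold set scores 0.0. The function names `prf` and `macro_prf` are hypothetical.

```python
def prf(predicted, gold):
    """Set-based precision, recall, and F1 for one question."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Assumed convention: an empty set yields 0.0 rather than an error.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


def macro_prf(pairs):
    """Average per-question scores over (predicted_ids, gold_ids) pairs."""
    scores = [prf(predicted, gold) for predicted, gold in pairs]
    if not scores:
        return (0.0, 0.0, 0.0)
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))
```

The same per-question routine could be reused for coarse evidence matching by replacing paper IDs with hashable evidence keys, for example a tuple of paper ID, source type, and page, although the exact matching rule used by the evaluator is not specified here.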
Development Release
The current Hugging Face release is the public development set. It includes gold validation records, input-only validation records, a sample submission, a searchable paper metadata pool, a machine-readable schema, and a local evaluator. It is intended for format inspection, loader development, prompt tuning, and local validation, not as the hidden leaderboard test set.
Resources
LitTraceQA annotations and benchmark files are released under CC BY-NC 4.0. Paper metadata remains subject to the original publishers' terms, and PDFs are not redistributed in the dataset.