Datasets: NeurIPS LLM Challenge 2023

mdouglas's Collections

updated Apr 10, 2024

Datasets I considered using in my submission to the 2023 NeurIPS Large Language Model Efficiency Challenge.

mosaicml/instruct-v3

Viewer • Updated Oct 2, 2023 • 63k • 209 • 34

Note Ultimately used in my full eval submission, excluding the dolly_hhrlhf data. Included only in Mistral-7B-sft-v1.
databricks/databricks-dolly-15k

Viewer • Updated Jun 30, 2023 • 15k • 16.9k • 896

Note Used for both Mistral-7B-sft-v0 and Mistral-7B-sft-v1 in my submissions.
kaist-ai/CoT-Collection

Viewer • Updated Oct 14, 2023 • 1.84M • 1.45k • 154

Note Looked promising, but did not have time to explore.
tasksource/icl-symbol-tuning-instruct

Viewer • Updated Jul 26, 2023 • 484k • 203 • 19

Note Considered for improving ICL. Did not have time to explore.
cais/mmlu

Viewer • Updated Mar 8, 2024 • 231k • 298k • 608

Note Decided against training on MMLU data.
GAIR/lima

Viewer • Updated Jun 8, 2023 • 1.33k • 795 • 452

Note Avoided due to CC BY-NC-SA license, though it would have been allowed for the competition. Likely would have been a good resource otherwise.
grammarly/coedit

Viewer • Updated Oct 21, 2023 • 70.8k • 935 • 83

Note The plan here was to target robustness metrics by finetuning an expert model to correct perturbations and/or clarify the input. This could also have helped with paraphrasing or other text revision tasks if they appeared in the hidden eval. Did not have time to fully explore.
wanyu/IteraTeR_human_sent

Viewer • Updated Oct 24, 2022 • 4.02k • 127

Note Similar use case to coedit.
allenai/social_i_qa

Updated Dec 1, 2025 • 20.5k • 26

Note Now knowing that the holdout tasks had ethics questions, I wish I had used this.
lighteval/siqa

Viewer • Updated Oct 7, 2023 • 35.4k • 515 • 8

Note Same as social_i_qa.
tau/commonsense_qa

Viewer • Updated Jan 4, 2024 • 12.1k • 50.5k • 125

Note Now knowing that the holdout tasks had ethics questions, I wish I had used this.
euirim/goodwiki

Viewer • Updated Sep 11, 2023 • 44.8k • 126 • 53

Note Could have been useful for RAG.
alexfabbri/multi_news

Updated Jan 18, 2024 • 7.02k • 71

Note The thought was that this could help with CNN/DM summarization, but quality and license concerns, combined with acceptable performance without it, led to its exclusion.
allenai/math_qa

Updated Jan 18, 2024 • 18.6k • 113
allenai/ropes

Viewer • Updated Jan 4, 2024 • 14.3k • 10.1k • 50
allenai/openbookqa

Viewer • Updated Jan 4, 2024 • 11.9k • 94.2k • 120
allenai/ai2_arc

Viewer • Updated Dec 21, 2023 • 7.79k • 227k • 244
INK-USC/riddle_sense

Updated Jan 18, 2024 • 2.05k • 26
allenai/qasc

Viewer • Updated Jan 4, 2024 • 9.98k • 8.06k • 23
nyu-mll/blimp

Viewer • Updated Jan 23, 2024 • 67k • 15.3k • 37
google/boolq

Viewer • Updated Jan 22, 2024 • 12.7k • 18.5k • 90
corypaik/prost

Viewer • Updated Oct 25, 2022 • 18.7k • 639 • 1
allenai/sciq

Viewer • Updated Jan 4, 2024 • 13.7k • 33.3k • 130
facebook/belebele

Viewer • Updated Aug 12, 2024 • 110k • 15.3k • 121
derek-thomas/ScienceQA

Viewer • Updated Feb 25, 2023 • 21.2k • 18.9k • 203
openlifescienceai/medmcqa

Viewer • Updated Jan 4, 2024 • 193k • 17.2k • 200
embedding-data/QQP_triplets

Viewer • Updated Aug 2, 2022 • 102k • 256 • 8
VMware/open-instruct

Viewer • Updated Jul 12, 2023 • 143k • 75 • 44
