languagebench / results.json

Commit History

Upload from GitHub Actions: use old results
5102b0a
verified

davidpomerenke committed on

Upload from GitHub Actions: Merge pull request #18 from datenlabor-bmz/pr-17
a0d1624
verified

davidpomerenke committed on

Upload from GitHub Actions: Add auto-translated datasets
c790fdb
verified

davidpomerenke committed on

Upload from GitHub Actions: Update evaluation results
f88768f
verified

davidpomerenke committed on

Upload from GitHub Actions: Update evaluation results
95c4e14
verified

davidpomerenke committed on

Upload from GitHub Actions: ran full evaluation locally
088f96f
verified

davidpomerenke committed on

Upload from GitHub Actions: restored old results.json
9e9d3bd
verified

davidpomerenke committed on

Upload from GitHub Actions: updated and cleaned up scripts for new eval runs
963cb78
verified

davidpomerenke committed on

Upload from GitHub Actions: Update models.py, models.json, and results.json with latest evaluation data and model additions
8eebb41
verified

davidpomerenke committed on

Upload from GitHub Actions: Merge pull request #9 from datenlabor-bmz/jn-dev
7c06aef
verified

davidpomerenke committed on

Upload from GitHub Actions: Get more results, compute average based on all tasks
98c6811
verified

davidpomerenke committed on

Upload from GitHub Actions: Translate MMLU and evaluate
4c5c136
verified

davidpomerenke committed on

Upload from GitHub Actions: Correlation plot
b0aa389
verified

davidpomerenke committed on

Upload from GitHub Actions: Evaluate on autotranslated GSM dataset
f3a09a2
verified

davidpomerenke committed on

Upload from GitHub Actions: Evaluate Google Translate
338dc9b
verified

davidpomerenke committed on

Upload from GitHub Actions: More models and languages
a73f888
verified

davidpomerenke committed on

Upload from GitHub Actions: Results for 50 languages
3dfd880
verified

davidpomerenke committed on

Upload from GitHub Actions: Evaluate on 40 languages
941d5c5
verified

davidpomerenke committed on

Upload from nightly evaluation run
c3be561
verified

davidpomerenke committed on

Upload from GitHub Actions: Add math benchmarks
549360a
verified

davidpomerenke committed on

Upload from GitHub Actions: More results
52abc5b
verified

davidpomerenke committed on

Upload from nightly evaluation run
4a34e67
verified

davidpomerenke committed on

Upload from GitHub Actions: Update model ranking fetching
f840423
verified

davidpomerenke committed on

Upload from GitHub Actions: Use FLORES+ via Hugging Face
913253a
verified

davidpomerenke committed on

Upload from nightly evaluation run
9ee89ef
verified

davidpomerenke committed on

Upload from nightly evaluation run
8a4050a
verified

davidpomerenke committed on

Upload from GitHub Actions: New results
b311dd5
verified

davidpomerenke committed on

Upload from nightly evaluation run
dcb356d
verified

davidpomerenke committed on

Block gemini-2.5-pro-exp-03-25
092c06a

David Pomerenke committed on

Only run tasks for which there is no result yet
2f9dee1

David Pomerenke committed on

Run on 40 languages, additional models
260c1a3

David Pomerenke committed on

Run evals
b0c61ed

David Pomerenke committed on

Run on 15 languages
f8a3dad

David Pomerenke committed on

Add model history plot
f52ec6e

David Pomerenke committed on

Implement MMLU task
a683732

David Pomerenke committed on

Add Global MMLU benchmark
ce2acb0

David Pomerenke committed on

Translation both from and to
731eddd

David Pomerenke committed on

Add OpenRouter metadata to models
9002fc2

David Pomerenke committed on

Run on 100 languages, adjust display
8274634

David Pomerenke committed on

Add Dockerfile
4d13673

David Pomerenke committed on

Language selection checkboxes & filtering in backend
d91b022

David Pomerenke committed on

Basic backend setup with FastAPI but without actual filtering
2c21cf7

David Pomerenke committed on

spBLEU tokenizer, run on more languages
eaf2d97

David Pomerenke committed on

Better map tooltip
92b2164

David Pomerenke committed on

Process data for country map
723f963

David Pomerenke committed on

Autonyms and cooler dataset search display
33469f2

David Pomerenke committed on

More models
c5278dd

David Pomerenke committed on

Basic language table
d1a7111

David Pomerenke committed on

Refactor eval code into files
da6e1bc

David Pomerenke committed on

Model table using React
ecf4195

David Pomerenke committed on