Commit History

fix: Put ALL_LANGUAGES and ALL_TASKS at the end
317dd0a

saattrupdan committed on

fix: Use absolute imports
c8f17da

saattrupdan committed on

fix: Add missing datasets
5451e45

saattrupdan committed on

chore: Add makefile
f2e50bb

saattrupdan committed on

fix: Remove hashes from requirements.txt
a013ca7

saattrupdan committed on

fix: Update sdk_version in readme
a0453dc

saattrupdan committed on

feat: Add requirements.txt, update datasets
d288ea3

saattrupdan committed on

fix: Remove breakpoint
83ff0f7

saattrupdan committed on

feat: Update languages and datasets, add compat with eee
da3cf26

saattrupdan committed on

feat: Change BERTScore to ChrF3++
180adc7

saattrupdan committed on

feat: Update languages, and switch to using pyproject.toml
5053774

saattrupdan committed on

fix: Change Gradio version to 4.39.0
a25e7f4
verified

saattrupdan committed on

fix: Fetch data from tar.gz file
a27fbcc

saattrupdan committed on

fix: Separate zero-shot performance from few-shot
376f461

saattrupdan committed on

feat: Add Spanish and Finnish
60d6a88

saattrupdan committed on

style: Re-order
4f27e41

saattrupdan committed on

fix: Do not update last_fetch if nothing in dropdowns
0495bc2

saattrupdan committed on

fix: Tooltip
6fd0cfa

saattrupdan committed on

feat: Use actual ranks on scale
c97530c

saattrupdan committed on

feat: Update datasets
6bdb37f

saattrupdan committed on

feat: Add support for Italian
cdd9094

saattrupdan committed on

style: Rename ScandEval to EuroEval
9fa29df

saattrupdan committed on

style: Rename "reasoning" to "common-sense reasoning"
e5acaa3

saattrupdan committed on

fix: Use all datasets from a task, use ranks instead of log_ranks
637c71d

saattrupdan committed on

feat: Add update colours button
4a4e1c1

saattrupdan committed on

fix: Do nothing in update_colour_mapping if model_ids is empty
86e510b

saattrupdan committed on

feat: Add hotter-and-colder-sentiment
ddc1d20

saattrupdan committed on

style: Do not upper case language codes
bef135e

saattrupdan committed on

fix: Use language codes if more than 5 languages are selected
4907a8b

saattrupdan committed on

feat: Optimise colour mapping for visible models only
1d11c02

saattrupdan committed on

fix: Fetch new results properly
64071e4

saattrupdan committed on

fix: Do not use "test"
a1248d7

saattrupdan committed on

fix: Use new URL for results
5e55e05

saattrupdan committed on

Update app.py
fd7fab5
verified

saattrupdan committed on

Update app.py
5c9ed9a
verified

saattrupdan committed on

Update README.md (#1)
c24aee4
verified

saattrupdan committed on

fix: Lower case model sorting
6e9ab8e

saattrupdan committed on

feat: Change UPDATE_FREQUENCY_MINUTES to 5
9e3c3cd

saattrupdan committed on

chore: Small update
3a84b3b

saattrupdan committed on

feat: Update app with log rank scores
5f70754

saattrupdan committed on

fix: Update win ratios to take ranks into account
734648f

saattrupdan committed on

feat: Add update colours button
c34e772

saattrupdan committed on

feat: Update datasets used in ScandEval
8157f53

saattrupdan committed on

feat: Sorting is case-independent
f04c64c

saattrupdan committed on

fix: Sort models and languages in the beginning
995f0f4

saattrupdan committed on

feat: Sort dropdown list of model IDs
27bc6fa

saattrupdan committed on

feat: Change order of tasks, to avoid hiding INFORMATION_EXTRACTION
437ac86

saattrupdan committed on

feat: Fix colour for each model (up to retakes), reduce logging
a73e53c

saattrupdan committed on

chore: Revert last change
576340d

saattrupdan committed on

feat: Use experimental nested t-test to determine statistical significance
ada1f6c

saattrupdan committed on