vitanomin committed on
Commit f5f7fee · verified · 1 Parent(s): ec3321b

Update README.md

Files changed (1)
  1. README.md +45 -10
README.md CHANGED
@@ -1,10 +1,45 @@
- ---
- title: README
- emoji: 🚀
- colorFrom: blue
- colorTo: blue
- sdk: static
- pinned: false
- ---
-
- Edit this `README.md` markdown file to author your organization card.
+ # LILT
+
+ **We build the multilingual layer for English-first AI.**
+ Custom evals, benchmarks, and RL environments across 200+ languages.
+
+ Most agent and coding benchmarks ship in English. We build the audited
+ non-English counterparts, and the multilingual environments models train
+ on, so labs and enterprises can measure and improve what their models
+ actually do in the languages their users speak.
+
+ ### Why we publish here
+ Open releases make it easier for the community to stress-test our work,
+ reproduce our scores, and extend our benchmarks to new languages. Every
+ artifact is paired with a paper, a scoring script, and explicit limitations.
+
+ ### What you'll find here
+ - **Benchmarks & datasets**: multilingual evaluations across coding,
+ agents, tool use, long context, instruction following, and domain QA.
+ Audited splits across our priority languages, scalable to 200+.
+ - **RL environments**: multilingual training environments for agentic
+ and tool-using models, with reproducible scoring.
+ - **Leaderboards & scoring**: Gradio Spaces with reproducible submission flows.
+ - **Baselines**: frontier-model scores published with exact prompts,
+ decoding params, and dated snapshots.
+ - **Papers**: methodology, audit workflow, and findings.
+
+ ### Currently featured
+ 📌 **GAIA-v2-LILT**: multilingual agent benchmark across AR / DE / HI / KO / PT-BR.
+ +20.7 pp average gain after human audit on frontier agents. Dataset, paper, and
+ leaderboard linked in the pinned collection.
+
+ 🛠️ **LILTBench Hackathon (Jun 15–21, 2026)**: one-week community challenge to
+ crowdsource non-English coding tasks that break Claude Opus 4.6 in Terminal-Bench.
+ Co-hosted with The AI Collective. [Sign up](https://luma.com/55v3wgi9).
+
+ ### Links
+ - Website: <https://lilt.com>
+ - Multilingual benchmarks: <https://lilt.com/products/multilingual-benchmarks>
+ - AI for Frontier Labs: <https://lilt.com/ai-for-frontier-labs>
+ - GitHub: <https://github.com/lilt>
+ - Contact (data services): <https://lilt.com/contact/ai-data-services>
+
+ ### Citation
+ If you use one of our datasets or benchmarks, please cite the corresponding paper
+ linked on each dataset card.