Spaces: Lilt-org/README

vitanomin committed 26 days ago

Commit f5f7fee (verified) · 1 parent: ec3321b

Update README.md

Files changed (1)

README.md +45 -10

README.md CHANGED

@@ -1,10 +1,45 @@
----
-title: README
-emoji: 🚀
-colorFrom: blue
-colorTo: blue
-sdk: static
-pinned: false
----
-Edit this `README.md` markdown file to author your organization card.

+# LILT
+**We build the multilingual layer for English-first AI.**
+Custom evals, benchmarks, and RL environments across 200+ languages.
+Most agent and coding benchmarks ship in English. We build the audited
+non-English counterparts — and the multilingual environments models train
+on — so labs and enterprises can measure and improve what their models
+actually do in the languages their users speak.
+### Why we publish here
+Open releases make it easier for the community to stress-test our work,
+reproduce our scores, and extend our benchmarks to new languages. Every
+artifact is paired with a paper, a scoring script, and explicit limitations.
+### What you'll find here
+- **Benchmarks & datasets** — multilingual evaluations across coding,
+  agents, tool use, long context, instruction following, and domain QA.
+  Audited splits across our priority languages, scalable to 200+.
+- **RL environments** — multilingual training environments for agentic
+  and tool-using models, with reproducible scoring.
+- **Leaderboards & scoring** — Gradio Spaces with reproducible submission flows.
+- **Baselines** — frontier-model scores published with exact prompts,
+  decoding params, and dated snapshots.
+- **Papers** — methodology, audit workflow, and findings.
+### Currently featured
+📌 **GAIA-v2-LILT** — multilingual agent benchmark across AR / DE / HI / KO / PT-BR.
++20.7pp average gain post human-audit on frontier agents. Dataset, paper, and
+leaderboard linked in the pinned collection.
+🛠️ **LILTBench Hackathon (Jun 15–21, 2026)** — one-week community challenge to
+crowdsource non-English coding tasks that break Claude Opus 4.6 in Terminal-Bench.
+Co-hosted with The AI Collective. [Sign up](https://luma.com/55v3wgi9).
+### Links
+- Website: <https://lilt.com>
+- Multilingual benchmarks: <https://lilt.com/products/multilingual-benchmarks>
+- AI for Frontier Labs: <https://lilt.com/ai-for-frontier-labs>
+- GitHub: <https://github.com/lilt>
+- Contact (data services): <https://lilt.com/contact/ai-data-services>
+### Citation
+If you use one of our datasets or benchmarks, please cite the corresponding paper
+linked on each dataset card.