Nalandadata

company

AI & ML interests

Verified STEM reasoning data for frontier AI labs. Indian curriculum, RLVR-ready: JEE/NEET benchmarks, multimodal QA, and annotated tables — with open models post-trained on them.

Verified, curriculum-aligned Indian STEM data for frontier AI labs

Training · Post-training · Evaluation — across reasoning, multimodal understanding, and document intelligence.

🔗 License our data · 📨 Contact / request access

Nalandadata builds high-quality, curriculum-aligned data sourced from S. Chand, India's largest academic textbook publisher, spanning all subjects, grade levels, and major Indic languages alongside English. Textbook content is structured, expert-authored, and verified, which makes it valuable far beyond education: its reasoning chains, scientific diagrams, structured tables, and multilingual content transfer directly to general-purpose model training and evaluation.

📦 Products

Datasets

Dataset	What it is
NalandaJEENEETBench	116,831 JEE & NEET questions with verified answers + worked solutions. RLVR-ready ground truth.
nalanda-image-qa	22,000+ scientific image Q&A pairs from NCERT diagrams (physics, chemistry, biology).
DrishtiTable	1,421 annotated tables for document AI / table structure recognition — with a full TEDS benchmark + leaderboard.

Models

Model	Result
nalanda-qwen-7b-grpo	Qwen-7B + GRPO on NalandaJEENEETBench: +6.3pp (vs −16pp for naive SFT) — verified answers make RLVR work.
nalanda-image-vl	Multimodal diagram understanding: +9.3pp over zero-shot.
DrishtiTable-Qwen2.5-VL-7B	Table recognition at 83.2% TEDS — beats GPT-4o on our benchmark.

Benchmark & demos

🏆 DrishtiTable Leaderboard — live TSR leaderboard ranked by TEDS.
🔬 Nalanda Live Demos — try our models on STEM text & images.
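
For readers new to the metric: TEDS (Tree-Edit-Distance-based Similarity) is commonly defined as below. This assumes the leaderboard follows the standard definition from the table-recognition literature; the symbols $T_a$ and $T_b$ are introduced here for illustration and do not come from the card itself.

```latex
% T_a: predicted table, T_b: ground-truth table, both parsed as HTML trees.
% EditDist is the tree edit distance; |T| is the number of nodes in tree T.
\[
  \mathrm{TEDS}(T_a, T_b)
    = 1 - \frac{\mathrm{EditDist}(T_a, T_b)}{\max\bigl(|T_a|,\, |T_b|\bigr)}
\]
```

A score of 1 means the predicted structure and cell content match the reference exactly. Under this reading, the 83.2% TEDS quoted above corresponds to an average similarity of 0.832.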

✅ Why it works

Verified ground truth → every JEE/NEET item has a checkable answer, enabling RLVR / GRPO pipelines that actually improve capability.
Expert-authored, structured source → reasoning chains, diagrams, and tables, not scraped web noise.
Multilingual, curriculum-aligned → English + major Indic languages across all grade levels.
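
The first point can be sketched in code. The following is a minimal illustration, not Nalandadata's actual pipeline: it assumes completions end with a line such as "Answer: B" (a hypothetical output format) and that the gold answer is a single option letter or number. The function names `extract_answer`, `verifiable_reward`, and `group_relative_advantages` are made up for this example. The advantage normalization follows the usual GRPO recipe of centering and scaling rewards within a group of samples for the same question.

```python
import re
from statistics import mean, pstdev
from typing import List, Optional

# Hypothetical answer format: "Answer: B", "answer = (c)", "Answer: 42".
ANSWER_RE = re.compile(
    r"answer\s*[:=]\s*\(?([A-Da-d]\b|-?\d+(?:\.\d+)?)\)?",
    re.IGNORECASE,
)


def extract_answer(completion: str) -> Optional[str]:
    """Return the last stated answer in the completion, or None if absent."""
    matches = ANSWER_RE.findall(completion)
    return matches[-1].strip().upper() if matches else None


def verifiable_reward(completion: str, gold: str) -> float:
    """Binary reward: 1.0 if the extracted answer equals the gold answer."""
    predicted = extract_answer(completion)
    if predicted is None:
        return 0.0
    return 1.0 if predicted == gold.strip().upper() else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Center and scale rewards within one group of sampled completions."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]
```

In use, one would sample several completions per question, score each with `verifiable_reward` against the verified answer, and pass the resulting list to `group_relative_advantages`. When every sample in a group receives the same reward, all advantages are zero and that question contributes no gradient signal. Incorrect gold answers also corrupt the reward directly, so answer verification matters for this style of training.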