AI & ML interests
Verified STEM reasoning data for frontier AI labs. Indian curriculum, RLVR-ready: JEE/NEET benchmarks, multimodal QA, and annotated tables, plus open models post-trained on them.
Nalandadata
Verified, curriculum-aligned Indian STEM data for frontier AI labs
Training · Post-training · Evaluation, across reasoning, multimodal understanding, and document intelligence.
Nalandadata builds high-quality, curriculum-aligned data sourced from S. Chand, India's largest academic textbook publisher, spanning all subjects, grade levels, and major Indic languages alongside English. Textbook content is structured, expert-authored, and verified, which makes it valuable far beyond education: reasoning chains, scientific diagrams, structured tables, and multilingual content that transfer directly to general-purpose model training and evaluation.
Products
Datasets
| Dataset | What it is |
|---|---|
| NalandaJEENEETBench | 116,831 JEE & NEET questions with verified answers + worked solutions. RLVR-ready ground truth. |
| nalanda-image-qa | 22,000+ scientific image Q&A pairs from NCERT diagrams (physics, chemistry, biology). |
| DrishtiTable | 1,421 annotated tables for document AI / table structure recognition, with a full TEDS benchmark + leaderboard. |
Models
| Model | Result |
|---|---|
| nalanda-qwen-7b-grpo | Qwen-7B + GRPO on NalandaJEENEETBench: +6.3pp (vs -16pp for naive SFT). Verified answers make RLVR work. |
| nalanda-image-vl | Multimodal diagram understanding: +9.3pp over zero-shot. |
| DrishtiTable-Qwen2.5-VL-7B | Table recognition at 83.2% TEDS, beating GPT-4o on our benchmark. |
Benchmark & demos
- DrishtiTable Leaderboard: live table structure recognition (TSR) leaderboard ranked by TEDS.
- Nalanda Live Demos: try our models on STEM text & images.
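For readers new to the metric, TEDS (Tree-Edit-Distance-based Similarity) compares the predicted table tree $T_{\text{pred}}$ with the ground-truth tree $T_{\text{gt}}$. The expression below is the commonly used definition of the metric; whether the leaderboard applies it with any extra configuration (for example, structure-only scoring) is not stated here, so treat that as an open assumption.

$$
\mathrm{TEDS}(T_{\text{pred}}, T_{\text{gt}}) = 1 - \frac{\mathrm{EditDist}(T_{\text{pred}}, T_{\text{gt}})}{\max\left(|T_{\text{pred}}|,\ |T_{\text{gt}}|\right)}
$$

Here $\mathrm{EditDist}$ is the tree edit distance between the two tables and $|T|$ is the number of nodes in a tree, so a score of 1 means the trees match exactly.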
Why it works
- Verified ground truth: every JEE/NEET item has a checkable answer, enabling RLVR / GRPO pipelines that actually improve capability.
- Expert-authored, structured source: reasoning chains, diagrams, and tables, not scraped web noise.
- Multilingual, curriculum-aligned: English + major Indic languages across all grade levels.
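
To illustrate why a checkable answer matters for RLVR / GRPO, here is a minimal sketch of a verifiable reward and the group-relative advantage used in GRPO-style training. It is an illustration under stated assumptions, not our training code: the `Answer:` output convention, the function names, and the sample completions are all hypothetical, and the dataset's actual field names are not shown here.

```python
import re
import statistics
from typing import List, Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if absent.

    The 'Answer:' convention is an assumption made for this sketch.
    """
    matches = re.findall(r"Answer:\s*(.+)", completion)
    return matches[-1].strip() if matches else None


def normalize(answer: str) -> str:
    """Drop all whitespace and lowercase, so '( b )' equals '(B)'."""
    return re.sub(r"\s+", "", answer).lower()


def verifiable_reward(completion: str, gold: str) -> float:
    """Binary reward: 1.0 if the extracted answer matches the verified one."""
    predicted = extract_final_answer(completion)
    if predicted is None:
        return 0.0
    return 1.0 if normalize(predicted) == normalize(gold) else 0.0


def group_relative_advantages(rewards: List[float]) -> List[float]:
    """Normalize rewards within one group of samples for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        # All samples scored the same, so the group carries no signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


# Four hypothetical completions sampled for one multiple-choice question.
completions = [
    "The discriminant is zero, so the roots coincide. Answer: (B)",
    "Answer: (C)",
    "I am not sure.",
    "Working through the options. Answer: ( b )",
]
rewards = [verifiable_reward(c, "(B)") for c in completions]
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

Because the reward depends only on a verified answer key, it needs no learned reward model. A group in which every sample is right or every sample is wrong yields zero advantage, which is why questions of mixed difficulty tend to be the most useful for this style of training.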
Trusted by
Partner & customer logos coming soon. Want to be featured? Get in touch.
Licensing & access
We license datasets for AI training, post-training, and evaluation.