---
language:
- en
- hi
- te
license: apache-2.0
datasets: []
pipeline_tag: text-generation
tags:
- foundation-model
- instruction-following
- multilingual
- investor-preview
- placeholder
---

# SANCHARI v0.1 (Investor Preview)

Sanchari is an upcoming instruction-following AI foundation model designed for
Indian users, multilingual applications, and next-generation AI assistants.

This repository is an **investor preview**.
No model weights are uploaded yet.
Training begins once project funding is approved.

---

## Vision
To build India's most practical, multilingual AI model optimized for:

- Smart assistants
- Real-time Q&A
- Summarization
- Content generation
- Business automation

---

## Current Status (v0.1)
- Repository created
- Model card published
- Demo placeholder will be added
- Data licensing & compute setup pending
- Training begins after funding

---

## Planned Model Family

### **Sanchari-S (200–350M)**
- First lightweight prototype
- Fast inference
- Suitable for apps & APIs

### **Sanchari-M (1–3B)**
- Stronger reasoning
- Better instruction-following

### **Sanchari-L (7B+)**
- Full foundation model
- Enterprise-grade multilingual intelligence

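To give a rough sense of what these sizes mean for deployment, the sketch below estimates the memory needed just to hold the weights. It assumes 16-bit weights (2 bytes per parameter), uses the upper end of each planned range, and treats "7B+" as exactly 7 billion; none of these are confirmed design decisions, and activations, KV cache, and runtime overhead are not counted.

```python
# Assumption: weights stored as 16-bit floats (2 bytes per parameter).
BYTES_PER_PARAM_FP16 = 2

# Upper-bound parameter counts taken from the planned sizes above;
# "7B+" is treated as exactly 7 billion for this estimate.
planned_params = {
    "Sanchari-S": 350_000_000,
    "Sanchari-M": 3_000_000_000,
    "Sanchari-L": 7_000_000_000,
}


def weight_memory_gb(num_params: int, bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Memory needed to hold the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


estimates = {name: weight_memory_gb(n) for name, n in planned_params.items()}

for name, gb in estimates.items():
    print(f"{name}: ~{gb:.1f} GB of weights in fp16")
```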
---

## Roadmap Overview

### **Phase 1 (0–3 months)**
- Dataset acquisition
- Tokenizer creation
- Train Sanchari-S
- Publish evaluation & demo

### **Phase 2 (3–9 months)**
- Train Sanchari-M
- Safety testing
- API + product demo

### **Phase 3 (9–18 months)**
- Train Sanchari-L
- Optimization
- Market launch

---

## Market Opportunity

India has about 1.4 billion people who speak dozens of languages, yet most AI models are trained primarily on Western data.
Sanchari focuses on:

- Indian English, Telugu, Hindi
- Local accents
- Local knowledge
- Culturally aligned reasoning
- Vernacular business workflows

Target Markets:

- Enterprises adopting AI
- Customer support automation
- Healthcare conversational assistants
- FinTech support & KYC automation
- Education & e-learning
- Government services (Digital India)

Projected TAM (India AI Assistants): $3.5B+ by 2027

---

## Competitive Advantage

Sanchari is designed specifically for Indian users, unlike global models trained mostly on Western data.

Key differentiators:

- Native support for Telugu + Hindi + Indian English
- Dataset curated for Indian knowledge, culture, and business workflows
- Lightweight model versions for on-device and low-compute deployment
- Faster inference
- Lower cost for Indian startups
- Can be embedded into apps & enterprise workflows
- Privacy-friendly deployment options

---

## Technical Architecture (High-Level)

### **Tokenizer**
- Multilingual tokenizer optimized for Indic languages
- Handles mixed-script text (English + Indic)

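As an illustration of what "mixed-script text" means in practice, the sketch below splits a string into runs of Latin, Devanagari, and Telugu characters using Unicode character names. It is not the Sanchari tokenizer (which has not been built yet); the helper names `script_of` and `split_by_script` are hypothetical, and a real tokenizer would learn subword units rather than split on script boundaries alone.

```python
import unicodedata
from typing import List, Tuple


def script_of(ch: str) -> str:
    """Return a coarse script label based on the character's Unicode name."""
    name = unicodedata.name(ch, "")
    for script in ("DEVANAGARI", "TELUGU", "LATIN"):
        if name.startswith(script):
            return script
    # Spaces, ASCII digits, punctuation, and any other script fall here.
    return "OTHER"


def split_by_script(text: str) -> List[Tuple[str, str]]:
    """Group consecutive characters into (script, substring) runs.

    Neutral characters ("OTHER") are attached to the preceding run so that
    trailing spaces and punctuation stay with the word they follow.
    """
    runs: List[Tuple[str, str]] = []
    for ch in text:
        label = script_of(ch)
        if runs and (label == "OTHER" or label == runs[-1][0]):
            prev_label, prev_text = runs[-1]
            runs[-1] = (prev_label, prev_text + ch)
        else:
            runs.append((label, ch))
    return runs


sample = "Hello नमस्ते తెలుగు ok"
for label, chunk in split_by_script(sample):
    print(label, repr(chunk))
```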
### **Model Family**
- Sanchari-S (200–350M): prototype
- Sanchari-M (1–3B): mid-range
- Sanchari-L (7B+): flagship foundation model

### **Training Stack**
- PyTorch + DeepSpeed
- FlashAttention
- LoRA adapters for efficient instruction tuning
- Multi-GPU distributed training

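To show why LoRA adapters are considered efficient, the sketch below compares trainable parameter counts for a single weight matrix. LoRA keeps the original `d_out x d_in` weight frozen and trains two small matrices of shapes `d_out x rank` and `rank x d_in`. The hidden size of 2048 and rank of 8 are hypothetical values chosen for illustration, not Sanchari's actual configuration.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a LoRA adapter: B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


hidden = 2048  # hypothetical hidden size, for illustration only
rank = 8       # hypothetical LoRA rank, for illustration only

full = full_finetune_params(hidden, hidden)
lora = lora_params(hidden, hidden, rank)
fraction = lora / full

print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (rank {rank}): {lora:,} trainable parameters ({fraction:.2%} of full)")
```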
---

## Funding Plan (Seed: ₹25,00,000)

Where the funds go:

| Category | Cost |
|---|---|
| Multilingual licensed datasets | ₹6,00,000 |
| Compute for training S, M models | ₹12,00,000 |
| Storage, inference, and deployment | ₹3,00,000 |
| Evaluation, safety testing | ₹1,00,000 |
| Team & operations | ₹3,00,000 |

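As a quick arithmetic check, the sketch below sums the line items listed above and confirms they add up to the ₹25,00,000 seed amount, then prints each category's share. The amounts are copied from the funding plan; the variable names are only for illustration.

```python
# Amounts in rupees, copied from the funding plan (₹25,00,000 = 2,500,000).
seed_inr = 2_500_000

budget_inr = {
    "Multilingual licensed datasets": 600_000,
    "Compute for training S, M models": 1_200_000,
    "Storage, inference, and deployment": 300_000,
    "Evaluation, safety testing": 100_000,
    "Team & operations": 300_000,
}

total = sum(budget_inr.values())

# Percentage of the seed amount allocated to each category.
shares = {name: round(100 * amount / seed_inr, 1) for name, amount in budget_inr.items()}

print(f"total allocated: {total:,} of {seed_inr:,}")
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```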
Deliverables to Investors:

- Checkpoints for Sanchari-S and M
- Evaluation results
- Demo API
- Weekly updates

---

## Founder

Srikanth B.

AI & product innovator focused on practical, multilingual AI solutions for India.
Experience across product development, engineering leadership, and AI adoption for scalable business use cases.

Email: boorgalasrikanth@gmail.com

---

## Contact
Founder: **Srikanth**

Email: **boorgalasrikanth@gmail.com**