---
language:
- en
- hi
- te
license: apache-2.0
datasets: []
pipeline_tag: text-generation
tags:
- foundation-model
- instruction-following
- multilingual
- investor-preview
- placeholder
---
# SANCHARI v0.1 (Investor Preview)
Sanchari is an upcoming instruction-following AI foundation model designed for
Indian users, multilingual applications, and next-generation AI assistants.
This repository is an **investor preview**.
No model weights are uploaded yet.
Training begins once project funding is approved.
---
## 🚀 Vision
To build India's most practical, multilingual AI model optimized for:
- Smart assistants
- Real-time Q&A
- Summarization
- Content generation
- Business automation
---
## 📌 Current Status (v0.1)
- Repository created
- Model card published
- Demo placeholder will be added
- Data licensing & compute setup pending
- Training begins after funding
---
## 🧠 Planned Model Family
### **Sanchari-S (200–350M)**
- First lightweight prototype
- Fast inference
- Suitable for apps & APIs
### **Sanchari-M (1–3B)**
- Stronger reasoning
- Better instruction-following
### **Sanchari-L (7B+)**
- Full foundation model
- Enterprise-grade multilingual intelligence
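
To give a feel for what these size tiers imply, here is a rough back-of-the-envelope sketch. It uses the common approximation of about 12·d² parameters per decoder layer plus one tied embedding matrix. The layer counts, hidden sizes, and the 64,000-token vocabulary are illustrative assumptions, not the planned Sanchari configurations.

```python
def estimate_params(n_layers: int, d_model: int, vocab_size: int) -> int:
    """Rough parameter count for a GPT-style decoder-only transformer.

    Assumes ~4*d^2 for the attention projections and ~8*d^2 for a 4x-wide
    MLP per layer, plus a tied input/output embedding matrix.
    Biases and layer norms are ignored.
    """
    per_layer = 12 * d_model * d_model
    embeddings = vocab_size * d_model
    return n_layers * per_layer + embeddings


# Hypothetical (layers, hidden size, vocab) triples, chosen only so that
# each estimate lands inside the size tier named in this model card.
HYPOTHETICAL_CONFIGS = {
    "Sanchari-S": (20, 1024, 64_000),
    "Sanchari-M": (24, 2048, 64_000),
    "Sanchari-L": (36, 4096, 64_000),
}

if __name__ == "__main__":
    for name, (layers, width, vocab) in HYPOTHETICAL_CONFIGS.items():
        millions = estimate_params(layers, width, vocab) / 1e6
        print(f"{name}: ~{millions:,.0f}M parameters")
```

The final architectures may differ substantially. The sketch only shows how depth, width, and vocabulary size trade off within a parameter budget.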
---
## 🛠️ Roadmap Overview
### **Phase 1 (0–3 months)**
- Dataset acquisition
- Tokenizer creation
- Train Sanchari-S
- Publish evaluation & demo
### **Phase 2 (3–9 months)**
- Train Sanchari-M
- Safety testing
- API + product demo
### **Phase 3 (9–18 months)**
- Train Sanchari-L
- Optimization
- Market launch
---
## 📈 Market Opportunity
India has a population of roughly 1.4 billion people who speak dozens of languages, yet most AI models are optimized for Western datasets.

Sanchari focuses on:
- Indian English, Telugu, and Hindi
- Local accents
- Local knowledge
- Culturally aligned reasoning
- Vernacular business workflows

Target markets:
- Enterprises adopting AI
- Customer support automation
- Healthcare conversational assistants
- FinTech support & KYC automation
- Education & e-learning
- Government services (Digital India)

Projected TAM (India AI assistants): $3.5B+ by 2027
---
## ⚡ Competitive Advantage
Sanchari is designed specifically for Indian users, unlike global models trained mostly on Western data.

Key differentiators:
- Native support for Telugu, Hindi, and Indian English
- Dataset curated for Indian knowledge, culture, and business workflows
- Lightweight model versions for on-device and low-compute deployment
- Faster inference
- Lower cost for Indian startups
- Can be embedded into apps & enterprise workflows
- Privacy-friendly deployment options
---
## 🔧 Technical Architecture (High-Level)
### **Tokenizer**
- Multilingual tokenizer optimized for Indic languages
- Handles mixed-script text (English + Indic)
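
As an illustration of what handling mixed-script text involves, the sketch below splits a string into runs of the same script using only the Python standard library. It is not the Sanchari tokenizer. The helper names `script_of` and `script_runs` are hypothetical, and the coarse labels come from Unicode character names.

```python
import unicodedata
from itertools import groupby

# Scripts relevant to this model card; everything else is labeled OTHER.
KNOWN_SCRIPTS = ("DEVANAGARI", "TELUGU", "LATIN")


def script_of(ch: str) -> str:
    """Return a coarse script label based on the Unicode character name."""
    name = unicodedata.name(ch, "")
    for script in KNOWN_SCRIPTS:
        if name.startswith(script):
            return script
    return "OTHER"  # spaces, digits, punctuation, other scripts


def script_runs(text: str) -> list[tuple[str, str]]:
    """Group consecutive characters that share a script label."""
    return [(label, "".join(chars)) for label, chars in groupby(text, key=script_of)]


if __name__ == "__main__":
    for label, run in script_runs("Hello నమస్తే and नमस्ते"):
        print(f"{label:<10} {run!r}")
```

A production tokenizer would more likely learn subword units over such text directly. Script segmentation like this is mainly useful for inspecting how much of a corpus is code-mixed.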
### **Model Family**
- Sanchari-S (200–350M): prototype
- Sanchari-M (1–3B): mid-range
- Sanchari-L (7B+): flagship foundation model
### **Training Stack**
- PyTorch + DeepSpeed
- FlashAttention
- LoRA adapters for efficient instruction tuning
- Multi-GPU distributed training
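
To show why LoRA adapters make instruction tuning cheaper, the following sketch compares the trainable parameters of one adapter with those of the full weight matrix it wraps. A LoRA adapter of rank r on a d_out × d_in weight adds two small matrices, A (r × d_in) and B (d_out × r). The 1024 × 1024 layer size and rank 8 are assumed values for illustration only.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the frozen dense weight matrix the adapter is attached to."""
    return d_in * d_out


if __name__ == "__main__":
    # Hypothetical projection layer and rank, not a Sanchari setting.
    d_in, d_out, rank = 1024, 1024, 8
    adapter = lora_trainable_params(d_in, d_out, rank)
    dense = full_params(d_in, d_out)
    print(f"adapter: {adapter:,} params, dense: {dense:,} params")
    print(f"trainable fraction: {adapter / dense:.2%}")
```

Under these assumed numbers the adapter trains under 2% of the parameters in that layer, which is the main reason LoRA fits on smaller GPU budgets.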
---
## 💰 Funding Plan (Seed: ₹25,00,000)
Where the funds go:

| Category | Cost |
|---|---|
| Multilingual licensed datasets | ₹6,00,000 |
| Compute for training S, M models | ₹12,00,000 |
| Storage, inference, and deployment | ₹3,00,000 |
| Evaluation, safety testing | ₹1,00,000 |
| Team & operations | ₹3,00,000 |
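
As a quick arithmetic check, the five line items can be summed against the seed amount and expressed as shares of the total. The figures are copied from the funding plan. The variable names are illustrative.

```python
# Budget line items in rupees, as listed in the funding plan.
BUDGET_INR = {
    "Multilingual licensed datasets": 600_000,
    "Compute for training S, M models": 1_200_000,
    "Storage, inference, and deployment": 300_000,
    "Evaluation, safety testing": 100_000,
    "Team & operations": 300_000,
}
SEED_TOTAL_INR = 2_500_000  # ₹25,00,000

total = sum(BUDGET_INR.values())
shares = {item: cost / SEED_TOTAL_INR for item, cost in BUDGET_INR.items()}

if __name__ == "__main__":
    for item, cost in BUDGET_INR.items():
        print(f"{item:<36} {cost:>9,}  {shares[item]:.0%}")
    print(f"{'Total':<36} {total:>9,}")
```

The items add up exactly to the seed amount, with compute as the largest share at just under half of the budget.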

Deliverables to investors:
- Checkpoints for Sanchari-S and Sanchari-M
- Evaluation results
- Demo API
- Weekly updates
---
## 👤 Founder
**Srikanth B.**

An AI and product innovator focused on practical, multilingual AI solutions for India, with experience across product development, engineering leadership, and AI adoption for scalable business use cases.

Email: boorgalasrikanth@gmail.com
---
## 📩 Contact
- Founder: **Srikanth**
- Email: **boorgalasrikanth@gmail.com**