---
language:
- en
- hi
- te
license: apache-2.0
datasets: []
pipeline_tag: text-generation
tags:
- foundation-model
- instruction-following
- multilingual
- investor-preview
- placeholder
---

# SANCHARI - v0.1 (Investor Preview)

Sanchari is an upcoming instruction-following AI foundation model designed for Indian users, multilingual applications, and next-generation AI assistants.

This repository is an **investor preview**. No model weights have been uploaded yet. Training begins once project funding is approved.

---

## 🚀 Vision
To build India's most practical multilingual AI model, optimized for:
- Smart assistants
- Real-time Q&A
- Summarization
- Content generation
- Business automation

---

## 📌 Current Status (v0.1)
- Repository created
- Model card published
- Demo placeholder will be added
- Data licensing & compute setup pending
- Training begins after funding

---

## 🧠 Planned Model Family

### **Sanchari-S (200–350M)**
- First lightweight prototype
- Fast inference
- Suitable for apps & APIs

### **Sanchari-M (1–3B)**
- Stronger reasoning
- Better instruction-following

### **Sanchari-L (7B+)**
- Full foundation model
- Enterprise-grade multilingual intelligence

---

## 🛠️ Roadmap Overview

### **Phase 1 (0–3 months)**
- Dataset acquisition
- Tokenizer creation
- Train Sanchari-S
- Publish evaluation & demo

### **Phase 2 (3–9 months)**
- Train Sanchari-M
- Safety testing
- API + product demo

### **Phase 3 (9–18 months)**
- Train Sanchari-L
- Optimization
- Market launch

---

## 📈 Market Opportunity
India has about 1.4 billion people speaking dozens of languages, yet most AI models are optimized for Western datasets.
Sanchari focuses on:
- Indian English, Telugu, Hindi
- Local accents
- Local knowledge
- Culturally aligned reasoning
- Vernacular business workflows

Target Markets:
- Enterprises adopting AI
- Customer support automation
- Healthcare conversational assistants
- FinTech support & KYC automation
- Education & e-learning
- Government services (Digital India)

Projected TAM (India AI Assistants): $3.5B+ by 2027

---

## ⚡ Competitive Advantage
Sanchari is designed specifically for Indian users, unlike global models trained mostly on Western data.

Key differentiators:
- Native support for Telugu + Hindi + Indian English
- Dataset curated for Indian knowledge, culture, and business workflows
- Lightweight model versions for on-device and low-compute deployment
- Faster inference
- Lower cost for Indian startups
- Can be embedded into apps & enterprise workflows
- Privacy-friendly deployment options

---

## 🔧 Technical Architecture (High-Level)

### **Tokenizer**
- Multilingual tokenizer optimized for Indic languages
- Handles mixed-script text (Eng + Indic)

### **Model Family**
- Sanchari-S (200–350M): prototype
- Sanchari-M (1–3B): mid-range
- Sanchari-L (7B+): flagship foundation model

### **Training Stack**
- PyTorch + DeepSpeed
- FlashAttention
- LoRA adapters for efficient instruction tuning
- Multi-GPU distributed training

---

## 💰 Funding Plan (Seed: ₹25,00,000)
Where the funds go:

| Category | Cost |
|---|---|
| Multilingual licensed datasets | ₹6,00,000 |
| Compute for training S, M models | ₹12,00,000 |
| Storage, inference, and deployment | ₹3,00,000 |
| Evaluation, safety testing | ₹1,00,000 |
| Team & operations | ₹3,00,000 |

Deliverables to Investors:
- Checkpoints for Sanchari-S and M
- Evaluation results
- Demo API
- Weekly updates

---

## 👤 Founder
Srikanth B.

AI & product innovator focused on practical, multilingual AI solutions for India. Experience across product development, engineering leadership, and AI adoption for scalable business use cases.

Email: boorgalasrikanth@gmail.com

---

## 📩 Contact
Founder: **Srikanth**

Email: **boorgalasrikanth@gmail.com**
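
As a rough illustration of the mixed-script handling listed under Technical Architecture, the sketch below splits a string into runs of Latin, Devanagari (Hindi), and Telugu characters using their Unicode block ranges. It is not Sanchari's tokenizer, which has not been built yet. The names `SCRIPT_RANGES`, `script_of`, and `script_runs` are hypothetical, and a real pre-tokenization step would likely need to handle more scripts, digits, and punctuation more carefully.

```python
# Illustrative sketch only: segment text into runs by Unicode script block.
# All names here are hypothetical and not part of any Sanchari release.

# Unicode block ranges (inclusive) for the two Indic scripts named in this card.
SCRIPT_RANGES = {
    "devanagari": (0x0900, 0x097F),  # used for Hindi
    "telugu": (0x0C00, 0x0C7F),
}


def script_of(ch):
    """Return a coarse script label for a single character."""
    cp = ord(ch)
    for name, (lo, hi) in SCRIPT_RANGES.items():
        if lo <= cp <= hi:
            return name
    if ch.isascii() and ch.isalpha():
        return "latin"
    return "other"  # spaces, digits, punctuation, any other script


def script_runs(text):
    """Group consecutive characters of the same script into (script, text) runs.

    Characters labeled "other" are attached to the preceding run so that
    spaces and punctuation do not create extra segments.
    """
    runs = []
    for ch in text:
        label = script_of(ch)
        if label == "other" and runs:
            runs[-1] = (runs[-1][0], runs[-1][1] + ch)
        elif runs and runs[-1][0] == label:
            runs[-1] = (label, runs[-1][1] + ch)
        else:
            runs.append((label, ch))
    return runs


if __name__ == "__main__":
    for label, chunk in script_runs("Hello नमस्ते తెలుగు"):
        print(label, repr(chunk))
```

Segmenting by script before tokenization is one common way to keep code-mixed input (for example, English words inside a Telugu sentence) from being merged across script boundaries; whether Sanchari adopts this approach is an open design choice.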