| <div align="center"> |
|
|
| # 🌍 SamyamLM |
|
|
| ## Satellite-Based Multimodal Data Labeling for Indian Language AI |
|
|
| **Scale AI for India — 59% faster, 100% native Hindi support** |
|
|
| [License](LICENSE) |
| [Make in India](https://www.makeinindia.com) |
| [Python](https://www.python.org) |
| [PyTorch](https://pytorch.org) |
| [Live Demo](https://huggingface.co/spaces/techindro/SamyamLm-Demo) |
| [Hugging Face](https://huggingface.co/techindro) |
| [Website](https://samyam-space-labels.vercel.app) |
|
|
| </div> |
|
|
| --- |
|
|
| ## 🚀 Live Demos |
|
|
| | Demo | Link | |
| |------|------| |
| | 🛣️ Indian Road Detector | [Try Now](https://huggingface.co/spaces/techindro/SamyamLm-Demo) | |
| | 🚗 Self Driving Car | [Try Now](https://huggingface.co/spaces/techindro/SamyamLm-SelfDriving) | |
| | 🏥 Health Detector | [Try Now](https://huggingface.co/spaces/techindro/SamyamLm-Health) | |
| | 📚 Education Detector | [Try Now](https://huggingface.co/spaces/techindro/SamyamLm-Education) | |
|
|
| 🌐 Website: [samyam-space-labels.vercel.app](https://samyam-space-labels.vercel.app) |
|
|
| --- |
|
|
| ## 📖 What is SamyamLM? |
|
|
| SamyamLM is a data labeling platform built specifically for Indian languages and Indian geography. It helps create training data for AI models using satellite images, road cameras, and Hindi text. |
|
|
| ### The Name |
|
|
| - **Samyam** (संयम) = Discipline and control in Sanskrit |
| - **LM** = Language Model |
|
|
| So SamyamLM means disciplined, high-quality data labeling for AI systems in India. |
|
|
| ### What Problem Does It Solve? |
|
|
| Most AI labeling platforms, such as Scale AI, Labelbox, and Appen, were built for Western markets. They work poorly for India because: |
|
|
| 1. They offer little or no support for Hindi and other Indian scripts |
| 2. They have no classes for Indian road conditions (auto-rickshaws, cattle, potholes) |
| 3. They don't accept satellite imagery of Indian geography as input |
| 4. They fail in Indian driving conditions (monsoon rain, dust, night driving) |
|
|
| ### How Does SamyamLM Work? |
|
|
| The platform has six parts that work together: |
|
|
| | Part | What It Does | |
| |------|---------------| |
| | Satellite Imagery | Ingests imagery from ISRO satellites (5 m to 30 m resolution) | |
| | Ground Cameras | Records video from cameras on Indian roads | |
| | Hindi Text | Reads and understands Hindi language inputs | |
| | AI Pre-labeling | Handles 58% of the labeling work automatically using AI models | |
| | Human Review | Lets people check and fix labels using a Hindi keyboard | |
| | Quality Check | Runs a 3-stage check to confirm labels are correct | |
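
The hand-off between AI pre-labeling and human review in the table above can be pictured as a confidence-based router. This is a minimal sketch, not the platform's actual code: the `PreLabel` type, the `route_prelabels` helper, and the 0.80 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PreLabel:
    item_id: str
    label: str
    confidence: float  # model score in [0, 1]


def route_prelabels(prelabels, threshold=0.80):
    """Split model pre-labels into auto-accepted and human-review queues.

    The threshold is an illustrative assumption, not a SamyamLM setting.
    """
    auto_accepted, needs_review = [], []
    for p in prelabels:
        if p.confidence >= threshold:
            auto_accepted.append(p)
        else:
            needs_review.append(p)
    return auto_accepted, needs_review


batch = [
    PreLabel("img-001", "auto-rickshaw", 0.93),
    PreLabel("img-002", "pothole", 0.41),
    PreLabel("img-003", "cattle", 0.88),
]
accepted, review = route_prelabels(batch)
print(len(accepted), "auto-accepted;", len(review), "sent to human review")
```

Raising the threshold sends more items to reviewers and lowers the automated share. Lowering it does the opposite, at the cost of more unreviewed labels.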
|
|
| ### What Makes It Different? |
|
|
| SamyamLM covers 47 India-specific object classes that other platforms miss completely, including: |
|
|
| - Auto-rickshaws, cycle-rickshaws, tractors, bullock carts |
| - Cattle, stray dogs, buffalo, camels, elephants |
| - Kutcha roads, potholes, speed breakers |
| - Monsoon rain, dust haze, night driving conditions |
|
|
| ### How Well Does It Perform? |
|
|
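| Compared with Scale AI, the industry leader (Hindi VQA is compared with MuRIL-VL, the best baseline on that benchmark): |
|
|
| - **59% faster** annotation speed |
| - **15.6 percentage points higher** accuracy at answering Hindi questions about images |
| - **19.7 percentage points higher** mAP at detecting Indian road objects |
| - **58% cheaper** per label |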
|
|
| ### Who Is It For? |
|
|
| - Self-driving car companies working on Indian roads |
| - AI companies that want Hindi language models |
| - Government agencies doing disaster response or crop monitoring |
| - Satellite imaging companies |
|
|
| ### What Has Been Built So Far? |
|
|
| The current version includes: |
| - 275,000 labeled samples |
| - 4.5 million individual annotations |
| - A working web interface in Hindi |
| - Open source code on GitHub |
| - 4 live AI demos on Hugging Face |
|
|
| ### What's Next? |
|
|
| - Support for all 22 Indian languages |
| - Real-time satellite data processing |
| - API for companies to use |
| - Expansion to other countries like Indonesia and Nigeria |
|
|
| ### The Big Picture |
|
|
| SamyamLM's goal is simple: make AI that actually understands India. Not as an afterthought, but built from the ground up for Indian languages, Indian roads, Indian weather, and Indian geography. |
|
|
| --- |
|
|
| **SamyamLM** is the world's first satellite-based multimodal data labeling platform built specifically for Indian languages and geographies. |
|
|
| ### What Does It Do? |
|
|
| SamyamLM helps companies and researchers create training data for AI models by combining: |
|
|
| | Component | What It Does | |
| |-----------|---------------| |
| | 🛰️ **Satellite Imagery** | Processes ISRO and commercial satellite feeds (5m-30m resolution) | |
| | 📷 **Ground Cameras** | Analyzes dashcam footage from Indian roads | |
| | 📝 **Hindi Text** | Understands and annotates Hindi and other Indic languages | |
| | 🤖 **AI Pre-labeling** | Reduces human effort by 58% using CLIP-based models | |
| | 👨‍💻 **Human Review** | Hindi-first interface with Devanagari keyboard | |
| | ✅ **Quality Assurance** | 3-stage QA with Cohen's κ > 0.75 | |
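
The Quality Assurance row cites Cohen's κ > 0.75 as its agreement bar. As a rough illustration of what that statistic measures, here is a small pure-Python sketch that computes κ for two annotators labeling the same items. The `cohens_kappa` helper and the sample labels are made up for this example, and how SamyamLM applies the threshold inside its three QA stages is not specified here.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Ten items, one disagreement (the fifth item).
annotator_1 = ["cattle"] * 5 + ["pothole"] * 5
annotator_2 = ["cattle"] * 4 + ["pothole"] * 6
kappa = cohens_kappa(annotator_1, annotator_2)
passes_qa = kappa > 0.75
print(f"kappa = {kappa:.2f}, passes QA bar: {passes_qa}")
```

In this sample the annotators agree on 9 of 10 items and chance agreement is 0.5, so κ = (0.9 − 0.5) / (1 − 0.5) = 0.8, which clears the 0.75 bar.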
|
|
| ### Why SamyamLM? |
|
|
| Most AI labeling platforms are built for English and Western data. They don't understand: |
| - Hindi sentences and grammar |
| - Indian road conditions (auto-rickshaws, cattle, potholes) |
| - Satellite imagery for Indian geography |
| - Monsoon, dust haze, and night driving in India |
|
|
| **SamyamLM fixes all of this.** It's AI training data that actually understands India. |
|
|
| --- |
|
|
| ## 📊 Key Results at a Glance |
|
|
| | Metric | SamyamLM | Baseline | Improvement | |
| |--------|----------|----------|-------------| |
| | Annotation Throughput | 510 labels/hour | 320 labels/hour (Scale AI) | **+59%** | |
| | Hindi VQA Accuracy | 67.4% | 51.8% (MuRIL-VL) | **+15.6 pts** | |
| | India-Specific Object Detection | 58.3% mAP | 38.6% mAP (Scale AI fine-tuned) | **+19.7 pts** | |
| | Cost per Label | $0.12 | $0.29 (Scale AI) | **-58%** | |
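
The improvement column mixes two kinds of figures. Throughput and cost are relative changes against the baseline, while the accuracy and mAP gains are absolute differences in percentage points. The short sketch below reproduces the arithmetic from the table's own numbers; the `relative_change` helper is just an illustrative name.

```python
def relative_change(new, baseline):
    """Relative change of `new` versus `baseline`, in percent."""
    return (new - baseline) / baseline * 100


throughput_gain = relative_change(510, 320)  # labels/hour
cost_change = relative_change(0.12, 0.29)    # USD per label
vqa_gain_points = 67.4 - 51.8                # accuracy, percentage points
map_gain_points = 58.3 - 38.6                # mAP, percentage points

print(f"Throughput: {throughput_gain:+.1f}%")
print(f"Cost per label: {cost_change:+.1f}%")
print(f"Hindi VQA: {vqa_gain_points:+.1f} points")
print(f"Object detection: {map_gain_points:+.1f} points")
```

The throughput gain works out to about 59.4% and the cost reduction to about 58.6%, in line with the rounded figures in the table.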
|
|
| --- |
|
|
| ## 🎯 The Problem |
|
|
| **Global AI training data ignores 1.4 billion Indian voices.** |
|
|
| Existing platforms like Scale AI, Labelbox, and Appen were built for Western markets: |
|
|
| | Limitation | Consequence | |
| |------------|-------------| |
| | No Indic script support | Cannot annotate in Hindi, Tamil, Telugu, Bengali | |
| | No Indian semantic understanding | Models fail on cultural context | |
| | No satellite geospatial integration | Disaster response AI is blind | |
| | No Indian road objects | Self-driving cars miss auto-rickshaws and cattle | |
|
|
| **The result:** AI models that work perfectly in San Francisco but fail in Mumbai, Delhi, and Chennai. |
|
|
| --- |
|
|
| ## 🚀 The Solution |
|
|
| SamyamLM is the first data labeling platform purpose-built for India's linguistic and geographic diversity. |
|
|
| ### Comparison with Existing Platforms |
|
|
| | Feature | Scale AI | Labelbox | Appen | SamyamLM | |
| |---------|----------|----------|-------|----------| |
| | Hindi Language Support | ❌ | ❌ | Partial | ✅ Native | |
| | Devanagari Script UI | ❌ | ❌ | ❌ | ✅ Yes | |
| | Satellite Imagery Input | ❌ | ❌ | ❌ | ✅ Yes | |
| | India-Specific Objects | ❌ | ❌ | ❌ | ✅ 47 classes | |
| | Indian Road Conditions | ❌ | ❌ | ❌ | ✅ Yes | |
| | Adverse Weather (Monsoon) | ❌ | ❌ | ❌ | ✅ Yes | |
| | Cost per Label | $0.29 | $0.27 | $0.25 | $0.12 | |
|
|
| --- |
|
|
| ## 📊 Benchmark Results |
|
|
| ### Hindi Visual Question Answering (IndicVQA Benchmark) |
|
|
| | Model | Accuracy | |
| |-------|----------| |
| | SamyamLM-VL (ours) | **67.4%** | |
| | MuRIL-VL | 51.8% | |
| | Flamingo-9B | 34.1% | |
| | CLIP (zero-shot) | 28.7% | |
|
|
| **SamyamLM improvement: +15.6 percentage points over the best baseline (MuRIL-VL)** |
|
|
| ### Indian Road Object Detection (mAP@0.5) |
|
|
| | Model | mAP | |
| |-------|-----| |
| | SamyamLM fine-tuned (ours) | **58.3%** | |
| | Scale AI fine-tuned | 38.6% | |
| | YOLOv8 (COCO) | 31.2% | |
|
|
| **SamyamLM improvement: +19.7 percentage points over Scale AI on India-specific classes** |
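
For readers new to the metric: mAP@0.5 counts a predicted box as correct only when its intersection-over-union (IoU) with a ground-truth box of the same class is at least 0.5. Here is a minimal sketch of that matching test, with boxes given as `(x_min, y_min, x_max, y_max)` tuples. The `iou` helper and the sample boxes are illustrative and not taken from SamyamLM's evaluation code.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap width and height, clamped to zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


ground_truth = (100, 100, 200, 200)  # e.g. an annotated auto-rickshaw
prediction = (120, 110, 210, 200)    # a model's box for the same object
score = iou(ground_truth, prediction)
is_true_positive = score >= 0.5
print(f"IoU = {score:.3f}, counts as a match at 0.5: {is_true_positive}")
```

Here the overlap is 80 × 90 = 7,200 px² and the union is 10,900 px², so the IoU is about 0.66 and the prediction counts as a match.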
|
|
| ### Annotation Throughput (labels per hour) |
|
|
| | Platform | Labels/Hour | |
| |----------|-------------| |
| | SamyamLM (ours) | **510** | |
| | Scale AI | 320 | |
| | Labelbox | 280 | |
| | Appen | 260 | |
|
|
| **SamyamLM advantage: 59% faster than Scale AI** |
|
|
| --- |
|
|
| ## 🛰️ India-Specific Object Classes (47) |
|
|
| SamyamLM detects objects that other platforms completely miss: |
|
|
| | Category | Examples | |
| |----------|----------| |
| | **Vehicles** | Auto-rickshaw (ऑटो-रिक्शा), Cycle-rickshaw (साइकिल-रिक्शा), Tractor (ट्रैक्टर), Tempo (टेंपो), Bullock cart (बैलगाड़ी) | |
| | **Animals** | Cattle (मवेशी), Stray dog (आवारा कुत्ता), Buffalo (भैंस), Camel (ऊंट), Elephant (हाथी) | |
| | **Road Conditions** | Kutcha road (कच्ची सड़क), Pothole (गड्ढा), Speed breaker (स्पीड ब्रेकर), Missing signage (गायब साइनेज) | |
| | **Adverse Weather** | Monsoon rain (मानसून बारिश), Dust haze (धूल भरी आंधी), Night driving (रात में ड्राइविंग), Dense fog (घना कोहरा) | |
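
A Hindi-first review interface needs each class to carry both an English identifier and a Devanagari display name. One possible way to store that pairing is sketched below, using a handful of the classes from the table. The `CLASS_LABELS` dictionary and `display_label` helper are hypothetical, not SamyamLM's actual schema.

```python
# Hypothetical bilingual taxonomy: category -> {English name: Hindi name}.
# Only a few of the 47 classes are shown, copied from the table above.
CLASS_LABELS = {
    "Vehicles": {"Auto-rickshaw": "ऑटो-रिक्शा", "Bullock cart": "बैलगाड़ी"},
    "Animals": {"Cattle": "मवेशी", "Camel": "ऊंट"},
    "Road Conditions": {"Pothole": "गड्ढा", "Kutcha road": "कच्ची सड़क"},
    "Adverse Weather": {"Dense fog": "घना कोहरा"},
}


def display_label(english_name):
    """Return (category, 'English (Hindi)') for a known class name."""
    for category, labels in CLASS_LABELS.items():
        if english_name in labels:
            return category, f"{english_name} ({labels[english_name]})"
    raise KeyError(f"unknown class: {english_name}")


category, label = display_label("Pothole")
print(category, "->", label)
```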
|
|
| --- |
|
|
| ## 📁 Dataset v1.0 Statistics |
|
|
| | Split | Modality | Samples | Annotated Labels | |
| |-------|----------|---------|------------------| |
| | Train | Satellite | 120,000 | 1,840,000 | |
| | Val | Satellite | 15,000 | 230,000 | |
| | Train | Ground Driving | 80,000 | 2,100,000 | |
| | Val | Ground Driving | 10,000 | 260,000 | |
| | Train | Hindi VQA | 45,000 | 90,000 | |
| | Val | Hindi VQA | 5,000 | 10,000 | |
| | **Total** | **All** | **275,000** | **4,530,000** | |
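
The totals in the last row can be checked directly from the per-split rows, and the same data shows how annotation density differs by modality. The snippet below is only a sanity check over the numbers in the table; the variable names are illustrative.

```python
from collections import defaultdict

# (split, modality, samples, annotated labels), copied from the table above.
ROWS = [
    ("Train", "Satellite", 120_000, 1_840_000),
    ("Val", "Satellite", 15_000, 230_000),
    ("Train", "Ground Driving", 80_000, 2_100_000),
    ("Val", "Ground Driving", 10_000, 260_000),
    ("Train", "Hindi VQA", 45_000, 90_000),
    ("Val", "Hindi VQA", 5_000, 10_000),
]

total_samples = sum(samples for _, _, samples, _ in ROWS)
total_labels = sum(labels for _, _, _, labels in ROWS)

# Combine train and val per modality to get labels per sample.
per_modality = defaultdict(lambda: [0, 0])
for _, modality, samples, labels in ROWS:
    per_modality[modality][0] += samples
    per_modality[modality][1] += labels

print(f"Total: {total_samples:,} samples, {total_labels:,} labels")
for modality, (samples, labels) in per_modality.items():
    print(f"{modality}: {labels / samples:.1f} labels per sample")
```

The sums match the table (275,000 samples and 4,530,000 labels). Ground driving frames are the most densely annotated at roughly 26 labels per sample, satellite tiles carry about 15, and each Hindi VQA sample carries exactly 2.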
|
|
| --- |
|
|
| ## 🏗️ Technology Stack |
|
|
| | Layer | Technologies | |
| |-------|--------------| |
| | Vision-Language Model | CLIP (ViT-B/32), Fine-tuned checkpoint | |
| | Deep Learning | PyTorch 2.0+, HuggingFace Transformers | |
| | Geospatial | GDAL, Rasterio, ISRO Resourcesat-2A API | |
| | Backend | FastAPI, PostgreSQL, Redis | |
| | Frontend | React, Devanagari keyboard integration | |
| | Infrastructure | AWS S3, EC2, CloudFront | |
|
|
| --- |
|
|
| ## 📜 License |
|
|
| MIT — Free to use, modify, and distribute. |
|
|
| --- |
|
|
| ## 🤝 Contributing |
|
|
| PRs welcome! Let's build the future of Bharat 🇮🇳 |
|
|