# TerraSyncra AI – Product & System Overview

## 1. Product Introduction

**TerraSyncra** is a multilingual agricultural intelligence agent designed specifically for Nigerian (and African) farmers. It provides comprehensive agricultural support through AI-powered assistance.

**Key Capabilities:**

- **Agricultural Q&A**: Answers questions about crops, livestock, soil, weather, pests, and diseases in multiple languages
- **Soil Analysis**: Provides expert soil health assessments from lab reports and field data using Gemini 3 Flash
- **Disease Detection**: Identifies plant and animal diseases from images, text descriptions, or voice input using Gemini 2.5 Flash
- **Live Agricultural Updates**: Delivers real-time weather information and agricultural news through RAG (Retrieval-Augmented Generation)
- **Live Voice Interaction**: Supports real-time voice conversations via WebSocket in local languages (Igbo, Hausa, Yoruba, English)

**Developer**: Ifeanyi Amogu Shalom

**Target Users**: Farmers, agronomists, agricultural extension officers, and agricultural support workers in Nigeria and similar contexts

---

## 2. Problem Statement

Nigerian smallholder farmers face significant challenges:

### 2.1 Limited Access to Agricultural Experts

- **Scarcity of agronomists and veterinarians** relative to the large farming population
- **Geographic barriers** preventing farmers from accessing expert advice
- **High consultation costs** that many smallholder farmers cannot afford
- **Long waiting times** for professional consultations, especially during critical periods (disease outbreaks, planting seasons)

### 2.2 Language Barriers

- Most agricultural information and resources are in **English**, while many farmers primarily speak **Hausa, Igbo, or Yoruba**
- **Technical terminology** is not easily accessible in local languages
- **Translation services** are often unavailable or unreliable

### 2.3 Fragmented Information Sources

- Weather data, soil reports, disease information, and market prices are scattered across different platforms
- **No unified system** to integrate and interpret multiple data sources
- **Information overload** without proper context or prioritization

### 2.4 Time-Sensitive Decision Making

- **Disease outbreaks** require immediate identification and treatment
- **Weather changes** affect planting, harvesting, and irrigation decisions
- **Pest attacks** can devastate crops if not addressed quickly
- **Delayed responses** lead to significant economic losses

### 2.5 Solution Approach

TerraSyncra addresses these challenges by providing:

- **Fast, AI-powered responses** available 24/7
- **Multilingual support** (English, Igbo, Hausa, Yoruba)
- **Integrated intelligence** combining expert models, RAG, and live data
- **Accessible interface** via text, voice, and image inputs
- **Professional consultation reminders** to ensure farmers seek expert confirmation when needed

---

## 3. System Architecture & Request Flows

### 3.1 General Agricultural Q&A – `POST /ask`

**Step-by-Step Process:**

1. **Input Reception**
   - User sends `query` (text) with optional `session_id` for conversation continuity
2. **Language Detection**
   - FastText model (`facebook/fasttext-language-identification`) detects input language
   - Supports: English, Igbo, Hausa, Yoruba
3. **Translation (if needed)**
   - If language ≠ English, translates to English using NLLB (`drrobot9/nllb-ig-yo-ha-finetuned`)
   - Preserves original language for back-translation
4. **Intent Detection**
   - Classifies query into categories:
     - **Weather question**: Requests weather information (with/without Nigerian state)
     - **Live update**: Requests current agricultural news or updates
     - **Normal question**: General agricultural Q&A
     - **Low confidence**: Falls back to RAG when intent is unclear
5. **Context Building**
   - **Weather intent**: Calls WeatherAPI for state-specific weather data, embeds summary into context
   - **Live update intent**: Queries live FAISS vectorstore index for latest agricultural documents
   - **Low confidence**: Falls back to static FAISS index for safer, more general responses
6. **Conversation Memory**
   - Loads per-session history from `MemoryStore` (TTL cache, 1-hour expiration)
   - Trims to `MAX_HISTORY_MESSAGES` (default: 30) to prevent context overflow
7. **Expert Model Generation**
   - Uses **Qwen/Qwen1.5-1.8B** (finetuned for Nigerian agriculture)
   - Loaded lazily via `model_manager` (CPU-optimized, first-use loading)
   - Builds chat messages: system prompt + conversation history + current user message + context
   - System prompt restricts responses to **agriculture/farming topics only**
   - Generates bounded-length answer (reduced token limit: 400 tokens for general, 256 for weather)
   - Cleans response to remove any "Human: / Assistant:" style example continuations
8. **Back-Translation**
   - If original language ≠ English, translates answer back to user's language using NLLB
9. **Response**
   - Returns JSON: `{ query, answer, session_id, detected_language }`

**Safety & Focus:**

- System prompt enforces agriculture-only topic handling
- Unrelated questions are redirected back to farming topics
- Response cleaning prevents off-topic example continuations

---

### 3.2 Soil Analysis – `POST /analyze-soil`

**Step-by-Step Process:**

1. **Input Reception**
   - `report_data`: Text description of soil report or lab results (required)
   - Optional fields: `location`, `crop_type`, `field_size`, `previous_crops`, `additional_notes`
2. **Agent Processing**
   - `soil_agent.analyze_soil()` builds comprehensive prompt with:
     - Soil report data
     - Field information (location, crop type, size, history)
     - Regional context (Nigerian states, climate patterns)
3. **Gemini API Call**
   - Model: `GEMINI_SOIL_MODEL = "gemini-3-flash-preview"`
   - Prompt style: Brief, direct, actionable
   - Focuses on:
     - Current soil condition (short summary)
     - Key nutrient issues (deficiencies or excesses)
     - 1–3 best crops for this soil type
     - Clear fertilizer and amendment recommendations
     - Simple soil improvement steps
4. **Output**
   - JSON response: `{ success, analysis, model_used }`

**Important Note:**

> Soil analysis is **advisory only** – not a formal agronomy diagnosis. The UI should encourage farmers to confirm with a local agronomist or extension officer for critical decisions.

---

### 3.3 Disease Detection

#### 3.3.1 Image-Based Detection – `POST /detect-disease-image`

**Step-by-Step Process:**

1. **Input Reception**
   - Image file (JPEG, PNG, etc.)
   - Optional `query`: Text description or question
2. **Agent Processing**
   - `disease_agent.classify_disease_from_image()` processes:
     - Image bytes + MIME type
     - User query (if provided)
   - Builds structured prompt for Gemini
3. **Gemini API Call**
   - Model: `GEMINI_DISEASE_MODEL = "gemini-2.5-flash"`
   - Prompt instructs Gemini to provide:
     - Disease name (scientific + common name) in 1 short line
     - **Threat level: Low / Moderate / High / Uncertain** (MANDATORY)
     - 2–3 key symptoms visible in image
     - 2–3 clear treatment steps (bullets)
     - 1–2 simple prevention tips
   - Brief, direct language with short sentences
4. **Backend Safety Enforcement**
   - Backend **always appends** disclaimer:
     > "IMPORTANT: This threat level is an estimate based only on the image/description. For an accurate diagnosis and treatment plan, please consult a qualified agronomist, veterinary doctor, or local agricultural extension officer."
5. **Output**
   - JSON response: `{ success, classification, model_used, input_type }`

#### 3.3.2 Text/Voice-Based Detection – `POST /detect-disease-text`

**Step-by-Step Process:**

1. **Input Reception**
   - `description`: Text description of disease symptoms or condition
   - `language`: Language code (en, ig, ha, yo)
2. **Agent Processing**
   - `disease_agent.classify_disease_from_text()` processes:
     - Text description
     - Language context
   - Builds structured prompt for Gemini
3. **Gemini API Call**
   - Same model and prompt structure as image-based detection
   - Threat level assessment based on described symptoms
4. **Backend Safety Enforcement**
   - Same disclaimer appended as image-based detection
5. **Output**
   - JSON response: `{ success, classification, model_used, input_type }`

**Threat Level Guidelines:**

- **Low**: Mild or early-stage issue, unlikely to cause major losses if addressed soon
- **Moderate**: Noticeable risk that can reduce yield/health if not treated
- **High**: Serious or fast-spreading issue that can cause major losses or death (use cautiously, only when clearly severe)
- **Uncertain**: Insufficient or ambiguous data; model cannot safely rate risk (encouraged when not confident)

---

### 3.4 Live Voice Interaction – `WS /live-voice` & `POST /live-voice-start`

**Step-by-Step Process:**

1. **WebSocket Connection**
   - Client connects to `/live-voice` endpoint
   - Optional: Send image as JSON (base64 encoded) at session start
   - Audio chunks streamed as raw PCM bytes (16kHz, mono, 16-bit)
2. **Agent Processing**
   - `live_voice_agent.handle_live_voice_websocket()` manages:
     - WebSocket connection lifecycle
     - Image context (if provided)
     - Audio streaming to Gemini Live API
     - Audio response streaming back to client
3. **Gemini Live API**
   - Model: `gemini-2.5-flash` via Gemini Live API
   - System prompt: Brief, clear, focused on "what to do next" (2–4 key steps)
   - Supports: Disease detection, soil analysis, general farming, weather
   - Prefers short sentences and bullet points
4. **Response Streaming**
   - Audio responses streamed back as PCM bytes
   - Optional JSON messages for status/transcripts
5. **Safety Expectations**
   - Same professional advice principle applies
   - Frontends should display clear "not a replacement for a professional" banner

---

## 4. Technologies Used

### 4.1 Backend Framework & Infrastructure

- **FastAPI**: Modern Python web framework for building REST APIs and WebSocket endpoints
- **Uvicorn**: ASGI server for running FastAPI applications
- **Python 3.10**: Programming language
- **Docker**: Containerization for deployment
- **Hugging Face Spaces**: Deployment platform (Docker runtime, CPU-only environment)

### 4.2 Core Language Models

#### 4.2.1 Expert Model: Qwen/Qwen1.5-1.8B

- **Model**: `Qwen/Qwen1.5-1.8B` (via Hugging Face Transformers)
- **Purpose**: Primary agricultural Q&A and conversation
- **Specialization**: **Finetuned/specialized** for Nigerian agricultural context through:
  - Custom system prompts focused on Nigerian farming practices
  - Domain-specific training data integration
  - Response formatting optimized for agricultural advice
- **Optimization**:
  - Lazy loading via `model_manager` (loads on first use)
  - CPU-optimized inference (float32, device_map="cpu")
  - Reduced token limits to prevent over-generation

#### 4.2.2 Gemini Models (Google AI)

- **google-genai**: Official Python client for Google's Gemini API
- **gemini-3-flash-preview**: Used for soil analysis
- **gemini-2.5-flash**: Used for disease detection and live voice interaction
- **API Version**: v1alpha for advanced features (disease detection, live voice)

### 4.3 Retrieval-Augmented Generation (RAG)

- **LangChain**: Framework for building LLM applications
- **LangChain Community**: Community integrations and tools
- **SentenceTransformers**:
  - Model: `paraphrase-multilingual-MiniLM-L12-v2`
  - Purpose: Text embeddings for semantic search
- **FAISS (Facebook AI Similarity Search)**:
  - Vector database for efficient similarity search
  - Two indices: Static (general knowledge) and Live (current updates)
- **APScheduler**: Background job scheduler for periodic RAG updates

### 4.4 Language Processing

- **FastText**:
  - Model: `facebook/fasttext-language-identification`
  - Purpose: Language detection (English, Igbo, Hausa, Yoruba)
- **NLLB (No Language Left Behind)**:
  - Model: `drrobot9/nllb-ig-yo-ha-finetuned`
  - Purpose: Translation between English and Nigerian languages (Hausa, Igbo, Yoruba)
  - Bidirectional translation support

### 4.5 External APIs & Data Sources

- **WeatherAPI**:
  - Provides state-level weather data for Nigerian states
  - Real-time weather information integration
- **AgroNigeria / HarvestPlus**:
  - Agricultural news feeds for RAG updates
  - News scraping and processing

### 4.6 Additional Libraries

- **transformers**: Hugging Face library for loading and using transformer models
- **torch**: PyTorch (CPU-optimized version)
- **numpy**: Numerical computing
- **requests**: HTTP library for API calls
- **beautifulsoup4**: Web scraping for news aggregation
- **python-multipart**: File upload support for FastAPI
- **python-dotenv**: Environment variable management

---

## 5. Threat Level & Safety Policy

### 5.1 Domain Scope

- **Plant and animal diseases only** – **NOT human health**
- Focuses on agricultural and veterinary contexts
- Does not provide medical advice for humans

### 5.2 Threat Level Categories

#### Low

- **Definition**: Mild or early-stage issue, unlikely to cause major losses if addressed soon
- **Characteristics**:
  - Localized symptoms
  - Slow progression
  - Easily manageable with standard treatments
- **Example**: Minor leaf spots, early nutrient deficiency

#### Moderate

- **Definition**: Noticeable risk that can reduce yield/health if not treated
- **Characteristics**:
  - Moderate spread or impact
  - Requires timely intervention
  - Can cause economic losses if ignored
- **Example**: Moderate pest infestation, developing fungal infection

#### High

- **Definition**: Serious or fast-spreading issue that can cause major losses or death
- **Characteristics**:
  - Rapid spread or severe symptoms
  - High potential for significant economic impact
  - May require immediate professional intervention
- **Example**: Severe bacterial blight, fast-spreading viral disease
- **Usage Caution**: Only assigned when signs are **clearly severe** or fast-spreading

#### Uncertain

- **Definition**: Insufficient or ambiguous data; model cannot safely rate risk
- **Characteristics**:
  - Unclear symptoms
  - Multiple possible diagnoses
  - Poor image quality or vague description
- **Usage**: Encouraged when model is not confident – **better to be uncertain than wrong**

### 5.3 Accuracy & Caution Approach

**Threat Level Assessment:**

- Based **only** on image + description – **no lab tests or physical examination**
- Prompts instruct Gemini to be **conservative and cautious**
- Model encouraged to use `Uncertain` when not clearly sure
- Final responses always embed a strong "consult professionals" reminder

**Professional Consultation Reminder:**

- Backend **always appends** disclaimer to disease detection responses
- Frontends should visually emphasize: "This is not a medical/veterinary/agronomic diagnosis"
- System is a **decision-support tool**, not a definitive diagnostic engine

**Important Note:**

> **This system is a decision-support tool, not a definitive diagnosis engine.**
> All disease/threat outputs must be treated as preliminary guidance only.
> Farmers should always consult qualified professionals for critical decisions.

---

## 6. Limitations & Issues Faced

### 6.1 Diagnostic Limitations

#### Input Quality Dependencies

- **Image Quality**: Blurry, poorly lit, or low-resolution images reduce accuracy
- **Description Clarity**: Vague or incomplete symptom descriptions limit diagnostic precision
- **Context Missing**: Lack of field history, crop variety, or environmental conditions affects recommendations

#### Inherent Limitations

- **No Physical Examination**: Cannot inspect internal plant structures or perform lab tests
- **No Real-Time Monitoring**: Cannot track disease progression over time
- **Regional Variations**: Some regional diseases may be under-represented in training data
- **Seasonal Factors**: Disease presentation may vary by season, which may not always be captured

### 6.2 Language & Translation Challenges

#### Translation Accuracy

- **NLLB Limitations**: Can misread slang, mixed-language (e.g., Pidgin + Hausa), or regional dialects
- **Technical Terminology**: Agricultural terms may not have direct translations, leading to approximations
- **Context Loss**: Subtle meaning can be lost across translation steps (user language → English → user language)

#### Language Detection

- **FastText Edge Cases**: May misclassify mixed-language inputs or code-switching
- **Dialect Variations**: Regional variations within languages may not be fully captured

### 6.3 Model Behavior Issues

#### Hallucination Risk

- **Qwen/Gemini Limitations**: Can generate confident but incorrect answers
- **Mitigations Applied**:
  - Stricter system prompts with domain restrictions
  - Shorter output limits (400 tokens for general, 256 for weather)
  - Response cleaning to remove example continuations
  - Topic redirection for unrelated questions
- **Not Bulletproof**: Hallucination can still occur, especially for edge cases

#### Response Drift

- **Off-Topic Continuations**: Models may continue with example conversations or unrelated content
- **Mitigation**: Response cleaning logic removes "Human: / Assistant:" patterns and unrelated content

### 6.4 Latency & Compute Constraints

#### First-Request Latency

- **Model Loading**: First Qwen/NLLB call is slower due to model + weights loading on CPU
- **Cold Start**: ~5-10 seconds for first request after deployment
- **Subsequent Requests**: Faster due to cached models in memory

#### CPU-Only Environment

- **Inference Speed**: CPU inference is slower than GPU (acceptable for Hugging Face Spaces CPU tier)
- **Memory Constraints**: Limited RAM requires careful model management (lazy loading, model caching)

### 6.5 External Dependencies

#### WeatherAPI Issues

- **Outages**: WeatherAPI downtime affects weather-related responses
- **Rate Limits**: API quota limits may restrict frequent requests
- **Data Accuracy**: Weather data quality depends on third-party provider

#### News Source Reliability

- **Scraping Fragility**: News sources may change HTML structure, breaking scrapers
- **Update Frequency**: RAG updates are scheduled; failures can cause stale information
- **Content Quality**: News article quality and relevance vary

### 6.6 RAG & Data Freshness

#### Update Scheduling

- **Periodic Updates**: RAG indices updated on schedule (not real-time)
- **Job Failures**: If update job fails, index can lag behind real-world events
- **Index Rebuilding**: Full index rebuilds can be time-consuming

#### Vectorstore Limitations

- **Embedding Quality**: Semantic search quality depends on embedding model performance
- **Retrieval Accuracy**: Retrieved documents may not always be most relevant
- **Context Window**: Limited context window may truncate important information

### 6.7 Deployment & Infrastructure

#### Hugging Face Spaces Constraints

- **CPU-Only**: No GPU acceleration available
- **Memory Limits**: Limited RAM requires optimization (lazy loading, model size reduction)
- **Build Time**: Docker builds can be slow, especially with large dependencies
- **Cold Starts**: Spaces may spin down after inactivity, causing cold start delays

#### Docker Build Issues

- **Dependency Conflicts**: Some Python packages may conflict (e.g., pyaudio requiring system libraries)
- **Build Timeouts**: Long build times may cause deployment failures
- **Cache Management**: Docker layer caching can be inconsistent

---

## 7. Recommended UX & Safety Reminders

### 7.1 Visual Disclaimers

**Always display a clear banner near disease/soil results:**

> "⚠️ **This is AI-generated guidance. Always confirm with a local agronomist, veterinary doctor, or agricultural extension officer before taking major actions.**"

### 7.2 Threat Level Display

- **Visual Highlighting**: Display threat level prominently with color coding:
  - 🟢 **Low**: Green
  - 🟡 **Moderate**: Yellow
  - 🔴 **High**: Red
  - ⚪ **Uncertain**: Gray
- **Tooltips**: Provide explanations for each threat level
- **Always Pair with Disclaimer**: Never show threat level without the professional consultation reminder

### 7.3 Call-to-Action Buttons

Provide quick access to professional help:

- **"Contact an Extension Officer"** button/link
- **"Find a Vet/Agronomist Near You"** button/link
- **"Schedule a Consultation"** option (if available)

### 7.4 Response Quality Indicators

- Show **confidence indicators** when available (e.g., "High confidence" vs "Uncertain")
- Display **input quality warnings** (e.g., "Image quality may affect accuracy")
- Provide **feedback mechanisms** for users to report incorrect diagnoses

### 7.5 Language Support

- Clearly indicate **detected language** in responses
- Provide **language switcher** for users to change language preference
- Show **translation quality warnings** if translation may be approximate

---

## 8. System Summary

### 8.1 Problem Addressed

Nigerian smallholder farmers face critical challenges:

- **Limited access to agricultural experts** (agronomists, veterinarians)
- **Language barriers** (most resources in English, farmers speak Hausa/Igbo/Yoruba)
- **Fragmented information sources** (weather, soil, disease data scattered)
- **Time-sensitive decision making** (disease outbreaks, weather changes, pest attacks)

### 8.2 Solution Provided

TerraSyncra combines multiple AI technologies to provide:

- **Fast, 24/7 AI-powered responses** in multiple languages
- **Integrated intelligence**:
  - **Finetuned Qwen 1.8B** expert model for agricultural Q&A
  - **Gemini 3/2.5 Flash** for soil analysis and disease detection
  - **RAG + Weather + News** for live, contextual information
- **CPU-optimized, multilingual backend** (FastAPI on Hugging Face Spaces)
- **Multiple input modalities**: Text, voice, and image support

### 8.3 Safety & Professional Consultation

**Every disease assessment includes:**

- Explicit **Threat level** (Low / Moderate / High / Uncertain)
- Clear **professional consultation reminder**
- Emphasis that threat levels are **estimates**, not definitive diagnoses

### 8.4 Key Technologies

- **Expert Model**: Qwen/Qwen1.5-1.8B (finetuned for Nigerian agriculture)
- **Gemini Models**: gemini-3-flash-preview (soil), gemini-2.5-flash (disease, voice)
- **RAG**: LangChain + FAISS + SentenceTransformers
- **Language Processing**: FastText (detection) + NLLB (translation)
- **Backend**: FastAPI + Uvicorn + Docker
- **Deployment**: Hugging Face Spaces (CPU-optimized)

### 8.5 Developer & Credits

**Developer**: Ifeanyi Amogu Shalom

**Intended Users**: Farmers, agronomists, agricultural extension officers, and agricultural support workers in Nigeria and similar contexts

---

## 9. Future Improvements & Roadmap

### 9.1 Potential Enhancements

- **Model Fine-tuning**: Further fine-tune Qwen on Nigerian agricultural datasets
- **Multi-modal RAG**: Integrate images into RAG for visual similarity search
- **Offline Mode**: Support for offline operation in areas with poor connectivity
- **Mobile App**: Native mobile applications for better user experience
- **Expert Network Integration**: Direct connection to network of agronomists/veterinarians
- **Historical Tracking**: Track disease progression and treatment outcomes over time

### 9.2 Technical Improvements

- **Response Caching**: Cache common queries to reduce latency
- **Model Quantization**: Further optimize models for CPU inference
- **Better Error Handling**: More robust error messages and fallback mechanisms
- **Monitoring & Analytics**: Track system performance and user feedback

---

**Last Updated**: 2026

**Version**: 1.0

**Status**: Production (Hugging Face Spaces)
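
The response-caching idea listed under Section 9.2 could be sketched roughly as follows. This is an illustration only, not the project's implementation: the `ResponseCache` class, its `ttl_seconds` and `max_entries` parameters, and the key normalization are all assumptions introduced here. The document itself only says that common queries should be cached to reduce latency.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class ResponseCache:
    """Hypothetical TTL + LRU cache for answers to repeated queries."""

    def __init__(
        self,
        ttl_seconds: float = 3600.0,
        max_entries: int = 512,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ttl_seconds = ttl_seconds
        self.max_entries = max_entries
        self._clock = clock  # injectable so expiry can be tested without sleeping
        # Maps (normalized query, language) -> (time stored, answer).
        self._entries: OrderedDict = OrderedDict()

    @staticmethod
    def make_key(query: str, language: str) -> tuple:
        # Assumed normalization: case-fold and collapse runs of whitespace.
        return (" ".join(query.casefold().split()), language)

    def get(self, query: str, language: str) -> Optional[str]:
        key = self.make_key(query, language)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, answer = entry
        if self._clock() - stored_at > self.ttl_seconds:
            del self._entries[key]  # entry is older than the TTL
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return answer

    def put(self, query: str, language: str, answer: str) -> None:
        key = self.make_key(query, language)
        self._entries[key] = (self._clock(), answer)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used
```

Keying on the detected language as well as the query keeps a Hausa answer from being served to an English request. Weather and live-update answers change quickly, so a deployment would probably give them a much shorter TTL or skip caching them altogether.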
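
The backend safety enforcement described in Sections 3.3 and 5.3 (a mandatory threat level plus an always-appended disclaimer) might look something like the sketch below. The function names `extract_threat_level` and `enforce_safety`, the regular expression, the extra `threat_level` field, and the fallback to `Uncertain` when no level can be parsed are assumptions made for illustration. Only the four threat levels and the disclaimer wording come from this document.

```python
import re

THREAT_LEVELS = ("Low", "Moderate", "High", "Uncertain")

DISCLAIMER = (
    "IMPORTANT: This threat level is an estimate based only on the "
    "image/description. For an accurate diagnosis and treatment plan, please "
    "consult a qualified agronomist, veterinary doctor, or local agricultural "
    "extension officer."
)

# Assumed output shape: a line such as "Threat level: Moderate",
# optionally wrapped in Markdown bold markers.
_THREAT_RE = re.compile(
    r"threat\s+level\**\s*[:\-]\s*\**\s*(low|moderate|high|uncertain)\b",
    re.IGNORECASE,
)


def extract_threat_level(model_text: str) -> str:
    """Return the stated threat level, or "Uncertain" if none is found."""
    match = _THREAT_RE.search(model_text)
    if match is None:
        # Assumption: a missing rating is treated as the cautious default.
        return "Uncertain"
    return match.group(1).capitalize()


def enforce_safety(model_text: str) -> dict:
    """Append the disclaimer and report the parsed threat level."""
    text = model_text.strip()
    if DISCLAIMER not in text:  # avoid appending the disclaimer twice
        text = f"{text}\n\n{DISCLAIMER}"
    return {
        "classification": text,
        # Hypothetical extra field; the documented response does not list it.
        "threat_level": extract_threat_level(model_text),
    }
```

Doing this in the backend rather than relying on the prompt means the reminder is present even if the model omits it, which matches the "always appends" wording above.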