---
title: Samarth
emoji: 🧑‍💻
colorFrom: indigo
colorTo: purple
sdk: streamlit
pinned: false
short_description: 'Ask questions about Agriculture — Samarth gives answer from '
sdk_version: 1.51.0
---

# 📊 Samarth: Data-Aware Question Answering System

An AI-powered **Question Answering System** built with **LangChain** and **Streamlit**, designed to intelligently answer user queries over multiple government data sources.

---

## 🧠 Project Overview

This system enables natural-language querying over diverse datasets. An LLM first analyzes each dataset to automatically generate structured **metadata** (including a summary, columns, and use cases). These metadata representations are embedded into a **vector database** for semantic similarity search.

When a user asks a question:

1. The system finds the **most relevant dataset** using semantic search.
2. An **LLM generates an appropriate SQL query** for that dataset.
3. The query is executed on the dataset, and another LLM **interprets the result into a human-readable answer**.
4. If the dataset lacks relevant information, the system responds gracefully that no answer is available.

---

## ⚙️ How to Use

1. **Run the Streamlit app**

   > streamlit run app.py

2. **Upload or choose your dataset(s)**
   - The app supports multiple tabular datasets (CSV).

3. **Ask a natural-language question**

   Example:
   > "What was the average annual rainfall in Telangana in 2000?"

4. **View the result**
   - The system automatically identifies the most relevant dataset.
   - It generates, executes, and interprets a SQL query.
   - You receive a concise, natural answer with a verified data source link.

---

## 🗂️ Adding New Datasets

To include new datasets in the system, follow these steps:

1. **Download and place your dataset file**
   - Save the new dataset (CSV format) inside the `/datasets` folder.

2. **Update the metadata generation script**
   - Open `generate_metadata.py`.
   - Add the dataset details to the corresponding lists:
     - `dataset_links` → the dataset's source link
     - `dataset_names` → a short descriptive name for the dataset
     - `datasets_list` → the filename of the dataset (as saved in the `/datasets` folder)

3. **Generate metadata**
   - Run the following command to generate structured metadata for all datasets:

   > python generate_metadata.py

4. **Restart the Streamlit app**
   - Once metadata is generated, rerun the app:

   > streamlit run app.py

---

## 🧩 Tech Stack

* **LangChain** – prompt orchestration and LLM integration
* **Sentence Transformers + FAISS** – vector similarity search
* **Streamlit** – interactive web UI
* **Google Gemini / Generative AI** – SQL and natural-language generation

---

## 📝 Project Flow Summary

    User Query
      ↓
    Semantic Search (Vector DB)
      ↓
    SQL Query Generation (LLM)
      ↓
    SQL Execution on Dataset
      ↓
    Natural Language Answer (LLM)
      ↓
    Final Answer + Source Link

---
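To make the semantic-search step concrete, here is a minimal, self-contained sketch of how dataset selection works. The real app uses Sentence Transformers embeddings with a FAISS index; this sketch substitutes a toy bag-of-words embedding and cosine similarity so it runs with the standard library alone. The dataset filenames and metadata summaries below are hypothetical examples, not the project's actual data.

```python
# Toy stand-in for the semantic-search step (real app: Sentence
# Transformers + FAISS). Metadata texts are hypothetical.
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# One LLM-generated metadata summary per dataset (hypothetical).
metadata = {
    "rainfall.csv": "annual rainfall by state and year",
    "crop_yield.csv": "crop yield production by district",
}
vectors = {name: embed(text) for name, text in metadata.items()}

def most_relevant_dataset(question):
    """Return the dataset whose metadata best matches the question."""
    q = embed(question)
    return max(vectors, key=lambda name: cosine(q, vectors[name]))

print(most_relevant_dataset("average annual rainfall in Telangana in 2000"))
# -> rainfall.csv
```

In the actual app, swapping `embed` for a sentence-transformer model and the `max` loop for a FAISS nearest-neighbor lookup gives the same retrieval behavior at scale.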
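Steps 2 and 3 of the pipeline (SQL generation and execution) can be sketched as loading a CSV into an in-memory SQLite table and running a query against it. In the sketch below the SQL string is hand-written where the app would have Gemini generate it from the user's question; the table name, columns, and data are hypothetical.

```python
# Hedged sketch of SQL execution over a CSV dataset. The query string
# stands in for LLM-generated SQL; schema and data are hypothetical.
import csv
import io
import sqlite3

# Stand-in for a CSV file from the /datasets folder.
csv_text = """state,year,rainfall_mm
Telangana,2000,905
Telangana,2001,812
Kerala,2000,3055
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rainfall (state TEXT, year INTEGER, rainfall_mm REAL)")
conn.executemany(
    "INSERT INTO rainfall VALUES (:state, :year, :rainfall_mm)", rows
)

# In the app, an LLM produces this query from the natural-language question.
sql = "SELECT AVG(rainfall_mm) FROM rainfall WHERE state = 'Telangana' AND year = 2000"
(result,) = conn.execute(sql).fetchone()
print(result)  # -> 905.0
```

A second LLM call would then turn `905.0` back into a sentence such as "The average annual rainfall in Telangana in 2000 was 905 mm," together with the dataset's source link.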
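When registering a new dataset, the three lists in `generate_metadata.py` must stay index-aligned, since entry *i* of each list describes the same dataset. A hypothetical sketch (the link, name, and filename shown are illustrative, not real entries):

```python
# Hypothetical illustration of the parallel lists in generate_metadata.py.
dataset_links = [
    "https://example.gov/rainfall-data",  # hypothetical source link
]
dataset_names = [
    "State-wise annual rainfall",  # short descriptive name
]
datasets_list = [
    "rainfall.csv",  # filename as saved in the /datasets folder
]

# Entry i of each list must refer to the same dataset.
assert len(dataset_links) == len(dataset_names) == len(datasets_list)
```

After appending one entry to each list, rerun `python generate_metadata.py` so the new metadata is embedded into the vector database before restarting the app.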