---
title: Samarth
emoji: 🧑‍💻
colorFrom: indigo
colorTo: purple
sdk: streamlit
pinned: false
short_description: 'Ask questions about Agriculture - Samarth gives answer from '
sdk_version: 1.51.0
---
# 📊 Samarth: Data-Aware Question Answering System

An AI-powered **Question Answering System** built with **LangChain** and **Streamlit**, designed to intelligently answer user queries based on multiple government data sources.

---

## 🧠 Project Overview

This system enables natural language querying over diverse datasets.
Each dataset is first analyzed to automatically generate structured **metadata** (including a summary, columns, and use cases) using an LLM.
These metadata representations are embedded into a **vector database** for semantic similarity search.

When a user asks a question:

1. The system finds the **most relevant dataset** using semantic search.
2. It uses an **LLM to generate an appropriate SQL query** for that dataset.
3. The query is executed on the dataset, and the **result is interpreted into a human-readable answer** by another LLM.
4. If the dataset lacks relevant information, the system responds gracefully that no answer is available.
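The four steps above can be sketched in miniature. This is not the project's actual code: the function names (`pick_dataset`, `generate_sql`, `interpret_result`) are illustrative, the two LLM calls are replaced with stubs, and an in-memory SQLite table stands in for a real dataset.

```python
# Toy end-to-end pipeline: dataset selection -> SQL generation -> execution -> answer.
import sqlite3

def pick_dataset(question, metadata):
    """Step 1 (stub): real semantic search ranks metadata embeddings;
    this toy version scores keyword overlap with each summary instead."""
    def overlap(meta):
        return len(set(question.lower().split()) & set(meta["summary"].lower().split()))
    return max(metadata, key=overlap)

def generate_sql(question, table):
    """Step 2 (stub): in the real system an LLM writes this query."""
    return f"SELECT AVG(rainfall_mm) FROM {table} WHERE year = 2000"

def interpret_result(question, rows):
    """Step 4 (stub): in the real system an LLM phrases the raw result."""
    if not rows or rows[0][0] is None:
        return "No answer is available in the selected dataset."
    return f"The result is {rows[0][0]:.1f}."

# Step 3: execute the generated SQL on the chosen dataset (demo table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rainfall (year INTEGER, rainfall_mm REAL)")
conn.executemany("INSERT INTO rainfall VALUES (?, ?)",
                 [(2000, 900.0), (2000, 1100.0), (2001, 800.0)])

metadata = [
    {"name": "rainfall", "summary": "annual rainfall by state and year"},
    {"name": "crop_yield", "summary": "crop production by district"},
]
chosen = pick_dataset("average annual rainfall in 2000", metadata)
rows = conn.execute(generate_sql("...", chosen["name"])).fetchall()
answer = interpret_result("...", rows)
print(answer)  # The result is 1000.0.
```

Swapping the stubs for real embedding search and LLM calls recovers the full pipeline; the control flow stays the same.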
---
## ⚙️ How to Use

1. **Run the Streamlit app**

   > streamlit run app.py

2. **Upload or choose your dataset(s)**
   - The app supports multiple tabular datasets (CSV).
3. **Ask a natural language question**

   Example:

   > "What was the average annual rainfall in Telangana in 2000?"

4. **View the result**
   * The system automatically identifies the most relevant dataset.
   * It generates, executes, and interprets a SQL query.
   * You receive a concise, natural answer with a verified data source link.
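To make the example concrete, here is roughly what happens under the hood for the rainfall question. The CSV content, table name, and column names below are hypothetical, and the SQL string plays the role of what the LLM would generate.

```python
# Load a (hypothetical) CSV dataset into SQLite and run a generated query on it.
import csv, io, sqlite3

csv_text = """state,year,annual_rainfall_mm
Telangana,2000,905.2
Telangana,2001,812.7
Kerala,2000,3055.0
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rainfall (state TEXT, year INTEGER, annual_rainfall_mm REAL)")
conn.executemany(
    "INSERT INTO rainfall VALUES (:state, :year, :annual_rainfall_mm)", rows
)

# Stand-in for the LLM-generated SQL answering the example question.
sql = ("SELECT AVG(annual_rainfall_mm) FROM rainfall "
       "WHERE state = 'Telangana' AND year = 2000")
(avg,), = conn.execute(sql).fetchall()
print(f"Average annual rainfall in Telangana in 2000: {avg} mm")
```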
## 🗂️ Adding New Datasets

To include new datasets in the system, follow these simple steps:

1. **Download and place your dataset file**
   - Save the new dataset (CSV format) inside the `/datasets` folder.
2. **Update the metadata generation script**
   - Open `generate_metadata.py`.
   - Add the dataset details to the respective lists:
     - `dataset_links` – the dataset's source link
     - `dataset_names` – a short descriptive name for the dataset
     - `datasets_list` – the filename of the dataset (as saved in the `/datasets` folder)
3. **Generate metadata**
   - Run the following command to generate structured metadata for all datasets:

   > python generate_metadata.py

4. **Restart the Streamlit app**
   - Once metadata is generated, rerun the app:

   > streamlit run app.py
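The three parallel lists and the shape of the metadata they drive can be sketched as follows. The entry values, the `llm_describe` helper, and the record field names are illustrative, not the actual contents of `generate_metadata.py`; the real script asks an LLM for the summary, column descriptions, and use cases.

```python
# Sketch of the three parallel lists and one metadata record per dataset.
import json

dataset_links = ["https://example.gov/rainfall"]   # source link (hypothetical URL)
dataset_names = ["State-wise annual rainfall"]     # short descriptive name
datasets_list = ["rainfall.csv"]                   # filename inside /datasets

def llm_describe(filename):
    """Stub: the real script prompts an LLM to describe the dataset."""
    return {
        "summary": "Annual rainfall by state and year.",
        "columns": ["state", "year", "annual_rainfall_mm"],
        "use_cases": ["rainfall trend queries"],
    }

records = []
for link, name, fname in zip(dataset_links, dataset_names, datasets_list):
    record = {"name": name, "file": fname, "source": link}
    record.update(llm_describe(fname))
    records.append(record)

print(json.dumps(records, indent=2))
```

Because the lists are zipped positionally, each dataset's link, name, and filename must sit at the same index in all three lists.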
---
## 🧩 Tech Stack

* **LangChain** – for prompt orchestration and LLM integration
* **Sentence Transformers + FAISS** – for vector similarity search
* **Streamlit** – for interactive web UI
* **Google Gemini / Generative AI** – for SQL and natural-language generation
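The role of Sentence Transformers + FAISS can be shown in miniature with plain Python: embed each dataset's metadata as a vector, embed the question, and return the nearest neighbor. The tiny hand-made vectors below stand in for real sentence-transformer embeddings, and a cosine-similarity loop stands in for a FAISS index lookup.

```python
# Toy nearest-neighbor search: what the vector DB does for dataset selection.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made "embeddings" of each dataset's metadata (illustrative only).
dataset_vectors = {
    "rainfall": [0.9, 0.1, 0.0],
    "crop_yield": [0.1, 0.8, 0.3],
}
question_vector = [0.8, 0.2, 0.1]  # toy embedding of the user's question

best = max(dataset_vectors,
           key=lambda name: cosine(question_vector, dataset_vectors[name]))
print(best)  # rainfall
```

In the real stack, FAISS replaces the linear scan with an index so the lookup stays fast as the number of datasets grows.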
---
## 📊 Project Flow Summary

    User Query
        ↓
    Semantic Search (Vector DB)
        ↓
    SQL Query Generation (LLM)
        ↓
    SQL Execution on Dataset
        ↓
    Natural Language Answer (LLM)
        ↓
    Final Answer + Source Link

---