title: Samarth
emoji: 🧑‍💻
colorFrom: indigo
colorTo: purple
sdk: streamlit
pinned: false
short_description: 'Ask questions about Agriculture - Samarth gives answer from '
sdk_version: 1.51.0
Samarth: Data-Aware Question Answering System
An AI-powered question answering system built with LangChain and Streamlit that answers user queries from multiple government data sources.
Project Overview
This system enables natural language querying over diverse datasets.
Each dataset is first analyzed to automatically generate structured metadata (including summary, columns, and use cases) using an LLM.
These metadata representations are embedded into a vector database for semantic similarity search.
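The retrieval idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the `DatasetMetadata` class, its field names, and the two catalog entries are hypothetical, and a toy bag-of-words vector with cosine similarity stands in for the Sentence Transformers embeddings and FAISS index that the project lists in its tech stack.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record (summary, columns, use cases)."""
    name: str
    summary: str
    columns: list[str]
    use_cases: list[str]

    def as_text(self) -> str:
        # Flatten the record into one string; this is the text that gets embedded.
        return " ".join([self.name, self.summary, *self.columns, *self.use_cases])


def embed(text: str) -> Counter:
    # Toy stand-in for a sentence-embedding model: lowercase token counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_relevant(question: str, catalog: list[DatasetMetadata]) -> DatasetMetadata:
    # Pick the dataset whose metadata text is closest to the question.
    q = embed(question)
    return max(catalog, key=lambda m: cosine(q, embed(m.as_text())))


# Hypothetical catalog entries, for illustration only.
catalog = [
    DatasetMetadata(
        name="rainfall",
        summary="Annual rainfall by state and year",
        columns=["state", "year", "annual_rainfall_mm"],
        use_cases=["compare rainfall across states"],
    ),
    DatasetMetadata(
        name="crop_production",
        summary="Crop production by district and season",
        columns=["district", "crop", "season", "production_tonnes"],
        use_cases=["find top crops in a district"],
    ),
]

best = most_relevant("What was the average annual rainfall in Telangana in 2000?", catalog)
print(best.name)  # prints: rainfall
```

A real embedding model would also match paraphrases that share no words with the metadata, which this word-overlap stand-in cannot do.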
When a user asks a question:
- The system finds the most relevant dataset using semantic search.
- It uses an LLM to generate an appropriate SQL query for that dataset.
- The query is executed on the dataset, and the result is interpreted into a human-readable answer by another LLM.
- If the dataset lacks relevant information, the system responds gracefully that no answer is available.
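The execution and fallback steps in the list above might look roughly like the following. It is a sketch, not the project's implementation: it assumes the CSV is loaded into an in-memory SQLite table (the README does not say which SQL engine is used), the sample rows and the `load_csv_into_sqlite`, `run_query`, and `answer` helpers are hypothetical, and the SQL strings are hard-coded where the real system would use LLM output.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content standing in for a file in the datasets folder.
CSV_TEXT = """state,year,annual_rainfall_mm
Telangana,2000,950.5
Telangana,2001,870.0
Kerala,2000,3050.2
"""


def load_csv_into_sqlite(csv_text: str, table: str) -> sqlite3.Connection:
    """Load CSV rows into an in-memory SQLite table so SQL can run against them."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")
    # NUMERIC affinity lets SQLite store number-like strings as numbers.
    columns = ", ".join(f'"{name}" NUMERIC' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)
    return conn


def run_query(conn: sqlite3.Connection, sql: str):
    """Return the result rows, or None if the query fails or yields no value."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    if not rows or all(value is None for row in rows for value in row):
        return None
    return rows


def answer(conn: sqlite3.Connection, sql: str) -> str:
    """Placeholder for the answer step; the real system uses an LLM here."""
    rows = run_query(conn, sql)
    if rows is None:
        return "No answer is available in the selected dataset."
    return f"Result: {rows[0][0]}"


conn = load_csv_into_sqlite(CSV_TEXT, "rainfall")
found = answer(
    conn,
    "SELECT AVG(annual_rainfall_mm) FROM rainfall "
    "WHERE state = 'Telangana' AND year = 2000",
)
missing = answer(
    conn,
    "SELECT AVG(annual_rainfall_mm) FROM rainfall WHERE state = 'Goa'",
)
print(found)
print(missing)
```

Treating an empty or all-NULL result the same as a failed query keeps the fallback path in one place, so invalid generated SQL and genuinely missing data both produce the same "no answer" response.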
How to Use
Run the Streamlit App
streamlit run app.py
Upload or choose your dataset(s)
- The app supports multiple tabular datasets (CSV).
Ask a natural language question. Example:
"What was the average annual rainfall in Telangana in 2000?"
View the result
- The system automatically identifies the most relevant dataset.
- It generates, executes, and interprets a SQL query.
- You receive a concise, natural answer with a verified data source link.
Adding New Datasets
To include new datasets in the system, follow these simple steps:
Download and place your dataset file
- Save the new dataset (CSV format) inside the /datasets folder.
Update the metadata generation script
- Open generate_metadata.py.
- Add the dataset details in the respective lists:
  - dataset_links: the dataset's source link
  - dataset_names: a short descriptive name for the dataset
  - datasets_list: the filename of the dataset (as saved in the /datasets folder)
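As an illustration of this step, the three lists might be filled in as shown here. The list names come from the step itself; the link, name, and filename values are placeholders. The `check_dataset_lists` helper is hypothetical and not part of generate_metadata.py, and it assumes the lists are index-aligned, so that position i in each list describes the same dataset.

```python
# Placeholder entries; replace them with the real source link, name, and filename.
dataset_links = ["https://example.org/rainfall-source"]
dataset_names = ["State-wise annual rainfall"]
datasets_list = ["rainfall.csv"]


def check_dataset_lists(links, names, files):
    """Hypothetical sanity check run before generating metadata."""
    # Assumption: entry i of every list refers to the same dataset.
    if not (len(links) == len(names) == len(files)):
        raise ValueError("dataset_links, dataset_names and datasets_list must have equal length")
    for filename in files:
        if not filename.lower().endswith(".csv"):
            raise ValueError(f"expected a CSV file, got: {filename}")
    # Pair the entries up so each dataset can be handled as one record.
    return list(zip(names, files, links))


entries = check_dataset_lists(dataset_links, dataset_names, datasets_list)
for name, filename, link in entries:
    print(f"{name}: {filename} ({link})")
```

A check like this catches the most likely mistake when adding a dataset, which is updating one or two of the lists and forgetting the third.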
Generate metadata
- Run the following command to generate structured metadata for all datasets:
python generate_metadata.py
Restart the Streamlit app
- Once metadata is generated, rerun the app:
streamlit run app.py
Tech Stack
- LangChain: prompt orchestration and LLM integration
- Sentence Transformers + FAISS: vector similarity search
- Streamlit: interactive web UI
- Google Gemini / Generative AI: SQL and natural-language generation
Project Flow Summary
User Query → Semantic Search (Vector DB) → SQL Query Generation (LLM) → SQL Execution on Dataset → Natural Language Answer (LLM) → Final Answer + Source Link