---
title: Samarth
emoji: 🧑‍💻
colorFrom: indigo
colorTo: purple
sdk: streamlit
pinned: false
short_description: 'Ask questions about Agriculture - Samarth gives answer from '
sdk_version: 1.51.0
---

# 📊 Samarth: Data-Aware Question Answering System

An AI-powered Question Answering System built using LangChain and Streamlit, designed to intelligently answer user queries based on multiple government data sources.


## 🧠 Project Overview

This system enables natural language querying over diverse datasets.
Each dataset is first analyzed to automatically generate structured metadata (including summary, columns, and use cases) using an LLM.
These metadata representations are embedded into a vector database for semantic similarity search.
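The metadata record the overview describes can be sketched as a plain Python structure. The field names (`summary`, `columns`, `use_cases`) and the flattening helper below are illustrative assumptions; the actual `generate_metadata.py` may use different keys and formatting.

```python
# Hedged sketch: the shape of a structured metadata record and how it might
# be flattened into a single text blob before embedding. Field names and the
# sample dataset are assumptions for illustration only.

def metadata_to_document(meta: dict) -> str:
    """Flatten one metadata record into a text blob suitable for embedding."""
    return " | ".join([
        meta["name"],
        meta["summary"],
        "columns: " + ", ".join(meta["columns"]),
        "use cases: " + "; ".join(meta["use_cases"]),
    ])

rainfall_meta = {
    "name": "telangana_rainfall",
    "summary": "Annual rainfall statistics for Telangana by year.",
    "columns": ["district", "year", "annual_rainfall_mm"],
    "use_cases": ["rainfall trend analysis", "drought assessment"],
}

doc = metadata_to_document(rainfall_meta)
```

In the real system, blobs like `doc` would be embedded with Sentence Transformers and indexed in FAISS for semantic search.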

When a user asks a question:

  1. The system finds the most relevant dataset using semantic search.
  2. It uses an LLM to generate an appropriate SQL query for that dataset.
  3. The query is executed on the dataset, and the result is interpreted into a human-readable answer by another LLM.
  4. If the dataset lacks relevant information, the system responds gracefully that no answer is available.
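Step 1 (dataset routing) can be sketched with a toy similarity function. This is a minimal stand-in: the real app uses Sentence Transformers embeddings with FAISS, not the bag-of-words cosine used here, and the dataset summaries below are invented for illustration.

```python
# Minimal sketch of dataset routing via semantic similarity.
# Bag-of-words cosine is a stand-in for Sentence Transformers + FAISS.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": word counts. The real system uses a neural encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical dataset summaries, as produced by the metadata step.
dataset_summaries = {
    "rainfall.csv": "annual rainfall statistics by state and year",
    "crop_yield.csv": "crop production and yield per hectare by district",
}

def route(question: str) -> str:
    # Pick the dataset whose metadata is most similar to the question.
    q = embed(question)
    return max(dataset_summaries,
               key=lambda f: cosine(q, embed(dataset_summaries[f])))

best = route("What was the average annual rainfall in Telangana in 2000?")
```

Here the rainfall question routes to `rainfall.csv` because its summary shares the words "annual" and "rainfall" with the query.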

βš™οΈ How to Use

  1. Run the Streamlit App

    streamlit run app.py

  2. Upload or choose your dataset(s)

    • The app supports multiple tabular datasets (CSV).
  3. Ask a natural language question Example:

    β€œWhat was the average annual rainfall in Telangana in 2000?”

  4. View the result

    • The system automatically identifies the most relevant dataset.
    • It generates, executes, and interprets a SQL query.
    • You receive a concise, natural answer with a verified data source link.

πŸ—‚οΈ Adding New Datasets

To include new datasets in the system, follow these simple steps:

  1. Download and place your dataset file

    • Save the new dataset (CSV format) inside the /datasets folder.
  2. Update the metadata generation script

    • Open generate_metadata.py.
    • Add the dataset details in the respective lists:
      • dataset_links β†’ the dataset’s source link
      • dataset_names β†’ a short descriptive name for the dataset
      • datasets_list β†’ the filename of the dataset (as saved in the /datasets folder)
  3. Generate metadata

  • Run the following command to generate structured metadata for all datasets:

    python generate_metadata.py

  1. Restart the Streamlit app
  • Once metadata is generated, rerun the app:

    streamlit run app.py
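Step 2 above amounts to appending one entry to each of the three parallel lists in `generate_metadata.py`. The entry values below are placeholders, not real links or filenames; substitute your dataset's actual details.

```python
# Hedged sketch of the three parallel lists in generate_metadata.py.
# All values are placeholders for illustration.
dataset_links = [
    "https://example.gov/rainfall-dataset",   # source link (placeholder)
]
dataset_names = [
    "Telangana annual rainfall",              # short descriptive name
]
datasets_list = [
    "telangana_rainfall.csv",                 # filename under /datasets
]

# The same index in each list describes one dataset, so the lists must
# always stay the same length.
assert len(dataset_links) == len(dataset_names) == len(datasets_list)
```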


## 🧩 Tech Stack

- LangChain – for prompt orchestration and LLM integration
- Sentence Transformers + FAISS – for vector similarity search
- Streamlit – for the interactive web UI
- Google Gemini / Generative AI – for SQL and natural-language generation

πŸ“ Project Flow Summary

User Query ↓ Semantic Search (Vector DB) ↓ SQL Query Generation (LLM) ↓ SQL Execution on Dataset ↓ Natural Language Answer (LLM) ↓ Final Answer + Source Link