	---
	title: NyayaSetu
	emoji: ⚖️
	colorFrom: indigo
	colorTo: blue
	sdk: docker
	pinned: false
	---

	# NyayaSetu — Indian Legal RAG Agent

	Ask questions about Indian Supreme Court judgments (1950–2024).

	Live API: POST `/query` with `{"query": "your legal question"}`

	> Not legal advice. Always consult a qualified advocate.



	> Retrieval-Augmented Generation over 26,688 Supreme Court of India judgments (1950–2024).
	> Ask a legal question. Get a cited answer grounded in real case law.
	> 1,025,764 chunks indexed (SC judgments, HC judgments, bare acts, constitution, legal references).
	> V2 agent with a 3-pass reasoning loop and conversation memory.

	[![Live Demo](https://img.shields.io/badge/🤗%20HuggingFace-Live%20Demo-blue)](https://huggingface.co/spaces/CaffeinatedCoding/nyayasetu)
	[![GitHub Actions](https://github.com/devangmishra1424/nyayasetu/actions/workflows/ci.yml/badge.svg)](https://github.com/devangmishra1424/nyayasetu/actions)
	![Python](https://img.shields.io/badge/python-3.11-blue)
	![Version](https://img.shields.io/badge/version-1.0-green)

	---

	> NOT legal advice. This is a portfolio project. Always consult a qualified advocate.

	---

	## What It Does

	A user types a legal question. The system:

	1. Runs Named Entity Recognition (fine-tuned DistilBERT) to extract legal entities — judges, statutes, provisions, case numbers
	2. Augments the query with extracted entities and embeds it using MiniLM (384-dim)
	3. Searches a FAISS index of 443,598 judgment chunks for the most relevant excerpts
	4. Assembles 1024-token context windows from the parent judgments around each matched chunk
	5. Makes a single LLM call (Groq — Llama-3.3-70b) with a strict "answer only from provided excerpts" prompt
	6. Runs deterministic citation verification — checks whether quoted phrases in the answer appear verbatim in the retrieved context

	---

	## Architecture

	```
	              User Query
	                   │
	                   ▼
	┌─────────────────────────────────────────┐
	│ NER Layer (DistilBERT fine-tuned)       │
	│ Extracts: JUDGE, COURT, STATUTE,        │
	│           PROVISION, CASE_NUMBER, DATE  │
	└──────────────────┬──────────────────────┘
	                   │ augmented query
	                   ▼
	┌─────────────────────────────────────────┐
	│ Embedding Layer (MiniLM-L6-v2)          │
	│ 384-dim sentence embedding              │
	└──────────────────┬──────────────────────┘
	                   │ query vector
	                   ▼
	┌─────────────────────────────────────────┐
	│ FAISS Retrieval (IndexFlatL2)           │
	│ 443,598 chunks, 26,688 SC judgments     │
	│ Memory-mapped: index never fully        │
	│ loaded into RAM                         │
	└──────────────────┬──────────────────────┘
	                   │ top-5 chunks + parent context
	                   ▼
	┌─────────────────────────────────────────┐
	│ LLM Generation (Groq, Llama-3.3-70b)    │
	│ Single call, strict grounding prompt    │
	│ Gemini as fallback                      │
	└──────────────────┬──────────────────────┘
	                   │ answer
	                   ▼
	┌─────────────────────────────────────────┐
	│ Citation Verification (deterministic)   │
	│ Verified ✓ / ⚠ Unverified               │
	└──────────────────┬──────────────────────┘
	                   │
	                   ▼
	             JSON Response
	```

	Deployment: Docker container on HuggingFace Spaces (port 7860). Models downloaded from HF Hub at startup — not bundled in the image.

	---

	## Technical Decisions

	Why no LangChain?
	I built the chunking pipeline, FAISS retrieval, agent loop, and citation verification from scratch in plain Python. This means I can debug each component independently and explain exactly what each one does. I know what LangChain abstracts because I built what it abstracts. I am fully prepared to use LangChain or LangGraph in a team setting.

	Why DistilBERT for NER?
	DistilBERT is about 40% smaller and 60% faster than BERT while retaining roughly 97% of its performance. For a token classification task like NER, that is a good tradeoff: speed matters at inference time, and the accuracy loss is negligible for legal entity types.

	Why FAISS IndexFlatL2?
	IndexFlatL2 performs exact nearest-neighbour search over all 443,598 vectors. Approximate methods (HNSW, IVF) trade accuracy for speed, which is unnecessary at this corpus size. Memory mapping keeps the ~650MB index out of RAM until a query needs it.
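As a quick sanity check on the ~650MB figure: a flat index stores one float32 per dimension per vector, so the raw size follows directly from the vector count and the embedding width. A minimal sketch (the function name is illustrative, and the small index header is ignored):

```python
def flat_index_bytes(num_vectors: int, dim: int, bytes_per_float: int = 4) -> int:
    """Raw storage for a flat (uncompressed) float32 vector index."""
    return num_vectors * dim * bytes_per_float


size_bytes = flat_index_bytes(443_598, 384)
size_mib = size_bytes / 2**20
print(f"{size_bytes:,} bytes = {size_mib:.1f} MiB")
# prints: 681,366,528 bytes = 649.8 MiB
```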

	Why MiniLM for embeddings?
	`all-MiniLM-L6-v2` is designed specifically for semantic similarity tasks. 384 dimensions gives a good balance between retrieval quality and index size. Runs entirely on CPU — no GPU dependency at inference time.

	Why a single LLM call per query?
	Multi-step chains add latency, introduce more failure points, and make hallucination harder to trace. One call with a strict grounding prompt is simpler, faster, and easier to debug. The citation verifier is the safety layer, not a second LLM call.
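To illustrate what a single grounded call might look like, here is a rough sketch of prompt assembly. The function name, the prompt wording, and the `(judgment_id, text)` excerpt shape are all assumptions made for illustration; the actual prompt used by the system is not reproduced in this README.

```python
def build_grounded_prompt(question: str, excerpts: list[tuple[str, str]]) -> str:
    """Assemble one prompt from (judgment_id, text) excerpts.

    Hypothetical wording: the real system prompt is not shown in this README.
    """
    # Label each excerpt with its judgment ID so the model can cite it.
    context = "\n\n".join(f"[{jid}]\n{text}" for jid, text in excerpts)
    return (
        "Answer ONLY from the excerpts below. "
        "Quote supporting text verbatim and cite the judgment ID in parentheses. "
        "If the excerpts do not answer the question, say so.\n\n"
        f"EXCERPTS:\n{context}\n\n"
        f"QUESTION: {question}\n"
    )


prompt = build_grounded_prompt(
    "Is the right to privacy absolute?",
    [("SC_2017_2363", "<excerpt text retrieved for this judgment>")],
)
```

Because everything the model sees is in one string, a bad answer can be traced back to either the retrieved excerpts or the instruction text, with no intermediate chain state to inspect.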

	Why deterministic citation verification?
	NLI-based verification requires loading a second model (~500MB) and adds ~300ms latency per query. For a portfolio project on a free tier, deterministic substring matching after normalisation captures most of the value at negligible cost. The limitation (paraphrases pass as Verified) is documented under Limitations.
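A minimal sketch of deterministic verification along these lines, using only the standard library. The normalisation steps, the `min_len` cutoff, and the function names are assumptions; only the status strings and the quoted-phrase substring check come from this README.

```python
import re
import unicodedata

QUOTE_RE = re.compile(r'"([^"]+)"')


def normalise(text: str) -> str:
    """Fold Unicode variants, straighten curly quotes, collapse whitespace, lowercase."""
    text = unicodedata.normalize("NFKC", text)
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip().lower()


def verify_citations(answer: str, contexts: list[str], min_len: int = 15):
    """Return (status, unverified_quotes) for the quoted phrases in `answer`."""
    haystack = normalise(" ".join(contexts))
    quotes = [q.strip() for q in QUOTE_RE.findall(normalise(answer))]
    # Very short quotes (assumed cutoff) are too generic to be meaningful evidence.
    quotes = [q for q in quotes if len(q) >= min_len]
    if not quotes:
        return "No verifiable claims", []
    missing = [q for q in quotes if q not in haystack]
    return ("Unverified" if missing else "Verified"), missing
```

Only text inside double quotes is checked in this sketch, so an unquoted paraphrase never reaches the substring test at all.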

	Why parent document retrieval?
	Chunks are 256 tokens, which is good for retrieval precision. But a 256-token chunk often starts or ends mid-sentence and lacks surrounding context, so the LLM needs more. The system retrieves a 1024-token window centred on each matched chunk from the full parent judgment, giving the LLM enough context to answer correctly.
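The window arithmetic can be sketched as follows, assuming the parent judgment is already available as a token list and each chunk records its start offset (both are assumptions; the function name is illustrative). The window is clamped at document boundaries so it stays as close to 1024 tokens as the judgment allows.

```python
def parent_window(tokens: list[str], chunk_start: int,
                  chunk_len: int = 256, window: int = 1024) -> list[str]:
    """Return up to `window` tokens of the parent document centred on a chunk."""
    centre = chunk_start + chunk_len // 2
    start = max(0, centre - window // 2)
    end = min(len(tokens), start + window)
    start = max(0, end - window)  # re-anchor if the window ran off the end
    return tokens[start:end]


doc = [str(i) for i in range(5000)]      # stand-in for a tokenised judgment
mid = parent_window(doc, chunk_start=2000)
print(mid[0], mid[-1], len(mid))
# prints: 1616 2639 1024
```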

	---

	## Performance

	| Metric | Value |
	|---|---|
	| NER F1 (overall) | 0.777 |
	| Index size | 443,598 chunks from 26,688 judgments |
	| FAISS index size on disk | ~650MB |
	| Embedding dimensions | 384 |
	| Typical query latency | 1,000–1,800ms |
	| LLM | Groq Llama-3.3-70b-versatile |
	| Deployment | HuggingFace Spaces, CPU only, free tier |

	Latency breakdown: ~5ms FAISS search, ~50ms NER + embedding, ~900–1500ms Groq API call, ~10ms citation verification.

	---

	## Live Query Examples

	Health check:
	```
	PS> Invoke-RestMethod -Uri "https://caffeinatedcoding-nyayasetu.hf.space/health"

	status service version
	------ ------- -------
	ok NyayaSetu 1.0.0
	```

	---

	Query: Fundamental rights under the Indian Constitution
	```
	PS> Invoke-RestMethod -Uri "https://caffeinatedcoding-nyayasetu.hf.space/query" `
	-Method POST -ContentType "application/json" `
	-Body '{"query": "What are the fundamental rights guaranteed under the Indian Constitution?"}'

	query : What are the fundamental rights guaranteed under the Indian Constitution?
	answer : The fundamental rights guaranteed under the Indian Constitution are divided
	into seven categories:
	"right to equality - arts. 14 to 18;
	right to freedom - arts. 19 to 22;
	right against exploitation - arts. 23 and 24;
	right to freedom of religion arts. 25 to 28;
	cultural and educational rights arts. 29 and 30;
	right to property - arts. 31, 31 a and 31b;
	and right to constitutional remedies arts. 32 to 35" (SC_1958_9972).
	These fundamental rights are "still reserved to the people after the
	delegation of rights by the people to the institutions of government"
	(SC_1958_9972).
	The Constitution "confirms their existence and gives them protection"
	(SC_2017_2363).

	NOTE: This is not legal advice. Consult a qualified advocate.

	sources : SC_2017_2363 (Justice K S Puttaswamy Retd And Anr vs Union Of India, 2017)
	SC_1958_9972 (Basheshar Nath vs The Commissioner Of Income Tax Delhi, 1958)
	SC_1992_25797 (Life Insurance Corpn Of India vs Prof Manubhai D Shah, 1992)
	SC_1962_10537 (Prem Chand Garg vs Excise Commissioner U P Allahabad, 1962)
	verification_status : Unverified
	entities : STATUTE
	num_sources : 5
	truncated : False
	latency_ms : 1768.34
	```

	---

	Query: Right to privacy
	```
	PS> Invoke-RestMethod -Uri "https://caffeinatedcoding-nyayasetu.hf.space/query" `
	-Method POST -ContentType "application/json" `
	-Body '{"query": "What is the right to privacy in India and how did the Supreme Court rule on it?"}'

	query : What is the right to privacy in India and how did the Supreme Court rule on it?
	answer : The right to privacy in India is "not absolute" and is "subject to certain
	reasonable restrictions on the basis of compelling social, moral and public
	interest" as stated in Justice K S Puttaswamy Retd And Anr vs Union Of India
	And Ors (ID: SC_2017_2363). According to the same judgment, "the right to
	privacy has been implied in articles 19 (1) (a) and (d) and article 21" of
	the Constitution.

	As noted in Distt Registrar Collector vs Canara Bank Etc (ID: SC_2004_4562),
	"the right to privacy has been widely accepted as implied in our constitution"
	and is "the right to be let alone".

	The Supreme Court has ruled that the right to privacy is a fundamental right
	emanating from Article 21 of the Constitution, as stated in Justice K S
	Puttaswamy Retd And Anr vs Union Of India And Ors (ID: SC_2017_2363).

	NOTE: This is not legal advice. Consult a qualified advocate.

	sources : SC_2017_2363 (Justice K S Puttaswamy Retd And Anr vs Union Of India, 2017)
	SC_2018_24210 (Justice K S Puttaswamy Retd vs Union Of India, 2018)
	SC_2004_4562 (Distt Registrar Collector vs Canara Bank Etc, 2004)
	verification_status : Unverified
	entities : GPE, COURT
	num_sources : 5
	truncated : False
	latency_ms : 1051.71
	```

	---

	Query: Doctrine of proportionality
	```
	PS> Invoke-RestMethod -Uri "https://caffeinatedcoding-nyayasetu.hf.space/query" `
	-Method POST -ContentType "application/json" `
	-Body '{"query": "What is the doctrine of proportionality and how is it applied in fundamental rights cases?"}'

	query : What is the doctrine of proportionality and how is it applied in
	fundamental rights cases?
	answer : The doctrine of proportionality is a principle that guides the limitation of
	fundamental rights. As stated in Anuradha Bhasin vs Union Of India
	(ID: SC_2020_1572), "the proportionality principle, can be easily summarized
	by lord diplock's aphorism — you must not use a steam hammer to crack a nut,
	if a nutcracker would do?"

	According to Justice K S Puttaswamy Retd vs Union Of India (ID: SC_2018_24210),
	the proportionality test involves four stages: "a legitimate goal stage";
	"a suitability or rational connection stage"; "a necessity stage"; and
	"a balancing stage".

	In Modern Dental College Res Cen Ors vs State Of Madhya Pradesh Ors
	(ID: SC_2016_19144), "when a law limits a constitutional right, such a
	limitation is constitutional if it is proportional".

	NOTE: This is not legal advice. Consult a qualified advocate.

	sources : SC_2020_1572 (Anuradha Bhasin vs Union Of India, 2020)
	SC_2018_24210 (Justice K S Puttaswamy Retd vs Union Of India, 2018)
	SC_2016_19144 (Modern Dental College Res Cen vs State Of Madhya Pradesh, 2016)
	SC_2023_16817 (Ramesh Chandra Sharma vs The State Of Uttar Pradesh, 2023)
	verification_status : Unverified
	entities : (none extracted)
	num_sources : 5
	truncated : False
	latency_ms : 1511.71
	```

	---

	Validation — query too short (fails fast, model never called):
	```
	PS> Invoke-RestMethod -Uri "https://caffeinatedcoding-nyayasetu.hf.space/query" `
	-Method POST -ContentType "application/json" `
	-Body '{"query": "help"}'

	Invoke-RestMethod : {"detail":"Query too short — minimum 10 characters"}
	StatusCode : 400
	```
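A fail-fast check of this kind can be sketched in a few lines. The 10-character minimum comes from the error message above; the exception class, the function name, and the whitespace stripping are assumptions.

```python
MIN_QUERY_CHARS = 10  # taken from the error message above


class QueryTooShort(ValueError):
    """Raised before any model is touched; an API layer would map this to HTTP 400."""


def validate_query(raw: str) -> str:
    """Return the cleaned query, or raise before NER, retrieval, or the LLM run."""
    query = raw.strip()  # assumption: surrounding whitespace does not count
    if len(query) < MIN_QUERY_CHARS:
        raise QueryTooShort(f"Query too short: minimum {MIN_QUERY_CHARS} characters")
    return query
```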

	---

	Out-of-domain query — LLM correctly refuses:
	```
	PS> Invoke-RestMethod -Uri "https://caffeinatedcoding-nyayasetu.hf.space/query" `
	-Method POST -ContentType "application/json" `
	-Body '{"query": "Who won the IPL cricket tournament this year?"}'

	answer : The provided Supreme Court judgment excerpts do not contain any information
	about the IPL cricket tournament or its winners. The excerpts appear to be
	court judgments with case information, judge names, and dates, but they do
	not mention the IPL or any related topics.
	verification_status : No verifiable claims
	entities : ORG
	num_sources : 5
	latency_ms : 571.68
	```

	---

	## API

	POST /query
	```json
	{
	  "query": "What is the doctrine of proportionality in fundamental rights cases?"
	}
	```

	Response:
	```json
	{
	  "query": "...",
	  "answer": "The doctrine of proportionality... (SC_2018_24210)",
	  "sources": [
	    {
	      "judgment_id": "SC_2018_24210",
	      "title": "Justice K S Puttaswamy Retd vs Union Of India",
	      "year": "2018",
	      "similarity_score": 0.689,
	      "excerpt": "..."
	    }
	  ],
	  "verification_status": "Verified",
	  "unverified_quotes": [],
	  "entities": {"COURT": ["Supreme Court"]},
	  "num_sources": 5,
	  "truncated": false,
	  "latency_ms": 1511.71
	}
	```

	GET /health — `{"status": "ok", "service": "NyayaSetu", "version": "1.0.0"}`

	GET / — app info and endpoint list
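For callers who prefer Python over PowerShell, the same request can be built with the standard library alone. This is a sketch: the helper names and the timeout are assumptions, while the base URL and payload shape are the ones used in the examples in this README.

```python
import json
import urllib.request

BASE_URL = "https://caffeinatedcoding-nyayasetu.hf.space"


def build_query_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /query request with a JSON body."""
    body = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response (performs a network call)."""
    with urllib.request.urlopen(build_query_request(question), timeout=timeout) as resp:
        return json.load(resp)


req = build_query_request("What is the doctrine of proportionality?")
```

Calling `ask(...)` returns the response as a dict, so fields such as `verification_status` and `sources` can be read directly.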

	---

	## Project Structure

	```
	NyayaSetu/
	├── preprocessing/
	│   ├── clean.py          ← text cleaning, OCR error fixing
	│   ├── chunk.py          ← recursive splitter, 256 tokens, 50 overlap
	│   ├── embed.py          ← MiniLM batch embedding
	│   └── build_index.py    ← FAISS IndexFlatL2 construction
	├── src/
	│   ├── ner.py            ← DistilBERT NER inference
	│   ├── retrieval.py      ← FAISS search + parent context assembly
	│   ├── agent.py          ← single-pass query pipeline
	│   ├── llm.py            ← Groq API call + tenacity retry
	│   └── verify.py         ← deterministic citation verification
	├── api/
	│   ├── main.py           ← FastAPI, 3 endpoints, model download at startup
	│   └── schemas.py        ← Pydantic request/response models
	├── tests/
	│   ├── test_retriever.py
	│   ├── test_agent.py
	│   ├── test_verify.py
	│   └── test_api.py
	├── .github/workflows/ci.yml    ← pytest → lint → docker build → HF deploy → smoke test
	└── docker/Dockerfile
	```

	## V2 Agent Architecture

	Pass 1 — Analyse: LLM call to understand the message, detect tone/stage,
	build structured fact web, update hypotheses, form targeted FAISS queries.

	Pass 2 — Retrieve: Parallel FAISS search across 3 queries. No LLM call. ~5ms.

	Pass 3 — Respond: Dynamically assembled prompt based on tone, stage, and
	format needs + full case state + retrieved context.

	Conversation Memory: Each session maintains a compressed summary + structured
	fact web (parties, events, documents, amounts, hypotheses) updated every turn.
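A rough sketch of two pieces described above: the per-session memory and the parallel retrieval in Pass 2. Everything here is illustrative. The class and function names, the merge rule (keep each chunk's best distance), and the `search` callable standing in for the FAISS lookup are assumptions, not the project's actual code.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SessionMemory:
    """Per-session state: a running summary plus a structured fact web."""
    summary: str = ""
    facts: dict[str, list[str]] = field(default_factory=lambda: {
        "parties": [], "events": [], "documents": [], "amounts": [], "hypotheses": [],
    })

    def update(self, category: str, item: str) -> None:
        """Record a fact once; repeated mentions across turns are ignored."""
        if item not in self.facts[category]:
            self.facts[category].append(item)


Hit = tuple[str, float]  # (chunk_id, L2 distance; lower is better)


def parallel_retrieve(queries: list[str], search: Callable[[str], list[Hit]],
                      top_k: int = 5) -> list[Hit]:
    """Pass 2: run every query concurrently, keep each chunk's best distance."""
    with ThreadPoolExecutor(max_workers=max(1, len(queries))) as pool:
        results = list(pool.map(search, queries))
    best: dict[str, float] = {}
    for hits in results:
        for chunk_id, dist in hits:
            if chunk_id not in best or dist < best[chunk_id]:
                best[chunk_id] = dist
    return sorted(best.items(), key=lambda kv: kv[1])[:top_k]
```

Deduplicating by chunk ID matters because the three targeted queries often overlap, and the same chunk should not occupy several context slots in the Pass 3 prompt.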

	---

	## Setup & Reproduction

	```bash
	git clone https://github.com/devangmishra1424/nyayasetu
	cd nyayasetu

	pip install -r requirements.txt

	# Set environment variables
	export GROQ_API_KEY=your_key_here
	export HF_TOKEN=your_token_here

	# Models (~2.7GB) download automatically from HF Hub at startup
	uvicorn api.main:app --host 0.0.0.0 --port 7860
	```

	---

	## Limitations

	Data scope: Supreme Court of India judgments only, 1950–2024. No High Court judgments, no legislation, no legal commentary.

	Citation verification: The verifier does exact substring matching after normalisation, and only on quoted phrases. Paraphrased claims are not checked, so an answer can pass as Verified even when a paraphrased claim is not supported by the retrieved text. Full paraphrase detection would require NLI inference, which is out of scope for v1.

	Out-of-domain queries: The similarity threshold blocks most irrelevant queries. Queries that share vocabulary with legal text may still pass through to the LLM, which will correctly report no relevant information found.

	Not a legal database: This system cannot be used as a substitute for Westlaw, SCC Online, or Indian Kanoon. It is a portfolio demonstration of RAG pipeline engineering.

	v1 — planned improvements:
	- Gradio frontend for non-technical users
	- MLflow experiment tracking for NER training runs
	- Evidently drift monitoring on query logs
	- High Court judgment coverage
	- Re-ranking layer (cross-encoder) between FAISS retrieval and LLM call

	---

	## Bug Log

	Bug 1 — `snapshot_download` with `allow_patterns` fetching 0 files
	The FAISS index files were uploaded to HuggingFace Hub under a `faiss_index/` subfolder. The `snapshot_download` call with `allow_patterns="faiss_index/*"` returned 0 files — it couldn't match the pattern against the subfolder structure. Fixed by switching to `hf_hub_download` with explicit `filename` paths per file. Lesson: `snapshot_download` pattern matching behaves differently for nested paths than expected.

	Bug 2 — L2 distance threshold logic inverted
	The similarity threshold in `retrieval.py` used `if best_score < SIMILARITY_THRESHOLD: return []`. This is correct for cosine similarity (higher = better) but wrong for L2 distance (lower = better). The condition was blocking good legal queries and letting through out-of-domain queries. Fixed by flipping to `if best_score > SIMILARITY_THRESHOLD` and setting threshold to 0.85. Lesson: always verify which direction your distance metric runs before writing threshold logic.
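The corrected check, sketched in isolation. The 0.85 threshold and the comparison direction come from the fix described above; the function name and the `(chunk_id, distance)` hit shape are assumptions.

```python
SIMILARITY_THRESHOLD = 0.85  # maximum acceptable L2 distance (value from the fix above)


def filter_by_l2(hits: list[tuple[str, float]],
                 threshold: float = SIMILARITY_THRESHOLD) -> list[tuple[str, float]]:
    """Return [] when even the closest hit is too far away; L2 is lower-is-better."""
    if not hits:
        return []
    best_score = min(dist for _, dist in hits)
    if best_score > threshold:  # the buggy version used `<` here
        return []
    return hits
```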

	Bug 3 — `api/__init__.py` contained a shell command
	The `api/__init__.py` file contained `echo ""`, a leftover from a PowerShell command accidentally piped into the file. Python threw a syntax error at startup. Fixed by overwriting with an empty string. Lesson: on Windows, `echo "" > file` writes the shell command into the file. Use `"" | Out-File -FilePath file -Encoding utf8` instead.