---
title: NyayaSetu
emoji: ⚖️
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: false
---
# NyayaSetu — Indian Legal RAG Agent
> Retrieval-Augmented Generation over 26,688 Supreme Court of India judgments (1950–2024).
> Ask a legal question. Get a cited answer grounded in real case law.
> 1,025,764 chunks indexed (SC judgments, HC judgments, bare acts, constitution, legal references)
> V2 agent with 3-pass reasoning loop and conversation memory
[Live on HuggingFace Spaces](https://huggingface.co/spaces/CaffeinatedCoding/nyayasetu)
[CI on GitHub Actions](https://github.com/devangmishra1424/nyayasetu/actions)


---
> **NOT legal advice.** This is a portfolio project. Always consult a qualified advocate.
---
## What It Does
A user types a legal question. The system:
1. Runs **Named Entity Recognition** (fine-tuned DistilBERT) to extract legal entities — judges, statutes, provisions, case numbers
2. Augments the query with extracted entities and embeds it using **MiniLM** (384-dim)
3. Searches a **FAISS index** of 443,598 judgment chunks for the most relevant excerpts
4. Assembles **1024-token context windows** from the parent judgments around each matched chunk
5. Makes a **single LLM call** (Groq — Llama-3.3-70b) with a strict "answer only from provided excerpts" prompt
6. Runs **deterministic citation verification** — checks whether quoted phrases in the answer appear verbatim in the retrieved context
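The six steps above can be sketched as one function. This is an illustrative outline only, with trivial stand-ins for each component; none of these names are the project's actual API.

```python
# Minimal end-to-end sketch of the six pipeline steps. Every function here is
# an illustrative stand-in, not the real implementation.

def extract_entities(query):
    # Stand-in for the fine-tuned DistilBERT NER layer (step 1).
    return {"STATUTE": ["Article 21"]} if "Article 21" in query else {}

def embed(text):
    # Stand-in for MiniLM: the real system returns a 384-dim vector (step 2).
    return [float(len(text))] * 384

def faiss_search(vector, k=5):
    # Stand-in for FAISS top-k retrieval over judgment chunks (step 3).
    return [{"judgment_id": f"SC_DEMO_{i}", "text": "..."} for i in range(k)]

def parent_window(chunk, tokens=1024):
    # Stand-in for assembling a 1024-token window from the parent judgment (step 4).
    return chunk["text"]

def call_llm(query, contexts):
    # Stand-in for the single Groq call with a strict grounding prompt (step 5).
    return "Answer grounded in the provided excerpts."

def verify_citations(answer, contexts):
    # Stand-in for deterministic citation verification (step 6).
    return "No verifiable claims"

def answer_query(query):
    entities = extract_entities(query)
    augmented = query + " " + " ".join(v for vals in entities.values() for v in vals)
    vector = embed(augmented)
    chunks = faiss_search(vector, k=5)
    contexts = [parent_window(c) for c in chunks]
    answer = call_llm(query, contexts)
    return {"answer": answer, "verification_status": verify_citations(answer, contexts)}
```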
---
## Architecture
```
               User Query
                   │
                   ▼
┌─────────────────────────────────────────┐
│ NER Layer (DistilBERT fine-tuned)       │
│ Extracts: JUDGE, COURT, STATUTE,        │
│ PROVISION, CASE_NUMBER, DATE            │
└──────────────────┬──────────────────────┘
                   │ augmented query
                   ▼
┌─────────────────────────────────────────┐
│ Embedding Layer (MiniLM-L6-v2)          │
│ 384-dim sentence embedding              │
└──────────────────┬──────────────────────┘
                   │ query vector
                   ▼
┌─────────────────────────────────────────┐
│ FAISS Retrieval (IndexFlatL2)           │
│ 443,598 chunks — 26,688 SC judgments    │
│ Memory-mapped — index never fully       │
│ loaded into RAM                         │
└──────────────────┬──────────────────────┘
                   │ top-5 chunks + parent context
                   ▼
┌─────────────────────────────────────────┐
│ LLM Generation (Groq — Llama-3.3-70b)   │
│ Single call, strict grounding prompt    │
│ Gemini as fallback                      │
└──────────────────┬──────────────────────┘
                   │ answer
                   ▼
┌─────────────────────────────────────────┐
│ Citation Verification (deterministic)   │
│ Verified ✓ / ⚠ Unverified               │
└──────────────────┬──────────────────────┘
                   │
                   ▼
             JSON Response
```
**Deployment:** Docker container on HuggingFace Spaces (port 7860). Models downloaded from HF Hub at startup — not bundled in the image.
---
## Technical Decisions
**Why no LangChain?**
I built the chunking pipeline, FAISS retrieval, agent loop, and citation verification from scratch in plain Python. This means I can debug each component independently and explain exactly what each one does. I know what LangChain abstracts because I built what it abstracts. I am fully prepared to use LangChain or LangGraph in a team setting.
**Why DistilBERT for NER?**
DistilBERT is 40% smaller and 60% faster than BERT with 97% of its performance. For a token classification task like NER, this tradeoff is correct — the speed matters at inference time and the accuracy loss is negligible for legal entity types.
**Why FAISS IndexFlatL2?**
Exact nearest neighbour search over 443,598 vectors. Approximate methods (HNSW, IVF) trade accuracy for speed — unnecessary at this corpus size. Memory mapping keeps the 650MB index off RAM until a query needs it.
**Why MiniLM for embeddings?**
`all-MiniLM-L6-v2` is designed specifically for semantic similarity tasks. 384 dimensions gives a good balance between retrieval quality and index size. Runs entirely on CPU — no GPU dependency at inference time.
**Why a single LLM call per query?**
Multi-step chains add latency, introduce more failure points, and make hallucination harder to trace. One call with a strict grounding prompt is simpler, faster, and easier to debug. The citation verifier is the safety layer, not a second LLM call.
**Why deterministic citation verification?**
NLI-based verification requires loading a second model (~500MB) and adds ~300ms latency per query. For a portfolio project on a free tier, deterministic substring matching after normalisation gives 80% of the value at 0% of the cost. The limitation (paraphrases pass as verified) is documented.
**Why parent document retrieval?**
Chunks are 256 tokens — good for retrieval precision. But 256 tokens is often mid-sentence with no surrounding context. The LLM needs more. The system retrieves a 1024-token window centred on each matched chunk from the full parent judgment, giving the LLM enough context to answer correctly.
---
## Performance
| Metric | Value |
|---|---|
| NER F1 (overall) | 0.777 |
| Index size | 443,598 chunks from 26,688 judgments |
| FAISS index size on disk | ~650MB |
| Embedding dimensions | 384 |
| Typical query latency | 1,000–1,800ms |
| LLM | Groq Llama-3.3-70b-versatile |
| Deployment | HuggingFace Spaces, CPU only, free tier |
Latency breakdown: ~5ms FAISS search, ~50ms NER + embedding, ~900–1500ms Groq API call, ~10ms citation verification.
---
## Live Query Examples
**Health check:**
```
PS> Invoke-RestMethod -Uri "https://caffeinatedcoding-nyayasetu.hf.space/health"
status service   version
------ -------   -------
ok     NyayaSetu 1.0.0
```
---
**Query: Fundamental rights under the Indian Constitution**
```
PS> Invoke-RestMethod -Uri "https://caffeinatedcoding-nyayasetu.hf.space/query" `
-Method POST -ContentType "application/json" `
-Body '{"query": "What are the fundamental rights guaranteed under the Indian Constitution?"}'
query : What are the fundamental rights guaranteed under the Indian Constitution?
answer : The fundamental rights guaranteed under the Indian Constitution are divided
into seven categories:
"right to equality - arts. 14 to 18;
right to freedom - arts. 19 to 22;
right against exploitation - arts. 23 and 24;
right to freedom of religion arts. 25 to 28;
cultural and educational rights arts. 29 and 30;
right to property - arts. 31, 31 a and 31b;
and right to constitutional remedies arts. 32 to 35" (SC_1958_9972).
These fundamental rights are "still reserved to the people after the
delegation of rights by the people to the institutions of government"
(SC_1958_9972).
The Constitution "confirms their existence and gives them protection"
(SC_2017_2363).
NOTE: This is not legal advice. Consult a qualified advocate.
sources : SC_2017_2363 (Justice K S Puttaswamy Retd And Anr vs Union Of India, 2017)
SC_1958_9972 (Basheshar Nath vs The Commissioner Of Income Tax Delhi, 1958)
SC_1992_25797 (Life Insurance Corpn Of India vs Prof Manubhai D Shah, 1992)
SC_1962_10537 (Prem Chand Garg vs Excise Commissioner U P Allahabad, 1962)
verification_status : Unverified
entities : STATUTE
num_sources : 5
truncated : False
latency_ms : 1768.34
```
---
**Query: Right to privacy**
```
PS> Invoke-RestMethod -Uri "https://caffeinatedcoding-nyayasetu.hf.space/query" `
-Method POST -ContentType "application/json" `
-Body '{"query": "What is the right to privacy in India and how did the Supreme Court rule on it?"}'
query : What is the right to privacy in India and how did the Supreme Court rule on it?
answer : The right to privacy in India is "not absolute" and is "subject to certain
reasonable restrictions on the basis of compelling social, moral and public
interest" as stated in Justice K S Puttaswamy Retd And Anr vs Union Of India
And Ors (ID: SC_2017_2363). According to the same judgment, "the right to
privacy has been implied in articles 19 (1) (a) and (d) and article 21" of
the Constitution.
As noted in Distt Registrar Collector vs Canara Bank Etc (ID: SC_2004_4562),
"the right to privacy has been widely accepted as implied in our constitution"
and is "the right to be let alone".
The Supreme Court has ruled that the right to privacy is a fundamental right
emanating from Article 21 of the Constitution, as stated in Justice K S
Puttaswamy Retd And Anr vs Union Of India And Ors (ID: SC_2017_2363).
NOTE: This is not legal advice. Consult a qualified advocate.
sources : SC_2017_2363 (Justice K S Puttaswamy Retd And Anr vs Union Of India, 2017)
SC_2018_24210 (Justice K S Puttaswamy Retd vs Union Of India, 2018)
SC_2004_4562 (Distt Registrar Collector vs Canara Bank Etc, 2004)
verification_status : Unverified
entities : GPE, COURT
num_sources : 5
truncated : False
latency_ms : 1051.71
```
---
**Query: Doctrine of proportionality**
```
PS> Invoke-RestMethod -Uri "https://caffeinatedcoding-nyayasetu.hf.space/query" `
-Method POST -ContentType "application/json" `
-Body '{"query": "What is the doctrine of proportionality and how is it applied in fundamental rights cases?"}'
query : What is the doctrine of proportionality and how is it applied in
fundamental rights cases?
answer : The doctrine of proportionality is a principle that guides the limitation of
fundamental rights. As stated in Anuradha Bhasin vs Union Of India
(ID: SC_2020_1572), "the proportionality principle, can be easily summarized
by lord diplock's aphorism — you must not use a steam hammer to crack a nut,
if a nutcracker would do?"
According to Justice K S Puttaswamy Retd vs Union Of India (ID: SC_2018_24210),
the proportionality test involves four stages: "a legitimate goal stage";
"a suitability or rational connection stage"; "a necessity stage"; and
"a balancing stage".
In Modern Dental College Res Cen Ors vs State Of Madhya Pradesh Ors
(ID: SC_2016_19144), "when a law limits a constitutional right, such a
limitation is constitutional if it is proportional".
NOTE: This is not legal advice. Consult a qualified advocate.
sources : SC_2020_1572 (Anuradha Bhasin vs Union Of India, 2020)
SC_2018_24210 (Justice K S Puttaswamy Retd vs Union Of India, 2018)
SC_2016_19144 (Modern Dental College Res Cen vs State Of Madhya Pradesh, 2016)
SC_2023_16817 (Ramesh Chandra Sharma vs The State Of Uttar Pradesh, 2023)
verification_status : Unverified
entities : (none extracted)
num_sources : 5
truncated : False
latency_ms : 1511.71
```
---
**Validation — query too short (fails fast, model never called):**
```
PS> Invoke-RestMethod -Uri "https://caffeinatedcoding-nyayasetu.hf.space/query" `
-Method POST -ContentType "application/json" `
-Body '{"query": "help"}'
Invoke-RestMethod : {"detail":"Query too short — minimum 10 characters"}
StatusCode : 400
```
---
**Out-of-domain query — LLM correctly refuses:**
```
PS> Invoke-RestMethod -Uri "https://caffeinatedcoding-nyayasetu.hf.space/query" `
-Method POST -ContentType "application/json" `
-Body '{"query": "Who won the IPL cricket tournament this year?"}'
answer : The provided Supreme Court judgment excerpts do not contain any information
about the IPL cricket tournament or its winners. The excerpts appear to be
court judgments with case information, judge names, and dates, but they do
not mention the IPL or any related topics.
verification_status : No verifiable claims
entities : ORG
num_sources : 5
latency_ms : 571.68
```
---
## API
**POST /query**
```json
{
  "query": "What is the doctrine of proportionality in fundamental rights cases?"
}
```
Response:
```json
{
  "query": "...",
  "answer": "The doctrine of proportionality... (SC_2018_24210)",
  "sources": [
    {
      "judgment_id": "SC_2018_24210",
      "title": "Justice K S Puttaswamy Retd vs Union Of India",
      "year": "2018",
      "similarity_score": 0.689,
      "excerpt": "..."
    }
  ],
  "verification_status": "Verified",
  "unverified_quotes": [],
  "entities": {"COURT": ["Supreme Court"]},
  "num_sources": 5,
  "truncated": false,
  "latency_ms": 1511.71
}
```
**GET /health** — `{"status": "ok", "service": "NyayaSetu", "version": "1.0.0"}`
**GET /** — app info and endpoint list
---
## Project Structure
```
NyayaSetu/
├── preprocessing/
│   ├── clean.py            ← text cleaning, OCR error fixing
│   ├── chunk.py            ← recursive splitter, 256 tokens, 50 overlap
│   ├── embed.py            ← MiniLM batch embedding
│   └── build_index.py      ← FAISS IndexFlatL2 construction
├── src/
│   ├── ner.py              ← DistilBERT NER inference
│   ├── retrieval.py        ← FAISS search + parent context assembly
│   ├── agent.py            ← single-pass query pipeline
│   ├── llm.py              ← Groq API call + tenacity retry
│   └── verify.py           ← deterministic citation verification
├── api/
│   ├── main.py             ← FastAPI, 3 endpoints, model download at startup
│   └── schemas.py          ← Pydantic request/response models
├── tests/
│   ├── test_retriever.py
│   ├── test_agent.py
│   ├── test_verify.py
│   └── test_api.py
├── .github/workflows/ci.yml ← pytest → lint → docker build → HF deploy → smoke test
└── docker/Dockerfile
```
## V2 Agent Architecture
**Pass 1 — Analyse:** LLM call to understand the message, detect tone/stage,
build structured fact web, update hypotheses, form targeted FAISS queries.
**Pass 2 — Retrieve:** Parallel FAISS search across 3 queries. No LLM call. ~5ms.
**Pass 3 — Respond:** Dynamically assembled prompt based on tone, stage, and
format needs + full case state + retrieved context.
**Conversation Memory:** Each session maintains a compressed summary + structured
fact web (parties, events, documents, amounts, hypotheses) updated every turn.
---
## Setup & Reproduction
```bash
git clone https://github.com/devangmishra1424/nyayasetu
cd nyayasetu
pip install -r requirements.txt
# Set environment variables
export GROQ_API_KEY=your_key_here
export HF_TOKEN=your_token_here
# Models (~2.7GB) download automatically from HF Hub at startup
uvicorn api.main:app --host 0.0.0.0 --port 7860
```
---
## Limitations
**Data scope:** Supreme Court of India judgments only, 1950–2024. No High Court judgments, no legislation, no legal commentary.
**Citation verification:** The verifier does exact substring matching after normalisation. LLM paraphrases pass as Verified even when the underlying claim is correct. Full paraphrase detection would require NLI inference — out of scope for v1.
**Out-of-domain queries:** The similarity threshold blocks most irrelevant queries. Queries that share vocabulary with legal text may still pass through to the LLM, which will correctly report no relevant information found.
**Not a legal database:** This system cannot be used as a substitute for Westlaw, SCC Online, or Indian Kanoon. It is a portfolio demonstration of RAG pipeline engineering.
**Planned improvements (post-v1):**
- Gradio frontend for non-technical users
- MLflow experiment tracking for NER training runs
- Evidently drift monitoring on query logs
- High Court judgment coverage
- Re-ranking layer (cross-encoder) between FAISS retrieval and LLM call
---
## Bug Log
**Bug 1 — `snapshot_download` with `allow_patterns` fetching 0 files**
The FAISS index files were uploaded to HuggingFace Hub under a `faiss_index/` subfolder. The `snapshot_download` call with `allow_patterns="faiss_index/*"` returned 0 files — it couldn't match the pattern against the subfolder structure. Fixed by switching to `hf_hub_download` with explicit `filename` paths per file. Lesson: `snapshot_download` pattern matching behaves differently for nested paths than expected.
**Bug 2 — L2 distance threshold logic inverted**
The similarity threshold in `retrieval.py` used `if best_score < SIMILARITY_THRESHOLD: return []`. This is correct for cosine similarity (higher = better) but wrong for L2 distance (lower = better). The condition was blocking good legal queries and letting through out-of-domain queries. Fixed by flipping to `if best_score > SIMILARITY_THRESHOLD` and setting threshold to 0.85. Lesson: always verify which direction your distance metric runs before writing threshold logic.
**Bug 3 — `api/__init__.py` contained a shell command**
The `api/__init__.py` file contained `echo ""` — a leftover from a PowerShell command accidentally piped into the file. Python threw a syntax error at startup. Fixed by overwriting with an empty string. Lesson: on Windows, `echo "" > file` writes the shell command into the file. Use `"" | Out-File -FilePath file -Encoding utf8` instead.