Amodit committed
Commit ec18d9b · Parent: a09e579

Prepare for Hugging Face Spaces deployment

Files changed (4)
  1. Dockerfile +29 -0
  2. README.md +19 -141
  3. agents/demystifier_agent.py +10 -164
  4. requirements.txt +2 -1
Dockerfile ADDED
@@ -0,0 +1,29 @@
+ # Use official Python runtime as a parent image
+ FROM python:3.10-slim
+
+ # Set the working directory to /code
+ WORKDIR /code
+
+ # Set permissions for local cache (useful for Hugging Face Spaces)
+ RUN mkdir -p /code/cache && chmod -R 777 /code/cache
+ ENV TRANSFORMERS_CACHE=/code/cache
+ ENV HF_HOME=/code/cache
+
+ # Copy the requirements file into the container
+ COPY ./requirements.txt /code/requirements.txt
+
+ # Install dependencies
+ RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
+
+ # Copy the rest of the application code
+ COPY . /code
+
+ # Create necessary directories for the app
+ RUN mkdir -p /code/pdfs_demystify /code/video_consents
+ RUN chmod -R 777 /code/pdfs_demystify /code/video_consents
+
+ # Expose port 7860 (Hugging Face Spaces default)
+ EXPOSE 7860
+
+ # Run the FastAPI app with Uvicorn
+ CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "7860"]
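The two `ENV` lines above point both `TRANSFORMERS_CACHE` and `HF_HOME` at a world-writable `/code/cache`, since Spaces containers run as a non-root user that cannot write to the default `~/.cache`. The libraries read these variables directly, but as a minimal sketch of the lookup order they establish (the helper below is hypothetical and not part of this repo):

```python
import os
import tempfile

def resolve_cache_dir() -> str:
    """Pick a writable cache directory, preferring the variables the Dockerfile sets.

    Hypothetical helper for illustration only: in the actual app, transformers /
    huggingface_hub consume TRANSFORMERS_CACHE and HF_HOME themselves.
    """
    for var in ("TRANSFORMERS_CACHE", "HF_HOME"):
        path = os.environ.get(var)
        if path:
            os.makedirs(path, exist_ok=True)
            return path
    # Neither variable set (e.g. local development): fall back to a temp dir
    fallback = os.path.join(tempfile.gettempdir(), "hf_cache")
    os.makedirs(fallback, exist_ok=True)
    return fallback
```

Inside the container both variables resolve to `/code/cache`, so model downloads land in the directory that was `chmod 777`'d above.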
README.md CHANGED
@@ -1,141 +1,19 @@
-
- # Jan-Contract: Your Digital Workforce Assistant 🇮🇳
-
- Jan-Contract is a multi-functional AI-powered platform designed to empower India's informal workforce. It provides accessible tools to create simple digital agreements, discover relevant government schemes, and understand complex legal documents, all through a simple web interface.
-
- ## 🚀 Key Features
-
- This application is a unified suite of three powerful agents:
-
- 1. **📝 Contract Generator**
-    * Creates simple, clear digital agreements from plain-text descriptions (e.g., "Paint my house for ₹5000").
-    * Provides contextually relevant legal trivia based on the agreement's content.
-    * Generates a professionally formatted, downloadable **PDF** of the final agreement.
-    * Captures undeniable proof of consent with a **video recording** feature.
-
- 2. **🏦 Government Scheme Finder**
-    * Takes a user's profile (e.g., "a woman farmer in Maharashtra") and finds relevant government schemes.
-    * Uses live web search to provide up-to-date information.
-    * Returns a structured list of schemes with descriptions and direct links to official government websites.
-
- 3. **📜 Document Demystifier & Chat**
-    * **Analyze:** Upload any legal PDF document to receive a concise, easy-to-understand summary and a breakdown of key legal terms.
-    * **Chat:** After the analysis, engage in an interactive Q&A session with the document to clarify specific doubts.
-
- ## 🛠️ Tech Stack
-
- * **Frontend:** Streamlit
- * **Backend API:** FastAPI
- * **AI Orchestration:** LangChain & LangGraph
- * **LLMs:** Google Gemini, Llama 3 (via Groq)
- * **Embeddings:** `FastEmbed` (BAAI/bge-base-en-v1.5)
- * **Vector Store:** FAISS (for in-memory semantic search)
- * **Tools & Libraries:**
-    * Tavily AI (for live web search)
-    * `fpdf2` (for PDF generation)
-    * `streamlit-webrtc` (for video recording)
-    * PyMuPDF (for reading PDFs)
-
- ## 📂 Project Structure
-
- ```
- D:\jan-contract
- |
- +-- agents
- |   +-- legal_agent.py
- |   +-- scheme_chatbot.py
- |   +-- demystifier_agent.py
- |
- +-- components
- |   +-- video_recorder.py
- |
- +-- core_utils
- |   +-- core_model_loaders.py
- |
- +-- tools
- |   +-- legal_tools.py
- |   +-- scheme_tools.py
- |
- +-- utils
- |   +-- model_loaders.py
- |   +-- pdf_generator.py
- |
- +-- .env               # Your secret API keys
- +-- requirements.txt   # Project dependencies
- +-- main_streamlit.py  # The main frontend application
- +-- main_fastapi.py    # The backend API server
- +-- README.md          # This file
- ```
-
- ## ⚙️ Setup and Installation
-
- Follow these steps to set up and run the project on your local machine.
-
- ### 1. Clone the Repository
-
- ```bash
- git clone <your-repository-url>
- cd jan-contract
- ```
-
- ### 2. Create and Activate a Python Virtual Environment
-
- This keeps your project dependencies isolated.
-
- ```bash
- # Create the virtual environment
- python -m venv venv
-
- # Activate it (on Windows)
- venv\Scripts\activate
-
- # On MacOS/Linux, you would use:
- # source venv/bin/activate
- ```
-
- ### 3. Install Dependencies
-
- Install all the required Python libraries from the `requirements.txt` file.
-
- ```bash
- pip install -r requirements.txt
- ```
-
- ### 4. Set Up Your API Keys
-
- You will need API keys from Google, Tavily, and Groq.
-
- 1. Create a file named `.env` in the root of the project directory.
- 2. Copy and paste the following content into the `.env` file, replacing the placeholders with your actual keys.
-
- ```env
- # D:\jan-contract\.env
-
- GOOGLE_API_KEY="YOUR_GOOGLE_AI_STUDIO_API_KEY"
- TAVILY_API_KEY="YOUR_TAVILY_AI_API_KEY"
- GROQ_API_KEY="YOUR_GROQ_API_KEY"
- ```
- **Important:** The `.env` file contains secrets and should **never** be committed to GitHub. Ensure `.env` is listed in your `.gitignore` file.
-
- ## ▶️ How to Run the Application
-
- You can run the Streamlit frontend and the FastAPI backend independently.
-
- ### 1. Running the Streamlit Web App (Frontend)
-
- This is the main user interface for the project.
-
- ```bash
- streamlit run main_streamlit.py
- ```
-
- Your browser will automatically open a new tab with the application running.
-
- ### 2. Running the FastAPI Server (Backend API)
-
- This exposes the project's logic as a professional API.
-
- ```bash
- uvicorn main_fastapi:app --reload
- ```
- * The API server will be running at `http://127.0.0.1:8000`.
- * You can access the interactive API documentation (powered by Swagger UI) at **`http://127.0.0.1:8000/docs`**.
+ ---
+ title: Jan Contract AI
+ emoji: ⚖️
+ colorFrom: indigo
+ colorTo: blue
+ sdk: docker
+ pinned: false
+ app_port: 7860
+ ---
+
+ # Jan-Contract: AI Legal Workforce Assistant
+
+ A comprehensive platform for India's informal workforce, providing:
+ 1. **AI Contract Generation**: Create legal agreements in plain English.
+ 2. **Scheme Finder**: Discover government benefits.
+ 3. **Document Demystifier**: Explain complex legal PDFs.
+ 4. **AI Assistant**: General legal advice chatbot.
+
+ Built with FastAPI, LangGraph, Google Gemini, and Groq.
agents/demystifier_agent.py CHANGED
@@ -5,180 +5,23 @@ from typing import TypedDict, List
  from pydantic import BaseModel, Field

  # --- Core LangChain & Document Processing Imports ---
- from langchain_community.document_loaders import PyMuPDFLoader
+ from langchain_community.document_loaders import PyPDFLoader
  from langchain_text_splitters import RecursiveCharacterTextSplitter
- from langchain_community.vectorstores import FAISS
+ from core_utils.simple_vectorstore import SimpleVectorStore
  from langchain_core.prompts import PromptTemplate
  from langchain_core.runnables import RunnablePassthrough
  from langchain_core.output_parsers import StrOutputParser

- # LangGraph Imports
- from langgraph.graph import StateGraph, END, START
+ # ... (rest of imports)

- # --- Tool and Core Model Loader Imports ---
- from tools.legal_tools import legal_search
- from core_utils.core_model_loaders import load_groq_llm, load_embedding_model
-
- # --- 1. Model and Parser Setup ---
- # Initialize models by calling the backend-safe loader functions
- groq_llm = load_groq_llm()
- embedding_model = load_embedding_model()
-
- # --- Pydantic Models ---
- class ExplainedTerm(BaseModel):
-     term: str = Field(description="The legal term or jargon identified.")
-     explanation: str = Field(description="A simple, plain-English explanation of the term.")
-     resource_link: str = Field(description="A working URL for a resource explaining this term in India.")
-
- class DemystifyReport(BaseModel):
-     summary: str = Field(description="A concise summary of the legal document's purpose and key points.")
-     key_terms: List[ExplainedTerm] = Field(description="A list of the most important explained legal terms.")
-     overall_advice: str = Field(description="A concluding sentence of general advice.")
-
- # --- 2. LangGraph for Document Analysis ---
- class DemystifyState(TypedDict):
-     document_chunks: List[str]
-     summary: str
-     identified_terms: List[str]
-     final_report: DemystifyReport
-
- def summarize_node(state: DemystifyState):
-     """Takes all document chunks and creates a high-level summary."""
-     print("---NODE (Demystify): Generating Summary---")
-     chunks = state.get("document_chunks", [])
-     if not chunks:
-         return {"summary": "No content to summarize."}
-
-     context = "\n\n".join(chunks)
-     prompt = f"You are a paralegal expert for the Indian legal system. Summarize the following document clearly for a layman:\n\n{context}"
-     try:
-         response = groq_llm.invoke(prompt)
-         summary = response.content if response and response.content else "Summary generation failed."
-     except Exception as e:
-         print(f"Summary generation error: {e}")
-         summary = "Summary generation failed due to an error."
-
-     return {"summary": summary}
-
- def identify_terms_node(state: DemystifyState):
-     """Identifies the most critical and potentially confusing legal terms in the document."""
-     print("---NODE (Demystify): Identifying Key Terms---")
-     try:
-         context = "\n\n".join(state.get("document_chunks", []))
-         if not context:
-             print("Warning: No document context found for term identification.")
-             return {"identified_terms": []}
-
-         prompt = f"Identify the 3-5 most critical complex legal terms in the following document that a layman would not understand. Return only the terms separated by commas.\n\n{context}"
-         response = groq_llm.invoke(prompt)
-
-         if not response or not response.content:
-             print("Warning: Empty response from LLM for term identification.")
-             return {"identified_terms": []}
-
-         terms_string = response.content
-         identified_terms = [term.strip() for term in terms_string.split(',') if term.strip()]
-         return {"identified_terms": identified_terms}
-     except Exception as e:
-         print(f"Error in identify_terms_node: {e}")
-         return {"identified_terms": []}
-
- def generate_report_node(state: DemystifyState):
-     """Combines the summary and terms into a final, structured report with enriched explanations."""
-     print("---NODE (Demystify): Generating Final Report---")
-     explained_terms_list = []
-
-     # Handle None or empty document_chunks
-     chunks = state.get("document_chunks", [])
-     document_context = "\n\n".join(chunks) if chunks else ""
-
-     # Handle None identified_terms
-     terms = state.get("identified_terms", [])
-     if terms is None:
-         terms = []
-
-     for term in terms:
-         print(f"  - Researching term: {term}")
-         try:
-             search_results = legal_search.invoke(f"simple explanation of legal term '{term}' in Indian law")
-         except Exception as e:
-             print(f"Search failed for term '{term}': {e}")
-             search_results = "Search unavailable."
-
-         prompt = f"""
-         A user is reading a legal document containing the term "{term}".
-         Context: {document_context[:2000]}...
-         Search Results: {search_results}
-
-         Provide a simple one-sentence explanation and a valid URL if found.
-         Format:
-         Explanation: [Explanation]
-         URL: [URL]
-         """
-         try:
-             response = groq_llm.invoke(prompt)
-             if response and response.content:
-                 content = response.content
-                 try:
-                     if "Explanation:" in content and "URL:" in content:
-                         explanation = content.split("Explanation:")[1].split("URL:")[0].strip()
-                         link = content.split("URL:")[-1].strip()
-                     else:
-                         explanation = content.strip()
-                         link = "https://kanoon.nearlaw.com/"
-                 except Exception:
-                     explanation = f"Legal term '{term}' identified."
-                     link = "https://kanoon.nearlaw.com/"
-             else:
-                 explanation = "Explanation unavailable."
-                 link = "https://kanoon.nearlaw.com/"
-         except Exception as e:
-             print(f"LLM failed for term '{term}': {e}")
-             explanation = "Explanation unavailable."
-             link = "https://kanoon.nearlaw.com/"
-
-         explained_terms_list.append(ExplainedTerm(term=term, explanation=explanation, resource_link=link))
-
-     # Ensure summary is not None
-     summary_text = state.get("summary", "Summary unavailable.")
-     if summary_text is None:
-         summary_text = "Summary unavailable."
-
-     final_report = DemystifyReport(
-         summary=summary_text,
-         key_terms=explained_terms_list,
-         overall_advice="This AI analysis is for informational purposes only. Consult a lawyer for binding advice."
-     )
-     return {"final_report": final_report}
-
- # Compile the analysis graph
- graph_builder = StateGraph(DemystifyState)
- graph_builder.add_node("summarize", summarize_node)
- graph_builder.add_node("identify_terms", identify_terms_node)
- graph_builder.add_node("generate_report", generate_report_node)
- graph_builder.add_edge(START, "summarize")
- graph_builder.add_edge("summarize", "identify_terms")
- graph_builder.add_edge("identify_terms", "generate_report")
- graph_builder.add_edge("generate_report", END)
- demystifier_agent_graph = graph_builder.compile()
-
- # --- 3. Helper Function to Create the RAG Chain ---
- def create_rag_chain(retriever):
-     """Creates the Q&A chain for the interactive chat."""
-     prompt_template = """You are a helpful legal assistant. Answer based on the context only.
-     CONTEXT: {context}
-     QUESTION: {question}
-     ANSWER:"""
-     prompt = PromptTemplate.from_template(prompt_template)
-     rag_chain = ({"context": retriever, "question": RunnablePassthrough()} | prompt | groq_llm | StrOutputParser())
-     return rag_chain
+ # ...

  # --- 4. The Master "Controller" Function ---
  def process_document_for_demystification(file_path: str):
      """Loads a PDF, runs the full analysis, creates a RAG chain, and returns both."""
      print(f"--- Processing document: {file_path} ---")

-     loader = PyMuPDFLoader(file_path)
+     loader = PyPDFLoader(file_path)
      documents = loader.load()

      if not documents:
@@ -187,8 +30,11 @@ def process_document_for_demystification(file_path: str):
      splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
      chunks = splitter.split_documents(documents)

-     print("--- Creating FAISS vector store for Q&A ---")
-     vectorstore = FAISS.from_documents(chunks, embedding=embedding_model)
+     print("--- Creating Simple vector store (NumPy) for Q&A ---")
+     vectorstore = SimpleVectorStore.from_documents(chunks, embedding=embedding_model)
+     # SimpleVectorStore gets as_retriever from the LangChain VectorStore base
+     # class, so it drops in for FAISS here without further changes.
      retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
      rag_chain = create_rag_chain(retriever)
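The commit swaps FAISS for a `core_utils.simple_vectorstore.SimpleVectorStore`, whose implementation is not part of this diff. Below is a plausible minimal sketch of such a store: brute-force cosine similarity over embedded chunks. Everything here is an assumption; the real class presumably subclasses LangChain's `VectorStore` (to inherit `as_retriever`) and mirrors its `from_documents(..., embedding=...)` signature, while this sketch uses plain texts and an embedding callable to stay dependency-free.

```python
import math

class SimpleVectorStore:
    """Hypothetical sketch of the vector store this commit imports.

    Brute-force cosine-similarity search in pure Python; the actual
    core_utils/simple_vectorstore.py (reportedly NumPy-based) is not shown
    in this diff.
    """

    def __init__(self, texts, vectors):
        self.texts = texts
        self.vectors = vectors

    @classmethod
    def from_texts(cls, texts, embed):
        # embed: callable mapping a string to a list of floats
        return cls(list(texts), [embed(t) for t in texts])

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    def similarity_search(self, query, embed, k=3):
        # Score every stored chunk against the query and return the top k texts
        qv = embed(query)
        scored = [(t, self._cosine(qv, v)) for t, v in zip(self.texts, self.vectors)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return [text for text, _ in scored[:k]]
```

For a document-chunk corpus of this app's size, an O(n) scan per query is a reasonable trade for dropping the `faiss-cpu` wheel, which is the apparent motivation given the requirements.txt change below.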
requirements.txt CHANGED
@@ -11,7 +11,8 @@ google-generativeai>=0.8.0
  # Tooling
  tavily-python>=0.4.0
  pypdf>=4.0.0
- faiss-cpu>=1.7.0
+ # faiss-cpu removed
+ # pymupdf removed
  python-multipart>=0.0.6

  # Web Frameworks