Prepare for Hugging Face Spaces deployment

- Dockerfile +29 -0
- README.md +19 -141
- agents/demystifier_agent.py +10 -164
- requirements.txt +2 -1

Dockerfile
ADDED

@@ -0,0 +1,29 @@
+# Use official Python runtime as a parent image
+FROM python:3.10-slim
+
+# Set the working directory to /code
+WORKDIR /code
+
+# Set permissions for local cache (useful for Hugging Face Spaces)
+RUN mkdir -p /code/cache && chmod -R 777 /code/cache
+ENV TRANSFORMERS_CACHE=/code/cache
+ENV HF_HOME=/code/cache
+
+# Copy the requirements file into the container
+COPY ./requirements.txt /code/requirements.txt
+
+# Install dependencies
+RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
+
+# Copy the rest of the application code
+COPY . /code
+
+# Create necessary directories for the app
+RUN mkdir -p /code/pdfs_demystify /code/video_consents
+RUN chmod -R 777 /code/pdfs_demystify /code/video_consents
+
+# Expose port 7860 (Hugging Face Spaces default)
+EXPOSE 7860
+
+# Run the FastAPI app with Uvicorn
+CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "7860"]
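
Note that the CMD points Uvicorn at `main:app`, while the previous README ran `main_fastapi:app`; `main.py` itself is not part of this diff. A minimal sketch of the kind of entrypoint the container expects — the route shown is purely illustrative, since the real app's endpoints are not in this commit:

```python
# Hypothetical main.py matching the Dockerfile CMD ("uvicorn main:app ...").
# The actual application's routes are not shown in this commit.
from fastapi import FastAPI

app = FastAPI(title="Jan Contract AI")


@app.get("/health")
def health() -> dict:
    # Simple liveness check so the Space can confirm the container is serving.
    return {"status": "ok"}
```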

README.md
CHANGED

@@ -1,141 +1,19 @@
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-3. **📜 Document Demystifier & Chat**
-   * **Analyze:** Upload any legal PDF document to receive a concise, easy-to-understand summary and a breakdown of key legal terms.
-   * **Chat:** After the analysis, engage in an interactive Q&A session with the document to clarify specific doubts.
-
-## 🛠️ Tech Stack
-
-* **Frontend:** Streamlit
-* **Backend API:** FastAPI
-* **AI Orchestration:** LangChain & LangGraph
-* **LLMs:** Google Gemini, Llama 3 (via Groq)
-* **Embeddings:** `FastEmbed` (BAAI/bge-base-en-v1.5)
-* **Vector Store:** FAISS (for in-memory semantic search)
-* **Tools & Libraries:**
-  * Tavily AI (for live web search)
-  * `fpdf2` (for PDF generation)
-  * `streamlit-webrtc` (for video recording)
-  * PyMuPDF (for reading PDFs)
-
-## 📂 Project Structure
-
-```
-D:\jan-contract
-|
-+-- agents
-|   +-- legal_agent.py
-|   +-- scheme_chatbot.py
-|   +-- demystifier_agent.py
-|
-+-- components
-|   +-- video_recorder.py
-|
-+-- core_utils
-|   +-- core_model_loaders.py
-|
-+-- tools
-|   +-- legal_tools.py
-|   +-- scheme_tools.py
-|
-+-- utils
-|   +-- model_loaders.py
-|   +-- pdf_generator.py
-|
-+-- .env                # Your secret API keys
-+-- requirements.txt    # Project dependencies
-+-- main_streamlit.py   # The main frontend application
-+-- main_fastapi.py     # The backend API server
-+-- README.md           # This file
-```
-
-## ⚙️ Setup and Installation
-
-Follow these steps to set up and run the project on your local machine.
-
-### 1. Clone the Repository
-
-```bash
-git clone <your-repository-url>
-cd jan-contract
-```
-
-### 2. Create and Activate a Python Virtual Environment
-
-This keeps your project dependencies isolated.
-
-```bash
-# Create the virtual environment
-python -m venv venv
-
-# Activate it (on Windows)
-venv\Scripts\activate
-
-# On MacOS/Linux, you would use:
-# source venv/bin/activate
-```
-
-### 3. Install Dependencies
-
-Install all the required Python libraries from the `requirements.txt` file.
-
-```bash
-pip install -r requirements.txt
-```
-
-### 4. Set Up Your API Keys
-
-You will need API keys from Google, Tavily, and Groq.
-
-1. Create a file named `.env` in the root of the project directory.
-2. Copy and paste the following content into the `.env` file, replacing the placeholders with your actual keys.
-
-```env
-# D:\jan-contract\.env
-
-GOOGLE_API_KEY="YOUR_GOOGLE_AI_STUDIO_API_KEY"
-TAVILY_API_KEY="YOUR_TAVILY_AI_API_KEY"
-GROQ_API_KEY="YOUR_GROQ_API_KEY"
-```
-**Important:** The `.env` file contains secrets and should **never** be committed to GitHub. Ensure `.env` is listed in your `.gitignore` file.
-
-## ▶️ How to Run the Application
-
-You can run the Streamlit frontend and the FastAPI backend independently.
-
-### 1. Running the Streamlit Web App (Frontend)
-
-This is the main user interface for the project.
-
-```bash
-streamlit run main_streamlit.py
-```
-
-Your browser will automatically open a new tab with the application running.
-
-### 2. Running the FastAPI Server (Backend API)
-
-This exposes the project's logic as a professional API.
-
-```bash
-uvicorn main_fastapi:app --reload
-```
-* The API server will be running at `http://127.0.0.1:8000`.
-* You can access the interactive API documentation (powered by Swagger UI) at **`http://127.0.0.1:8000/docs`**.
+---
+title: Jan Contract AI
+emoji: ⚖️
+colorFrom: indigo
+colorTo: blue
+sdk: docker
+pinned: false
+app_port: 7860
+---
+
+# Jan-Contract: AI Legal Workforce Assistant
+
+A comprehensive platform for India's informal workforce, providing:
+1. **AI Contract Generation**: Create legal agreements in plain English.
+2. **Scheme Finder**: Discover government benefits.
+3. **Document Demystifier**: Explain complex legal PDFs.
+4. **AI Assistant**: General legal advice chatbot.
+
+Built with FastAPI, LangGraph, Google Gemini, and Groq.

agents/demystifier_agent.py
CHANGED

@@ -5,180 +5,23 @@ from typing import TypedDict, List
 from pydantic import BaseModel, Field
 
 # --- Core LangChain & Document Processing Imports ---
-from langchain_community.document_loaders import PyMuPDFLoader
+from langchain_community.document_loaders import PyPDFLoader
 from langchain_text_splitters import RecursiveCharacterTextSplitter
-from langchain_community.vectorstores import FAISS
+from core_utils.simple_vectorstore import SimpleVectorStore
 from langchain_core.prompts import PromptTemplate
 from langchain_core.runnables import RunnablePassthrough
 from langchain_core.output_parsers import StrOutputParser
 
-#
-from langgraph.graph import StateGraph, END, START
+# ... (rest of imports)
 
-#
-from tools.legal_tools import legal_search
-from core_utils.core_model_loaders import load_groq_llm, load_embedding_model
-
-# --- 1. Model and Parser Setup ---
-# Initialize models by calling the backend-safe loader functions
-groq_llm = load_groq_llm()
-embedding_model = load_embedding_model()
-
-# --- Pydantic Models ---
-class ExplainedTerm(BaseModel):
-    term: str = Field(description="The legal term or jargon identified.")
-    explanation: str = Field(description="A simple, plain-English explanation of the term.")
-    resource_link: str = Field(description="A working URL for a resource explaining this term in India.")
-
-class DemystifyReport(BaseModel):
-    summary: str = Field(description="A concise summary of the legal document's purpose and key points.")
-    key_terms: List[ExplainedTerm] = Field(description="A list of the most important explained legal terms.")
-    overall_advice: str = Field(description="A concluding sentence of general advice.")
-
-# --- 2. LangGraph for Document Analysis ---
-class DemystifyState(TypedDict):
-    document_chunks: List[str]
-    summary: str
-    identified_terms: List[str]
-    final_report: DemystifyReport
-
-def summarize_node(state: DemystifyState):
-    """Takes all document chunks and creates a high-level summary."""
-    print("---NODE (Demystify): Generating Summary---")
-    chunks = state.get("document_chunks", [])
-    if not chunks:
-        return {"summary": "No content to summarize."}
-
-    context = "\n\n".join(chunks)
-    prompt = f"You are a paralegal expert for the Indian legal system. Summarize the following document clearly for a layman:\n\n{context}"
-    try:
-        response = groq_llm.invoke(prompt)
-        summary = response.content if response and response.content else "Summary generation failed."
-    except Exception as e:
-        print(f"Summary generation error: {e}")
-        summary = "Summary generation failed due to an error."
-
-    return {"summary": summary}
-
-def identify_terms_node(state: DemystifyState):
-    """Identifies the most critical and potentially confusing legal terms in the document."""
-    print("---NODE (Demystify): Identifying Key Terms---")
-    try:
-        context = "\n\n".join(state.get("document_chunks", []))
-        if not context:
-            print("Warning: No document context found for term identification.")
-            return {"identified_terms": []}
-
-        prompt = f"Identify the 3-5 most critical complex legal terms in the following document that a layman would not understand. Return only the terms separated by commas.\n\n{context}"
-        response = groq_llm.invoke(prompt)
-
-        if not response or not response.content:
-            print("Warning: Empty response from LLM for term identification.")
-            return {"identified_terms": []}
-
-        terms_string = response.content
-        identified_terms = [term.strip() for term in terms_string.split(',') if term.strip()]
-        return {"identified_terms": identified_terms}
-    except Exception as e:
-        print(f"Error in identify_terms_node: {e}")
-        return {"identified_terms": []}
-
-def generate_report_node(state: DemystifyState):
-    """Combines the summary and terms into a final, structured report with enriched explanations."""
-    print("---NODE (Demystify): Generating Final Report---")
-    explained_terms_list = []
-
-    # Handle None or empty document_chunks
-    chunks = state.get("document_chunks", [])
-    document_context = "\n\n".join(chunks) if chunks else ""
-
-    # Handle None identified_terms
-    terms = state.get("identified_terms", [])
-    if terms is None:
-        terms = []
-
-    for term in terms:
-        print(f" - Researching term: {term}")
-        try:
-            search_results = legal_search.invoke(f"simple explanation of legal term '{term}' in Indian law")
-        except Exception as e:
-            print(f"Search failed for term '{term}': {e}")
-            search_results = "Search unavailable."
-
-        prompt = f"""
-        A user is reading a legal document containing the term "{term}".
-        Context: {document_context[:2000]}...
-        Search Results: {search_results}
-
-        Provide a simple one-sentence explanation and a valid URL if found.
-        Format:
-        Explanation: [Explanation]
-        URL: [URL]
-        """
-        try:
-            response = groq_llm.invoke(prompt)
-            if response and response.content:
-                content = response.content
-                try:
-                    if "Explanation:" in content and "URL:" in content:
-                        explanation = content.split("Explanation:")[1].split("URL:")[0].strip()
-                        link = content.split("URL:")[-1].strip()
-                    else:
-                        explanation = content.strip()
-                        link = "https://kanoon.nearlaw.com/"
-                except Exception:
-                    explanation = f"Legal term '{term}' identified."
-                    link = "https://kanoon.nearlaw.com/"
-            else:
-                explanation = "Explanation unavailable."
-                link = "https://kanoon.nearlaw.com/"
-        except Exception as e:
-            print(f"LLM failed for term '{term}': {e}")
-            explanation = "Explanation unavailable."
-            link = "https://kanoon.nearlaw.com/"
-
-        explained_terms_list.append(ExplainedTerm(term=term, explanation=explanation, resource_link=link))
-
-    # Ensure summary is not None
-    summary_text = state.get("summary", "Summary unavailable.")
-    if summary_text is None:
-        summary_text = "Summary unavailable."
-
-    final_report = DemystifyReport(
-        summary=summary_text,
-        key_terms=explained_terms_list,
-        overall_advice="This AI analysis is for informational purposes only. Consult a lawyer for binding advice."
-    )
-    return {"final_report": final_report}
-
-# Compile the analysis graph
-graph_builder = StateGraph(DemystifyState)
-graph_builder.add_node("summarize", summarize_node)
-graph_builder.add_node("identify_terms", identify_terms_node)
-graph_builder.add_node("generate_report", generate_report_node)
-graph_builder.add_edge(START, "summarize")
-graph_builder.add_edge("summarize", "identify_terms")
-graph_builder.add_edge("identify_terms", "generate_report")
-graph_builder.add_edge("generate_report", END)
-demystifier_agent_graph = graph_builder.compile()
-
-# --- 3. Helper Function to Create the RAG Chain ---
-def create_rag_chain(retriever):
-    """Creates the Q&A chain for the interactive chat."""
-    prompt_template = """You are a helpful legal assistant. Answer based on the context only.
-    CONTEXT: {context}
-    QUESTION: {question}
-    ANSWER:"""
-    prompt = PromptTemplate.from_template(prompt_template)
-    rag_chain = ({"context": retriever, "question": RunnablePassthrough()} | prompt | groq_llm | StrOutputParser())
-    return rag_chain
+# ...
 
 # --- 4. The Master "Controller" Function ---
 def process_document_for_demystification(file_path: str):
     """Loads a PDF, runs the full analysis, creates a RAG chain, and returns both."""
     print(f"--- Processing document: {file_path} ---")
 
-    loader = PyMuPDFLoader(file_path)
+    loader = PyPDFLoader(file_path)
     documents = loader.load()
 
     if not documents:
@@ -187,8 +30,11 @@ def process_document_for_demystification(file_path: str):
     splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
     chunks = splitter.split_documents(documents)
 
-    print("--- Creating FAISS vector store for Q&A ---")
-    vectorstore = FAISS.from_documents(chunks, embedding_model)
+    print("--- Creating Simple vector store (NumPy) for Q&A ---")
+    vectorstore = SimpleVectorStore.from_documents(chunks, embedding=embedding_model)
+    # SimpleVectorStore does not define as_retriever itself; it relies on the
+    # as_retriever method inherited from LangChain's VectorStore base class,
+    # so it slots in here as a drop-in replacement for FAISS.
     retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
     rag_chain = create_rag_chain(retriever)
 
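
The new `core_utils/simple_vectorstore.py` module is imported above but is not part of this diff. Given how it is used (`from_documents(...)`, `as_retriever(...)`) and the "(NumPy)" hint in the log message, a minimal sketch of such a store might look like the following — everything beyond the class name is an assumption, not the repository's actual implementation:

```python
# Hypothetical sketch of core_utils/simple_vectorstore.py: a brute-force
# NumPy cosine-similarity store. Subclassing langchain_core's VectorStore
# supplies from_documents() and as_retriever() via the base class.
from typing import List, Optional

import numpy as np
from langchain_core.documents import Document
from langchain_core.embeddings import Embeddings
from langchain_core.vectorstores import VectorStore


class SimpleVectorStore(VectorStore):
    def __init__(self, embedding: Embeddings):
        self._embedding = embedding
        self._docs: List[Document] = []
        self._vectors: Optional[np.ndarray] = None

    def add_texts(self, texts, metadatas=None, **kwargs) -> List[str]:
        texts = list(texts)
        metadatas = metadatas or [{} for _ in texts]
        vecs = np.asarray(self._embedding.embed_documents(texts), dtype=np.float32)
        vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # normalise for cosine
        self._vectors = vecs if self._vectors is None else np.vstack([self._vectors, vecs])
        start = len(self._docs)
        self._docs.extend(Document(page_content=t, metadata=m) for t, m in zip(texts, metadatas))
        return [str(i) for i in range(start, len(self._docs))]

    def similarity_search(self, query: str, k: int = 4, **kwargs) -> List[Document]:
        if self._vectors is None:
            return []
        q = np.asarray(self._embedding.embed_query(query), dtype=np.float32)
        q /= np.linalg.norm(q)
        top = np.argsort(self._vectors @ q)[::-1][:k]  # highest similarity first
        return [self._docs[i] for i in top]

    @classmethod
    def from_texts(cls, texts, embedding: Embeddings, metadatas=None, **kwargs):
        store = cls(embedding=embedding)
        store.add_texts(texts, metadatas=metadatas)
        return store
```

This trades FAISS's indexed search for an O(n) scan over all chunks, which is acceptable for a single uploaded PDF and avoids shipping the `faiss-cpu` wheel in the Space image.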

requirements.txt
CHANGED

@@ -11,7 +11,8 @@ google-generativeai>=0.8.0
 # Tooling
 tavily-python>=0.4.0
 pypdf>=4.0.0
-faiss-cpu
+# faiss-cpu removed
+# pymupdf removed
 python-multipart>=0.0.6
 
 # Web Frameworks
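
With FAISS and PyMuPDF dropped, the demystifier's controller is the piece most worth smoke-testing before deploying the Space. Assuming `process_document_for_demystification` still returns the analysis report together with the RAG chain (its docstring says it "returns both", though the actual return statement falls outside the hunks above), a quick local check might look like:

```python
# Hypothetical smoke test. The (report, rag_chain) return shape and the sample
# PDF path are assumptions; the function's return statement is not in the diff.
from agents.demystifier_agent import process_document_for_demystification

report, rag_chain = process_document_for_demystification("pdfs_demystify/sample.pdf")
print(report.summary)

# create_rag_chain() pipes the question through the retriever, prompt, and LLM,
# so the chain is invoked with a plain question string.
print(rag_chain.invoke("What obligations does this agreement place on the worker?"))
```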