# Long-Context Document Semantic Analysis System

This system analyzes long documents to detect duplicated content, contradictions, and other inconsistencies using Natural Language Processing (NLP) techniques: sentence embeddings for similarity search and a Natural Language Inference (NLI) model for checking logical consistency.

## Features

- **Duplicate Detection**: Identifies semantically identical or near-identical text segments using SBERT (Sentence-BERT) embeddings and FAISS vector search.
- **Contradiction Detection**: Uses a cross-encoder Natural Language Inference (NLI) model to flag logically conflicting statements.
- **Holistic Analysis**: Processes multiple documents (PDF, TXT) to find inconsistencies across the entire corpus.
- **Evidence-Based Reporting**: Generates a downloadable Markdown report with source references and confidence scores.
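
The confidence scores in the report come from the NLI model's output. As a minimal sketch, assuming the cross-encoder returns one row of three raw logits per sentence pair in the order `contradiction`, `entailment`, `neutral` (the order listed on the `cross-encoder/nli-distilroberta-base` model card), a pair could be turned into a report row as shown below. The `Finding` class, the `classify_pair` and `to_markdown_row` helpers, and the 0.8 threshold are hypothetical illustrations, not this project's actual code.

```python
import math
from dataclasses import dataclass

# Assumed order of the three logits produced by the NLI cross-encoder.
LABELS = ("contradiction", "entailment", "neutral")


@dataclass
class Finding:
    """One row of a (hypothetical) Markdown report."""
    source_a: str      # e.g. "a.txt, chunk 2"
    source_b: str
    label: str         # "contradiction" or "entailment"
    confidence: float  # softmax probability of the winning label


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_pair(logits, source_a, source_b, threshold=0.8):
    """Return a Finding if the top label is non-neutral and confident enough, else None."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = LABELS[best]
    if label == "neutral" or probs[best] < threshold:
        return None
    return Finding(source_a, source_b, label, probs[best])


def to_markdown_row(finding):
    """Render a finding as one row of a Markdown table."""
    return (f"| {finding.source_a} | {finding.source_b} "
            f"| {finding.label} | {finding.confidence:.2f} |")


if __name__ == "__main__":
    f = classify_pair([4.1, -1.3, 0.2], "a.txt, chunk 2", "b.pdf, chunk 7")
    print(to_markdown_row(f))
    # prints: | a.txt, chunk 2 | b.pdf, chunk 7 | contradiction | 0.98 |
```

Dropping neutral and low-confidence pairs keeps the report limited to findings the model is reasonably sure about; the threshold value is a tuning choice, not something this README specifies.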

## Architecture

1. **Document Processing**: Extracts text from PDF and TXT files and splits it into overlapping chunks.
2. **Embedding Generation**: `sentence-transformers/all-MiniLM-L6-v2` maps each chunk to a 384-dimensional dense vector.
3. **Similarity Search**: `FAISS` runs a nearest-neighbour search over the chunk embeddings to find candidate pairs of similar chunks (potential duplicates).
4. **Logical Inference**: `cross-encoder/nli-distilroberta-base` verifies the logical relationship between each pair of similar chunks, classifying it as contradiction, entailment, or neutral.
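
Steps 1 and 3 can be illustrated without any model dependencies. The sketch below is an assumption-based illustration, not this project's code: `chunk_words` splits text into overlapping word windows (the window and overlap sizes are hypothetical defaults), and `candidate_pairs` finds similar chunks by brute-force cosine similarity over toy vectors that stand in for the `all-MiniLM-L6-v2` embeddings. The brute-force loop is a swapped-in substitute for FAISS; with L2-normalised embeddings, FAISS's inner-product index (`IndexFlatIP`) ranks pairs by the same cosine score.

```python
import math
from itertools import combinations


def chunk_words(text, size=50, overlap=10):
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def normalize(vec):
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return list(vec)  # leave zero vectors unchanged to avoid division by zero
    return [x / norm for x in vec]


def candidate_pairs(embeddings, threshold=0.9):
    """Return (i, j, score) for every chunk pair with cosine similarity >= threshold.

    Brute force, O(n^2) pairs; a FAISS index replaces this loop for large corpora.
    """
    unit = [normalize(v) for v in embeddings]
    pairs = []
    for i, j in combinations(range(len(unit)), 2):
        score = sum(a * b for a, b in zip(unit[i], unit[j]))
        if score >= threshold:
            pairs.append((i, j, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


if __name__ == "__main__":
    text = " ".join(f"w{i}" for i in range(12))
    print(chunk_words(text, size=5, overlap=2))
    # Toy 2-D "embeddings": chunks 0 and 1 point in nearly the same direction.
    print(candidate_pairs([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]))
```

The overlap ensures that a sentence cut at one chunk boundary still appears whole in the neighbouring chunk, so it can be matched during the similarity search.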

## Installation

1. **Create a Virtual Environment** (recommended):

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

2. **Install Dependencies**:

   ```bash
   pip install -r requirements.txt
   ```

   *Note: Installing PyTorch may take a few minutes.*

## Usage

1. **Start the Application**:

   ```bash
   streamlit run app.py
   ```

   Or, without activating the virtual environment:

   ```bash
   ./venv/bin/streamlit run app.py
   ```

2. **Navigate to the UI**:
   Open `http://localhost:8501` (Streamlit's default port) in your browser.

3. **Analyze**:
   - Upload PDF or TXT files via the sidebar.
   - Click "Analyze Documents".
   - Review the results on the dashboard and download the report.

## Verification

To verify the core logic without the UI, run:

```bash
./venv/bin/python verify_backend.py
```

The script generates sample documents containing contradictory statements and checks whether the system flags them correctly.