# Long-Context Document Semantic Analysis System
This system analyzes long documents to automatically detect duplicated, contradictory, and inconsistent passages. It uses Natural Language Processing (NLP) models: sentence embeddings to find similar passages, and an inference model to check whether those passages conflict.
## Features
- **Duplicate Detection**: Identifies semantically identical or near-identical text segments using SBERT embeddings and FAISS vector search.
- **Contradiction Detection**: Uses a Cross-Encoder Natural Language Inference (NLI) model to flag logically conflicting statements.
- **Holistic Analysis**: Processes multiple documents (PDF, TXT) together, so inconsistencies are found across the entire corpus and not only within a single file.
- **Evidence-Based Reporting**: Generates a downloadable Markdown report with source references and confidence scores.
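The exact report layout is not documented here. As a rough illustration of "evidence-based" output, the sketch below renders flagged chunk pairs as Markdown, with a source reference for each side and a confidence score, sorted from most to least confident. The `Finding` structure, its field names, and the heading format are hypothetical, not the project's actual schema.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One flagged pair of chunks (hypothetical structure, for illustration only)."""
    kind: str        # e.g. "duplicate" or "contradiction"
    source_a: str    # hypothetical reference such as "a.txt, chunk 1"
    source_b: str
    text_a: str
    text_b: str
    score: float     # model confidence in [0, 1]


def render_report(findings: list[Finding]) -> str:
    """Render findings as a Markdown report, highest confidence first."""
    lines = ["# Document Analysis Report", ""]
    if not findings:
        lines.append("No duplicates or contradictions were found.")
        return "\n".join(lines) + "\n"
    ranked = sorted(findings, key=lambda x: x.score, reverse=True)
    for i, f in enumerate(ranked, start=1):
        # One section per finding: type, confidence, then both pieces of evidence.
        lines.append(f"## {i}. {f.kind.capitalize()} (confidence {f.score:.2f})")
        lines.append("")
        lines.append(f"- **{f.source_a}**: {f.text_a}")
        lines.append(f"- **{f.source_b}**: {f.text_b}")
        lines.append("")
    return "\n".join(lines)
```

Keeping both quoted passages next to their sources lets a reader check each flag without reopening the original files.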
## Architecture
1. **Document Processing**: Extracts text from PDFs/TXTs and chunks it into overlapping segments.
2. **Embedding Generation**: `sentence-transformers/all-MiniLM-L6-v2` maps each chunk to a 384-dimensional dense vector.
3. **Similarity Search**: `FAISS` runs nearest-neighbor search over the chunk embeddings to find candidate pairs of similar (potentially duplicate) chunks.
4. **Logical Inference**: `cross-encoder/nli-distilroberta-base` verifies logical relationships (Contradiction/Entailment) between similar chunks.
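Step 1 splits text into overlapping segments. The README does not say whether chunks are measured in characters, tokens, or sentences, or how large they are. The sketch below assumes word-based windows, and its `chunk_text` function with its default `chunk_size` and `overlap` values are hypothetical choices for illustration.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size words; consecutive windows share `overlap` words.

    Word-based chunking and the default sizes are assumptions, not the project's settings.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

For example, `chunk_text("0 1 2 3 4 5 6 7 8 9", chunk_size=4, overlap=2)` yields `["0 1 2 3", "2 3 4 5", "4 5 6 7", "6 7 8 9"]`. Overlap reduces the chance that a statement cut at a chunk boundary is never seen whole by the embedding and NLI models.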
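Steps 2 and 3 find candidate duplicates by comparing embeddings. The sketch below uses cosine similarity: it L2-normalizes each vector and then takes dot products, which is the same score an exact FAISS `IndexFlatIP` returns on normalized vectors. The sketch uses an exhaustive pure-Python loop in place of FAISS, and toy 2-D vectors in place of real model output. The `candidate_pairs` helper and its `0.85` threshold are hypothetical, and the index type the project actually uses is not stated.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale v to unit L2 norm so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def candidate_pairs(vectors: list[list[float]], threshold: float = 0.85):
    """Return (i, j, cosine) for every pair i < j with cosine similarity >= threshold.

    Exhaustive inner-product search over unit vectors; FAISS computes the same
    scores far faster on real embedding matrices.
    """
    units = [normalize(v) for v in vectors]
    pairs = []
    for i in range(len(units)):
        for j in range(i + 1, len(units)):
            sim = sum(a * b for a, b in zip(units[i], units[j]))
            if sim >= threshold:
                pairs.append((i, j, sim))
    # Most similar pairs first, so the strongest duplicate candidates lead.
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

With real data, the vectors would come from the embedding model, and FAISS would return the top-k neighbors per chunk rather than scanning every pair. The threshold trades recall against the number of pairs sent to the slower NLI step.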
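In step 4, an NLI cross-encoder scores each candidate pair with one raw logit per label. The sketch below turns those logits into probabilities with a softmax and flags a contradiction only above a confidence threshold. The label order `(contradiction, entailment, neutral)` is an assumption that should be checked against the model card. The `is_contradiction` helper and its `0.7` threshold are also hypothetical. Hard-coded logits stand in for real model output.

```python
import math

# Assumed label order for cross-encoder/nli-distilroberta-base; verify on the model card.
LABELS = ("contradiction", "entailment", "neutral")


def softmax(logits: list[float]) -> list[float]:
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_contradiction(logits: list[float], threshold: float = 0.7) -> tuple[bool, float]:
    """Return (flagged, p_contradiction) for one sentence pair's three logits."""
    probs = dict(zip(LABELS, softmax(logits)))
    p = probs["contradiction"]
    return p >= threshold, p
```

NLI is directional (premise, then hypothesis), so one possible design is to score both `(a, b)` and `(b, a)` and flag the pair if either direction crosses the threshold. It is not stated whether this project does so.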
## Installation
1. **Create a Virtual Environment** (Recommended):
```bash
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
2. **Install Dependencies**:
```bash
pip install -r requirements.txt
```
*Note: PyTorch installation might take a few minutes.*
## Usage
1. **Start the Application**:
```bash
streamlit run app.py
```
Alternatively, run Streamlit from the virtual environment without activating it:
```bash
./venv/bin/streamlit run app.py
```
2. **Navigate to the UI**:
Open your browser at `http://localhost:8501`.
3. **Analyze**:
- Upload PDF or TXT files via the sidebar.
- Click "Analyze Documents".
- View results on the dashboard and download the report.
## Verification
To verify the core logic without the UI:
```bash
./venv/bin/python verify_backend.py
```
This generates sample contradictory documents and checks if the system flags them correctly.