---
title: Semantic Integrity Analysis
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: streamlit
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Semantic Integrity Analysis

A legal document analysis web app with user authentication, document upload, line-level issue detection, and a final narrative summary.

## Current Architecture

- `backend/`: Flask API + SQLite auth + document analysis pipeline
- `frontend/`: Multi-page static UI
- `ui/`: Streamlit path (separate from current web flow)
- `analysis/`: Core analyzer logic

## Active User Flow

1. `index.html` -> Login / Sign up
2. `upload.html` -> Upload up to 2 reference files + final file, then run cross-verification analysis
3. `issues.html` -> Line-level issue analysis (duplication, inconsistency, contradiction)
4. `summary.html` ->
   - Detailed document summary (Page 1, Page 2, ... style)
   - Page-wise summary cards
   - Top findings
5. `dashboard.html` ->
   - Line error table (exact page/line)
   - Reference vs Final mismatch explanation + rectify action

## Features

- Auth endpoints (`register`, `login`) with SQLite
- Upload support: `PDF`, `DOCX`, `TXT`
- Cross verification: optional 1-2 reference documents + required final document
- Detection categories:
  - Duplication
  - Inconsistency
  - Contradiction
- Vendor/Vendee extraction
- Narrative `detailedSummary` + page summaries + line-level dashboard
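
To illustrate the simplest of the detection categories, here is a minimal sketch of exact-duplicate line detection. It is a toy stand-in for the "Duplication" category only; the actual analyzer in `analysis/` is more involved, and the function name and output shape below are illustrative, not the real API.

```python
def find_duplicate_lines(lines):
    """Flag lines whose whitespace-normalized text repeats an earlier line.

    Returns one issue dict per repeated line, pointing back at the first
    occurrence. Hypothetical shape; not the production analyzer's schema.
    """
    seen = {}
    issues = []
    for number, text in enumerate(lines, start=1):
        key = " ".join(text.lower().split())  # normalize case and spacing
        if not key:
            continue  # skip blank lines
        if key in seen:
            issues.append({"line": number, "duplicateOf": seen[key], "category": "Duplication"})
        else:
            seen[key] = number
    return issues
```

Inconsistency and contradiction detection require comparing content across the reference and final documents rather than within one document, so they do not reduce to a lookup like this.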

## Backend Setup

```bash
cd backend
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python3 app.py
```

Backend default: `http://127.0.0.1:5000`

## Frontend Setup

```bash
cd frontend
python3 -m http.server 8080
```

Open: `http://127.0.0.1:8080/index.html`

## API Endpoints

- `GET /api/health`
- `POST /api/register`
- `POST /api/login`
- `POST /api/analyze`

Alias routes also available:

- `GET /health`
- `POST /register`
- `POST /login`
- `POST /analyze`
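
The alias scheme above (every handler reachable both with and without the `/api` prefix) can be sketched as a plain route table. This is an illustration of the mapping only, assuming the endpoint list above; it is not the backend's actual Flask registration code.

```python
# Canonical endpoints, keyed by (method, bare path). Handler names are
# hypothetical labels, not the real function names in backend/app.py.
ENDPOINTS = {
    ("GET", "/health"): "health",
    ("POST", "/register"): "register",
    ("POST", "/login"): "login",
    ("POST", "/analyze"): "analyze",
}

def build_route_table(endpoints):
    """Expose every endpoint under both /api/<path> and the bare alias."""
    table = {}
    for (method, path), handler in endpoints.items():
        table[(method, f"/api{path}")] = handler  # primary route
        table[(method, path)] = handler           # alias route
    return table
```

In Flask the same effect is achieved by attaching two `route` rules to one view function; the table form just makes the one-handler-two-paths relationship explicit.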

## Analyze Response (important keys)

- `summary`
- `pageSummaries`
- `detailedSummary`
- `findings`
- `lineIssues`
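
A consumer of the analyze response might group `lineIssues` by detection category for the dashboard table. The sample payload below is hypothetical, inferred only from the key list above; the real response schema may differ.

```python
import json

# Hypothetical analyze response; field contents are illustrative.
sample = json.loads("""
{
  "summary": "Final deed matches references except clause 4.",
  "pageSummaries": [{"page": 1, "summary": "Parties and recitals."}],
  "detailedSummary": "Page 1: parties identified. Page 2: price clause.",
  "findings": [{"type": "Contradiction", "detail": "Sale price differs."}],
  "lineIssues": [{"page": 2, "line": 14, "category": "Inconsistency"}]
}
""")

def issues_by_category(response):
    """Group line-level issues by category as (page, line) pairs."""
    grouped = {}
    for issue in response.get("lineIssues", []):
        grouped.setdefault(issue["category"], []).append((issue["page"], issue["line"]))
    return grouped
```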

## Deployment (GitHub + Render)

### 1) Push repository

```bash
git add .
git commit -m "Project setup and web flow"
git branch -M main
git remote add origin https://github.com/<your-username>/<your-repo>.git
git push -u origin main
```

### 2) Deploy backend on Render (Web Service)

- Root directory: `backend`
- Build command:

```bash
pip install -r requirements.txt
```

- Start command:

```bash
gunicorn app:app
```

### 3) Deploy frontend (static)

- Option A: Render Static Site (root `frontend`)
- Option B: GitHub Pages for `frontend/`

## Notes

- Current `frontend + backend` flow does **not** require `merged_tinyllama_instruction`.
- Streamlit path under `ui/` may use local TinyLlama model path.
- If analysis output changes are not visible, restart the backend and re-run the upload.