---
title: NLP Intelligence
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
---

# NLP Intelligence: Social Monitoring Web Application

Hexagonal (Ports & Adapters) architecture for Mongolian social media content analysis.
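As a rough illustration of the ports-and-adapters split, the sketch below defines a port as a `typing.Protocol` and plugs in a stand-in adapter. Every name in it (`Entity`, `EntityRecognizer`, `KeywordRecognizer`, `analyze`) is hypothetical and not taken from `nlp_core/`; the real interfaces may look different.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Entity:
    text: str
    label: str


class EntityRecognizer(Protocol):
    """Port: what the domain core needs from any NER backend (hypothetical)."""

    def recognize(self, text: str) -> list[Entity]: ...


class KeywordRecognizer:
    """Adapter: a toy dictionary lookup standing in for the real model."""

    def __init__(self, gazetteer: dict[str, str]) -> None:
        self._gazetteer = gazetteer

    def recognize(self, text: str) -> list[Entity]:
        # Whitespace tokenization only; good enough for a demonstration.
        return [
            Entity(word, self._gazetteer[word])
            for word in text.split()
            if word in self._gazetteer
        ]


def analyze(text: str, recognizer: EntityRecognizer) -> dict:
    """Core logic depends only on the port, never on a concrete adapter."""
    entities = recognizer.recognize(text)
    return {"text": text, "entities": [(e.text, e.label) for e in entities]}


result = analyze(
    "Ulaanbaatar is the capital",
    KeywordRecognizer({"Ulaanbaatar": "LOC"}),
)
```

Swapping `KeywordRecognizer` for a model-backed adapter would leave `analyze` unchanged, which is the point of the pattern.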

## Repository Structure

```
NLP-intelligence/
├── nlp_core/              # Domain core: NER, sentiment, topic modeling, preprocessing (pure Python)
├── adapters/
│   ├── api/               # FastAPI REST adapter (routers, schemas, services)
│   ├── ner_mongolian/     # Fine-tuned NER model config/tokenizer (weights on HF Hub)
│   └── sumbee/            # Future Sumbee.mn integration
├── frontend/              # Next.js dashboard & admin panel
├── Data/                  # Training data & reference datasets (NOT used at runtime)
│   ├── data/              # CoNLL-format training/validation/test files (v1 pipeline)
│   ├── datav2/            # JSONL character-offset training data + scripts (v2 pipeline)
│   └── NER-dataset/       # Reference data (locations.json, abbreviations, names)
├── eval/                  # Model evaluation scripts
├── Dockerfile             # Multi-stage production build
├── nginx.conf             # Reverse proxy config (port 7860)
├── start.sh               # Docker entrypoint
└── requirements.txt
```

**Production code:** `nlp_core/`, `adapters/api/`, `frontend/` (included in the Docker image).

**ML development:** `Data/`, `eval/` (excluded from the Docker image). See [Data/README.md](Data/README.md) for details.

## Model

The NER model is hosted on the Hugging Face Hub as `Nomio4640/ner-mongolian`. It is downloaded automatically during the Docker build, and again at runtime if it is not cached locally. Model weights are NOT stored in git.

To version a new model after training:
```bash
git tag model-v1.0 -m "F1: 0.XX, trained on train_final.conll"
```

## Quick Start

### Local Development

```bash
# Backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cd adapters/api
PYTHONPATH=../../ uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

API docs: http://localhost:8000/docs

```bash
# Frontend
cd frontend
npm install
npm run dev
```

Dashboard: http://localhost:3000

### Docker

```bash
docker build -t nlp-intelligence .
docker run -p 7860:7860 nlp-intelligence
```

App: http://localhost:7860

### Usage

1. Open http://localhost:3000 (local development) or http://localhost:7860 (Docker)
2. Upload a CSV file with a `text` or `Text` column
3. View NER, sentiment, and network analysis results
4. Go to `/admin` to manage the knowledge base, labels, and stopwords
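
For step 2, a minimal sketch of producing a compatible CSV with Python's standard library. The only requirement taken from this README is the `text` column header; the helper name `build_upload_csv` and the sample rows are illustrative.

```python
import csv
import io


def build_upload_csv(texts: list[str]) -> str:
    """Return CSV content with a single `text` column, one row per input."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["text"])  # header the upload step looks for
    for text in texts:
        writer.writerow([text])  # csv handles quoting of commas and quotes
    return buffer.getvalue()


csv_payload = build_upload_csv(["First post", "Second post, with a comma"])
```

Writing `csv_payload` to a file (for example with `open("posts.csv", "w", encoding="utf-8", newline="")`) gives a file ready to upload; UTF-8 is assumed here as the expected encoding.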

## API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| POST | /api/upload | Upload CSV for analysis |
| POST | /api/analyze | Analyze single text |
| POST | /api/analyze/batch | Analyze batch of texts |
| POST | /api/network | Get network graph data |
| POST | /api/insights | Get analysis insights |
| GET/POST | /api/admin/knowledge | Knowledge base CRUD |
| GET/POST | /api/admin/labels | Custom label mapping |
| GET/POST/DELETE | /api/admin/stopwords | Stopword management |
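
A hedged sketch of calling `POST /api/analyze` from Python using only the standard library. The JSON body shape (`{"text": ...}`) is an assumption; check the generated docs at http://localhost:8000/docs for the actual request schema. The base URL is the local development backend from Quick Start.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local dev backend; use port 7860 for Docker


def build_analyze_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/analyze."""
    # Assumption: the endpoint accepts a JSON object with a "text" field.
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_analyze_request("Sample text")

# To send it against a running backend:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```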