check
README.md
CHANGED
|
@@ -1,180 +1,157 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
-
|
| 4 |
|
| 5 |
-
|
| 6 |
|
| 7 |
-
|
| 8 |
|
| 9 |
-
|
| 10 |
|
| 11 |
-
|
| 12 |
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
```plaintext
|
| 16 |
.
|
| 17 |
-
├── Dockerfile #
|
| 18 |
-
├── README.md
|
| 19 |
-
├── fly.toml
|
| 20 |
-
├── main.py
|
| 21 |
-
├── models
|
| 22 |
-
│ ├── XLM-RoBERTa.ipynb
|
| 23 |
-
│ ├── mBERT.ipynb
|
| 24 |
-
│ ├── push_to_HF.py
|
| 25 |
-
│ └── train
|
| 26 |
-
├── requirements.txt
|
| 27 |
-
├── static
|
| 28 |
-
│ ├── app.js
|
| 29 |
-
│ └── style.css
|
| 30 |
-
└── templates
|
| 31 |
-
└── index.html
|
| 32 |
```
|
| 33 |
|
| 34 |
-
##
|
| 35 |
-
|
| 36 |
-
- **Dataset**: [Azerbaijani NER Dataset](https://huggingface.co/datasets/LocalDoc/azerbaijani-ner-dataset)
|
| 37 |
-
- **mBERT Model**: [mBERT Azerbaijani NER](https://huggingface.co/IsmatS/mbert-az-ner)
|
| 38 |
-
- **XLM-RoBERTa Model**: [XLM-RoBERTa Azerbaijani NER](https://huggingface.co/IsmatS/xlm-roberta-az-ner)
|
| 39 |
-
- **XLM-RoBERTa Large Model**: [XLM-RoBERTa Large Azerbaijani NER](https://huggingface.co/IsmatS/xlm_roberta_large_az_ner)
|
| 40 |
-
- **Azeri-Turkish-BERT-NER**: [Azerbaijani-Turkish BERT Base NER](https://huggingface.co/IsmatS/azeri-turkish-bert-ner)
|
| 41 |
|
|
|
|
| 42 |
|
| 43 |
-
|
|
|
|
|
|
|
|
|
|
| 44 |
|
| 45 |
-
|
|
|
|
| 46 |
|
| 47 |
-
|
| 48 |
|
| 49 |
-
##
|
| 50 |
|
| 51 |
-
|
| 52 |
-
|-------|---------------|----------------|-----------|----------|----------|----------|
|
| 53 |
-
| 1 | 0.295200 | 0.265711 | 0.715424 | 0.622853 | 0.665937 | 0.919136 |
|
| 54 |
-
| 2 | 0.248600 | 0.252083 | 0.721036 | 0.637979 | 0.676970 | 0.921439 |
|
| 55 |
-
| 3 | 0.206800 | 0.253372 | 0.704872 | 0.650684 | 0.676695 | 0.920898 |
|
| 56 |
|
| 57 |
-
|
| 58 |
|
| 59 |
-
|
| 60 |
-
|-------|---------------|----------------|-----------|----------|----------|
|
| 61 |
-
| 1 | 0.323100 | 0.275503 | 0.775799 | 0.694886 | 0.733117 |
|
| 62 |
-
| 2 | 0.272500 | 0.262481 | 0.739266 | 0.739900 | 0.739583 |
|
| 63 |
-
| 3 | 0.248600 | 0.252498 | 0.751478 | 0.741152 | 0.746280 |
|
| 64 |
-
| 4 | 0.236800 | 0.249968 | 0.754882 | 0.741449 | 0.748105 |
|
| 65 |
-
| 5 | 0.223800 | 0.252187 | 0.764390 | 0.740460 | 0.752235 |
|
| 66 |
-
| 6 | 0.218600 | 0.249887 | 0.756352 | 0.741646 | 0.748927 |
|
| 67 |
-
| 7 | 0.209700 | 0.250748 | 0.760696 | 0.739438 | 0.749916 |
|
| 68 |
|
| 69 |
-
|
| 70 |
|
| 71 |
-
|
| 72 |
-
|-------|---------------|----------------|-----------|----------|----------|
|
| 73 |
-
| 1 | 0.407500 | 0.253823 | 0.768923 | 0.721350 | 0.744377 |
|
| 74 |
-
| 2 | 0.255600 | 0.249694 | 0.783549 | 0.724464 | 0.752849 |
|
| 75 |
-
| 3 | 0.214400 | 0.248773 | 0.750857 | 0.748900 | 0.749877 |
|
| 76 |
-
| 4 | 0.193400 | 0.257051 | 0.768623 | 0.740371 | 0.754232 |
|
| 77 |
-
| 5 | 0.169800 | 0.275679 | 0.745789 | 0.753740 | 0.749743 |
|
| 78 |
-
| 6 | 0.152600 | 0.288074 | 0.783131 | 0.728423 | 0.754787 |
|
| 79 |
-
| 7 | 0.144300 | 0.303378 | 0.758504 | 0.738069 | 0.748147 |
|
| 80 |
-
| 8 | 0.126800 | 0.311300 | 0.745589 | 0.750863 | 0.748217 |
|
| 81 |
-
| 9 | 0.119400 | 0.331631 | 0.739316 | 0.749475 | 0.744361 |
|
| 82 |
-
| 10 | 0.109400 | 0.344823 | 0.754268 | 0.737189 | 0.745631 |
|
| 83 |
-
| 11 | 0.102900 | 0.354887 | 0.751948 | 0.741285 | 0.746578 |
|
| 84 |
|
| 85 |
|
| 86 |
-
### Azeri-Turkish-BERT
|
| 87 |
|
| 88 |
-
| Epoch | Training Loss | Validation Loss | Precision | Recall | F1
|
| 89 |
-
|-------|---------------|-----------------|-----------|--------|-------|
|
| 90 |
-
| 1
|
| 91 |
-
|
|
| 92 |
-
|
|
| 93 |
-
|
|
| 94 |
-
| 5 | 0.214800 | 0.278477 | 0.756051 | 0.710996 | 0.732832 |
|
| 95 |
-
| 6 | 0.199200 | 0.286102 | 0.755068 | 0.717012 | 0.735548 |
|
| 96 |
-
| 7 | 0.192800 | 0.297157 | 0.742326 | 0.725802 | 0.733971 |
|
| 97 |
-
| 8 | 0.178900 | 0.304510 | 0.743206 | 0.723930 | 0.733442 |
|
| 98 |
-
| 9 | 0.171700 | 0.313845 | 0.743145 | 0.725535 | 0.734234 |
|
| 99 |
|
|
|
|
| 100 |
|
| 101 |
-
##
|
| 102 |
|
| 103 |
-
1. **Clone the repository**
|
| 104 |
-
|
| 105 |
-
|
| 106 |
-
|
| 107 |
-
|
| 108 |
-
|
| 109 |
-
2. **Create and activate a virtual environment**:
|
| 110 |
-
```bash
|
| 111 |
-
python3 -m venv .venv
|
| 112 |
-
source .venv/bin/activate
|
| 113 |
-
|
| 114 |
-
# On Windows use: .venv\Scripts\activate
|
| 115 |
-
```
|
| 116 |
|
| 117 |
-
|
| 118 |
-
|
| 119 |
-
|
| 120 |
-
|
| 121 |
|
| 122 |
-
|
| 123 |
-
|
| 124 |
-
|
| 125 |
-
|
|
|
|
| 126 |
|
| 127 |
-
|
| 128 |
-
|
|
|
|
| 129 |
|
| 130 |
-
|
|
|
|
|
|
|
|
|
|
| 131 |
|
| 132 |
-
|
| 133 |
|
| 134 |
-
|
| 135 |
-
If you haven't already, install the Fly.io CLI:
|
| 136 |
```bash
|
|
|
|
| 137 |
curl -L https://fly.io/install.sh | sh
|
| 138 |
```
|
| 139 |
|
| 140 |
-
|
| 141 |
-
Log in to your Fly.io account:
|
| 142 |
```bash
|
|
|
|
| 143 |
fly auth login
|
| 144 |
-
```
|
| 145 |
|
| 146 |
-
#
|
| 147 |
-
Run the following command in the root directory of your project:
|
| 148 |
-
```bash
|
| 149 |
fly launch
|
| 150 |
-
```
|
| 151 |
-
During the launch process:
|
| 152 |
-
- Fly will ask you for a unique app name.
|
| 153 |
-
- It will detect your `Dockerfile` automatically.
|
| 154 |
-
- Accept default region recommendations or specify your preferred region.
|
| 155 |
|
| 156 |
-
#
|
| 157 |
-
Increase memory allocation for running the model. For example, to set the memory to 2 GB:
|
| 158 |
-
```bash
|
| 159 |
fly scale memory 2048
|
| 160 |
```
|
| 161 |
|
| 162 |
-
|
| 163 |
-
Once configured, deploy the app with:
|
| 164 |
```bash
|
| 165 |
fly deploy
|
| 166 |
-
```
|
| 167 |
|
| 168 |
-
#
|
| 169 |
-
To check logs and ensure the app is running correctly:
|
| 170 |
-
```bash
|
| 171 |
fly logs
|
| 172 |
```
|
| 173 |
|
| 174 |
-
Access your deployed app at the Fly.io-provided URL (e.g., `https://your-app-name.fly.dev`).
|
| 175 |
-
|
| 176 |
## Usage
|
| 177 |
|
| 178 |
-
|
| 179 |
|
| 180 |
-
This
|
|
|
|
| 1 |
+
# Named Entity Recognition for Azerbaijani Language
|
| 2 |
|
| 3 |
+
This project provides Named Entity Recognition (NER) models fine-tuned for the Azerbaijani language, a FastAPI application that serves them, and a web frontend for testing and visualizing NER results.
|
| 4 |
|
| 5 |
+
## Demo
|
| 6 |
|
| 7 |
+
Try the live demo: [Named Entity Recognition Demo](https://named-entity-recognition.fly.dev/)
|
| 8 |
|
| 9 |
+
**Note:** The server runs on a free tier and may take 1-2 minutes to start after a period of inactivity.
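Because of this cold start, a script that calls the demo may want to wait for the server before sending requests. The following is a minimal sketch using only the Python standard library. The names `http_status` and `wait_until_ready` and the timing defaults are hypothetical, and the sketch assumes the root URL answers with HTTP 200 once the app is up.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (for example 502 during startup).
        return err.code
    except OSError:
        # Covers URLError, connection errors, and timeouts.
        return None


def wait_until_ready(url, max_wait=180.0, interval=5.0,
                     fetch=http_status, sleep=time.sleep):
    """Poll url until it returns 200 or max_wait seconds have passed."""
    deadline = time.monotonic() + max_wait
    while True:
        if fetch(url) == 200:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Calling `wait_until_ready("https://named-entity-recognition.fly.dev/")` would block for up to three minutes with these defaults and return `True` as soon as the page loads. The `fetch` and `sleep` parameters exist only so the loop can be exercised without network access.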
|
| 10 |
|
| 11 |
+
## Project Structure
|
| 12 |
|
| 13 |
+
```
|
|
|
|
|
|
|
| 14 |
.
|
| 15 |
+
├── Dockerfile # Docker image configuration
|
| 16 |
+
├── README.md # Project documentation
|
| 17 |
+
├── fly.toml # Fly.io deployment configuration
|
| 18 |
+
├── main.py # FastAPI application entry point
|
| 19 |
+
├── models/ # Model-related files
|
| 20 |
+
│ ├── XLM-RoBERTa.ipynb # XLM-RoBERTa training notebook
|
| 21 |
+
│ ├── mBERT.ipynb # mBERT training notebook
|
| 22 |
+
│ ├── push_to_HF.py # Hugging Face upload script
|
| 23 |
+
│ └── train.parquet # Training data
|
| 24 |
+
├── requirements.txt # Python dependencies
|
| 25 |
+
├── static/ # Frontend assets
|
| 26 |
+
│ ├── app.js # Frontend logic
|
| 27 |
+
│ └── style.css # UI styling
|
| 28 |
+
└── templates/ # HTML templates
|
| 29 |
+
└── index.html # Main UI template
|
| 30 |
```
|
| 31 |
|
| 32 |
+
## Models & Dataset
|
| 33 |
|
| 34 |
+
### Available Models
|
| 35 |
|
| 36 |
+
- [mBERT Azerbaijani NER](https://huggingface.co/IsmatS/mbert-az-ner)
|
| 37 |
+
- [XLM-RoBERTa Azerbaijani NER](https://huggingface.co/IsmatS/xlm-roberta-az-ner)
|
| 38 |
+
- [XLM-RoBERTa Large Azerbaijani NER](https://huggingface.co/IsmatS/xlm_roberta_large_az_ner)
|
| 39 |
+
- [Azerbaijani-Turkish BERT Base NER](https://huggingface.co/IsmatS/azeri-turkish-bert-ner)
|
| 40 |
|
| 41 |
+
### Dataset
|
| 42 |
+
- [Azerbaijani NER Dataset](https://huggingface.co/datasets/LocalDoc/azerbaijani-ner-dataset)
|
| 43 |
|
| 44 |
+
**Note:** All models were fine-tuned on an A100 GPU using Google Colab Pro+. The XLM-RoBERTa base model is currently deployed in production.
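Since the models are hosted on the Hugging Face Hub, they can presumably be loaded with the `transformers` token-classification pipeline. The sketch below assumes that the published checkpoints work with the standard pipeline and that `transformers` and a backend such as PyTorch are installed. The helper names are illustrative and are not part of this repository.

```python
from collections import defaultdict

# Model ID taken from the list above.
MODEL_ID = "IsmatS/xlm-roberta-az-ner"


def load_ner_pipeline(model_id=MODEL_ID):
    """Build a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so group_entities can be used without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges sub-word tokens into whole entities.
    return pipeline("token-classification", model=model_id,
                    aggregation_strategy="simple")


def group_entities(predictions):
    """Group aggregated pipeline output by entity label.

    Each prediction is a dict with at least the keys "entity_group" and "word".
    """
    grouped = defaultdict(list)
    for item in predictions:
        grouped[item["entity_group"]].append(item["word"])
    return dict(grouped)
```

A typical call would be `group_entities(load_ner_pipeline()("Bakı Azərbaycanın paytaxtıdır."))`. The label names in the result depend on the model's configuration, so check the model card for the actual tag set.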
|
| 45 |
|
| 46 |
+
## Model Performance
|
| 47 |
|
| 48 |
+
### mBERT Performance
|
| 49 |
|
| 50 |
+
| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
|
| 51 |
+
|-------|---------------|-----------------|-----------|---------|-------|-----------|
|
| 52 |
+
| 1 | 0.2952 | 0.2657 | 0.7154 | 0.6229 | 0.6659 | 0.9191 |
|
| 53 |
+
| 2 | 0.2486 | 0.2521 | 0.7210 | 0.6380 | 0.6770 | 0.9214 |
|
| 54 |
+
| 3 | 0.2068 | 0.2534 | 0.7049 | 0.6507 | 0.6767 | 0.9209 |
|
| 55 |
|
| 56 |
+
### XLM-RoBERTa Base Performance
|
| 57 |
|
| 58 |
+
| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 |
|
| 59 |
+
|-------|---------------|-----------------|-----------|---------|-------|
|
| 60 |
+
| 1 | 0.3231 | 0.2755 | 0.7758 | 0.6949 | 0.7331 |
|
| 61 |
+
| 3 | 0.2486 | 0.2525 | 0.7515 | 0.7412 | 0.7463 |
|
| 62 |
+
| 5 | 0.2238 | 0.2522 | 0.7644 | 0.7405 | 0.7522 |
|
| 63 |
+
| 7 | 0.2097 | 0.2507 | 0.7607 | 0.7394 | 0.7499 |
|
| 64 |
|
| 65 |
+
### XLM-RoBERTa Large Performance
|
| 66 |
|
| 67 |
+
| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 |
|
| 68 |
+
|-------|---------------|-----------------|-----------|---------|-------|
|
| 69 |
+
| 1 | 0.4075 | 0.2538 | 0.7689 | 0.7214 | 0.7444 |
|
| 70 |
+
| 3 | 0.2144 | 0.2488 | 0.7509 | 0.7489 | 0.7499 |
|
| 71 |
+
| 6 | 0.1526 | 0.2881 | 0.7831 | 0.7284 | 0.7548 |
|
| 72 |
+
| 9 | 0.1194 | 0.3316 | 0.7393 | 0.7495 | 0.7444 |
|
| 73 |
|
| 74 |
+
### Azeri-Turkish-BERT Performance
|
| 75 |
|
| 76 |
+
| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 |
|
| 77 |
+
|-------|---------------|-----------------|-----------|---------|-------|
|
| 78 |
+
| 1 | 0.4331 | 0.3067 | 0.7390 | 0.6933 | 0.7154 |
|
| 79 |
+
| 3 | 0.2506 | 0.2751 | 0.7583 | 0.7094 | 0.7330 |
|
| 80 |
+
| 6 | 0.1992 | 0.2861 | 0.7551 | 0.7170 | 0.7355 |
|
| 81 |
+
| 9 | 0.1717 | 0.3138 | 0.7431 | 0.7255 | 0.7342 |
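The F1 values in these tables are consistent with the harmonic mean of precision and recall, `F1 = 2 * P * R / (P + R)`. As a quick sanity check, the sketch below recomputes the epoch 9 row of the Azeri-Turkish-BERT table. The function name `f1_score` is illustrative and is not taken from the training notebooks.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Epoch 9 of the Azeri-Turkish-BERT table: precision 0.7431, recall 0.7255.
f1 = f1_score(0.7431, 0.7255)
print(round(f1, 4))  # 0.7342
```

The same check can be applied to any other row. Small differences in the last digit are expected because the table values are already rounded.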
|
| 82 |
|
| 83 |
+
## Setup Instructions
|
| 84 |
|
| 85 |
+
### Local Development
|
| 86 |
|
| 87 |
+
1. **Clone the repository**
|
| 88 |
+
```bash
|
| 89 |
+
git clone https://github.com/Ismat-Samadov/Named_Entity_Recognition.git
|
| 90 |
+
cd Named_Entity_Recognition
|
| 91 |
+
```
|
| 92 |
|
| 93 |
+
2. **Set up Python environment**
|
| 94 |
+
```bash
|
| 95 |
+
# Create virtual environment
|
| 96 |
+
python -m venv .venv
|
| 97 |
|
| 98 |
+
# Activate virtual environment
|
| 99 |
+
# On Unix/macOS:
|
| 100 |
+
source .venv/bin/activate
|
| 101 |
+
# On Windows:
|
| 102 |
+
# .venv\Scripts\activate
|
| 103 |
|
| 104 |
+
# Install dependencies
|
| 105 |
+
pip install -r requirements.txt
|
| 106 |
+
```
|
| 107 |
|
| 108 |
+
3. **Run the application**
|
| 109 |
+
```bash
|
| 110 |
+
uvicorn main:app --host 0.0.0.0 --port 8080
|
| 111 |
+
```
|
| 112 |
|
| 113 |
+
### Fly.io Deployment
|
| 114 |
|
| 115 |
+
1. **Install Fly CLI**
|
|
|
|
| 116 |
```bash
|
| 117 |
+
# On Unix/macOS
|
| 118 |
curl -L https://fly.io/install.sh | sh
|
| 119 |
```
|
| 120 |
|
| 121 |
+
2. **Configure deployment**
|
|
|
|
| 122 |
```bash
|
| 123 |
+
# Login to Fly.io
|
| 124 |
fly auth login
|
|
|
|
| 125 |
|
| 126 |
+
# Initialize app
|
|
|
|
|
|
|
| 127 |
fly launch
|
| 128 |
|
| 129 |
+
# Configure memory (at least 2 GB recommended)
|
|
|
|
|
|
|
| 130 |
fly scale memory 2048
|
| 131 |
```
|
| 132 |
|
| 133 |
+
3. **Deploy application**
|
|
|
|
| 134 |
```bash
|
| 135 |
fly deploy
|
|
|
|
| 136 |
|
| 137 |
+
# Monitor deployment
|
|
|
|
|
|
|
| 138 |
fly logs
|
| 139 |
```
|
| 140 |
|
|
|
|
|
|
|
| 141 |
## Usage
|
| 142 |
|
| 143 |
+
1. Access the application:
|
| 144 |
+
- Local: http://localhost:8080
|
| 145 |
+
- Production: https://named-entity-recognition.fly.dev
|
| 146 |
+
|
| 147 |
+
2. Enter Azerbaijani text in the input field
|
| 148 |
+
3. Click "Process" to view the named entities
|
| 149 |
+
4. The results show the recognized entities highlighted in different colors
|
| 150 |
+
|
| 151 |
+
## Contributing
|
| 152 |
+
|
| 153 |
+
Contributions are welcome! Please feel free to submit a Pull Request.
|
| 154 |
+
|
| 155 |
+
## License
|
| 156 |
|
| 157 |
+
This project is open source and available under the MIT License.
|