IsmatS committed on
Commit 072ecf9 · 1 Parent(s): 5260e25
Files changed (1)
  1. README.md +105 -128
README.md CHANGED
@@ -1,180 +1,157 @@
- # Named_Entity_Recognition

- ### Custom Named Entity Recognition (NER) Model for Azerbaijani Language

- This project provides a custom Named Entity Recognition (NER) model tailored for the Azerbaijani language. It includes a FastAPI application for deploying the model, as well as a frontend interface to test and view the NER results.

- ### Demo

- You can try out the deployed model here: [Named Entity Recognition Demo](https://named-entity-recognition.fly.dev/)

- **Note:** The server is hosted on a free tier, so it may take 1-2 minutes to wake up if it’s inactive when you access it. Please be patient as the server starts up.
 
- ## File Structure
-
- ```plaintext
  .
- ├── Dockerfile # Defines instructions for building the Docker image
- ├── README.md # Project overview, setup, and usage instructions
- ├── fly.toml # Configuration file for Fly.io deployment
- ├── main.py # Main FastAPI app file handling API endpoints and model loading
- ├── models # Contains model-related notebooks, scripts, and data
- │ ├── XLM-RoBERTa.ipynb # Notebook for XLM-RoBERTa model training/testing
- │ ├── mBERT.ipynb # Notebook for mBERT model training/testing
- │ ├── push_to_HF.py # Script to push model to Hugging Face hub
- │ └── train-00000-of-00001.parquet # Parquet file with model training/evaluation data
- ├── requirements.txt # Lists all Python dependencies for the project
- ├── static # Contains frontend assets (JavaScript, CSS)
- │ ├── app.js # JavaScript for handling frontend functionality
- │ └── style.css # CSS for styling the frontend interface
- └── templates # HTML templates for rendering the frontend interface
- └── index.html # Main HTML file for the user interface
  ```
 
- ## Data and Model Links
-
- - **Dataset**: [Azerbaijani NER Dataset](https://huggingface.co/datasets/LocalDoc/azerbaijani-ner-dataset)
- - **mBERT Model**: [mBERT Azerbaijani NER](https://huggingface.co/IsmatS/mbert-az-ner)
- - **XLM-RoBERTa Model**: [XLM-RoBERTa Azerbaijani NER](https://huggingface.co/IsmatS/xlm-roberta-az-ner)
- - **XLM-RoBERTa Large Model**: [XLM-RoBERTa Large Azerbaijani NER](https://huggingface.co/IsmatS/xlm_roberta_large_az_ner)
- - **Azeri-Turkish-BERT-NER**: [Azerbaijani-Turkish BERT Base NER](https://huggingface.co/IsmatS/azeri-turkish-bert-ner)
 
 
- All four models were fine-tuned on a premium A100 GPU in Google Colab for optimized training performance.

- **Note**: The XLM-RoBERTa base model was selected for deployment.
 
- ## Model Performance Metrics

- ### mBERT Model

- | Epoch | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
- |-------|---------------|----------------|-----------|----------|----------|----------|
- | 1 | 0.295200 | 0.265711 | 0.715424 | 0.622853 | 0.665937 | 0.919136 |
- | 2 | 0.248600 | 0.252083 | 0.721036 | 0.637979 | 0.676970 | 0.921439 |
- | 3 | 0.206800 | 0.253372 | 0.704872 | 0.650684 | 0.676695 | 0.920898 |

- ### XLM-RoBERTa Base Model

- | Epoch | Training Loss | Validation Loss | Precision | Recall | F1 |
- |-------|---------------|----------------|-----------|----------|----------|
- | 1 | 0.323100 | 0.275503 | 0.775799 | 0.694886 | 0.733117 |
- | 2 | 0.272500 | 0.262481 | 0.739266 | 0.739900 | 0.739583 |
- | 3 | 0.248600 | 0.252498 | 0.751478 | 0.741152 | 0.746280 |
- | 4 | 0.236800 | 0.249968 | 0.754882 | 0.741449 | 0.748105 |
- | 5 | 0.223800 | 0.252187 | 0.764390 | 0.740460 | 0.752235 |
- | 6 | 0.218600 | 0.249887 | 0.756352 | 0.741646 | 0.748927 |
- | 7 | 0.209700 | 0.250748 | 0.760696 | 0.739438 | 0.749916 |
 
- ### XLM-RoBERTa Large Model

- | Epoch | Training Loss | Validation Loss | Precision | Recall | F1 |
- |-------|---------------|----------------|-----------|----------|----------|
- | 1 | 0.407500 | 0.253823 | 0.768923 | 0.721350 | 0.744377 |
- | 2 | 0.255600 | 0.249694 | 0.783549 | 0.724464 | 0.752849 |
- | 3 | 0.214400 | 0.248773 | 0.750857 | 0.748900 | 0.749877 |
- | 4 | 0.193400 | 0.257051 | 0.768623 | 0.740371 | 0.754232 |
- | 5 | 0.169800 | 0.275679 | 0.745789 | 0.753740 | 0.749743 |
- | 6 | 0.152600 | 0.288074 | 0.783131 | 0.728423 | 0.754787 |
- | 7 | 0.144300 | 0.303378 | 0.758504 | 0.738069 | 0.748147 |
- | 8 | 0.126800 | 0.311300 | 0.745589 | 0.750863 | 0.748217 |
- | 9 | 0.119400 | 0.331631 | 0.739316 | 0.749475 | 0.744361 |
- | 10 | 0.109400 | 0.344823 | 0.754268 | 0.737189 | 0.745631 |
- | 11 | 0.102900 | 0.354887 | 0.751948 | 0.741285 | 0.746578 |


- ### Azeri-Turkish-BERT-NER

- | Epoch | Training Loss | Validation Loss | Precision | Recall | F1 |
- |-------|---------------|-----------------|-----------|--------|-------|
- | 1 | 0.433100 | 0.306711 | 0.739000 | 0.693282 | 0.715412 |
- | 2 | 0.292700 | 0.275796 | 0.781565 | 0.688937 | 0.732334 |
- | 3 | 0.250600 | 0.275115 | 0.758261 | 0.709425 | 0.733031 |
- | 4 | 0.233700 | 0.273087 | 0.756184 | 0.716277 | 0.735689 |
- | 5 | 0.214800 | 0.278477 | 0.756051 | 0.710996 | 0.732832 |
- | 6 | 0.199200 | 0.286102 | 0.755068 | 0.717012 | 0.735548 |
- | 7 | 0.192800 | 0.297157 | 0.742326 | 0.725802 | 0.733971 |
- | 8 | 0.178900 | 0.304510 | 0.743206 | 0.723930 | 0.733442 |
- | 9 | 0.171700 | 0.313845 | 0.743145 | 0.725535 | 0.734234 |
 
 
- ## Setup and Usage

- 1. **Clone the repository**:
- ```bash
- git clone https://github.com/Ismat-Samadov/Named_Entity_Recognition.git
- cd named-entity-recognition
- ```
-
- 2. **Create and activate a virtual environment**:
- ```bash
- python3 -m venv .venv
- source .venv/bin/activate
-
- # On Windows use: .venv\Scripts\activate
- ```

- 3. Install dependencies:
- ```bash
- pip install -r requirements.txt
- ```

- 4. **Run the FastAPI app**:
- ```bash
- uvicorn main:app --host 0.0.0.0 --port 8080
- ```

- 5. **Deploy on Fly.io**:
- Use the following steps to deploy the app on Fly.io.
 
- ## Fly.io Deployment

- To deploy this FastAPI app on Fly.io, follow these steps:

- ### Step 1: Install Fly CLI
- If you haven't already, install the Fly.io CLI:
  ```bash
  curl -L https://fly.io/install.sh | sh
  ```

- ### Step 2: Authenticate with Fly.io
- Log in to your Fly.io account:
  ```bash
  fly auth login
- ```

- ### Step 3: Initialize Fly.io App
- Run the following command in the root directory of your project:
- ```bash
  fly launch
- ```
- During the launch process:
- - Fly will ask you for a unique app name.
- - It will detect your `Dockerfile` automatically.
- - Accept default region recommendations or specify your preferred region.

- ### Step 4: Scale Resources
- Increase memory allocation for running the model. For example, to set the memory to 2 GB:
- ```bash
  fly scale memory 2048
  ```

- ### Step 5: Deploy the App
- Once configured, deploy the app with:
  ```bash
  fly deploy
- ```

- ### Step 6: Monitor and Test
- To check logs and ensure the app is running correctly:
- ```bash
  fly logs
  ```

- Access your deployed app at the Fly.io-provided URL (e.g., `https://your-app-name.fly.dev`).
-
  ## Usage
 
- Access the web interface through the Fly.io URL or `http://localhost:8080` (if running locally) to test the NER model and view recognized entities.

- This application leverages the XLM-RoBERTa Large model fine-tuned on Azerbaijani language data for high-accuracy named entity recognition.
 
+ # Named Entity Recognition for Azerbaijani Language

+ A custom Named Entity Recognition (NER) model specifically designed for the Azerbaijani language. This project includes a FastAPI application for model deployment and a user-friendly frontend interface for testing and visualizing NER results.

+ ## Demo

+ Try the live demo: [Named Entity Recognition Demo](https://named-entity-recognition.fly.dev/)

+ **Note:** The server runs on a free tier and may take 1-2 minutes to initialize if inactive. Please be patient during startup.

+ ## Project Structure

+ ```
  .
+ ├── Dockerfile # Docker image configuration
+ ├── README.md # Project documentation
+ ├── fly.toml # Fly.io deployment configuration
+ ├── main.py # FastAPI application entry point
+ ├── models/ # Model-related files
+ │ ├── XLM-RoBERTa.ipynb # XLM-RoBERTa training notebook
+ │ ├── mBERT.ipynb # mBERT training notebook
+ │ ├── push_to_HF.py # Hugging Face upload script
+ │ └── train.parquet # Training data
+ ├── requirements.txt # Python dependencies
+ ├── static/ # Frontend assets
+ │ ├── app.js # Frontend logic
+ │ └── style.css # UI styling
+ └── templates/ # HTML templates
+ └── index.html # Main UI template
  ```
 
+ ## Models & Dataset

+ ### Available Models

+ - [mBERT Azerbaijani NER](https://huggingface.co/IsmatS/mbert-az-ner)
+ - [XLM-RoBERTa Azerbaijani NER](https://huggingface.co/IsmatS/xlm-roberta-az-ner)
+ - [XLM-RoBERTa Large Azerbaijani NER](https://huggingface.co/IsmatS/xlm_roberta_large_az_ner)
+ - [Azerbaijani-Turkish BERT Base NER](https://huggingface.co/IsmatS/azeri-turkish-bert-ner)

+ ### Dataset
+ - [Azerbaijani NER Dataset](https://huggingface.co/datasets/LocalDoc/azerbaijani-ner-dataset)

+ **Note:** All models were fine-tuned on an A100 GPU using Google Colab Pro+. The XLM-RoBERTa base model is currently deployed in production.
 
+ ## Model Performance

+ ### mBERT Performance

+ | Epoch | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
+ |-------|---------------|-----------------|-----------|---------|-------|-----------|
+ | 1 | 0.2952 | 0.2657 | 0.7154 | 0.6229 | 0.6659 | 0.9191 |
+ | 2 | 0.2486 | 0.2521 | 0.7210 | 0.6380 | 0.6770 | 0.9214 |
+ | 3 | 0.2068 | 0.2534 | 0.7049 | 0.6507 | 0.6767 | 0.9209 |

+ ### XLM-RoBERTa Base Performance

+ | Epoch | Training Loss | Validation Loss | Precision | Recall | F1 |
+ |-------|---------------|-----------------|-----------|---------|-------|
+ | 1 | 0.3231 | 0.2755 | 0.7758 | 0.6949 | 0.7331 |
+ | 3 | 0.2486 | 0.2525 | 0.7515 | 0.7412 | 0.7463 |
+ | 5 | 0.2238 | 0.2522 | 0.7644 | 0.7405 | 0.7522 |
+ | 7 | 0.2097 | 0.2507 | 0.7607 | 0.7394 | 0.7499 |

+ ### XLM-RoBERTa Large Performance

+ | Epoch | Training Loss | Validation Loss | Precision | Recall | F1 |
+ |-------|---------------|-----------------|-----------|---------|-------|
+ | 1 | 0.4075 | 0.2538 | 0.7689 | 0.7214 | 0.7444 |
+ | 3 | 0.2144 | 0.2488 | 0.7509 | 0.7489 | 0.7499 |
+ | 6 | 0.1526 | 0.2881 | 0.7831 | 0.7284 | 0.7548 |
+ | 9 | 0.1194 | 0.3316 | 0.7393 | 0.7495 | 0.7444 |

+ ### Azeri-Turkish-BERT Performance

+ | Epoch | Training Loss | Validation Loss | Precision | Recall | F1 |
+ |-------|---------------|-----------------|-----------|---------|-------|
+ | 1 | 0.4331 | 0.3067 | 0.7390 | 0.6933 | 0.7154 |
+ | 3 | 0.2506 | 0.2751 | 0.7583 | 0.7094 | 0.7330 |
+ | 6 | 0.1992 | 0.2861 | 0.7551 | 0.7170 | 0.7355 |
+ | 9 | 0.1717 | 0.3138 | 0.7431 | 0.7255 | 0.7342 |
 
+ ## Setup Instructions

+ ### Local Development

+ 1. **Clone the repository**
+ ```bash
+ git clone https://github.com/Ismat-Samadov/Named_Entity_Recognition.git
+ cd Named_Entity_Recognition
+ ```

+ 2. **Set up Python environment**
+ ```bash
+ # Create virtual environment
+ python -m venv .venv

+ # Activate virtual environment
+ # On Unix/macOS:
+ source .venv/bin/activate
+ # On Windows:
+ .venv\Scripts\activate

+ # Install dependencies
+ pip install -r requirements.txt
+ ```

+ 3. **Run the application**
+ ```bash
+ uvicorn main:app --host 0.0.0.0 --port 8080
+ ```
 
+ ### Fly.io Deployment

+ 1. **Install Fly CLI**
  ```bash
+ # On Unix/macOS
  curl -L https://fly.io/install.sh | sh
  ```

+ 2. **Configure deployment**
  ```bash
+ # Login to Fly.io
  fly auth login

+ # Initialize app
  fly launch

+ # Configure memory (minimum 2GB recommended)
  fly scale memory 2048
  ```

+ 3. **Deploy application**
  ```bash
  fly deploy

+ # Monitor deployment
  fly logs
  ```
 
  ## Usage

+ 1. Access the application:
+ - Local: http://localhost:8080
+ - Production: https://named-entity-recognition.fly.dev
+
+ 2. Enter Azerbaijani text in the input field
+ 3. Click "Process" to view the named entities
+ 4. Results will display recognized entities highlighted in different colors
+
+ ## Contributing
+
+ Contributions are welcome! Please feel free to submit a Pull Request.
+
+ ## License

+ This project is open source and available under the MIT License.
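The F1 column in the performance tables is the harmonic mean of precision and recall, F1 = 2PR / (P + R), which makes the reported numbers easy to sanity-check. The Python sketch below is illustrative only: the names `EpochMetrics`, `f1_score`, `XLMR_BASE` and `mismatched_f1` are hypothetical and not part of the repository. It copies the seven XLM-RoBERTa Base rows from the removed README table, recomputes F1 for each epoch, and compares two ways of picking a checkpoint: highest F1 versus lowest validation loss.

```python
"""Sanity-check reported F1 scores and compare checkpoint-selection rules.

Illustrative sketch: all names here are hypothetical, not from the repository.
Numbers are copied from the XLM-RoBERTa Base table in the README diff.
"""
from typing import List, NamedTuple


class EpochMetrics(NamedTuple):
    epoch: int
    val_loss: float
    precision: float
    recall: float
    reported_f1: float


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# XLM-RoBERTa Base: (epoch, validation loss, precision, recall, reported F1)
XLMR_BASE: List[EpochMetrics] = [
    EpochMetrics(1, 0.275503, 0.775799, 0.694886, 0.733117),
    EpochMetrics(2, 0.262481, 0.739266, 0.739900, 0.739583),
    EpochMetrics(3, 0.252498, 0.751478, 0.741152, 0.746280),
    EpochMetrics(4, 0.249968, 0.754882, 0.741449, 0.748105),
    EpochMetrics(5, 0.252187, 0.764390, 0.740460, 0.752235),
    EpochMetrics(6, 0.249887, 0.756352, 0.741646, 0.748927),
    EpochMetrics(7, 0.250748, 0.760696, 0.739438, 0.749916),
]


def mismatched_f1(rows: List[EpochMetrics], tol: float = 1e-5) -> List[int]:
    """Return epochs whose reported F1 differs from 2PR/(P+R) by more than tol.

    The table values are rounded to six decimals, so a small tolerance is used.
    """
    return [
        r.epoch
        for r in rows
        if abs(f1_score(r.precision, r.recall) - r.reported_f1) > tol
    ]


# Two checkpoint-selection rules: best recomputed F1 vs. lowest validation loss.
best_by_f1 = max(XLMR_BASE, key=lambda r: f1_score(r.precision, r.recall))
best_by_val_loss = min(XLMR_BASE, key=lambda r: r.val_loss)

if __name__ == "__main__":
    print("epochs with inconsistent F1:", mismatched_f1(XLMR_BASE))
    print("best epoch by F1:", best_by_f1.epoch)
    print("best epoch by validation loss:", best_by_val_loss.epoch)
```

With these values every reported F1 agrees with 2PR / (P + R) to within rounding, epoch 5 has the highest F1 (0.752235), and epoch 6 has the lowest validation loss (0.249887), so the two selection rules pick adjacent but different epochs.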