AdilHayat173 committed
Commit edb41d9 · verified · 1 Parent(s): 923e335

Update README.md

Files changed (1)
  1. README.md +63 -92
README.md CHANGED
@@ -1,92 +1,63 @@
- ---
- license: apache-2.0
- base_model: bert-base-cased
- tags:
- - generated_from_trainer
- datasets:
- - conll2003
- metrics:
- - precision
- - recall
- - f1
- - accuracy
- model-index:
- - name: token_classification
-   results:
-   - task:
-       name: Token Classification
-       type: token-classification
-     dataset:
-       name: conll2003
-       type: conll2003
-       config: conll2003
-       split: validation
-       args: conll2003
-     metrics:
-     - name: Precision
-       type: precision
-       value: 0.9325062034739454
-     - name: Recall
-       type: recall
-       value: 0.9486704813194211
-     - name: F1
-       type: f1
-       value: 0.9405188954700927
-     - name: Accuracy
-       type: accuracy
-       value: 0.9859745687878966
- ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # token_classification
-
- This model is a fine-tuned version of [bert-base-cased](https://huggingface.co/bert-base-cased) on the conll2003 dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.0628
- - Precision: 0.9325
- - Recall: 0.9487
- - F1: 0.9405
- - Accuracy: 0.9860
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 8
- - eval_batch_size: 8
- - seed: 42
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - num_epochs: 3
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
- |:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------:|
- | 0.0751 | 1.0 | 1756 | 0.0683 | 0.8977 | 0.9291 | 0.9132 | 0.9812 |
- | 0.0349 | 2.0 | 3512 | 0.0682 | 0.9289 | 0.9433 | 0.9360 | 0.9844 |
- | 0.0206 | 3.0 | 5268 | 0.0628 | 0.9325 | 0.9487 | 0.9405 | 0.9860 |
-
-
- ### Framework versions
-
- - Transformers 4.42.4
- - Pytorch 2.3.1+cu121
- - Datasets 2.21.0
- - Tokenizers 0.19.1
 
+ # Token Classification Model
+
+ ## Description
+ This project involves developing a machine learning model for token classification, specifically for Named Entity Recognition (NER). Using a fine-tuned BERT model from the Hugging Face library, this system classifies tokens in text into predefined categories like names, locations, and dates.
+
+ The model is trained on a dataset annotated with entity labels to accurately classify each token. This token classification system is useful for information extraction, document processing, and conversational AI applications.
+
+ ## Technologies Used
+
+ ### Dataset
+ - **Source:** Kaggle: conll2003
+ - **Purpose:** Contains text data with annotated entities for token classification.
+
+ ### Model
+ - **Base Model:** BERT (bert-base-uncased)
+ - **Library:** Hugging Face transformers
+ - **Task:** Token Classification (Named Entity Recognition)
+
+ ### Approach
+
+ #### Preprocessing:
+ - Load and preprocess the dataset.
+ - Tokenize the text data and align labels with tokens.
+
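The label-alignment step can be sketched in plain Python. BERT's tokenizer splits words into sub-word pieces, so word-level NER labels have to be expanded to token level. The sketch below assumes the common convention of labeling only the first piece of each word and masking the rest with `-100` (the default `ignore_index` of PyTorch's cross-entropy loss); the project's notebook may use a different convention. The `word_ids` list mimics what a fast tokenizer's `word_ids()` method returns, and the sample values are made up for illustration.

```python
from typing import List, Optional

# Label value skipped by PyTorch's cross-entropy loss by default.
IGNORE_INDEX = -100


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Expand one-label-per-word into one-label-per-token.

    word_ids: for each token, the index of the word it came from,
              or None for special tokens such as [CLS] and [SEP].
    word_labels: one integer label id per original word.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            # Special token: never contributes to the loss.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous:
            # First sub-word piece of a word carries the word's label.
            aligned.append(word_labels[word_id])
        else:
            # Later pieces of the same word are masked out.
            aligned.append(IGNORE_INDEX)
        previous = word_id
    return aligned


# Hypothetical input: three words, the second split into two pieces.
word_ids = [None, 0, 1, 1, 2, None]
word_labels = [1, 0, 5]
aligned = align_labels(word_ids, word_labels)
print(aligned)
```

Masking the continuation pieces keeps each word from being counted more than once in the loss; labeling every piece with the word's label is an equally valid alternative.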
+ #### Fine-Tuning:
+ - Fine-tune the BERT model on the token classification dataset.
+
+ #### Training:
+ - Train the model to classify each token into predefined entity labels.
+
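For a sense of how the learning rate evolves during fine-tuning, here is a small sketch of a linear decay schedule. The defaults (`2e-05`, 5268 total steps over 3 epochs) are taken from the auto-generated card removed in this commit, which listed `lr_scheduler_type: linear`. Zero warmup is an assumption, since that card does not list any warmup steps, and the function name `linear_lr` is hypothetical rather than part of any library.

```python
def linear_lr(step: int,
              base_lr: float = 2e-05,
              total_steps: int = 5268,
              warmup_steps: int = 0) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay.

    warmup_steps=0 is an assumption; the removed model card lists no warmup.
    """
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# The removed card reports 1756 steps per epoch (5268 / 3).
steps_per_epoch = 5268 // 3
for epoch in range(4):
    step = epoch * steps_per_epoch
    print(f"epoch {epoch}: step {step}, lr {linear_lr(step):.2e}")
```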
+ #### Inference:
+ - Use the trained model to predict entity labels for new text inputs.
+
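At inference time the model emits one tag per token, and those tags usually need to be merged into entity spans. The sketch below assumes BIO-style tag strings such as `B-PER` and `I-LOC`, as used in CoNLL-2003; the helper name `group_entities` and the sample sentence are illustrative and not taken from the project.

```python
from typing import List, Optional, Tuple


def group_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge per-token BIO tags into (entity text, entity type) pairs."""
    entities: List[Tuple[str, str]] = []
    current_words: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Emit the entity collected so far, if any.
        if current_words and current_type is not None:
            entities.append((" ".join(current_words), current_type))

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            # A B- tag, or an I- tag of a different type, opens a new entity.
            flush()
            current_words, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            # Continuation of the entity currently being built.
            current_words.append(token)
        else:
            # "O" tag: close any open entity.
            flush()
            current_words, current_type = [], None
    flush()
    return entities


# Made-up example sentence with hand-written tags.
tokens = ["Angela", "Merkel", "visited", "New", "York", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O"]
entities = group_entities(tokens, tags)
print(entities)
```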
+ ### Key Technologies
+ - **Deep Learning (BERT):** For advanced token classification and contextual understanding.
+ - **Natural Language Processing (NLP):** For text preprocessing, tokenization, and entity recognition.
+ - **Machine Learning Algorithms:** For model training and prediction tasks.
+
+ ## Streamlit App
+ You can view and interact with the Streamlit app for token classification [here](https://huggingface.co/spaces/AdilHayat173/token_classifcation).
+ ## Examples
+ Here are some examples of outputs from the model:
+
+ ![example1](https://github.com/user-attachments/assets/9e9dd85c-1447-4229-b691-febec17439cf)
+ ![example2](https://github.com/user-attachments/assets/97dfc391-bda9-4614-93f7-a5f45d64dd03)
+
+ ## Google Colab Notebook
+ You can view and run the Google Colab notebook for this project [here](https://colab.research.google.com/drive/1GYVlIToQ_lnT8XEjGrR2WFkUQWpWXgQi#scrollTo=ZlyX1Lgn8gjj).
+
+ ## Acknowledgements
+ - Hugging Face for transformer models and libraries.
+ - Streamlit for creating the interactive web interface.
+ - [Your Dataset Provider] for the token classification dataset.
+
+ ## Author
+ - AdilHayat
+ - [Hugging Face Profile](https://huggingface.co/AdilHayat173)
+ - [GitHub Profile](https://github.com/AdilHayat21173)
+
+ ## Feedback
+ If you have any feedback, please reach out to us at hayatadil300@gmail.com.
+
+