Upload folder using huggingface_hub
README.md
CHANGED
|
@@ -14,6 +14,32 @@ library_name: transformers
|
|
| 14 |
**Repository on Hugging Face**: [IsmatS/xlm_roberta_large_az_ner](https://huggingface.co/IsmatS/xlm_roberta_large_az_ner)
|
| 15 |
**Repository on GitHub**: [Named Entity Recognition](https://github.com/Ismat-Samadov/Named_Entity_Recognition)
|
| 16 |
| 17 |
## Project Overview
|
| 18 |
|
| 19 |
This project uses `xlm-roberta-large`, a multilingual transformer model, fine-tuned for Azerbaijani Named Entity Recognition (NER). The model identifies named entities such as persons, organizations, and dates, and was fine-tuned on a dataset built specifically for the Azerbaijani language.
|
| 14 |
**Repository on Hugging Face**: [IsmatS/xlm_roberta_large_az_ner](https://huggingface.co/IsmatS/xlm_roberta_large_az_ner)
|
| 15 |
**Repository on GitHub**: [Named Entity Recognition](https://github.com/Ismat-Samadov/Named_Entity_Recognition)
|
| 16 |
|
| 17 |
+
## File Structure
|
| 18 |
+
|
| 19 |
+
```plaintext
|
| 20 |
+
.
|
| 21 |
+
├── README.md # Documentation for the project
|
| 22 |
+
├── config.json # Configuration file for model deployment
|
| 23 |
+
├── model-001.safetensors # Model weights in Safetensors format for safe deployment
|
| 24 |
+
├── sentencepiece.bpe.model # SentencePiece model for tokenization
|
| 25 |
+
├── special_tokens_map.json # Map for special tokens (e.g., <pad>, <s>)
|
| 26 |
+
├── tokenizer.json # JSON configuration for tokenizer
|
| 27 |
+
├── tokenizer_config.json # Additional tokenizer configurations
|
| 28 |
+
├── xlm_roberta_large.ipynb # Jupyter Notebook for training and experimentation
|
| 29 |
+
└── xlm_roberta_large.py # Python script for training and evaluation
|
| 30 |
+
```
|
| 31 |
+
|
| 32 |
+
**Explanation**:
|
| 33 |
+
- **README.md**: Provides detailed information on the project, including setup, usage, and evaluation.
|
| 34 |
+
- **config.json**: Stores the model configuration, such as architecture hyperparameters and the label mappings (`id2label` / `label2id`).
|
| 35 |
+
- **model-001.safetensors**: Contains the model weights in Safetensors format, which loads quickly and, unlike pickle-based checkpoints, does not execute arbitrary code on load.
|
| 36 |
+
- **sentencepiece.bpe.model**: Tokenization model used to segment sentences into subwords for `xlm-roberta-large`.
|
| 37 |
+
- **special_tokens_map.json**: Maps special tokens required by the tokenizer (e.g., `<pad>` for padding).
|
| 38 |
+
- **tokenizer.json**: Contains the serialized fast tokenizer (vocabulary and processing pipeline).
|
| 39 |
+
- **tokenizer_config.json**: Additional configuration settings for the tokenizer.
|
| 40 |
+
- **xlm_roberta_large.ipynb**: A Jupyter notebook for experimenting with and training the model.
|
| 41 |
+
- **xlm_roberta_large.py**: Python script for training and running evaluations outside of Jupyter.
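
As a quick sanity check on the layout described above, the following minimal sketch reports which of the listed files are absent from a local copy of the repository. The file names come from the tree above; the `missing_files` helper and the `EXPECTED_FILES` constant are hypothetical names introduced only for illustration.

```python
from pathlib import Path

# File names copied from the tree above; the helper itself is hypothetical.
EXPECTED_FILES = [
    "README.md",
    "config.json",
    "model-001.safetensors",
    "sentencepiece.bpe.model",
    "special_tokens_map.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "xlm_roberta_large.ipynb",
    "xlm_roberta_large.py",
]


def missing_files(repo_dir):
    """Return the expected file names that are not present in repo_dir."""
    root = Path(repo_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print(f"Missing: {missing}")
    else:
        print("All expected files present.")
```

Running the script from the repository root prints either the missing names or a confirmation that everything is in place.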
|
| 42 |
+
|
| 43 |
## Project Overview
|
| 44 |
|
| 45 |
This project uses `xlm-roberta-large`, a multilingual transformer model, fine-tuned for Azerbaijani Named Entity Recognition (NER). The model identifies named entities such as persons, organizations, and dates, and was fine-tuned on a dataset built specifically for the Azerbaijani language.
|
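
For readers who want to try the model, here is a minimal sketch of inference with the `transformers` token-classification pipeline. It assumes `transformers` and a backend such as PyTorch are installed and that the weights can be downloaded from the Hugging Face repository named above. The helper names (`load_ner_pipeline`, `format_entities`) and the example sentence are illustrative and do not come from the project's own scripts.

```python
MODEL_ID = "IsmatS/xlm_roberta_large_az_ner"  # repository named in this README


def load_ner_pipeline(model_id=MODEL_ID):
    """Build a token-classification pipeline (downloads weights on first use)."""
    # Imported lazily so format_entities below works without transformers.
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge subword pieces into whole entities
    )


def format_entities(entities):
    """Turn pipeline output dicts into (text, label, rounded score) tuples."""
    return [
        (e["word"], e["entity_group"], round(float(e["score"]), 3))
        for e in entities
    ]


if __name__ == "__main__":
    ner = load_ner_pipeline()
    # Illustrative sentence ("Baku is the capital of Azerbaijan").
    text = "Bakı Azərbaycanın paytaxtıdır."
    for word, label, score in format_entities(ner(text)):
        print(f"{word}\t{label}\t{score}")
```

The label names printed depend on the `id2label` mapping stored in `config.json`, so no particular output is assumed here.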