vishal1364 committed on May 22, 2025 · Commit d80e21b (parent e365691) · Create README.md

# 🧠 NER-BERT-AI-Model-using-annotated-corpus-ner

A BERT-based Named Entity Recognition (NER) model fine-tuned on the Entity Annotated Corpus. It tags each token in a text as Person (PER), Organization (ORG), Location (LOC), or outside any entity (O). This makes it well suited to information extraction, resume parsing, and chatbot applications.

---

## ✨ Model Highlights

- 📌 Based on `bert-base-cased` (by Google)
- 🔍 Fine-tuned on the Entity Annotated Corpus (`ner_dataset.csv`)
- ⚡ Predicts 3 entity types: PER, ORG, LOC
- 💾 Compatible with the Hugging Face `pipeline()` API for inference

---

## 🧠 Intended Uses

- Resume and document parsing
- Chatbots and virtual assistants
- Named entity tagging in structured documents
- Search and information retrieval systems
- News and content analysis

---

## 🚫 Limitations

- Trained only on formal English text
- May not generalize well to informal text or domain-specific jargon
- WordPiece tokenization may split one entity into several subword tokens (e.g., "Cupertino" → "Cup", "##ert", "##ino"), so raw per-token predictions may need to be merged back into whole words
- Predicts only the three entity types it was trained on (PER, ORG, LOC)
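
When the pipeline is called without aggregation, each WordPiece piece receives its own prediction. Below is a minimal sketch of merging pieces back into words, assuming the standard BERT convention that continuation pieces start with `##` and that each word takes the label of its first piece (a common choice, not necessarily what this model's authors did). The function `merge_wordpieces` is hypothetical.

```python
from typing import List, Tuple


def merge_wordpieces(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge WordPiece tokens into whole words.

    Continuation pieces (prefixed with "##") are appended to the previous
    word; each word keeps the label predicted for its first piece.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    words: List[Tuple[str, str]] = []
    for token, label in zip(tokens, labels):
        if token.startswith("##") and words:
            prev_word, prev_label = words[-1]
            # Strip the "##" marker and glue the piece onto the previous word.
            words[-1] = (prev_word + token[2:], prev_label)
        else:
            words.append((token, label))
    return words


print(merge_wordpieces(
    ["Tim", "Cook", "visited", "Cup", "##ert", "##ino"],
    ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "I-LOC"],
))
# [('Tim', 'B-PER'), ('Cook', 'I-PER'), ('visited', 'O'), ('Cupertino', 'B-LOC')]
```

The token and label lists above are hand-written for illustration, not actual model output.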

---

## 🏋️‍♂️ Training Details

| Field         | Value                          |
|---------------|--------------------------------|
| Base Model    | `bert-base-cased`              |
| Dataset       | Entity Annotated Corpus        |
| Framework     | PyTorch with Transformers      |
| Epochs        | 3                              |
| Batch Size    | 16                             |
| Max Length    | 128 tokens                     |
| Optimizer     | AdamW                          |
| Loss          | CrossEntropyLoss (token-level) |
| Device        | Trained on a CUDA-enabled GPU  |
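
Token-level cross-entropy over subword tokens requires aligning word-level labels with the tokenizer's output. The card does not say how this was done; the sketch below shows one common approach, assuming the label goes on the first piece of each word and `-100` everywhere else (`-100` is the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so those positions add nothing to the loss). The `word_ids` input mirrors what a Hugging Face fast tokenizer's `BatchEncoding.word_ids()` returns (`None` for special tokens such as `[CLS]` and `[SEP]`); the function `align_labels` is hypothetical.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Map word-level label IDs onto subword tokens.

    Special tokens (word_id None) and non-initial subword pieces get
    IGNORE_INDEX so they are excluded from the token-level loss.
    """
    aligned: List[int] = []
    previous: Optional[int] = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)  # [CLS], [SEP], padding
        elif word_id != previous:
            aligned.append(word_labels[word_id])  # first piece of a word
        else:
            aligned.append(IGNORE_INDEX)  # continuation piece ("##...")
        previous = word_id
    return aligned


# Words ["Tim", "visited", "Cupertino"] with label IDs [1, 0, 5] (B-PER, O, B-LOC).
# Assumed tokenization: [CLS] Tim visited Cup ##ert ##ino [SEP]
print(align_labels([None, 0, 1, 2, 2, 2, None], [1, 0, 5]))
# [-100, 1, 0, 5, -100, -100, -100]
```

An alternative is to copy the word's `I-` label onto continuation pieces; which variant was used here is not stated.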

---

## 📊 Evaluation Metrics

| Metric    | Score (%) |
|-----------|-----------|
| Precision | 83.15     |
| Recall    | 83.85     |
| F1-Score  | 83.50     |
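
As a consistency check, the reported F1-score equals the harmonic mean of the reported precision $P$ and recall $R$:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 83.15 \times 83.85}{83.15 + 83.85}
    = \frac{13944.255}{167.00}
    \approx 83.50
```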

---

## 🔎 Label Mapping

| Label ID | Entity Type |
|----------|-------------|
| 0        | O           |
| 1        | B-PER       |
| 2        | I-PER       |
| 3        | B-ORG       |
| 4        | I-ORG       |
| 5        | B-LOC       |
| 6        | I-LOC       |
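
The labels follow the BIO scheme: `B-` marks the first token of an entity, `I-` a continuation, and `O` a token outside any entity. Below is a minimal sketch of decoding predicted label IDs into entity spans using this mapping. The dictionaries mirror the table; `bio_to_spans` is a hypothetical helper. Treating a stray `I-` tag (one with no matching open entity) as the start of a new entity is an assumption.

```python
from typing import Dict, List, Optional, Tuple

# Mirrors the Label Mapping table above.
ID2LABEL: Dict[int, str] = {
    0: "O",
    1: "B-PER", 2: "I-PER",
    3: "B-ORG", 4: "I-ORG",
    5: "B-LOC", 6: "I-LOC",
}
LABEL2ID: Dict[str, int] = {label: idx for idx, label in ID2LABEL.items()}


def bio_to_spans(label_ids: List[int]) -> List[Tuple[str, int, int]]:
    """Decode BIO label IDs into (entity_type, start, end) spans, end exclusive."""
    spans: List[Tuple[str, int, int]] = []
    current_type: Optional[str] = None
    start = 0
    for i, label_id in enumerate(label_ids):
        prefix, _, entity = ID2LABEL[label_id].partition("-")
        # An I- tag of the same type extends the currently open entity.
        if prefix == "I" and entity == current_type:
            continue
        # Any other tag closes the open entity, if there is one.
        if current_type is not None:
            spans.append((current_type, start, i))
            current_type = None
        # B- (or a stray I- with no matching open entity) starts a new one.
        if prefix in ("B", "I"):
            current_type, start = entity, i
    if current_type is not None:
        spans.append((current_type, start, len(label_ids)))
    return spans


# Tokens: ["Angela", "Merkel", "visited", "New", "York"]
print(bio_to_spans([1, 2, 0, 5, 6]))
# [('PER', 0, 2), ('LOC', 3, 5)]
```

If the hosted model config does not define label names, the pipeline reports generic labels such as `LABEL_1`. The dictionaries above have the shape Transformers expects for the config's `id2label` and `label2id` attributes.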

---

## 🚀 Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Hub repository ID (no leading slash; "/AventIQ-AI/..." would be treated as a local path)
model_name = "AventIQ-AI/NER-BERT-AI-Model-using-annotated-corpus-ner"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# aggregation_strategy="simple" merges subword tokens (e.g. "##ert")
# back into whole-word entity spans
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

example = "My name is Wolfgang and I live in Berlin"
ner_results = nlp(example)
print(ner_results)
```
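
With `aggregation_strategy` set, each pipeline result is a dict with keys such as `entity_group`, `score`, `word`, `start`, and `end`. For uses like resume parsing, it is often handy to collect words per entity type and drop low-confidence predictions. Below is a minimal sketch; `group_by_type` and the 0.5 threshold are illustrative assumptions, and the sample input is hand-written in the pipeline's output format, not real model output.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_type(results: List[dict], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Collect entity words per type, skipping predictions below min_score."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity in results:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Hand-written sample in the pipeline's aggregated output format.
sample = [
    {"entity_group": "PER", "score": 0.99, "word": "Wolfgang", "start": 11, "end": 19},
    {"entity_group": "LOC", "score": 0.98, "word": "Berlin", "start": 34, "end": 40},
    {"entity_group": "ORG", "score": 0.31, "word": "and", "start": 20, "end": 23},
]
print(group_by_type(sample))
# {'PER': ['Wolfgang'], 'LOC': ['Berlin']}
```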

## 🧩 Quantization

Post-training quantization can be applied with PyTorch to reduce model size and speed up inference, which is especially useful on edge devices.
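
Below is a minimal sketch of one such technique, PyTorch dynamic quantization. It stores the weights of `nn.Linear` layers as int8 and quantizes activations on the fly during CPU inference. The card does not say which quantization method was used or report size or speed results. The helpers `quantize_for_cpu` and `state_dict_size_mb` are hypothetical, and accuracy should be re-checked against the evaluation metrics after quantizing.

```python
import io

import torch
from torch import nn


def quantize_for_cpu(model: nn.Module) -> nn.Module:
    """Return a copy of `model` with all nn.Linear layers dynamically quantized to int8."""
    model.eval()  # quantization is intended for inference, not training
    # inplace defaults to False, so the original float model is left untouched
    return torch.ao.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )


def state_dict_size_mb(model: nn.Module) -> float:
    """Serialized size of the model's state_dict in megabytes."""
    buffer = io.BytesIO()
    torch.save(model.state_dict(), buffer)
    return buffer.getbuffer().nbytes / 1e6


# Usage with the model loaded in the Usage section (assumed):
#   quantized = quantize_for_cpu(model)
#   print(state_dict_size_mb(model), state_dict_size_mb(quantized))
```

Dynamic quantization only converts the linear layers; BERT's embedding tables stay in float32, so the overall size reduction is smaller than 4×.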

## 🗂 Repository Structure

```text
.
├── model/               # Trained model files
├── tokenizer_config/    # Tokenizer and vocab files
├── model.safetensors    # Model weights in safetensors format
└── README.md            # Model card
```

## 🤝 Contributing

We welcome feedback, bug reports, and improvements!
Feel free to open an issue or submit a pull request.