jordigonzm committed on
Commit 1ad7416 · verified · 1 Parent(s): 417e4bb

Create README.md

Files changed (1): README.md (+67, -0)
---
datasets:
- mnaguib/WikiNER
pipeline_tag: token-classification
---
# Fine-Tuning mDeBERTa for Named Entity Recognition (NER)

## 📌 Model Overview

This repository contains a fine-tuned version of `MoritzLaurer/mDeBERTa-v3-base-mnli-xnli` for **Named Entity Recognition (NER)**, trained on the `mnaguib/WikiNER` dataset in multiple languages.

## 🚀 Features

- **Built on mDeBERTa**: a powerful multilingual model for text understanding.
- **Fine-tuned for NER**: detects entities such as persons (`PER`), locations (`LOC`), organizations (`ORG`), and more.

## 📖 Training Details

- **Base model**: `MoritzLaurer/mDeBERTa-v3-base-mnli-xnli`
- **Dataset**: `mnaguib/WikiNER`
- **Languages**: English (`en`), Spanish (`es`), ...
- **Epochs**: `2`
- **Optimizer**: AdamW
- **Loss function**: CrossEntropyLoss
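
WikiNER annotates entities at the word level, while the mDeBERTa tokenizer splits words into subword pieces, so word labels must be aligned to subword tokens before CrossEntropyLoss can be computed; a common convention is to label only the first subword of each word and mask the rest with `-100`, which the loss ignores. The following is a minimal sketch of that alignment step (not code from this repo); the `word_ids` list is hypothetical and would normally come from `tokenizer(..., is_split_into_words=True).word_ids()`:

```python
# Sketch: align word-level NER labels to subword tokens.
# word_ids below is illustrative; in practice it comes from the tokenizer.

IGNORE_INDEX = -100  # CrossEntropyLoss skips targets with this value


def align_labels(word_labels, word_ids):
    """Map word-level labels onto subword tokens.

    Special tokens (word_id is None) and non-first subwords
    receive IGNORE_INDEX so they do not contribute to the loss.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:            # special tokens like [CLS], [SEP]
            aligned.append(IGNORE_INDEX)
        elif word_id != previous:      # first subword of a word
            aligned.append(word_labels[word_id])
        else:                          # continuation subword
            aligned.append(IGNORE_INDEX)
        previous = word_id
    return aligned


# Example: the second word splits into two subwords; only the first keeps its label.
word_labels = [0, 3]                    # illustrative ids, e.g. O=0, LOC=3
word_ids = [None, 0, 1, 1, None]        # [CLS] "in" "Louv" "re" [SEP]
print(align_labels(word_labels, word_ids))  # [-100, 0, 3, -100, -100]
```

Masking continuation subwords keeps one prediction per word at training time; at inference, predictions on masked positions can simply be discarded.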

## Inference Example

To use the model for inference:

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Load the model and tokenizer
model_path = "jordigonzm/mdeberta-v3-base-multilingual-ner"
model = AutoModelForTokenClassification.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.eval()

# NER prediction function
def predict_ner(text):
    tokens = tokenizer(text, truncation=True, padding=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**tokens)
    logits = outputs.logits
    predictions = torch.argmax(logits, dim=-1).squeeze().tolist()
    tokens_decoded = tokenizer.convert_ids_to_tokens(tokens["input_ids"].squeeze().tolist())
    # Map predicted label ids to their label names
    labels = [model.config.id2label[p] for p in predictions]
    return list(zip(tokens_decoded, labels))

# Example
text = "The Mona Lisa is located in the Louvre Museum, in Paris."
result = predict_ner(text)
print(result)
```
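
`predict_ner` returns one label per subword token. To present results as entity spans, consecutive non-`O` tokens sharing a label can be merged; the helper below is a hypothetical sketch (its names and the illustrative labels are not part of this repo), assuming SentencePiece-style tokens where `▁` marks a word start:

```python
# Sketch: merge per-token (token, label) pairs into entity spans.
# Assumes SentencePiece tokens, where "▁" marks the start of a word.

def _join(tokens):
    # "▁Louvre" + "▁Museum" -> "Louvre Museum"; pieces without "▁" attach
    # to the previous word.
    return "".join(t.replace("▁", " ") for t in tokens).strip()


def group_entities(token_label_pairs):
    entities = []
    current_tokens, current_label = [], None
    for token, label in token_label_pairs:
        if label != "O" and label == current_label:
            current_tokens.append(token)          # extend the current entity
        else:
            if current_label is not None and current_label != "O":
                entities.append((_join(current_tokens), current_label))
            current_tokens, current_label = [token], label
    if current_label is not None and current_label != "O":
        entities.append((_join(current_tokens), current_label))
    return entities


# Illustrative token-level predictions (labels chosen for the example only)
pairs = [("▁The", "O"), ("▁Mona", "MISC"), ("▁Lisa", "MISC"), ("▁is", "O"),
         ("▁Louvre", "LOC"), ("▁Museum", "LOC"), ("▁in", "O"), ("▁Paris", "LOC")]
print(group_entities(pairs))
# [('Mona Lisa', 'MISC'), ('Louvre Museum', 'LOC'), ('Paris', 'LOC')]
```

Note this simple merge cannot separate two adjacent entities of the same label; a BIO-aware scheme (`B-LOC`/`I-LOC`) would be needed for that.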

## Model Usage

You can load the model directly from Hugging Face:

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

model = AutoModelForTokenClassification.from_pretrained("jordigonzm/mdeberta-v3-base-multilingual-ner")
tokenizer = AutoTokenizer.from_pretrained("jordigonzm/mdeberta-v3-base-multilingual-ner")
```

---