# Multi-Task NER + Intent + Language Model

This model performs three tasks simultaneously:
1. **Named Entity Recognition (NER)**: extracts entities from B2B transaction descriptions
2. **Intent Classification**: classifies the transaction's intent/purpose
3. **Language Detection**: detects the language (English, Russian, Uzbek Latin/Cyrillic, or mixed)

## Model Details
- Base model: `google-bert/bert-base-multilingual-uncased`
- Architecture: enhanced multi-task model with a BiLSTM head for NER and attention pooling for classification
- Training: optimized for realistic B2B transaction descriptions
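
The model class itself is not distributed with this card; the usage example below expects you to copy it from the training script. As a rough sketch only, an architecture matching the description above (a shared encoder, a BiLSTM head for NER, and attention pooling for the intent and language heads) could look like the following. All layer names, sizes, and the constructor signature here are assumptions, not the actual training code:

```python
import torch
import torch.nn as nn


class EnhancedMultiTaskModel(nn.Module):
    """Hypothetical sketch only; the real class lives in the training script."""

    def __init__(self, model_name=None, num_ner_labels=2, num_intent_labels=2,
                 num_lang_labels=2, dropout=0.15, encoder=None, hidden_size=None):
        super().__init__()
        if encoder is None:
            # Normal path: load the multilingual BERT encoder by name
            from transformers import AutoModel
            encoder = AutoModel.from_pretrained(model_name)
        self.encoder = encoder
        hidden = hidden_size or encoder.config.hidden_size
        self.dropout = nn.Dropout(dropout)
        # Token-level head: BiLSTM over encoder states, then a linear projection
        self.bilstm = nn.LSTM(hidden, hidden // 2, batch_first=True,
                              bidirectional=True)
        self.ner_head = nn.Linear(hidden, num_ner_labels)
        # Sequence-level heads: attention pooling shared by intent and language
        self.attn = nn.Linear(hidden, 1)
        self.intent_head = nn.Linear(hidden, num_intent_labels)
        self.lang_head = nn.Linear(hidden, num_lang_labels)

    def forward(self, input_ids, attention_mask=None, **kwargs):
        states = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        states = self.dropout(states)
        lstm_out, _ = self.bilstm(states)
        ner_logits = self.ner_head(lstm_out)
        # Masked attention pooling over token positions
        scores = self.attn(states).squeeze(-1)
        if attention_mask is not None:
            scores = scores.masked_fill(attention_mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)
        pooled = (weights * states).sum(dim=1)
        return {
            "ner_logits": ner_logits,
            "intent_logits": self.intent_head(pooled),
            "lang_logits": self.lang_head(pooled),
        }
```

If the checkpoint's `state_dict` keys do not match this sketch, the layer names or structure differ from the real training code, and the class must be taken from the training script instead.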

## Supported Languages
- English (`en`)
- Russian (`ru`)
- Uzbek Latin (`uz_latn`)
- Uzbek Cyrillic (`uz_cyrl`)
- Mixed-language text

## Usage

```python
import json

import numpy as np
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer, AutoModel

# Download the tokenizer
model_id = "primel/aibanov"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Download the label mappings
mappings_file = hf_hub_download(repo_id=model_id, filename="label_mappings.json")
with open(mappings_file, "r") as f:
    label_mappings = json.load(f)

id2tag = {int(k): v for k, v in label_mappings["id2tag"].items()}
id2intent = {int(k): v for k, v in label_mappings["id2intent"].items()}
id2lang = {int(k): v for k, v in label_mappings["id2lang"].items()}

# Define the model architecture (must match the training script)
class EnhancedMultiTaskModel(nn.Module):
    # ... (copy the model class from the training script)
    pass

# Instantiate the model
base_bert = "google-bert/bert-base-multilingual-uncased"
model = EnhancedMultiTaskModel(
    model_name=base_bert,
    num_ner_labels=len(label_mappings["tag2id"]),
    num_intent_labels=len(label_mappings["intent2id"]),
    num_lang_labels=len(label_mappings["lang2id"]),
    dropout=0.15,
)

# Load the trained weights
weights_file = hf_hub_download(repo_id=model_id, filename="pytorch_model.bin")
state_dict = torch.load(weights_file, map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

# Inference
text = "Оплата 100% за товары согласно договору №123 от 15.01.2025г ИНН 987654321"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=192)

with torch.no_grad():
    outputs = model(**inputs)

# Per-task logits for the single example in the batch
ner_logits = outputs["ner_logits"][0].numpy()
intent_logits = outputs["intent_logits"][0].numpy()
lang_logits = outputs["lang_logits"][0].numpy()

# Sequence-level predictions
intent = id2intent[int(np.argmax(intent_logits))]
print(f"Intent: {intent}")

language = id2lang[int(np.argmax(lang_logits))]
print(f"Language: {language}")
```
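
The snippet above decodes only the sequence-level heads; the token-level `ner_logits` can be decoded by taking an argmax per token and mapping ids back through `id2tag`. The following is a sketch of that step; the helper function is hypothetical, and whether the tag set is BIO-style (and how word pieces should be merged) is an assumption, not documented behavior:

```python
import numpy as np

def decode_ner(ner_logits, input_ids, tokenizer, id2tag):
    """Map per-token NER logits back to (token, tag) pairs.

    Sketch only: assumes `ner_logits` has shape (seq_len, num_tags);
    real post-processing (merging word pieces, grouping tag spans)
    depends on the actual tag scheme used in training.
    """
    tag_ids = np.argmax(ner_logits, axis=-1)
    tokens = tokenizer.convert_ids_to_tokens(input_ids)
    results = []
    for token, token_id, tag_id in zip(tokens, input_ids, tag_ids):
        if token_id in tokenizer.all_special_ids:
            continue  # skip [CLS], [SEP], [PAD]
        results.append((token, id2tag[int(tag_id)]))
    return results

# Example, continuing from the usage snippet above:
# pairs = decode_ner(ner_logits, inputs["input_ids"][0].tolist(), tokenizer, id2tag)
# for token, tag in pairs:
#     if tag != "O":
#         print(token, tag)
```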

## License
Apache 2.0

## Citation
```bibtex
@misc{aibanov2025,
  author = {primel},
  title = {Multi-Task NER Intent Language Model},
  year = {2025},
  url = {https://huggingface.co/primel/aibanov}
}
```