HuaminChen committed
Commit 81eb867 · verified · 1 Parent(s): 0f038d7

Update to rank 32 LoRA with supplement data (80% accuracy)
README.md CHANGED
@@ -1,66 +1,77 @@
- # mmBERT-32K Intent Classifier (LoRA)
-
- Multi-class intent/category classifier based on **mmBERT-32K-YaRN** for routing LLM requests to appropriate models.
-
- ## Model Description
-
- This model classifies text into academic/topic categories from the MMLU-Pro dataset for intelligent request routing in Mixture-of-Models (MoM) systems.
-
- ### Categories
- Business, Law, Psychology, Biology, Chemistry, Computer Science, Economics, Engineering, Health, History, Math, Philosophy, Physics, and more.
-
- ### Base Model
- - **Base**: [llm-semantic-router/mmbert-32k-yarn](https://huggingface.co/llm-semantic-router/mmbert-32k-yarn)
- - **Architecture**: ModernBERT with YaRN RoPE scaling
- - **Context Length**: 32,768 tokens
- - **Languages**: 1800+ (via Glot500 vocabulary)
-
- ### Training Details
- - **Method**: LoRA fine-tuning
- - **LoRA Rank**: 8
- - **LoRA Alpha**: 16
- - **Epochs**: 5
- - **Batch Size**: 8
- - **Learning Rate**: 3e-5
- - **Dataset**: TIGER-Lab/MMLU-Pro
-
- ### Performance
  | Metric | Score |
  |--------|-------|
- | **Accuracy** | 76.83% |
- | **F1 Score** | 76.99% |

  ## Usage

  ```python
  from transformers import AutoTokenizer, AutoModelForSequenceClassification
  from peft import PeftModel
- import torch
-
- # Load model
- base_model = "llm-semantic-router/mmbert-32k-yarn"
- adapter = "llm-semantic-router/mmbert32k-intent-classifier-lora"

- tokenizer = AutoTokenizer.from_pretrained(adapter)
- model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=14)
- model = PeftModel.from_pretrained(model, adapter)

  # Inference
- text = "What is the derivative of x^2?"
- inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
- with torch.no_grad():
-     outputs = model(**inputs)
- prediction = torch.argmax(outputs.logits, dim=-1)
  ```

- ## Intended Use
-
- - Request routing in Mixture-of-Models systems
- - Topic classification for LLM queries
- - Academic domain classification
- - Content categorization
-
- ## License
-
- Apache 2.0
+ ---
+ license: apache-2.0
+ base_model: llm-semantic-router/mmbert-32k-yarn
+ tags:
+ - text-classification
+ - intent-classification
+ - modernbert
+ - lora
+ - peft
+ - mmlu-pro
+ datasets:
+ - TIGER-Lab/MMLU-Pro
+ - LLM-Semantic-Router/category-classifier-supplement
+ language:
+ - en
+ - multilingual
+ metrics:
+ - accuracy
+ - f1
+ pipeline_tag: text-classification
+ ---
+
+ # mmBERT-32K Intent Classifier (LoRA Adapter)
+
+ LoRA adapter for intent classification based on mmBERT-32K-YaRN (32K context, multilingual).
+
+ ## Model Details
+
+ - **Base Model**: [llm-semantic-router/mmbert-32k-yarn](https://huggingface.co/llm-semantic-router/mmbert-32k-yarn)
+ - **Training Method**: LoRA (Low-Rank Adaptation)
+ - **LoRA Rank**: 32
+ - **LoRA Alpha**: 64
+ - **Trainable Parameters**: 6.8M (2.2% of base model)
+ - **Adapter Size**: 27 MB
+
+ ## Training Data
+
+ - **Primary**: [TIGER-Lab/MMLU-Pro](https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro) (~12K academic questions)
+ - **Supplement**: [LLM-Semantic-Router/category-classifier-supplement](https://huggingface.co/datasets/LLM-Semantic-Router/category-classifier-supplement) (653 samples, including casual "other" examples)
+
+ ## Categories (14 classes)
+
+ biology, business, chemistry, computer science, economics, engineering, health, history, law, math, other, philosophy, physics, psychology
+
+ ## Performance

  | Metric | Score |
  |--------|-------|
+ | Test Accuracy | 80.0% |
+ | Adapter Size | 27 MB |

  ## Usage

  ```python
  from transformers import AutoTokenizer, AutoModelForSequenceClassification
  from peft import PeftModel

+ # Load base model and LoRA adapter
+ base_model = AutoModelForSequenceClassification.from_pretrained(
+     "llm-semantic-router/mmbert-32k-yarn", num_labels=14
+ )
+ model = PeftModel.from_pretrained(base_model, "llm-semantic-router/mmbert32k-intent-classifier-lora")
+ tokenizer = AutoTokenizer.from_pretrained("llm-semantic-router/mmbert32k-intent-classifier-lora")

  # Inference
+ inputs = tokenizer("How do neural networks learn?", return_tensors="pt")
+ outputs = model(**inputs)
+ predicted_class = outputs.logits.argmax().item()
  ```

+ ## Training Configuration
+
+ - Epochs: 5
+ - Batch Size: 16
+ - Learning Rate: 2e-4
+ - Weight Decay: 0.1
+ - Optimizer: AdamW with cosine LR scheduler
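The updated usage snippet stops at a bare class index; a minimal sketch of turning that index into a readable label, assuming the `idx_to_category` table shipped in this repo's `label_mapping.json`:

```python
# idx_to_category as published in label_mapping.json (14 classes).
idx_to_category = {
    "0": "biology", "1": "business", "2": "chemistry", "3": "computer science",
    "4": "economics", "5": "engineering", "6": "health", "7": "history",
    "8": "law", "9": "math", "10": "other", "11": "philosophy",
    "12": "physics", "13": "psychology",
}

def label_for(predicted_class: int) -> str:
    """Map an argmax index from the classifier head to its category name."""
    return idx_to_category[str(predicted_class)]

print(label_for(3))  # computer science
```

Keys are strings because JSON object keys are always strings; convert the argmax index with `str()` before lookup.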
adapter_config.json CHANGED
@@ -16,7 +16,7 @@
   "layers_pattern": null,
   "layers_to_transform": null,
   "loftq_config": {},
-  "lora_alpha": 16,
   "lora_bias": false,
   "lora_dropout": 0.1,
   "megatron_config": null,
@@ -28,13 +28,13 @@
   "peft_type": "LORA",
   "peft_version": "0.18.1",
   "qalora_group_size": 16,
-  "r": 8,
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "mlp.Wi",
     "attn.Wo",
     "attn.Wqkv",
     "mlp.Wo"
   ],
   "target_parameters": null,
 
   "layers_pattern": null,
   "layers_to_transform": null,
   "loftq_config": {},
+  "lora_alpha": 64,
   "lora_bias": false,
   "lora_dropout": 0.1,
   "megatron_config": null,

   "peft_type": "LORA",
   "peft_version": "0.18.1",
   "qalora_group_size": 16,
+  "r": 32,
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
     "attn.Wo",
     "attn.Wqkv",
+    "mlp.Wi",
     "mlp.Wo"
   ],
   "target_parameters": null,
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a6eb8ae19bccb619769280b4a86ff4b40c4eb462a0a80a1db45b12173aaeb5be
- size 6823088

  version https://git-lfs.github.com/spec/v1
+ oid sha256:aa6fc5a99cb5517787073f4e4824d12a8a67ad7a10393a33a11f394836a8ee37
+ size 27098736
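The roughly 4x jump in adapter size tracks the rank change: each LoRA pair adds r·(d_in + d_out) parameters per target matrix, so going from r=8 to r=32 quadruples the adapter weights. A quick sanity check on the pointer sizes recorded above:

```python
old_size = 6_823_088    # bytes, r=8 adapter (safetensors)
new_size = 27_098_736   # bytes, r=32 adapter (safetensors)

ratio = new_size / old_size
# Rank-independent tensors (e.g. the classifier head and file metadata)
# keep the ratio slightly below an exact 4x.
print(round(ratio, 2))  # 3.97
```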
label_mapping.json CHANGED
@@ -1 +1,34 @@
- {"label_to_idx": {"biology": 0, "business": 1, "chemistry": 2, "computer science": 3, "economics": 4, "engineering": 5, "health": 6, "history": 7, "law": 8, "math": 9, "other": 10, "philosophy": 11, "physics": 12, "psychology": 13}, "idx_to_label": {"0": "biology", "1": "business", "2": "chemistry", "3": "computer science", "4": "economics", "5": "engineering", "6": "health", "7": "history", "8": "law", "9": "math", "10": "other", "11": "philosophy", "12": "physics", "13": "psychology"}}

+ {
+   "category_to_idx": {
+     "biology": 0,
+     "business": 1,
+     "chemistry": 2,
+     "computer science": 3,
+     "economics": 4,
+     "engineering": 5,
+     "health": 6,
+     "history": 7,
+     "law": 8,
+     "math": 9,
+     "other": 10,
+     "philosophy": 11,
+     "physics": 12,
+     "psychology": 13
+   },
+   "idx_to_category": {
+     "0": "biology",
+     "1": "business",
+     "2": "chemistry",
+     "3": "computer science",
+     "4": "economics",
+     "5": "engineering",
+     "6": "health",
+     "7": "history",
+     "8": "law",
+     "9": "math",
+     "10": "other",
+     "11": "philosophy",
+     "12": "physics",
+     "13": "psychology"
+   }
+ }
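This commit also renames the mapping keys (`label_to_idx`/`idx_to_label` to `category_to_idx`/`idx_to_category`), so downstream code pinned to the old names will break. A small hedged loader sketch that tolerates both layouts:

```python
import json

def load_idx_to_label(raw: str) -> dict:
    """Return the index->label map under either the old or the new key name."""
    data = json.loads(raw)
    mapping = data.get("idx_to_category") or data.get("idx_to_label")
    if mapping is None:
        raise KeyError("no index->label mapping found in label_mapping.json")
    return mapping

# Minimal fixtures illustrating both schema versions.
old_style = '{"label_to_idx": {"math": 9}, "idx_to_label": {"9": "math"}}'
new_style = '{"category_to_idx": {"math": 9}, "idx_to_category": {"9": "math"}}'
print(load_idx_to_label(old_style)["9"], load_idx_to_label(new_style)["9"])  # math math
```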
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:de007bea22dcb88578703fe7cbafb05e81ad70f6e86a48f9973a14db3a420500
  size 5841

  version https://git-lfs.github.com/spec/v1
+ oid sha256:e6ad45c3a9791623d8aa8e5e5e4a4ce6eb6585cd5dbfe87652080f9c36ae1af6
  size 5841