HuaminChen committed
Commit 81eb867 · verified · 1 Parent(s): 0f038d7

Update to rank 32 LoRA with supplement data (80% accuracy)
README.md CHANGED
@@ -1,66 +1,77 @@
- # mmBERT-32K Intent Classifier (LoRA)
-
- Multi-class intent/category classifier based on **mmBERT-32K-YaRN** for routing LLM requests to appropriate models.
-
- ## Model Description
-
- This model classifies text into academic/topic categories from the MMLU-Pro dataset for intelligent request routing in Mixture-of-Models (MoM) systems.
-
- ### Categories
- Business, Law, Psychology, Biology, Chemistry, Computer Science, Economics, Engineering, Health, History, Math, Philosophy, Physics, and more.
-
- ### Base Model
- - **Base**: [llm-semantic-router/mmbert-32k-yarn](https://huggingface.co/llm-semantic-router/mmbert-32k-yarn)
- - **Architecture**: ModernBERT with YaRN RoPE scaling
- - **Context Length**: 32,768 tokens
- - **Languages**: 1800+ (via Glot500 vocabulary)
-
- ### Training Details
- - **Method**: LoRA fine-tuning
- - **LoRA Rank**: 8
- - **LoRA Alpha**: 16
- - **Epochs**: 5
- - **Batch Size**: 8
- - **Learning Rate**: 3e-5
- - **Dataset**: TIGER-Lab/MMLU-Pro
-
- ### Performance
  | Metric | Score |
  |--------|-------|
- | **Accuracy** | 76.83% |
- | **F1 Score** | 76.99% |

  ## Usage

  ```python
  from transformers import AutoTokenizer, AutoModelForSequenceClassification
  from peft import PeftModel
- import torch
-
- # Load model
- base_model = "llm-semantic-router/mmbert-32k-yarn"
- adapter = "llm-semantic-router/mmbert32k-intent-classifier-lora"

- tokenizer = AutoTokenizer.from_pretrained(adapter)
- model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=14)
- model = PeftModel.from_pretrained(model, adapter)

  # Inference
- text = "What is the derivative of x^2?"
- inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
- with torch.no_grad():
-     outputs = model(**inputs)
- prediction = torch.argmax(outputs.logits, dim=-1)
  ```

- ## Intended Use
-
- - Request routing in Mixture-of-Models systems
- - Topic classification for LLM queries
- - Academic domain classification
- - Content categorization
-
- ## License
-
- Apache 2.0
+ ---
+ license: apache-2.0
+ base_model: llm-semantic-router/mmbert-32k-yarn
+ tags:
+ - text-classification
+ - intent-classification
+ - modernbert
+ - lora
+ - peft
+ - mmlu-pro
+ datasets:
+ - TIGER-Lab/MMLU-Pro
+ - LLM-Semantic-Router/category-classifier-supplement
+ language:
+ - en
+ - multilingual
+ metrics:
+ - accuracy
+ - f1
+ pipeline_tag: text-classification
+ ---
+
+ # mmBERT-32K Intent Classifier (LoRA Adapter)
+
+ LoRA adapter for intent classification based on mmBERT-32K-YaRN (32K context, multilingual).
+
+ ## Model Details
+
+ - **Base Model**: [llm-semantic-router/mmbert-32k-yarn](https://huggingface.co/llm-semantic-router/mmbert-32k-yarn)
+ - **Training Method**: LoRA (Low-Rank Adaptation)
+ - **LoRA Rank**: 32
+ - **LoRA Alpha**: 64
+ - **Trainable Parameters**: 6.8M (2.2% of base model)
+ - **Adapter Size**: 27 MB
+
+ ## Training Data
+
+ - **Primary**: [TIGER-Lab/MMLU-Pro](https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro) (~12K academic questions)
+ - **Supplement**: [LLM-Semantic-Router/category-classifier-supplement](https://huggingface.co/datasets/LLM-Semantic-Router/category-classifier-supplement) (653 samples, including casual "other" examples)
+
+ ## Categories (14 classes)
+
+ biology, business, chemistry, computer science, economics, engineering, health, history, law, math, other, philosophy, physics, psychology
+
+ ## Performance

  | Metric | Score |
  |--------|-------|
+ | Test Accuracy | 80.0% |
+ | Adapter Size | 27 MB |

  ## Usage

  ```python
  from transformers import AutoTokenizer, AutoModelForSequenceClassification
  from peft import PeftModel

+ # Load base model and LoRA adapter
+ base_model = AutoModelForSequenceClassification.from_pretrained(
+     "llm-semantic-router/mmbert-32k-yarn", num_labels=14
+ )
+ model = PeftModel.from_pretrained(base_model, "llm-semantic-router/mmbert32k-intent-classifier-lora")
+ tokenizer = AutoTokenizer.from_pretrained("llm-semantic-router/mmbert32k-intent-classifier-lora")

  # Inference
+ inputs = tokenizer("How do neural networks learn?", return_tensors="pt")
+ outputs = model(**inputs)
+ predicted_class = outputs.logits.argmax().item()
  ```

+ ## Training Configuration
+
+ - Epochs: 5
+ - Batch Size: 16
+ - Learning Rate: 2e-4
+ - Weight Decay: 0.1
+ - Optimizer: AdamW with cosine LR scheduler
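The updated usage snippet stops at a bare class index; a minimal sketch of turning that index into a readable label, assuming the `idx_to_category` table shipped in this repo's `label_mapping.json`:

```python
# idx_to_category as published in label_mapping.json (14 classes).
idx_to_category = {
    "0": "biology", "1": "business", "2": "chemistry", "3": "computer science",
    "4": "economics", "5": "engineering", "6": "health", "7": "history",
    "8": "law", "9": "math", "10": "other", "11": "philosophy",
    "12": "physics", "13": "psychology",
}

def label_for(predicted_class: int) -> str:
    """Map an argmax index from the classifier head to its category name."""
    return idx_to_category[str(predicted_class)]

print(label_for(3))  # computer science
```

Keys are strings because JSON object keys are always strings; convert the argmax index with `str()` before lookup.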
adapter_config.json CHANGED
@@ -16,7 +16,7 @@
   "layers_pattern": null,
   "layers_to_transform": null,
   "loftq_config": {},
-  "lora_alpha": 16,
   "lora_bias": false,
   "lora_dropout": 0.1,
   "megatron_config": null,
@@ -28,13 +28,13 @@
   "peft_type": "LORA",
   "peft_version": "0.18.1",
   "qalora_group_size": 16,
-  "r": 8,
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "mlp.Wi",
     "attn.Wo",
     "attn.Wqkv",
     "mlp.Wo"
   ],
   "target_parameters": null,
 
   "layers_pattern": null,
   "layers_to_transform": null,
   "loftq_config": {},
+  "lora_alpha": 64,
   "lora_bias": false,
   "lora_dropout": 0.1,
   "megatron_config": null,

   "peft_type": "LORA",
   "peft_version": "0.18.1",
   "qalora_group_size": 16,
+  "r": 32,
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
     "attn.Wo",
     "attn.Wqkv",
+    "mlp.Wi",
     "mlp.Wo"
   ],
   "target_parameters": null,
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a6eb8ae19bccb619769280b4a86ff4b40c4eb462a0a80a1db45b12173aaeb5be
- size 6823088

  version https://git-lfs.github.com/spec/v1
+ oid sha256:aa6fc5a99cb5517787073f4e4824d12a8a67ad7a10393a33a11f394836a8ee37
+ size 27098736
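The roughly 4x jump in adapter size tracks the rank change: each LoRA pair adds r·(d_in + d_out) parameters per target matrix, so going from r=8 to r=32 quadruples the adapter weights. A quick sanity check on the pointer sizes recorded above:

```python
old_size = 6_823_088    # bytes, r=8 adapter (safetensors)
new_size = 27_098_736   # bytes, r=32 adapter (safetensors)

ratio = new_size / old_size
# Rank-independent tensors (e.g. the classifier head and file metadata)
# keep the ratio slightly below an exact 4x.
print(round(ratio, 2))  # 3.97
```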
label_mapping.json CHANGED
@@ -1 +1,34 @@
- {"label_to_idx": {"biology": 0, "business": 1, "chemistry": 2, "computer science": 3, "economics": 4, "engineering": 5, "health": 6, "history": 7, "law": 8, "math": 9, "other": 10, "philosophy": 11, "physics": 12, "psychology": 13}, "idx_to_label": {"0": "biology", "1": "business", "2": "chemistry", "3": "computer science", "4": "economics", "5": "engineering", "6": "health", "7": "history", "8": "law", "9": "math", "10": "other", "11": "philosophy", "12": "physics", "13": "psychology"}}

+ {
+   "category_to_idx": {
+     "biology": 0,
+     "business": 1,
+     "chemistry": 2,
+     "computer science": 3,
+     "economics": 4,
+     "engineering": 5,
+     "health": 6,
+     "history": 7,
+     "law": 8,
+     "math": 9,
+     "other": 10,
+     "philosophy": 11,
+     "physics": 12,
+     "psychology": 13
+   },
+   "idx_to_category": {
+     "0": "biology",
+     "1": "business",
+     "2": "chemistry",
+     "3": "computer science",
+     "4": "economics",
+     "5": "engineering",
+     "6": "health",
+     "7": "history",
+     "8": "law",
+     "9": "math",
+     "10": "other",
+     "11": "philosophy",
+     "12": "physics",
+     "13": "psychology"
+   }
+ }
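This commit also renames the mapping keys (`label_to_idx`/`idx_to_label` to `category_to_idx`/`idx_to_category`), so downstream code pinned to the old names will break. A small hedged loader sketch that tolerates both layouts:

```python
import json

def load_idx_to_label(raw: str) -> dict:
    """Return the index->label map under either the old or the new key name."""
    data = json.loads(raw)
    mapping = data.get("idx_to_category") or data.get("idx_to_label")
    if mapping is None:
        raise KeyError("no index->label mapping found in label_mapping.json")
    return mapping

# Minimal fixtures illustrating both schema versions.
old_style = '{"label_to_idx": {"math": 9}, "idx_to_label": {"9": "math"}}'
new_style = '{"category_to_idx": {"math": 9}, "idx_to_category": {"9": "math"}}'
print(load_idx_to_label(old_style)["9"], load_idx_to_label(new_style)["9"])  # math math
```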
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:de007bea22dcb88578703fe7cbafb05e81ad70f6e86a48f9973a14db3a420500
  size 5841

  version https://git-lfs.github.com/spec/v1
+ oid sha256:e6ad45c3a9791623d8aa8e5e5e4a4ce6eb6585cd5dbfe87652080f9c36ae1af6
  size 5841