chopratejas committed on
Commit
639f08a
·
verified ·
1 Parent(s): b64452d

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,114 @@
---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- text-classification
- image-optimization
- technique-routing
- headroom
datasets:
- custom
metrics:
- accuracy
base_model: microsoft/MiniLM-L12-H384-uncased
pipeline_tag: text-classification
---

# Technique Router (MiniLM)

A fine-tuned MiniLM classifier that routes image queries to the optimal compression technique for the [Headroom SDK](https://github.com/headroom-ai/headroom).

## Model Description

This model classifies natural-language queries about images into one of four optimization techniques:

| Technique | Token Savings | Best For |
|-----------|---------------|----------|
| `transcode` | ~99% | Text extraction, OCR tasks |
| `crop` | 50-90% | Region-specific queries |
| `full_low` | ~87% | General understanding |
| `preserve` | 0% | Fine details, counting |

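The savings column above translates directly into a rough token budget. A minimal sketch of that arithmetic (the ratios come from the table, with `crop` at the midpoint of its 50-90% range; the 1,000-token image size is a hypothetical figure, not from the model):

```python
# Approximate token-savings ratios from the table above. `crop` uses the
# midpoint of its 50-90% range; all values are illustrative.
SAVINGS = {
    "transcode": 0.99,
    "crop": 0.70,
    "full_low": 0.87,
    "preserve": 0.00,
}

def estimated_tokens(base_tokens: int, technique: str) -> int:
    """Estimate how many image tokens remain after applying a technique."""
    return round(base_tokens * (1.0 - SAVINGS[technique]))

# A hypothetical 1,000-token image under each technique:
for name in SAVINGS:
    print(f"{name}: ~{estimated_tokens(1000, name)} tokens")
```

This is only a back-of-the-envelope estimate; actual savings depend on the image and the downstream vision-language model's tokenizer.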
## Training Data

- **Base examples**: 145 human-written queries
- **Expanded dataset**: 1,157 examples (via template expansion and synonym substitution)
- **Split**: 85% train / 15% validation

## Performance

- **Validation accuracy**: 93.7%
- **Model size**: ~128 MB

### Per-Class Performance

| Class | Precision | Recall | F1-Score |
|-------|-----------|--------|----------|
| transcode | 0.95 | 0.92 | 0.93 |
| crop | 0.92 | 0.97 | 0.94 |
| preserve | 0.97 | 0.90 | 0.93 |
| full_low | 0.89 | 0.96 | 0.92 |

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model and tokenizer
model_id = "chopratejas/technique-router"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Classify a query
query = "What brand is the TV?"
inputs = tokenizer(query, return_tensors="pt", truncation=True, padding=True)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    pred_id = torch.argmax(probs, dim=-1).item()
    confidence = probs[0][pred_id].item()

technique = model.config.id2label[pred_id]
print(f"{query} -> {technique} ({confidence:.0%})")
# Output: What brand is the TV? -> preserve (73%)
```
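Since the snippet above also yields a confidence score, a caller can fall back to the lossless `preserve` path when the classifier is unsure. A minimal sketch (the 0.5 threshold and the choice of fallback are assumptions, not part of the model):

```python
def route_with_fallback(technique: str, confidence: float,
                        threshold: float = 0.5) -> str:
    """Return the predicted technique, falling back to the lossless
    `preserve` path when the classifier's confidence is below threshold.
    The threshold value here is an illustrative assumption."""
    return technique if confidence >= threshold else "preserve"

print(route_with_fallback("crop", 0.91))  # confident: keep the prediction
print(route_with_fallback("crop", 0.32))  # low confidence: fall back
```

A fallback like this trades some token savings for safety: `preserve` never degrades the image, so a misrouted low-confidence query costs tokens rather than accuracy.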
## With Headroom SDK

```python
from headroom.image import TrainedRouter

router = TrainedRouter()
decision = router.classify(image_bytes, "What brand is the TV?")
print(decision.technique)  # Technique.PRESERVE
```
## Intended Use

This model is designed for:
- Routing image-analysis queries to the optimal compression technique
- Reducing token usage in vision-language model applications
- Enabling cost-effective image understanding at scale

## Limitations

- English-language queries only
- Optimized for common image-understanding queries
- May not generalize well to domain-specific terminology
## Citation

```bibtex
@misc{headroom-technique-router,
  title={Technique Router for Image Token Optimization},
  author={Headroom AI},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/chopratejas/technique-router}
}
```
config.json ADDED
@@ -0,0 +1,37 @@
{
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "dtype": "float32",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "id2label": {
    "0": "transcode",
    "1": "crop",
    "2": "preserve",
    "3": "full_low"
  },
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "label2id": {
    "crop": 1,
    "full_low": 3,
    "preserve": 2,
    "transcode": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "problem_type": "single_label_classification",
  "transformers_version": "4.57.3",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
label_mapping.json ADDED
@@ -0,0 +1,14 @@
{
  "label2id": {
    "transcode": 0,
    "crop": 1,
    "preserve": 2,
    "full_low": 3
  },
  "id2label": {
    "0": "transcode",
    "1": "crop",
    "2": "preserve",
    "3": "full_low"
  }
}
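The mapping above duplicates the `id2label`/`label2id` entries in `config.json`, so the two directions should be exact inverses. A minimal consistency check (the JSON is inlined here to keep the sketch self-contained; in practice you would `json.load` the file itself):

```python
import json

# Inline copy of label_mapping.json for a self-contained check;
# in practice: mapping = json.load(open("label_mapping.json")).
mapping = json.loads("""
{
  "label2id": {"transcode": 0, "crop": 1, "preserve": 2, "full_low": 3},
  "id2label": {"0": "transcode", "1": "crop", "2": "preserve", "3": "full_low"}
}
""")

# id2label keys are strings (JSON object keys), so convert when comparing.
for label, idx in mapping["label2id"].items():
    assert mapping["id2label"][str(idx)] == label, f"mismatch for {label}"
print("label mapping is consistent")
```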
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:45fe8c898db953cd8b62ef06badfdb8e480d8ae59a41c859e45dfc6b50ad11e4
size 133469456
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 1000000000000000019884624838656,
  "never_split": null,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff