chopratejas committed on
Commit
639f08a
·
verified ·
1 Parent(s): b64452d

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,114 @@
---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- text-classification
- image-optimization
- technique-routing
- headroom
datasets:
- custom
metrics:
- accuracy
base_model: microsoft/MiniLM-L12-H384-uncased
pipeline_tag: text-classification
---

# Technique Router (MiniLM)

A fine-tuned MiniLM classifier that routes image queries to the optimal compression technique for the [Headroom SDK](https://github.com/headroom-ai/headroom).

## Model Description

This model classifies natural-language queries about images into one of four optimization techniques:

| Technique | Token Savings | Best For |
|-----------|---------------|----------|
| `transcode` | ~99% | Text extraction, OCR tasks |
| `crop` | 50-90% | Region-specific queries |
| `full_low` | ~87% | General understanding |
| `preserve` | 0% | Fine details, counting |

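The savings column above translates directly into a rough token budget. A minimal sketch of that arithmetic (the ratios come from the table, with `crop` at the midpoint of its 50-90% range; the 1,000-token image size is a hypothetical figure, not from the model):

```python
# Approximate token-savings ratios from the table above. `crop` uses the
# midpoint of its 50-90% range; all values are illustrative.
SAVINGS = {
    "transcode": 0.99,
    "crop": 0.70,
    "full_low": 0.87,
    "preserve": 0.00,
}

def estimated_tokens(base_tokens: int, technique: str) -> int:
    """Estimate how many image tokens remain after applying a technique."""
    return round(base_tokens * (1.0 - SAVINGS[technique]))

# A hypothetical 1,000-token image under each technique:
for name in SAVINGS:
    print(f"{name}: ~{estimated_tokens(1000, name)} tokens")
```

This is only a back-of-the-envelope estimate; actual savings depend on the image and the downstream vision-language model's tokenizer.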
## Training Data

- **Base examples**: 145 human-written queries
- **Expanded dataset**: 1,157 examples (via template expansion and synonym substitution)
- **Split**: 85% train / 15% validation

## Performance

- **Validation accuracy**: 93.7%
- **Model size**: ~128 MB

### Per-Class Performance

| Class | Precision | Recall | F1-Score |
|-------|-----------|--------|----------|
| transcode | 0.95 | 0.92 | 0.93 |
| crop | 0.92 | 0.97 | 0.94 |
| preserve | 0.97 | 0.90 | 0.93 |
| full_low | 0.89 | 0.96 | 0.92 |

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model and tokenizer
model_id = "chopratejas/technique-router"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Classify a query
query = "What brand is the TV?"
inputs = tokenizer(query, return_tensors="pt", truncation=True, padding=True)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    pred_id = torch.argmax(probs, dim=-1).item()
    confidence = probs[0][pred_id].item()

technique = model.config.id2label[pred_id]
print(f"{query} -> {technique} ({confidence:.0%})")
# Output: What brand is the TV? -> preserve (73%)
```
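Since the snippet above also yields a confidence score, a caller can fall back to the lossless `preserve` path when the classifier is unsure. A minimal sketch (the 0.5 threshold and the choice of fallback are assumptions, not part of the model):

```python
def route_with_fallback(technique: str, confidence: float,
                        threshold: float = 0.5) -> str:
    """Return the predicted technique, falling back to the lossless
    `preserve` path when the classifier's confidence is below threshold.
    The threshold value here is an illustrative assumption."""
    return technique if confidence >= threshold else "preserve"

print(route_with_fallback("crop", 0.91))  # confident: keep the prediction
print(route_with_fallback("crop", 0.32))  # low confidence: fall back
```

A fallback like this trades some token savings for safety: `preserve` never degrades the image, so a misrouted low-confidence query costs tokens rather than accuracy.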
## With Headroom SDK

```python
from headroom.image import TrainedRouter

router = TrainedRouter()
decision = router.classify(image_bytes, "What brand is the TV?")
print(decision.technique)  # Technique.PRESERVE
```
## Intended Use

This model is designed for:
- Routing image-analysis queries to the optimal compression technique
- Reducing token usage in vision-language model applications
- Enabling cost-effective image understanding at scale

## Limitations

- English-language queries only
- Optimized for common image-understanding queries
- May not generalize well to domain-specific terminology
## Citation

```bibtex
@misc{headroom-technique-router,
  title={Technique Router for Image Token Optimization},
  author={Headroom AI},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/chopratejas/technique-router}
}
```
config.json ADDED
@@ -0,0 +1,37 @@
{
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "dtype": "float32",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "id2label": {
    "0": "transcode",
    "1": "crop",
    "2": "preserve",
    "3": "full_low"
  },
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "label2id": {
    "crop": 1,
    "full_low": 3,
    "preserve": 2,
    "transcode": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "problem_type": "single_label_classification",
  "transformers_version": "4.57.3",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
label_mapping.json ADDED
@@ -0,0 +1,14 @@
{
  "label2id": {
    "transcode": 0,
    "crop": 1,
    "preserve": 2,
    "full_low": 3
  },
  "id2label": {
    "0": "transcode",
    "1": "crop",
    "2": "preserve",
    "3": "full_low"
  }
}
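The mapping above duplicates the `id2label`/`label2id` entries in `config.json`, so the two directions should be exact inverses. A minimal consistency check (the JSON is inlined here to keep the sketch self-contained; in practice you would `json.load` the file itself):

```python
import json

# Inline copy of label_mapping.json for a self-contained check;
# in practice: mapping = json.load(open("label_mapping.json")).
mapping = json.loads("""
{
  "label2id": {"transcode": 0, "crop": 1, "preserve": 2, "full_low": 3},
  "id2label": {"0": "transcode", "1": "crop", "2": "preserve", "3": "full_low"}
}
""")

# id2label keys are strings (JSON object keys), so convert when comparing.
for label, idx in mapping["label2id"].items():
    assert mapping["id2label"][str(idx)] == label, f"mismatch for {label}"
print("label mapping is consistent")
```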
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:45fe8c898db953cd8b62ef06badfdb8e480d8ae59a41c859e45dfc6b50ad11e4
size 133469456
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 1000000000000000019884624838656,
  "never_split": null,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff