---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- text-classification
- image-optimization
- technique-routing
- headroom
datasets:
- custom
metrics:
- accuracy
base_model: microsoft/MiniLM-L12-H384-uncased
pipeline_tag: text-classification
---

# Technique Router (MiniLM)

A fine-tuned MiniLM classifier that routes image queries to the optimal compression technique for the [Headroom SDK](https://github.com/headroom-ai/headroom).

## Model Description

This model classifies natural language queries about images into one of four optimization techniques:

| Technique | Token Savings | Best For |
|-----------|---------------|----------|
| `transcode` | ~99% | Text extraction, OCR tasks |
| `crop` | 50-90% | Region-specific queries |
| `full_low` | ~87% | General understanding |
| `preserve` | 0% | Fine details, counting |

## Training Data

- **Base examples**: 145 human-written queries
- **Expanded dataset**: 1,157 examples (via template expansion + synonyms)
- **Split**: 85% train, 15% validation
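
The expansion pipeline itself is not published; a minimal sketch of how template expansion with synonym substitution can multiply a small seed set might look like the following. The templates and synonym lists here are illustrative, not the actual training assets:

```python
from itertools import product

# Illustrative templates and synonym lists -- NOT the actual training assets.
templates = [
    "What {noun} is shown in the {region}?",
    "Read the {noun} in the {region}.",
]
synonyms = {
    "noun": ["text", "label", "sign"],
    "region": ["top-left corner", "bottom half", "image"],
}

# Cross every template with every synonym combination.
expanded = [
    t.format(noun=n, region=r)
    for t, (n, r) in product(templates, product(synonyms["noun"], synonyms["region"]))
]
print(len(expanded))  # 2 templates x 3 nouns x 3 regions = 18 variants
```

Applied across many templates and larger synonym sets, this kind of expansion takes a base set of 145 queries to the ~1,200-example range reported above.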

## Performance

- **Validation Accuracy**: 93.7%
- **Model Size**: ~128 MB

### Per-Class Performance

| Class | Precision | Recall | F1-Score |
|-------|-----------|--------|----------|
| transcode | 0.95 | 0.92 | 0.93 |
| crop | 0.92 | 0.97 | 0.94 |
| preserve | 0.97 | 0.90 | 0.93 |
| full_low | 0.89 | 0.96 | 0.92 |
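For a single headline figure, the macro-averaged F1 over the four classes works out as follows:

```python
# Per-class F1 scores taken from the table above.
f1 = {"transcode": 0.93, "crop": 0.94, "preserve": 0.93, "full_low": 0.92}

macro_f1 = sum(f1.values()) / len(f1)
print(f"Macro F1: {macro_f1:.3f}")  # 0.930
```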
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model
model_id = "chopratejas/technique-router"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Classify a query
query = "What brand is the TV?"
inputs = tokenizer(query, return_tensors="pt", truncation=True, padding=True)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    pred_id = torch.argmax(probs, dim=-1).item()
    confidence = probs[0][pred_id].item()

technique = model.config.id2label[pred_id]
print(f"{query} -> {technique} ({confidence:.0%})")
# Output: What brand is the TV? -> preserve (73%)
```
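
In production, a common pattern is to fall back to the safest technique when the classifier is not confident, since `preserve` never discards information. The helper below is a hypothetical sketch on top of the usage example: the threshold value, function name, and fallback choice are assumptions, not part of the model or the Headroom SDK:

```python
def route_with_fallback(technique: str, confidence: float,
                        threshold: float = 0.6,
                        fallback: str = "preserve") -> str:
    """Return the predicted technique, or a lossless fallback
    when the classifier's confidence is below the threshold."""
    return technique if confidence >= threshold else fallback

# With the example prediction above (preserve at 73%), the route stands;
# a low-confidence transcode prediction would be overridden.
print(route_with_fallback("preserve", 0.73))   # preserve
print(route_with_fallback("transcode", 0.41))  # preserve
```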

## With Headroom SDK

```python
from headroom.image import TrainedRouter

router = TrainedRouter()
decision = router.classify(image_bytes, "What brand is the TV?")
print(decision.technique)  # Technique.PRESERVE
```

## Intended Use

This model is designed for:
- Routing image analysis queries to optimal compression techniques
- Reducing token usage in vision-language model applications
- Enabling cost-effective image understanding at scale

## Limitations

- English language only
- Optimized for common image understanding queries
- May not generalize well to domain-specific terminology

## Citation

```bibtex
@misc{headroom-technique-router,
  title={Technique Router for Image Token Optimization},
  author={Headroom AI},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/chopratejas/technique-router}
}
```