---
base_model:
- microsoft/MiniLM-L12-H384-uncased
language:
- en
library_name: transformers
license: apache-2.0
---

# Fine-tuned LoRA Classifier on MiniLM for IAB Multi-Label Classification

This is a fine-tuned LoRA (Low-Rank Adaptation) classifier based on MiniLM (microsoft/MiniLM-L12-H384-uncased), designed for multi-label content classification using the IAB content taxonomy. The model can assign one or more categories to a piece of input text, rather than forcing a single label.

## 🔍 Model Details

### Model Description

This model is based on microsoft/MiniLM-L12-H384-uncased, a compact and efficient transformer optimized for fast inference and a low memory footprint. It has been fine-tuned with LoRA for multi-label classification over 19 IAB categories plus an "inconclusive" fallback class (20 labels in total).

The model predicts any applicable subset of the following labels:

- `inconclusive`
- `animals`
- `arts`
- `autos`
- `business`
- `career`
- `education`
- `fashion`
- `finance`
- `food`
- `government`
- `health`
- `hobbies`
- `home`
- `news`
- `realestate`
- `society`
- `sports`
- `tech`
- `travel`
    
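Since this is a multi-label task, each training target is a multi-hot vector over this label set. A minimal, model-independent sketch of that encoding (the `multi_hot` helper is illustrative, not part of the released code):

```python
# Label order mirrors the list above.
LABELS = [
    "inconclusive", "animals", "arts", "autos", "business", "career",
    "education", "fashion", "finance", "food", "government", "health",
    "hobbies", "home", "news", "realestate", "society", "sports",
    "tech", "travel",
]

def multi_hot(labels):
    """Return a 0/1 vector with a 1 at each position whose label applies."""
    active = set(labels)
    return [1 if label in active else 0 for label in LABELS]

vec = multi_hot(["tech", "news"])
# 20 entries; "news" (index 14) and "tech" (index 18) are set.
```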
Key Configuration:

- **Base Model:** microsoft/MiniLM-L12-H384-uncased
- **Task:** Multi-label content classification
- **Label Count:** 20 (multi-hot vector)
- **Language:** English
- **Fine-tuning Method:** PEFT with LoRA
- **LoRA Config:**
    - `r=16`
    - `lora_alpha=16`
    - `lora_dropout=0.1`
    - `target_modules=["query", "key"]`
- **Developed by:** Mozilla
- **License:** Apache-2.0
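The LoRA settings listed above map onto a `peft` configuration roughly as follows. This is a sketch, not the released training code: `task_type="SEQ_CLS"`, `num_labels`, and `problem_type` are assumptions inferred from the task description.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification

# LoRA hyperparameters as listed in the card.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "key"],
    task_type="SEQ_CLS",  # assumption: sequence-classification head
)

# Assumed base-model setup for multi-label fine-tuning.
base = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/MiniLM-L12-H384-uncased",
    num_labels=20,
    problem_type="multi_label_classification",  # BCE loss over sigmoid logits
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```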

## 📦 Model Sources

- Demo: [Hugging Face Space](https://huggingface.co/spaces/chidamnat2002/iab_content_classifier)


## 📥 Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained("Mozilla/content-multilabel-iab-classifier")
tokenizer = AutoTokenizer.from_pretrained("Mozilla/content-multilabel-iab-classifier")
model.eval()  # disable dropout for deterministic inference

label_list = [
    'inconclusive',
    'animals',
    'arts',
    'autos',
    'business',
    'career',
    'education',
    'fashion',
    'finance',
    'food',
    'government',
    'health',
    'hobbies',
    'home',
    'news',
    'realestate',
    'society',
    'sports',
    'tech',
    'travel'
]
label2id = {label: idx for idx, label in enumerate(label_list)}
id2label = {idx: label for label, idx in label2id.items()}

text = "Discover the latest trends in AI and wearable technology."

with torch.no_grad():
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    outputs = model(**inputs)
    probs = torch.sigmoid(outputs.logits).squeeze().cpu().numpy()
    predicted_labels = [(id2label[i], round(float(p), 3)) for i, p in enumerate(probs) if p >= 0.5]
    print(predicted_labels)
```
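The `0.5` cutoff above is a common default, but multi-label thresholds are often tuned per deployment. A small helper for experimenting with the cutoff, independent of the model (the function name and example probabilities are illustrative, not model output):

```python
def labels_above_threshold(probs, id2label, threshold=0.5):
    """Map per-label sigmoid probabilities to (label, prob) pairs above a cutoff.

    `probs` is any sequence of floats aligned with `id2label`; lowering the
    threshold trades precision for recall.
    """
    return [
        (id2label[i], round(float(p), 3))
        for i, p in enumerate(probs)
        if p >= threshold
    ]

# Example with made-up probabilities over the first five labels:
example_probs = [0.05, 0.10, 0.72, 0.40, 0.91]
example_id2label = {0: "inconclusive", 1: "animals", 2: "arts", 3: "autos", 4: "business"}
picked = labels_above_threshold(example_probs, example_id2label, threshold=0.5)
# picked == [("arts", 0.72), ("business", 0.91)]
```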

## 📖 Citation

If you use this model, please cite it as:
```bibtex
@misc{mozilla_iab_multilabel_lora,
  title       = {Fine-tuned LoRA Classifier on MiniLM for IAB Multi-Label Classification},
  author      = {Mozilla},
  year        = {2025},
  url         = {https://huggingface.co/mozilla/content-multilabel-iab-classifier},
  license     = {Apache-2.0}
}
```