---
base_model:
- microsoft/MiniLM-L12-H384-uncased
language:
- en
library_name: transformers
license: apache-2.0
---

# Fine-tuned LoRA Classifier on MiniLM for IAB Multi-Label Classification

This is a fine-tuned LoRA (Low-Rank Adaptation) classifier based on MiniLM (microsoft/MiniLM-L12-H384-uncased), designed for multi-label content classification using the IAB content taxonomy. The model can assign one or more categories to a piece of input text, rather than forcing a single label.

## 🔍 Model Details

### Model Description

This model is based on microsoft/MiniLM-L12-H384-uncased, a compact and efficient transformer optimized for fast inference and a low memory footprint. It has been fine-tuned with LoRA for multi-label classification over 19 IAB categories plus an "inconclusive" fallback class (20 labels in total).

The model predicts any applicable subset of the following labels:

- `inconclusive`
- `animals`
- `arts`
- `autos`
- `business`
- `career`
- `education`
- `fashion`
- `finance`
- `food`
- `government`
- `health`
- `hobbies`
- `home`
- `news`
- `realestate`
- `society`
- `sports`
- `tech`
- `travel`
    
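Since this is a multi-label task, each training target is a multi-hot vector over this label set. A minimal, model-independent sketch of that encoding (the `multi_hot` helper is illustrative, not part of the released code):

```python
# Label order mirrors the list above.
LABELS = [
    "inconclusive", "animals", "arts", "autos", "business", "career",
    "education", "fashion", "finance", "food", "government", "health",
    "hobbies", "home", "news", "realestate", "society", "sports",
    "tech", "travel",
]

def multi_hot(labels):
    """Return a 0/1 vector with a 1 at each position whose label applies."""
    active = set(labels)
    return [1 if label in active else 0 for label in LABELS]

vec = multi_hot(["tech", "news"])
# 20 entries; "news" (index 14) and "tech" (index 18) are set.
```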
Key Configuration:

- **Base Model:** microsoft/MiniLM-L12-H384-uncased
- **Task:** Multi-label content classification
- **Label Count:** 20 (multi-hot vector)
- **Language:** English
- **Fine-tuning Method:** PEFT with LoRA
- **LoRA Config:**
    - `r=16`
    - `lora_alpha=16`
    - `lora_dropout=0.1`
    - `target_modules=["query", "key"]`
- **Developed by:** Mozilla
- **License:** Apache-2.0
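The LoRA settings listed above map onto a `peft` configuration roughly as follows. This is a sketch, not the released training code: `task_type="SEQ_CLS"`, `num_labels`, and `problem_type` are assumptions inferred from the task description.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification

# LoRA hyperparameters as listed in the card.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "key"],
    task_type="SEQ_CLS",  # assumption: sequence-classification head
)

# Assumed base-model setup for multi-label fine-tuning.
base = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/MiniLM-L12-H384-uncased",
    num_labels=20,
    problem_type="multi_label_classification",  # BCE loss over sigmoid logits
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```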

## 📦 Model Sources

- Demo: [Hugging Face Space](https://huggingface.co/spaces/chidamnat2002/iab_content_classifier)


## 📥 Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained("Mozilla/content-multilabel-iab-classifier")
tokenizer = AutoTokenizer.from_pretrained("Mozilla/content-multilabel-iab-classifier")
model.eval()  # disable dropout for deterministic inference

label_list = [
    'inconclusive',
    'animals',
    'arts',
    'autos',
    'business',
    'career',
    'education',
    'fashion',
    'finance',
    'food',
    'government',
    'health',
    'hobbies',
    'home',
    'news',
    'realestate',
    'society',
    'sports',
    'tech',
    'travel'
]
label2id = {label: idx for idx, label in enumerate(label_list)}
id2label = {idx: label for label, idx in label2id.items()}

text = "Discover the latest trends in AI and wearable technology."

with torch.no_grad():
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    outputs = model(**inputs)
    probs = torch.sigmoid(outputs.logits).squeeze().cpu().numpy()
    predicted_labels = [(id2label[i], round(float(p), 3)) for i, p in enumerate(probs) if p >= 0.5]
    print(predicted_labels)
```
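The `0.5` cutoff above is a common default, but multi-label thresholds are often tuned per deployment. A small helper for experimenting with the cutoff, independent of the model (the function name and example probabilities are illustrative, not model output):

```python
def labels_above_threshold(probs, id2label, threshold=0.5):
    """Map per-label sigmoid probabilities to (label, prob) pairs above a cutoff.

    `probs` is any sequence of floats aligned with `id2label`; lowering the
    threshold trades precision for recall.
    """
    return [
        (id2label[i], round(float(p), 3))
        for i, p in enumerate(probs)
        if p >= threshold
    ]

# Example with made-up probabilities over the first five labels:
example_probs = [0.05, 0.10, 0.72, 0.40, 0.91]
example_id2label = {0: "inconclusive", 1: "animals", 2: "arts", 3: "autos", 4: "business"}
picked = labels_above_threshold(example_probs, example_id2label, threshold=0.5)
# picked == [("arts", 0.72), ("business", 0.91)]
```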

## 📖 Citation

If you use this model, please cite it as:
```bibtex
@misc{mozilla_iab_multilabel_lora,
  title       = {Fine-tuned LoRA Classifier on MiniLM for IAB Multi-Label Classification},
  author      = {Mozilla},
  year        = {2025},
  url         = {https://huggingface.co/mozilla/content-multilabel-iab-classifier},
  license     = {Apache-2.0}
}
```