---
base_model:
- microsoft/MiniLM-L12-H384-uncased
language:
- en
library_name: transformers
license: apache-2.0
---

# Fine-tuned LoRA Classifier on MiniLM for IAB Multi-Label Classification

This is a LoRA (Low-Rank Adaptation) classifier fine-tuned from MiniLM (`microsoft/MiniLM-L12-H384-uncased`) for multi-label content classification with the IAB content taxonomy. The model can assign one or more categories to a single input text, so it suits content classification tasks where one text may span several topics.

## Model Details

### Model Description

This model is based on `microsoft/MiniLM-L12-H384-uncased`, a compact transformer (12 layers, hidden size 384) built for fast inference and a small memory footprint. It was fine-tuned with LoRA for multi-label classification over 20 IAB categories plus an "inconclusive" fallback class.

The model predicts one or more of the following labels:

- `inconclusive`
- `animals`
- `arts`
- `autos`
- `business`
- `career`
- `education`
- `fashion`
- `finance`
- `food`
- `government`
- `health`
- `hobbies`
- `home`
- `news`
- `realestate`
- `society`
- `sports`
- `tech`
- `travel`
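
Because this is a multi-label task, each training target is a multi-hot vector rather than a single class index. The sketch below shows one way to build such a vector from label names and read it back. The helper names `to_multi_hot` and `from_multi_hot` are hypothetical and are not part of the released model or of `transformers`; the label order simply follows the list above.

```python
# Hypothetical helpers: encode a set of label names as a multi-hot vector
# (one slot per label, in the order listed above) and decode it back.
LABELS = [
    "inconclusive", "animals", "arts", "autos", "business", "career",
    "education", "fashion", "finance", "food", "government", "health",
    "hobbies", "home", "news", "realestate", "society", "sports", "tech",
    "travel",
]
LABEL2ID = {label: idx for idx, label in enumerate(LABELS)}


def to_multi_hot(labels):
    """Return a list with 1.0 at the index of every given label, else 0.0."""
    vector = [0.0] * len(LABELS)
    for label in labels:
        vector[LABEL2ID[label]] = 1.0  # raises KeyError for unknown labels
    return vector


def from_multi_hot(vector):
    """Return the label names whose slot is set (value >= 0.5), in list order."""
    return [LABELS[i] for i, value in enumerate(vector) if value >= 0.5]


target = to_multi_hot(["tech", "news"])
print(from_multi_hot(target))  # ['news', 'tech']
```

Decoding returns labels in list order rather than input order, which keeps targets and predictions directly comparable slot by slot.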

### Key Configuration

- **Base Model:** `microsoft/MiniLM-L12-H384-uncased`
- **Task:** Multi-label content classification
- **Label Count:** 21 (multi-hot vector)
- **Language:** English
- **Fine-tuning Method:** PEFT with LoRA
- **LoRA Config:**
  - `r=16`
  - `lora_alpha=16`
  - `lora_dropout=0.1`
  - `target_modules=["query", "key"]`
- **Developed by:** Mozilla
- **License:** Apache-2.0
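
As a rough guide to what the LoRA settings above mean: LoRA freezes each targeted weight matrix $W_0 \in \mathbb{R}^{d \times k}$ and trains a low-rank update $BA$ scaled by $\alpha / r$, which is the default scaling in PEFT's `LoraConfig` when rank-stabilized scaling is off. The parameter count below is a sketch that assumes the `query` and `key` projections are $384 \times 384$ in each of MiniLM's 12 layers, and it counts only the adapter matrices, not the classification head or any other trainable modules.

$$
\begin{aligned}
h &= W_0 x + \frac{\alpha}{r}\, B A x, \qquad A \in \mathbb{R}^{r \times k},\; B \in \mathbb{R}^{d \times r} \\
\frac{\alpha}{r} &= \frac{16}{16} = 1 \\
\text{params per adapted matrix} &= r\,(d + k) = 16 \times (384 + 384) = 12{,}288 \\
\text{total adapter params} &= 12 \text{ layers} \times 2 \text{ modules} \times 12{,}288 = 294{,}912
\end{aligned}
$$

During training, `lora_dropout=0.1` is applied to $x$ on the low-rank path only; the frozen $W_0 x$ term is unaffected.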

## 📦 Model Sources

- **Demo:** [Hugging Face Space](https://huggingface.co/spaces/chidamnat2002/iab_content_classifier)

## 📥 Usage

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "Mozilla/content-multilabel-iab-classifier"
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model.eval()  # make sure dropout is disabled for inference

# Labels in the order of the model's output logits.
label_list = [
    "inconclusive", "animals", "arts", "autos", "business", "career",
    "education", "fashion", "finance", "food", "government", "health",
    "hobbies", "home", "news", "realestate", "society", "sports", "tech",
    "travel",
]
label2id = {label: idx for idx, label in enumerate(label_list)}
id2label = {idx: label for label, idx in label2id.items()}

text = "Discover the latest trends in AI and wearable technology."

inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

# Multi-label: apply an independent sigmoid to each logit, then keep every
# label whose probability reaches the 0.5 threshold.
probs = torch.sigmoid(outputs.logits).squeeze(0).cpu().numpy()
predicted_labels = [
    (id2label[i], round(float(p), 3)) for i, p in enumerate(probs) if p >= 0.5
]
print(predicted_labels)
```
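
The post-processing step of the usage example can be factored into a small, dependency-free helper, which makes it easy to try other thresholds or to sort predictions by confidence. This is a sketch: the function name `decode_logits` is hypothetical, and the default threshold of 0.5 is taken from the usage example, not from any documented tuning. It uses a numerically stable sigmoid so that very large negative logits do not overflow.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decode_logits(logits, labels, threshold=0.5):
    """Hypothetical helper: map one logit per label to (label, probability)
    pairs whose sigmoid probability reaches `threshold`, highest first."""
    if len(logits) != len(labels):
        raise ValueError(f"expected {len(labels)} logits, got {len(logits)}")
    selected = [
        (label, round(_sigmoid(z), 3))
        for label, z in zip(labels, logits)
        if _sigmoid(z) >= threshold
    ]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


# Toy logits for three labels (not real model output).
print(decode_logits([0.4, 2.0, -3.0], ["news", "tech", "travel"]))
print(decode_logits([0.4, 2.0, -3.0], ["news", "tech", "travel"], threshold=0.7))
```

With the model loaded as in the usage example, the same helper could be called as `decode_logits(outputs.logits[0].tolist(), label_list)`. Raising the threshold trades recall for precision, since each label is decided independently.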

## Citation

If you use this model, please cite it as:

```bibtex
@misc{mozilla_iab_multilabel_lora,
  title   = {Fine-tuned LoRA Classifier on MiniLM for IAB Multi-Label Classification},
  author  = {Mozilla},
  year    = {2025},
  url     = {https://huggingface.co/mozilla/content-multilabel-iab-classifier},
  license = {Apache-2.0}
}
```