	---
	base_model:
	- microsoft/MiniLM-L12-H384-uncased
	language:
	- en
	library_name: transformers
	license: apache-2.0
	---

	# Fine-tuned LoRA Classifier on MiniLM for IAB Multi-Label Classification

	This is a LoRA (Low-Rank Adaptation) fine-tune of MiniLM (microsoft/MiniLM-L12-H384-uncased) for multi-label content classification using the IAB content taxonomy. Because the task is multi-label, the model can assign one or more categories to a single input text, which suits content that spans several topics.

	## 🔍 Model Details
	### Model Description

	This model is based on microsoft/MiniLM-L12-H384-uncased, a compact and efficient transformer model optimized for fast inference and low memory footprint. It has been fine-tuned using LoRA for multi-label classification over 20 IAB categories plus an "inconclusive" fallback class.

	For each input, the model predicts one or more applicable labels from the following set:

	- inconclusive
	- animals
	- arts
	- autos
	- business
	- career
	- education
	- fashion
	- finance
	- food
	- government
	- health
	- hobbies
	- home
	- news
	- realestate
	- society
	- sports
	- tech
	- travel
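
To make the multi-hot representation concrete, here is a minimal sketch that encodes a set of label names as a 0/1 vector and decodes it back. It assumes the vector positions follow the listing order above, which is also how the usage snippet builds `label2id`. The helper names `to_multi_hot` and `from_multi_hot` are hypothetical and not part of the model's code.

```python
# Label names copied from the list above; position i in the vector refers to LABELS[i].
LABELS = [
    "inconclusive", "animals", "arts", "autos", "business", "career",
    "education", "fashion", "finance", "food", "government", "health",
    "hobbies", "home", "news", "realestate", "society", "sports",
    "tech", "travel",
]


def to_multi_hot(active_labels):
    """Return a 0/1 list with a 1 at the position of every active label."""
    active = set(active_labels)
    unknown = active - set(LABELS)
    if unknown:
        raise ValueError(f"Unknown labels: {sorted(unknown)}")
    return [1 if label in active else 0 for label in LABELS]


def from_multi_hot(vector):
    """Inverse of to_multi_hot: recover label names from a 0/1 list."""
    return [label for label, flag in zip(LABELS, vector) if flag]


vector = to_multi_hot(["tech", "business"])
print(vector)
print(from_multi_hot(vector))
```

Unlike a one-hot vector, several positions can be set at once, and decoding returns the labels in listing order rather than in the order they were passed in.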

	Key Configuration:

	- Base Model: microsoft/MiniLM-L12-H384-uncased
	- Task: Multi-label content classification
	- Label Count: 21 (multi-hot vector)
	- Language: English
	- Fine-tuning Method: PEFT with LoRA
	- LoRA Config:
	  - `r=16`
	  - `lora_alpha=16`
	  - `lora_dropout=0.1`
	  - `target_modules=["query", "key"]`
	- Developed by: Mozilla
	- License: Apache-2.0
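
As a rough illustration of what this LoRA configuration implies, the sketch below estimates how many adapter parameters are trained. It rests on assumptions not stated in this card: the base model name is read as 12 layers with hidden size 384, the `query` and `key` projections are treated as square 384 x 384 linear layers, and the classification head and bias terms are left out of the count. The scaling factor `lora_alpha / r` is the standard LoRA convention.

```python
# Values taken from the LoRA config above.
R = 16
LORA_ALPHA = 16
TARGET_MODULES = ["query", "key"]

# Assumptions read off the base model name (MiniLM-L12-H384): 12 layers, hidden size 384.
NUM_LAYERS = 12
HIDDEN_SIZE = 384


def lora_params_per_matrix(in_features, out_features, r):
    """LoRA adds A (r x in_features) and B (out_features x r) next to a frozen weight."""
    return r * in_features + out_features * r


# The low-rank update B @ A is multiplied by this factor before being added.
scaling = LORA_ALPHA / R

per_matrix = lora_params_per_matrix(HIDDEN_SIZE, HIDDEN_SIZE, R)
adapted_matrices = NUM_LAYERS * len(TARGET_MODULES)
total_lora_params = per_matrix * adapted_matrices

# Size of one frozen projection weight, for comparison (bias ignored).
frozen_per_matrix = HIDDEN_SIZE * HIDDEN_SIZE

print(f"scaling = {scaling}")
print(f"LoRA params per adapted matrix = {per_matrix}")
print(f"adapted matrices = {adapted_matrices}")
print(f"total LoRA params = {total_lora_params}")
print(f"frozen params per matrix = {frozen_per_matrix}")
```

Under these assumptions each adapted projection gains 12,288 trainable parameters next to 147,456 frozen ones, and with `lora_alpha` equal to `r` the low-rank update is applied unscaled.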

	## 📦 Model Sources
	- Demo: [Hugging Face Space](https://huggingface.co/spaces/chidamnat2002/iab_content_classifier)


	## 📥 Usage
	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	import torch

	model = AutoModelForSequenceClassification.from_pretrained("Mozilla/content-multilabel-iab-classifier")
	tokenizer = AutoTokenizer.from_pretrained("Mozilla/content-multilabel-iab-classifier")

	# Index i of the model output is read as label_list[i].
	label_list = [
	    'inconclusive',
	    'animals',
	    'arts',
	    'autos',
	    'business',
	    'career',
	    'education',
	    'fashion',
	    'finance',
	    'food',
	    'government',
	    'health',
	    'hobbies',
	    'home',
	    'news',
	    'realestate',
	    'society',
	    'sports',
	    'tech',
	    'travel'
	]
	label2id = {label: idx for idx, label in enumerate(label_list)}
	id2label = {idx: label for label, idx in label2id.items()}

	text = "Discover the latest trends in AI and wearable technology."

	# Inference only, so gradient tracking is disabled.
	with torch.no_grad():
	    inputs = tokenizer(text, return_tensors="pt", truncation=True)
	    outputs = model(**inputs)
	    # Sigmoid scores each label independently, so several labels can pass the threshold.
	    probs = torch.sigmoid(outputs.logits).squeeze().cpu().numpy()
	    # Keep every label whose probability is at least 0.5.
	    predicted_labels = [(id2label[i], round(float(p), 3)) for i, p in enumerate(probs) if p >= 0.5]

	print(predicted_labels)
	```
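
The decoding step in the snippet above can be looked at in isolation. The following sketch uses only the standard library and made-up logits for four of the labels (not real model output) to show how independent sigmoid scores and a threshold yield zero, one, or several labels. The `decode` helper and its `threshold` parameter are hypothetical names for illustration.

```python
import math


def sigmoid(x):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode(logits, labels, threshold=0.5):
    """Score each label independently and keep those at or above the threshold."""
    kept = []
    for label, logit in zip(labels, logits):
        p = sigmoid(logit)
        if p >= threshold:
            kept.append((label, round(p, 3)))
    # Highest-scoring labels first.
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Made-up logits for illustration only.
labels = ["inconclusive", "animals", "arts", "autos"]
logits = [-3.0, 2.0, 0.0, -0.5]

print(decode(logits, labels))                 # default threshold of 0.5
print(decode(logits, labels, threshold=0.9))  # stricter threshold
```

Because the scores are not normalized against each other, they do not sum to 1, and a strict threshold can return an empty list. How to treat that case is left to the caller.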

	## 📖 Citation

	If you use this model, please cite it as:
	```bibtex
	@misc{mozilla_iab_multilabel_lora,
	  title = {Fine-tuned LoRA Classifier on MiniLM for IAB Multi-Label Classification},
	  author = {Mozilla},
	  year = {2025},
	  url = {https://huggingface.co/mozilla/content-multilabel-iab-classifier},
	  license = {Apache-2.0}
	}
	```