Theme classification model (multi-label)

This repository contains a fine-tuned BERT model for classifying short texts into community-oriented themes. The model was trained locally and pushed to the Hugging Face Hub.

Model details

Model architecture: bert-base-uncased (fine-tuned)
Problem type: multi-label classification
Labels: mentorship, entrepreneurship, startup success
Training data: train_theme.jsonl (included)
Final evaluation (example run):
- eval_loss: 0.1822
- eval_micro/f1: 1.0
- eval_macro/f1: 1.0
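For readers unfamiliar with the two F1 variants reported above, the following is a minimal sketch of how micro and macro F1 differ for multi-label outputs. It is not the evaluation code used for this model. The function names and the toy data are hypothetical, and treating F1 as 0 when a label has no true or predicted positives is a convention chosen for this sketch.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 here when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(y_true, y_pred):
    """y_true, y_pred: lists of equal-length 0/1 rows, one column per label."""
    n_labels = len(y_true[0])
    counts = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))
    # Macro: average the per-label F1 scores, so every label weighs the same.
    macro = sum(f1_from_counts(*c) for c in counts) / n_labels
    # Micro: pool the counts over all labels, then compute a single F1.
    micro = f1_from_counts(
        sum(c[0] for c in counts),
        sum(c[1] for c in counts),
        sum(c[2] for c in counts),
    )
    return micro, macro

# Toy data (made up): the second label is missed in the third example.
y_true = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 0, 1]]
micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro={micro:.4f} macro={macro:.4f}")
```

In the toy data, a single missed rare label lowers macro F1 more than micro F1, which is why both numbers are usually reported together.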

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned model from the Hugging Face Hub
repo = "4nkh/theme_model"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)
model.eval()  # inference mode (disables dropout)

texts = ["Our co-op paired first-time founders with veteran shop owners to troubleshoot setbacks."]
inputs = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

# No gradients are needed for inference
with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label: apply an independent sigmoid per label, then threshold at 0.5
probs = torch.sigmoid(logits)
preds = (probs >= 0.5).int()
print("probs", probs.numpy(), "preds", preds.numpy())
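The usage code prints raw 0/1 vectors. To get readable theme names, the column indices need to be mapped to labels. Below is a minimal sketch, assuming the output columns follow the order given under Labels (mentorship, entrepreneurship, startup success). That order is an assumption, so check model.config.id2label on the loaded model before relying on it. The helper name decode_predictions and the example logits are hypothetical.

```python
import math

# Assumed label order (hypothetical); verify against model.config.id2label.
LABELS = ["mentorship", "entrepreneurship", "startup success"]

def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def decode_predictions(logits_row, labels=LABELS, threshold=0.5):
    """Return the names of all labels whose probability meets the threshold."""
    probs = [sigmoid(z) for z in logits_row]
    return [name for name, p in zip(labels, probs) if p >= threshold]

# Made-up logits for a single text, for illustration only.
example_logits = [2.0, 0.3, -1.5]
print(decode_predictions(example_logits))
```

With the real model, each row of the logits tensor can be passed in as a plain list, for example decode_predictions(logits[0].tolist()).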

Notes

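The usage example applies a threshold of 0.5 to each label's sigmoid probability. The threshold is not part of the model itself, so you can set a different value per class as needed, for example to trade precision against recall for a given label.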
If you want to re-train or fine-tune further, see train_theme_model.py in this folder.
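The per-class threshold adjustment mentioned in the notes above can be sketched as follows. The threshold values here are hypothetical placeholders, not tuned for this model. In practice they would be chosen on a held-out validation set. The helper name apply_thresholds is also an assumption.

```python
# Hypothetical per-class thresholds; tune these on validation data.
THRESHOLDS = {
    "mentorship": 0.5,
    "entrepreneurship": 0.4,
    "startup success": 0.6,
}

def apply_thresholds(probs, thresholds=THRESHOLDS):
    """Map label -> probability to label -> 0/1 using each label's own threshold."""
    return {label: int(probs[label] >= t) for label, t in thresholds.items()}

# Made-up probabilities for one text, for illustration only.
example_probs = {
    "mentorship": 0.55,
    "entrepreneurship": 0.45,
    "startup success": 0.55,
}
print(apply_thresholds(example_probs))
```

With PyTorch tensors, the same idea works through broadcasting: comparing a probs tensor of shape (batch, 3) against a threshold tensor of shape (3,) applies one threshold per column.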

License

Specify your license here (e.g., Apache-2.0) or remove this section if you prefer a different license.

