Tanaos – Train task specific LLMs without training data, for offline NLP and Text Classification

tanaos-topic-classification-v1: A small but performant topic classification model

This model was created by Tanaos with the Artifex Python library.

This is a topic classification model based on FacebookAI/roberta-base and fine-tuned on a synthetic dataset to classify text into one of 15 topic categories:

Topic                 Description
politics              elections, policies, scandals, ideology.
health                physical health, mental health, fitness, diets, medical advice.
technology            gadgets, software, AI, cybersecurity.
entertainment         movies, TV shows, music, celebrities, streaming platforms.
money_finance         investing, budgeting, crypto, real estate.
relationships_dating  romance, breakups, marriage, family drama.
education_learning    schools, universities, self-study, online courses.
work_careers          job hunting, workplace culture, remote work, career advice.
science               research, space, climate, biology, physics, chemistry and the scientific method.
society_culture       identity, inequality, norms, language, and society.
gaming                video games, esports, hardware, mods, and gaming culture.
lifestyle_hobbies     travel, food, fashion, DIY, productivity systems.
sports                teams, athletes, events, scores, and sports culture.
automotive            cars, motorcycles, reviews, maintenance, and industry news.
other                 miscellaneous topics not covered by the other categories.

How to Use

Via the Artifex library (pip install artifex)

from artifex import Artifex

topic_classification = Artifex().topic_classification

print(topic_classification("What do you think about the latest AI advancements?"))
# >>> [{'label': 'technology', 'score': 0.9910}]

Via the Transformers library

from transformers import pipeline

clf = pipeline("text-classification", model="tanaos/tanaos-topic-classification-v1")

print(clf("What do you think about the latest AI advancements?"))
# >>> [{'label': 'technology', 'score': 0.9910}]
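
Both call styles above return a one-item list of dictionaries with a label and a score. A minimal sketch of post-processing that output, assuming you want low-confidence predictions routed to the `other` category. The helper name `label_or_other` and the 0.5 threshold are illustrative choices, not part of Artifex or Transformers:

```python
from typing import Callable, Dict, List

# Shape of one prediction, e.g. {"label": "technology", "score": 0.99}
Prediction = Dict[str, object]


def label_or_other(
    classify: Callable[[str], List[Prediction]],
    text: str,
    threshold: float = 0.5,  # illustrative cut-off (assumption); tune on your own data
) -> str:
    """Return the top label, or "other" when its score is below the threshold."""
    top = classify(text)[0]  # the classifier returns a one-item list per input string
    if float(top["score"]) >= threshold:
        return str(top["label"])
    return "other"
```

With the Transformers pipeline above, this would be called as `label_or_other(clf, "some text")`.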

Model Description

  • Base model: FacebookAI/roberta-base
  • Task: Text classification (topic classification)
  • Languages: English
  • Fine-tuning data: A synthetic, custom dataset of 10,000 utterances, each belonging to one of 15 different topic categories.

Training Details

This model was trained using the Artifex Python library

pip install artifex

by providing the following instructions and generating 10,000 synthetic training samples:

from artifex import Artifex


topic_classification = Artifex().topic_classification

topic_classification.train(
    domain="general",
    classes={
        "politics": "elections, policies, scandals, ideology",
        "health": "physical health, mental health, fitness, diets, medical advice.",
        "technology": "gadgets, software, AI, cybersecurity.",
        "entertainment": "movies, TV shows, music, celebrities, streaming platforms.",
        "money_finance": "investing, budgeting, crypto, real estate.",
        "relationships_dating": "romance, breakups, marriage, family drama.",
        "education_learning": "schools, universities, self-study, online courses.",
        "work_careers": "job hunting, workplace culture, remote work, career advice.",
        "science": "research, space, climate, biology, physics, chemistry and the scientific method.",
        "society_culture": "identity, inequality, norms, language, and society.",
        "gaming": "video games, esports, hardware, mods, and gaming culture.",
        "lifestyle_hobbies": "travel, food, fashion, DIY, productivity systems.",
        "sports": "teams, athletes, events, scores, and sports culture.",
        "automotive": "cars, motorcycles, reviews, maintenance, and industry news.",
        "other": "miscellaneous topics not covered by the other categories."
    },
    num_samples=10000
)

Intended Uses

This model is intended to:

  • Classify conversations, reviews, articles, or any text into one of the predefined topic categories.
  • Be used in applications such as chatbots, content categorization, and sentiment analysis.
  • Serve as a lightweight alternative for topic classification tasks.
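
For content categorization, one common pattern is to tally predicted topics over a batch of texts. The following is a rough sketch, assuming a classifier callable with the same output shape as the usage examples (a one-item list with a `label` key). The function name `topic_counts` is hypothetical:

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List


def topic_counts(
    classify: Callable[[str], List[Dict[str, object]]],
    texts: Iterable[str],
) -> Counter:
    """Count how many texts fall under each predicted topic label."""
    # classify(text)[0] is the top prediction for a single input string
    return Counter(str(classify(text)[0]["label"]) for text in texts)
```

Calling `topic_counts(clf, documents).most_common(3)` would then list the three most frequent topics in `documents`.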

Not intended for:

  • Use cases requiring extremely high accuracy or domain-specific knowledge without further fine-tuning.
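
Before relying on the model in a specific domain, it may help to measure accuracy on a small hand-labelled sample. A minimal sketch, assuming a classifier callable with the output shape shown in the usage examples. The function name `accuracy` and the `(text, expected_label)` pair format are illustrative:

```python
from typing import Callable, Dict, Iterable, List, Tuple


def accuracy(
    classify: Callable[[str], List[Dict[str, object]]],
    labelled: Iterable[Tuple[str, str]],
) -> float:
    """Fraction of (text, expected_label) pairs whose top prediction matches."""
    pairs = list(labelled)
    if not pairs:
        raise ValueError("need at least one labelled example")
    correct = sum(1 for text, gold in pairs if classify(text)[0]["label"] == gold)
    return correct / len(pairs)
```

A low score on such a sample suggests further fine-tuning on domain data is needed before deployment.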