Techpto Transformer

RoBERTa multi-label classifier for detecting whether X/Twitter posts or profile text are crypto-related, tech-related, both, or neither.

This is the V5B transformer checkpoint trained for the Techpto classifier project. It was later distilled into techpto-classifier, a faster hashed linear model used for full-corpus scanning.

Labels

crypto
tech

The model uses sigmoid probabilities, not softmax. A text can match neither label, one label, or both labels.
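The sketch below shows why sigmoid suits this setup. It applies an independent sigmoid to each logit and compares the result with a softmax over the same logits. The logit values are invented for illustration, not real model outputs. The 0.5 cutoff is also only illustrative; the card recommends label-specific thresholds.

```python
import math

LABELS = ["crypto", "tech"]

def sigmoid(x: float) -> float:
    """Map a single logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs: list[float]) -> list[float]:
    """Normalize all logits jointly; the outputs compete and sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits: list[float], threshold: float = 0.5) -> set[str]:
    """Multi-label decision: keep every label whose sigmoid probability clears the threshold."""
    return {label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold}

# Hypothetical logits, not real model outputs.
examples = {
    "both": [3.0, 2.5],
    "crypto only": [2.0, -3.0],
    "neither": [-4.0, -3.5],
}

for name, logits in examples.items():
    sig = [round(sigmoid(z), 3) for z in logits]
    soft = [round(p, 3) for p in softmax(logits)]
    print(f"{name}: sigmoid={sig} softmax={soft} predicted={predict(logits)}")
```

Under softmax, the "neither" text still gets about 0.622 for tech, because the two scores must sum to 1. With sigmoid, both probabilities can stay near zero, and both can be high at once.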

Files

model.safetensors: transformer weights.
config.json: RobertaForSequenceClassification config with problem_type = multi_label_classification.
tokenizer.json and tokenizer_config.json: tokenizer files.
metrics.json: full training/eval metrics.
classification_thresholds.json: recommended threshold sets.

Recommended Thresholds

For high precision on the test split:

{
  "crypto": 0.80,
  "tech": 0.86
}

For best F1 on the test split:

{
  "crypto": 0.41,
  "tech": 0.57
}

For higher recall / F2:

{
  "crypto": 0.12,
  "tech": 0.16
}
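The same probabilities can lead to different decisions depending on which threshold set you choose. A minimal sketch follows. The threshold values are copied from the sets above. The set names (`high_precision`, `best_f1`, `high_recall`) and the `apply_thresholds` helper are hypothetical; they are not keys or functions read from classification_thresholds.json.

```python
# Threshold values copied from this card; the set names are hypothetical labels.
THRESHOLD_SETS = {
    "high_precision": {"crypto": 0.80, "tech": 0.86},
    "best_f1": {"crypto": 0.41, "tech": 0.57},
    "high_recall": {"crypto": 0.12, "tech": 0.16},
}

def apply_thresholds(probs: dict[str, float], set_name: str) -> dict[str, bool]:
    """Return a per-label decision: True when the probability meets that label's threshold."""
    thresholds = THRESHOLD_SETS[set_name]
    return {label: probs[label] >= thresholds[label] for label in thresholds}

# Hypothetical sigmoid outputs for one post.
probs = {"crypto": 0.65, "tech": 0.30}
for name in THRESHOLD_SETS:
    print(name, apply_thresholds(probs, name))
# high_precision -> neither label
# best_f1        -> crypto only
# high_recall    -> crypto and tech
```

Lower thresholds trade precision for recall, so the same borderline post moves from "neither" to "both" as you go from the high-precision set to the high-recall set.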

Test Metrics

At the higher-recall thresholds stored in metrics.json:

Exact match accuracy: 0.9071
Micro F1: 0.9200
Macro F1: 0.9099

At the high-precision threshold set:

Exact match accuracy: 0.9305
Micro precision: 0.9704
Micro recall: 0.8873
Micro F1: 0.9270
Macro F1: 0.9186
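The card reports these metrics without defining them. The sketch below assumes the standard multi-label definitions:

- Exact match: every label is correct for a text.
- Micro: true positive, false positive and false negative counts are pooled across labels.
- Macro: the unweighted mean of per-label F1.

It may not be exactly how metrics.json was produced. The toy labels are invented. The last line checks that the harmonic mean of the reported micro precision and recall reproduces the reported micro F1.

```python
def harmonic_f1(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def safe_div(num: int, den: int) -> float:
    """Division that returns 0.0 instead of raising on an empty denominator."""
    return num / den if den else 0.0

def multilabel_metrics(y_true, y_pred, labels=("crypto", "tech")):
    """Exact match, micro precision/recall/F1 and macro F1.

    y_true and y_pred are lists with one set of label names per text.
    """
    pairs = list(zip(y_true, y_pred))
    exact_match = sum(t == p for t, p in pairs) / len(pairs)

    tp_sum = fp_sum = fn_sum = 0
    per_label_f1 = []
    for label in labels:
        tp = sum(label in t and label in p for t, p in pairs)
        fp = sum(label not in t and label in p for t, p in pairs)
        fn = sum(label in t and label not in p for t, p in pairs)
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
        per_label_f1.append(harmonic_f1(safe_div(tp, tp + fp), safe_div(tp, tp + fn)))

    micro_p = safe_div(tp_sum, tp_sum + fp_sum)
    micro_r = safe_div(tp_sum, tp_sum + fn_sum)
    return {
        "exact_match": exact_match,
        "micro_precision": micro_p,
        "micro_recall": micro_r,
        "micro_f1": harmonic_f1(micro_p, micro_r),
        "macro_f1": sum(per_label_f1) / len(per_label_f1),
    }

# Invented toy labels: one set per text.
y_true = [{"crypto"}, {"tech"}, {"crypto", "tech"}, set()]
y_pred = [{"crypto"}, set(), {"crypto", "tech"}, set()]
print(multilabel_metrics(y_true, y_pred))

# Consistency check on the reported high-precision numbers.
print(round(harmonic_f1(0.9704, 0.8873), 4))  # 0.927, matching the reported micro F1 of 0.9270
```

Exact match is the strictest of these metrics because a text counts as wrong if either label is wrong. Micro F1 weights every label decision equally, while macro F1 weights each label equally regardless of how often it occurs.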

Usage

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "pompompur-in/techpto-transformer"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)
model.eval()  # disable dropout for inference

text = "Building a new AI agent workflow for crypto wallet monitoring."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=192)

with torch.no_grad():
    logits = model(**inputs).logits[0]  # one logit per label for this single text
    probs = torch.sigmoid(logits)  # independent per-label probabilities (multi-label, not softmax)

# Output order assumed to be [crypto, tech]; confirm against model.config.id2label.
labels = ["crypto", "tech"]
thresholds = {"crypto": 0.80, "tech": 0.86}  # high-precision threshold set
predictions = {
    label: float(prob) >= thresholds[label]
    for label, prob in zip(labels, probs)
}

print(dict(zip(labels, map(float, probs))))  # raw probabilities
print(predictions)  # True/False per label

Notes

This checkpoint is intended for classification/research workflows over social text. It is not a general-purpose language model.
