Techpto Transformer
RoBERTa multi-label classifier for detecting whether X/Twitter posts or profile text are crypto-related, tech-related, both, or neither.
This is the V5B transformer checkpoint trained for the Techpto classifier project. It was later distilled into the faster techpto-classifier hashed linear model for full-corpus scanning.
Labels
- crypto
- tech
The model uses sigmoid probabilities, not softmax. A text can match neither label, one label, or both labels.
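To illustrate why sigmoid scoring lets a text carry both labels at once, the following minimal sketch compares sigmoid and softmax on one pair of logits. The logit values are invented for illustration and are not outputs of this checkpoint.

```python
import math

def sigmoid(x: float) -> float:
    """Squash a single logit into an independent probability."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs: list[float]) -> list[float]:
    """Normalize logits so the resulting probabilities sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for ["crypto", "tech"]; illustrative values only.
logits = [2.0, 3.0]

sigmoid_probs = [sigmoid(x) for x in logits]
softmax_probs = softmax(logits)

print(sigmoid_probs)  # each label is scored on its own, so both can be high
print(softmax_probs)  # labels compete, so the scores must sum to 1
```

With sigmoid, both labels can clear a threshold together, or neither can. Softmax would force the two labels to share a single unit of probability, which does not fit the "both" and "neither" cases.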
Files
- model.safetensors: transformer weights.
- config.json: RobertaForSequenceClassification config with problem_type = multi_label_classification.
- tokenizer.json and tokenizer_config.json: tokenizer files.
- metrics.json: full training/eval metrics.
- classification_thresholds.json: recommended threshold sets.
Recommended Thresholds
For high precision on the test split:
{
  "crypto": 0.80,
  "tech": 0.86
}
For best F1 on the test split:
{
  "crypto": 0.41,
  "tech": 0.57
}
For higher recall / F2:
{
  "crypto": 0.12,
  "tech": 0.16
}
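As a rough sketch of how the three threshold sets change the decision for the same scores, the snippet below applies each set to one pair of probabilities. The preset names (`high_precision`, `best_f1`, `high_recall`) and the helper `apply_thresholds` are hypothetical; the values are hard-coded from this card because the key layout of classification_thresholds.json is not shown here.

```python
# Threshold values copied from this card; the preset names are hypothetical.
THRESHOLD_SETS = {
    "high_precision": {"crypto": 0.80, "tech": 0.86},
    "best_f1": {"crypto": 0.41, "tech": 0.57},
    "high_recall": {"crypto": 0.12, "tech": 0.16},
}

def apply_thresholds(probs: dict, preset: str) -> dict:
    """Return a per-label boolean decision for the chosen preset."""
    thresholds = THRESHOLD_SETS[preset]
    return {label: probs[label] >= cut for label, cut in thresholds.items()}

# Made-up sigmoid probabilities for one post, for illustration only.
probs = {"crypto": 0.50, "tech": 0.30}

for preset in THRESHOLD_SETS:
    print(preset, apply_thresholds(probs, preset))
```

The same post is labeled neither, crypto only, or both depending on the preset, so the choice should follow whether false positives or false negatives cost more in the downstream workflow.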
Test Metrics
At the higher-recall thresholds stored in metrics.json:
- Exact match accuracy: 0.9071
- Micro F1: 0.9200
- Macro F1: 0.9099
At the high-precision threshold set:
- Exact match accuracy: 0.9305
- Micro precision: 0.9704
- Micro recall: 0.8873
- Micro F1: 0.9270
- Macro F1: 0.9186
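For readers unfamiliar with these metric names, here is a minimal sketch of the standard definitions on toy data. It assumes the usual conventions (exact match requires every label in a row to be correct, micro F1 pools counts across labels, macro F1 averages per-label F1); the card does not state how its own numbers were computed, and the function name `multilabel_metrics` is hypothetical.

```python
def multilabel_metrics(y_true, y_pred):
    """Exact-match accuracy, micro F1, and macro F1 for rows of 0/1 labels."""
    n_labels = len(y_true[0])
    # A row counts as correct only if all of its labels match.
    exact = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

    def f1(tp, fp, fn):
        denom = 2 * tp + fp + fn
        return 2 * tp / denom if denom else 0.0

    # Per-label true positives, false positives, false negatives.
    counts = []
    for j in range(n_labels):
        tp = sum(t[j] == 1 and p[j] == 1 for t, p in zip(y_true, y_pred))
        fp = sum(t[j] == 0 and p[j] == 1 for t, p in zip(y_true, y_pred))
        fn = sum(t[j] == 1 and p[j] == 0 for t, p in zip(y_true, y_pred))
        counts.append((tp, fp, fn))

    # Micro: pool the counts over labels, then compute one F1.
    micro = f1(*(sum(c[i] for c in counts) for i in range(3)))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(*c) for c in counts) / n_labels
    return {"exact_match": exact, "micro_f1": micro, "macro_f1": macro}

# Toy data, columns are [crypto, tech]; not drawn from the test split.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_pred = [[1, 0], [0, 1], [1, 0], [0, 0]]
metrics = multilabel_metrics(y_true, y_pred)
print(metrics)
```

Exact match is the strictest of the three because a single missed label fails the whole row, which is why it can sit below the F1 scores.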
Usage
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "pompompur-in/techpto-transformer"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)
model.eval()  # inference mode: disables dropout

text = "Building a new AI agent workflow for crypto wallet monitoring."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=192)

# Score the single input without tracking gradients.
with torch.no_grad():
    logits = model(**inputs).logits[0]

# Sigmoid gives one independent probability per label.
probs = torch.sigmoid(logits)

labels = ["crypto", "tech"]
thresholds = {"crypto": 0.80, "tech": 0.86}  # high-precision set
predictions = {
    label: float(prob) >= thresholds[label]
    for label, prob in zip(labels, probs)
}

print(dict(zip(labels, map(float, probs))))
print(predictions)
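The per-label booleans can be collapsed into the four outcomes this card describes (crypto, tech, both, neither). The helper below is a small sketch of that mapping; the name `describe` is hypothetical and not part of this repository.

```python
def describe(predictions: dict) -> str:
    """Collapse per-label booleans into crypto / tech / both / neither."""
    crypto = predictions.get("crypto", False)
    tech = predictions.get("tech", False)
    if crypto and tech:
        return "both"
    if crypto:
        return "crypto"
    if tech:
        return "tech"
    return "neither"

# Example input shaped like the `predictions` dict from the usage snippet.
print(describe({"crypto": True, "tech": False}))
```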
Notes
This checkpoint is intended for classification/research workflows over social text. It is not a general-purpose language model.