This is a classification model that scores tweets and profiles by the probability that they are tech- or crypto-related. It was built for a job that fell through. It is a TF-IDF model distilled from a transformer model that I also made; I may upload that one soon.
Techpto Classifier
This repository contains a lightweight production classifier for detecting whether X/Twitter posts and profiles are crypto-related, tech-related, both, or neither.
Files
- text_classifier.json: Rust-compatible hashed logistic-regression classifier.
- model_config.json: labels, expected inputs, and recommended thresholds.
- distill_metrics.json: proxy evaluation metrics from distillation.
- recommended_thresholds_distillation.json: thresholds tuned against the V7 fastText teacher.
- full_run_manifest.json: counts and thresholds from the large full-corpus run.
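To give a feel for what a "hashed logistic-regression classifier" does at inference time, here is a minimal Python sketch. It is not the format of text_classifier.json: the tokenizer (lowercased unigrams plus bigrams), the CRC32 hash, the bucket count `NUM_BUCKETS`, and the helper names are all assumptions made for illustration. The idea is that each token is hashed into a fixed-size sparse vector, and the probability is `sigmoid(w . x + b)`.

```python
import math
import zlib

# Hypothetical feature-space size; the real value (if any) lives in
# text_classifier.json under a field name this sketch does not know.
NUM_BUCKETS = 2 ** 18


def tokenize(text: str) -> list[str]:
    """Lowercased whitespace unigrams plus adjacent-word bigrams (assumed scheme)."""
    words = text.lower().split()
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + bigrams


def hashed_features(text: str, num_buckets: int = NUM_BUCKETS) -> dict[int, float]:
    """Count tokens into hash buckets (CRC32 is a stand-in hash), then L2-normalise."""
    counts: dict[int, float] = {}
    for tok in tokenize(text):
        idx = zlib.crc32(tok.encode("utf-8")) % num_buckets
        counts[idx] = counts.get(idx, 0.0) + 1.0
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {i: v / norm for i, v in counts.items()}


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(text: str, weights: dict[int, float], bias: float) -> float:
    """Hashed logistic regression: sigmoid(w . x + b) over the sparse vector x."""
    x = hashed_features(text)
    z = bias + sum(weights.get(i, 0.0) * v for i, v in x.items())
    return sigmoid(z)


# Toy weights: only the "bitcoin" bucket carries any signal.
toy_weights = {zlib.crc32(b"bitcoin") % NUM_BUCKETS: 4.0}
print(predict_proba("Bitcoin is up today", toy_weights, bias=-2.0))
```

Whatever tokenizer and hash the real weights were trained with must be reproduced exactly by any consumer (Rust or otherwise), or the weights land in the wrong buckets. A TF-IDF variant would additionally scale each bucket count by an IDF weight before normalising.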
Recommended Thresholds
The high-precision full-corpus run used:
{
"post_crypto": 0.85,
"post_tech": 0.90,
"profile_crypto": 0.90,
"profile_tech": 0.99
}
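As a rough illustration of how these thresholds could turn per-label probabilities into the four outcomes (crypto, tech, both, neither), here is a minimal Python sketch. It assumes the model yields independent crypto and tech probabilities and that a score counts as a match when it is at or above its threshold; the function name `assign_label` is hypothetical, and model_config.json is the authority on the real behavior.

```python
# High-precision thresholds copied from the section above.
HIGH_PRECISION = {
    "post_crypto": 0.85,
    "post_tech": 0.90,
    "profile_crypto": 0.90,
    "profile_tech": 0.99,
}


def assign_label(kind: str, p_crypto: float, p_tech: float,
                 thresholds: dict[str, float] = HIGH_PRECISION) -> str:
    """Return 'crypto', 'tech', 'both', or 'neither' for a post or profile."""
    if kind not in ("post", "profile"):
        raise ValueError(f"kind must be 'post' or 'profile', got {kind!r}")
    is_crypto = p_crypto >= thresholds[f"{kind}_crypto"]
    is_tech = p_tech >= thresholds[f"{kind}_tech"]
    if is_crypto and is_tech:
        return "both"
    if is_crypto:
        return "crypto"
    if is_tech:
        return "tech"
    return "neither"


print(assign_label("post", 0.87, 0.95))     # both
print(assign_label("profile", 0.95, 0.98))  # crypto: 0.98 is below profile_tech 0.99
```

Keeping the post and profile thresholds separate matters because the same probability can mean different things for a short post and for a profile bio; note how strict profile_tech is compared with the others.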
The original distillation-tuned thresholds were:
{
"post_crypto": 0.58,
"post_tech": 0.44,
"profile_crypto": 0.34,
"profile_tech": 0.38
}
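The README does not say how these distillation thresholds were chosen. One common approach, shown here purely as an assumption, is to sweep candidate thresholds and keep the one whose decisions best agree with the teacher, measured by F1. The sketch assumes the teacher outputs have already been turned into True/False labels; `f1_at` and `tune_threshold` are hypothetical names.

```python
def f1_at(threshold: float, student_probs: list[float], teacher_labels: list[bool]) -> float:
    """F1 of the student's thresholded decisions, treating teacher labels as ground truth."""
    tp = fp = fn = 0
    for p, y in zip(student_probs, teacher_labels):
        pred = p >= threshold
        if pred and y:
            tp += 1
        elif pred:
            fp += 1
        elif y:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(student_probs: list[float], teacher_labels: list[bool],
                   step: float = 0.01) -> float:
    """Sweep step, 2*step, ... below 1.0 and keep the best-F1 threshold (lowest wins ties)."""
    n = int(round(1 / step))
    candidates = [round(i * step, 4) for i in range(1, n)]
    return max(candidates, key=lambda t: f1_at(t, student_probs, teacher_labels))


probs = [0.10, 0.30, 0.60, 0.80, 0.90]
teacher = [False, False, True, True, True]
print(tune_threshold(probs, teacher))  # 0.31
```

An F1-style criterion tends to produce moderate thresholds like the ones above; a high-precision set would instead come from picking the lowest threshold whose precision reaches a chosen target.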
Full-Corpus Run
Using the high-precision thresholds:
- Posts scanned: 928,484,069
- Post matches: 7,728,133
- Profiles scanned: 2,667,815,773
- Profile matches: 7,915,096
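The raw counts are easier to compare as match rates. This short Python calculation uses only the numbers listed above.

```python
# Counts copied from the full-corpus run above.
posts_scanned = 928_484_069
post_matches = 7_728_133
profiles_scanned = 2_667_815_773
profile_matches = 7_915_096

post_rate = post_matches / posts_scanned
profile_rate = profile_matches / profiles_scanned

print(f"post match rate:    {post_rate:.2%}")     # 0.83%
print(f"profile match rate: {profile_rate:.2%}")  # 0.30%
```

So roughly 0.83% of scanned posts and 0.30% of scanned profiles cleared the high-precision thresholds, even though the absolute match counts are similar.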
One corrupt post shard was skipped and is listed in full_run_manifest.json.
Notes
This is not a standard Transformers checkpoint. It is a compact hashed-feature linear classifier intended for very high-throughput local scanning. Metrics in distill_metrics.json are proxy metrics against teacher/weak labels rather than a final human-labeled benchmark.
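The README does not describe how the student was fit to the teacher. A common distillation recipe for a linear student, sketched here as an assumption rather than the method actually used, is to train logistic regression directly on the teacher's probabilities (soft labels) with binary cross-entropy. The bucket ids, teacher probabilities, learning rate, and function names below are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(x: dict[int, float], weights: dict[int, float], bias: float) -> float:
    """Linear score w . x + b over a sparse hashed feature vector."""
    return bias + sum(weights.get(i, 0.0) * v for i, v in x.items())


def distill(examples: list[tuple[dict[int, float], float]],
            epochs: int = 200, lr: float = 0.5) -> tuple[dict[int, float], float]:
    """Fit a logistic-regression student to teacher probabilities with plain SGD.

    Binary cross-entropy against a soft target q has gradient (p - q) * x,
    so the student is pulled toward the teacher's probability, not a hard 0/1.
    """
    weights: dict[int, float] = {}
    bias = 0.0
    for _ in range(epochs):
        for x, q in examples:
            err = sigmoid(logit(x, weights, bias)) - q
            for i, v in x.items():
                weights[i] = weights.get(i, 0.0) - lr * err * v
            bias -= lr * err
    return weights, bias


# Two toy examples: (sparse features, teacher probability).
examples = [({0: 1.0}, 0.9), ({1: 1.0}, 0.1)]
w, b = distill(examples)
for x, q in examples:
    print(q, round(sigmoid(logit(x, w, b)), 3))
```

Because the student only learns to imitate the teacher, agreement with the teacher is exactly what proxy metrics of this kind measure; it says nothing about accuracy against human labels.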