metadata
license: mit
tags:
  - text-classification
  - crypto
  - technology
  - twitter
  - x
  - fasttext-distillation

This is a classification model that ranks tweets and profiles by the probability that they are tech- or crypto-related. It was created for a job that fell short. It is a TF-IDF model, distilled from a transformer model that I also made. I may upload that one soon.

Techpto Classifier

This repository contains a lightweight production classifier for detecting whether X/Twitter posts and profiles are crypto-related, tech-related, both, or neither.

Files

  • text_classifier.json: Rust-compatible hashed logistic-regression classifier.
  • model_config.json: labels, expected inputs, and recommended thresholds.
  • distill_metrics.json: proxy evaluation metrics from distillation.
  • recommended_thresholds_distillation.json: thresholds tuned against the V7 fastText teacher.
  • full_run_manifest.json: counts and thresholds from the large full-corpus run.

Recommended Thresholds

The high-precision full-corpus run used:

{
  "post_crypto": 0.85,
  "post_tech": 0.90,
  "profile_crypto": 0.90,
  "profile_tech": 0.99
}

The original distillation-tuned thresholds were:

{
  "post_crypto": 0.58,
  "post_tech": 0.44,
  "profile_crypto": 0.34,
  "profile_tech": 0.38
}
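
To make the difference between the two threshold sets concrete, here is a minimal sketch of how per-class scores might be turned into one of the four outcomes (crypto, tech, both, neither). The threshold values are copied from above. The `label` function, the shape of the `scores` dict, and the use of `>=` for the comparison are assumptions for illustration, not the code used in the actual run.

```python
# Threshold values are copied from this README; everything else is illustrative.
HIGH_PRECISION = {
    "post_crypto": 0.85,
    "post_tech": 0.90,
    "profile_crypto": 0.90,
    "profile_tech": 0.99,
}
DISTILLATION_TUNED = {
    "post_crypto": 0.58,
    "post_tech": 0.44,
    "profile_crypto": 0.34,
    "profile_tech": 0.38,
}


def label(kind, scores, thresholds):
    """Map per-class probabilities to 'crypto', 'tech', 'both', or 'neither'.

    kind is 'post' or 'profile'. scores is a hypothetical dict such as
    {'crypto': 0.87, 'tech': 0.50}; the real scorer's output format may differ.
    """
    is_crypto = scores["crypto"] >= thresholds[f"{kind}_crypto"]
    is_tech = scores["tech"] >= thresholds[f"{kind}_tech"]
    if is_crypto and is_tech:
        return "both"
    if is_crypto:
        return "crypto"
    if is_tech:
        return "tech"
    return "neither"


example = {"crypto": 0.87, "tech": 0.50}
print(label("post", example, HIGH_PRECISION))      # crypto
print(label("post", example, DISTILLATION_TUNED))  # both
```

The same post is labeled differently under the two sets: the high-precision thresholds drop the borderline tech score, while the distillation-tuned thresholds keep it.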

Full-Corpus Run

Using the high-precision thresholds:

  • Posts scanned: 928,484,069
  • Post matches: 7,728,133
  • Profiles scanned: 2,667,815,773
  • Profile matches: 7,915,096

One corrupt post shard was skipped and is listed in full_run_manifest.json.
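
For a sense of how selective the high-precision thresholds are, the match rates can be worked out from the counts above. This is plain arithmetic on the reported numbers, which come out to roughly 0.8% of posts and 0.3% of profiles. The variable names are made up for this example.

```python
# Counts copied from the Full-Corpus Run section of this README.
posts_scanned = 928_484_069
post_matches = 7_728_133
profiles_scanned = 2_667_815_773
profile_matches = 7_915_096

# Fraction of scanned items that cleared the high-precision thresholds.
post_rate = post_matches / posts_scanned
profile_rate = profile_matches / profiles_scanned

print(f"post match rate:    {post_rate:.3%}")
print(f"profile match rate: {profile_rate:.3%}")
```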

Notes

This is not a standard Transformers checkpoint. It is a compact hashed-feature linear classifier intended for very high-throughput local scanning. Metrics in distill_metrics.json are proxy metrics against teacher/weak labels rather than a final human-labeled benchmark.
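
For readers unfamiliar with hashed-feature linear classifiers, the following is a rough sketch of the general technique: tokens are hashed into a fixed number of buckets and a logistic regression is applied over the bucket counts. It does not read text_classifier.json and does not reproduce this model's scoring. The bucket count, the CRC32 hash, the unigram-plus-bigram tokenization, the raw counts (rather than TF-IDF weighting), and the toy weights are all assumptions chosen for illustration.

```python
import math
import zlib

NUM_BUCKETS = 2 ** 18  # hypothetical feature-space size


def hashed_features(text, num_buckets=NUM_BUCKETS):
    """Count word unigrams and bigrams into hash buckets (the hashing trick)."""
    tokens = text.lower().split()
    grams = tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    counts = {}
    for gram in grams:
        bucket = zlib.crc32(gram.encode("utf-8")) % num_buckets
        counts[bucket] = counts.get(bucket, 0) + 1
    return counts


def score(text, weights, bias):
    """Logistic-regression probability computed from sparse hashed features."""
    features = hashed_features(text)
    z = bias + sum(weights.get(b, 0.0) * c for b, c in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Toy weights for illustration only; they are not taken from text_classifier.json.
toy_weights = {
    zlib.crc32(b"bitcoin") % NUM_BUCKETS: 2.5,
    zlib.crc32(b"airdrop") % NUM_BUCKETS: 1.5,
}
toy_bias = -2.0

print(score("bitcoin airdrop tomorrow", toy_weights, toy_bias))
print(score("nice weather today", toy_weights, toy_bias))
```

Because no vocabulary lookup is needed and each score is a single sparse dot product, this style of model is cheap to run over very large corpora and simple to port to other languages such as Rust.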