---
license: mit
language:
- en
tags:
- text-classification
- onnx
- bert
- torrent
- content-classification
base_model: prajjwal1/bert-tiny
pipeline_tag: text-classification
---

# BERT Torrent Classifier

A fine-tuned BERT-tiny model that classifies torrents by name into one of five media types: audio, video, software, book, or other.

## Model Details

- **Base model:** [prajjwal1/bert-tiny](https://huggingface.co/prajjwal1/bert-tiny)
- **Task:** Multi-class text classification
- **Labels:** audio, video, software, book, other
- **Format:** ONNX (with embedded weights)
- **Size:** ~17MB
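
As a rough illustration of how a five-way classifier's raw output turns into one of the labels above, the sketch below applies softmax to the logits and picks the most probable class. The `LABELS` order, `softmax`, and `predict` are assumptions made for this example; the authoritative index-to-label mapping is whatever `config.json` defines, and this is not mimmo's actual code.

```rust
// ASSUMPTION: label order mirrors the list above; check `id2label` in
// config.json for the real index-to-label mapping.
const LABELS: [&str; 5] = ["audio", "video", "software", "book", "other"];

/// Numerically stable softmax: subtract the max logit before exponentiating.
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Hypothetical helper: returns the most probable label and its probability.
fn predict(logits: &[f32; 5]) -> (&'static str, f32) {
    let probs = softmax(logits);
    let (idx, &p) = probs
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .expect("five logits are always present");
    (LABELS[idx], p)
}
```

The returned probability can double as a confidence score, for example to route low-confidence predictions to `other`; whether mimmo does this is not stated here.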

## Training

- **Training data:** ~10k torrent names, labeled by consensus voting across four LLMs
- **LLM ensemble:** qwen2.5:3b, gemma3:4b, mistral:7b, qwen3-coder:30b
- **Consensus rules:** 4-agree = high confidence, 3v1 = majority vote, 2v2 = discarded
- **Accuracy:** ~92% on held-out test set
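
The consensus rules above can be sketched as a small function over the four votes. This is a minimal illustration, not the labeling pipeline itself: the `Consensus` enum and `consensus` function are hypothetical names, and the handling of splits the rules do not mention (such as 2-1-1) is an assumption, treated here as discarded.

```rust
use std::collections::HashMap;

/// Outcome of combining the four LLM votes for one torrent name.
#[derive(Debug, PartialEq)]
enum Consensus {
    /// All four models agree.
    HighConfidence(String),
    /// Three models agree and one dissents.
    Majority(String),
    /// No label reached three votes. 2v2 is discarded per the rules above;
    /// ASSUMPTION: other splits (e.g. 2-1-1) are discarded as well.
    Discarded,
}

/// Hypothetical helper: tallies the votes and applies the rules.
fn consensus(votes: [&str; 4]) -> Consensus {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for v in votes {
        *counts.entry(v).or_insert(0) += 1;
    }
    // Only the top vote count matters: 4 or 3 yields a label, anything else is dropped.
    match counts.into_iter().max_by_key(|&(_, n)| n) {
        Some((label, 4)) => Consensus::HighConfidence(label.to_string()),
        Some((label, 3)) => Consensus::Majority(label.to_string()),
        _ => Consensus::Discarded,
    }
}
```

For example, `consensus(["video", "video", "video", "audio"])` yields `Consensus::Majority("video")`, while a `["video", "video", "audio", "audio"]` split is discarded.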

## Usage

This model is designed for use with [mimmo](https://github.com/lelloman/mimmo), a Rust library for torrent content classification. The ONNX model is embedded directly in the binary at compile time.

```rust
// The model files are downloaded automatically during the build,
// then embedded into the binary at compile time.
const MODEL_BYTES: &[u8] = include_bytes!("../models/bert/model_embedded.onnx");
const TOKENIZER_JSON: &str = include_str!("../models/bert/tokenizer.json");
```

## Performance

- Inference: under 10 ms per sample on CPU
- Used as the ML fallback when pattern matching is inconclusive
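
The fallback arrangement can be sketched as follows: try cheap pattern rules first and call the model only when they return nothing. Everything here is hypothetical and for illustration only: `MediaType`, `match_pattern`, `classify`, and the extension rules are assumptions, not mimmo's actual API or heuristics, and the model is stood in for by a closure.

```rust
/// The five labels listed under Model Details.
#[derive(Debug, Clone, Copy, PartialEq)]
enum MediaType {
    Audio,
    Video,
    Software,
    Book,
    Other,
}

/// Hypothetical pattern matcher: answers only when a file extension is unambiguous.
/// ASSUMPTION: these extension rules are illustrative, not the library's real ones.
fn match_pattern(name: &str) -> Option<MediaType> {
    let lower = name.to_lowercase();
    if lower.ends_with(".mkv") || lower.ends_with(".mp4") {
        Some(MediaType::Video)
    } else if lower.ends_with(".flac") || lower.ends_with(".mp3") {
        Some(MediaType::Audio)
    } else if lower.ends_with(".epub") {
        Some(MediaType::Book)
    } else {
        None // inconclusive: defer to the model
    }
}

/// Runs the pattern matcher first and calls the ML fallback only when needed.
fn classify(name: &str, ml_fallback: impl Fn(&str) -> MediaType) -> MediaType {
    match_pattern(name).unwrap_or_else(|| ml_fallback(name))
}
```

Passing the model as a closure keeps the sketch independent of any ONNX runtime; because `unwrap_or_else` is lazy, the inference cost is paid only for names the patterns cannot resolve.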

## Files

- `model_embedded.onnx` - ONNX model with embedded weights
- `tokenizer.json` - Hugging Face tokenizer
- `vocab.txt` - Vocabulary file
- `config.json` - Model configuration

## License

MIT