---
language: en
license: mit
tags:
- sdf
- classification
- qwen2.5
- gguf
- content-type
- web-content
base_model: Qwen/Qwen2.5-1.5B-Instruct
pipeline_tag: text-generation
---

# SDF Classify

Content type classifier for the [SDF Protocol](https://sdfprotocol.org). Fine-tuned from Qwen2.5-1.5B-Instruct using QLoRA.

## Purpose

Classifies web content into SDF's hierarchical type system: 10 parent types and 50+ subtypes (e.g., `article.news`, `commerce.product`, `documentation.api_docs`).

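
The example labels above follow a `parent.subtype` pattern. As a minimal sketch, downstream code could split a predicted label into its two levels as shown below. The `SdfType` and `parse_sdf_type` names are hypothetical, and the handling of parent-only labels is an assumption, not something the SDF specification is known to define.

```python
from typing import NamedTuple, Optional


class SdfType(NamedTuple):
    """Hypothetical container for a two-level SDF type label."""
    parent: str
    subtype: Optional[str]


def parse_sdf_type(label: str) -> SdfType:
    """Split a label such as 'article.news' into (parent, subtype).

    Assumption: labels use a single dot between parent and subtype,
    as in the examples in this README. A label without a dot is
    treated as parent-only.
    """
    cleaned = label.strip()
    parent, _, subtype = cleaned.partition(".")
    if not parent:
        raise ValueError(f"missing parent type in label: {label!r}")
    return SdfType(parent, subtype or None)


if __name__ == "__main__":
    for raw in ("article.news", "commerce.product", "documentation.api_docs"):
        print(parse_sdf_type(raw))
```

Splitting on the first dot only keeps the function usable if a subtype ever contains further dots, at the cost of not validating the label against the actual type list.
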
## Training

- **Base model**: Qwen2.5-1.5B-Instruct
- **Method**: QLoRA (rank 32, alpha 64, dropout 0.05)
- **Training data**: 2,335 classified web documents
- **Accuracy**: 95.2% exact type match

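
To make the accuracy figure concrete, here is a sketch of how an exact-match metric is commonly computed. It assumes "exact type match" means the predicted label string equals the reference label in full (parent and subtype); the evaluation script actually used is not part of this README, and `exact_match_accuracy` is a hypothetical helper.

```python
from typing import Sequence


def exact_match_accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Return the fraction of predictions identical to their reference label.

    Assumption: a prediction counts only if the whole label string
    (for example 'article.news') matches the reference exactly.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty evaluation set")
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)


if __name__ == "__main__":
    # Toy data for illustration only, not the real evaluation set.
    predicted = ["article.news", "commerce.product", "article.news", "documentation.api_docs"]
    expected = ["article.news", "commerce.product", "commerce.product", "documentation.api_docs"]
    print(f"{exact_match_accuracy(predicted, expected):.1%}")
```
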
## Files

| File | Size | Description |
|------|------|-------------|
| `sdf-classify-Qwen2.5-1.5B-Instruct-Q4_K_M.gguf` | 941 MB | Quantized (Q4_K_M); recommended for deployment |
| `sdf-classify-Qwen2.5-1.5B-Instruct-f16.gguf` | 2.9 GB | Unquantized (f16, half precision) |
| `Modelfile` | - | Ollama import configuration |

## Usage with Ollama

```bash
# Download the Q4_K_M file, then:
ollama create sdf-classify -f Modelfile
```

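
Once the model has been created, it can be queried through Ollama's local HTTP API. The sketch below assumes a default Ollama install listening on `localhost:11434` and the model name `sdf-classify` from the `ollama create` command. It also assumes the raw page text can be sent as the prompt; the prompt format the model expects is set by the `Modelfile` and is not documented here, so `build_payload` and `classify` are hypothetical helpers to adapt as needed.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_NAME = "sdf-classify"  # name passed to `ollama create`


def build_payload(text: str, model: str = MODEL_NAME) -> dict:
    """Build the JSON body for a non-streaming generate request.

    Assumption: the page text is sent unchanged as the prompt.
    """
    return {"model": model, "prompt": text, "stream": False}


def classify(text: str, url: str = OLLAMA_URL, timeout: float = 60.0) -> str:
    """Send the text to the local Ollama server and return the raw model reply."""
    body = json.dumps(build_payload(text)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    # With "stream": False, Ollama returns one JSON object whose
    # "response" field holds the generated text.
    return reply["response"].strip()


if __name__ == "__main__":
    print(classify("Acme Widget 3000. Price: $49.99. Add to cart."))
```

The reply is returned as plain text; whether it is a bare label such as `commerce.product` or a structured object depends on how the model was trained, so check the output before parsing it.
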
## Part of SDF Protocol

- **Protocol**: [sdfprotocol.org](https://sdfprotocol.org)
- **Specification**: [github.com/sdfprotocol/sdf](https://github.com/sdfprotocol/sdf)
- **Whitepaper**: [DOI 10.5281/zenodo.18559223](https://doi.org/10.5281/zenodo.18559223)
- **Extractor model**: [pranab2050/sdf-extract](https://huggingface.co/pranab2050/sdf-extract)

## Citation

```bibtex
@article{sarkar2026sdf,
  title={Convert Once, Consume Many: SDF for Cacheable, Typed Semantic Extraction from Web Pages},
  author={Sarkar, Pranab},
  year={2026},
  doi={10.5281/zenodo.18559223},
  publisher={Zenodo}
}
```