---
language: en
license: mit
tags:
- sdf
- extraction
- smollm3
- gguf
- structured-data
- web-content
base_model: HuggingFaceTB/SmolLM3-3B
pipeline_tag: text-generation
---
# SDF Extract
Structured data extractor for the [SDF Protocol](https://sdfprotocol.org). Fine-tuned from SmolLM3-3B using QLoRA.
## Purpose
Extracts structured semantic data from web content: entities, claims, relationships, summaries, and type-specific fields. Takes the type classification from [sdf-classify](https://huggingface.co/pranab2050/sdf-classify) as input to condition extraction on the content type.
## Training
- **Base model**: HuggingFaceTB/SmolLM3-3B
- **Method**: QLoRA (rank 32, alpha 64, dropout 0.05)
- **Training data**: 2,335 extracted web documents
- **Accuracy**: 90% exact extraction match across all field types
## Files
| File | Size | Description |
|------|------|-------------|
| `sdf-extract-SmolLM3-3B-Q4_K_M.gguf` | 1.8 GB | Quantized (Q4_K_M), recommended for deployment |
| `sdf-extract-SmolLM3-3B-f16.gguf` | 5.8 GB | Full precision (f16) |
| `Modelfile` | – | Ollama import configuration |
## Usage with Ollama
```bash
# Download sdf-extract-SmolLM3-3B-Q4_K_M.gguf and the Modelfile into the same directory, then:
ollama create sdf-extract -f Modelfile
```
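Once the model is imported, it can be served through Ollama's local REST API. The sketch below is a minimal example, assuming a default Ollama server on `localhost:11434`; the prompt template (prefixing the content with its type from sdf-classify) is an illustrative assumption, not the documented format for this model.

```python
import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"  # default Ollama endpoint

def build_request(content: str, content_type: str, model: str = "sdf-extract") -> dict:
    # Hypothetical prompt shape: condition extraction on the classifier's type label.
    # The model card does not specify the exact template, so adjust as needed.
    prompt = f"Content type: {content_type}\n\n{content}"
    return {"model": model, "prompt": prompt, "stream": False, "format": "json"}

def extract(content: str, content_type: str) -> str:
    # POST to Ollama's /api/generate and return the model's raw response text.
    payload = json.dumps(build_request(content, content_type)).encode("utf-8")
    req = urllib.request.Request(
        f"{OLLAMA_HOST}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # Requires a running Ollama server with the sdf-extract model imported.
    print(extract("ACME Corp announced a new product today.", "news_article"))
```

Setting `"format": "json"` asks Ollama to constrain the output to valid JSON, which suits structured extraction; drop it if the model's native output format differs.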
## Part of SDF Protocol
- **Protocol**: [sdfprotocol.org](https://sdfprotocol.org)
- **Specification**: [github.com/sdfprotocol/sdf](https://github.com/sdfprotocol/sdf)
- **Whitepaper**: [DOI 10.5281/zenodo.18559223](https://doi.org/10.5281/zenodo.18559223)
- **Classifier model**: [pranab2050/sdf-classify](https://huggingface.co/pranab2050/sdf-classify)
## Citation
```bibtex
@article{sarkar2026sdf,
  title={Convert Once, Consume Many: SDF for Cacheable, Typed Semantic Extraction from Web Pages},
  author={Sarkar, Pranab},
  year={2026},
  doi={10.5281/zenodo.18559223},
  publisher={Zenodo}
}
```