---
language: en
license: mit
tags:
- sdf
- extraction
- smollm3
- gguf
- structured-data
- web-content
base_model: HuggingFaceTB/SmolLM3-3B
pipeline_tag: text-generation
---

# SDF Extract

Structured data extractor for the [SDF Protocol](https://sdfprotocol.org). Fine-tuned from SmolLM3-3B using QLoRA.

## Purpose

Extracts structured semantic data from web content: entities, claims, relationships, summaries, and type-specific fields. Takes the type classification from [sdf-classify](https://huggingface.co/pranab2050/sdf-classify) as input to condition extraction on the content type.

## Training

- **Base model**: HuggingFaceTB/SmolLM3-3B
- **Method**: QLoRA (rank 32, alpha 64, dropout 0.05)
- **Training data**: 2,335 extracted web documents
- **Accuracy**: 90% exact extraction match across all field types

## Files

| File | Size | Description |
|------|------|-------------|
| `sdf-extract-SmolLM3-3B-Q4_K_M.gguf` | 1.8 GB | Quantized (Q4_K_M), recommended for deployment |
| `sdf-extract-SmolLM3-3B-f16.gguf` | 5.8 GB | Full precision (f16) |
| `Modelfile` | - | Ollama import configuration |

## Usage with Ollama

```bash
# Download the Q4_K_M file, then:
ollama create sdf-extract -f Modelfile
```

## Part of SDF Protocol

- **Protocol**: [sdfprotocol.org](https://sdfprotocol.org)
- **Specification**: [github.com/sdfprotocol/sdf](https://github.com/sdfprotocol/sdf)
- **Whitepaper**: [DOI 10.5281/zenodo.18559223](https://doi.org/10.5281/zenodo.18559223)
- **Classifier model**: [pranab2050/sdf-classify](https://huggingface.co/pranab2050/sdf-classify)

## Citation

```bibtex
@article{sarkar2026sdf,
  title={Convert Once, Consume Many: SDF for Cacheable, Typed Semantic Extraction from Web Pages},
  author={Sarkar, Pranab},
  year={2026},
  doi={10.5281/zenodo.18559223},
  publisher={Zenodo}
}
```
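
## Querying the imported model (sketch)

Once `ollama create` has registered `sdf-extract`, the model can be queried through Ollama's local HTTP API. The snippet below is a minimal sketch, not the reference SDF client. It assumes Ollama is serving on its default port (11434). The helper names `build_payload` and `extract`, the prompt layout, and the `article` type label are hypothetical and chosen only for illustration; the actual prompt template is whatever the Modelfile and the SDF specification define.

```python
import json
import urllib.request

# Ollama's default local endpoint for single-turn generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(content: str, content_type: str, model: str = "sdf-extract") -> dict:
    """Build a non-streaming request body for Ollama's /api/generate endpoint."""
    # Assumption: the type label from sdf-classify is passed as a plain-text
    # prefix. The real prompt format may differ; check the Modelfile.
    prompt = f"Content type: {content_type}\n\n{content}"
    return {"model": model, "prompt": prompt, "stream": False}


def extract(content: str, content_type: str, url: str = OLLAMA_URL) -> str:
    """Send the request to a running Ollama server and return the raw model output."""
    body = json.dumps(build_payload(content, content_type)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        # With "stream": False, Ollama replies with a single JSON object whose
        # "response" field holds the generated text.
        return json.loads(response.read())["response"]
```

With the server running, `extract(page_text, "article")` would return the model's output as a string. Parsing that string into entities, claims, and relationships is left to the caller, since the output schema is defined by the SDF specification rather than by this sketch.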