pranab2050 committed on
Commit 60309be · verified · 1 Parent(s): 02fe67d

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +62 -0
README.md ADDED
@@ -0,0 +1,62 @@
+ ---
+ language: en
+ license: mit
+ tags:
+ - sdf
+ - extraction
+ - smollm3
+ - gguf
+ - structured-data
+ - web-content
+ base_model: HuggingFaceTB/SmolLM3-3B
+ pipeline_tag: text-generation
+ ---
+
+ # SDF Extract
+
+ Structured data extractor for the [SDF Protocol](https://sdfprotocol.org). Fine-tuned from SmolLM3-3B using QLoRA.
+
+ ## Purpose
+
+ Extracts structured semantic data from web content: entities, claims, relationships, summaries, and type-specific fields. Takes the type classification from [sdf-classify](https://huggingface.co/pranab2050/sdf-classify) as input to condition extraction on the content type.
+
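The conditioning described in the Purpose section is a two-stage flow: classify first, then extract with the predicted type as an extra input. A minimal sketch, assuming both models are wrapped as plain Python callables; `classify`, `extract`, and `classify_then_extract` are hypothetical stand-ins, not the actual interfaces of sdf-classify or sdf-extract:

```python
from typing import Any, Callable, Dict


def classify_then_extract(
    content: str,
    classify: Callable[[str], str],
    extract: Callable[[str, str], Dict[str, Any]],
) -> Dict[str, Any]:
    """Run the classifier, then pass its label to the extractor."""
    # Stage 1: a hypothetical wrapper around sdf-classify returns a type label.
    content_type = classify(content)
    # Stage 2: a hypothetical wrapper around sdf-extract is conditioned on that label.
    fields = extract(content, content_type)
    # Keep the label next to the extracted fields so callers can see what was used.
    return {**fields, "type": content_type}
```

Passing the two stages in as callables keeps the sketch independent of how each model is actually served.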
+ ## Training
+
+ - **Base model**: HuggingFaceTB/SmolLM3-3B
+ - **Method**: QLoRA (rank 32, alpha 64, dropout 0.05)
+ - **Training data**: 2,335 extracted web documents
+ - **Accuracy**: 90% exact extraction match across all field types
+
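As a rough reading of the QLoRA settings listed under Training: with the standard LoRA convention, the adapter update is scaled by alpha / rank, and each adapted weight matrix with `d_in` inputs and `d_out` outputs gains `rank * (d_in + d_out)` trainable parameters. The sketch below works through that arithmetic. The `LoraSettings` class is a hypothetical helper, and the 2048 x 2048 layer shape is an illustrative assumption, not a value taken from this model card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    """Hyperparameters as stated in the Training section."""
    rank: int = 32
    alpha: int = 64
    dropout: float = 0.05

    @property
    def scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / rank.
        return self.alpha / self.rank

    def adapter_params(self, d_in: int, d_out: int) -> int:
        # Two low-rank factors: A is (rank x d_in), B is (d_out x rank).
        return self.rank * (d_in + d_out)


settings = LoraSettings()
print(settings.scaling)                     # 2.0
print(settings.adapter_params(2048, 2048))  # 131072 (illustrative layer shape)
```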
+ ## Files
+
+ | File | Size | Description |
+ |------|------|-------------|
+ | `sdf-extract-SmolLM3-3B-Q4_K_M.gguf` | 1.8 GB | Quantized (Q4_K_M), recommended for deployment |
+ | `sdf-extract-SmolLM3-3B-f16.gguf` | 5.8 GB | Unquantized 16-bit float (f16) |
+ | `Modelfile` | - | Ollama import configuration |
+
+ ## Usage with Ollama
+
+ ```bash
+ # Download the Q4_K_M file, then:
+ ollama create sdf-extract -f Modelfile
+ ```
+
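Once the model has been created, it can be queried through Ollama's local REST API. A minimal sketch, assuming the Ollama server is running on its default port (11434) and the model was registered under the name `sdf-extract`. The prompt format the model expects is not documented here, so `prompt` is passed through unchanged; `build_generate_request` and `generate` are hypothetical helper names.

```python
import json
import urllib.request

# Ollama's default local endpoint for single-turn generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "sdf-extract") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "sdf-extract") -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Separating request construction from the network call makes the payload easy to inspect without a server running.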
+ ## Part of SDF Protocol
+
+ - **Protocol**: [sdfprotocol.org](https://sdfprotocol.org)
+ - **Specification**: [github.com/sdfprotocol/sdf](https://github.com/sdfprotocol/sdf)
+ - **Whitepaper**: [DOI 10.5281/zenodo.18559223](https://doi.org/10.5281/zenodo.18559223)
+ - **Classifier model**: [pranab2050/sdf-classify](https://huggingface.co/pranab2050/sdf-classify)
+
+ ## Citation
+
+ ```bibtex
+ @article{sarkar2026sdf,
+ title={Convert Once, Consume Many: SDF for Cacheable, Typed Semantic Extraction from Web Pages},
+ author={Sarkar, Pranab},
+ year={2026},
+ doi={10.5281/zenodo.18559223},
+ publisher={Zenodo}
+ }
+ ```