pranab2050 committed on
Commit df6fef7 · verified · 1 Parent(s): 9952575

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +62 -0
README.md ADDED
@@ -0,0 +1,62 @@
+ ---
+ language: en
+ license: mit
+ tags:
+ - sdf
+ - classification
+ - qwen2.5
+ - gguf
+ - content-type
+ - web-content
+ base_model: Qwen/Qwen2.5-1.5B-Instruct
+ pipeline_tag: text-generation
+ ---
+
+ # SDF Classify
+
+ Content type classifier for the [SDF Protocol](https://sdfprotocol.org). Fine-tuned from Qwen2.5-1.5B-Instruct using QLoRA.
+
+ ## Purpose
+
+ Classifies web content into SDF's hierarchical type system: 10 parent types and 50+ subtypes (e.g., `article.news`, `commerce.product`, `documentation.api_docs`).
+
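The `parent.subtype` labels shown above can be split into their two levels with a few lines of Python. This is a minimal sketch, assuming a single dot separates parent and subtype as in the examples; the `SdfType` container and the `parse_sdf_type` helper are hypothetical names used for illustration, not part of the SDF Protocol tooling.

```python
from typing import NamedTuple, Optional


class SdfType(NamedTuple):
    """Hypothetical container for a two-level SDF type label."""
    parent: str
    subtype: Optional[str]


def parse_sdf_type(label: str) -> SdfType:
    """Split a label such as 'article.news' into parent and subtype.

    Assumes the first dot separates the two levels; a label without a
    dot is treated as a parent type with no subtype.
    """
    cleaned = label.strip()
    if not cleaned:
        raise ValueError("empty type label")
    parent, _, subtype = cleaned.partition(".")
    return SdfType(parent=parent, subtype=subtype or None)


if __name__ == "__main__":
    # Labels taken from the examples in the Purpose section.
    for example in ("article.news", "commerce.product", "documentation.api_docs"):
        print(parse_sdf_type(example))
```

Keeping the parent separate makes it easy to fall back to a coarse route (for example, any `article.*` label) when a specific subtype is not handled downstream.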
+ ## Training
+
+ - **Base model**: Qwen2.5-1.5B-Instruct
+ - **Method**: QLoRA (rank 32, alpha 64, dropout 0.05)
+ - **Training data**: 2,335 classified web documents
+ - **Accuracy**: 95.2% exact type match
+
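One way to read the "exact type match" metric is as the share of documents whose predicted label equals the reference label character for character. The sketch below illustrates that reading only; it is an assumption about how the figure was computed, and the label lists in the example are made up for illustration rather than taken from the evaluation set.

```python
from typing import Sequence


def exact_match_accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Return the fraction of predictions identical to their reference label.

    Assumption: a prediction counts only if the full 'parent.subtype'
    string matches, so a correct parent with a wrong subtype is a miss.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("cannot compute accuracy on an empty set")
    hits = sum(p.strip() == r.strip() for p, r in zip(predicted, reference))
    return hits / len(reference)


if __name__ == "__main__":
    # Illustrative labels only; the third prediction is a miss.
    predicted = ["article.news", "commerce.product", "commerce.product"]
    reference = ["article.news", "commerce.product", "documentation.api_docs"]
    print(f"{exact_match_accuracy(predicted, reference):.1%}")  # prints 66.7%
```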
+ ## Files
+
+ | File | Size | Description |
+ |------|------|-------------|
+ | `sdf-classify-Qwen2.5-1.5B-Instruct-Q4_K_M.gguf` | 941 MB | Quantized (Q4_K_M), recommended for deployment |
+ | `sdf-classify-Qwen2.5-1.5B-Instruct-f16.gguf` | 2.9 GB | Unquantized 16-bit float (f16) |
+ | `Modelfile` | n/a | Ollama import configuration |
+
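The GGUF files can also be fetched programmatically. The sketch below uses `hf_hub_download` from the `huggingface_hub` library. The repository id `pranab2050/sdf-classify` is an assumption inferred from the sibling extractor repository and should be replaced with the actual id of this repo; `gguf_filename` and `download_gguf` are hypothetical helper names.

```python
from pathlib import Path

# Assumed repository id; it is not stated in this README.
REPO_ID = "pranab2050/sdf-classify"

# File names as listed in the Files table, keyed by precision.
GGUF_FILES = {
    "Q4_K_M": "sdf-classify-Qwen2.5-1.5B-Instruct-Q4_K_M.gguf",
    "f16": "sdf-classify-Qwen2.5-1.5B-Instruct-f16.gguf",
}


def gguf_filename(precision: str = "Q4_K_M") -> str:
    """Map a precision name to the matching GGUF file name."""
    try:
        return GGUF_FILES[precision]
    except KeyError:
        known = ", ".join(sorted(GGUF_FILES))
        raise ValueError(f"unknown precision {precision!r}; expected one of: {known}") from None


def download_gguf(precision: str = "Q4_K_M", target_dir: str = ".") -> Path:
    """Download one GGUF file into target_dir and return its local path."""
    # Imported lazily so gguf_filename works without the library installed.
    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(
        repo_id=REPO_ID,
        filename=gguf_filename(precision),
        local_dir=target_dir,
    )
    return Path(local_path)
```

Calling `download_gguf("Q4_K_M")` would place the quantized file in the current directory. The `Modelfile` can be fetched the same way by passing `filename="Modelfile"` to `hf_hub_download`.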
+ ## Usage with Ollama
+
+ ```bash
+ # Download the Q4_K_M GGUF file and the Modelfile, then register the model:
+ ollama create sdf-classify -f Modelfile
+ ```
+
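Once the model is registered, it can be queried through Ollama's local REST API. The following is a rough sketch, assuming Ollama is serving on its default address `http://localhost:11434` and that the model accepts the page text as a plain prompt and answers with a type label. The expected prompt format is not documented here, so `build_request`, `extract_label`, and `classify` are hypothetical helpers that may need adapting to the template in the `Modelfile`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_NAME = "sdf-classify"  # name passed to `ollama create`


def build_request(page_text: str) -> bytes:
    """Encode a non-streaming generate request for the classifier."""
    payload = {"model": MODEL_NAME, "prompt": page_text, "stream": False}
    return json.dumps(payload).encode("utf-8")


def extract_label(response_body: bytes) -> str:
    """Pull the generated text out of an Ollama generate response.

    Assumption: the model replies with just the type label, so the
    trimmed response text is returned unchanged.
    """
    data = json.loads(response_body)
    return data.get("response", "").strip()


def classify(page_text: str, timeout: float = 60.0) -> str:
    """Send page_text to the local Ollama server and return the label."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(page_text),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return extract_label(reply.read())
```

With the Ollama server running, `classify("...page text...")` would return whatever the model generates for that input. Building and parsing the request in separate functions keeps both steps checkable without a running server.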
+ ## Part of SDF Protocol
+
+ - **Protocol**: [sdfprotocol.org](https://sdfprotocol.org)
+ - **Specification**: [github.com/sdfprotocol/sdf](https://github.com/sdfprotocol/sdf)
+ - **Whitepaper**: [DOI 10.5281/zenodo.18559223](https://doi.org/10.5281/zenodo.18559223)
+ - **Extractor model**: [pranab2050/sdf-extract](https://huggingface.co/pranab2050/sdf-extract)
+
+ ## Citation
+
+ ```bibtex
+ @article{sarkar2026sdf,
+ title={Convert Once, Consume Many: SDF for Cacheable, Typed Semantic Extraction from Web Pages},
+ author={Sarkar, Pranab},
+ year={2026},
+ doi={10.5281/zenodo.18559223},
+ publisher={Zenodo}
+ }
+ ```