DuoNeural committed · Commit 1ba019a (verified) · 1 Parent(s): b268180

Add SmolLM2-135M-Instruct-LiteRT GGUF Q4_K_M conversion

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ SmolLM2-135M-Instruct-LiteRT_Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
DONE.txt ADDED
@@ -0,0 +1,2 @@
+ converted 2026-05-06 06:08:19
+ size_mb: 105
README.md ADDED
@@ -0,0 +1,104 @@
+ ---
+ language:
+ - en
+ tags:
+ - duoneural
+ - litert
+ - edge
+ - gguf
+ - on-device
+ - smollm
+ - smol
+ - tiny
+ - litert
+ - edge
+ - instruct
+ base_model: HuggingFaceTB/SmolLM2-135M-Instruct
+ pipeline_tag: text-generation
+ license: apache-2.0
+ ---
+
+ # SmolLM2-135M-Instruct-LiteRT
+
+ **SmolLM2 135M Instruct, an ultra-tiny on-device assistant (105 MB)**, converted for mobile and edge deployment by [DuoNeural](https://huggingface.co/DuoNeural).
+
+ - **Source model:** [HuggingFaceTB/SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct)
+ - **Format:** GGUF Q4_K_M (llama.cpp-compatible)
+ - **File size:** 105 MB (105,454,560 bytes)
+ - **Quantization:** 4-bit k-quant, medium variant (Q4_K_M), a common accuracy/size trade-off for edge devices
+ - **Target platforms:** Android, iOS, desktop edge inference
+ - **Converted:** 2026-05-06 06:08:19 by Archon / DuoNeural
+
+ ## Usage
+
+ ### llama.cpp (CLI)
+ ```bash
+ ./llama-cli -m SmolLM2-135M-Instruct-LiteRT_Q4_K_M.gguf -n 512 --temp 0.7
+ ```
+
+ ### Google AI Edge / MediaPipe (Android/iOS)
+ This GGUF targets on-device inference through the [llama.cpp Android bindings](https://github.com/ggerganov/llama.cpp). [MLC-LLM](https://github.com/mlc-ai/mlc-llm) and [Google Edge Gallery](https://ai.google.dev/edge/gallery) use their own model formats: Edge Gallery needs a `.task` bundle built with the MediaPipe LLM conversion tools, which, like MLC-LLM's converter, generally start from the source checkpoint rather than from a GGUF file.
+
+ ### Python via llama-cpp-python
+ ```python
+ from llama_cpp import Llama
+
+ llm = Llama(
+     model_path="SmolLM2-135M-Instruct-LiteRT_Q4_K_M.gguf",
+     n_ctx=2048,
+     n_threads=4,
+     verbose=False,
+ )
+
+ response = llm.create_chat_completion(
+     messages=[
+         {"role": "system", "content": "You are a helpful assistant."},
+         {"role": "user", "content": "Hello! How can you help me today?"},
+     ]
+ )
+ print(response["choices"][0]["message"]["content"])
+ ```
+
+ ### Ollama
+ ```bash
+ ollama run hf.co/DuoNeural/SmolLM2-135M-Instruct-LiteRT
+ ```
+
+ ## About the Conversion
+
+ Converted using the [llama.cpp](https://github.com/ggerganov/llama.cpp) GGUF pipeline with CUDA acceleration.
+ Source weights were downloaded from Hugging Face, converted to F16 GGUF, then quantized to Q4_K_M.
+
+ ---
+
+ ## DuoNeural
+
+ **DuoNeural** is an open AI research lab: human + AI in collaboration.
+
+ | Platform | Link |
+ |----------|------|
+ | HuggingFace | [huggingface.co/DuoNeural](https://huggingface.co/DuoNeural) |
+ | Website | [duoneural.com](https://duoneural.com) |
+ | GitHub | [github.com/DuoNeural](https://github.com/DuoNeural) |
+ | X / Twitter | [@DuoNeural](https://x.com/DuoNeural) |
+ | Email | duoneural@proton.me |
+ | Newsletter | [duoneural.beehiiv.com](https://duoneural.beehiiv.com) |
+ | Support | [buymeacoffee.com/duoneural](https://buymeacoffee.com/duoneural) |
+
+ ### DuoNeural Research Publications
+
+ | Title | DOI |
+ |-------|-----|
+ | [Nano-CTM: Ternary Continuous Thought Machines with Thought-Space Self-Prediction for Efficient Iterative Reasoning](https://doi.org/10.5281/zenodo.19775622) | [10.5281/zenodo.19775622](https://doi.org/10.5281/zenodo.19775622) |
+ | [Recurrence as World Model: CTM Learns Implicit Belief States in Partially Observable Physical Environments](https://doi.org/10.5281/zenodo.19810620) | [10.5281/zenodo.19810620](https://doi.org/10.5281/zenodo.19810620) |
+ | [Per-Object Slot Decomposition for Scalable Neural World Modeling: When Does Attention Beat Mean-Field?](https://doi.org/10.5281/zenodo.19846804) | [10.5281/zenodo.19846804](https://doi.org/10.5281/zenodo.19846804) |
+ | [The Dynamical Horizon Principle: CTM Gates Converge to the Predictability Limit of Dynamical Systems](https://doi.org/10.5281/zenodo.19952612) | [10.5281/zenodo.19952612](https://doi.org/10.5281/zenodo.19952612) |
+
+ *Open access, CC BY 4.0. Authored by Archon, Jesse Caldwell, and Aura (DuoNeural).*
+
+ ### Research Team
+ - **Jesse**: Vision, hardware, direction
+ - **Archon**: Lab Director, post-training, abliteration, experiments
+ - **Aura**: Research AI, literature synthesis, novel proposals
+
+ *Subscribe to the lab newsletter at [duoneural.beehiiv.com](https://duoneural.beehiiv.com) for model drops before they go anywhere else.*
SmolLM2-135M-Instruct-LiteRT_Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1273c53b7d32828a61a093cd4288e821fbe8e670d04898e0e573c96b0e19e32c
+ size 105454560
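
The LFS pointer above records the SHA-256 digest and byte size of the real GGUF file, so a downloaded copy can be checked against it. The following is a minimal sketch, not part of this commit: the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local file path is an assumption about where the model was saved.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file (hypothetical helper)."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and hash the pointer records."""
    # Cheap check first: a size mismatch means the download is incomplete or wrong.
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Hash in chunks so the whole model never has to sit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:1273c53b7d32828a61a093cd4288e821fbe8e670d04898e0e573c96b0e19e32c
size 105454560
"""

if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER)
    # Assumed local path: the file name used in this commit, in the current directory.
    path = "SmolLM2-135M-Instruct-LiteRT_Q4_K_M.gguf"
    if os.path.exists(path):
        print("match" if matches_pointer(path, pointer) else "MISMATCH")
    else:
        print(f"{path} not found; expected {pointer['size']} bytes")
```

The size comparison runs before hashing because it is nearly free and catches truncated downloads. The 105,454,560 bytes in the pointer correspond to the 105 MB (decimal megabytes) reported in `DONE.txt` and the README.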