KennedyOfficaly committed (verified)
Commit 51b1ed3 · 1 Parent(s): 290cf32

Upload 12 files
.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ axl_comment-q4_k_m_v2.gguf filter=lfs diff=lfs merge=lfs -text
+ axl-comment-f16.gguf filter=lfs diff=lfs merge=lfs -text
+ axl-comment-q4_k_m.gguf filter=lfs diff=lfs merge=lfs -text
Modelfile ADDED
@@ -0,0 +1,14 @@
+ FROM ./axl_comment_5m-f16.gguf
+
+ TEMPLATE """{{ .System }}
+
+ User: {{ .Prompt }}
+ Assistant: """
+
+ SYSTEM """You are AXL-Comment-5M, a code generation assistant built by Koinic."""
+
+ PARAMETER temperature 0.8
+ PARAMETER top_k 40
+ PARAMETER top_p 0.9
+ PARAMETER repeat_penalty 1.1
+ PARAMETER num_ctx 256
README.md CHANGED
@@ -1,3 +1,190 @@
  ---
  license: apache-2.0
+ language:
+ - code
+ tags:
+ - code-generation
+ - multi-scale-transformer
+ - cpu-optimized
+ - koinic
+ - pytorch
+ - llama
+ - gguf
+ - byte-level
+ - commenting
+ pipeline_tag: text-generation
+ library_name: transformers
+ datasets:
+ - bigcode/starcoderdata
+ - theblackcat102/evol-codealpaca-v1
+ widget:
+ - text: "Code:\ndef quicksort(arr):\n    if len(arr) <= 1: return arr\nCommented:"
+ model-index:
+ - name: AXL-Comment-5M
+   results:
+   - task:
+       type: text-generation
+     metrics:
+     - name: Perplexity (byte-level)
+       type: perplexity
+       value: 1.16
  ---
+
+ # AXL-Comment-5M
+
+ Code commenting. 7.2M params. PPL 1.16. Context 512 bytes. Part of the AXL model family by [KoinicLabs](https://huggingface.co/KoinicLabs).
+
+ ## Model Details
+
+ | Property | Value |
+ |----------|-------|
+ | Developed by | [KoinicLabs](https://huggingface.co/KoinicLabs) |
+ | Architecture | Multi-Scale Transformer |
+ | Parameters | 7.2M |
+ | Optimizer | Lion |
+ | Attention | SDPA |
+ | Vocab Size | 258 (byte-level) |
+ | Context Window | 512 bytes |
+ | d_model | 192 |
+ | Attention Heads | 3 |
+ | Layers per Scale | 3 |
+ | Downsample Factors | [1, 2, 4] |
+ | License | Apache 2.0 |
+
+ ### Sources
+
+ - **Repository:** [GitHub](https://github.com/Koinic/AXL)
+ - **Organization:** [KoinicLabs](https://huggingface.co/KoinicLabs)
+
+ ## Uses
+
+ ### Direct Use
+
+ Generating explanatory inline comments for source code.
+
+ ```python
+ import torch
+ from multiscale_transformer.model.config import load_config
+ from multiscale_transformer.model.model import MultiScaleTransformer
+ from multiscale_transformer.training.tokenizer import ByteTokenizer
+
+ config = load_config("config.json")
+ model = MultiScaleTransformer(config)
+ ckpt = torch.load("axl_comment_5m.pt", map_location="cpu")
+ model.load_state_dict(ckpt["model_state_dict"])
+ model.eval()
+
+ tokenizer = ByteTokenizer()
+ ids = torch.tensor([tokenizer.encode("def hello():")], dtype=torch.long)
+ with torch.no_grad():
+     out = model.generate(ids, max_new_tokens=50, temperature=0.8)
+ print(tokenizer.decode(out[0].tolist()))
+ ```
+
+ ### Out-of-Scope Use
+
+ Not for production code generation or non-code NLP tasks. For integration with tools like Continue.dev, LlamaIndex, or LangChain, use the Python API server, which provides OpenAI-compatible endpoints.
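The API-server route mentioned above can be exercised with a plain HTTP request. A minimal sketch, assuming a server at `localhost:8000` exposing `/v1/completions` and a model name of `axl-comment-5m` (none of these details are confirmed by this repository):

```python
import json
from urllib import request

# Assumed deployment details -- adjust to your own setup:
API_URL = "http://localhost:8000/v1/completions"
MODEL_NAME = "axl-comment-5m"

def build_completion_request(prompt: str, max_tokens: int = 100) -> dict:
    """Build an OpenAI-style /v1/completions payload."""
    return {
        "model": MODEL_NAME,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.8,
    }

def complete(prompt: str) -> str:
    """POST the payload and return the first completion's text."""
    data = json.dumps(build_completion_request(prompt)).encode("utf-8")
    req = request.Request(API_URL, data=data,
                         headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]

# complete("def quicksort(arr):")  # requires the API server to be running
```

Because the payload follows the OpenAI completions shape, the same sketch works for any client that speaks that protocol.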
+
+ ## Bias, Risks, and Limitations
+
+ Byte-level perplexity is not comparable to BPE-level perplexity. Max context is 512 bytes. Note: the GGUF files for Ollama use a simplified single-stack encoder; for full AXL quality, use the Python API server.
+
+ ### Recommendations
+
+ - Use for prototyping and experimentation, not production code generation.
+ - Byte-level perplexity (258-token vocab) is not comparable to BPE-level perplexity (32K vocab).
+ - For better results, use the Lion-optimized version if available.
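To compare the two scales anyway, a byte-level perplexity can be converted to bits per byte (the base-2 log of the per-byte perplexity). A small illustration:

```python
import math

def bits_per_byte(byte_level_ppl: float) -> float:
    """Convert a byte-level perplexity to bits of uncertainty per byte."""
    return math.log2(byte_level_ppl)

# A byte-level PPL of 2.0 is exactly 1 bit per byte; the 1.16 reported in
# results.json is about 0.21 bits per byte.
print(round(bits_per_byte(1.16), 2))  # 0.21
```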
+
+ ## Training Details
+
+ ### Training Data
+
+ Retrained with the Lion optimizer on 20 MB of code-commenting pairs: 263 steps in 10 minutes.
+
+ ### Preprocessing
+
+ Byte-level tokenization with vocabulary size 258 (256 bytes + BOS + EOS). No vocabulary training required.
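The 258-entry scheme can be sketched in a few lines. This is an illustration only, not the repository's ByteTokenizer; in particular, the BOS/EOS id assignment (256 and 257) is an assumption:

```python
class SimpleByteTokenizer:
    """Illustrative byte-level tokenizer: 256 raw byte values + BOS + EOS.

    The specific BOS/EOS ids below are assumptions, not the repo's scheme.
    """
    BOS, EOS = 256, 257
    vocab_size = 258

    def encode(self, text: str) -> list[int]:
        # UTF-8 bytes already are valid ids in [0, 255]; wrap with specials.
        return [self.BOS] + list(text.encode("utf-8")) + [self.EOS]

    def decode(self, ids: list[int]) -> str:
        # Drop special tokens, then decode the raw bytes.
        return bytes(i for i in ids if i < 256).decode("utf-8", errors="replace")

tok = SimpleByteTokenizer()
ids = tok.encode("def hello():")
print(len(ids))         # 14 (12 bytes + BOS + EOS)
print(tok.decode(ids))  # def hello():
```

No vocabulary file is needed because the mapping from text to ids is just the identity on bytes.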
+
+ ### Speeds, Sizes, Times
+
+ | Metric | Value |
+ |--------|-------|
+ | Training Steps | 263 |
+ | Training Time | 10 min |
+ | Final Loss | 0.1476 |
+
+ ## Evaluation
+
+ ### Metrics
+
+ Perplexity on held-out Python code using byte-level tokenization.
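Perplexity here is the exponential of the mean per-byte cross-entropy loss; results.json in this commit pairs a final loss of 0.1476 with a perplexity of 1.16, and the identity checks out:

```python
import math

def perplexity(mean_cross_entropy_nats: float) -> float:
    """Perplexity is exp(mean per-token cross-entropy, measured in nats)."""
    return math.exp(mean_cross_entropy_nats)

# final_loss 0.1476 from results.json reproduces the reported perplexity:
print(round(perplexity(0.1476), 2))  # 1.16
```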
+
+ ### Results
+
+ | Metric | Value |
+ |--------|-------|
+ | Perplexity (byte-level) | 1.16 |
+ | Final Loss | 0.1476 |
+ | Training Steps | 263 |
+ | Training Time | 10 min |
+
+ **Summary:** Adds inline comments to explain code logic.
+
+ ## Environmental Impact
+
+ | Property | Value |
+ |----------|-------|
+ | Hardware | AMD Ryzen 5 5600G |
+ | Hours Used | 0.167 |
+ | Carbon Emitted | 0.0070 kg CO2 |
+ | Cloud Provider | None (local CPU) |
+
+ ## Technical Specifications
+
+ ### Model Architecture
+
+ A Multi-Scale Transformer with three parallel encoder stacks at resolution scales 1x, 2x, and 4x. Cross-scale attention connects all scale pairs, and an adaptive gating mechanism fuses the scales. SwiGLU feed-forward layers; RoPE positional encoding.
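One way such scales can be produced, sketched here for illustration (this is not the AXL implementation), is average-pooling the embedded byte sequence by each downsample factor:

```python
import torch
import torch.nn.functional as F

def make_scales(x: torch.Tensor, factors=(1, 2, 4)) -> list[torch.Tensor]:
    """Downsample (batch, seq_len, d_model) embeddings, one tensor per scale.

    Illustrative sketch only: average pooling is one plausible downsampling
    choice, not necessarily the one AXL uses.
    """
    scales = []
    for f in factors:
        if f == 1:
            scales.append(x)
        else:
            # avg_pool1d expects (batch, channels, length), so swap axes.
            pooled = F.avg_pool1d(x.transpose(1, 2), kernel_size=f)
            scales.append(pooled.transpose(1, 2))
    return scales

x = torch.randn(1, 512, 192)  # 512-byte context, d_model = 192
for s in make_scales(x):
    print(tuple(s.shape))
# (1, 512, 192), (1, 256, 192), (1, 128, 192)
```

Each stack then runs on its own resolution, which is why a 512-byte context yields sequence lengths of 512, 256, and 128 at factors [1, 2, 4].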
+
+ ### Compute Infrastructure
+
+ | Property | Value |
+ |--------|-------|
+ | Hardware | AMD Ryzen 5 5600G (6 cores, 12 threads) |
+ | RAM | 16 GB |
+ | GPU | None (CPU-only) |
+
+ ## Citation
+
+ ```bibtex
+ @misc{axl_2026,
+   title={AXL: AXL-Comment-5M - Multi-Scale Transformer for CPU Code Generation},
+   author={Koinic},
+   year={2026},
+   url={https://huggingface.co/KoinicLabs}
+ }
+ ```
+
+ ## How to Get Started
+
+ ### With Ollama
+
+ ```bash
+ ollama create axl-comment-5m -f Modelfile
+ ollama run axl-comment-5m "def fibonacci():"
+ ```
+
+ ### With Python
+
+ ```python
+ import torch
+ from multiscale_transformer.model.config import load_config
+ from multiscale_transformer.model.model import MultiScaleTransformer
+ from multiscale_transformer.training.tokenizer import ByteTokenizer
+
+ config = load_config("config.json")
+ model = MultiScaleTransformer(config)
+ ckpt = torch.load("axl_comment_5m.pt", map_location="cpu")
+ model.load_state_dict(ckpt["model_state_dict"])
+ model.eval()
+
+ tokenizer = ByteTokenizer()
+ prompt = "def fibonacci():"
+ ids = torch.tensor([tokenizer.encode(prompt)], dtype=torch.long)
+ with torch.no_grad():
+     out = model.generate(ids, max_new_tokens=100, temperature=0.8, top_k=40)
+ print(tokenizer.decode(out[0].tolist()))
+ ```
axl-comment-f16.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:88cc145b217319b19d85d3e929f0b669ebffcd608c576d8da59c823bfd31691b
+ size 14483552
axl-comment-q4_k_m.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:88cc145b217319b19d85d3e929f0b669ebffcd608c576d8da59c823bfd31691b
+ size 14483552
axl_comment-q4_k_m_v2.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cacc69f96c6f9d78ac231220531ec12dc8d395c200ab7f3fbaf054e465c0e235
+ size 4754400
axl_comment.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c81e6ba4d7f6a25cdaabe62b0cb97cab5b1821bbf93cc3beb0ed04d6143a95b9
+ size 28806303
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "model_type": "multiscale_transformer",
+   "architectures": [
+     "MultiScaleForCausalLM"
+   ],
+   "vocab_size": 258,
+   "d_model": 192,
+   "n_heads": 3,
+   "d_ff": 512,
+   "n_layers_per_scale": 3,
+   "n_cross_attn_layers": 1,
+   "max_seq_len": 512,
+   "dropout": 0.0,
+   "bias": false,
+   "rope_theta": 10000.0,
+   "downsample_factors": [
+     1,
+     2,
+     4
+   ],
+   "num_parameters": 6732096,
+   "training_results": {
+     "model": "AXL-Comment-5M",
+     "params": 7182144,
+     "steps": 246,
+     "time": 60.12719678878784,
+     "final_loss": 0.009364727884531021,
+     "perplexity": 1.01,
+     "max_seq_len": 512,
+     "context_window": "512 bytes"
+   }
+ }
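One invariant worth checking in a config like this is that `d_model` splits evenly across the attention heads. A quick sanity check on the values above, reading them as plain JSON rather than through the repo's `load_config`:

```python
import json

# Inline copy of the relevant config.json fields shown above.
cfg = json.loads('{"vocab_size": 258, "d_model": 192, "n_heads": 3, '
                 '"d_ff": 512, "n_layers_per_scale": 3, "max_seq_len": 512}')

# Attention splits d_model evenly across heads.
assert cfg["d_model"] % cfg["n_heads"] == 0
head_dim = cfg["d_model"] // cfg["n_heads"]
print(head_dim)  # 64
```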
generation_config.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "max_new_tokens": 256,
+   "temperature": 0.8,
+   "top_k": 40,
+   "top_p": 0.9,
+   "repetition_penalty": 1.1,
+   "do_sample": true
+ }
index.html ADDED
@@ -0,0 +1,95 @@
+ <!DOCTYPE html>
+ <html lang="en">
+ <head>
+ <meta charset="UTF-8">
+ <meta name="viewport" content="width=device-width,initial-scale=1.0">
+ <title>AXL-Comment-5M - AXL</title>
+ <style>*{margin:0;padding:0;box-sizing:border-box}
+ body{font-family:-apple-system,BlinkMacSystemFont,Segoe UI,Roboto,sans-serif;background:#0d1117;color:#c9d1d9;line-height:1.6}
+ a{color:#58a6ff;text-decoration:none}a:hover{text-decoration:underline}
+ .hero{padding:40px 20px;text-align:center;border-bottom:1px solid #30363d;background:linear-gradient(135deg,#0d1117,#161b22,#0d1117)}
+ .hero h1{font-size:2.2rem;color:#fff;letter-spacing:-1px}
+ .cat{display:inline-block;padding:3px 12px;border-radius:12px;font-size:.75rem;font-weight:600;margin-bottom:12px}
+ .cat.Lion{background:#1f3a5f;color:#4285f4}
+ .cat.SGD{background:#3d1f1f;color:#f85149}
+ .cat.Specialized{background:#2d1b69;color:#bb86fc}
+ .desc{color:#8b949e;font-size:.95rem;max-width:600px;margin:12px auto 0}
+ .ms{display:flex;flex-wrap:wrap;gap:12px;justify-content:center;padding:24px 20px}
+ .mc{background:#161b22;border:1px solid #30363d;border-radius:10px;padding:16px 24px;text-align:center;min-width:120px}
+ .v{font-size:1.5rem;font-weight:700;color:#fff}.l{font-size:.75rem;color:#8b949e;margin-top:2px}
+ .tabs{max-width:800px;margin:0 auto;padding:0 20px}
+ .tabs>input[type=radio]{display:none}
+ .tl{display:inline-block;background:#21262d;border:1px solid #30363d;color:#8b949e;padding:7px 16px;border-radius:8px;cursor:pointer;font-size:.85rem;margin:0 4px 16px;transition:all .2s}
+ .tl:hover{background:#30363d;color:#c9d1d9}
+ .p{display:none;background:#161b22;border:1px solid #30363d;border-radius:12px;padding:24px;margin-bottom:24px}
+ #t1:checked~.p1,#t2:checked~.p2,#t3:checked~.p3,#t4:checked~.p4{display:block}
+ #t1:checked+label[for=t1],#t2:checked+label[for=t2],#t3:checked+label[for=t3],#t4:checked+label[for=t4]{background:#4285f4;color:#fff;border-color:#4285f4}
+ table{width:100%;border-collapse:collapse}
+ th{text-align:left;color:#8b949e;font-size:.8rem;padding:8px 12px;border-bottom:1px solid #21262d;font-weight:600}
+ td{padding:8px 12px;font-size:.9rem;border-bottom:1px solid #21262d}
+ pre{background:#0d1117;padding:14px;border-radius:8px;overflow-x:auto;margin:12px 0}
+ code{color:#c9d1d9;font-size:.82rem;line-height:1.5}
+ .note{background:#21262d;border-left:3px solid #4285f4;padding:12px 16px;border-radius:0 8px 8px 0;margin:12px 0;font-size:.85rem;color:#8b949e}
+ .story{font-size:.9rem;color:#8b949e;line-height:1.6;margin:8px 0}
+ .back{text-align:center;padding:24px 20px 40px}
+ .back a{color:#58a6ff;font-size:.9rem}
+ @media(max-width:768px){.hero h1{font-size:1.6rem}.ms{flex-direction:column;align-items:center}.mc{min-width:200px}}</style>
+ </head>
+ <body>
+ <div class="hero">
+ <div class="cat Lion">Lion Optimized</div>
+ <h1>AXL-Comment-5M</h1>
+ <p class="desc">Code commenting. 7.2M params. PPL 1.16. Context 2048 bytes.</p>
+ </div>
+ <div class="ms">
+ <div class="mc"><div class="v">7M</div><div class="l">Parameters</div></div>
+ <div class="mc"><div class="v">1.16</div><div class="l">Perplexity</div></div>
+ <div class="mc"><div class="v">10 min</div><div class="l">Training</div></div>
+ <div class="mc"><div class="v">14 MB</div><div class="l">GGUF</div></div>
+ </div>
+ <div class="tabs">
+ <input type="radio" name="t" id="t1" checked><label for="t1" class="tl">Specs</label>
+ <input type="radio" name="t" id="t2"><label for="t2" class="tl">Training</label>
+ <input type="radio" name="t" id="t3"><label for="t3" class="tl">Usage</label>
+ <input type="radio" name="t" id="t4"><label for="t4" class="tl">Download</label>
+ <div class="p p1">
+ <table>
+ <tr><th>Property</th><th>Value</th></tr>
+ <tr><td>Architecture</td><td>Multi-Scale Transformer</td></tr>
+ <tr><td>d_model</td><td>192</td></tr>
+ <tr><td>Attention Heads</td><td>3</td></tr>
+ <tr><td>Layers per Scale</td><td>3</td></tr>
+ <tr><td>Context Window</td><td>2048 bytes</td></tr>
+ <tr><td>Downsample Factors</td><td>[1, 2, 4]</td></tr>
+ <tr><td>Vocab Size</td><td>258 (byte-level)</td></tr>
+ <tr><td>Optimizer</td><td>Lion</td></tr>
+ </table>
+ </div>
+ <div class="p p2">
+ <div class="story">Retrained with Lion on 20MB commenting pairs. 263 steps in 10 min.</div>
+ <table>
+ <tr><th>Metric</th><th>Value</th></tr>
+ <tr><td>Final Loss</td><td>0.1476</td></tr>
+ <tr><td>Perplexity</td><td>1.16</td></tr>
+ <tr><td>Training Steps</td><td>263</td></tr>
+ <tr><td>Training Time</td><td>10 min</td></tr>
+ </table>
+ </div>
+ <div class="p p3">
+ <h3 style="color:#fff;margin-bottom:12px">Usage</h3>
+ <pre><code>ollama create axl-comment-5m -f Modelfile
+ ollama run axl-comment-5m "def fibonacci():"</code></pre>
+ <div class="note">Adds inline comments to explain code logic.</div>
+ </div>
+ <div class="p p4">
+ <table>
+ <tr><th>File</th><th>Size</th><th>Format</th></tr>
+ <tr><td>F16 GGUF</td><td>14 MB</td><td>Full precision</td></tr>
+ <tr><td>Q4_K_M GGUF</td><td>4.5 MB</td><td>4-bit quantized</td></tr>
+ </table>
+ <div class="note" style="margin-top:16px">GGUF files work with Ollama and llama.cpp. Q4_K_M is about 3x smaller than F16.</div>
+ </div>
+ </div>
+ <div class="back"><a href="../">← All AXL Models</a></div>
+ </body>
+ </html>
results.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "model": "AXL-Comment-5M",
+   "params": 7182144,
+   "steps": 263,
+   "time": 600.2396535873413,
+   "final_loss": 0.14764484763145447,
+   "perplexity": 1.16,
+   "optimizer": "Lion",
+   "max_seq_len": 2048
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "pad_token": "[PAD]",
+   "bos_token": "[BOS]",
+   "eos_token": "[EOS]",
+   "unk_token": "[UNK]"
+ }
tokenizer_config.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "tokenizer_class": "ByteTokenizer",
+   "vocab_size": 258,
+   "pad_token": "[PAD]",
+   "bos_token": "[BOS]",
+   "eos_token": "[EOS]",
+   "unk_token": "[UNK]"
+ }