KennedyOfficaly committed on
Commit 6dcdbfb · verified · 1 Parent(s): 30a6b47

Upload 9 files
Modelfile ADDED
@@ -0,0 +1,14 @@
FROM ./axl_vision_0.8m-f16.gguf

TEMPLATE """{{ .System }}

User: {{ .Prompt }}
Assistant: """

SYSTEM """You are AXL-Vision-0.8M, a code generation assistant built by Koinic."""

PARAMETER temperature 0.8
PARAMETER top_k 40
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1
PARAMETER num_ctx 256
README.md CHANGED
@@ -1,3 +1,188 @@
---
license: apache-2.0
language:
- code
tags:
- code-generation
- multi-scale-transformer
- cpu-optimized
- koinic
- pytorch
- llama
- gguf
- byte-level
- vision
- multi-modal
pipeline_tag: text-generation
library_name: transformers
datasets:
- koinic/axl-synthetic-ui
model-index:
- name: AXL-Vision-0.8M
  results:
  - task:
      type: text-generation
    metrics:
    - name: Perplexity (byte-level)
      type: perplexity
      value: ---
---

# AXL-Vision-0.8M

Vision encoder. ~0.8M params (753,024). Converts 224x224 images to feature vectors. Part of the AXL model family by [KoinicLabs](https://huggingface.co/KoinicLabs).

## Model Details

| Property | Value |
|----------|-------|
| Developed by | [KoinicLabs](https://huggingface.co/KoinicLabs) |
| Architecture | Multi-Scale Transformer |
| Parameters | 753,024 |
| Optimizer | SGD |
| Attention | SDPA |
| Vocab Size | 258 (byte-level) |
| Context Window | 256 bytes |
| d_model | 128 |
| Attention Heads | 4 |
| Layers per Scale | 4 |
| Downsample Factors | [1, 2, 4] |
| License | Apache 2.0 |

### Sources

- **Repository:** [GitHub](https://github.com/Koinic/AXL)
- **Organization:** [KoinicLabs](https://huggingface.co/KoinicLabs)

## Uses

### Direct Use

Image feature extraction (224x224 images).

```python
import torch
from multiscale_transformer.model.config import load_config
from multiscale_transformer.model.model import MultiScaleTransformer
from multiscale_transformer.training.tokenizer import ByteTokenizer

# Load the config before constructing the model
config = load_config("config.json")
model = MultiScaleTransformer(config)
ckpt = torch.load("axl_vision_0.8m.pt", map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

tokenizer = ByteTokenizer()
ids = torch.tensor([tokenizer.encode("def hello():")], dtype=torch.long)
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=50, temperature=0.8)
print(tokenizer.decode(out[0].tolist()))
```

### Out-of-Scope Use

Not intended for text generation; this is a vision-only model. For integration with tools such as Continue.dev, LlamaIndex, or LangChain, use the Python API server, which provides OpenAI-compatible endpoints.

## Bias, Risks, and Limitations

This is a vision encoder, not a text generation model. Note that GGUF files for Ollama use a simplified single-stack encoder; for full AXL quality, use the Python API server.

### Recommendations

- Use for prototyping and experimentation, not production code generation.
- Byte-level perplexity (258-token vocabulary) is not comparable to BPE-level perplexity (32K vocabulary).
- For better results, use the Lion-optimized version if available.

## Training Details

### Training Data

Patch-based image encoder with 16x16 patches. Foundation for multi-modal AXL.

### Preprocessing

Byte-level tokenization with a vocabulary of 258 (256 byte values + BOS + EOS). No vocabulary training is required.
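
A minimal sketch of what such a tokenizer can look like. The ID layout here (0-255 = raw byte values, 256 = BOS, 257 = EOS) is an assumption for illustration; the actual `ByteTokenizer` in `multiscale_transformer` may differ:

```python
# Hypothetical byte-level tokenizer with vocab size 258.
# Assumed layout: IDs 0-255 are raw byte values, 256 = BOS, 257 = EOS.
BOS, EOS = 256, 257

def encode(text: str) -> list[int]:
    """UTF-8 bytes framed by BOS/EOS; no learned vocabulary needed."""
    return [BOS] + list(text.encode("utf-8")) + [EOS]

def decode(ids: list[int]) -> str:
    """Drop special tokens and decode the remaining bytes."""
    return bytes(i for i in ids if i < 256).decode("utf-8", errors="replace")

print(encode("hi"))          # [256, 104, 105, 257]
print(decode(encode("hi")))  # hi
```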

### Speeds, Sizes, Times

| Metric | Value |
|--------|-------|
| Training Steps | 32402 |
| Training Time | 30 min |
| Final Loss | 1.0014 |

## Evaluation

### Metrics

Perplexity on held-out Python code, using byte-level tokenization.
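
Since the reported loss is a mean cross-entropy over bytes, byte-level perplexity follows directly as its exponential. Illustrative arithmetic from the reported final loss (the card itself leaves the official perplexity value blank):

```python
import math

final_loss = 1.0014          # mean cross-entropy in nats/byte, from results.json
perplexity = math.exp(final_loss)
print(round(perplexity, 2))  # ~2.72 effective choices per byte
```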

### Results

| Metric | Value |
|--------|-------|
| Perplexity (byte-level) | --- |
| Final Loss | 1.0014 |
| Training Steps | 32402 |
| Training Time | 30 min |

**Summary:** Image feature extraction for downstream vision tasks.

## Environmental Impact

| Property | Value |
|----------|-------|
| Hardware | AMD Ryzen 5 5600G |
| Hours Used | 0.5 |
| Carbon Emitted | 0.021 kg CO2 |
| Cloud Provider | None (local CPU) |
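
The reported figure is consistent with a simple energy-times-grid-intensity estimate. Both the CPU power draw and the grid carbon intensity below are assumed values chosen for illustration, not measurements from this training run:

```python
hours = 0.5            # reported training time
cpu_watts = 65         # assumed sustained package power for a Ryzen 5 5600G
kg_co2_per_kwh = 0.65  # assumed grid carbon intensity

energy_kwh = hours * cpu_watts / 1000
emissions_kg = energy_kwh * kg_co2_per_kwh
print(round(emissions_kg, 3))  # 0.021
```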

## Technical Specifications

### Model Architecture

Multi-Scale Transformer with three parallel encoder stacks at resolution scales 1x, 2x, and 4x. Cross-scale attention connects all scale pairs, and the scales are combined by adaptive gating fusion. Feed-forward layers use SwiGLU; positional encoding is RoPE.
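
With downsample factors [1, 2, 4], the three stacks see the same 256-token window at full, half, and quarter length. A sketch of the sequence-length bookkeeping, using average pooling as an assumed downsampling scheme (the real model's pooling may be learned rather than fixed):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 256, 128)  # (batch, max_seq_len, d_model)

scales = []
for factor in (1, 2, 4):      # downsample_factors from config.json
    if factor == 1:
        scales.append(x)
    else:
        # pool along the sequence axis: avg_pool1d expects (batch, channels, length)
        scales.append(F.avg_pool1d(x.transpose(1, 2), factor).transpose(1, 2))

print([tuple(s.shape) for s in scales])  # [(1, 256, 128), (1, 128, 128), (1, 64, 128)]
```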

### Compute Infrastructure

| Property | Value |
|----------|-------|
| Hardware | AMD Ryzen 5 5600G (6 cores, 12 threads) |
| RAM | 16 GB |
| GPU | None (CPU-only) |

## Citation

```bibtex
@misc{axl_2026,
  title={AXL: AXL-Vision-0.8M - Multi-Scale Transformer for CPU Code Generation},
  author={Koinic},
  year={2026},
  url={https://huggingface.co/KoinicLabs}
}
```

## How to Get Started

### With Ollama

```bash
ollama create axl-vision-0.8m -f Modelfile
ollama run axl-vision-0.8m "def fibonacci():"
```

### With Python

```python
import torch
from multiscale_transformer.model.config import load_config
from multiscale_transformer.model.model import MultiScaleTransformer
from multiscale_transformer.training.tokenizer import ByteTokenizer

config = load_config("config.json")
model = MultiScaleTransformer(config)
ckpt = torch.load("axl_vision_0.8m.pt", map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

tokenizer = ByteTokenizer()
prompt = "def fibonacci():"
ids = torch.tensor([tokenizer.encode(prompt)], dtype=torch.long)
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=100, temperature=0.8, top_k=40)
print(tokenizer.decode(out[0].tolist()))
```
axl_vision.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5f654b04f26842221c5715cd8b456664c2fd6827054a3ec71669ee0bc5d5a565
size 3031989
config.json ADDED
@@ -0,0 +1,29 @@
{
  "model_type": "multiscale_transformer",
  "architectures": [
    "MultiScaleForCausalLM"
  ],
  "vocab_size": 258,
  "d_model": 128,
  "n_heads": 4,
  "d_ff": 256,
  "n_layers_per_scale": 4,
  "n_cross_attn_layers": 1,
  "max_seq_len": 256,
  "dropout": 0.0,
  "bias": false,
  "rope_theta": 10000.0,
  "downsample_factors": [
    1,
    2,
    4
  ],
  "num_parameters": 3006080,
  "training_results": {
    "model": "AXL-Vision",
    "params": 753024,
    "steps": 32402,
    "time": 1800.0420067310333,
    "final_loss": 1.0014228820800781
  }
}
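The README examples pass this file to `load_config`. A hypothetical minimal version, assuming it simply exposes the JSON keys as attributes (the real implementation in `multiscale_transformer.model.config` may also validate fields):

```python
import json
from types import SimpleNamespace

def load_config(path: str) -> SimpleNamespace:
    """Read a JSON config file and expose its keys as attributes."""
    with open(path, encoding="utf-8") as f:
        return SimpleNamespace(**json.load(f))
```

With the config above, `load_config("config.json").d_model` would return 128.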
generation_config.json ADDED
@@ -0,0 +1,8 @@
{
  "max_new_tokens": 256,
  "temperature": 0.8,
  "top_k": 40,
  "top_p": 0.9,
  "repetition_penalty": 1.1,
  "do_sample": true
}
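A sketch of how the `temperature` and `top_k` defaults interact at each decoding step. This is a pure-Python illustration, not the model's actual sampler, which also applies `top_p` filtering and `repetition_penalty`:

```python
import math
import random

def sample_token(logits, temperature=0.8, top_k=40):
    # Sharpen/flatten the distribution with temperature, keep the top_k
    # candidates, then draw from the renormalized softmax over them.
    scaled = [l / temperature for l in logits]
    top = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    m = max(scaled[i] for i in top)
    weights = [math.exp(scaled[i] - m) for i in top]
    return random.choices(top, weights=weights)[0]

random.seed(0)
logits = [0.1] * 258  # one logit per byte-level token
logits[65] = 8.0      # make byte 65 ("A") dominate
draws = [sample_token(logits) for _ in range(100)]
print(draws.count(65))  # nearly every draw picks 65
```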
index.html ADDED
@@ -0,0 +1,95 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width,initial-scale=1.0">
<title>AXL-Vision-0.8M - AXL</title>
<style>*{margin:0;padding:0;box-sizing:border-box}
body{font-family:-apple-system,BlinkMacSystemFont,Segoe UI,Roboto,sans-serif;background:#0d1117;color:#c9d1d9;line-height:1.6}
a{color:#58a6ff;text-decoration:none}a:hover{text-decoration:underline}
.hero{padding:40px 20px;text-align:center;border-bottom:1px solid #30363d;background:linear-gradient(135deg,#0d1117,#161b22,#0d1117)}
.hero h1{font-size:2.2rem;color:#fff;letter-spacing:-1px}
.cat{display:inline-block;padding:3px 12px;border-radius:12px;font-size:.75rem;font-weight:600;margin-bottom:12px}
.cat.Lion{background:#1f3a5f;color:#4285f4}
.cat.SGD{background:#3d1f1f;color:#f85149}
.cat.Specialized{background:#2d1b69;color:#bb86fc}
.desc{color:#8b949e;font-size:.95rem;max-width:600px;margin:12px auto 0}
.ms{display:flex;flex-wrap:wrap;gap:12px;justify-content:center;padding:24px 20px}
.mc{background:#161b22;border:1px solid #30363d;border-radius:10px;padding:16px 24px;text-align:center;min-width:120px}
.v{font-size:1.5rem;font-weight:700;color:#fff}.l{font-size:.75rem;color:#8b949e;margin-top:2px}
.tabs{max-width:800px;margin:0 auto;padding:0 20px}
.tabs>input[type=radio]{display:none}
.tl{display:inline-block;background:#21262d;border:1px solid #30363d;color:#8b949e;padding:7px 16px;border-radius:8px;cursor:pointer;font-size:.85rem;margin:0 4px 16px;transition:all .2s}
.tl:hover{background:#30363d;color:#c9d1d9}
.p{display:none;background:#161b22;border:1px solid #30363d;border-radius:12px;padding:24px;margin-bottom:24px}
#t1:checked~.p1,#t2:checked~.p2,#t3:checked~.p3,#t4:checked~.p4{display:block}
#t1:checked+label[for=t1],#t2:checked+label[for=t2],#t3:checked+label[for=t3],#t4:checked+label[for=t4]{background:#bb86fc;color:#fff;border-color:#bb86fc}
table{width:100%;border-collapse:collapse}
th{text-align:left;color:#8b949e;font-size:.8rem;padding:8px 12px;border-bottom:1px solid #21262d;font-weight:600}
td{padding:8px 12px;font-size:.9rem;border-bottom:1px solid #21262d}
pre{background:#0d1117;padding:14px;border-radius:8px;overflow-x:auto;margin:12px 0}
code{color:#c9d1d9;font-size:.82rem;line-height:1.5}
.note{background:#21262d;border-left:3px solid #bb86fc;padding:12px 16px;border-radius:0 8px 8px 0;margin:12px 0;font-size:.85rem;color:#8b949e}
.story{font-size:.9rem;color:#8b949e;line-height:1.6;margin:8px 0}
.back{text-align:center;padding:24px 20px 40px}
.back a{color:#58a6ff;font-size:.9rem}
@media(max-width:768px){.hero h1{font-size:1.6rem}.ms{flex-direction:column;align-items:center}.mc{min-width:200px}}</style>
</head>
<body>
<div class="hero">
<div class="cat Specialized">Specialized Optimized</div>
<h1>AXL-Vision-0.8M</h1>
<p class="desc">Vision encoder. ~0.8M params. Converts 224x224 images to feature vectors.</p>
</div>
<div class="ms">
<div class="mc"><div class="v">753024</div><div class="l">Parameters</div></div>
<div class="mc"><div class="v">---</div><div class="l">Perplexity</div></div>
<div class="mc"><div class="v">30 min</div><div class="l">Training</div></div>
<div class="mc"><div class="v">---</div><div class="l">GGUF</div></div>
</div>
<div class="tabs">
<input type="radio" name="t" id="t1" checked><label for="t1" class="tl">Specs</label>
<input type="radio" name="t" id="t2"><label for="t2" class="tl">Training</label>
<input type="radio" name="t" id="t3"><label for="t3" class="tl">Usage</label>
<input type="radio" name="t" id="t4"><label for="t4" class="tl">Download</label>
<div class="p p1">
<table>
<tr><th>Property</th><th>Value</th></tr>
<tr><td>Architecture</td><td>Multi-Scale Transformer</td></tr>
<tr><td>d_model</td><td>128</td></tr>
<tr><td>Attention Heads</td><td>4</td></tr>
<tr><td>Layers per Scale</td><td>4</td></tr>
<tr><td>Context Window</td><td>256 bytes</td></tr>
<tr><td>Downsample Factors</td><td>[1, 2, 4]</td></tr>
<tr><td>Vocab Size</td><td>258 (byte-level)</td></tr>
<tr><td>Optimizer</td><td>SGD</td></tr>
</table>
</div>
<div class="p p2">
<div class="story">Patch-based image encoder with 16x16 patches. Foundation for multi-modal AXL.</div>
<table>
<tr><th>Metric</th><th>Value</th></tr>
<tr><td>Final Loss</td><td>1.0014</td></tr>
<tr><td>Perplexity</td><td>---</td></tr>
<tr><td>Training Steps</td><td>32402</td></tr>
<tr><td>Training Time</td><td>30 min</td></tr>
</table>
</div>
<div class="p p3">
<h3 style="color:#fff;margin-bottom:12px">Usage</h3>
<pre><code>ollama create axl-vision-0.8m -f Modelfile
ollama run axl-vision-0.8m "def fibonacci():"</code></pre>
<div class="note">Image feature extraction for downstream vision tasks.</div>
</div>
<div class="p p4">
<table>
<tr><th>File</th><th>Size</th><th>Format</th></tr>
<tr><td>F16 GGUF</td><td>---</td><td>Full precision</td></tr>
<tr><td>Q4_K_M GGUF</td><td>---</td><td>4-bit quantized</td></tr>
</table>
<div class="note" style="margin-top:16px">GGUF files work with Ollama and llama.cpp. Q4_K_M is about 3x smaller than F16.</div>
</div>
</div>
<div class="back"><a href="../">← All AXL Models</a></div>
</body>
</html>
results.json ADDED
@@ -0,0 +1,7 @@
{
  "model": "AXL-Vision",
  "params": 753024,
  "steps": 32402,
  "time": 1800.0420067310333,
  "final_loss": 1.0014228820800781
}
special_tokens_map.json ADDED
@@ -0,0 +1,6 @@
{
  "pad_token": "[PAD]",
  "bos_token": "[BOS]",
  "eos_token": "[EOS]",
  "unk_token": "[UNK]"
}
tokenizer_config.json ADDED
@@ -0,0 +1,8 @@
{
  "tokenizer_class": "ByteTokenizer",
  "vocab_size": 258,
  "pad_token": "[PAD]",
  "bos_token": "[BOS]",
  "eos_token": "[EOS]",
  "unk_token": "[UNK]"
}