---
license: apache-2.0
language:
- code
tags:
- code-generation
- multi-scale-transformer
- cpu-optimized
- koinic
- pytorch
- llama
- gguf
- byte-level
pipeline_tag: text-generation
library_name: transformers
datasets:
- bigcode/starcoderdata
- theblackcat102/evol-codealpaca-v1
widget:
- text: "To be or not to be"
model-index:
- name: AXL-Micro-600K
  results:
  - task:
      type: text-generation
    metrics:
    - name: Perplexity (byte-level)
      type: perplexity
      value: 1.04
---

# AXL-Micro-600K

Smallest AXL model: 677K parameters, byte-level perplexity 1.04, 256-byte context. A demo model, part of the AXL model family by [KoinicLabs](https://huggingface.co/KoinicLabs).

## Model Details

| Property | Value |
|----------|-------|
| Developed by | [KoinicLabs](https://huggingface.co/KoinicLabs) |
| Architecture | Multi-Scale Transformer |
| Parameters | 677,056 |
| Optimizer | Lion |
| Attention | SDPA |
| Vocab Size | 258 (byte-level) |
| Context Window | 256 bytes |
| d_model | 64 |
| Attention Heads | 4 |
| Layers per Scale | 2 |
| Downsample Factors | [1, 2, 4] |
| License | Apache 2.0 |

### Sources

- **Repository:** [GitHub](https://github.com/Koinic/AXL)
- **Organization:** [KoinicLabs](https://huggingface.co/KoinicLabs)

## Uses

### Direct Use

Demo/testing model (trained on Shakespeare).

```python
import torch
from multiscale_transformer.model.config import load_config
from multiscale_transformer.model.model import MultiScaleTransformer
from multiscale_transformer.training.tokenizer import ByteTokenizer

# Load the config and checkpoint, then restore the weights
config = load_config("config.json")
ckpt = torch.load("axl_micro_600k.pt", map_location="cpu")
model = MultiScaleTransformer(config)
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

tokenizer = ByteTokenizer()
ids = torch.tensor([tokenizer.encode("def hello():")], dtype=torch.long)
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=50, temperature=0.8)
print(tokenizer.decode(out[0].tolist()))
```

### Out-of-Scope Use

Not intended for production use or real code-generation tasks. For integration with tools like Continue.dev, LlamaIndex, or LangChain, use the Python API server, which provides OpenAI-compatible endpoints.

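Since the API server is OpenAI-compatible, any standard completions client should work against it. A minimal sketch using only the standard library; the host, port, and model name here are assumptions, not values from this card:

```python
import json
import urllib.request

# Hypothetical local endpoint: the card only states the server is
# OpenAI-compatible; adjust host/port/model name to your deployment.
payload = {
    "model": "axl-micro-600k",
    "prompt": "def fibonacci():",
    "max_tokens": 100,
    "temperature": 0.8,
}
req = urllib.request.Request(
    "http://localhost:8000/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Uncomment once the API server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```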
## Bias, Risks, and Limitations

This is a Shakespeare-trained demo model, not a code-generation model. Byte-level perplexity is not comparable to BPE-level perplexity. Note: GGUF files for Ollama use a simplified single-stack encoder; for full AXL quality, use the Python API server.

### Recommendations

- Use for prototyping and experimentation, not production code generation.
- Byte-level perplexity (258-token vocab) is not comparable to BPE-level perplexity (32K vocab).
- For better results, use the Lion-optimized version if available.

## Training Details

### Training Data

Retrained with the Lion optimizer on Shakespeare text: 2,435 steps in 2 minutes, reaching a byte-level PPL of 1.04.

### Preprocessing

Byte-level tokenization with a vocabulary of 258 (256 byte values + BOS + EOS). No vocabulary training required.

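Because the vocabulary is just the 256 raw byte values plus two special tokens, the scheme needs no learned merges. A minimal sketch of such a tokenizer (illustrative helper functions, not the actual `ByteTokenizer` API):

```python
# Byte-level tokenization sketch: 256 raw byte values plus BOS (256) and EOS (257).
BOS, EOS, VOCAB_SIZE = 256, 257, 258

def encode(text: str) -> list[int]:
    """UTF-8 bytes framed by BOS/EOS; no vocabulary training needed."""
    return [BOS] + list(text.encode("utf-8")) + [EOS]

def decode(ids: list[int]) -> str:
    """Drop special tokens and decode the remaining bytes."""
    return bytes(i for i in ids if i < 256).decode("utf-8", errors="replace")

print(encode("Hi"))          # [256, 72, 105, 257]
print(decode(encode("Hi")))  # Hi
```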
### Speeds, Sizes, Times

| Metric | Value |
|--------|-------|
| Training Steps | 2,435 |
| Training Time | 2 min |
| Final Loss | 0.0747 |

## Evaluation

### Metrics

Perplexity on held-out Python code using byte-level tokenization.

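Because perplexity here is per byte, it maps directly onto bits-per-byte, a metric that can be compared across tokenizers. A quick illustration of the standard conversion (generic math, not code from the AXL repository):

```python
import math

def bits_per_byte(byte_ppl: float) -> float:
    """Bits-per-byte from byte-level perplexity: log2(ppl)."""
    return math.log2(byte_ppl)

# A byte-level PPL of 1.04 corresponds to roughly 0.057 bits per byte.
print(round(bits_per_byte(1.04), 4))
```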
### Results

| Metric | Value |
|--------|-------|
| Perplexity (byte-level) | 1.04 |
| Final Loss | 0.0747 |
| Training Steps | 2,435 |
| Training Time | 2 min |

**Summary:** A Shakespeare-trained demo model for testing the architecture.

## Environmental Impact

| Property | Value |
|----------|-------|
| Hardware | AMD Ryzen 5 5600G |
| Hours Used | 0.033 |
| Carbon Emitted | 0.0014 kg CO2 |
| Cloud Provider | None (local CPU) |

## Technical Specifications

### Model Architecture

Multi-Scale Transformer with three parallel encoder stacks at resolution scales 1x, 2x, and 4x. Cross-scale attention connects all scale pairs, and the per-scale outputs are combined through adaptive gating fusion. Feed-forward layers use SwiGLU; positional information uses RoPE.

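The downsample factors [1, 2, 4] mean each encoder stack sees the sequence at a different resolution. One simple way to form such multi-resolution views is average pooling along the sequence axis; this is an illustrative sketch, not the actual AXL implementation:

```python
import torch
import torch.nn.functional as F

def multiscale_views(x: torch.Tensor, factors=(1, 2, 4)) -> list[torch.Tensor]:
    """Downsample a (batch, seq, d_model) sequence by average pooling
    along the sequence axis, producing one view per scale factor."""
    views = []
    for f in factors:
        if f == 1:
            views.append(x)
        else:
            # Pool over the sequence: (B, S, D) -> (B, D, S) -> pool -> back
            views.append(F.avg_pool1d(x.transpose(1, 2), kernel_size=f).transpose(1, 2))
    return views

x = torch.randn(1, 256, 64)  # 256-byte context, d_model 64
print([v.shape[1] for v in multiscale_views(x)])  # [256, 128, 64]
```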
### Compute Infrastructure

| Property | Value |
|----------|-------|
| Hardware | AMD Ryzen 5 5600G (6 cores, 12 threads) |
| RAM | 16 GB |
| GPU | None (CPU-only) |

## Citation

```bibtex
@misc{axl_2026,
  title={AXL: AXL-Micro-600K - Multi-Scale Transformer for CPU Code Generation},
  author={Koinic},
  year={2026},
  url={https://huggingface.co/KoinicLabs}
}
```


## How to Get Started

### With Ollama

```bash
ollama create axl-micro-600k -f Modelfile
ollama run axl-micro-600k "def fibonacci():"
```

### With Python

```python
import torch
from multiscale_transformer.model.config import load_config
from multiscale_transformer.model.model import MultiScaleTransformer
from multiscale_transformer.training.tokenizer import ByteTokenizer

# Load the config and checkpoint, then restore the weights
config = load_config("config.json")
model = MultiScaleTransformer(config)
ckpt = torch.load("axl_micro_600k.pt", map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

# Byte-level tokenization: encode the prompt and generate
tokenizer = ByteTokenizer()
prompt = "def fibonacci():"
ids = torch.tensor([tokenizer.encode(prompt)], dtype=torch.long)
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=100, temperature=0.8, top_k=40)
print(tokenizer.decode(out[0].tolist()))
```