---
license: apache-2.0
language:
- code
tags:
- code-generation
- multi-scale-transformer
- cpu-optimized
- koinic
- pytorch
- llama
- gguf
- byte-level
- conversational
pipeline_tag: text-generation
library_name: transformers
datasets:
- koinic/axl-chat-pairs
widget:
- text: "User: Explain binary search trees\nAssistant:"
- text: "User: What is a generator in Python?\nAssistant:"
model-index:
- name: AXL-Chat-Pro
  results:
  - task:
      type: text-generation
    metrics:
    - name: Perplexity (byte-level)
      type: perplexity
      value: 1.34
---
|
|
|
# AXL-Chat-Pro

An advanced conversational AI model for code: 12.8M parameters, byte-level perplexity 1.34, 256-byte context window. Part of the AXL model family by [KoinicLabs](https://huggingface.co/KoinicLabs).
|
|
|
## Model Details

| Property | Value |
|----------|-------|
| Developed by | [KoinicLabs](https://huggingface.co/KoinicLabs) |
| Architecture | Multi-Scale Transformer |
| Parameters | 13M |
| Optimizer | Lion |
| Attention | SDPA |
| Vocab Size | 258 (byte-level) |
| Context Window | 256 bytes |
| d_model | 256 |
| Attention Heads | 4 |
| Layers per Scale | 3 |
| Downsample Factors | [1, 2, 4] |
| License | Apache 2.0 |
|
|
|
### Sources

- **Repository:** [GitHub](https://github.com/Koinic/AXL)
- **Organization:** [KoinicLabs](https://huggingface.co/KoinicLabs)
|
|
|
## Uses

### Direct Use

Advanced conversational AI for code explanation.
|
|
|
```python
import torch
from multiscale_transformer.model.config import load_config
from multiscale_transformer.model.model import MultiScaleTransformer
from multiscale_transformer.training.tokenizer import ByteTokenizer

# Load the architecture config and checkpoint weights (CPU-only).
config = load_config("config.json")
model = MultiScaleTransformer(config)
ckpt = torch.load("axl_chat_pro.pt", map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

# Encode the prompt as raw bytes and sample a completion.
tokenizer = ByteTokenizer()
ids = torch.tensor([tokenizer.encode("def hello():")], dtype=torch.long)
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=50, temperature=0.8)
print(tokenizer.decode(out[0].tolist()))
```
|
|
|
### Out-of-Scope Use

Not intended for general-purpose code generation; this is a task-specific model. For integration with tools like Continue.dev, LlamaIndex, or LangChain, use the Python API server, which provides OpenAI-compatible endpoints.
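As a rough illustration, a request to the OpenAI-compatible completions endpoint can be sketched as follows. The server address comes from the limitations section below (`http://localhost:8880/v1/completions`); the field names follow the standard OpenAI completions schema, and the model name `axl-chat-pro` is an assumption, not confirmed by the repository.

```python
import json

# Hypothetical client sketch for the AXL Python API server's
# OpenAI-compatible /v1/completions endpoint (assumed to listen on
# localhost:8880). The "model" value is illustrative.
url = "http://localhost:8880/v1/completions"
payload = {
    "model": "axl-chat-pro",
    "prompt": "User: Explain binary search trees\nAssistant:",
    "max_tokens": 100,
    "temperature": 0.8,
}
body = json.dumps(payload).encode("utf-8")

# Sending it requires the server to be running locally, e.g.:
# import urllib.request
# req = urllib.request.Request(url, data=body,
#                              headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode("utf-8"))
```

Any OpenAI-compatible client library can be pointed at the same base URL instead of hand-building the request.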
|
|
|
## Bias, Risks, and Limitations

Byte-level perplexity is not comparable to BPE-level perplexity. The model is specialized for chat, with a maximum context of 256 bytes. IMPORTANT: GGUF files exported for Ollama/LM Studio use only the fine-scale encoder (1/3 of the AXL architecture), while the reported PPL applies to the full multi-scale model. For full AXL quality, use the Python API server at http://localhost:8880/v1/completions.
|
|
|
### Recommendations

- Use for prototyping and experimentation, not production code generation.
- Byte-level perplexity (258 vocab) is not comparable to BPE-level perplexity (32K vocab).
- For better results, use the Lion-optimized version if available.
|
|
|
## Training Details

### Training Data

Trained with the Lion optimizer on 10 MB of chat pairs (the `koinic/axl-chat-pairs` dataset): 208 steps in 10 minutes. The training stack was rewritten from NumPy to PyTorch.
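For reference, the Lion optimizer updates parameters using only the sign of an interpolated momentum, so the step size is controlled entirely by the learning rate. Below is a minimal single-scalar sketch of the published Lion update rule; the hyperparameter values are illustrative defaults, not the values used to train this model.

```python
import math

def lion_step(theta, grad, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion update for a single scalar parameter (sketch).

    The update direction is the sign of an interpolation between the
    momentum and the current gradient; the momentum then tracks an
    exponential moving average of the gradient.
    """
    update = math.copysign(1.0, beta1 * m + (1 - beta1) * grad)
    theta = theta - lr * (update + wd * theta)
    m = beta2 * m + (1 - beta2) * grad
    return theta, m

# A positive gradient gives sign(+) = +1, so theta drops by exactly lr.
theta, m = lion_step(theta=1.0, grad=0.5, m=0.0, lr=0.1)
print(theta, m)  # 0.9, 0.005
```

Note that `math.copysign` returns ±1 even for a zero argument, whereas a full implementation would use a three-valued sign; for this sketch the distinction does not arise.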
|
|
|
### Preprocessing

Byte-level tokenization with vocabulary size 258 (256 bytes + BOS + EOS). No vocabulary training required.
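A byte-level tokenizer with this vocabulary can be sketched in a few lines. This is a hypothetical stand-in for the repository's `ByteTokenizer`; in particular, the special-token ids (BOS=256, EOS=257) are an assumption, not confirmed by the source.

```python
# Minimal sketch of a 258-symbol byte-level tokenizer:
# 256 raw byte values plus BOS and EOS (id assignment assumed).
BOS, EOS = 256, 257

def encode(text: str, add_special: bool = True) -> list[int]:
    ids = list(text.encode("utf-8"))        # each byte is its own token id
    return [BOS] + ids + [EOS] if add_special else ids

def decode(ids: list[int]) -> str:
    raw = bytes(i for i in ids if i < 256)  # drop BOS/EOS before decoding
    return raw.decode("utf-8", errors="replace")

print(decode(encode("def hello():")))  # round-trips the input
```

Because every byte maps to itself, no vocabulary file or merge table is needed, which is why no vocabulary training is required.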
|
|
|
### Speeds, Sizes, Times

| Metric | Value |
|--------|-------|
| Training Steps | 208 |
| Training Time | 10 min |
| Final Loss | 0.3106 |
|
|
|
## Evaluation

### Metrics

Perplexity on held-out Python code using byte-level tokenization.
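Byte-level perplexity is simply the exponential of the mean per-byte negative log-likelihood over the held-out set. A minimal sketch, assuming you already have the per-byte NLLs (natural log) from the model:

```python
import math

def perplexity(nlls: list[float]) -> float:
    """Perplexity = exp of the mean negative log-likelihood per byte."""
    return math.exp(sum(nlls) / len(nlls))

# Sanity check: a uniform model over the 258-symbol vocabulary assigns
# each byte probability 1/258, giving perplexity exactly 258.
uniform = [math.log(258)] * 100
print(round(perplexity(uniform)))  # 258
```

The 258-symbol vocabulary is why values near 1.3 are plausible here while being incomparable to perplexities computed over a 32K BPE vocabulary.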
|
|
|
### Results

| Metric | Value |
|--------|-------|
| Perplexity (byte-level) | 1.34 |
| Final Loss | 0.3106 |
| Training Steps | 208 |
| Training Time | 10 min |

**Summary:** Better quality than AXL-Chat-Lion (PPL 1.34 vs 1.52).
|
|
|
## Environmental Impact

| Property | Value |
|----------|-------|
| Hardware | AMD Ryzen 5 5600G |
| Hours Used | 0.167 |
| Carbon Emitted | 0.0070 kg CO2 |
| Cloud Provider | None (local CPU) |
|
|
|
## Technical Specifications

### Model Architecture

Multi-Scale Transformer with three parallel encoder stacks at resolution scales 1x, 2x, and 4x. Cross-scale attention connects all scale pairs. Adaptive gating fusion. SwiGLU feed-forward. RoPE positional encoding.
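The downsample factors [1, 2, 4] mean the three stacks see the sequence at full, half, and quarter resolution. As a hypothetical illustration only (the actual downsampling operator in AXL is not specified in this card), average-pooling a 256-position context yields the per-scale lengths:

```python
def downsample(seq: list[float], factor: int) -> list[float]:
    """Average-pool a 1-D sequence by `factor` (illustrative operator)."""
    return [sum(seq[i:i + factor]) / factor
            for i in range(0, len(seq), factor)]

seq = [float(b) for b in range(256)]  # one full 256-byte context window
for factor in (1, 2, 4):
    print(factor, len(downsample(seq, factor)))  # 256, 128, 64 positions
```

The coarser scales thus attend over progressively shorter sequences, which is what keeps the multi-scale design cheap enough for CPU inference.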
|
|
|
### Compute Infrastructure

| Property | Value |
|----------|-------|
| Hardware | AMD Ryzen 5 5600G (6 cores, 12 threads) |
| RAM | 16 GB |
| GPU | None (CPU-only) |
|
|
|
## Citation

```bibtex
@misc{axl_2026,
  title={AXL: AXL-Chat-Pro - Multi-Scale Transformer for CPU Code Generation},
  author={Koinic},
  year={2026},
  url={https://huggingface.co/KoinicLabs}
}
```
|
|
|
## How to Get Started

### With Ollama

```bash
ollama create axl-chat-pro -f Modelfile
ollama run axl-chat-pro "def fibonacci():"
```
|
|
|
### With Python

```python
import torch
from multiscale_transformer.model.config import load_config
from multiscale_transformer.model.model import MultiScaleTransformer
from multiscale_transformer.training.tokenizer import ByteTokenizer

# Load the architecture config and checkpoint weights (CPU-only).
config = load_config("config.json")
model = MultiScaleTransformer(config)
ckpt = torch.load("axl_chat_pro.pt", map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

# Encode the prompt as raw bytes and sample a completion.
tokenizer = ByteTokenizer()
prompt = "def fibonacci():"
ids = torch.tensor([tokenizer.encode(prompt)], dtype=torch.long)
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=100, temperature=0.8, top_k=40)
print(tokenizer.decode(out[0].tolist()))
```
|
|
|