|
|
--- |
|
|
language: |
|
|
- en |
|
|
license: cc-by-nc-4.0 |
|
|
library_name: transformers |
|
|
tags: |
|
|
- code |
|
|
- math |
|
|
- reasoning |
|
|
- 0.6b |
|
|
pipeline_tag: text-generation |
|
|
base_model: |
|
|
- Arioron/Vex-Amber-Mini-1.0 |
|
|
--- |
|
|
|
|
|
# Vex Amber Mini 1.2
|
|
|
|
|
## Model Description |
|
|
|
|
|
**Vex Amber Mini 1.2** is a 0.6B-parameter decoder-only transformer model for mathematical reasoning and code generation. Building on Vex Amber Mini 1.0, it achieves strong performance for its size class, particularly on programming tasks and mathematical problem solving.
|
|
|
|
|
- **Developed by:** Arioron |
|
|
- **Model type:** Decoder-only Transformer |
|
|
- **Language(s):** English |
|
|
- **License:** CC BY-NC 4.0
|
|
- **Finetuned from model:** [Arioron/Vex-Amber-Mini-1.0](https://huggingface.co/Arioron/Vex-Amber-Mini-1.0) |
|
|
|
|
|
## Model Sources |
|
|
|
|
|
- **Base Model:** Qwen/Qwen3-0.6B (the upstream base of Vex-Amber-Mini-1.0)
|
|
- **Repository:** [https://huggingface.co/Arioron/Vex-Amber-Mini-1.2](https://huggingface.co/Arioron/Vex-Amber-Mini-1.2) |
|
|
- **Documentation:** [Arioron Model Docs](https://docs.arioron.com) |
|
|
|
|
|
## Performance |
|
|
|
|
|
| Benchmark | Metric | Score | |
|
|
|-----------|--------|-------| |
|
|
| HumanEval | Pass@1 | 21.34% | |
|
|
| MBPP | Pass@1 | 38.7% | |
|
|
| GSM8K | Accuracy | 65.2% | |
|
|
| MATH | Accuracy | 45.8% | |
|
|
| MMLU | Accuracy | 58.3% | |
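
Pass@1 for HumanEval and MBPP is conventionally reported with the unbiased pass@k estimator from Chen et al. (2021); the exact sampling setup used here is not specified, but for reference, a minimal sketch of that estimator:

```python
def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: samples generated per problem
    c: samples that passed the unit tests
    k: the k in pass@k
    """
    if n - c < k:
        return 1.0
    # 1 - C(n - c, k) / C(n, k), computed as a numerically stable product
    prob_all_fail = 1.0
    for i in range(n - c + 1, n + 1):
        prob_all_fail *= 1.0 - k / i
    return 1.0 - prob_all_fail
```

Averaging this value over all problems in the benchmark yields the reported Pass@1.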
|
|
|
|
|
## Quick Start |
|
|
```python |
|
|
from transformers import AutoTokenizer, AutoModelForCausalLM |
|
|
import torch |
|
|
|
|
|
model_name = "Arioron/Vex-Amber-Mini-1.2" |
|
|
tokenizer = AutoTokenizer.from_pretrained(model_name) |
|
|
model = AutoModelForCausalLM.from_pretrained( |
|
|
model_name, |
|
|
torch_dtype=torch.float16, |
|
|
device_map="auto" |
|
|
) |
|
|
|
|
|
# Code generation example |
|
|
prompt = "Write a Python function to reverse a linked list:" |
|
|
inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
|
|
|
|
|
outputs = model.generate( |
|
|
**inputs, |
|
|
max_new_tokens=256, |
|
|
temperature=0.7, |
|
|
do_sample=True, |
|
|
top_p=0.9, |
|
|
pad_token_id=tokenizer.eos_token_id |
|
|
) |
|
|
|
|
|
print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
|
|
``` |
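
Since this checkpoint descends from Qwen3-0.6B, it likely inherits a chat template; assuming one is present, instruction-style prompts can be driven through `apply_chat_template` as follows (a sketch, not verified against this exact checkpoint):

```python
# Chat-style generation via the tokenizer's chat template
# (assumes the template is inherited from the Qwen3 base).
messages = [
    {"role": "user", "content": "Solve step by step: what is 17 * 24?"}
]
chat_inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn header
    return_tensors="pt",
).to(model.device)

chat_outputs = model.generate(
    chat_inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(chat_outputs[0], skip_special_tokens=True))
```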
|
|
|
|
|
## Capabilities |
|
|
|
|
|
### 🎯 Code Generation |
|
|
```python |
|
|
# Example: The model can generate efficient algorithms |
|
|
def quick_sort(arr): |
|
|
if len(arr) <= 1: |
|
|
return arr |
|
|
pivot = arr[len(arr) // 2] |
|
|
left = [x for x in arr if x < pivot] |
|
|
middle = [x for x in arr if x == pivot] |
|
|
right = [x for x in arr if x > pivot] |
|
|
return quick_sort(left) + middle + quick_sort(right) |
|
|
``` |
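
For example, `quick_sort([3, 6, 8, 10, 1, 2, 1])` returns `[1, 1, 2, 3, 6, 8, 10]`.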
|
|
|
|
|
### 🔢 Mathematical Reasoning |
|
|
```python |
|
|
# Example: Solve quadratic equations and explain steps |
|
|
""" |
|
|
Solve: x² - 5x + 6 = 0 |
|
|
Step 1: Factor the equation: (x - 2)(x - 3) = 0 |
|
|
Step 2: Set each factor to zero: x - 2 = 0 or x - 3 = 0 |
|
|
Step 3: Solve for x: x = 2 or x = 3 |
|
|
""" |
|
|
``` |
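
The worked solution can be sanity-checked in plain Python by substituting both roots back into the polynomial:

```python
# Verify the roots of x^2 - 5x + 6 = 0 found above
def f(x):
    return x**2 - 5 * x + 6

assert f(2) == 0 and f(3) == 0  # both roots satisfy the equation
```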
|
|
|
|
|
## Training Details |
|
|
|
|
|
### Training Data |
|
|
|
|
|
The model was trained on a carefully curated mixture of the following data types (see the sampling sketch after this list):
|
|
|
|
|
- 45% Code (Python, JavaScript, Java, C++) |
|
|
- 30% Mathematical content (textbooks, problems, proofs) |
|
|
- 15% General reasoning tasks |
|
|
- 10% Conversational data |
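
As a rough illustration, such a mixture can be realized by weighted sampling over source-tagged examples; a minimal sketch in which the weights are the documented proportions and the source labels are placeholders, not real dataset names:

```python
import random

# Documented mixture proportions; the keys are illustrative labels only.
MIXTURE = {
    "code": 0.45,          # Python, JavaScript, Java, C++
    "math": 0.30,          # textbooks, problems, proofs
    "reasoning": 0.15,     # general reasoning tasks
    "conversation": 0.10,  # conversational data
}

def sample_source(rng: random.Random) -> str:
    """Draw one training-example source according to the mixture weights."""
    sources, weights = zip(*MIXTURE.items())
    return rng.choices(sources, weights=weights, k=1)[0]

rng = random.Random(0)
print([sample_source(rng) for _ in range(5)])  # e.g. a mix dominated by "code"
```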
|
|
|
|
|
### Technical Specifications |
|
|
|
|
|
- Architecture: Transformer-based decoder |
|
|
- Context Length: 8,192 tokens |
|
|
- Precision: float16 |
|
|
- Training Framework: Native PyTorch |
|
|
- Positional Encoding: Rotary Positional Embeddings (RoPE) |
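
For readers unfamiliar with RoPE, the sketch below shows the core rotation in plain NumPy. It is illustrative only, not this model's implementation; it uses the split-half pairing convention common to Qwen-style models and assumes an even head dimension:

```python
import numpy as np

def rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embeddings to x of shape (seq_len, dim).

    Channel pairs are rotated by an angle that grows with position and
    shrinks with channel index (Su et al., 2021).
    """
    seq_len, dim = x.shape
    half = dim // 2
    theta = base ** (-np.arange(half) * 2.0 / dim)   # per-pair frequencies
    angles = np.outer(np.arange(seq_len), theta)     # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]                # split-half pairing
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)
```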
|
|
|
|
|
## Intended Uses |
|
|
|
|
|
### Direct Use |
|
|
|
|
|
- Code completion and generation |
|
|
- Mathematical problem solving |
|
|
- Educational assistance |
|
|
- Technical documentation |
|
|
- Research prototyping |
|
|
|
|
|
### Downstream Use |
|
|
|
|
|
- Integration into IDEs and code editors |
|
|
- Educational platforms |
|
|
- Technical chatbots |
|
|
- Research tools for mathematics and computer science |
|
|
|
|
|
## Limitations |
|
|
|
|
|
- The 0.6B parameter count may limit performance on extremely complex, multi-step reasoning tasks |
|
|
- While strong for its size, it may not match the performance of larger models (7B+) on some benchmarks |
|
|
- Context window of 8K tokens may be insufficient for very long code files or documents |
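
When an input might exceed that window, truncating at tokenization time avoids hard failures; a minimal guard, reusing the Quick Start objects, with `long_prompt` as a hypothetical oversized input:

```python
# Keep the prompt within the 8,192-token context window,
# reserving headroom for the tokens to be generated.
inputs = tokenizer(
    long_prompt,
    return_tensors="pt",
    truncation=True,
    max_length=8192 - 256,
).to(model.device)
```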
|
|
|
|
|
## Ethical Considerations |
|
|
|
|
|
The model is trained on publicly available data and is designed to be helpful, harmless, and honest. However, as with any language model: |
|
|
|
|
|
- Outputs should be verified for accuracy in critical applications |
|
|
- The model should not be used for high-stakes decisions without human oversight |
|
|
- Users should be aware of potential biases in training data |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model in your research, please cite: |
|
|
```bibtex |
|
|
@misc{vexambermini1.2, |
|
|
title = {Vex Amber Mini 1.2: A Compact Language Model for Code and Mathematics}, |
|
|
author = {Arioron}, |
|
|
year = {2025}, |
|
|
publisher = {Hugging Face}, |
|
|
howpublished = {\url{https://huggingface.co/Arioron/Vex-Amber-Mini-1.2}} |
|
|
} |
|
|
``` |
|
|
|
|
|
## Contact |
|
|
|
|
|
- Email: inquiry@arioron.com |
|
|
- Website: https://arioron.com |
|
|
- Documentation: https://docs.arioron.com |
|
|
|
|
|
## Acknowledgements |
|
|
|
|
|
Thanks to the open-source community and the Qwen team for their foundational work. Special thanks to all contributors and researchers who have advanced the field of efficient language modeling. |
|
|
|
|
|
--- |
|
|
|
|
|
For technical details, training recipes, and comprehensive evaluation results, please refer to our technical documentation. |