|
|
--- |
|
|
language: |
|
|
- en |
|
|
license: cc-by-nc-4.0 |
|
|
library_name: transformers |
|
|
tags: |
|
|
- code |
|
|
- math |
|
|
- reasoning |
|
|
- 0.6b |
|
|
pipeline_tag: text-generation |
|
|
base_model: |
|
|
- Arioron/Vex-Amber-Mini-1.0 |
|
|
--- |
|
|
|
|
|
# Vex Amber Mini 1.2
|
|
|
|
|
## Model Description |
|
|
|
|
|
**Vex Amber Mini 1.2** is a 0.6B-parameter decoder-only transformer model for mathematical reasoning and code generation. Building on Vex Amber Mini 1.0, it achieves strong performance for its size class, particularly on programming tasks and mathematical problem solving.
|
|
|
|
|
- **Developed by:** Arioron |
|
|
- **Model type:** Decoder-only Transformer |
|
|
- **Language(s):** English |
|
|
- **License:** CC BY-NC 4.0
|
|
- **Finetuned from model:** [Arioron/Vex-Amber-Mini-1.0](https://huggingface.co/Arioron/Vex-Amber-Mini-1.0) |
|
|
|
|
|
## Model Sources |
|
|
|
|
|
- **Base Model:** Qwen/Qwen3-0.6B (the upstream base of Vex-Amber-Mini-1.0)
|
|
- **Repository:** [https://huggingface.co/Arioron/Vex-Amber-Mini-1.2](https://huggingface.co/Arioron/Vex-Amber-Mini-1.2) |
|
|
- **Documentation:** [Arioron Model Docs](https://docs.arioron.com) |
|
|
|
|
|
## Performance |
|
|
|
|
|
| Benchmark | Metric | Score | |
|
|
|-----------|--------|-------| |
|
|
| HumanEval | Pass@1 | 21.34% | |
|
|
| MBPP | Pass@1 | 38.7% | |
|
|
| GSM8K | Accuracy | 65.2% | |
|
|
| MATH | Accuracy | 45.8% | |
|
|
| MMLU | Accuracy | 58.3% | |
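
Pass@1 for HumanEval and MBPP is conventionally reported with the unbiased pass@k estimator from Chen et al. (2021); the exact sampling setup used here is not specified, but for reference, a minimal sketch of that estimator:

```python
def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: samples generated per problem
    c: samples that passed the unit tests
    k: the k in pass@k
    """
    if n - c < k:
        return 1.0
    # 1 - C(n - c, k) / C(n, k), computed as a numerically stable product
    prob_all_fail = 1.0
    for i in range(n - c + 1, n + 1):
        prob_all_fail *= 1.0 - k / i
    return 1.0 - prob_all_fail
```

Averaging this value over all problems in the benchmark yields the reported Pass@1.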
|
|
|
|
|
## Quick Start |
|
|
```python |
|
|
from transformers import AutoTokenizer, AutoModelForCausalLM |
|
|
import torch |
|
|
|
|
|
model_name = "Arioron/Vex-Amber-Mini-1.2" |
|
|
tokenizer = AutoTokenizer.from_pretrained(model_name) |
|
|
model = AutoModelForCausalLM.from_pretrained( |
|
|
model_name, |
|
|
torch_dtype=torch.float16, |
|
|
device_map="auto" |
|
|
) |
|
|
|
|
|
# Code generation example |
|
|
prompt = "Write a Python function to reverse a linked list:" |
|
|
inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
|
|
|
|
|
outputs = model.generate( |
|
|
**inputs, |
|
|
max_new_tokens=256, |
|
|
temperature=0.7, |
|
|
do_sample=True, |
|
|
top_p=0.9, |
|
|
pad_token_id=tokenizer.eos_token_id |
|
|
) |
|
|
|
|
|
print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
|
|
``` |
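
Since this checkpoint descends from Qwen3-0.6B, it likely inherits a chat template; assuming one is present, instruction-style prompts can be driven through `apply_chat_template` as follows (a sketch, not verified against this exact checkpoint):

```python
# Chat-style generation via the tokenizer's chat template
# (assumes the template is inherited from the Qwen3 base).
messages = [
    {"role": "user", "content": "Solve step by step: what is 17 * 24?"}
]
chat_inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn header
    return_tensors="pt",
).to(model.device)

chat_outputs = model.generate(
    chat_inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(chat_outputs[0], skip_special_tokens=True))
```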
|
|
|
|
|
## Capabilities |
|
|
|
|
|
### 🎯 Code Generation |
|
|
```python |
|
|
# Example: The model can generate efficient algorithms |
|
|
def quick_sort(arr): |
|
|
if len(arr) <= 1: |
|
|
return arr |
|
|
pivot = arr[len(arr) // 2] |
|
|
left = [x for x in arr if x < pivot] |
|
|
middle = [x for x in arr if x == pivot] |
|
|
right = [x for x in arr if x > pivot] |
|
|
return quick_sort(left) + middle + quick_sort(right) |
|
|
``` |
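
For example, `quick_sort([3, 6, 8, 10, 1, 2, 1])` returns `[1, 1, 2, 3, 6, 8, 10]`.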
|
|
|
|
|
### 🔢 Mathematical Reasoning |
|
|
```python |
|
|
# Example: Solve quadratic equations and explain steps |
|
|
""" |
|
|
Solve: x² - 5x + 6 = 0 |
|
|
Step 1: Factor the equation: (x - 2)(x - 3) = 0 |
|
|
Step 2: Set each factor to zero: x - 2 = 0 or x - 3 = 0 |
|
|
Step 3: Solve for x: x = 2 or x = 3 |
|
|
""" |
|
|
``` |
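
The worked solution can be sanity-checked in plain Python by substituting both roots back into the polynomial:

```python
# Verify the roots of x^2 - 5x + 6 = 0 found above
def f(x):
    return x**2 - 5 * x + 6

assert f(2) == 0 and f(3) == 0  # both roots satisfy the equation
```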
|
|
|
|
|
## Training Details |
|
|
|
|
|
### Training Data |
|
|
|
|
|
The model was trained on a carefully curated mixture of the following data types (see the sampling sketch after this list):
|
|
|
|
|
- 45% Code (Python, JavaScript, Java, C++) |
|
|
- 30% Mathematical content (textbooks, problems, proofs) |
|
|
- 15% General reasoning tasks |
|
|
- 10% Conversational data |
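
As a rough illustration, such a mixture can be realized by weighted sampling over source-tagged examples; a minimal sketch in which the weights are the documented proportions and the source labels are placeholders, not real dataset names:

```python
import random

# Documented mixture proportions; the keys are illustrative labels only.
MIXTURE = {
    "code": 0.45,          # Python, JavaScript, Java, C++
    "math": 0.30,          # textbooks, problems, proofs
    "reasoning": 0.15,     # general reasoning tasks
    "conversation": 0.10,  # conversational data
}

def sample_source(rng: random.Random) -> str:
    """Draw one training-example source according to the mixture weights."""
    sources, weights = zip(*MIXTURE.items())
    return rng.choices(sources, weights=weights, k=1)[0]

rng = random.Random(0)
print([sample_source(rng) for _ in range(5)])  # e.g. a mix dominated by "code"
```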
|
|
|
|
|
### Technical Specifications |
|
|
|
|
|
- Architecture: Transformer-based decoder |
|
|
- Context Length: 8,192 tokens |
|
|
- Precision: float16 |
|
|
- Training Framework: Native PyTorch |
|
|
- Positional Encoding: Rotary Positional Embeddings (RoPE) |
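
For readers unfamiliar with RoPE, the sketch below shows the core rotation in plain NumPy. It is illustrative only, not this model's implementation; it uses the split-half pairing convention common to Qwen-style models and assumes an even head dimension:

```python
import numpy as np

def rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embeddings to x of shape (seq_len, dim).

    Channel pairs are rotated by an angle that grows with position and
    shrinks with channel index (Su et al., 2021).
    """
    seq_len, dim = x.shape
    half = dim // 2
    theta = base ** (-np.arange(half) * 2.0 / dim)   # per-pair frequencies
    angles = np.outer(np.arange(seq_len), theta)     # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]                # split-half pairing
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)
```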
|
|
|
|
|
## Intended Uses |
|
|
|
|
|
### Direct Use |
|
|
|
|
|
- Code completion and generation |
|
|
- Mathematical problem solving |
|
|
- Educational assistance |
|
|
- Technical documentation |
|
|
- Research prototyping |
|
|
|
|
|
### Downstream Use |
|
|
|
|
|
- Integration into IDEs and code editors |
|
|
- Educational platforms |
|
|
- Technical chatbots |
|
|
- Research tools for mathematics and computer science |
|
|
|
|
|
## Limitations |
|
|
|
|
|
- The 0.6B parameter count may limit performance on extremely complex, multi-step reasoning tasks |
|
|
- While strong for its size, it may not match the performance of larger models (7B+) on some benchmarks |
|
|
- Context window of 8K tokens may be insufficient for very long code files or documents |
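
When an input might exceed that window, truncating at tokenization time avoids hard failures; a minimal guard, reusing the Quick Start objects, with `long_prompt` as a hypothetical oversized input:

```python
# Keep the prompt within the 8,192-token context window,
# reserving headroom for the tokens to be generated.
inputs = tokenizer(
    long_prompt,
    return_tensors="pt",
    truncation=True,
    max_length=8192 - 256,
).to(model.device)
```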
|
|
|
|
|
## Ethical Considerations |
|
|
|
|
|
The model is trained on publicly available data and is designed to be helpful, harmless, and honest. However, as with any language model: |
|
|
|
|
|
- Outputs should be verified for accuracy in critical applications |
|
|
- The model should not be used for high-stakes decisions without human oversight |
|
|
- Users should be aware of potential biases in training data |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model in your research, please cite: |
|
|
```bibtex |
|
|
@misc{vexambermini1.2, |
|
|
title = {Vex Amber Mini 1.2: A Compact Language Model for Code and Mathematics}, |
|
|
author = {Arioron}, |
|
|
year = {2025}, |
|
|
publisher = {Hugging Face}, |
|
|
howpublished = {\url{https://huggingface.co/Arioron/Vex-Amber-Mini-1.2}} |
|
|
} |
|
|
``` |
|
|
|
|
|
## Contact |
|
|
|
|
|
- Email: inquiry@arioron.com |
|
|
- Website: https://arioron.com |
|
|
- Documentation: https://docs.arioron.com |
|
|
|
|
|
## Acknowledgements |
|
|
|
|
|
Thanks to the open-source community and the Qwen team for their foundational work. Special thanks to all contributors and researchers who have advanced the field of efficient language modeling. |
|
|
|
|
|
--- |
|
|
|
|
|
For technical details, training recipes, and comprehensive evaluation results, please refer to our technical documentation. |