zenlm/zen-training

Commit 39fc60e (verified), committed by zeekay on Feb 28. Parent: 44abfbe

Update model card: add zen/zenlm tags, fix branding

Files changed (1):

README.md +122 -52

@@ -1,81 +1,151 @@
 ---
-language: en
 license: apache-2.0
 tags:
-  - training
   - zen
   - zenlm
-  - hanzo
 ---
-# Zen Training
-Training infrastructure and recipes for the Zen model family.
-**Zen LM by Hanzo AI** — Open training configurations for all Zen models.
-## Overview
-This repository contains the training configurations, scripts, and recipes used to train Zen models using the Zen MoDE (Mixture of Distilled Experts) architecture. All training runs use mixed-precision distributed training with full support for LoRA/QLoRA fine-tuning and alignment techniques.
-## Training Recipes
-| Model | Type | Parameters | Context | Hardware |
-|-------|------|-----------|---------|----------|
-| Zen Nano | Dense | 0.6B | 32K | 1x H100 |
-| Zen Eco | Dense | 4B | 64K | 4x H100 |
-| Zen Pro | Dense | 8B | 128K | 8x H100 |
-| Zen MAX | MoE | 235B (22B active) | 128K | 64x H100 |
-## Features
-- Mixed precision training (BF16)
-- Gradient checkpointing
-- Distributed training with FSDP / DeepSpeed ZeRO-3
-- LoRA / QLoRA fine-tuning support
-- RLHF and DPO alignment pipelines
-- Dataset mixing and curriculum scheduling
-- Evaluation harness integration
-## Supported Training Tasks
-- Instruction tuning
-- Function calling
-- Agent trajectory training
-- Vision-language alignment
-- Code generation fine-tuning
-- Reasoning / chain-of-thought distillation
-## Dataset Support
-Training recipes support direct streaming from HuggingFace datasets:
-- Instruction tuning corpora
-- Agent behavior datasets
-- Function calling datasets
-- Code and math reasoning sets
-- Multilingual alignment data
-## Quick Start
-See [github.com/zenlm/zen-family](https://github.com/zenlm/zen-family) for full documentation, training scripts, and configuration files.
-```bash
-git clone https://github.com/zenlm/zen-family
-cd zen-family/training
-pip install -r requirements.txt
-python train.py --config configs/zen-pro-8b.yaml
 ```
-## Related Repositories
-| Repo | Description |
-|------|-------------|
-| [zenlm/zen-family](https://huggingface.co/zenlm/zen-family) | Model family overview |
-| [zenlm/zen-nano-600m-instruct](https://huggingface.co/zenlm/zen-nano-600m-instruct) | Zen Nano — 0.6B |
-| [zenlm/zen-pro-8b-instruct](https://huggingface.co/zenlm/zen-pro-8b-instruct) | Zen Pro — 8B |
-| [zenlm/zen-max-235b-a22b-instruct](https://huggingface.co/zenlm/zen-max-235b-a22b-instruct) | Zen MAX — 235B MoE |
-## License
 Apache 2.0

 ---
+title: Zen Training
+emoji: 🧘
+colorFrom: blue
+colorTo: purple
+sdk: gradio
+sdk_version: 4.0.0
+app_file: app.py
+pinned: true
 license: apache-2.0
+hardware: a10g-large
 tags:
   - zen
   - zenlm
 ---
+# 🧘 Zen Training Space
+**Unified Training Platform for All Zen Models**
+Train any listed Zen model on any combination of the supported HuggingFace datasets. Everything runs directly from HF datasets, so no local storage is needed.
+## 🎯 Features
+### Supported Models
+**Language Models:**
+- `zen-nano` (0.6B): edge deployment
+- `zen-eco` (4B): balanced performance
+- `zen-omni` (7B): multi-task
+- `zen-coder` (14B): code generation
+- `zen-next` (32B): frontier performance
+**Vision-Language Models:**
+- `zen-vl-4b`: efficient VL with function calling
+- `zen-vl-8b`: enhanced VL capabilities
+- `zen-vl-30b`: maximum VL performance
+### Supported Datasets
+**Agent Training (ADP):**
+- AgentTuning OS/KG/DB (~15k samples)
+- Synatra (99k agent trajectories)
+- Code Feedback (66k samples)
+- Go Browse (27k web interactions)
+**Function Calling:**
+- xLAM 60k (Salesforce, high-quality function-calling data)
+**Instruction Tuning:**
+- Alpaca (52k instruction samples)
+## 🚀 How to Use
+1. **Select Model**: Choose from language or vision-language models
+2. **Select Datasets**: Check multiple datasets to combine them
+3. **Configure Training**: Set epochs, batch size, learning rate, max samples
+4. **Set Output Repo**: Specify the HuggingFace repo that will receive the trained model
+5. **Start Training**: Click the start button and monitor the logs
+## ⚙️ Training Configuration
+### Recommended Settings
+**4B Models (A10G, 24 GB):**
+- Batch Size: 1-2
+- Max Samples: 10,000-30,000
+- Time: 4-8 hours
+- Cost: ~$3-5
+**8B Models (A100, 40 GB):**
+- Batch Size: 2-4
+- Max Samples: 30,000-50,000
+- Time: 8-12 hours
+- Cost: ~$15-20
+**32B Models (A100, 80 GB):**
+- Batch Size: 1-2
+- Max Samples: 50,000-100,000
+- Time: 20-30 hours
+- Cost: ~$50-80
+## 📊 Dataset Combinations
+### For Agent Training:
+```
+ADP Synatra (80%) + xLAM (20%)
+= Strong agent + quality function calling
+```
+### For Code Models:
+```
+Code Feedback (70%) + Alpaca (30%)
+= Code expertise + general instruction following
+```
+### For VL Models:
+```
+ADP (all configs) + xLAM
+= Complete vision-language agent training
 ```
+## 🔒 Requirements
+- HuggingFace Pro account (for GPU access)
+- Write access to output repository
+- `HF_TOKEN` secret set in the Space settings
+## 💡 Tips
+1. **Start Small**: Test with 1,000 samples first
+2. **Mix Datasets**: Combine complementary datasets for best results
+3. **Monitor Logs**: Watch for out-of-memory (OOM) errors and reduce the batch size if they occur
+4. **Save Often**: Lower `save_steps` for longer training runs so progress is checkpointed more often
+## 📚 Resources
+- **Website**: https://zenlm.org
+- **GitHub**: https://github.com/zenlm
+- **Models**: https://huggingface.co/zenlm
+- **Datasets**:
+  - [ADP](https://huggingface.co/datasets/neulab/agent-data-collection)
+  - [xLAM](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k)
+## 📄 License
 Apache 2.0
+## 🙏 Citations
+```bibtex
+@software{zen-training-2025,
+  title={Zen Training: Unified Training Platform for Zen Models},
+  author={Zen AI Team},
+  year={2025},
+  url={https://huggingface.co/spaces/zenlm/zen-training}
+}
+@article{adp2024,
+  title={Agent Data Protocol},
+  author={NeuLab},
+  journal={arXiv preprint arXiv:2510.24702},
+  year={2025}
+}
+@dataset{xlam2024,
+  title={xLAM Function Calling Dataset},
+  author={Salesforce Research},
+  year={2024}
+}
+```
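The Dataset Combinations recipes in the new README give target proportions (for example ADP Synatra 80% + xLAM 20%) but do not say how the Space applies them. One common approach is to pick each training example's source at random according to those weights; the Hugging Face `datasets` library exposes this as `interleave_datasets(..., probabilities=...)`. The sketch below is a dependency-free illustration of that idea under that assumption, not the Space's actual code; the function name `mix_by_ratio` and the toy record fields are hypothetical.

```python
import random
from typing import Iterable, Iterator, Sequence


def mix_by_ratio(
    sources: Sequence[Iterable[dict]],
    weights: Sequence[float],
    max_samples: int,
    seed: int = 0,
) -> Iterator[dict]:
    """Yield up to `max_samples` examples, drawing each one's source by weight.

    Hypothetical helper. Stops as soon as a chosen source runs out, similar in
    spirit to `interleave_datasets(stopping_strategy="first_exhausted")`.
    """
    if len(sources) != len(weights):
        raise ValueError("need exactly one weight per source")
    rng = random.Random(seed)  # seeded so the mix is reproducible
    iterators = [iter(src) for src in sources]
    choices = range(len(iterators))
    for _ in range(max_samples):
        k = rng.choices(choices, weights=weights)[0]  # pick a source index by weight
        try:
            yield next(iterators[k])
        except StopIteration:
            return  # the chosen source is exhausted: end the mix


# Toy stand-ins for the two HF datasets (sample counts taken from the README).
synatra = ({"src": "synatra", "idx": i} for i in range(99_000))
xlam = ({"src": "xlam", "idx": i} for i in range(60_000))

mixed = list(mix_by_ratio([synatra, xlam], [0.8, 0.2], max_samples=10_000, seed=42))
share = sum(ex["src"] == "synatra" for ex in mixed) / len(mixed)
print(len(mixed), f"synatra share = {share:.3f}")
```

Because the source is drawn independently for each example, the realized share only approximates 80/20; fixing `seed` makes a given mix reproducible across runs.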
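In the new README, the Configure Training step and the "Save Often" tip are linked: the number of optimizer steps, and therefore the number of checkpoints, follows from max samples, batch size and epochs. A back-of-the-envelope sketch, assuming a single device, no gradient accumulation, and one checkpoint every `save_steps` steps (as `transformers.Trainer` does with `save_strategy="steps"`; whether the Space uses that trainer is not stated). `plan_run` and `RunPlan` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class RunPlan:
    steps_per_epoch: int
    total_steps: int
    checkpoints: int


def plan_run(max_samples: int, batch_size: int, epochs: int, save_steps: int) -> RunPlan:
    """Estimate optimizer steps and checkpoint count for one training run.

    Assumes one device and no gradient accumulation, so each batch is one step.
    """
    if min(max_samples, batch_size, epochs, save_steps) <= 0:
        raise ValueError("all settings must be positive")
    steps_per_epoch = math.ceil(max_samples / batch_size)  # last batch may be partial
    total_steps = steps_per_epoch * epochs
    checkpoints = total_steps // save_steps  # one save every `save_steps` steps
    return RunPlan(steps_per_epoch, total_steps, checkpoints)


# "Start Small": a 1,000-sample smoke test vs. an upper-end 4B run on the A10G.
smoke = plan_run(max_samples=1_000, batch_size=2, epochs=1, save_steps=500)
full = plan_run(max_samples=30_000, batch_size=2, epochs=1, save_steps=500)
print(smoke)  # RunPlan(steps_per_epoch=500, total_steps=500, checkpoints=1)
print(full)   # RunPlan(steps_per_epoch=15000, total_steps=15000, checkpoints=30)
```

Halving `save_steps` to 250 on the 30,000-sample run doubles the count to 60 checkpoints, which limits how much progress a crash or timeout can cost.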
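The How to Use steps and the Requirements list in the new README amount to a few checks that can run before any GPU time is spent. The sketch below gathers the five UI choices into one request object and validates them against the README's model list. The class name, field names, and simplified repo-id pattern are assumptions, not the Space's `app.py`. It relies only on the fact that Space secrets such as `HF_TOKEN` are exposed to the app as environment variables.

```python
import os
import re
from dataclasses import dataclass
from typing import Mapping

# Model IDs as listed in the README's "Supported Models" section.
SUPPORTED_MODELS = {
    "zen-nano", "zen-eco", "zen-omni", "zen-coder", "zen-next",
    "zen-vl-4b", "zen-vl-8b", "zen-vl-30b",
}
# Simplified "namespace/name" check (assumption); the Hub's real rules are stricter.
REPO_ID = re.compile(r"^[A-Za-z0-9][\w.-]*/[A-Za-z0-9][\w.-]*$")


@dataclass
class TrainingRequest:  # hypothetical container for the UI choices
    model: str
    datasets: list[str]
    epochs: int
    batch_size: int
    learning_rate: float
    max_samples: int
    output_repo: str

    def problems(self, env: Mapping[str, str] = os.environ) -> list[str]:
        """Return human-readable problems; an empty list means ready to start."""
        found = []
        if self.model not in SUPPORTED_MODELS:
            found.append(f"unknown model {self.model!r}")
        if not self.datasets:
            found.append("select at least one dataset")
        if min(self.epochs, self.batch_size, self.max_samples) <= 0 or self.learning_rate <= 0:
            found.append("epochs, batch size, max samples and learning rate must be positive")
        if not REPO_ID.match(self.output_repo):
            found.append("output repo should look like 'namespace/model-name'")
        if not env.get("HF_TOKEN"):  # Space secrets arrive as environment variables
            found.append("HF_TOKEN is not set; add it as a Space secret")
        return found


req = TrainingRequest(
    model="zen-eco",
    datasets=["ADP Synatra", "xLAM"],
    epochs=1,
    batch_size=2,
    learning_rate=2e-5,  # illustrative value, not a README recommendation
    max_samples=1_000,
    output_repo="your-username/zen-eco-agent",  # hypothetical repo
)
print(req.problems(env={"HF_TOKEN": "hf_example"}))  # []
print(req.problems(env={}))  # ['HF_TOKEN is not set; add it as a Space secret']
```

Passing `env` explicitly keeps the check testable; in a running Space the default `os.environ` would hold the real secret.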