zen/zenlm
zeekay committed · verified
Commit 39fc60e · 1 Parent(s): 44abfbe

Update model card: add zen/zenlm tags, fix branding

Files changed (1): README.md (+122 -52)
README.md CHANGED
@@ -1,81 +1,151 @@
  ---
- language: en
  license: apache-2.0
  tags:
- - training
  - zen
  - zenlm
- - hanzo
  ---

- # Zen Training

- Training infrastructure and recipes for the Zen model family.

- **Zen LM by Hanzo AI** — Open training configurations for all Zen models.

- ## Overview

- This repository contains the training configurations, scripts, and recipes used to train Zen models using the Zen MoDE (Mixture of Distilled Experts) architecture. All training runs use mixed-precision distributed training with full support for LoRA/QLoRA fine-tuning and alignment techniques.

- ## Training Recipes

- | Model | Type | Parameters | Context | Hardware |
- |-------|------|-----------|---------|----------|
- | Zen Nano | Dense | 0.6B | 32K | 1x H100 |
- | Zen Eco | Dense | 4B | 64K | 4x H100 |
- | Zen Pro | Dense | 8B | 128K | 8x H100 |
- | Zen MAX | MoE | 235B (22B active) | 128K | 64x H100 |

- ## Features

- - Mixed precision training (BF16)
- - Gradient checkpointing
- - Distributed training with FSDP / DeepSpeed ZeRO-3
- - LoRA / QLoRA fine-tuning support
- - RLHF and DPO alignment pipelines
- - Dataset mixing and curriculum scheduling
- - Evaluation harness integration

- ## Supported Training Tasks

- - Instruction tuning
- - Function calling
- - Agent trajectory training
- - Vision-language alignment
- - Code generation fine-tuning
- - Reasoning / chain-of-thought distillation

- ## Dataset Support

- Training recipes support direct streaming from HuggingFace datasets:

- - Instruction tuning corpora
- - Agent behavior datasets
- - Function calling datasets
- - Code and math reasoning sets
- - Multilingual alignment data
- ## Quick Start

- See [github.com/zenlm/zen-family](https://github.com/zenlm/zen-family) for full documentation, training scripts, and configuration files.

- ```bash
- git clone https://github.com/zenlm/zen-family
- cd zen-family/training
- pip install -r requirements.txt
- python train.py --config configs/zen-pro-8b.yaml
  ```

- ## Related Repositories

- | Repo | Description |
- |------|-------------|
- | [zenlm/zen-family](https://huggingface.co/zenlm/zen-family) | Model family overview |
- | [zenlm/zen-nano-600m-instruct](https://huggingface.co/zenlm/zen-nano-600m-instruct) | Zen Nano — 0.6B |
- | [zenlm/zen-pro-8b-instruct](https://huggingface.co/zenlm/zen-pro-8b-instruct) | Zen Pro — 8B |
- | [zenlm/zen-max-235b-a22b-instruct](https://huggingface.co/zenlm/zen-max-235b-a22b-instruct) | Zen MAX — 235B MoE |

- ## License

  Apache 2.0
  ---
+ title: Zen Training
+ emoji: 🧘
+ colorFrom: blue
+ colorTo: purple
+ sdk: gradio
+ sdk_version: 4.0.0
+ app_file: app.py
+ pinned: true
  license: apache-2.0
+ hardware: a10g-large
  tags:
  - zen
  - zenlm
  ---

+ # 🧘 Zen Training Space

+ **Unified Training Platform for All Zen Models**

+ Train any Zen model with any dataset combination from HuggingFace. Everything runs directly from HF datasets - no local storage needed!
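
The "everything streams from HF datasets" claim above can be sketched with the `datasets` streaming API. This is a minimal illustration, not the Space's actual code: `take_samples` is a hypothetical helper, and a plain generator stands in for the network-backed stream so the snippet runs offline; the commented-out `load_dataset` call shows the real entry point.

```python
# Sketch of the "no local storage" flow: stream examples and cap how many
# are pulled. A generator stands in for the HF stream here so this runs
# offline; take_samples is an illustrative helper, not the Space's code.
from itertools import islice

def take_samples(stream, max_samples):
    """Pull at most max_samples examples from an iterable stream."""
    return list(islice(stream, max_samples))

# With the real library this would be (network required):
#   from datasets import load_dataset
#   stream = load_dataset("Salesforce/xlam-function-calling-60k",
#                         split="train", streaming=True)
fake_stream = ({"text": f"sample {i}"} for i in range(100_000))
batch = take_samples(fake_stream, 1_000)
print(len(batch))  # -> 1000
```

Because the stream is lazy, only the requested 1,000 examples are ever materialized, which is what lets training run without downloading a full dataset to disk.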

+ ## 🎯 Features

+ ### Supported Models

+ **Language Models:**
+ - `zen-nano` (0.6B) - Edge deployment
+ - `zen-eco` (4B) - Balanced performance
+ - `zen-omni` (7B) - Multi-task
+ - `zen-coder` (14B) - Code generation
+ - `zen-next` (32B) - Frontier performance

+ **Vision-Language Models:**
+ - `zen-vl-4b` - Efficient VL with function calling
+ - `zen-vl-8b` - Enhanced VL capabilities
+ - `zen-vl-30b` - Maximum VL performance
 
 

+ ### Supported Datasets

+ **Agent Training (ADP):**
+ - AgentTuning OS/KG/DB (~15k samples)
+ - Synatra (99k agent trajectories)
+ - Code Feedback (66k samples)
+ - Go Browse (27k web interactions)

+ **Function Calling:**
+ - xLAM 60k (Salesforce high-quality function calling)

+ **Instruction Tuning:**
+ - Alpaca (52k instruction samples)

+ ## 🚀 How to Use

+ 1. **Select Model**: Choose from language or vision-language models
+ 2. **Select Datasets**: Check multiple datasets to combine them
+ 3. **Configure Training**: Set epochs, batch size, learning rate, max samples
+ 4. **Set Output Repo**: Specify HuggingFace repo for trained model
+ 5. **Start Training**: Click the button and monitor logs

+ ## ⚙️ Training Configuration

+ ### Recommended Settings

+ **4B Models (A10G - 24GB):**
+ - Batch Size: 1-2
+ - Max Samples: 10,000-30,000
+ - Time: 4-8 hours
+ - Cost: ~$3-5

+ **8B Models (A100 - 40GB):**
+ - Batch Size: 2-4
+ - Max Samples: 30,000-50,000
+ - Time: 8-12 hours
+ - Cost: ~$15-20

+ **32B Models (A100 - 80GB):**
+ - Batch Size: 1-2
+ - Max Samples: 50,000-100,000
+ - Time: 20-30 hours
+ - Cost: ~$50-80
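
The recommended settings above could be captured in code roughly as follows. The `RECOMMENDED` table mirrors the card's numbers, but the function name and structure are illustrative, not part of the Space's actual implementation.

```python
# Hypothetical lookup of the recommended settings from this card.
# Values are copied from the table above; the helper itself is a sketch.
RECOMMENDED = {
    "4b":  {"gpu": "A10G 24GB", "batch_size": (1, 2), "max_samples": (10_000, 30_000)},
    "8b":  {"gpu": "A100 40GB", "batch_size": (2, 4), "max_samples": (30_000, 50_000)},
    "32b": {"gpu": "A100 80GB", "batch_size": (1, 2), "max_samples": (50_000, 100_000)},
}

def recommended(model_size: str) -> dict:
    """Return the recommended training settings for a model size key."""
    try:
        return RECOMMENDED[model_size.lower()]
    except KeyError:
        raise ValueError(f"no recommendation for {model_size!r}")

print(recommended("8b")["batch_size"])  # -> (2, 4)
```

Centralizing the table like this makes it easy to validate a user's UI choices (e.g. warn when the requested batch size exceeds the range for the selected hardware).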
+ ## 📊 Dataset Combinations

+ ### For Agent Training:
+ ```
+ ADP Synatra (80%) + xLAM (20%)
+ = Strong agent + quality function calling
+ ```

+ ### For Code Models:
+ ```
+ Code Feedback (70%) + Alpaca (30%)
+ = Code expertise + general instruction following
+ ```

+ ### For VL Models:
+ ```
+ ADP (all configs) + xLAM
+ = Complete vision-language agent training
  ```
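
The 80/20-style mixes above amount to proportional sampling from multiple sources. A minimal sketch of the ratio arithmetic (the `mix` helper is hypothetical, and list slicing stands in for real dataset interleaving):

```python
# Illustrative proportional mixing: draw from each source in proportion
# to its weight until `total` examples are collected. The Space's actual
# mixing logic is not published; this only demonstrates the ratios.
def mix(sources: dict, weights: dict, total: int) -> list:
    """Combine example lists according to normalized weights."""
    norm = sum(weights.values())
    out = []
    for name, examples in sources.items():
        take = round(total * weights[name] / norm)
        out.extend(examples[:take])
    return out

synatra = [("synatra", i) for i in range(1000)]
xlam = [("xlam", i) for i in range(1000)]
mixed = mix({"synatra": synatra, "xlam": xlam},
            {"synatra": 0.8, "xlam": 0.2}, total=100)
print(len(mixed))  # -> 100
```

For an 80/20 mix of 100 samples this yields 80 Synatra and 20 xLAM examples; in practice the combined list would also be shuffled before training.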

+ ## 🔒 Requirements

+ - HuggingFace Pro account (for GPU access)
+ - Write access to output repository
+ - HF_TOKEN secret set in Space settings

+ ## 💡 Tips

+ 1. **Start Small**: Test with 1,000 samples first
+ 2. **Mix Datasets**: Combine complementary datasets for best results
+ 3. **Monitor Logs**: Watch for OOM errors and adjust batch size
+ 4. **Save Often**: Lower save_steps for longer training runs

+ ## 📚 Resources

+ - **Website**: https://zenlm.org
+ - **GitHub**: https://github.com/zenlm
+ - **Models**: https://huggingface.co/zenlm
+ - **Datasets**:
+   - [ADP](https://huggingface.co/datasets/neulab/agent-data-collection)
+   - [xLAM](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k)

+ ## 📄 License

  Apache 2.0

+ ## 🙏 Citations

+ ```bibtex
+ @software{zen-training-2025,
+   title={Zen Training: Unified Training Platform for Zen Models},
+   author={Zen AI Team},
+   year={2025},
+   url={https://huggingface.co/spaces/zenlm/zen-training}
+ }

+ @article{adp2024,
+   title={Agent Data Protocol},
+   author={NeuLab},
+   journal={arXiv preprint arXiv:2510.24702},
+   year={2024}
+ }

+ @dataset{xlam2024,
+   title={xLAM Function Calling Dataset},
+   author={Salesforce Research},
+   year={2024}
+ }
+ ```