Commit 0f70ee2 by camdog920 · verified · 1 parent: e38e45d

Upload TRAINING_OPTIONS.md

Files changed (1): TRAINING_OPTIONS.md (added, +199 -0)

# AETHER Training: All Available Options

HF Jobs credits are not available, so this guide lists the working alternatives for training your AETHER model.

---

## Option 1: Google Colab (FREE, Recommended)

**GPU**: T4 (16GB VRAM), free for ~12 hours/day
**Time**: 2-3 hours for 1 epoch on Qwen 0.5B
**Cost**: $0

### Steps:
1. Open the notebook: [`AETHER_Colab_Training.ipynb`](./AETHER_Colab_Training.ipynb)
2. Upload it to Google Colab: https://colab.research.google.com/
3. Runtime → Change runtime type → GPU → T4
4. Run all cells
5. The model is pushed to your HF Hub automatically at the end

### Colab Direct Link:
```
https://colab.research.google.com/github/camdog920/aether-core/blob/main/AETHER_Colab_Training.ipynb
```

**Pro tip**: Launch the script with `accelerate launch` to apply multi-GPU and mixed-precision settings, and use gradient accumulation to reach a larger effective batch size without extra VRAM.

---

## Option 2: Kaggle (FREE)

**GPU**: T4 x2 (30 hours/week free)
**Advantage over Colab**: 2 GPUs, longer sessions

### Steps:
1. Go to https://www.kaggle.com/code
2. New Notebook → File → Import Notebook → upload `AETHER_Colab_Training.ipynb`
3. Accelerator → GPU T4 x2
4. Run all cells

---

## Option 3: Vast.ai (CHEAP, $0.20-0.50/hr)

**GPU**: RTX 3090 (24GB) ~$0.30/hr, RTX 4090 ~$0.50/hr
**Best value**: 24GB of VRAM leaves room for larger models

### Steps:
1. Go to https://vast.ai/
2. Search for `RTX 3090` and sort by $/hr
3. Rent an instance (you need ~$5 of credit)
4. SSH in and run:
```bash
# On the instance
git clone https://huggingface.co/camdog920/aether-core
cd aether-core
pip install -r requirements.txt
python aether_train.py --model_name Qwen/Qwen2.5-0.5B-Instruct
```
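
To see how far the ~$5 credit stretches at the hourly rates quoted above, here is a small arithmetic sketch. The function names are hypothetical, the rates are the approximate figures from this guide, and real marketplace prices vary; storage and bandwidth charges are not modeled.

```python
def rental_hours(credit_usd: float, rate_usd_per_hr: float) -> float:
    """Hours of GPU time a prepaid credit buys at a fixed hourly rate."""
    if rate_usd_per_hr <= 0:
        raise ValueError("hourly rate must be positive")
    return credit_usd / rate_usd_per_hr


def run_cost(hours: float, rate_usd_per_hr: float) -> float:
    """Cost in USD of a run lasting `hours` at a fixed hourly rate."""
    return hours * rate_usd_per_hr


if __name__ == "__main__":
    # Approximate rates taken from the GPU line of this section.
    for gpu, rate in {"RTX 3090": 0.30, "RTX 4090": 0.50}.items():
        hours = rental_hours(5.0, rate)
        print(f"{gpu}: $5.00 buys about {hours:.1f} hours")
```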

---

## Option 4: RunPod (CHEAP, $0.30-0.60/hr)

**GPU**: RTX 3090/4090, A100 (80GB)
**Good for**: Serverless training, auto-shutdown

### Steps:
1. Go to https://www.runpod.io/
2. Community Cloud → RTX 3090
3. Deploy the PyTorch template
4. Run the same commands as in the Vast.ai section above

---

## Option 5: Lambda Labs (FREE TRIAL, $30 credits)

**GPU**: A10 (24GB), A100 (40GB)
**Free tier**: $30 credit for new users

### Steps:
1. Go to https://lambdalabs.com/service/gpu-cloud
2. Sign up to get the $30 credit
3. Launch an instance
4. Train:
```bash
git clone https://huggingface.co/camdog920/aether-core
cd aether-core
pip install -r requirements.txt
HF_TOKEN=your_token python aether_train.py
```
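
The command above passes the Hub token through the `HF_TOKEN` environment variable. The sketch below shows one way a script could fail early when the token is missing, rather than at upload time after hours of training. `require_hf_token` is a hypothetical helper and is not taken from `aether_train.py`.

```python
import os


def require_hf_token(env=None) -> str:
    """Return the Hub token from HF_TOKEN, or raise a clear error if unset."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it before training")
    return token


if __name__ == "__main__":
    try:
        token = require_hf_token()
        # Print only a short prefix so the secret never lands in logs.
        print(f"HF_TOKEN found (starts with {token[:3]}...)")
    except RuntimeError as err:
        print(f"error: {err}")
```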

---

## Option 6: Paperspace (FREE, Community GPUs)

**GPU**: Free community GPUs available
**URL**: https://www.paperspace.com/

---

## Option 7: Your Local Machine

If you have a GPU with 8GB+ VRAM:

```bash
# Clone repo
git clone https://huggingface.co/camdog920/aether-core
cd aether-core

# Create conda env
conda create -n aether python=3.10
conda activate aether

# Install deps
pip install -r requirements.txt

# Set your HF token
export HF_TOKEN=hf_xxxxxxxxxxxxxxxx

# Train (uses bf16 on Ampere/Ada, fp16 on older)
python aether_train.py \
  --model_name Qwen/Qwen2.5-0.5B-Instruct \
  --num_train_epochs 1 \
  --per_device_train_batch_size 1 \
  --gradient_accumulation_steps 8 \
  --learning_rate 2e-5 \
  --push_to_hub \
  --hub_model_id your-username/aether-qwen-0.5b-grpo
```
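
The comment in the command above says training uses bf16 on Ampere/Ada GPUs and fp16 on older ones. One common way to make that choice is from the CUDA compute capability, which is 8.0 or higher on Ampere and Ada. The sketch below is an assumption about the logic, not code from `aether_train.py`, and `pick_precision` is a hypothetical name. On a real machine the major version could be read with `torch.cuda.get_device_capability()`.

```python
def pick_precision(cc_major: int) -> str:
    """Map a CUDA compute-capability major version to a mixed-precision mode.

    Ampere (8.0, 8.6) and Ada (8.9) support bf16 natively;
    older GPUs such as the T4 (7.5) fall back to fp16.
    """
    return "bf16" if cc_major >= 8 else "fp16"


if __name__ == "__main__":
    # Major versions from NVIDIA's published compute capabilities.
    gpus = [("T4", 7), ("RTX 3090", 8), ("RTX 4090", 8), ("A100", 8)]
    for name, major in gpus:
        print(f"{name}: {pick_precision(major)}")
```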

---

## Option 8: SageMaker (AWS Free Tier)

The AWS Free Tier includes 250 hours of `ml.t3.medium`, which is CPU-only. For GPU training, use Spot instances:
```python
# Using the SageMaker Python SDK
import sagemaker
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point='aether_train.py',
    source_dir='.',
    role=sagemaker.get_execution_role(),  # IAM role; resolves inside SageMaker notebooks
    instance_type='ml.g4dn.xlarge',  # T4 GPU
    instance_count=1,
    framework_version='2.1',
    py_version='py310',
    hyperparameters={'model_name': 'Qwen/Qwen2.5-0.5B-Instruct'},
    use_spot_instances=True,  # bill at Spot rates (~70% discount)
    max_run=3 * 60 * 60,      # example training time limit, in seconds
    max_wait=4 * 60 * 60,     # example limit including Spot wait; must be >= max_run
)
estimator.fit()
```

---

## Hardware Requirements by Model Size

| Model Size | VRAM Needed | Batch Size | Free Option | Paid Option ($/hr) |
|-----------|------------|-----------|-------------|-------------------|
| 0.5B (Qwen2.5) | 4GB | 1 + grad_acc=8 | Colab T4 | Vast.ai $0.20 |
| 1.5B | 6GB | 1 + grad_acc=16 | Colab T4 | Vast.ai $0.20 |
| 3B | 10GB | 1 + grad_acc=16 | Colab T4 | Vast.ai $0.30 |
| 7B (LoRA) | 14GB | 1 + LoRA | Kaggle T4x2 | Vast.ai $0.40 |
| 7B (Full) | 28GB | 1 | None | RunPod A100 $1.50 |
| 14B (LoRA) | 24GB | 1 + LoRA | None | Vast.ai $0.60 |
166
+ ---
167
+
168
+ ## Quick Start (Any Platform)
169
+
170
+ ```bash
171
+ # 1. Clone
172
+ git clone https://huggingface.co/camdog920/aether-core
173
+ cd aether-core
174
+
175
+ # 2. Install
176
+ pip install torch transformers datasets accelerate peft trl
177
+
178
+ # 3. Train
179
+ python aether_train.py
180
+
181
+ # 4. Done β€” model is on your HF Hub
182
+ ```
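
Before starting a long run, it can help to confirm that the packages from the install step are importable. The sketch below uses only the standard library. `missing_packages` is a hypothetical helper, and the package list mirrors the `pip install` line above.

```python
from importlib.util import find_spec

# Top-level package names from the install step.
REQUIRED = ["torch", "transformers", "datasets", "accelerate", "peft", "trl"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All training dependencies are installed.")
```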

---

## What You Get After Training

- Fine-tuned `Qwen/Qwen2.5-0.5B-Instruct` with AETHER neuro-symbolic reasoning
- Model pushed to: `your-username/aether-qwen-0.5b-grpo`
- Custom reward function that rewards reasoning structure, step enumeration, causal logic, hierarchical planning, and meta-cognition
- Can integrate with AETHER Core for a recursive self-evolution loop
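
The reward criteria listed above can be illustrated with a toy scoring function. The sketch below is hypothetical and is not the reward function from `aether_train.py`. It scores only step enumeration and causal connectives with regular expressions, and the weights and word list are arbitrary assumptions. It takes a list of completion strings and returns one float per completion, which is the general shape of a custom reward function in TRL's `GRPOTrainer` (assuming training uses TRL, as the `trl` dependency and the `grpo` model name suggest).

```python
import re

# Lines that start with "Step N" or with "N." / "N)".
STEP_PATTERN = re.compile(r"^\s*(?:step\s+\d+|\d+[.)])", re.IGNORECASE | re.MULTILINE)
# A small, arbitrary list of causal connectives.
CAUSAL_PATTERN = re.compile(r"\b(?:because|therefore|thus|hence|so that)\b", re.IGNORECASE)


def toy_reasoning_reward(completions, **kwargs):
    """Score each completion in [0, 1] for enumerated steps and causal words."""
    rewards = []
    for text in completions:
        steps = len(STEP_PATTERN.findall(text))
        causal = len(CAUSAL_PATTERN.findall(text))
        # Cap each component so longer outputs cannot inflate the score.
        score = 0.5 * min(steps, 3) / 3 + 0.5 * min(causal, 2) / 2
        rewards.append(score)
    return rewards


if __name__ == "__main__":
    samples = [
        "Step 1: add 2 and 3.\nStep 2: double it.\nTherefore the answer is 10.",
        "The answer is 10.",
    ]
    print(toy_reasoning_reward(samples))
```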

---

## Support

- Code: https://huggingface.co/camdog920/aether-core
- Issues: Open a discussion on the repo
- Demo: Run `python aether_demo.py` to see all components working