camdog920
/

aether-core


camdog920 committed 21 days ago

Commit

0f70ee2

verified ·

1 Parent(s): e38e45d

Upload TRAINING_OPTIONS.md

TRAINING_OPTIONS.md +199 -0

	@@ -0,0 +1,199 @@

+# AETHER Training — All Available Options
+Since HF Jobs credits are not available, here is every working alternative for training your AETHER model.
+---
+## Option 1: Google Colab (FREE — Recommended)
+**GPU**: T4 (16GB VRAM), free for sessions of up to ~12 hours (availability is not guaranteed)
+**Time**: 2-3 hours for 1 epoch on Qwen 0.5B
+**Cost**: $0
+### Steps:
+1. Open the notebook: [`AETHER_Colab_Training.ipynb`](./AETHER_Colab_Training.ipynb)
+2. Upload to Google Colab: https://colab.research.google.com/
+3. Runtime → Change runtime type → GPU → T4
+4. Run all cells
+5. Model auto-pushes to your HF Hub at the end
+### Colab Direct Link (works only if the notebook is mirrored on GitHub at this path):
+```
+https://colab.research.google.com/github/camdog920/aether-core/blob/main/AETHER_Colab_Training.ipynb
+```
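+**Pro tip**: `accelerate launch` mainly helps when more than one GPU is available. Gradient accumulation does not make training faster; it raises the effective batch size without using more VRAM.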
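Gradient accumulation trades wall-clock time for memory: the optimizer steps once every N micro-batches, so the effective batch size grows while per-step VRAM stays flat. A minimal sketch of the arithmetic, using the batch settings from the local-machine command later in this file (the function name is illustrative and not part of `aether_train.py`):

```python
def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_gpus: int = 1) -> int:
    """Number of samples that contribute to each optimizer step."""
    return per_device_batch * grad_accum_steps * num_gpus


# One T4 (Colab) versus two T4s (Kaggle), same per-device settings.
single_gpu = effective_batch_size(per_device_batch=1, grad_accum_steps=8)
dual_gpu = effective_batch_size(per_device_batch=1, grad_accum_steps=8, num_gpus=2)
print(single_gpu, dual_gpu)
```

If the effective batch size changes between platforms, the learning rate may need retuning as well.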
+---
+## Option 2: Kaggle (FREE)
+**GPU**: T4 x2 (30 hours/week free)
+**Better than Colab**: 2x GPU, longer sessions
+### Steps:
+1. Go to https://www.kaggle.com/code
+2. New Notebook → File → Import Notebook → upload `AETHER_Colab_Training.ipynb`
+3. Accelerator → GPU T4 x2
+4. Run
+---
+## Option 3: Vast.ai (CHEAP — $0.20-0.50/hr)
+**GPU**: RTX 3090 (24GB) ~$0.30/hr, RTX 4090 ~$0.50/hr
+**Best value**: 24GB of VRAM leaves room for larger models
+### Steps:
+1. Go to https://vast.ai/
+2. Search: `RTX 3090`, sort by $/hr
+3. Rent instance (need ~$5 credit)
+4. SSH in:
+```bash
+# On the instance
+git clone https://huggingface.co/camdog920/aether-core
+cd aether-core
+pip install -r requirements.txt
+python aether_train.py --model_name Qwen/Qwen2.5-0.5B-Instruct
+```
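A quick way to budget a rental is to multiply the expected training hours by the hourly rate. The sketch below uses the price range quoted in this section; the training duration on a rented GPU is an assumption (the 2-3 hour figure above was quoted for a Colab T4), so treat the result as a ballpark only.

```python
def rental_cost(hours: float, usd_per_hour: float) -> float:
    """Estimated rental cost in USD, rounded to cents."""
    return round(hours * usd_per_hour, 2)


# Assumed duration: 2 to 3 hours; rates: the $0.20-0.50/hr range from this section.
low = rental_cost(2, 0.20)
high = rental_cost(3, 0.50)
print(f"Estimated cost: ${low:.2f} to ${high:.2f}")
```

Either end of that range sits well under the ~$5 starting credit mentioned in the steps.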
+---
+## Option 4: RunPod (CHEAP — $0.30-0.60/hr)
+**GPU**: RTX 3090/4090, A100 (80GB)
+**Good**: Serverless training, auto-shutdown
+### Steps:
+1. Go to https://www.runpod.io/
+2. Community Cloud → RTX 3090
+3. Deploy PyTorch template
+4. Same commands as Vast.ai above
+---
+## Option 5: Lambda Labs (FREE TRIAL — $30 credits)
+**GPU**: A10 (24GB), A100 (40GB)
+**Free tier**: $30 credit for new users
+### Steps:
+1. Go to https://lambdalabs.com/service/gpu-cloud
+2. Sign up → get $30 free
+3. Launch instance
+4. Train:
+```bash
+git clone https://huggingface.co/camdog920/aether-core
+cd aether-core
+pip install -r requirements.txt
+HF_TOKEN=your_token python aether_train.py
+```
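If the script pushes to the Hub at the end of training, as the Colab notebook does, a missing token only surfaces after the compute has already been spent. A small pre-flight check, sketched here as a hypothetical helper that is not part of the repository, fails fast instead. It assumes the token is passed through the `HF_TOKEN` environment variable, as in the command above.

```python
import os


def get_hf_token(env=None) -> str:
    """Return the Hugging Face token from the environment or raise early."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; export it before training so the "
            "final push to the Hub can authenticate."
        )
    return token


# Example with an explicit mapping instead of the real environment.
print(get_hf_token({"HF_TOKEN": "hf_example_token"}))
```

Calling `get_hf_token()` with no arguments at the top of a training script reads the real environment.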
+---
+## Option 6: Paperspace (FREE — Community GPUs)
+**GPU**: Free community GPUs available
+**URL**: https://www.paperspace.com/
+---
+## Option 7: Your Local Machine
+If you have a GPU with 8GB+ VRAM:
+```bash
+# Clone repo
+git clone https://huggingface.co/camdog920/aether-core
+cd aether-core
+# Create conda env
+conda create -n aether python=3.10
+conda activate aether
+# Install deps
+pip install -r requirements.txt
+# Set your HF token
+export HF_TOKEN=hf_xxxxxxxxxxxxxxxx
+# Train (uses bf16 on Ampere/Ada, fp16 on older)
+python aether_train.py \
+    --model_name Qwen/Qwen2.5-0.5B-Instruct \
+    --num_train_epochs 1 \
+    --per_device_train_batch_size 1 \
+    --gradient_accumulation_steps 8 \
+    --learning_rate 2e-5 \
+    --push_to_hub \
+    --hub_model_id your-username/aether-qwen-0.5b-grpo
+```
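The bf16/fp16 comment reflects GPU generation: native bf16 needs compute capability 8.0 or higher (Ampere, Ada, Hopper), while older cards such as the T4 (7.5) fall back to fp16. How `aether_train.py` makes this choice is not shown in this file; the following is a minimal sketch of the rule, with `pick_precision` as a hypothetical helper. In a real script the capability tuple would typically come from `torch.cuda.get_device_capability()`.

```python
from typing import Tuple


def pick_precision(compute_capability: Tuple[int, int]) -> str:
    """Choose a mixed-precision mode from a CUDA compute capability (major, minor)."""
    major, _minor = compute_capability
    # bf16 is natively supported from compute capability 8.0 onward.
    return "bf16" if major >= 8 else "fp16"


print(pick_precision((7, 5)))  # T4 (Turing)
print(pick_precision((8, 6)))  # RTX 3090 (Ampere)
print(pick_precision((8, 9)))  # RTX 4090 (Ada)
```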
+---
+## Option 8: SageMaker (AWS Free Tier)
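+The AWS Free Tier covers 250 hours of `ml.t3.medium`, which is CPU only. For GPU training, use a paid instance, ideally with Spot pricing:
+```python
+# Using the SageMaker Python SDK
+import sagemaker
+from sagemaker.pytorch import PyTorch
+# IAM role that SageMaker assumes; get_execution_role() works inside SageMaker notebooks
+role = sagemaker.get_execution_role()
+estimator = PyTorch(
+    entry_point='aether_train.py',
+    source_dir='.',
+    role=role,
+    instance_type='ml.g4dn.xlarge',  # T4 GPU
+    instance_count=1,
+    framework_version='2.1',
+    py_version='py310',
+    hyperparameters={'model_name': 'Qwen/Qwen2.5-0.5B-Instruct'},
+    use_spot_instances=True,  # Spot pricing, roughly 70% cheaper when available
+    max_run=4 * 3600,         # training time limit in seconds
+    max_wait=6 * 3600,        # must be >= max_run; includes time waiting for Spot capacity
+)
+# Launch the training job
+estimator.fit()
+```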
+---
+## Hardware Requirements by Model Size
+| Model Size | VRAM Needed (approx.) | Batch Size | Free Option | Paid Option ($/hr) |
+|-----------|------------|-----------|-------------|-------------------|
+| 0.5B (Qwen2.5) | 4GB | 1 + grad_acc=8 | Colab T4 | Vast.ai $0.20 |
+| 1.5B | 6GB | 1 + grad_acc=16 | Colab T4 | Vast.ai $0.20 |
+| 3B | 10GB | 1 + grad_acc=16 | Colab T4 | Vast.ai $0.30 |
+| 7B (LoRA) | 14GB | 1 + LoRA | Kaggle T4x2 | Vast.ai $0.40 |
+| 7B (Full) | 28GB | 1 | — | RunPod A100 $1.50 |
+| 14B (LoRA) | 24GB | 1 + LoRA | — | Vast.ai $0.60 |
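As a sanity check on the table, a common rule of thumb for full fine-tuning with mixed-precision AdamW is about 16 bytes per parameter (2 for weights, 2 for gradients, 12 for fp32 optimizer state), before activations. With LoRA the frozen base weights dominate, at about 2 bytes per parameter in half precision. The sketch below applies both rules. It is a back-of-the-envelope estimate, not a measurement of `aether_train.py`, and rows in the table that come in lower presumably assume memory savers such as 8-bit optimizers, quantization, or gradient checkpointing.

```python
def full_finetune_gb(params_billion: float, bytes_per_param: int = 16) -> float:
    """Approximate GB for weights, gradients, and AdamW state (activations excluded)."""
    return params_billion * bytes_per_param


def frozen_weights_gb(params_billion: float, bytes_per_weight: int = 2) -> float:
    """Approximate GB for half-precision base weights, e.g. under LoRA (adapters excluded)."""
    return params_billion * bytes_per_weight


for size in (0.5, 1.5, 3, 7):
    print(f"{size}B: full ~{full_finetune_gb(size):.0f} GB, LoRA base ~{frozen_weights_gb(size):.0f} GB")
```

The LoRA estimate for 7B (about 14 GB) lines up with the corresponding row; the full fine-tuning estimates are a reason to measure actual usage before committing to a GPU.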
+---
+## Quick Start (Any Platform)
+```bash
+# 1. Clone
+git clone https://huggingface.co/camdog920/aether-core
+cd aether-core
+# 2. Install
+pip install torch transformers datasets accelerate peft trl
+# 3. Train
+python aether_train.py
+# 4. Done — model is on your HF Hub
+```
+---
+## What You Get After Training
+- Fine-tuned `Qwen/Qwen2.5-0.5B-Instruct` with AETHER neuro-symbolic reasoning
+- Model pushed to: `your-username/aether-qwen-0.5b-grpo`
+- Custom reward function that scores reasoning structure, step enumeration, causal logic, hierarchical planning, and meta-cognition
+- Can be integrated with AETHER Core for a recursive self-evolution loop
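The actual reward function lives in the repository and is not reproduced in this file. Purely as an illustration of the idea, the sketch below scores a completion on two of the listed criteria (step enumeration and causal connectives) with simple pattern matching. The function name, patterns, and weights are all assumptions. The `(completions, **kwargs)` to list-of-floats shape follows the convention TRL's `GRPOTrainer` uses for custom reward functions with plain-text completions.

```python
import re

# Hypothetical patterns: lines that start like "Step 1" or "1." / "1)".
STEP_PATTERN = re.compile(r"^\s*(?:step\s+\d+|\d+[.)])", re.IGNORECASE | re.MULTILINE)
# Hypothetical list of causal connectives.
CAUSAL_PATTERN = re.compile(r"\b(?:because|therefore|thus|hence|so that)\b", re.IGNORECASE)


def toy_reasoning_reward(completions, **kwargs):
    """Return one score in [0, 1] per completion string."""
    rewards = []
    for text in completions:
        steps = len(STEP_PATTERN.findall(text))
        causal = len(CAUSAL_PATTERN.findall(text))
        # Half the score for up to 4 enumerated steps, half for up to 2 connectives.
        score = 0.5 * min(steps, 4) / 4 + 0.5 * min(causal, 2) / 2
        rewards.append(score)
    return rewards


structured = "Step 1: read input.\nStep 2: parse it, because raw text is noisy.\n3. Therefore return the result."
plain = "The answer is 42."
print(toy_reasoning_reward([structured, plain]))
```

Capping each component keeps a model from inflating its reward by repeating "Step" lines or connectives, which is one reason keyword-based rewards need care in practice.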
+---
+## Support
+- Code: https://huggingface.co/camdog920/aether-core
+- Issues: Open a discussion on the repo
+- Demo: Run `python aether_demo.py` to see all components working