# Getting Started with OktoEngine

**Your first 5 minutes with OktoEngine** - A quick guide to get you up and running.

---

## Prerequisites

- OktoEngine installed (download from [GitHub Releases](https://github.com/oktoseek/oktoengine/releases))
- Basic understanding of AI/ML concepts
- A dataset ready for training (optional for first run)

---

## Step 1: Install OktoEngine

### Download Pre-built Binary

1. Visit [GitHub Releases](https://github.com/oktoseek/oktoengine/releases)
2. Download the binary for your platform:
   - **Windows:** `okto-windows.exe`
   - **Linux:** `okto-linux`
   - **macOS:** `okto-macos`
3. Make it executable (Linux/Mac):
   ```bash
   chmod +x okto-linux
   ```
4. Add to PATH (optional but recommended)

### Verify Installation

```bash
okto --version
```

Should output: `okto 0.1.0`

---

## Step 2: Check Your System

Before starting, check if your system is ready:

```bash
okto doctor
```

This will show:

- ✅ Platform information
- ✅ RAM and CPU
- ✅ GPU detection
- ✅ CUDA availability
- ✅ Runtime environment
- ✅ Dependencies status

**If dependencies are missing:**

```bash
okto doctor --install
```

Automatically installs missing dependencies.

---

## Step 3: Create Your First Project

Initialize a new OktoScript project:

```bash
okto init my-first-model
cd my-first-model
```

This creates:

```
my-first-model/
├── scripts/
│   └── train.okt          # Your training configuration
├── dataset/
│   ├── train.jsonl        # Training data (sample)
│   └── val.jsonl          # Validation data (sample)
└── export/                # Where models will be exported
```

---

## Step 4: Prepare Your Dataset

Edit `dataset/train.jsonl` with your training data:

**dataset/train.jsonl:**

```json
{"input":"Hello","output":"Hi! How can I help you?"}
{"input":"What's the weather?","output":"I don't have access to weather data."}
{"input":"Thank you","output":"You're welcome!"}
```

**Minimum requirements:**

- At least 10 examples for basic training
- Consistent format (JSONL recommended)
- Valid JSON on each line

**Supported formats:**

- JSONL (recommended)
- CSV
- TXT
- Parquet

---

## Step 5: Configure Your Training

Edit `scripts/train.okt`:

```okt
PROJECT "MyFirstModel"
DESCRIPTION "My first AI model with OktoEngine"

ENV {
    accelerator: "gpu"
    min_memory: "8GB"
    precision: "fp16"
    install_missing: true
}

DATASET {
    train: "dataset/train.jsonl"
    validation: "dataset/val.jsonl"
}

MODEL {
    base: "gpt2"
}

TRAIN {
    epochs: 5
    batch_size: 32
    device: "auto"
}

EXPORT {
    format: ["okm"]
    path: "export/"
}
```

**Key settings:**

- `PROJECT` - Your model name
- `MODEL.base` - Base model (gpt2, distilgpt2, etc.)
- `TRAIN.epochs` - Number of training epochs
- `TRAIN.batch_size` - Batch size
- `TRAIN.device` - "auto" detects GPU/CPU automatically
- `EXPORT.format` - Output format

---

## Step 6: Validate Your Configuration

Before training, validate your configuration:

```bash
okto validate
```

**What it checks:**

- ✅ Syntax is correct
- ✅ All required fields are present
- ✅ Dataset files exist
- ✅ Model paths are valid
- ✅ Values are within allowed ranges

**Example output:**

```
🐙 OktoEngine v0.1

🔍 Validating OktoScript file: "scripts/train.okt"

📄 File: "scripts/train.okt"
📄 Size: 382 bytes
📄 Lines: 31

✔ File parsed successfully

📋 Validation Results:
✅ Validation passed! No errors or warnings.

📊 Summary:
  Project: MyFirstModel
  ENV: Configured
  Dataset: dataset/train.jsonl
  Model: gpt2
  Training: 5 epochs, batch size 32
  Export: ["okm"]
```

**If validation fails:**

- Check error messages
- Fix syntax errors
- Verify file paths
- Run `okto validate --debug` for detailed logs

---

## Step 7: Train Your Model

Start training:

```bash
okto train
```

**What happens:**

1. ✅ Configuration is parsed and validated
2. ✅ System environment is checked
3. ✅ Dependencies are verified
4. ✅ Dataset is loaded
5. ✅ Model is initialized (downloads from HuggingFace if needed)
6. ✅ Training loop starts
7. ✅ Progress is shown in real-time
8. ✅ Model is saved to `runs/MyFirstModel/`
9. ✅ Exported models saved to `export/`

**Example output:**

```
🐙 OktoEngine v0.1

📄 Reading: "scripts/train.okt"

📊 Environment Check:
  ✔ Runtime: Python 3.14.0
  ✔ GPU: NVIDIA GeForce RTX 4070
  ✔ RAM: 63GB (40GB available)
  ✔ Platform: windows

📦 Checking dependencies...
  ✔ All dependencies available

🚀 Starting training pipeline...

Epoch 1/5: 100%|████████████| 500/500 [02:15<00:00, 3.70it/s]
  Loss: 2.345 → 1.892
  Learning Rate: 5e-5
  GPU Memory: 8.2GB / 12GB

Epoch 2/5: 100%|████████████| 500/500 [02:14<00:00, 3.72it/s]
  Loss: 1.892 → 1.654
...

✅ Training completed successfully!
📁 Output: runs/MyFirstModel/
```

**Training time:**

- Small models (100M params): 5-15 minutes
- Medium models (1B params): 30-60 minutes
- Large models (7B params): Several hours

---

## Step 8: Check Your Results

After training completes:

**Check training output:**

```bash
ls runs/MyFirstModel/
```

**Files created:**

- `checkpoint-*/` - Training checkpoints
- `training_logs.json` - Detailed training logs
- `metrics.json` - Training metrics
- `tokenizer.json` - Tokenizer configuration

**Check exported models:**

```bash
ls export/
```

**Exported files:**

- `model.okm` - OktoSeek Model format

---

## Step 9: Evaluate Your Model (Optional)

Evaluate your trained model:

```bash
okto eval
```

**Output:**

```
🐙 OktoEngine v0.1

📊 Evaluating model...

📈 Evaluation Results:
  Accuracy: 0.892
  Loss: 1.234
  Perplexity: 2.456
  F1-Score: 0.876

✅ Evaluation completed!
```

---

## Common First Steps

### Using GPU

If you have a GPU, OktoEngine will automatically detect and use it. To ensure GPU usage:

```okt
ENV {
    accelerator: "gpu"
    precision: "fp16"
}

TRAIN {
    device: "auto"  # or "cuda" for explicit GPU
}
```

### Adding More Epochs

```okt
TRAIN {
    epochs: 10  # Increase from 5
    batch_size: 32
}
```

### Exporting to Multiple Formats

```okt
EXPORT {
    format: ["okm", "onnx", "gguf"]
    path: "export/"
}
```

### Using Debug Mode

For detailed logs during training:

```bash
okto train --debug
```

Shows:

- Parsing details
- Execution flow
- Error diagnostics
- Performance metrics

---

## Troubleshooting

### Training Fails

**Check system:**

```bash
okto doctor
```

**Check configuration:**

```bash
okto validate --debug
```

**Common issues:**

- **Out of memory:** Reduce `batch_size` in TRAIN block
- **Model not found:** Check `MODEL.base` is a valid HuggingFace model
- **Dataset not found:** Verify paths in DATASET block
- **Dependencies missing:** Run `okto doctor --install`

### Validation Fails

**Enable debug mode:**

```bash
okto validate --debug
```

**Common errors:**

- Syntax errors - Check OktoScript syntax
- Missing fields - Add required blocks
- Invalid paths - Verify file paths exist
- Invalid values - Check value ranges

### System Issues

**Check system:**

```bash
okto doctor
```

**Install dependencies:**

```bash
okto doctor --install
```

---

## Next Steps

- 📚 Read the [Complete CLI Reference](./CLI_REFERENCE.md)
- 🎯 Check out [Examples](../examples/) for advanced use cases
- 🐛 Learn about [Debug Mode](./DEBUG_GUIDE.md)
- 💡 Explore [FAQ](./FAQ.md) for common questions

---

## Quick Reference

| Task | Command |
|------|---------|
| Initialize project | `okto init ` |
| Validate | `okto validate` |
| Check system | `okto doctor` |
| Train | `okto train` |
| Evaluate | `okto eval` |
| Export | `okto export --format okm` |
| Debug mode | `okto train --debug` |
| Upgrade | `okto upgrade` |

---

**Need help?** Check the [FAQ](./FAQ.md) or open an issue on [GitHub](https://github.com/oktoseek/oktoengine/issues).
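
---

## Appendix: Checking a JSONL Dataset Before Training

The minimum dataset requirements from Step 4 (at least 10 examples, valid JSON on each line, a consistent format) can be checked locally before running `okto validate`. The script below is a minimal sketch and is not part of OktoEngine. The function name `check_jsonl` is hypothetical, and the rule that every record must carry `input` and `output` keys is an assumption inferred from the sample data, not a documented OktoEngine requirement.

```python
import json


def check_jsonl(path, min_examples=10, required_keys=("input", "output")):
    """Return a list of human-readable problems found in a JSONL file.

    Assumption: each record is a JSON object with the keys in
    `required_keys`, matching the sample rows shown in Step 4.
    """
    problems = []
    valid_count = 0
    with open(path, encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # blank lines are skipped, not counted
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {line_no}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(record, dict):
                problems.append(f"line {line_no}: expected a JSON object")
                continue
            missing = [key for key in required_keys if key not in record]
            if missing:
                problems.append(f"line {line_no}: missing {', '.join(missing)}")
                continue
            valid_count += 1
    if valid_count < min_examples:
        problems.append(
            f"only {valid_count} valid examples, need at least {min_examples}"
        )
    return problems
```

Calling `check_jsonl("dataset/train.jsonl")` returns an empty list when the file meets these checks. Otherwise each entry names the offending line, so it can be fixed before training starts.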