
Getting Started with OktoEngine

Your first 5 minutes with OktoEngine: a quick guide to getting up and running.


Prerequisites

  • OktoEngine installed (download from GitHub Releases)
  • Basic understanding of AI/ML concepts
  • A dataset ready for training (optional for the first run, since okto init creates sample data)

Step 1: Install OktoEngine

Download Pre-built Binary

  1. Visit GitHub Releases
  2. Download the binary for your platform:
    • Windows: okto-windows.exe
    • Linux: okto-linux
    • macOS: okto-macos
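  3. Make it executable (Linux/macOS):
    chmod +x okto-linux    # Linux
    chmod +x okto-macos    # macOS
    
  4. Add it to your PATH (optional but recommended). The commands in this guide invoke it as okto, so rename the binary to okto (okto.exe on Windows) or create an alias for it.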
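If you script the download (for example in CI), you need the right file name for the operating system. The Python sketch below maps platform.system() to the file names listed above. binary_name_for is a hypothetical helper. This guide gives no download URL, so the sketch does not fetch the file.

```python
import platform
from typing import Optional

# Release file names from the list above, keyed by the value platform.system() returns.
RELEASE_BINARIES = {
    "Windows": "okto-windows.exe",
    "Linux": "okto-linux",
    "Darwin": "okto-macos",  # platform.system() reports macOS as "Darwin"
}


def binary_name_for(system: Optional[str] = None) -> str:
    """Return the release file name for the given OS name (defaults to this machine)."""
    system = system or platform.system()
    if system not in RELEASE_BINARIES:
        raise ValueError(f"no pre-built binary listed for platform {system!r}")
    return RELEASE_BINARIES[system]


print(binary_name_for("Darwin"))  # okto-macos
```

Call binary_name_for() with no argument to get the name for the machine running the script.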

Verify Installation

okto --version

It should print: okto 0.1.0


Step 2: Check Your System

Before starting, check if your system is ready:

okto doctor

This will show:

  • ✅ Platform information
  • ✅ RAM and CPU
  • ✅ GPU detection
  • ✅ CUDA availability
  • ✅ Runtime environment
  • ✅ Dependencies status

If dependencies are missing:

okto doctor --install

Automatically installs missing dependencies.


Step 3: Create Your First Project

Initialize a new OktoScript project:

okto init my-first-model
cd my-first-model

This creates:

my-first-model/
├── scripts/
│   └── train.okt          # Your training configuration
├── dataset/
│   ├── train.jsonl        # Training data (sample)
│   └── val.jsonl          # Validation data (sample)
└── export/                # Where models will be exported

Step 4: Prepare Your Dataset

Edit dataset/train.jsonl with your training data:

dataset/train.jsonl:

{"input":"Hello","output":"Hi! How can I help you?"}
{"input":"What's the weather?","output":"I don't have access to weather data."}
{"input":"Thank you","output":"You're welcome!"}

Minimum requirements:

  • At least 10 examples for basic training
  • A consistent format across all records (JSONL recommended)
  • For JSONL, one valid JSON object on each line
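You can check a JSONL file against these requirements yourself before running okto validate. The sketch below is a standalone Python script, not part of OktoEngine. It assumes every record uses the input and output fields from the sample above, which may not hold for every dataset OktoEngine accepts. check_lines and check_jsonl are illustrative names.

```python
import json

MIN_EXAMPLES = 10                      # "at least 10 examples" from the list above
REQUIRED_FIELDS = ("input", "output")  # assumption: field names taken from the sample records


def check_lines(lines):
    """Return a list of problems in JSONL text lines; an empty list means none were found."""
    problems = []
    count = 0
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines such as a trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {lineno}: expected a JSON object")
            continue
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            problems.append(f"line {lineno}: missing field(s): {', '.join(missing)}")
        count += 1  # counts every parsed object, even one with missing fields
    if count < MIN_EXAMPLES:
        problems.append(f"only {count} examples; at least {MIN_EXAMPLES} recommended")
    return problems


def check_jsonl(path):
    """Check a JSONL file on disk, e.g. dataset/train.jsonl."""
    with open(path, encoding="utf-8") as f:
        return check_lines(f)


if __name__ == "__main__":
    import sys

    # Usage (file name is arbitrary): python check_dataset.py dataset/train.jsonl dataset/val.jsonl
    for path in sys.argv[1:]:
        for problem in check_jsonl(path):
            print(f"{path}: {problem}")
```

The three-line sample file above would be flagged only for having fewer than 10 examples.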

Supported formats:

  • JSONL (recommended)
  • CSV
  • TXT
  • Parquet
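If your data starts out as CSV, a few lines of Python convert it to the recommended JSONL format. This sketch assumes the CSV has a header row with input and output columns, mirroring the sample records above. Change the fields argument if your columns are named differently. csv_to_jsonl is an illustrative helper, not an okto command.

```python
import csv
import json


def csv_to_jsonl(csv_path, jsonl_path, fields=("input", "output")):
    """Convert a CSV file with a header row into JSONL, keeping only `fields`.

    Returns the number of records written. Raises KeyError if a column is missing.
    """
    written = 0
    # newline="" lets the csv module handle quoted fields that contain line breaks
    with open(csv_path, newline="", encoding="utf-8") as src, \
         open(jsonl_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            record = {name: row[name] for name in fields}
            # ensure_ascii=False keeps non-English text readable in the output file
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
            written += 1
    return written
```

For example, csv_to_jsonl("dataset/train.csv", "dataset/train.jsonl") writes one JSON object per CSV row.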

Step 5: Configure Your Training

Edit scripts/train.okt:

PROJECT "MyFirstModel"
DESCRIPTION "My first AI model with OktoEngine"

ENV {
  accelerator: "gpu"
  min_memory: "8GB"
  precision: "fp16"
  install_missing: true
}

DATASET {
  train: "dataset/train.jsonl"
  validation: "dataset/val.jsonl"
}

MODEL {
  base: "gpt2"
}

TRAIN {
  epochs: 5
  batch_size: 32
  device: "auto"
}

EXPORT {
  format: ["okm"]
  path: "export/"
}

Key settings:

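  • PROJECT - Your model's name
  • MODEL.base - Base model to fine-tune, given as a HuggingFace model ID (gpt2, distilgpt2, etc.)
  • TRAIN.epochs - Number of full passes over the training data
  • TRAIN.batch_size - Number of examples processed per training step
  • TRAIN.device - "auto" detects GPU/CPU automatically
  • EXPORT.format - List of output formats (e.g. "okm")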
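TRAIN.epochs and TRAIN.batch_size together set how many training steps run. The arithmetic below makes two assumptions: one step per batch, and a final partial batch that is kept. This is the default in PyTorch-style data loaders, but whether OktoEngine batches exactly this way is not stated here. training_steps is an illustrative helper.

```python
from typing import Tuple


def training_steps(num_examples: int, batch_size: int, epochs: int) -> Tuple[int, int]:
    """Return (steps_per_epoch, total_steps), counting a partial final batch as one step."""
    if min(num_examples, batch_size, epochs) <= 0:
        raise ValueError("num_examples, batch_size and epochs must all be positive")
    steps_per_epoch = (num_examples + batch_size - 1) // batch_size  # ceiling division
    return steps_per_epoch, steps_per_epoch * epochs


print(training_steps(3, 32, 5))       # (1, 5): the 3-record sample file fits in one batch
print(training_steps(16_000, 32, 5))  # (500, 2500)
```

Under these assumptions, the 500 steps per epoch in the sample training log in Step 7 would correspond to about 16,000 examples at batch_size 32.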

Step 6: Validate Your Configuration

Before training, validate your configuration:

okto validate

What it checks:

  • ✅ Syntax is correct
  • ✅ All required fields are present
  • ✅ Dataset files exist
  • ✅ Model paths are valid
  • ✅ Values are within allowed ranges

Example output:

πŸ™ OktoEngine v0.1
πŸ” Validating OktoScript file: "scripts/train.okt"
πŸ“„ File: "scripts/train.okt"
πŸ“„ Size: 382 bytes
πŸ“„ Lines: 31

βœ” File parsed successfully

πŸ“‹ Validation Results:
βœ… Validation passed! No errors or warnings.

πŸ“Š Summary:
  Project: MyFirstModel
  ENV: Configured
  Dataset: dataset/train.jsonl
  Model: gpt2
  Training: 5 epochs, batch size 32
  Export: ["okm"]

If validation fails:

  • Check error messages
  • Fix syntax errors
  • Verify file paths
  • Run okto validate --debug for detailed logs

Step 7: Train Your Model

Start training:

okto train

What happens:

  1. ✅ Configuration is parsed and validated
  2. ✅ System environment is checked
  3. ✅ Dependencies are verified
  4. ✅ Dataset is loaded
  5. ✅ Model is initialized (downloaded from HuggingFace if needed)
  6. ✅ Training loop starts
  7. ✅ Progress is shown in real time
  8. ✅ Model is saved to runs/MyFirstModel/
  9. ✅ Exported models are saved to export/

Example output:

πŸ™ OktoEngine v0.1
πŸ“„ Reading: "scripts/train.okt"

πŸ“Š Environment Check:
  βœ” Runtime: Python 3.14.0
  βœ” GPU: NVIDIA GeForce RTX 4070
  βœ” RAM: 63GB (40GB available)
  βœ” Platform: windows

πŸ“¦ Checking dependencies...
  βœ” All dependencies available

πŸš€ Starting training pipeline...

Epoch 1/5: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 500/500 [02:15<00:00, 3.70it/s]
  Loss: 2.345 β†’ 1.892
  Learning Rate: 5e-5
  GPU Memory: 8.2GB / 12GB

Epoch 2/5: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 500/500 [02:14<00:00, 3.72it/s]
  Loss: 1.892 β†’ 1.654

...

βœ… Training completed successfully!
πŸ“ Output: runs/MyFirstModel/

Training time (rough estimates; actual time depends on dataset size, number of epochs, and hardware):

  • Small models (~100M params): 5-15 minutes
  • Medium models (~1B params): 30-60 minutes
  • Large models (~7B params): several hours
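For your own dataset, the progress bar gives a better estimate than these ranges: divide the total number of steps by the it/s rate it reports. For example, 500 steps at 3.70 it/s take about 135 seconds, which matches the 02:15 shown for epoch 1 in the sample log above. The helper below uses the illustrative name estimate_duration. Its figure ignores model download, evaluation, and checkpoint saving.

```python
from datetime import timedelta


def estimate_duration(steps_per_epoch: int, epochs: int, it_per_s: float) -> timedelta:
    """Rough wall-clock estimate from the it/s rate shown in the progress bar."""
    if it_per_s <= 0:
        raise ValueError("it_per_s must be positive")
    seconds = steps_per_epoch * epochs / it_per_s
    return timedelta(seconds=round(seconds))  # round to whole seconds for display


print(estimate_duration(500, 1, 3.70))  # 0:02:15, one epoch of the sample log
print(estimate_duration(500, 5, 3.70))  # 0:11:16, all five epochs
```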

Step 8: Check Your Results

After training completes:

Check training output:

ls runs/MyFirstModel/

Files created:

  • checkpoint-*/ - Training checkpoints
  • training_logs.json - Detailed training logs
  • metrics.json - Training metrics
  • tokenizer.json - Tokenizer configuration
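When several checkpoint-*/ directories exist, the newest is usually the one with the highest number. The sketch below assumes the suffix is a step count, as in the checkpoint-<step> naming used by the Hugging Face Trainer. That OktoEngine follows the same scheme is an assumption. latest_checkpoint is a hypothetical helper, not an okto command.

```python
import re
from pathlib import Path
from typing import Optional

_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(run_dir) -> Optional[Path]:
    """Return the checkpoint-<N> directory with the largest N, or None if there is none.

    Comparing N as an integer matters: a plain string sort would rank
    "checkpoint-900" above "checkpoint-1000".
    """
    best_step, best_path = -1, None
    for entry in Path(run_dir).iterdir():
        match = _CHECKPOINT_RE.match(entry.name)
        if match and entry.is_dir():
            step = int(match.group(1))
            if step > best_step:
                best_step, best_path = step, entry
    return best_path
```

For example, latest_checkpoint("runs/MyFirstModel") returns the newest checkpoint directory. It returns None if the run folder contains no checkpoint directories.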

Check exported models:

ls export/

Exported files:

  • model.okm - OktoSeek Model format

Step 9: Evaluate Your Model (Optional)

Evaluate your trained model:

okto eval

Output:

πŸ™ OktoEngine v0.1
πŸ“Š Evaluating model...

πŸ“ˆ Evaluation Results:
  Accuracy: 0.892
  Loss: 1.234
  Perplexity: 2.456
  F1-Score: 0.876

βœ… Evaluation completed!
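For language models, perplexity is conventionally the exponential of the mean per-token cross-entropy loss, with the loss measured in nats (natural log). A lower loss therefore always means a lower perplexity. This guide does not say whether okto eval derives its Perplexity from the reported Loss in exactly this way. Treat the sketch below as the general definition, not OktoEngine's implementation.

```python
import math


def perplexity(mean_loss: float) -> float:
    """Perplexity from the mean per-token cross-entropy loss, measured in nats."""
    return math.exp(mean_loss)


def mean_loss_from_perplexity(ppl: float) -> float:
    """Inverse of perplexity(): the mean loss in nats that gives this perplexity."""
    return math.log(ppl)


print(round(perplexity(2.0), 3))  # 7.389
```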

Common First Steps

Using GPU

If you have a GPU, OktoEngine will automatically detect and use it. To ensure GPU usage:

ENV {
  accelerator: "gpu"
  precision: "fp16"
}

TRAIN {
  device: "auto"  # or "cuda" for explicit GPU
}
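To see which device "auto" is likely to pick on your machine, you can query PyTorch directly. This assumes OktoEngine's Python runtime uses PyTorch, which is typical for HuggingFace models but not stated in this guide. resolve_device is a hypothetical helper that mirrors the "auto" behaviour described above; it is not OktoEngine's own code.

```python
def resolve_device(requested: str = "auto") -> str:
    """Map an "auto" / "cuda" / "cpu" setting to a concrete device name.

    Falls back to "cpu" when PyTorch is not installed or no CUDA GPU is visible.
    """
    if requested != "auto":
        return requested  # explicit settings are passed through unchanged
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(resolve_device())  # prints "cuda" or "cpu" depending on the machine
```

If this prints "cpu" on a machine with an NVIDIA GPU, run okto doctor to check CUDA availability.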

Adding More Epochs

TRAIN {
  epochs: 10  # Increase from 5
  batch_size: 32
}

Exporting to Multiple Formats

EXPORT {
  format: ["okm", "onnx", "gguf"]
  path: "export/"
}

Using Debug Mode

For detailed logs during training:

okto train --debug

Shows:

  • Parsing details
  • Execution flow
  • Error diagnostics
  • Performance metrics

Troubleshooting

Training Fails

Check system:

okto doctor

Check configuration:

okto validate --debug

Common issues:

  • Out of memory: Reduce batch_size in TRAIN block
  • Model not found: Check that MODEL.base is a valid HuggingFace model ID
  • Dataset not found: Verify paths in DATASET block
  • Dependencies missing: Run okto doctor --install

Validation Fails

Enable debug mode:

okto validate --debug

Common errors:

  • Syntax errors - Check OktoScript syntax
  • Missing fields - Add required blocks
  • Invalid paths - Verify file paths exist
  • Invalid values - Check value ranges

System Issues

Check system:

okto doctor

Install dependencies:

okto doctor --install


Quick Reference

Task                  Command
Initialize project    okto init <name>
Validate              okto validate
Check system          okto doctor
Train                 okto train
Evaluate              okto eval
Export                okto export --format okm
Debug mode            okto train --debug
Upgrade               okto upgrade

Need help? Check the FAQ or open an issue on GitHub.