tags:
- ai
- training
- dsl
- oktoscript
- oktoseek
- okto
- automation
- ai-pipelines
- ai-governance
language:
- en
frameworks:
- pytorch
- tensorflow
OktoEngine
Professional CLI Engine for Training AI Models with OktoScript
Built by OktoSeek AI for the OktoSeek ecosystem
OktoSeek Homepage • OktoScript Language • Twitter • YouTube
Table of Contents
- What is OktoEngine?
- Quick Start
- Key Features
- Installation
- CLI Commands
- Training Capabilities
- Debug Mode
- Examples
- System Requirements
- Documentation
- FAQ
- License
- Contact
Quick Start
Get started with OktoEngine in 3 steps:
- Download the latest release from GitHub Releases
- Initialize a project:
okto init my-project
- Train your model:
okto train
# Initialize a new project
okto init my-ai-model
# Navigate to project
cd my-ai-model
# Validate your OktoScript configuration
okto validate
# Train your model
okto train
Full documentation: docs/GETTING_STARTED.md
CLI Reference: docs/CLI_REFERENCE.md
What is OktoEngine?
OktoEngine is the official execution engine for OktoScript, a powerful CLI tool that transforms declarative AI configurations into trained, production-ready models.
Built for Scale
OktoEngine is engineered to handle:
- Models of any size - From millions to billions of parameters
- Complex training pipelines - Full fine-tuning, LoRA adapters, and more
- Production workloads - Optimized for real-world AI development
- Enterprise-grade reliability - Robust error handling and validation
Why OktoEngine?
Traditional Approach:
# Hundreds of lines of Python code
# Complex configuration management
# Error-prone manual setup
# Difficult to reproduce
With OktoEngine:
PROJECT "MyModel"
MODEL { base: "gpt2" }
DATASET { train: "dataset/train.jsonl" }
TRAIN { epochs: 5, batch_size: 32 }
EXPORT { format: ["okm"] }
One command: okto train → Trained model ready for deployment
Key Features
Complete CLI Interface
Professional command-line interface with intuitive commands:
Core Commands:
okto init # Initialize new projects
okto validate # Validate OktoScript files
okto train # Train models
okto eval # Evaluate models
okto export # Export to multiple formats
okto convert # Convert between formats (PyTorch, ONNX, GGUF, TFLite, OktoModel)
Inference Commands:
okto infer # Direct inference (single input/output)
okto chat # Interactive chat mode with session context
Analysis Commands:
okto compare # Compare two models (latency, accuracy, loss)
okto logs # View historical training logs and CONTROL decisions
okto tune # Auto-tune training using CONTROL block logic
Utility Commands:
okto list # List projects, models, datasets, or exports
okto doctor # System diagnostics and dependency checking
okto upgrade # Auto-update engine to latest version
okto about # Engine and language information
okto exit # Exit interactive mode
What you can do:
- Train models with full fine-tuning or LoRA adapters
- Convert models between formats for different deployment targets
- Chat interactively with trained models
- Compare model versions to find the best one
- Monitor training with real-time logs and metrics
- Auto-tune training parameters intelligently
- Validate configurations before training
- Export to production-ready formats
Advanced Training Capabilities
Training Methods:
- Full Fine-tuning - Train entire models with complete parameter updates
- LoRA Fine-tuning - Efficient adapter-based training (LoRA, QLoRA, PEFT) with minimal memory footprint
- Multi-dataset Training - Combine multiple datasets with weighted sampling and custom mixing strategies
- Model Adapters - Apply pre-trained adapters (LoRA/PEFT) to base models for rapid customization
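Multi-dataset mixing with weighted sampling can be illustrated with a short concept sketch. This is not OktoEngine's internal implementation; the `weighted_sample` helper and the 3:1 mixing ratio are purely illustrative.

```python
import random

def weighted_sample(datasets, weights, n, seed=0):
    """Draw n training examples from several datasets in proportion
    to the given mixing weights (a sketch of weighted sampling)."""
    rng = random.Random(seed)
    names = list(datasets)
    picks = rng.choices(names, weights=weights, k=n)
    return [(name, rng.choice(datasets[name])) for name in picks]

# Two toy datasets mixed 3:1 in favour of "chat"
data = {"chat": ["hi", "hello"], "code": ["def f(): pass"]}
batch = weighted_sample(data, weights=[3, 1], n=8)
```

With a fixed seed the mixture is reproducible; a real training loop would stream batches this way rather than materializing a list.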
Intelligent Training Control:
- Automatic Checkpointing - Never lose progress with smart checkpoint management
- Real-time Metrics - Monitor training in the terminal with live updates
- CONTROL Block - Define conditional logic (IF, WHEN, EVERY) for autonomous decision-making
- Auto-parameter Adjustment - Automatically adjust learning rate, batch size, and other parameters based on metrics
- Early Stopping - Intelligent stopping when model performance plateaus or diverges
- Memory-aware Training - Automatically reduce batch size when GPU memory is low
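Early stopping of the kind listed above is commonly implemented with a patience counter. The sketch below is a generic, hypothetical version of that idea, not OktoEngine's actual logic; `patience` and `min_delta` are assumed parameter names.

```python
class EarlyStopper:
    """Patience-based early stopping: stop when the validation loss
    has not improved by at least min_delta for `patience` checks."""
    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_checks = 0

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience

stopper = EarlyStopper(patience=2)
losses = [2.0, 1.5, 1.4, 1.41, 1.42, 1.43]
stopped_at = next(i for i, l in enumerate(losses) if stopper.should_stop(l))
```

Here training would halt at the fifth check, after two consecutive checks without improvement.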
Monitoring & Governance:
- MONITOR Block - Track any metric (loss, accuracy, GPU usage, throughput, latency, confidence, etc.)
- GUARD Block - Safety and ethics protection (hallucination, toxicity, bias detection)
- BEHAVIOR Block - Control model personality, verbosity, language, and response style
- STABILITY Block - Training safety controls (NaN detection, divergence prevention)
- EXPLORER Block - AutoML-style hyperparameter search and optimization
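The STABILITY block's NaN detection and divergence prevention can be pictured with a small, self-contained check. This is an illustrative sketch of the concept only; the `divergence_factor` threshold is an assumption, not a documented OktoScript setting.

```python
import math

def check_stability(losses, divergence_factor=2.0):
    """Flag NaN/Inf losses, or divergence: the loss rising above
    divergence_factor times the best loss seen so far."""
    best = float("inf")
    for step, loss in enumerate(losses):
        if math.isnan(loss) or math.isinf(loss):
            return ("nan", step)
        best = min(best, loss)
        if loss > divergence_factor * best:
            return ("diverged", step)
    return ("ok", None)

# Loss rebounds to more than double its best value -> flagged as divergence
status = check_stability([2.0, 1.0, 2.5])
```

A training engine with such a check can stop or roll back to a checkpoint before a run is wasted.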
What makes it unique:
- Decision-driven - Models can make autonomous decisions during training
- Self-adapting - Automatically adjusts parameters based on real-time metrics
- Safe by design - Built-in safety guards and content filtering
- Fully observable - Complete visibility into training process and decisions
- Production-ready - Export to multiple formats for deployment
Detailed Metrics & Monitoring
Real-time training metrics displayed directly in your terminal:
Starting training pipeline...
Epoch 1/5: 100%|████████████| 500/500 [02:15<00:00, 3.70it/s]
Loss: 2.345 → 1.892
Learning Rate: 5e-5
GPU Memory: 8.2GB / 12GB
Epoch 2/5: 100%|████████████| 500/500 [02:14<00:00, 3.72it/s]
Loss: 1.892 → 1.654
...
Debug Mode
Comprehensive debug mode for troubleshooting:
okto train --debug
okto validate --debug
Shows detailed parsing logs, execution flow, and error diagnostics.
Automatic Updates
Built-in upgrade system:
okto upgrade
Automatically downloads and installs the latest version from GitHub Releases.
System Diagnostics
Comprehensive environment checking:
okto doctor
Checks GPU, CUDA, RAM, dependencies, and provides recommendations.
Dependency Management
Automatic dependency installation:
okto doctor --install
Installs missing dependencies automatically.
Installation
Download Pre-built Binaries
Download the latest release for your platform:
- Windows: okto-windows.exe
- Linux: okto-linux
- macOS: okto-macos
Available at: GitHub Releases
Upgrade Existing Installation
okto upgrade
Automatically updates to the latest version.
CLI Commands
Core Commands
Initialize Project:
okto init my-project
Creates a new OktoScript project with proper folder structure.
Validate Configuration:
okto validate
okto validate --file scripts/train.okt
Validates OktoScript syntax and configuration.
Train Model:
okto train
okto train --file scripts/train.okt
okto train --debug # Enable debug mode
Executes the complete training pipeline.
Evaluate Model:
okto eval --file scripts/train.okt
Evaluates a trained model against test datasets.
Export Model:
okto export --format okm --file scripts/train.okt
okto export --format onnx
Exports trained models to various formats.
Convert Model Formats:
okto convert --input model.pt --from pt --to gguf --output model.gguf
okto convert --input model.pt --from pt --to onnx --output model.onnx
Converts models between different formats (PyTorch, ONNX, GGUF, TFLite, OktoModel).
Direct Inference:
okto infer --model models/chatbot.okm --text "Hello, how can I help?"
Runs single inference on a trained model. Automatically respects BEHAVIOR, GUARD, INFERENCE, and CONTROL blocks.
Interactive Chat:
okto chat --model models/chatbot.okm
Starts an interactive chat session. Uses BEHAVIOR settings, enforces GUARD rules, and supports session context.
Compare Models:
okto compare models/v1.okm models/v2.okm
Compares two models on latency, accuracy, loss, and resource usage.
View Logs:
okto logs my-model
Views historical training logs, metrics, and CONTROL decisions.
Auto-tune Training:
okto tune
Uses CONTROL block to auto-adjust training parameters (learning rate, batch size, early stopping).
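The kind of rule-driven adjustment that `okto tune` and the CONTROL block describe can be sketched as a pure function from metrics to updated parameters. Everything here is a hypothetical illustration: the metric names (`loss_delta`, `gpu_mem_used_frac`) and thresholds are assumptions, not OktoScript's actual schema.

```python
def control_step(metrics, params):
    """Sketch of CONTROL-style logic: inspect current metrics and
    return an adjusted copy of the training parameters."""
    params = dict(params)
    # IF the loss has plateaued THEN halve the learning rate
    if metrics["loss_delta"] > -0.01:
        params["learning_rate"] /= 2
    # WHEN GPU memory is nearly full, reduce the batch size
    if metrics["gpu_mem_used_frac"] > 0.9:
        params["batch_size"] = max(1, params["batch_size"] // 2)
    return params

new = control_step(
    {"loss_delta": -0.001, "gpu_mem_used_frac": 0.95},
    {"learning_rate": 5e-5, "batch_size": 32},
)
```

Keeping the rules as pure functions makes every decision easy to log and replay, which is what makes `okto logs`-style auditing of CONTROL decisions possible.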
Utility Commands
System Diagnostics:
okto doctor # Check system
okto doctor --install # Auto-install dependencies
Upgrade Engine:
okto upgrade
List Resources:
okto list projects
okto list models
okto list datasets
okto list exports
Other Commands:
okto about # Show information
okto --version # Show version
okto exit # Exit interactive mode
Global Flags
--debug # Enable debug mode (detailed logs)
--help # Show help
--version # Show version
Complete CLI Reference: docs/CLI_REFERENCE.md
Training Capabilities
Supported Model Sizes
OktoEngine can train models of any size:
- Small Models (1M - 100M parameters) - Fast training, minimal resources
- Medium Models (100M - 1B parameters) - Balanced performance
- Large Models (1B - 7B parameters) - Requires GPU, optimized training
- Very Large Models (7B+ parameters) - Enterprise-grade, multi-GPU support
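A quick back-of-the-envelope memory estimate helps pick the right tier above. The sketch below is a rough rule of thumb, not an OktoEngine feature: weights alone at fp16 take 2 bytes per parameter, and training typically needs several times more for gradients and optimizer state.

```python
def vram_estimate_gb(n_params, bytes_per_param=2):
    """Rough memory needed just to hold the model weights.
    fp16/bf16 = 2 bytes per parameter; fp32 = 4."""
    return n_params * bytes_per_param / 1e9

# A 7B-parameter model in fp16 needs roughly 14 GB for weights alone,
# which is why the 7B+ tier calls for multi-GPU setups.
weights_gb = vram_estimate_gb(7e9)
```

This is exactly why LoRA and quantized variants (QLoRA) matter: they keep the frozen base weights compact while training only a small set of adapters.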
Training Methods
Full Fine-tuning:
TRAIN {
epochs: 5
batch_size: 32
device: "auto"
}
LoRA Fine-tuning:
FT_LORA {
lora_rank: 8
lora_alpha: 32
epochs: 3
}
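The efficiency of the LoRA settings above comes from the parameter count: full fine-tuning updates an entire d_out x d_in weight matrix, while LoRA trains only two low-rank factors B (d_out x r) and A (r x d_in), with the update scaled by lora_alpha / lora_rank. A minimal sketch, with a 768x768 projection chosen only as an illustrative size:

```python
def lora_param_counts(d_in, d_out, rank):
    """Trainable-parameter comparison for one weight matrix:
    full fine-tuning vs. a rank-r LoRA adapter (B @ A)."""
    full = d_out * d_in
    lora = rank * (d_in + d_out)
    return full, lora

# One 768x768 projection with lora_rank: 8
full, lora = lora_param_counts(768, 768, 8)
ratio = lora / full  # roughly 2% of the full parameter count
```

With ranks this small, the memory saving is what makes adapter training feasible on a single consumer GPU.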
Automatic Optimizations
- Mixed Precision Training - FP16/BF16 support
- Gradient Accumulation - Train large models on smaller GPUs
- Automatic Device Selection - CPU/GPU/CUDA detection
- Memory Optimization - Efficient memory management
- Checkpoint Management - Automatic saving and resuming
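Gradient accumulation, listed above, simulates a large batch on a small GPU by summing scaled micro-batch gradients and stepping the optimizer only every N micro-batches. The sketch below uses scalar "gradients" purely for illustration; it is the general technique, not OktoEngine's internals.

```python
def train_with_accumulation(grads, accum_steps):
    """Average micro-batch gradients and apply one optimizer step
    every `accum_steps` micro-batches, simulating a batch that is
    accum_steps times larger."""
    steps, buffer = [], 0.0
    for i, g in enumerate(grads, 1):
        buffer += g / accum_steps        # scale so the sum is an average
        if i % accum_steps == 0:
            steps.append(buffer)         # optimizer.step() would run here
            buffer = 0.0
    return steps

# Four micro-batches accumulated two at a time -> two optimizer steps
steps = train_with_accumulation([1.0, 3.0, 2.0, 2.0], accum_steps=2)
```

The effective batch size is micro-batch size x accum_steps, at the cost of proportionally fewer optimizer steps per epoch.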
Debug Mode
Debug mode provides detailed insights into the engine's operation:
Enable Debug Mode
# Via command flag
okto train --debug
okto validate --debug
# Via environment variable
OKTO_DEBUG=1 okto train
What Debug Mode Shows
Parsing Details:
DEBUG: Starting parse_oktoscript. Input preview: '# okto_version: "1.0" PROJECT...'
DEBUG: Parsed version: Some("1.0")
DEBUG: Parsed project: my-model
DEBUG: After PROJECT, remaining input: 'ENV { accelerator: "gpu"...'
Execution Flow:
DEBUG: Attempting to parse ENV block...
DEBUG: Parsed ENV field: accelerator = gpu
DEBUG: Parsed ENV field: precision = fp16
DEBUG: Successfully parsed ENV block with 5 fields
Error Diagnostics:
DEBUG: Failed to parse key in ENV block. Input: 'accelerator: "gpu"...'
DEBUG: Failed to parse ':' after key 'accelerator'. Input: '"gpu"...'
Use Cases
- Troubleshooting parsing errors - See exactly where parsing fails
- Understanding execution flow - Track how your configuration is processed
- Performance analysis - Identify bottlenecks
- Configuration debugging - Verify your OktoScript is parsed correctly
Debug Guide: docs/DEBUG_GUIDE.md
Examples
Basic Training Example
scripts/train.okt:
PROJECT "ChatBot"
ENV {
accelerator: "gpu"
precision: "fp16"
install_missing: true
}
DATASET {
train: "dataset/train.jsonl"
validation: "dataset/val.jsonl"
}
MODEL {
base: "gpt2"
}
TRAIN {
epochs: 5
batch_size: 32
device: "auto"
}
EXPORT {
format: ["okm"]
path: "export/"
}
Terminal Output:
$ okto train
OktoEngine v0.1
Reading: "scripts/train.okt"
Environment Check:
✓ Runtime: Python 3.14.0
✓ GPU: NVIDIA GeForce RTX 4070
✓ RAM: 63GB (40GB available)
✓ Platform: windows
Checking dependencies...
✓ All dependencies available
Starting training pipeline...
Epoch 1/5: 100%|████████████| 500/500 [02:15<00:00, 3.70it/s]
Loss: 2.345 → 1.892
Learning Rate: 5e-5
✓ Training completed successfully!
Output: runs/ChatBot/
Advanced Example with LoRA
See examples/lora-training.okt for a complete LoRA fine-tuning example.
Complete Project Examples
- examples/basic-training/ - Minimal working example
- examples/chatbot/ - Conversational AI training
- examples/vision-model/ - Computer vision pipeline
More Examples: examples/README.md
System Requirements
Minimum Requirements
- OS: Windows 10+, Linux (Ubuntu 20.04+), macOS 11+
- RAM: 8GB (16GB recommended)
- Storage: 10GB free space
- Runtime: Compatible runtime environment
Recommended for Training
- GPU: NVIDIA GPU with CUDA support (8GB+ VRAM)
- RAM: 32GB+ for large models
- Storage: SSD with 50GB+ free space
- CPU: Multi-core processor (8+ cores)
Check Your System
okto doctor
Shows detailed system information and recommendations.
Documentation
Complete documentation for OktoEngine:
- Getting Started Guide - Your first 5 minutes
- CLI Reference - Complete command reference
- Debug Guide - Debug mode usage
- Examples - Working examples
- FAQ - Frequently Asked Questions
- Changelog - Version history
Advanced Topics
- Training Optimization - Best practices for efficient training
- Error Handling - Troubleshooting common issues
- Performance Tuning - Maximize training speed
- Integration - Using OktoEngine in your workflow
Frequently Asked Questions (FAQ)
Q: What models can I train with OktoEngine?
A: OktoEngine supports any model compatible with modern AI frameworks. From small models (millions of parameters) to large language models (billions of parameters).
Q: Do I need to know Python to use OktoEngine?
A: No! OktoEngine provides a complete CLI interface. You only need to write OktoScript configuration files.
Q: Can I train models without a GPU?
A: Yes, OktoEngine automatically detects available hardware and uses CPU when GPU is not available. Training will be slower but fully functional.
Q: How do I update OktoEngine?
A: Simply run okto upgrade to automatically download and install the latest version.
Q: What formats can I export to?
A: OktoEngine supports multiple export formats: OKM (OktoSeek), ONNX, GGUF, SafeTensors, and more.
Q: Can I resume training from a checkpoint?
A: Yes, OktoEngine automatically saves checkpoints and can resume training from any checkpoint.
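Resuming from the latest checkpoint generally means picking the file with the highest step or epoch number. The helper below is a generic sketch; the `checkpoint-<step>.pt` naming is an illustrative assumption, not OktoEngine's documented layout.

```python
import re

def latest_checkpoint(filenames):
    """Pick the checkpoint with the highest step number from files
    named like 'checkpoint-<step>.pt' (naming is illustrative)."""
    best, best_step = None, -1
    for name in filenames:
        m = re.fullmatch(r"checkpoint-(\d+)\.pt", name)
        if m and int(m.group(1)) > best_step:
            best, best_step = name, int(m.group(1))
    return best

ckpt = latest_checkpoint(["checkpoint-500.pt", "checkpoint-1500.pt", "config.okt"])
```

Parsing the step out of the filename (rather than relying on file timestamps) keeps resumption deterministic even after files are copied between machines.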
Complete FAQ →
Future Integration
OktoEngine will be integrated into OktoSeek IDE for visual training workflows:
- Visual Pipeline Builder - Drag-and-drop training configuration
- Real-time Dashboard - Live training metrics and visualization
- One-click Training - Train models directly from the IDE
- Project Management - Organize and manage multiple training projects
Powered by OktoSeek AI
OktoEngine is developed and maintained by OktoSeek AI.
- Official website: https://www.oktoseek.com
- OktoScript Language: https://github.com/oktoseek/oktoscript
- Twitter: https://x.com/oktoseek
- YouTube: https://www.youtube.com/@Oktoseek
- Repository: https://github.com/oktoseek/oktoengine
License
This software is proprietary and licensed under the End User License Agreement (EULA). See LICENSE file for details.
Important: OktoEngine is not open source. Binary releases are available for download, but the source code is proprietary.
Contact
For questions, support, or licensing inquiries:
- Email: service@oktoseek.com
- GitHub Issues: https://github.com/oktoseek/oktoengine/issues
- Website: https://www.oktoseek.com
Made with ❤️ by the OktoSeek AI team