# OktoScript Grammar Specification v1.2

Complete formal grammar for the OktoScript language, developed by **OktoSeek AI**.

> **Version Compatibility:** This specification covers OktoScript v1.2, which is 100% backward compatible with v1.0 and v1.1. Files without a version declaration default to v1.0.

---

## Table of Contents

1. [Grammar Overview](#grammar-overview)
2. [Basic Metadata Blocks](#basic-metadata-blocks)
3. [ENV Block](#env-block)
4. [DATASET Block](#dataset-block)
5. [MODEL Block](#model-block)
6. [TRAIN Block](#train-block)
7. [METRICS Block](#metrics-block)
8. [VALIDATION Block](#validation-block)
9. [INFERENCE Block](#inference-block)
10. [CONTROL Block — Decision Engine](#control-block--decision-engine)
11. [MONITOR Block — Full Metrics Support](#monitor-block--full-metrics-support)
12. [GUARD Block — Safety / Ethics / Protection](#guard-block--safety--ethics--protection)
13. [BEHAVIOR Block — Model Personality](#behavior-block--model-personality)
14. [EXPLORER Block — Parameter Search](#explorer-block--parameter-search)
15. [STABILITY Block — Training Safety](#stability-block--training-safety)
16. [Boolean Support](#boolean-support)
17. [EXPORT Block](#export-block)
18. [DEPLOY Block](#deploy-block)
19. [SECURITY Block](#security-block)
20. [LOGGING Block](#logging-block)
21. [Model Inheritance](#model-inheritance)
22. [Extension Points & Hooks](#extension-points--hooks)
23. [Validation Rules](#validation-rules)
24. [Troubleshooting](#troubleshooting)
25. [Terminal / Basic Types](#terminal--basic-types)
26. [Full Script Example](#full-script-example)

---

## Grammar Overview

```ebnf
<script> ::= [<version_decl>]
             <project_block>
             [<description_block>] [<version_block>] [<tags_block>] [<author_block>]
             [<env_block>]
             <dataset_block>
             <model_block>
             (<train_block> | <ft_lora_block>)
             [<metrics_block>] [<validation_block>] [<inference_block>]
             [<control_block>] [<monitor_block>] [<guard_block>]
             [<behavior_block>] [<explorer_block>] [<stability_block>]
             [<export_block>] [<deploy_block>] [<security_block>] [<logging_block>]
```

**Note:** `TRAIN` and `FT_LORA` are mutually exclusive. Use `FT_LORA` for LoRA-based fine-tuning, or `TRAIN` for full fine-tuning.
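A sketch of how a loader might enforce these structural rules, assuming a script has already been parsed into a dict of blocks. The helper and its representation are hypothetical, not part of the OktoEngine API:

```python
# Hypothetical sketch: enforcing required blocks and TRAIN/FT_LORA exclusivity
# on a parsed script, represented as a dict mapping block names to contents.
REQUIRED = {"PROJECT", "DATASET", "MODEL"}

def check_blocks(script: dict) -> list[str]:
    """Return a list of structural errors (empty list = valid script)."""
    errors = [f"missing required block: {b}" for b in sorted(REQUIRED - script.keys())]
    has_train, has_lora = "TRAIN" in script, "FT_LORA" in script
    if has_train and has_lora:
        errors.append("TRAIN and FT_LORA are mutually exclusive")
    if not (has_train or has_lora):
        errors.append("one of TRAIN or FT_LORA is required")
    return errors
```

A script containing both `TRAIN` and `FT_LORA` would fail this check even though each block is individually well-formed.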
**Required blocks:** PROJECT, DATASET, MODEL, TRAIN
**Optional blocks:** ENV, DESCRIPTION, VERSION, TAGS, AUTHOR, and all others

---

## Version Declaration (v1.1+)

```ebnf
<version_decl> ::= "# okto_version:" <string>
```

**Example:**

```okt
# okto_version: "1.2"
PROJECT "MyModel"
...
```

**Rules:**
- Optional at the top of the file
- If missing, defaults to v1.0
- Must precede all blocks (other comments may appear before it)
- Format: `# okto_version: "1.2"`, `# okto_version: "1.1"`, or `# okto_version: "1.0"`

---

## Basic Metadata Blocks

### PROJECT Block

```ebnf
<project_block> ::= "PROJECT" <string>
```

**Constraints:**
- Project name must be a valid string (1-100 characters)
- Cannot contain the special characters `{`, `}`, `[`, `]`, `:`, or `"`

**Example:**

```okt
PROJECT "PizzaBot"
```

### DESCRIPTION Block

```ebnf
<description_block> ::= "DESCRIPTION" <string>
```

**Constraints:**
- Maximum 500 characters
- Can contain any UTF-8 characters

**Example:**

```okt
DESCRIPTION "AI specialized in pizza restaurant service"
```

### VERSION Block

```ebnf
<version_block> ::= "VERSION" <string>
```

**Constraints:**
- Must follow semantic versioning (e.g., "1.0.0", "2.1.3")
- Format: `major.minor.patch` or `major.minor`

**Example:**

```okt
VERSION "1.0"
VERSION "2.1.3"
```

### TAGS Block

```ebnf
<tags_block> ::= "TAGS" "[" <string_list> "]"
```

**Constraints:**
- Maximum 10 tags
- Each tag: 1-50 characters
- Tags are case-insensitive

**Example:**

```okt
TAGS ["food", "restaurant", "chatbot"]
```

### AUTHOR Block

```ebnf
<author_block> ::= "AUTHOR" <string>
```

**Example:**

```okt
AUTHOR "OktoSeek"
```

---

## ENV Block

The `ENV` block defines environment requirements, hardware expectations, and execution preferences for OktoEngine. It is fully abstract and does not expose underlying implementation details (Python, PyTorch, TensorFlow, etc.). OktoEngine uses this block to configure the execution environment before running any training or inference operations.
**Purpose:**
- Define minimum environment requirements for a project
- Specify hardware preferences (CPU, GPU, TPU)
- Set memory and precision requirements
- Configure execution backend preferences
- Enable automatic dependency installation
- Specify platform and network requirements

**Note:** ENV is not a dependency list. It is a high-level execution requirement description that allows OktoEngine to decide how to configure the real execution environment.

### ENV Block Syntax

```ebnf
<env_block> ::= "ENV" "{"
                  [<accelerator>] [<min_memory>] [<precision>]
                  [<backend>] [<install_missing>] [<platform>] [<network>]
                "}"

<accelerator>     ::= "accelerator" ":" ("auto" | "cpu" | "gpu" | "tpu")
<min_memory>      ::= "min_memory" ":" <memory_value>
<memory_value>    ::= "4GB" | "8GB" | "16GB" | "32GB" | "64GB"
<precision>       ::= "precision" ":" ("auto" | "fp16" | "fp32" | "bf16")
<backend>         ::= "backend" ":" ("auto" | "oktoseek")
<install_missing> ::= "install_missing" ":" ("true" | "false")
<platform>        ::= "platform" ":" ("windows" | "linux" | "mac" | "any")
<network>         ::= "network" ":" ("online" | "offline" | "required")
```

### ENV Block Fields

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `accelerator` | enum | ❌ No | `"auto"` | Preferred compute unit: `"auto"`, `"cpu"`, `"gpu"`, `"tpu"` |
| `min_memory` | string | ❌ No | `"8GB"` | Required minimum RAM: `"4GB"`, `"8GB"`, `"16GB"`, `"32GB"`, `"64GB"` |
| `precision` | enum | ❌ No | `"auto"` | Numerical precision: `"auto"`, `"fp16"`, `"fp32"`, `"bf16"` |
| `backend` | enum | ❌ No | `"auto"` | Execution engine: `"auto"`, `"oktoseek"` |
| `install_missing` | boolean | ❌ No | `false` | If `true`, engine attempts automatic dependency installation |
| `platform` | enum | ❌ No | `"any"` | Target OS: `"windows"`, `"linux"`, `"mac"`, `"any"` |
| `network` | enum | ❌ No | `"online"` | Internet requirement: `"online"`, `"offline"`, `"required"` |

### ENV Block Examples

**Minimal ENV (uses defaults):**

```okt
ENV {
  accelerator: "gpu"
  min_memory: "8GB"
}
```

**Complete ENV configuration:**

```okt
ENV {
  accelerator: "gpu"
  min_memory: "16GB"
  precision: "fp16"
  backend: "oktoseek"
  install_missing: true
  platform: "any"
  network: "online"
}
```

**CPU-only training:**

```okt
ENV {
  accelerator: "cpu"
  min_memory: "8GB"
  precision: "fp32"
  install_missing: true
}
```

**Offline execution:**

```okt
ENV {
  accelerator: "gpu"
  min_memory: "16GB"
  network: "offline"
  install_missing: false
}
```

### ENV Block Constraints

1. **Memory format:** Must use the `GB` suffix (e.g., `"8GB"`, not `"8"` or `"8 GB"`)
2. **Enum values:** Only predefined values are allowed
3. **Boolean values:** Must be `true` or `false` (lowercase)
4. **String values:** Must be quoted

### ENV Block Validation Rules

1. If `accelerator = "gpu"` and `min_memory < "8GB"` → **warning** (GPU training typically requires at least 8GB)
2. If `network = "offline"` → export formats like `onnx` or `gguf` are allowed (pre-downloaded models)
3. If `backend = "oktoseek"` → preferred default for the OktoSeek ecosystem
4. If `install_missing = true` → engine must attempt auto-setup of missing dependencies
5. If no ENV block exists → defaults to:

```okt
ENV {
  accelerator: "auto"
  min_memory: "8GB"
  backend: "auto"
}
```

### Engine Behavior

When OktoEngine encounters an ENV block, it must:

1. **Read the ENV block first:** Before any other stage (dataset loading, model initialization, etc.)
2. **Check system compatibility:** Verify RAM, GPU availability, platform, etc.
3. **Return detailed errors:** If the system is incompatible, return specific error messages
4. **Auto-install dependencies:** If `install_missing: true`, attempt automatic setup
5. **Generate an environment report:** Log the analysis to `runs/{model}/env_report.json`

**Example env_report.json:**

```json
{
  "gpu_found": true,
  "gpu_name": "NVIDIA RTX 3090",
  "ram": "32GB",
  "ram_available": "28GB",
  "platform": "linux",
  "status": "compatible",
  "auto_install": true,
  "warnings": []
}
```

---

## DATASET Block

```ebnf
<dataset_block> ::= "DATASET" "{"
                      [<train_path> | <mix_datasets>]
                      [<validation_path>] [<test_path>]
                      [<format>] [<type>] [<language>] [<augmentation>]
                      [<dataset_percent>] [<sampling>] [<shuffle>]
                      [<input_field>] [<output_field>] [<context_fields>]
                    "}"

<train_path>      ::= "train" ":" <string>
<validation_path> ::= "validation" ":" <string>
<test_path>       ::= "test" ":" <string>
<format>          ::= "format" ":" ("jsonl" | "csv" | "txt" | "parquet" | "image+caption" | "qa" | "instruction" | "multimodal")
<type>            ::= "type" ":" ("classification" | "generation" | "qa" | "chat" | "vision" | "regression")
<language>        ::= "language" ":" ("en" | "pt" | "es" | "fr" | "multilingual")
<augmentation>    ::= "augmentation" ":" "[" <string_list> "]"
<dataset_percent> ::= "dataset_percent" ":" <number>
<mix_datasets>    ::= "mix_datasets" ":" "[" <mix_entry_list> "]"
<mix_entry_list>  ::= <mix_entry> { "," <mix_entry> }
<mix_entry>       ::= "{" "path" ":" <string> "," "weight" ":" <number> "}"
<sampling>        ::= "sampling" ":" ("weighted" | "random")
<shuffle>         ::= "shuffle" ":" ("true" | "false")
<input_field>     ::= "input_field" ":" <string>
<output_field>    ::= ("output_field" | "target_field") ":" <string>
<context_fields>  ::= "context_fields" ":" "[" <string_list> "]"
```

**Allowed augmentation values:**
- `"flip"` - Horizontal/vertical flip
- `"rotate"` - Random rotation
- `"brightness"` - Brightness adjustment
- `"contrast"` - Contrast adjustment
- `"noise"` - Add noise
- `"crop"` - Random cropping
- `"translate"` - Translation

**Validation Rules:**
- `train` path must exist and be readable
- File format must match the declared `format`
- For `image+caption`, the path must be a directory
- For JSONL/CSV, the path must be a file

**Example (v1.0):**

```okt
DATASET {
  train: "dataset/train.jsonl"
  validation: "dataset/val.jsonl"
  test: "dataset/test.jsonl"
  format: "jsonl"
  type: "chat"
  language: "en"
  augmentation: ["flip", "rotate", "brightness"]
}
```

**Example (v1.1 - Dataset Mixing):**

```okt
DATASET {
  mix_datasets: [
    { path: "dataset/base.jsonl", weight: 70 },
    { path: "dataset/extra.jsonl", weight: 30 }
  ]
  dataset_percent: 50
  sampling: "weighted"
  shuffle: true
  format: "jsonl"
  type: "chat"
}
```

**Example (v1.2 - Custom Field Names):**
```okt
DATASET {
  train: "dataset/train.jsonl"
  validation: "dataset/val.jsonl"
  format: "jsonl"
  type: "chat"
  input_field: "input"
  output_field: "target"
}
```

**Example (v1.2 - With Context Fields):**

```okt
DATASET {
  train: "dataset/pizzaria.jsonl"
  validation: "dataset/val.jsonl"
  format: "jsonl"
  type: "chat"
  input_field: "input"
  output_field: "target"
  context_fields: ["menu", "drinks", "promotions"]
}
```

**Dataset JSONL with context:**

```jsonl
{"input": "What pizzas do you have?", "target": "We have Margherita, Pepperoni, and Four Cheese.", "menu": "Margherita: $34, Pepperoni: $39, Four Cheese: $45", "drinks": "Coke, Sprite, Water"}
{"input": "Do you have drinks?", "target": "Yes, we have Coke, Sprite, and Water.", "menu": "Margherita: $34, Pepperoni: $39", "drinks": "Coke, Sprite, Water"}
```

The context fields will be automatically included in the prompt:

- Input: `menu: Margherita: $34, Pepperoni: $39 | drinks: Coke, Sprite, Water | What pizzas do you have?`
- Target: `We have Margherita, Pepperoni, and Four Cheese.`

**Field Name Resolution (v1.2+):**
- If `input_field` and `output_field` are specified, use those exact field names
- If not specified, defaults are tried in order:
  1. `"input"` + `"output"` (standard format)
  2. `"input"` + `"target"` (common alternative)
  3. `"text"` (single field, used for both input and output)
  4. First string field in the dataset (fallback)
- `context_fields` are optional and will be included in the prompt if present
- This ensures backward compatibility while allowing full customization

**Dataset Mixing Rules:**
- If `mix_datasets` is specified, it overrides `train`
- Total weights in `mix_datasets` must equal 100
- `dataset_percent` limits total dataset usage (1-100)
- `sampling: "weighted"` uses weights for sampling; `"random"` ignores weights
- `shuffle` controls whether datasets are shuffled before mixing

---

## MODEL Block

```ebnf
<model_block> ::= "MODEL" "{"
                    [<model_name>] [<model_base>] [<architecture>] [<parameters>]
                    [<context_window>] [<model_precision>] [<inherit>] [<model_device>]
                    [<adapter_block>]
                  "}"

<model_name>      ::= "name" ":" <string>
<model_base>      ::= "base" ":" <string>
<architecture>    ::= "architecture" ":" ("transformer" | "cnn" | "rnn" | "diffusion" | "vision-transformer" | "bert" | "gpt" | "t5")
<parameters>      ::= "parameters" ":" <number> ("M" | "B" | "K")
<context_window>  ::= "context_window" ":" <number>
<model_precision> ::= "precision" ":" ("fp32" | "fp16" | "int8" | "int4")
<inherit>         ::= "inherit" ":" <string>
<model_device>    ::= "device" ":" ("cuda" | "cpu" | "mps" | "auto")

<adapter_block> ::= "ADAPTER" "{" <adapter_type> <adapter_path> [<adapter_rank>] [<adapter_alpha>] "}"
<adapter_type>  ::= "type" ":" ("lora" | "qlora" | "adapter" | "peft")
<adapter_path>  ::= "path" ":" <string>
<adapter_rank>  ::= "rank" ":" <number>
<adapter_alpha> ::= "alpha" ":" <number>
```

**Model Inheritance:**
- `inherit` allows reusing configuration from another model
- The inherited model must be defined in the same project or imported
- A child model can override any parent field
- Example: `inherit: "base-transformer"` loads the base config, then applies the current block

**Allowed base model formats:**
- HuggingFace format: `"username/model-name"`
- OktoSeek format: `"oktoseek/model-name"`
- Local path: `"./models/my-model"`
- URL: `"https://example.com/model"`

**Parameter constraints:**
- `parameters`: Must be a positive number with suffix (K, M, B)
- `context_window`: Must be a power of 2 (128, 256, 512, 1024, 2048, 4096, 8192)
- `precision`: Must match device capabilities

**Example:**

```okt
MODEL {
  name: "oktogpt"
  base: "oktoseek/pizza-small"
  architecture: "transformer"
  parameters: 120M
  context_window: 2048
  precision: "fp16"
  device: "cuda"
}
```

**Example with ADAPTER (LoRA/PEFT support):**

```okt
MODEL {
  name: "oktogpt"
  base: "google/flan-t5-base"
  device: "cuda"
  ADAPTER {
    type: "lora"
    path: "D:/model_trainee/phase1_sharegpt/ep2"
    rank: 16
    alpha: 32
  }
}
```

**Example with inheritance:**

```okt
# Base model definition
MODEL "base-transformer" {
  architecture: "transformer"
  context_window: 2048
  precision: "fp16"
}

# Child model inheriting from base
MODEL {
  inherit: "base-transformer"
  base: "oktoseek/custom-model"
  parameters: 250M
}
```

**ADAPTER Block:**

The `ADAPTER` sub-block enables parameter-efficient fine-tuning methods such as LoRA, QLoRA, PEFT, or other adapters. If an `ADAPTER` is defined, the engine applies it after the base model is loaded.

**Adapter constraints:**
- `type`: Must be one of `"lora"`, `"qlora"`, `"adapter"`, or `"peft"`
- `path`: Must point to a valid adapter directory or file
- `rank`: Optional, typically 4, 8, 16, 32, or 64 (for LoRA)
- `alpha`: Optional, typically 16, 32, or 64 (for LoRA scaling)

---

## FT_LORA Block (v1.1+)

Fine-tuning using LoRA (Low-Rank Adaptation) adapters. This block is an alternative to `TRAIN` for efficient fine-tuning.
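For intuition on the `rank`/`alpha` pair (used by both `ADAPTER` and `FT_LORA`): in standard LoRA the weight update is scaled by `alpha / rank`, and the trainable parameter count grows linearly with rank. A small illustrative sketch, not part of OktoScript or its engine:

```python
# Illustrative only: a LoRA update to a weight matrix W (d_out x d_in) has the
# form W + (alpha / rank) * B @ A, where B is d_out x rank and A is rank x d_in.
def lora_stats(d_in: int, d_out: int, rank: int, alpha: int) -> dict:
    return {
        "scaling": alpha / rank,                    # effective update scale
        "trainable_params": rank * (d_in + d_out),  # entries of A and B
        "full_params": d_in * d_out,                # full fine-tuning equivalent
    }

stats = lora_stats(d_in=768, d_out=768, rank=4, alpha=16)
# rank 4 trains 4 * (768 + 768) = 6144 params vs 589824 for the full matrix
```

This is why small ranks (4-32) give the "smaller memory footprint" advantage noted below, while `alpha` only rescales the learned update.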
```ebnf
<ft_lora_block> ::= "FT_LORA" "{"
                      [<base_model>] [<train_dataset>]
                      [<lora_rank>] [<lora_alpha>]
                      [<dataset_percent>] [<mix_datasets>]
                      [<epochs>] [<batch_size>] [<learning_rate>]
                      [<device>] [<target_modules>]
                    "}"

<base_model>     ::= "base_model" ":" <string>
<train_dataset>  ::= "train_dataset" ":" <string>
<lora_rank>      ::= "lora_rank" ":" <number>
<lora_alpha>     ::= "lora_alpha" ":" <number>
<dataset_percent> ::= "dataset_percent" ":" <number>
<mix_datasets>   ::= "mix_datasets" ":" "[" <mix_entry_list> "]"
<epochs>         ::= "epochs" ":" <number>
<batch_size>     ::= "batch_size" ":" <number>
<learning_rate>  ::= "learning_rate" ":" <number>
<device>         ::= "device" ":" ("cpu" | "cuda" | "mps" | "auto")
<target_modules> ::= "target_modules" ":" "[" <string_list> "]"
```

**Constraints:**
- `lora_rank`: Must be > 0 and <= 256 (typical: 4, 8, 16, 32)
- `lora_alpha`: Must be > 0 (typical: 16, 32, 64)
- `dataset_percent`: Must be 1-100
- If `mix_datasets` is specified, it overrides `train_dataset`
- Total weights in `mix_datasets` must equal 100

**Example:**

```okt
FT_LORA {
  base_model: "oktoseek/base-mini"
  train_dataset: "dataset/main.jsonl"
  lora_rank: 4
  lora_alpha: 16
  dataset_percent: 50
  mix_datasets: [
    { path: "dataset/base.jsonl", weight: 70 },
    { path: "dataset/extra.jsonl", weight: 30 }
  ]
  epochs: 3
  batch_size: 16
  learning_rate: 0.00003
  device: "cuda"
  target_modules: ["q_proj", "v_proj"]
}
```

**When to use FT_LORA vs TRAIN:**
- **FT_LORA**: Efficient fine-tuning, smaller memory footprint, faster training, good for domain adaptation
- **TRAIN**: Full fine-tuning, maximum flexibility, better for large architectural changes

---

## TRAIN Block

```ebnf
<train_block> ::= "TRAIN" "{"
                    [<epochs>] [<batch_size>] [<learning_rate>] [<optimizer>]
                    [<scheduler>] [<device>] [<gradient_accumulation>]
                    [<early_stopping>] [<checkpoint_steps>] [<checkpoint_path>]
                    [<resume_from_checkpoint>] [<loss>] [<weight_decay>]
                    [<gradient_clip>] [<warmup_steps>] [<save_strategy>]
                    [<logging_steps>] [<save_steps>]
                  "}"

<epochs>        ::= "epochs" ":" <number>
<batch_size>    ::= "batch_size" ":" <number>
<learning_rate> ::= "learning_rate" ":" <number>
<optimizer>     ::= "optimizer" ":" ("adam" | "adamw" | "sgd" | "rmsprop" | "adafactor" | "lamb")
<scheduler>     ::= "scheduler" ":" ("linear" | "cosine" | "cosine_with_restarts" | "polynomial" | "constant" | "constant_with_warmup" | "step")
<device>        ::= "device" ":" ("cpu" | "cuda" | "mps" | "auto")
<gradient_accumulation>  ::= "gradient_accumulation" ":" <number>
<early_stopping>         ::= "early_stopping" ":" ("true" | "false")
<checkpoint_steps>       ::= "checkpoint_steps" ":" <number>
<checkpoint_path>        ::= "checkpoint_path" ":" <string>
<resume_from_checkpoint> ::= "resume_from_checkpoint" ":" <string>
<loss>          ::= "loss" ":" ("cross_entropy" | "mse" | "mae" | "bce" | "focal" | "huber" | "kl_divergence")
<weight_decay>  ::= "weight_decay" ":" <number>
<gradient_clip> ::= "gradient_clip" ":" <number>
<warmup_steps>  ::= "warmup_steps" ":" <number>
<save_strategy> ::= "save_strategy" ":" ("steps" | "epoch" | "no")
<logging_steps> ::= "logging_steps" ":" <number>
<save_steps>    ::= "save_steps" ":" <number>
```

**Allowed values and constraints:**

**Optimizers:**
- `adam` - Adam optimizer (default)
- `adamw` - Adam with weight decay
- `sgd` - Stochastic Gradient Descent
- `rmsprop` - RMSprop optimizer
- `adafactor` - Adafactor (memory efficient)
- `lamb` - LAMB optimizer (for large batches)

**Schedulers:**
- `linear` - Linear decay
- `cosine` - Cosine annealing
- `cosine_with_restarts` - Cosine with restarts
- `polynomial` - Polynomial decay
- `constant` - Constant learning rate
- `constant_with_warmup` - Constant with warmup
- `step` - Step decay

**Loss functions:**
- `cross_entropy` - Cross-entropy loss (classification)
- `mse` - Mean Squared Error (regression)
- `mae` - Mean Absolute Error (regression)
- `bce` - Binary Cross-Entropy
- `focal` - Focal loss (imbalanced data)
- `huber` - Huber loss (robust regression)
- `kl_divergence` - KL divergence

**Constraints:**
- `epochs`: Must be > 0 and <= 1000
- `batch_size`: Must be > 0 and <= 1024
- `learning_rate`: Must be > 0 and <= 1.0
- `gradient_accumulation`: Must be >= 1
- `checkpoint_steps`: Must be > 0
- `weight_decay`: Must be >= 0 and <= 1.0
- `gradient_clip`: Must be > 0

**Example:**

```okt
TRAIN {
  epochs: 10
  batch_size: 32
  learning_rate: 0.00025
  optimizer: "adamw"
  scheduler: "cosine"
  loss: "cross_entropy"
  device: "cuda"
  gradient_accumulation: 2
  early_stopping: true
  checkpoint_steps: 100
  checkpoint_path: "./checkpoints"
  weight_decay: 0.01
  gradient_clip: 1.0
  warmup_steps: 500
  save_strategy: "steps"
  logging_steps: 5    # Log every 5 steps (default: 10)
  save_steps: 500     # Save checkpoint every 500 steps (default: 500)
}
```

**Example with checkpoint resume:**

```okt
TRAIN {
  epochs: 20
  batch_size: 16
  learning_rate: 0.0001
  optimizer: "adamw"
  device: "cuda"
  resume_from_checkpoint: "./checkpoints/checkpoint-500"
  checkpoint_steps: 100
}
```

---

## METRICS Block

```ebnf
<metrics_block> ::= "METRICS" "{" { <metric_name> | <custom_metric> } "}"

<metric_name> ::= "accuracy" | "loss" | "perplexity" | "f1" | "f1_macro" | "f1_micro"
                | "f1_weighted" | "bleu" | "rouge" | "rouge_l" | "rouge_1" | "rouge_2"
                | "mae" | "mse" | "rmse" | "cosine_similarity" | "token_efficiency"
                | "response_coherence" | "hallucination_score" | "precision" | "recall"
                | "confusion_matrix"

<custom_metric> ::= "custom" <string>
```

**Metric-specific constraints:**
- `accuracy`: Only for classification tasks
- `perplexity`: Only for language models
- `bleu`, `rouge`: Only for generation/translation tasks
- `mae`, `mse`, `rmse`: Only for regression tasks
- `confusion_matrix`: Only for classification; generates the full matrix

**Example:**

```okt
METRICS {
  accuracy
  loss
  perplexity
  f1
  f1_macro
  rouge_l
  cosine_similarity
  custom "toxicity_score"
  custom "context_alignment"
}
```

---

## VALIDATION Block

```ebnf
<validation_block> ::= "VALIDATE" "{"
                         [ "on_train" ":" ("true" | "false") ]
                         [ "on_validation" ":" ("true" | "false") ]
                         [ "frequency" ":" <number> ]
                         [ "save_best_model" ":" ("true" | "false") ]
                         [ "metric_to_monitor" ":" <string> ]
                       "}"
```

**Constraints:**
- `frequency`: Must be > 0 (validation every N steps)
- `metric_to_monitor`: Must be a metric defined in the METRICS block
- `save_best_model`: If true, saves the model when the monitored metric improves

**Example:**

```okt
VALIDATE {
  on_train: false
  on_validation: true
  frequency: 1
  save_best_model: true
  metric_to_monitor: "loss"
}
```

---

## INFERENCE Block

The `INFERENCE` block defines how the model behaves during inference, prediction, or interactive chat.
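The `save_best_model` / `metric_to_monitor` pair implies a simple improvement check: save only when the monitored value improves, where "improves" means *decreases* for loss-style metrics and *increases* for accuracy-style ones. A hedged sketch (hypothetical class, not an engine API):

```python
# Hypothetical sketch of the save_best_model behavior from the VALIDATE block.
LOWER_IS_BETTER = {"loss", "val_loss", "perplexity", "mae", "mse", "rmse"}

class BestModelTracker:
    def __init__(self, metric_to_monitor: str):
        self.metric = metric_to_monitor
        self.best = None  # best value seen so far

    def should_save(self, value: float) -> bool:
        """True when `value` improves on the best value seen so far."""
        improved = (
            self.best is None
            or (value < self.best if self.metric in LOWER_IS_BETTER else value > self.best)
        )
        if improved:
            self.best = value
        return improved
```

With `metric_to_monitor: "loss"` and per-epoch losses 2.0, 1.5, 1.8, only the first two epochs would trigger a save.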
```ebnf
<inference_block> ::= "INFERENCE" "{"
                        [<mode>] [<format>] [<exit_command>] [<params_block>]
                        [<inference_control>]
                      "}"

<mode>         ::= "mode" ":" ("chat" | "intent" | "translate" | "classify" | "custom")
<format>       ::= "format" ":" <string>
<exit_command> ::= "exit_command" ":" <string>

<params_block> ::= "params" "{"
                     [<max_length>] [<temperature>] [<top_p>] [<beams>]
                     [<do_sample>] [<top_k>] [<repetition_penalty>]
                   "}"

<max_length>         ::= "max_length" ":" <number>
<temperature>        ::= "temperature" ":" <number>
<top_p>              ::= "top_p" ":" <number>
<beams>              ::= "beams" ":" <number>
<do_sample>          ::= "do_sample" ":" <boolean>
<top_k>              ::= "top_k" ":" <number>
<repetition_penalty> ::= "repetition_penalty" ":" <number>

<inference_control> ::= "CONTROL" "{" { <control_statement> } "}"
```

**Supported format patterns:**

The `format` field supports template strings with variables:

| Use case | Example |
|----------|---------|
| Chat | `"User: {input}\nAssistant:"` |
| Free | `"{input}"` |
| Translation | `"translate English to Portuguese: {input}"` |
| Intent | `"{input}"` |
| QA/RAG | `"Context: {context}\nQuestion: {input}\nAnswer:"` |
| LLaMA style | `"<\|user\|>\n{input}\n<\|assistant\|>\n"` |

**Supported variables:**
- `{input}` → user input
- `{context}` → optional context (for RAG/QA)
- `{labels}` → class list for classification

**Constraints:**
- `mode`: Defines the inference behavior type
- `format`: Template string with variable placeholders
- `max_length`: Must be > 0 and <= 8192
- `temperature`: Must be >= 0.0 and <= 2.0
- `top_p`: Must be > 0.0 and <= 1.0
- `top_k`: Must be >= 0 (0 = disabled)
- `beams`: Must be >= 1 (for beam search)
- `do_sample`: Boolean (true/false)
- `repetition_penalty`: Must be > 0.0 and <= 2.0

**Example (Chat mode):**

```okt
INFERENCE {
  mode: "chat"
  format: "User: {input}\nAssistant:"
  exit_command: "/exit"
  params {
    max_length: 120
    temperature: 0.7
    top_p: 0.9
    beams: 2
    do_sample: true
  }
  CONTROL {
    IF confidence < 0.3 { RETRY }
    IF repetition > 3 { REGENERATE }
    IF hallucination_score > 0.5 {
      REPLACE WITH "Sorry, I'm not sure."
    }
  }
}
```

**Example (Translation mode):**

```okt
INFERENCE {
  mode: "translate"
  format: "translate English to Portuguese: {input}"
  params {
    max_length: 200
    temperature: 0.5
    top_p: 0.95
  }
}
```

**Example (Classification mode):**

```okt
INFERENCE {
  mode: "classify"
  format: "{input}"
  params {
    temperature: 0.1
    top_k: 5
  }
}
```

---

## CONTROL Block — Decision Engine

The `CONTROL` block enables logical, conditional, event-based, and metric-based decisions during training and inference. It introduces a cognitive-level abstraction that allows AI models to make decisions, self-adjust, and self-regulate in a declarative and clean way.

```ebnf
<control_block> ::= "CONTROL" "{" { <control_statement> } "}"

<control_statement> ::= <event_hook> | <validate_every> | <if_statement>
                      | <when_statement> | <every_statement> | <action>

<event_hook> ::= <on_step_end> | <on_epoch_end> | <on_memory_low> | <on_nan> | <on_plateau>

<on_step_end>   ::= "on_step_end" "{" { <control_statement> } "}"
<on_epoch_end>  ::= "on_epoch_end" "{" { <control_statement> } "}"
<on_memory_low> ::= "on_memory_low" "{" { <control_statement> } "}"
<on_nan>        ::= "on_nan" "{" { <control_statement> } "}"
<on_plateau>    ::= "on_plateau" "{" { <control_statement> } "}"

<validate_every> ::= "validate_every" ":" <number>

<if_statement>    ::= "IF" <condition> "{" { <control_statement> } "}"
<when_statement>  ::= "WHEN" <condition> "{" { <control_statement> } "}"
<every_statement> ::= "EVERY" <number> ("steps" | "epochs") "{" { <control_statement> } "}"

<action> ::= <set_action> | <stop_action> | <log_action> | <save_action>
           | <retry_action> | <regenerate_action> | <stop_training_action>
           | <decrease_action> | <increase_action>

<set_action>           ::= "SET" <parameter> "=" <value>
<stop_action>          ::= "STOP"
<log_action>           ::= "LOG" (<metric> | <string>)
<save_action>          ::= "SAVE" ("model" | "checkpoint" | <string>)
<retry_action>         ::= "RETRY"
<regenerate_action>    ::= "REGENERATE"
<stop_training_action> ::= "STOP_TRAINING"
<decrease_action>      ::= "DECREASE" <parameter> "BY" <number>
<increase_action>      ::= "INCREASE" <parameter> "BY" <number>

<condition>  ::= <metric> <comparator> <value>
<comparator> ::= ">" | "<" | ">=" | "<=" | "==" | "!="
<value>      ::= <number> | <string> | <boolean> | <memory_value> | <percentage>
<metric>     ::= "loss" | "val_loss" | "accuracy" | "val_accuracy" | "gpu_memory"
               | "ram_usage" | "confidence" | "hallucination_score" | <string>
<parameter>  ::= "LR" | "learning_rate" | "batch_size" | "temperature" | <string>
```

**Supported events/hooks:**

| Event | Description |
|-------|-------------|
| `on_step_end` | Executed at the end of each training step |
| `on_epoch_end` | Executed at the end of each epoch |
| `validate_every` | Execute validation every X steps |
| `on_memory_low` | Triggered when GPU/RAM is low |
| `on_nan` | Triggered when NaN values are detected |
| `on_plateau` | Triggered when loss is stagnant (plateau) |

**Supported directives:**
- `IF` - Conditional logic based on metrics
- `WHEN` - Event-based conditional logic
- `EVERY` - Periodic actions (every N steps)
- `SET` - Set parameter values dynamically
- `STOP` - Stop current operation
- `LOG` - Log metrics or messages
- `SAVE` - Save model or checkpoint
- `RETRY` - Retry inference generation
- `REGENERATE` - Regenerate output
- `STOP_TRAINING` - Stop training process
- `DECREASE` - Decrease parameter value
- `INCREASE` - Increase parameter value

**Nested Blocks Support:**

The CONTROL block in OktoScript supports nested logic, event-driven triggers, and conditional reasoning. You can nest IF / WHEN / EVERY statements inside lifecycle hooks like `on_step_end` and `on_epoch_end`, allowing dynamic, real-time decision making during training or inference.

**Example (Basic):**

```okt
CONTROL {
  on_step_end {
    LOG loss
  }
  on_epoch_end {
    SAVE model
    LOG "Epoch completed"
  }
  validate_every: 200
  IF loss > 2.0 {
    SET LR = 0.0001
    LOG "High loss detected"
  }
  IF val_loss > 2.5 {
    STOP_TRAINING
  }
  WHEN gpu_memory < 16GB {
    SET batch_size = 4
  }
  EVERY 500 steps {
    SAVE checkpoint
  }
  IF accuracy < 0.4 {
    DECREASE LR BY 0.5
  }
}
```

**Example (Nested Blocks in Events):**

```okt
CONTROL {
  on_epoch_end {
    IF loss > 2.0 {
      SET LR = 0.0001
      LOG "High loss detected"
    }
    IF val_loss > 2.5 {
      STOP_TRAINING
    }
    IF accuracy > 0.9 {
      SAVE "best_model"
      LOG "High accuracy reached"
    }
  }
}
```

**Example (Advanced Nested Logic):**

```okt
CONTROL {
  on_epoch_end {
    EVERY 2 epochs {
      SAVE "checkpoint_epoch_{epoch}"
    }
    IF loss > 2.0 {
      SET LR = 0.00005
      LOG "Loss still high after epoch"
      WHEN gpu_usage > 90% {
        SET batch_size = 2
        LOG "Reducing batch size due to GPU pressure"
      }
      IF val_loss > 3.0 {
        STOP_TRAINING
      }
    }
  }
}
```

**Example (Context-Based Control):**

```okt
CONTROL {
  IF epoch == 1 {
    LOG "Warmup stage"
  }
  IF epoch > 5 AND accuracy < 0.6 {
    SET LR = 0.00001
    LOG "Model is stagnated"
  }
  IF epoch > 10 AND loss > 1.8 {
    STOP_TRAINING
  }
}
```

**Example (Inference CONTROL):**

```okt
INFERENCE {
  mode: "chat"
  format: "User: {input}\nAssistant:"
  CONTROL {
    IF confidence < 0.3 { RETRY }
    IF repetition > 3 { REGENERATE }
    IF toxic == true {
      REPLACE WITH "Not allowed"
    }
  }
}
```

**Example (Intent Classification CONTROL):**

```okt
INFERENCE {
  mode: "intent-classification"
  labels: ["greeting", "order", "complaint", "bye"]
}

CONTROL {
  IF label == "complaint" {
    RETURN "I'm sorry to hear that. How can I help?"
  }
  IF confidence < 0.4 {
    RETURN "Could you please repeat?"
  }
}
```

**Note:** OktoScript enables true declarative AI governance. CONTROL blocks can contain nested conditions and nested event triggers, making it a unique declarative decision-making language in the market.

**Philosophy:**

OktoScript keeps the surface clean and simple, while the engine behind it performs complex cognitive decision-making.

- **CONTROL** defines logic
- **MONITOR** defines awareness
- **GUARD** defines safety
- **BEHAVIOR** defines personality
- **EXPLORER** defines optimization
- **STABILITY** defines reliability

Simple to read. Powerful to execute.

---

## EXPORT Block

```ebnf
<export_block> ::= "EXPORT" "{"
                     "format" ":" "[" <export_format_list> "]"
                     "path" ":" <string>
                     [ "quantization" ":" ("int8" | "int4" | "fp16" | "fp32") ]
                     [ "optimize_for" ":" ("speed" | "size" | "accuracy") ]
                   "}"

<export_format_list> ::= <export_format> { "," <export_format> }
<export_format>      ::= "gguf" | "onnx" | "okm" | "safetensors" | "tflite"
```

**Format-specific constraints:**
- `gguf`: Requires quantization (int8, int4, or fp16)
- `onnx`: Best for production deployment
- `okm`: OktoSeek optimized format (requires the OktoSeek SDK)
- `safetensors`: Standard PyTorch format
- `tflite`: For mobile deployment (Android/iOS)

**Example:**

```okt
EXPORT {
  format: ["gguf", "onnx", "okm", "safetensors"]
  path: "export/"
  quantization: "int8"
  optimize_for: "speed"
}
```

---

## DEPLOY Block

The `DEPLOY` block defines deployment configuration for the model. The engine will create the server, generate routes, export in the required format, and configure limits and authentication.
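The quantization choice in EXPORT largely determines artifact size: each parameter costs 4 bytes at fp32, 2 at fp16, 1 at int8, and half a byte at int4. As a rough back-of-the-envelope helper (illustrative only, not an engine API, and ignoring format overhead):

```python
# Illustrative arithmetic: approximate on-disk size of an exported model
# from its parameter count and quantization level.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def approx_export_size_mb(parameters: int, quantization: str) -> float:
    """Rough export size in MB (decimal), ignoring metadata and overhead."""
    return parameters * BYTES_PER_PARAM[quantization] / 1_000_000

# A 120M-parameter model: ~480 MB at fp32, ~120 MB at int8, ~60 MB at int4.
```

This is why `edge` targets require an int8 or int4 model, as noted in the DEPLOY section.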
```ebnf
<deploy_block> ::= "DEPLOY" "{"
                     "target" ":" ("local" | "cloud" | "edge" | "api" | "android" | "ios" | "web" | "desktop")
                     [ "endpoint" ":" <string> ]
                     [ "host" ":" <string> ]
                     [ "requires_auth" ":" ("true" | "false") ]
                     [ "port" ":" <number> ]
                     [ "max_concurrent_requests" ":" <number> ]
                     [ "protocol" ":" ("http" | "https" | "grpc" | "ws") ]
                     [ "format" ":" ("onnx" | "tflite" | "gguf" | "pt" | "okm") ]
                   "}"
```

**Target-specific requirements:**
- `api`: Requires `endpoint`, `host`, and `port`
- `android`, `ios`: Requires `.okm` or `.tflite` format
- `web`: Requires ONNX format
- `edge`: Requires a quantized model (int8 or int4)

**Protocol options:**
- `http` - HTTP REST API
- `https` - HTTPS REST API
- `grpc` - gRPC protocol
- `ws` - WebSocket protocol

**Format options:**
- `onnx` - ONNX format (production-ready)
- `tflite` - TensorFlow Lite (mobile)
- `gguf` - GGUF format (local inference)
- `pt` - PyTorch format
- `okm` - OktoModel format (OktoSeek optimized)

**Example (API Deployment):**

```okt
DEPLOY {
  target: "api"
  host: "0.0.0.0"
  endpoint: "/pizzabot"
  requires_auth: true
  port: 9000
  max_concurrent_requests: 100
  protocol: "http"
  format: "onnx"
}
```

**Example (Mobile Deployment):**

```okt
DEPLOY {
  target: "android"
  format: "tflite"
}
```

---

## SECURITY Block

The `SECURITY` block defines security measures for input validation, output validation, rate limiting, and encryption.
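The target-specific deployment requirements above can be checked mechanically. A minimal sketch, with a hypothetical helper and rule table (the `edge` quantization rule is omitted for brevity):

```python
# Hypothetical sketch of the DEPLOY target-specific requirement checks.
TARGET_RULES = {
    "api": {"required_fields": {"endpoint", "host", "port"}},
    "android": {"formats": {"okm", "tflite"}},
    "ios": {"formats": {"okm", "tflite"}},
    "web": {"formats": {"onnx"}},
}

def check_deploy(config: dict) -> list[str]:
    """Return rule violations for a parsed DEPLOY block (empty = valid)."""
    rules = TARGET_RULES.get(config.get("target"), {})
    missing = sorted(rules.get("required_fields", set()) - config.keys())
    errors = [f"missing field: {f}" for f in missing]
    allowed = rules.get("formats")
    if allowed and config.get("format") not in allowed:
        errors.append(f"target {config['target']!r} requires format in {sorted(allowed)}")
    return errors
```

For instance, an `api` deployment that omits `host` and `port` would report both fields as missing, while a `web` deployment with `format: "gguf"` would be rejected.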
```ebnf
<security_block> ::= "SECURITY" "{"
                       [ <input_validation> ]
                       [ <output_validation> ]
                       [ <rate_limit> ]
                       [ <encryption> ]
                     "}"

<input_validation> ::= "input_validation" "{"
                         [ "max_length" ":" <number> ]
                         [ "disallow_patterns" ":" "[" <string_list> "]" ]
                       "}"

<output_validation> ::= "output_validation" "{"
                          [ "prevent_data_leak" ":" ("true" | "false") ]
                          [ "mask_personal_info" ":" ("true" | "false") ]
                        "}"

<rate_limit> ::= "rate_limit" "{"
                   [ "max_requests_per_minute" ":" <number> ]
                 "}"

<encryption> ::= "encryption" "{"
                   [ "algorithm" ":" ("AES-256" | "SHA-256" | "RSA") ]
                 "}"
```

**Input validation:**
- `max_length` - Maximum input length in characters
- `disallow_patterns` - List of patterns to block (e.g., SQL injection, XSS)

**Output validation:**
- `prevent_data_leak` - Prevent training data from appearing in outputs
- `mask_personal_info` - Mask personal information in outputs

**Rate limiting:**
- `max_requests_per_minute` - Maximum requests per minute per client

**Encryption:**
- `AES-256` - AES-256 encryption
- `SHA-256` - SHA-256 hashing
- `RSA` - RSA encryption

**Example:**

```okt
SECURITY {
  input_validation {
    max_length: 500
    disallow_patterns: [ "