OktoScript Validation Rules
Complete reference for validation rules and constraints in OktoScript.
File Structure Validation
Required Files
okt.yaml (in project root)
- Must exist
- Must be valid YAML
- Must contain `project` field
Dataset Files
- All paths specified in DATASET block must exist
- Files must be readable
- Format must match declared format
Model Files (if using local paths)
- Base model path must exist (if local)
- Checkpoint paths must exist (if resuming)
Field Validation
PROJECT Block
| Field | Type | Required | Constraints |
|---|---|---|---|
| PROJECT | string | Yes | 1-100 chars, no special chars: {}[]:" |
ENV Block
| Field | Type | Required | Constraints |
|---|---|---|---|
| accelerator | enum | No | Must be: auto, cpu, gpu, tpu |
| min_memory | string | No | Must be: "4GB", "8GB", "16GB", "32GB", "64GB" (quoted, GB suffix required) |
| precision | enum | No | Must be: auto, fp16, fp32, bf16 |
| backend | enum | No | Must be: auto, oktoseek |
| install_missing | boolean | No | Must be: true or false (lowercase) |
| platform | enum | No | Must be: windows, linux, mac, any |
| network | enum | No | Must be: online, offline, required |
ENV Validation Rules:
Memory format validation:
- Must use `GB` suffix (e.g., `"8GB"`, not `"8"` or `"8 GB"`)
- Only values: `"4GB"`, `"8GB"`, `"16GB"`, `"32GB"`, `"64GB"` are allowed
- Must be quoted string
Accelerator and memory compatibility:
- If `accelerator = "gpu"` and `min_memory < "8GB"` → warning (GPU training typically requires at least 8GB RAM)
- If `accelerator = "tpu"` → `min_memory` should be at least `"16GB"` (recommended)
Network and export compatibility:
- If `network = "offline"` → export formats like `onnx` or `gguf` are allowed (pre-downloaded models)
- If `network = "required"` → engine must verify internet connectivity before proceeding
Backend preferences:
- If `backend = "oktoseek"` → preferred default for OktoSeek ecosystem
- If `backend = "auto"` → engine selects best available backend
Auto-installation:
- If `install_missing = true` → engine must attempt auto-setup of missing dependencies
- If `install_missing = false` → engine must fail with clear error if dependencies are missing
Default values:
- If ENV block is missing, defaults to:
ENV { accelerator: "auto" min_memory: "8GB" backend: "auto" }
Platform validation:
- If `platform = "windows"` → engine must verify Windows OS
- If `platform = "linux"` → engine must verify Linux OS
- If `platform = "mac"` → engine must verify macOS
- If `platform = "any"` → no platform check required
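The ENV rules above can be sketched as a small checker. This is only an illustration, not the engine's implementation: the function name `validate_env`, the dict-based input, and the message strings are assumptions. The allowed values, the defaults, and the GPU/TPU memory thresholds are taken from the table and rules above.

```python
# Hypothetical sketch of the ENV checks; names and messages are illustrative.
ENV_ENUMS = {
    "accelerator": {"auto", "cpu", "gpu", "tpu"},
    "precision": {"auto", "fp16", "fp32", "bf16"},
    "backend": {"auto", "oktoseek"},
    "platform": {"windows", "linux", "mac", "any"},
    "network": {"online", "offline", "required"},
}
ALLOWED_MEMORY = ("4GB", "8GB", "16GB", "32GB", "64GB")
ENV_DEFAULTS = {"accelerator": "auto", "min_memory": "8GB", "backend": "auto"}


def validate_env(env=None):
    """Return (errors, warnings) for an ENV block given as a dict."""
    # A missing ENV block falls back to the documented defaults.
    env = dict(ENV_DEFAULTS) if env is None else env
    errors, warnings = [], []
    for field, allowed in ENV_ENUMS.items():
        if field in env and env[field] not in allowed:
            errors.append(f"{field}: must be one of {sorted(allowed)}")
    memory = env.get("min_memory")
    if memory is not None and memory not in ALLOWED_MEMORY:
        errors.append(f"min_memory: must be one of {list(ALLOWED_MEMORY)}")
    if "install_missing" in env and not isinstance(env["install_missing"], bool):
        errors.append("install_missing: must be true or false")
    if memory in ALLOWED_MEMORY:
        gigabytes = int(memory[:-2])  # strip the "GB" suffix
        if env.get("accelerator") == "gpu" and gigabytes < 8:
            warnings.append("gpu with min_memory below 8GB")
        if env.get("accelerator") == "tpu" and gigabytes < 16:
            warnings.append("tpu with min_memory below 16GB")
    return errors, warnings
```

Compatibility findings are returned as warnings rather than errors, since the rules describe them as a warning and a recommendation.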
DATASET Block
| Field | Type | Required | Constraints |
|---|---|---|---|
| train | path | Yes | File/dir must exist, readable |
| validation | path | No | File/dir must exist if specified |
| test | path | No | File/dir must exist if specified |
| format | enum | No | Must be: jsonl, csv, txt, parquet, image+caption, qa, instruction, multimodal |
| type | enum | No | Must be: classification, generation, qa, chat, vision, regression |
| language | enum | No | Must be: en, pt, es, fr, multilingual |
| augmentation | array | No | Each item must be valid augmentation type |
| dataset_percent | number | No | Must be 1-100 (v1.1+) |
| mix_datasets | array | No | Array of {path, weight} objects (v1.1+) |
| sampling | enum | No | Must be: weighted, random (v1.1+) |
| shuffle | boolean | No | true or false (v1.1+) |
MODEL Block
| Field | Type | Required | Constraints |
|---|---|---|---|
| base | string | Yes | Valid model identifier or path |
| architecture | enum | No | Must be: transformer, cnn, rnn, diffusion, vision-transformer, bert, gpt, t5 |
| parameters | string | No | Format: number + (K\|M\|B), e.g., "120M" |
| context_window | number | No | Must be power of 2: 128, 256, 512, 1024, 2048, 4096, 8192 |
| precision | enum | No | Must be: fp32, fp16, int8, int4 |
| inherit | string | No | Must reference existing model name |
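The `parameters` and `context_window` constraints can be checked with a pattern and a lookup set. This is a minimal sketch with hypothetical helper names. Whether decimal counts such as "1.5B" are accepted is an assumption, since the table only says "number".

```python
import re

# Number followed by K, M, or B. Decimals are allowed here as an assumption.
PARAMETERS_PATTERN = re.compile(r"\d+(\.\d+)?[KMB]")
# The powers of two listed in the MODEL table.
CONTEXT_WINDOWS = {128, 256, 512, 1024, 2048, 4096, 8192}


def is_valid_parameters(value):
    """True for strings such as "120M" or "7B"."""
    return isinstance(value, str) and PARAMETERS_PATTERN.fullmatch(value) is not None


def is_valid_context_window(value):
    """True only for the context sizes listed in the table."""
    return isinstance(value, int) and not isinstance(value, bool) and value in CONTEXT_WINDOWS
```

A lookup set is used instead of a generic power-of-two test because the table lists a closed range of values (128 through 8192).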
TRAIN Block
| Field | Type | Required | Constraints |
|---|---|---|---|
| epochs | number | Yes | > 0 and <= 1000 |
| batch_size | number | Yes | > 0 and <= 1024 |
| learning_rate | decimal | No | > 0 and <= 1.0 |
| optimizer | enum | No | Must be: adam, adamw, sgd, rmsprop, adafactor, lamb |
| scheduler | enum | No | Must be: linear, cosine, cosine_with_restarts, polynomial, constant, constant_with_warmup, step |
| device | enum | Yes | Must be: cpu, cuda, mps, auto |
| gradient_accumulation | number | No | >= 1 |
| early_stopping | boolean | No | true or false |
| checkpoint_steps | number | No | > 0 |
| checkpoint_path | path | No | Directory must exist if specified |
| resume_from_checkpoint | path | No | Checkpoint must exist if specified |
| loss | enum | No | Must be: cross_entropy, mse, mae, bce, focal, huber, kl_divergence |
| weight_decay | decimal | No | >= 0 and <= 1.0 |
| gradient_clip | decimal | No | > 0 |
| warmup_steps | number | No | >= 0 |
| save_strategy | enum | No | Must be: steps, epoch, no |
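The numeric TRAIN constraints lend themselves to a table-driven check. The sketch below covers only the numeric fields (enum and path fields such as `device` are left out). The function name and error strings are hypothetical. The bounds are copied from the table above.

```python
# field -> (lower bound, lower bound inclusive?, upper bound or None, required?)
TRAIN_RANGES = {
    "epochs": (0, False, 1000, True),
    "batch_size": (0, False, 1024, True),
    "learning_rate": (0, False, 1.0, False),
    "gradient_accumulation": (1, True, None, False),
    "checkpoint_steps": (0, False, None, False),
    "weight_decay": (0, True, 1.0, False),
    "gradient_clip": (0, False, None, False),
    "warmup_steps": (0, True, None, False),
}


def validate_train_numbers(train):
    """Return error strings for the numeric fields of a TRAIN block (a dict)."""
    errors = []
    for field, (low, inclusive, high, required) in TRAIN_RANGES.items():
        if field not in train:
            if required:
                errors.append(f"{field}: required")
            continue
        value = train[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{field}: must be a number")
            continue
        above_low = value >= low if inclusive else value > low
        below_high = high is None or value <= high
        if not (above_low and below_high):
            errors.append(f"{field}: out of range")
    return errors
```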
METRICS Block
| Field | Type | Required | Constraints |
|---|---|---|---|
| Built-in metrics | identifier | No | Must be valid metric name |
| custom | string | No | Custom metric identifier |
Metric-task compatibility:
- `accuracy`, `precision`, `recall`, `f1`, `confusion_matrix`: Only for classification
- `perplexity`: Only for language models
- `bleu`, `rouge`: Only for generation/translation
- `mae`, `mse`, `rmse`: Only for regression
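One way to picture the metric-task check is a lookup from metric name to the tasks it is restricted to. The task labels used here (`language_model`, `translation`) are placeholders chosen for this sketch, since the rules above do not say how task groups map onto DATASET `type` values. Custom metrics are assumed to be unrestricted.

```python
# Metric -> tasks it may be used with (labels are illustrative).
METRIC_TASKS = {
    "accuracy": {"classification"},
    "precision": {"classification"},
    "recall": {"classification"},
    "f1": {"classification"},
    "confusion_matrix": {"classification"},
    "perplexity": {"language_model"},
    "bleu": {"generation", "translation"},
    "rouge": {"generation", "translation"},
    "mae": {"regression"},
    "mse": {"regression"},
    "rmse": {"regression"},
}


def incompatible_metrics(metrics, task):
    """Return the requested metrics that are restricted to other tasks."""
    return [m for m in metrics if m in METRIC_TASKS and task not in METRIC_TASKS[m]]
```

A non-empty result would correspond to error V007 (invalid metric for task).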
EXPORT Block
| Field | Type | Required | Constraints |
|---|---|---|---|
| format | array | Yes | Each item must be: gguf, onnx, okm, safetensors, tflite |
| path | path | Yes | Directory must exist or be creatable |
| quantization | enum | No | Must be: int8, int4, fp16, fp32 |
| optimize_for | enum | No | Must be: speed, size, accuracy |
Format-specific requirements:
- `gguf`: Requires quantization
- `tflite`: Only for mobile-compatible architectures
FT_LORA Block (v1.1+)
| Field | Type | Required | Constraints |
|---|---|---|---|
| base_model | string | Yes | Valid model identifier or path |
| train_dataset | path | Yes | File/dir must exist if specified |
| lora_rank | number | Yes | > 0 and <= 256 |
| lora_alpha | number | Yes | > 0 |
| dataset_percent | number | No | 1-100 |
| mix_datasets | array | No | Array of {path, weight}, total weights = 100 |
| epochs | number | No | > 0 and <= 1000 |
| batch_size | number | No | > 0 and <= 1024 |
| learning_rate | decimal | No | > 0 and <= 1.0 |
| device | enum | No | Must be: cpu, cuda, mps, auto |
| target_modules | array | No | Array of module names |
Validation Rules:
- If `mix_datasets` is specified, it overrides `train_dataset`
- Total weights in `mix_datasets` must equal exactly 100
- `lora_rank` typically: 4, 8, 16, 32
- `lora_alpha` typically: 16, 32, 64
- Cannot use both `TRAIN` and `FT_LORA` in same file
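The FT_LORA rules above combine a cross-block check with a weight sum. A rough sketch follows, assuming the parsed file is available as a dict of block names to field dicts. That representation and the function name are hypothetical. The error codes are the ones listed in the Error Codes table.

```python
def validate_ft_lora(blocks):
    """Return error strings for the FT_LORA rules, given {block_name: fields}."""
    errors = []
    if "TRAIN" in blocks and "FT_LORA" in blocks:
        errors.append("V012: cannot use both TRAIN and FT_LORA in same file")
    ft_lora = blocks.get("FT_LORA", {})
    mix = ft_lora.get("mix_datasets")
    if mix is not None:
        # Weights must add up to exactly 100.
        total = sum(entry["weight"] for entry in mix)
        if total != 100:
            errors.append(f"V011: mix_datasets weights sum to {total}, expected 100")
    rank = ft_lora.get("lora_rank")
    if rank is not None and not 0 < rank <= 256:
        errors.append("V010: lora_rank must be > 0 and <= 256")
    return errors
```

With fractional weights, an exact comparison against 100 can fail because of floating-point rounding. Integer weights avoid that.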
MODEL Block → ADAPTER Sub-block
| Field | Type | Required | Constraints |
|---|---|---|---|
| type | enum | Yes | Must be: lora, qlora, adapter, peft |
| path | path | Yes | Must exist and be valid adapter path |
| rank | number | No | > 0, typically 4, 8, 16, 32, 64 |
| alpha | number | No | > 0, typically 16, 32, 64 |
Validation Rules:
- If ADAPTER is defined, it is applied after base model is loaded
- Adapter path must exist and be readable
- ADAPTER is optional within MODEL block
INFERENCE Block (Expanded)
| Field | Type | Required | Constraints |
|---|---|---|---|
| mode | enum | Yes | Must be: chat, intent, translate, classify, custom |
| format | string | No | Template string with {input}, {context}, {labels} |
| exit_command | string | No | Command to exit chat mode |
| params | object | No | Inference parameters object |
| CONTROL | block | No | Nested CONTROL block for inference |
INFERENCE params:
- `max_length`: > 0 and <= 8192
- `temperature`: >= 0.0 and <= 2.0
- `top_p`: > 0.0 and <= 1.0
- `beams`: >= 1
- `do_sample`: boolean (true/false)
- `top_k`: >= 0 (0 = disabled)
- `repetition_penalty`: > 0.0 and <= 2.0
Validation Rules:
- IF INFERENCE exists THEN MODEL is required
- Format string must contain at least {input} for most modes
- CONTROL within INFERENCE can only use: RETRY, REGENERATE, REPLACE
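The `format` template rule can be illustrated with Python's standard `string.Formatter`, which extracts `{name}` placeholders from a template. This sketch assumes OktoScript templates use the same brace syntax as Python format strings, and it treats `{input}` as always required, although the rule above says "most modes". The function name is hypothetical.

```python
from string import Formatter

# Placeholders named in the INFERENCE table.
ALLOWED_PLACEHOLDERS = {"input", "context", "labels"}


def check_format_template(template):
    """Return error strings for an INFERENCE format template."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion).
    names = {name for _, name, _, _ in Formatter().parse(template) if name}
    errors = []
    if "input" not in names:
        errors.append("format: missing {input} placeholder")
    unknown = sorted(names - ALLOWED_PLACEHOLDERS)
    if unknown:
        errors.append(f"format: unknown placeholders {unknown}")
    return errors
```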
CONTROL Block
| Field | Type | Required | Constraints |
|---|---|---|---|
| IF | condition | No | Conditional logic |
| WHEN | condition | No | Event-based conditional |
| EVERY | number + steps | No | Periodic actions |
| SET | assignment | No | Set parameter value |
| STOP | action | No | Stop operation |
| LOG | metric/string | No | Log value or message |
| SAVE | target | No | Save model/checkpoint |
| RETRY | action | No | Retry inference |
| REGENERATE | action | No | Regenerate output |
| STOP_TRAINING | action | No | Stop training |
| DECREASE | parameter + BY + value | No | Decrease parameter |
| INCREASE | parameter + BY + value | No | Increase parameter |
| on_step_end | block | No | Hook executed at step end |
| on_epoch_end | block | No | Hook executed at epoch end |
| on_memory_low | block | No | Hook executed when memory low |
| on_nan | block | No | Hook executed on NaN |
| on_plateau | block | No | Hook executed on loss plateau |
| validate_every | number | No | Validate every N steps |
Validation Rules:
- IF CONTROL used THEN must contain at least one of: IF | WHEN | EVERY | on_step_end | on_epoch_end
- Boolean values accepted = true | false
- Allowed CONTROL keywords = IF | WHEN | EVERY | SET | STOP | LOG | SAVE | RETRY | REGENERATE | STOP_TRAINING | DECREASE | INCREASE
- validate_every must receive integer
- DECREASE LR requires numeric value
- Conditions must use valid comparison operators: >, <, >=, <=, ==, !=
MONITOR Block (v1.1+)
| Field | Type | Required | Constraints |
|---|---|---|---|
| metrics | array | No | Array of metric names |
| notify_if | object | No | Conditions for notifications |
| log_to | path | No | Path to log file |
| level | enum | No | Must be: basic, full |
| log_system | array | No | Array of system metric names |
| log_speed | array | No | Array of speed metric names |
| refresh_interval | string | No | Format: number + "s" or "ms", >= 1s |
| export_to | path | No | Directory must exist or be creatable |
| dashboard | boolean | No | true or false |
System Metrics:
- `gpu_memory_used`, `gpu_memory_free`, `gpu_usage`, `gpu_temperature`: Only if CUDA available
- `temperature`: Only if hardware supports it
Validation Rules:
- GPU metrics only validated if CUDA is available
- `refresh_interval` must be >= 1s
- `MONITOR` extends `METRICS` and `LOGGING`, does not replace them
- `notify_if` conditions must use valid comparison operators
- Supported metrics: loss, accuracy, val_loss, val_accuracy, gpu_usage, ram_usage, throughput, latency, confidence, hallucination_score, and all custom metrics
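The `refresh_interval` rule (a number with an `s` or `ms` suffix, at least one second) could be checked roughly as follows. The helper name is hypothetical, and accepting decimal values such as "1.5s" is an assumption not stated in the table.

```python
import re

# A number followed by "ms" or "s". Decimals are allowed here as an assumption.
INTERVAL_PATTERN = re.compile(r"(\d+(?:\.\d+)?)(ms|s)")


def parse_refresh_interval(value):
    """Return the interval in seconds, or raise ValueError if it is invalid."""
    match = INTERVAL_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"refresh_interval must be a number followed by 's' or 'ms': {value!r}")
    number, unit = match.groups()
    seconds = float(number) / 1000 if unit == "ms" else float(number)
    if seconds < 1:
        raise ValueError("refresh_interval must be >= 1s")
    return seconds
```

Under this sketch `"2s"` and `"1500ms"` are accepted, while `"500ms"` and `"2 s"` are rejected.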
GUARD Block
| Field | Type | Required | Constraints |
|---|---|---|---|
| prevent | object | No | Array of prevention types |
| on_violation | object | No | Action on violation |
Prevention types:
- `hallucination`, `toxicity`, `bias`, `data_leak`, `unsafe_code`
Validation Rules:
- GUARD.on_violation can only be STOP or ALERT or REPLACE or LOG
- Prevention types must be valid enum values
BEHAVIOR Block
| Field | Type | Required | Constraints |
|---|---|---|---|
| personality | enum | No | Must be: professional, friendly, assistant, casual, formal, creative |
| verbosity | enum | No | Must be: low, medium, high |
| language | enum | No | Must be: en, pt-BR, es, fr, de, it, ja, zh, multilingual |
| avoid | array | No | Array of strings to avoid |
| fallback | string | No | Fallback message |
Validation Rules:
- All enum values must match allowed values
- fallback must be a non-empty string if provided
EXPLORER Block
| Field | Type | Required | Constraints |
|---|---|---|---|
| try | object | Yes | Parameter combinations to test |
| max_tests | number | No | Must be <= 50 |
| pick_best_by | string | No | Must be valid metric name |
Validation Rules:
- EXPLORER.max_tests must be <= 50
- pick_best_by must be a valid metric (e.g., "val_loss", "accuracy")
- try object must contain at least one parameter array
- Parameter arrays must contain valid values for their type
STABILITY Block
| Field | Type | Required | Constraints |
|---|---|---|---|
| stop_if_nan | boolean | No | true or false |
| stop_if_diverges | boolean | No | true or false |
| min_improvement | decimal | No | Must be float >= 0 |
Validation Rules:
- STABILITY.min_improvement must be float
- Boolean values must be true or false (lowercase)
DEPLOY Block
| Field | Type | Required | Constraints |
|---|---|---|---|
| target | enum | Yes | Must be: local, cloud, edge, api, android, ios, web, desktop |
| endpoint | string | No | Required if target is "api" |
| requires_auth | boolean | No | true or false |
| port | number | No | Required if target is "api", must be 1024-65535 |
| max_concurrent_requests | number | No | > 0 |
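The DEPLOY table contains conditional requirements: `endpoint` and `port` become mandatory only when `target` is `"api"`. A minimal sketch of that logic, with a hypothetical function name and error strings:

```python
DEPLOY_TARGETS = {"local", "cloud", "edge", "api", "android", "ios", "web", "desktop"}


def validate_deploy(deploy):
    """Return error strings for a DEPLOY block given as a dict."""
    errors = []
    target = deploy.get("target")
    if target not in DEPLOY_TARGETS:
        errors.append("target: missing or not an allowed value")
    if target == "api":
        # endpoint and port are optional for every other target.
        for field in ("endpoint", "port"):
            if field not in deploy:
                errors.append(f"{field}: required when target is api")
    port = deploy.get("port")
    if port is not None and not (isinstance(port, int) and 1024 <= port <= 65535):
        errors.append("port: must be 1024-65535")
    limit = deploy.get("max_concurrent_requests")
    if limit is not None and not limit > 0:
        errors.append("max_concurrent_requests: must be > 0")
    return errors
```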
Dependency Validation
Model Inheritance
- If `inherit` is specified, parent model must be defined
- Circular inheritance is not allowed
- Inheritance chain depth limited to 10 levels
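The inheritance rules amount to walking the parent chain while watching for repeats and excessive depth. The sketch below assumes models are available as a mapping from model name to parent name (or `None`), and it counts depth as the number of `inherit` links. Both are assumptions made for illustration.

```python
MAX_INHERITANCE_DEPTH = 10


def check_inheritance(models, name):
    """Return the chain from `name` up to its root, or raise ValueError."""
    chain = [name]
    current = name
    while models.get(current) is not None:
        parent = models[current]
        if parent not in models:
            raise ValueError(f"{current} inherits from undefined model {parent}")
        if parent in chain:
            raise ValueError("V009: circular inheritance: " + " -> ".join(chain + [parent]))
        chain.append(parent)
        # len(chain) - 1 is the number of inherit links followed so far.
        if len(chain) - 1 > MAX_INHERITANCE_DEPTH:
            raise ValueError("inheritance chain deeper than 10 levels")
        current = parent
    return chain
```

For `{"base": None, "mid": "base", "top": "mid"}`, calling `check_inheritance(models, "top")` returns `["top", "mid", "base"]`.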
Checkpoint Resume
- If `resume_from_checkpoint` is specified:
  - Checkpoint directory must exist
  - Checkpoint must contain valid model files
  - Checkpoint must be compatible with current model architecture
Export Compatibility
- Model architecture must support export format
- Quantization required for certain formats (gguf)
- Mobile formats (tflite, okm) require compatible architectures
Runtime Validation
Dataset Validation
File existence:
- All dataset paths must exist
- Files must be readable
- Directories must be accessible
Format validation:
- JSONL: Each line must be valid JSON
- CSV: Must have header row, consistent columns
- Image+caption: Directory must contain image files and captions
Size limits:
- Maximum file size: 10GB per file
- Maximum total dataset size: 100GB
- Minimum examples: 10 for training
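For the JSONL rule and the minimum-example limit, a line-by-line check using the standard `json` module might look like this. The function name is hypothetical, and skipping blank lines is an assumption that the rules above do not address.

```python
import json

MIN_TRAINING_EXAMPLES = 10


def validate_jsonl_lines(lines):
    """Return error strings for an iterable of JSONL lines."""
    errors = []
    examples = 0
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumption: blank lines are skipped rather than rejected
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {number}: invalid JSON ({exc.msg})")
        else:
            examples += 1
    if examples < MIN_TRAINING_EXAMPLES:
        errors.append(f"only {examples} examples, minimum is {MIN_TRAINING_EXAMPLES}")
    return errors
```

Because the function accepts any iterable of strings, an open file object can be passed directly, so a large dataset does not need to be loaded into memory.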
Dataset Mixing (v1.1+):
- If `mix_datasets` is specified, `train` is ignored
- All paths in `mix_datasets` must exist
- Total weights must equal exactly 100
- `dataset_percent` applies to the mixed dataset
- `sampling: "weighted"` uses weights, `"random"` ignores them
Model Validation
Base model:
- If local path: Must exist and be valid model directory
- If HuggingFace: Must be downloadable
- If URL: Must be accessible
Architecture compatibility:
- Model architecture must match dataset type
- Vision models require image datasets
- Language models require text datasets
Training Validation
Hardware requirements:
- GPU required if `device: "cuda"` and `gpu: true`
- Sufficient VRAM for batch size
- Sufficient disk space for checkpoints
Memory validation:
- Batch size must fit in available memory
- Effective batch size (batch_size × gradient_accumulation) validated
Error Codes
| Code | Error | Solution |
|---|---|---|
| V001 | Dataset file not found | Check file path, use absolute or relative path |
| V002 | Invalid optimizer | Use one of: adam, adamw, sgd, rmsprop, adafactor, lamb |
| V003 | Invalid scheduler | Use one of: linear, cosine, cosine_with_restarts, polynomial, constant, constant_with_warmup, step |
| V004 | Model base not found | Verify model path or HuggingFace model name |
| V005 | Checkpoint not found | Check checkpoint path or remove resume_from_checkpoint |
| V006 | Insufficient memory | Reduce batch_size or enable gradient_accumulation |
| V007 | Invalid metric for task | Use appropriate metrics for task type |
| V008 | Invalid export format | Check format compatibility with model architecture |
| V009 | Circular inheritance | Remove circular model inheritance chain |
| V010 | Invalid field value | Check field constraints and allowed values |
| V011 | Dataset mixing weights invalid | Total weights in mix_datasets must equal 100 |
| V012 | FT_LORA and TRAIN conflict | Cannot use both TRAIN and FT_LORA in same file |
| V013 | Version declaration invalid | Version must be "1.0" or "1.1" |
| V014 | GPU metrics unavailable | GPU metrics requested but CUDA not available |
| V015 | CONTROL block empty | CONTROL must contain at least one directive |
| V016 | Invalid CONTROL keyword | Use only allowed CONTROL keywords |
| V017 | EXPLORER max_tests too high | max_tests must be <= 50 |
| V018 | Invalid boolean value | Boolean must be true or false (lowercase) |
| V019 | INFERENCE without MODEL | INFERENCE block requires MODEL block |
| V020 | Invalid adapter type | ADAPTER type must be: lora, qlora, adapter, peft |
| V021 | GUARD violation action invalid | on_violation must be: STOP, ALERT, REPLACE, or LOG |
Validation Commands
CLI Validation
# Validate syntax and structure
okto validate train.okt
# Validate with detailed output
okto validate train.okt --verbose
# Validate dataset only
okto validate train.okt --dataset-only
# Validate model only
okto validate train.okt --model-only
IDE Validation
OktoSeek IDE automatically validates:
- Real-time syntax checking
- Field completion suggestions
- Error highlighting
- Warning messages
Best Practices
Always validate before training
okto validate train.okt
Check dataset format
- Use `okto validate --dataset-only` to verify dataset structure
Verify model compatibility
- Ensure model architecture matches dataset type
- Check export format compatibility
Test with small dataset first
- Use subset of data for initial validation
- Verify pipeline works before full training
Monitor resource usage
- Check available GPU memory
- Verify disk space for checkpoints
- Monitor training progress