OktoSeek committed on Nov 27, 2025

Commit

5df2c77

verified ·

1 Parent(s): d4d0d07

Update

Files changed (23)

CHANGELOG.md +1 -1
CONTRIBUTING.md +3 -1
MANIFEST.md +173 -0
README.md +50 -22
docs/CONTEXT_FIELDS.md +204 -0
docs/CUSTOM_FIELDS.md +262 -0
docs/PERFORMANCE_TIPS.md +222 -0
docs/grammar.md +76 -0
examples/MODEL_NAMES.md +71 -0
examples/QUICK_FIX.md +82 -0
examples/README.md +17 -0
examples/TESTING_GUIDE.md +227 -0
examples/TROUBLESHOOTING.md +114 -0
examples/pizzabot/okt.yaml +1 -1
examples/test-flan-t5-complete.okt +112 -0
examples/test-flan-t5-inference.okt +94 -0
examples/test-pizzaria-context.okt +46 -0
examples/test-t5-basic-clean.okt +38 -0
examples/test-t5-basic.okt +39 -0
examples/test-t5-control.okt +77 -0
examples/test-t5-custom-fields.okt +42 -0
examples/test-t5-explorer.okt +54 -0
examples/test-t5-monitor.okt +58 -0

CHANGELOG.md CHANGED

@@ -26,7 +26,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Complete PizzaBot example project
 - JSON Schema for dataset validation
 - Professional README with documentation
-- Apache 2.0 License
 ### Documentation
 - Grammar specification in EBNF format

 - Complete PizzaBot example project
 - JSON Schema for dataset validation
 - Professional README with documentation
+- OktoScript License 1.0 (Proprietary-Free Use License)
 ### Documentation
 - Grammar specification in EBNF format

CONTRIBUTING.md CHANGED

@@ -2,7 +2,9 @@
 Thank you for your interest in contributing to OktoScript! 🐙
-**OktoScript** is a domain-specific programming language developed by **OktoSeek AI**. We welcome contributions from the community!
 ## How to Contribute

 Thank you for your interest in contributing to OktoScript! 🐙
+**OktoScript** is a proprietary domain-specific programming language developed and owned by **OktoSeek AI**. We welcome contributions from the community for documentation, examples, bug reports, and feature suggestions!
+**Important:** OktoScript is a proprietary language. While we welcome contributions, you may not create derivative works, forks, or competing languages based on OktoScript. See [OKTOSCRIPT_LICENSE.md](./OKTOSCRIPT_LICENSE.md) for complete license terms.
 ## How to Contribute

MANIFEST.md ADDED

	@@ -0,0 +1,173 @@

+# 🐙 Official Manifesto — OktoSeek
+<p align="center">
+  <strong>The Philosophy and Legacy of What We Created</strong>
+</p>
+---
+## Our Vision
+OktoSeek was born from a simple yet powerful vision:
+> **To transform the creation of Artificial Intelligence into something accessible, understandable, and possible for anyone in the world.**
+We are not here just to build another AI tool.
+We are here to create a new way of thinking, writing, and building Artificial Intelligence.
+That is why we did not create just code.
+We created a language.
+We created an engine.
+We created an ecosystem.
+---
+## 🌍 The Language Belongs to the World
+OktoSeek believes that no one truly owns a programming language.
+Languages are extensions of human thought.
+They may be born in one mind, but they belong to humanity.
+That is why OktoScript was designed to be:
+- ✅ **Open to learning** — Anyone can learn, use, and contribute
+- ✅ **Free for experimentation** — No barriers to exploration and innovation
+- ✅ **Simple for beginners** — Intuitive syntax that welcomes newcomers
+- ✅ **Powerful for experts** — Advanced capabilities for complex scenarios
+- ✅ **Evolving for the future** — Designed to grow and adapt with the community
+OktoScript is not just a language for training AI.
+It is a new way of communicating with machines.
+Version 2.0 does not represent control.
+It represents evolution.
+Progress.
+Freedom.
+---
+## 🧠 Our Mission
+OktoSeek's mission is clear:
+> **To create a global standard that allows anyone to develop, train, and experiment with Artificial Intelligence without relying on hundreds of lines of complex code.**
+We want a student anywhere in the world to train their own AI.
+We want a researcher to prototype in minutes.
+We want a curious child to understand how an AI is born.
+Artificial Intelligence does not belong to a few.
+It belongs to the world.
+And OktoSeek is the gateway.
+---
+## ⚙️ Our Ecosystem
+OktoSeek is built upon three fundamental pillars:
+| Pillar | Description |
+|--------|-------------|
+| **OktoScript** | The language of intention |
+| **OktoEngine** | The system that brings it to life |
+| **OktoIDE / OktoSeek Studio** | Where everything takes shape |
+Together, they form the first integrated ecosystem focused on making AI creation and training accessible, powerful, and intuitive.
+But more important than the technology itself is the creative freedom it enables.
+---
+## 🤝 Community First
+None of this exists without people.
+OktoSeek believes that:
+- **Code grows with community** — Every contribution makes the ecosystem stronger
+- **Knowledge expands when shared** — Open documentation and learning resources
+- **Innovation is born from collaboration** — Together we build the future
+That is why:
+- **Core parts of the project are open** — Transparency and accessibility
+- **Documentation is public** — Knowledge available to everyone
+- **The community is part of the evolutionary process** — Your voice shapes the future
+We do not create users.
+We create creators.
+---
+## 🏛️ Origin and Legacy
+OktoSeek was conceived and architected by
+**[Ademir Paulo](https://www.linkedin.com/in/ademir-p-de-oliveira-2a2678151/)**, an Artificial Intelligence Engineer,
+driven by the conviction that technology must be accessible to everyone — not only to large corporations or academic institutions.
+And as a central principle of OktoSeek, it is recorded:
+> "When we share knowledge, we unlock the power of people to create things we could never imagine alone."
+Even as the language evolves, technology transforms, and new generations carry it forward, the origin of OktoSeek will always be tied to this vision.
+---
+### The Founder's Message
+> Because knowledge that is kept is limited.
+>
+> But knowledge that is shared… changes the world.
+>
+> — **[Ademir Paulo](https://www.linkedin.com/in/ademir-p-de-oliveira-2a2678151/)**, Founder OktoSeek
+---
+## 📚 Related Resources
+- **Official Website:** https://www.oktoseek.com
+- **Manifesto (Web):** https://www.oktoseek.com/manifest.html
+- **GitHub:** https://github.com/oktoseek
+- **Hugging Face:** https://huggingface.co/OktoSeek
+- **Twitter:** https://x.com/oktoseek
+- **YouTube:** https://www.youtube.com/@Oktoseek
+---
+## 📄 License
+This manifesto represents the philosophy and vision of OktoSeek AI. The content reflects our commitment to open knowledge, community-driven development, and democratizing AI.
+For licensing information about OktoScript and related projects, see the [OKTOSCRIPT_LICENSE.md](./OKTOSCRIPT_LICENSE.md) file.
+---
+<p align="center">
+  <strong>— OktoSeek</strong><br>
+  <em>Transforming AI creation for everyone, everywhere.</em>
+</p>
+<p align="center">
+  Made with ❤️ by the <strong>OktoSeek AI</strong> team
+</p>

README.md CHANGED

@@ -1,22 +1,3 @@
----
-license: apache-2.0
-tags:
-  - ai
-  - training
-  - dsl
-  - oktoscript
-  - oktoseek
-  - okto
-  - automation
-  - ai-pipelines
-  - ai-governance
-language:
-  - en
-frameworks:
-  - pytorch
-  - tensorflow
----
 <p align="center">
   <img src="./assets/okto_logo.png" alt="OktoScript Banner" width="50%" />
 </p>
@@ -117,7 +98,8 @@ OktoScript is the official language of the OktoSeek ecosystem and is used by:
 - 🎯 **OktoSeek IDE** – Visual AI development and experimentation
 - ⚙️ **OktoEngine** – Core execution and decision engine
-- 🔌 **VS Code Extension** – Code editing + validation
 - 🔄 **Autonomous pipelines** – Training, control, evaluation and inference
 - 🤖 **AI agents** – Controlled, monitored intelligent systems
 - 📱 **Flutter / API deployments** – Cross-platform model integration
@@ -328,6 +310,7 @@ OktoScript v1.1 adds powerful new features while maintaining 100% backward compa
 - ✅ **Image + Caption** - Vision datasets
 - ✅ **Question & Answer (QA)** - Q&A pairs
 - ✅ **Instruction datasets** - Instruction-following
 - ✅ **Multi-modal** - (future support)
 ### Example (JSONL):
@@ -337,6 +320,22 @@ OktoScript v1.1 adds powerful new features while maintaining 100% backward compa
 {"input":"Do you deliver?","output":"Yes, delivery is available in your region."}
 ```
 ---
 ## 📊 Supported Metrics
@@ -368,6 +367,27 @@ METRICS {
 The OktoEngine CLI is minimal by design. All intelligence lives in the `.okt` file. The terminal is just the execution port.
 ### Core Commands
 **Initialize a project:**
@@ -636,6 +656,8 @@ See [`/examples/`](./examples/) for examples using different export formats.
 - ▶️ **Run / Train buttons** - One-click execution
 - 🎨 **Visual pipeline builder** - Drag-and-drop workflows
 ---
 ## 📚 Documentation
@@ -718,13 +740,19 @@ The language evolves to support increasingly sophisticated AI behaviors while ma
 ## 📄 License
-This project is licensed under the Apache License 2.0 - see the [LICENSE](./LICENSE) file for details.
 ---
 ## 🤝 Contributing
-Contributions are welcome! Please feel free to submit a Pull Request. See [CONTRIBUTING.md](./CONTRIBUTING.md) for guidelines.
 ---

 <p align="center">
   <img src="./assets/okto_logo.png" alt="OktoScript Banner" width="50%" />
 </p>
 - 🎯 **OktoSeek IDE** – Visual AI development and experimentation
 - ⚙️ **OktoEngine** – Core execution and decision engine
+- 🌐 **OktoScript Web Editor** – Online editor with syntax validation and autocomplete ([Try it now →](https://oktoseek.com/editor.php))
+- 🔌 **VS Code Extension** – Code editing + validation (Coming Soon)
 - 🔄 **Autonomous pipelines** – Training, control, evaluation and inference
 - 🤖 **AI agents** – Controlled, monitored intelligent systems
 - 📱 **Flutter / API deployments** – Cross-platform model integration
 - ✅ **Image + Caption** - Vision datasets
 - ✅ **Question & Answer (QA)** - Q&A pairs
 - ✅ **Instruction datasets** - Instruction-following
+- ✅ **Custom Field Names** (v1.2+) - Define `input_field` and `output_field` for any column names
 - ✅ **Multi-modal** - (future support)
 ### Example (JSONL):
 {"input":"Do you deliver?","output":"Yes, delivery is available in your region."}
 ```
+### Custom Field Names (v1.2+)
+OktoScript now supports custom field names in datasets, allowing you to work with any column names:
+```okt
+DATASET {
+    train: "dataset/train.jsonl"
+    input_field: "question"    # Custom input column name
+    output_field: "answer"      # Custom output column name
+}
+```
+If not specified, OktoEngine automatically detects `input`/`output` or `input`/`target` fields.
+📖 **[Learn more about custom fields →](./docs/CUSTOM_FIELDS.md)**
 ---
 ## 📊 Supported Metrics
 The OktoEngine CLI is minimal by design. All intelligence lives in the `.okt` file. The terminal is just the execution port.
+### 🌐 Web Editor Command
+**Open OktoScript files in the web editor:**
+```bash
+# Open editor with a specific file
+okto web --file scripts/train.okt
+# Open empty editor
+okto web
+```
+The `okto web` command opens the [OktoScript Web Editor](https://oktoseek.com/editor.php) in your browser. When you provide a file path, it automatically loads the file content for editing. The editor features:
+- **Smart Autocomplete** – Context-aware suggestions based on the current block (ENV, DATASET, MODEL, TRAIN, etc.)
+- **Real-time Syntax Validation** – Detects errors like nested blocks (e.g., PROJECT inside DATASET) and missing braces
+- **Auto-save to Local** – When you load a file, it saves back to the same location automatically
+- **Full Integration** – Seamlessly connects with OktoEngine for validation and training
+Perfect for quick edits, syntax testing, and experimenting with OktoScript configurations!
 ### Core Commands
 **Initialize a project:**
 - ▶️ **Run / Train buttons** - One-click execution
 - 🎨 **Visual pipeline builder** - Drag-and-drop workflows
+> 💡 **Tip:** While waiting for the VS Code extension, use the [🌐 OktoScript Web Editor](https://oktoseek.com/editor.php), which the CLI opens via the `okto web` command. It provides the same features planned for the VS Code extension, including context-aware autocomplete and real-time syntax validation.
 ---
 ## 📚 Documentation
 ## 📄 License
+**OktoScript is free to use, but is a proprietary language owned by OktoSeek AI.**
+OktoScript is available for personal and commercial use at no cost. However, OktoScript is a proprietary language and you may not modify, distribute, clone, fork, or create derivative works of OktoScript.
+See [OKTOSCRIPT_LICENSE.md](./OKTOSCRIPT_LICENSE.md) for complete license terms.
 ---
 ## 🤝 Contributing
+Contributions are welcome, including bug reports, feature suggestions, documentation improvements, and examples. Please see [CONTRIBUTING.md](./CONTRIBUTING.md) for guidelines.
+**Note:** OktoScript is a proprietary language owned by OktoSeek AI. While we welcome contributions, you may not create derivative works, forks, or competing languages based on OktoScript.
 ---

docs/CONTEXT_FIELDS.md ADDED

	@@ -0,0 +1,204 @@

+# Context Fields - v1.2+
+## Overview
+Context fields are additional dataset fields whose values are automatically included in the prompt during training, but which are neither the main input nor the expected output.
+## Use Cases
+### 1. Chatbots with Dynamic Context
+For chatbots that need contextual information (menu, drinks, promotions, opening hours, etc.):
+```okt
+DATASET {
+    train: "dataset/pizzaria.jsonl"
+    format: "jsonl"
+    input_field: "input"
+    output_field: "target"
+    context_fields: ["menu", "drinks", "promotions"]
+}
+```
+**JSONL dataset:**
+```jsonl
+{"input": "What pizzas do you have?", "target": "We have Margherita, Pepperoni, and Four Cheese.", "menu": "Margherita: $34, Pepperoni: $39, Four Cheese: $45", "drinks": "Coke, Sprite, Water"}
+{"input": "Any promotions?", "target": "Yes! Buy 2 get 1 free on Tuesdays.", "menu": "Margherita: $34, Pepperoni: $39", "promotions": "Buy 2 get 1 free on Tuesdays"}
+```
+**Automatically generated prompt:**
+```
+menu: Margherita: $34, Pepperoni: $39, Four Cheese: $45 | drinks: Coke, Sprite, Water | What pizzas do you have?
+```
+### 2. Question Answering with Documents
+For QA that needs document context:
+```okt
+DATASET {
+    train: "dataset/qa.jsonl"
+    format: "jsonl"
+    input_field: "question"
+    output_field: "answer"
+    context_fields: ["document", "section"]
+}
+```
+**JSONL dataset:**
+```jsonl
+{"question": "What is the capital?", "answer": "Brasília", "document": "Geography of Brazil", "section": "Administrative divisions"}
+```
+### 3. Translation with Context
+For translation that needs additional context:
+```okt
+DATASET {
+    train: "dataset/translation.jsonl"
+    format: "jsonl"
+    input_field: "source"
+    output_field: "target"
+    context_fields: ["domain", "style"]
+}
+```
+## How It Works
+### Prompt Format
+Context fields are placed **before** the `input_field` value, in the format:
+```
+{context_field_1}: {value_1} | {context_field_2}: {value_2} | {input_field}
+```
+### Field Order
+Fields are included in the **order specified** in `context_fields`:
+```okt
+context_fields: ["menu", "drinks", "promotions"]
+```
+Result: `menu: ... | drinks: ... | promotions: ... | input`
+### Empty Fields
+Empty fields are **automatically skipped**:
+```jsonl
+{"input": "Hello", "target": "Hi", "menu": "", "drinks": "Coke"}
+```
+Result: `drinks: Coke | Hello` (the empty `menu` field was skipped)
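The formatting rules described here (context fields placed before the input, in the configured order, with empty values skipped and parts joined by ` | `) can be sketched in Python. This is only an illustrative sketch, not OktoEngine's actual implementation; the function name `build_prompt` and its parameters are hypothetical.

```python
def build_prompt(record, input_field="input", context_fields=()):
    """Prefix the input with non-empty context fields, in the configured order."""
    parts = []
    for name in context_fields:
        value = record.get(name, "")
        # Missing or empty context values are skipped.
        if value:
            parts.append(f"{name}: {value}")
    # The main input always comes last.
    parts.append(record[input_field])
    return " | ".join(parts)


record = {"input": "Hello", "target": "Hi", "menu": "", "drinks": "Coke"}
prompt = build_prompt(record, context_fields=["menu", "drinks", "promotions"])
print(prompt)  # drinks: Coke | Hello
```

Treating a missing field the same way as an empty one is an assumption of this sketch; the source only states that empty fields are skipped.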
+## Syntax
+```okt
+DATASET {
+    train: "dataset/train.jsonl"
+    validation: "dataset/val.jsonl"
+    format: "jsonl"
+    # Main fields
+    input_field: "input"
+    output_field: "target"
+    # Context fields (optional)
+    context_fields: ["menu", "drinks", "promotions"]
+}
+```
+## Complete Example
+```okt
+# okto_version: "1.2"
+PROJECT "pizzaria_chatbot"
+DATASET {
+    train: "dataset/pizzaria_train.jsonl"
+    validation: "dataset/pizzaria_val.jsonl"
+    format: "jsonl"
+    type: "chat"
+    input_field: "input"
+    output_field: "target"
+    context_fields: ["menu", "drinks", "promotions", "hours"]
+}
+MODEL {
+    base: "t5-small"
+    device: "auto"
+}
+TRAIN {
+    epochs: 3
+    batch_size: 8
+    learning_rate: 0.0001
+}
+EXPORT {
+    format: ["okm"]
+    path: "export/"
+}
+```
+**Example dataset:**
+```jsonl
+{"input": "What pizzas do you have?", "target": "We have Margherita ($34), Pepperoni ($39), and Four Cheese ($45).", "menu": "Margherita: $34, Pepperoni: $39, Four Cheese: $45", "drinks": "Coke, Sprite, Water", "promotions": "Buy 2 get 1 free on Tuesdays", "hours": "Open 11am-11pm daily"}
+{"input": "What time do you close?", "target": "We close at 11pm daily.", "menu": "Margherita: $34, Pepperoni: $39", "hours": "Open 11am-11pm daily"}
+```
+## Tips and Best Practices
+### ✅ Do
+1. **Use descriptive names**: `menu`, `drinks`, `promotions` are better than `ctx1`, `ctx2`
+2. **Keep context relevant**: Include only fields that actually help the model
+3. **Order by importance**: Put the most important fields first
+4. **Be consistent**: Use the same fields in train, validation, and test
+### ❌ Avoid
+1. **Too many fields**: More than 5-6 fields can confuse the model
+2. **Very long fields**: Overly long context can exceed the token limit
+3. **Redundant information**: Do not repeat information that is already in the input
+4. **Unrelated fields**: Include only context relevant to the task
+## Limitations
+- Context fields are included in the prompt, so they count toward the token limit
+- Field order matters: put the most important fields first
+- Empty fields are skipped automatically
+- Works with both Seq2Seq (T5, BART) and causal (GPT) models
+## Troubleshooting
+### Context does not appear in the prompt
+**Cause**: The field name is wrong or the field does not exist in the dataset.
+**Solution**:
+- Check the field names in the dataset
+- Use `okto validate` to check the configuration
+- Make sure the fields exist in every example
+### Prompt too long
+**Cause**: Too many context fields, or fields that are too long.
+**Solution**:
+- Reduce the number of context fields
+- Shorten the field contents
+- Increase the tokenizer `max_length` (if needed)
+---
+**Version**: 1.2+
+**Last updated**: 2024

docs/CUSTOM_FIELDS.md ADDED

	@@ -0,0 +1,262 @@

+# Custom Dataset Fields (v1.2+)
+## Overview
+Starting with version 1.2, OktoScript lets you define custom input and output fields in the `DATASET` block. You can also specify **context fields**, which are automatically included in the prompt during training. This makes it possible to work with complex datasets that carry contextual information (such as menu, drinks, promotions, etc.).
+## Syntax
+```okt
+DATASET {
+    train: "dataset/train.jsonl"
+    validation: "dataset/val.jsonl"
+    format: "jsonl"
+    # Custom fields (optional)
+    input_field: "input"      # Name of the input column
+    output_field: "target"    # Name of the output column (or use target_field)
+    # Context fields (optional) - automatically included in the prompt
+    context_fields: ["menu", "drinks", "promotions"]
+}
+```
+## Automatic Field Resolution
+If you do **not** specify `input_field` and `output_field`, OktoEngine tries to find the fields automatically, in the following order:
+### For Seq2Seq models (T5, BART, etc.):
+1. **`input` + `output`** (most common default)
+2. **`input` + `target`** (common alternative)
+3. **`text`** (single field, used for both)
+4. **First string field found** (fallback)
+### For causal models (GPT, etc.):
+1. **`input` + `output`** (concatenated)
+2. **`input` + `target`** (concatenated)
+3. **`text`** (single field)
+4. **First string field found** (fallback)
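The lookup order listed above can be sketched in Python as follows. This is a hedged illustration rather than OktoEngine's code: `resolve_fields` is a hypothetical name, and the sketch assumes that explicit names are used only when both are given and that a single fallback field serves as both input and output.

```python
def resolve_fields(record, input_field=None, output_field=None):
    """Return (input_name, output_name) for one dataset record."""
    # Explicit names take priority over auto-detection.
    if input_field and output_field:
        return input_field, output_field
    # Try the documented default pairs, in order.
    for pair in (("input", "output"), ("input", "target")):
        if all(name in record for name in pair):
            return pair
    # A single "text" field is used for both sides.
    if "text" in record:
        return "text", "text"
    # Fallback: the first field whose value is a string.
    for name, value in record.items():
        if isinstance(value, str):
            return name, name
    raise ValueError("No input/output fields found")


print(resolve_fields({"input": "Olá", "target": "Oi"}))  # ('input', 'target')
```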
+## Usage Examples
+### Example 1: Dataset with `input` and `target`
+```okt
+DATASET {
+    train: "dataset/train.jsonl"
+    format: "jsonl"
+    input_field: "input"
+    output_field: "target"
+}
+```
+**JSONL dataset:**
+```jsonl
+{"input": "User: Olá", "target": "Assistant: Olá! Como posso ajudar?"}
+{"input": "User: Tudo bem?", "target": "Assistant: Sim, tudo ótimo!"}
+```
+### Example 2: Dataset with different field names
+```okt
+DATASET {
+    train: "dataset/conversations.jsonl"
+    format: "jsonl"
+    input_field: "question"
+    output_field: "answer"
+}
+```
+**JSONL dataset:**
+```jsonl
+{"question": "Qual é a capital do Brasil?", "answer": "Brasília"}
+{"question": "Quem descobriu o Brasil?", "answer": "Pedro Álvares Cabral"}
+```
+### Example 3: Dataset with Portuguese field names
+```okt
+DATASET {
+    train: "dataset/treino.jsonl"
+    format: "jsonl"
+    input_field: "entrada"
+    output_field: "saida"
+}
+```
+**JSONL dataset:**
+```jsonl
+{"entrada": "Traduza: Hello", "saida": "Olá"}
+{"entrada": "Traduza: Goodbye", "saida": "Adeus"}
+```
+### Example 4: No fields specified (auto-detection)
+```okt
+DATASET {
+    train: "dataset/train.jsonl"
+    format: "jsonl"
+    # input_field and output_field are not specified
+    # The engine will try to find them automatically
+}
+```
+The engine tries, in order:
+- `input` + `output` → if not found
+- `input` + `target` → if not found
+- `text` → if not found
+- First string field → fallback
+## Compatibility
+### Backward Compatibility
+Existing scripts keep working without modification:
+```okt
+# v1.0/v1.1 script - works unchanged
+DATASET {
+    train: "dataset/train.jsonl"
+    validation: "dataset/val.jsonl"
+}
+```
+The engine automatically detects `input`/`output` or `input`/`target`.
+### Supported Aliases
+- `output_field` and `target_field` are equivalent
+- Either can be used to define the output field
+```okt
+DATASET {
+    train: "dataset/train.jsonl"
+    input_field: "input"
+    output_field: "target"   # or target_field: "target"
+}
+```
+## Use Cases
+### 1. Third-Party Datasets
+When you use datasets from public repositories, which may have different column names:
+```okt
+DATASET {
+    train: "datasets/alpaca_pt.jsonl"
+    input_field: "instruction"
+    output_field: "response"
+}
+```
+### 2. Format Migration
+When migrating from other frameworks that use different conventions:
+```okt
+DATASET {
+    train: "dataset/old_format.jsonl"
+    input_field: "prompt"
+    output_field: "completion"
+}
+```
+### 3. Multilingual Datasets
+For datasets that mix languages in their column names:
+```okt
+DATASET {
+    train: "dataset/mixed.jsonl"
+    input_field: "entrada"
+    output_field: "saida"
+}
+```
+## Validation
+OktoEngine validates that:
+- The specified fields exist in the dataset
+- The fields contain valid data (strings)
+- The dataset format is compatible
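Before training, the first two checks can be approximated with a small standalone Python script. This is a rough sketch of the idea only; it does not reproduce what `okto validate` actually does, and `validate_fields` is a hypothetical helper name.

```python
import json


def validate_fields(lines, input_field, output_field):
    """Return a list of problems; an empty list means every record passed."""
    problems = []
    for number, line in enumerate(lines, start=1):
        record = json.loads(line)
        for name in (input_field, output_field):
            if name not in record:
                problems.append(f"line {number}: Field '{name}' not found in dataset")
            elif not isinstance(record[name], str):
                problems.append(f"line {number}: Field '{name}' is not a string")
    return problems


sample = [
    '{"question": "Qual é a capital do Brasil?", "answer": "Brasília"}',
    '{"question": "Quem descobriu o Brasil?"}',
]
for problem in validate_fields(sample, "question", "answer"):
    print(problem)
```

To check a real file, the same function can be given the lines of a JSONL file, for example `validate_fields(open(path, encoding="utf-8"), "question", "answer")`, where `path` is your own dataset path.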
+## Tips
+1. **Use custom fields only when needed**: If your dataset already uses `input`/`output` or `input`/`target`, you do not need to specify them.
+2. **Test first**: Use `okto validate` to check that the fields are correct before training.
+3. **Consistency**: Keep the same field names in train, validation, and test.
+4. **Documentation**: Document custom field names in your project to make collaboration easier.
+## Troubleshooting
+### Error: "Field 'X' not found in dataset"
+**Cause**: The specified field does not exist in the dataset.
+**Solution**:
+- Check the column names in your dataset
+- Use `okto validate` to see which fields were detected
+- Remove `input_field`/`output_field` to use auto-detection
+### Error: "No input/output fields found"
+**Cause**: The engine could not find valid fields.
+**Solution**:
+- Specify `input_field` and `output_field` explicitly
+- Check that the dataset has at least one string field
+### Dataset works without specified fields but fails with custom fields
+**Cause**: The field name is wrong or contains spaces/special characters.
+**Solution**:
+- Use the column name exactly as it appears in the JSON
+- Avoid spaces or special characters in column names
+## Complete Example
+```okt
+# okto_version: "1.2"
+PROJECT "custom_fields_example"
+DATASET {
+    train: "dataset/train.jsonl"
+    validation: "dataset/val.jsonl"
+    format: "jsonl"
+    type: "chat"
+    # Custom fields
+    input_field: "user_message"
+    output_field: "assistant_response"
+}
+MODEL {
+    base: "t5-small"
+    device: "auto"
+}
+TRAIN {
+    epochs: 3
+    batch_size: 8
+    learning_rate: 0.0001
+}
+EXPORT {
+    format: ["okm"]
+    path: "export/"
+}
+```
+---
+**Version**: 1.2+
+**Last updated**: 2024

docs/PERFORMANCE_TIPS.md ADDED

	@@ -0,0 +1,222 @@

+# Performance Tips for Training
+## Logging Configuration
+### logging_steps
+Controls how often logs are displayed during training:
+```okt
+TRAIN {
+    epochs: 3
+    batch_size: 8
+    learning_rate: 0.0001
+    logging_steps: 5    # Log every 5 steps (more frequent)
+    # logging_steps: 20  # Log every 20 steps (less frequent)
+}
+```
+**Recommended values:**
+- **Small datasets (< 1000 examples)**: `logging_steps: 5` or `10`
+- **Medium datasets (1000-10000)**: `logging_steps: 10` or `20`
+- **Large datasets (> 10000)**: `logging_steps: 50` or `100`
+### save_steps
+Controls how often checkpoints are saved:
+```okt
+TRAIN {
+    epochs: 3
+    batch_size: 8
+    learning_rate: 0.0001
+    save_steps: 100     # Save a checkpoint every 100 steps
+    # save_steps: 500    # Save a checkpoint every 500 steps (default)
+}
+```
+**Tip**: For small datasets, use a smaller `save_steps` so you do not lose progress.
+## Performance Optimization
+### 1. Use a GPU when available
+```okt
+ENV {
+    accelerator: "gpu"
+    precision: "fp16"    # Uses less memory and is faster
+}
+MODEL {
+    base: "t5-small"
+    device: "cuda"       # Forces GPU use
+}
+```
+**Common problem**: If you have a GPU but see "No CUDA", install a CUDA-enabled PyTorch build:
+```bash
+pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
+```
+### 2. Increase batch_size (if you have the memory)
+```okt
+TRAIN {
+    epochs: 3
+    batch_size: 16       # Increase from 8 to 16 or 32 (if you have the memory)
+    learning_rate: 0.0001
+}
+```
+**Trade-off**:
+- Larger batch = faster training, but uses more memory
+- Smaller batch = slower training, but uses less memory
+### 3. Use gradient_accumulation to simulate a larger batch
+If you do not have the memory for a large batch, use gradient accumulation:
+```okt
+TRAIN {
+    epochs: 3
+    batch_size: 8
+    gradient_accumulation: 4    # Effective batch_size = 8 * 4 = 32
+    learning_rate: 0.0001
+}
+```
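The arithmetic behind gradient accumulation can be checked with a few lines of Python. This is a sketch under stated assumptions: `training_steps` is a hypothetical helper, the 582-example figure is the dataset size mentioned elsewhere in this file, and a final partial accumulation window is assumed to still trigger an optimizer update (frameworks differ on this).

```python
import math


def training_steps(num_examples, batch_size, gradient_accumulation=1, epochs=1):
    """Estimate batch and optimizer-update counts for a training run."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    # Assumption: a trailing partial accumulation window still counts as one update.
    updates_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation)
    return {
        "effective_batch_size": batch_size * gradient_accumulation,
        "batches_per_epoch": batches_per_epoch,
        "optimizer_updates": updates_per_epoch * epochs,
    }


print(training_steps(582, 8, gradient_accumulation=4, epochs=3))
# {'effective_batch_size': 32, 'batches_per_epoch': 73, 'optimizer_updates': 57}
```

Memory use per forward pass still follows `batch_size` (8 here); only the number of examples contributing to each weight update grows to 32.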
+### 4. Reduce input size (if possible)
+If your inputs are very long (such as an embedded menu JSON), consider:
+- **Using context_fields**: Move long information into context fields
+- **Truncating inputs**: The tokenizer already does this (max_length: 512), but shorter inputs are faster
+### 5. Use smaller models for tests
+For quick development and testing:
+```okt
+MODEL {
+    base: "t5-small"      # Faster
+    # base: "t5-base"     # Slower, but better quality
+}
+```
+### 6. Reduce epochs for tests
+```okt
+TRAIN {
+    epochs: 1             # For quick tests
+    # epochs: 3          # For real training
+    batch_size: 8
+}
+```
+## Analysis of Your Case
+Based on your dataset (582 examples, long inputs with embedded menu JSON):
+### Why is it slow?
+1. **No CUDA**: You are training on CPU, which is ~10-50x slower than GPU
+2. **Small batch size (8)**: With 582 examples and batch 8, that is ~73 steps per epoch
+3. **Long inputs**: The embedded menu JSON increases processing time
+4. **T5 model**: T5-small is relatively heavy for CPU
+### Immediate fixes:
+1. **Install a CUDA-enabled PyTorch build** (if you have an NVIDIA GPU):
+   ```bash
+   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
+   ```
+2. **Increase batch_size** (if you have the RAM):
+   ```okt
+   TRAIN {
+       epochs: 3
+       batch_size: 16    # Double the batch_size
+       logging_steps: 5  # More frequent logs
+   }
+   ```
+3. **Use context_fields** (recommended):
+   ```okt
+   DATASET {
+       train: "dataset/train.jsonl"
+       input_field: "input"
+       output_field: "target"
+       context_fields: ["menu"]  # Moves the menu into context
+   }
+   ```
+   In the dataset, keep the menu in its own field:
+   ```jsonl
+   {"input": "What pizzas do you have?", "target": "...", "menu": "{\"Margherita\":34,...}"}
+   ```
+4. **Reduce epochs for tests**:
+   ```okt
+   TRAIN {
+       epochs: 1         # Quick test
+       batch_size: 16
+       logging_steps: 5
+   }
+   ```
+## Expected Time
+### CPU (your current case):
+- **582 examples, batch 8, 3 epochs**: ~30-60 minutes
+- **With batch 16**: ~15-30 minutes
+### GPU (with CUDA):
+- **582 examples, batch 8, 3 epochs**: ~3-5 minutes
+- **With batch 16**: ~2-3 minutes
+## Optimized Example
+```okt
+# okto_version: "1.2"
+PROJECT "pizzaria_optimized"
+ENV {
+    accelerator: "gpu"
+    precision: "fp16"
+    backend: "oktoseek"
+}
+DATASET {
+    train: "dataset/train.jsonl"
+    validation: "dataset/val.jsonl"
+    input_field: "input"
+    output_field: "target"
+    context_fields: ["menu"]  # Menu kept separate as context
+}
+MODEL {
+    base: "t5-small"
+    device: "cuda"
+}
+TRAIN {
+    epochs: 3
+    batch_size: 16           # Increased
+    learning_rate: 0.0001
+    logging_steps: 5         # More frequent logs
+    save_steps: 50           # Saves checkpoints more often
+}
+EXPORT {
+    format: ["okm"]
+    path: "export/"
+}
+```
+---
+**Last updated**: 2024

docs/grammar.md CHANGED

@@ -349,6 +349,8 @@ When OktoEngine encounters an ENV block, it must:
       [<dataset_percent>]
       [<dataset_sampling>]
       [<dataset_shuffle>]
   "}"
 <dataset_train> ::=
@@ -389,6 +391,15 @@ When OktoEngine encounters an ENV block, it must:
 <dataset_shuffle> ::=
   "shuffle" ":" ("true" | "false")
 ```
 **Allowed augmentation values:**
@@ -434,6 +445,51 @@ DATASET {
 }
 ```
 **Dataset Mixing Rules:**
 - If `mix_datasets` is specified, it overrides `train`
 - Total weights in `mix_datasets` must equal 100
@@ -686,6 +742,8 @@ FT_LORA {
       [<gradient_clip>]
       [<warmup_steps>]
       [<save_strategy>]
   "}"
 <train_epochs> ::=
@@ -735,6 +793,12 @@ FT_LORA {
 <save_strategy> ::=
   "save_strategy" ":" ("steps" | "epoch" | "no")
 ```
 **Allowed values and constraints:**
@@ -792,6 +856,8 @@ TRAIN {
     gradient_clip: 1.0
     warmup_steps: 500
     save_strategy: "steps"
 }
 ```
@@ -2563,8 +2629,18 @@ See [`../examples/`](../examples/) for complete working examples:
 **OktoScript** is a domain-specific programming language developed by **OktoSeek AI** for building, training, evaluating and exporting AI models. It is part of the OktoSeek ecosystem, which includes OktoSeek IDE, OktoEngine, and various tools for AI development.
 For more information, visit:
 - **Official website:** https://www.oktoseek.com
 - **GitHub:** https://github.com/oktoseek/oktoscript
 - **Hugging Face:** https://huggingface.co/OktoSeek
 - **Twitter:** https://x.com/oktoseek

       [<dataset_percent>]
       [<dataset_sampling>]
       [<dataset_shuffle>]
+      [<dataset_input_field>]
+      [<dataset_output_field>]
+      [<dataset_context_fields>]
   "}"
 <dataset_train> ::=
 <dataset_shuffle> ::=
   "shuffle" ":" ("true" | "false")
+<dataset_input_field> ::=
+  "input_field" ":" <string>
+<dataset_output_field> ::=
+  ("output_field" | "target_field") ":" <string>
+<dataset_context_fields> ::=
+  "context_fields" ":" "[" <string_list> "]"
 ```
 **Allowed augmentation values:**
 }
 ```
+**Example (v1.2 - Custom Field Names):**
+```okt
+DATASET {
+    train: "dataset/train.jsonl"
+    validation: "dataset/val.jsonl"
+    format: "jsonl"
+    type: "chat"
+    input_field: "input"
+    output_field: "target"
+}
+```
+**Example (v1.2 - With Context Fields):**
+```okt
+DATASET {
+    train: "dataset/pizzaria.jsonl"
+    validation: "dataset/val.jsonl"
+    format: "jsonl"
+    type: "chat"
+    input_field: "input"
+    output_field: "target"
+    context_fields: ["menu", "drinks", "promotions"]
+}
+```
+**Dataset JSONL with context:**
+```jsonl
+{"input": "What pizzas do you have?", "target": "We have Margherita, Pepperoni, and Four Cheese.", "menu": "Margherita: $34, Pepperoni: $39, Four Cheese: $45", "drinks": "Coke, Sprite, Water"}
+{"input": "Do you have drinks?", "target": "Yes, we have Coke, Sprite, and Water.", "menu": "Margherita: $34, Pepperoni: $39", "drinks": "Coke, Sprite, Water"}
+```
+The context fields will be automatically included in the prompt:
+- Input: `menu: Margherita: $34, Pepperoni: $39, Four Cheese: $45 | drinks: Coke, Sprite, Water | What pizzas do you have?`
+- Target: `We have Margherita, Pepperoni, and Four Cheese.`
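The prompt layout shown above can be sketched in a few lines of Python. This is only an illustration of the `name: value | ... | input` pattern from the example, not OktoEngine's actual code; the function name `build_prompt` and the choice to skip context fields that are missing from a record are assumptions.

```python
def build_prompt(record, input_field="input", context_fields=()):
    """Prefix the input text with 'name: value' pairs for each context field.

    Assumption: a context field that is absent (or empty) in a record is
    simply skipped, which matches "included in the prompt if present".
    """
    parts = []
    for name in context_fields:
        value = record.get(name)
        if value:
            parts.append(f"{name}: {value}")
    parts.append(record[input_field])
    # The " | " separator is taken from the example prompt in the grammar doc.
    return " | ".join(parts)


# Second record from the JSONL example; it has no "promotions" field.
record = {
    "input": "Do you have drinks?",
    "target": "Yes, we have Coke, Sprite, and Water.",
    "menu": "Margherita: $34, Pepperoni: $39",
    "drinks": "Coke, Sprite, Water",
}
prompt = build_prompt(record, "input", ["menu", "drinks", "promotions"])
print(prompt)
```

For this record the sketch prints `menu: Margherita: $34, Pepperoni: $39 | drinks: Coke, Sprite, Water | Do you have drinks?`.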
+**Field Name Resolution (v1.2+):**
+- If `input_field` and `output_field` are specified, use those exact field names
+- If not specified, defaults are tried in order:
+  1. `"input"` + `"output"` (standard format)
+  2. `"input"` + `"target"` (common alternative)
+  3. `"text"` (single field, used for both input and output)
+  4. First string field in dataset (fallback)
+- `context_fields` are optional and will be included in the prompt if present
+- This ensures backward compatibility while allowing full customization
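The resolution order above can be read as a small lookup routine. The Python sketch below is a hedged illustration, not the engine's implementation: `resolve_fields` is a hypothetical name, and two details are assumptions that the rules do not spell out (falling back to defaults when only one of the two fields is given, and using the first string field for both input and output in the last fallback).

```python
def resolve_fields(record, input_field=None, output_field=None):
    """Return the (input, output) field names to use for one dataset record."""
    # Explicit names win when both are specified.
    if input_field and output_field:
        return input_field, output_field
    # Defaults 1 and 2: "input" + "output", then "input" + "target".
    for inp, out in (("input", "output"), ("input", "target")):
        if inp in record and out in record:
            return inp, out
    # Default 3: a single "text" field serves as both input and output.
    if "text" in record:
        return "text", "text"
    # Default 4: first string-valued field (assumed to serve both roles).
    for key, value in record.items():
        if isinstance(value, str):
            return key, key
    raise ValueError("no usable string field found in record")


print(resolve_fields({"input": "Hi", "target": "Hello!"}))
print(resolve_fields({"question": "Hi", "answer": "Hello!"}, "question", "answer"))
```

Trying `"input"` + `"output"` before `"input"` + `"target"` means a record that carries all three fields resolves to the standard pair.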
 **Dataset Mixing Rules:**
 - If `mix_datasets` is specified, it overrides `train`
 - Total weights in `mix_datasets` must equal 100
       [<gradient_clip>]
       [<warmup_steps>]
       [<save_strategy>]
+      [<logging_steps>]
+      [<save_steps>]
   "}"
 <train_epochs> ::=
 <save_strategy> ::=
   "save_strategy" ":" ("steps" | "epoch" | "no")
+<logging_steps> ::=
+  "logging_steps" ":" <number>
+<save_steps> ::=
+  "save_steps" ":" <number>
 ```
 **Allowed values and constraints:**
     gradient_clip: 1.0
     warmup_steps: 500
     save_strategy: "steps"
+    logging_steps: 5    # Log every 5 steps (default: 10)
+    save_steps: 500     # Save checkpoint every 500 steps (default: 500)
 }
 ```
 **OktoScript** is a domain-specific programming language developed by **OktoSeek AI** for building, training, evaluating and exporting AI models. It is part of the OktoSeek ecosystem, which includes OktoSeek IDE, OktoEngine, and various tools for AI development.
+### 🌐 OktoScript Web Editor
+Try OktoScript online with the **OktoScript Web Editor** at [https://oktoseek.com/editor.php](https://oktoseek.com/editor.php). The editor features:
+- **Smart Autocomplete** – Context-aware suggestions based on the current block
+- **Real-time Syntax Validation** – Detects errors like nested blocks and missing braces
+- **CLI Integration** – Use the `okto web` command to open files directly
+- **Auto-save to Local** – When you load a file, changes are saved back to the same location
 For more information, visit:
 - **Official website:** https://www.oktoseek.com
+- **Web Editor:** https://oktoseek.com/editor.php
 - **GitHub:** https://github.com/oktoseek/oktoscript
 - **Hugging Face:** https://huggingface.co/OktoSeek
 - **Twitter:** https://x.com/oktoseek

examples/MODEL_NAMES.md ADDED Viewed

	@@ -0,0 +1,71 @@

+# Valid Model Names on Hugging Face
+## T5 Models
+| Hugging Face name | Size | Description |
+|-------------------|------|-------------|
+| `t5-small` | 60M | Small T5 - fast for tests |
+| `t5-base` | 220M | Base T5 - good balance |
+| `t5-large` | 770M | Large T5 - better quality |
+| `google/flan-t5-small` | 60M | Small Flan-T5 |
+| `google/flan-t5-base` | 220M | Base Flan-T5 |
+| `google/flan-t5-large` | 780M | Large Flan-T5 |
+## GPT Models (Causal LM)
+| Hugging Face name | Size | Description |
+|-------------------|------|-------------|
+| `gpt2` | 124M | Standard GPT-2 |
+| `distilgpt2` | 82M | Distilled GPT-2 - faster |
+| `microsoft/DialoGPT-small` | 117M | Small DialoGPT |
+| `EleutherAI/gpt-neo-125M` | 125M | Small GPT-Neo |
+| `facebook/opt-125m` | 125M | Small OPT |
+## ⚠️ Common Mistake
+**❌ WRONG:**
+```okt
+MODEL {
+  base: "google/t5-small"  # ❌ Does not exist!
+}
+```
+**✅ CORRECT:**
+```okt
+MODEL {
+  base: "t5-small"  # ✅ Correct!
+}
+```
+## How to Check Whether a Model Exists
+1. Go to: https://huggingface.co/models
+2. Search for the model name
+3. Check the exact name in the URL or on the model page
+## Usage Examples
+### T5 for Translation/Summarization
+```okt
+MODEL {
+  base: "t5-small"
+}
+```
+### Flan-T5 for Chat/Instructions
+```okt
+MODEL {
+  base: "google/flan-t5-base"
+}
+```
+### GPT-2 for Text Generation
+```okt
+MODEL {
+  base: "gpt2"
+}
+```

examples/QUICK_FIX.md ADDED Viewed

	@@ -0,0 +1,82 @@

+# 🔧 Quick Fix - Parsing Error
+## Problem: "Failed to parse file"
+### Quick Solution:
+1. **Run the validate command first to see the detailed error:**
+```bash
+okto validate scripts/train.okt
+```
+2. **Check the file encoding:**
+   - In VSCode: look at the bottom-right corner → it should show "UTF-8"
+   - If it is not UTF-8, click it and select "Save with Encoding" → "UTF-8"
+3. **Copy a clean example file:**
+```bash
+# Copy the clean example
+cp oktoscript/examples/test-t5-basic-clean.okt scripts/train.okt
+```
+4. **Or create it manually with this minimal content:**
+```okt
+# okto_version: "1.2"
+PROJECT "test_t5_basic"
+ENV {
+  accelerator: "gpu"
+  min_memory: "4GB"
+  install_missing: true
+}
+DATASET {
+  train: "dataset/train.jsonl"
+  validation: "dataset/val.jsonl"
+}
+MODEL {
+  base: "t5-small"
+}
+TRAIN {
+  epochs: 3
+  batch_size: 8
+  learning_rate: 0.0001
+}
+EXPORT {
+  format: ["okm"]
+  path: "export/"
+}
+```
+### ⚠️ Common Problems:
+1. **Windows Notepad** adds a BOM (Byte Order Mark)
+   - **Solution:** Use VSCode or Notepad++
+2. **Special characters** in comments or strings
+   - **Solution:** Use only ASCII or valid UTF-8
+3. **Curly quotes** `“` or `”` instead of straight quotes `"`
+   - **Solution:** Always use straight quotes
+4. **Invisible spaces** or control characters
+   - **Solution:** Retype the file or use an editor that shows invisible characters
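The problems listed above (BOM, invalid UTF-8, curly quotes, invisible characters) can also be checked mechanically before running `okto validate`. The Python sketch below is not part of the OktoScript toolchain; `find_problems` is a hypothetical helper that works on the raw bytes of a `.okt` file and relies only on the standard library.

```python
import unicodedata

CURLY_QUOTES = {"\u201c", "\u201d", "\u2018", "\u2019"}


def find_problems(data: bytes):
    """Return (line, column, description) tuples for suspicious content."""
    problems = []
    # A UTF-8 BOM is the three bytes EF BB BF at the very start of the file.
    if data.startswith(b"\xef\xbb\xbf"):
        problems.append((1, 1, "UTF-8 BOM at start of file"))
        data = data[3:]
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append((0, 0, f"not valid UTF-8: {exc}"))
        return problems
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in CURLY_QUOTES:
                problems.append((lineno, col, "curly quote"))
            # Cc = control characters, Cf = format characters (e.g. zero-width space).
            elif ch != "\t" and unicodedata.category(ch) in ("Cc", "Cf"):
                problems.append((lineno, col, f"invisible character U+{ord(ch):04X}"))
    return problems


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as handle:
            for lineno, col, message in find_problems(handle.read()):
                print(f"{sys.argv[1]}:{lineno}:{col}: {message}")
```

Saved under a name of your choice and run with the path to a script as its only argument, it prints one line per finding and nothing for a clean file.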
+### ✅ Quick Test:
+```bash
+# 1. Validate
+okto validate scripts/train.okt
+# 2. If it validates, train
+okto train scripts/train.okt
+```

examples/README.md CHANGED Viewed

@@ -38,6 +38,23 @@ These examples are used by:
 | [`lora-finetuning.okt`](./lora-finetuning.okt) | LoRA fine-tuning with dataset mixing | Efficient fine-tuning, memory-efficient training |
 | [`dataset-mixing.okt`](./dataset-mixing.okt) | Training with multiple weighted datasets | Combining datasets, weighted sampling |
 ### v1.2 Examples (Advanced Features)
 | File | Description | Use Case |

 | [`lora-finetuning.okt`](./lora-finetuning.okt) | LoRA fine-tuning with dataset mixing | Efficient fine-tuning, memory-efficient training |
 | [`dataset-mixing.okt`](./dataset-mixing.okt) | Training with multiple weighted datasets | Combining datasets, weighted sampling |
+### 🧪 Test Scripts (Recommended for Testing)
+These scripts are specifically designed for testing different features of OktoScript v1.2:
+| File | Description | Features Tested |
+|------|-------------|-----------------|
+| [`test-t5-basic.okt`](./test-t5-basic.okt) | Basic training | PROJECT, ENV, DATASET, MODEL, TRAIN, EXPORT |
+| [`test-t5-monitor.okt`](./test-t5-monitor.okt) | Training with MONITOR | Full metrics tracking, notifications |
+| [`test-t5-control.okt`](./test-t5-control.okt) | Training with CONTROL | Automatic decisions, IF/WHEN/EVERY |
+| [`test-flan-t5-complete.okt`](./test-flan-t5-complete.okt) | All advanced blocks | MONITOR, CONTROL, STABILITY together |
+| [`test-flan-t5-inference.okt`](./test-flan-t5-inference.okt) | Inference with governance | BEHAVIOR, GUARD, INFERENCE blocks |
+| [`test-t5-explorer.okt`](./test-t5-explorer.okt) | AutoML with EXPLORER | Hyperparameter search, best model selection |
+📖 **See [`TESTING_GUIDE.md`](./TESTING_GUIDE.md) for detailed testing instructions.**
+---
 ### v1.2 Examples (Advanced Features)
 | File | Description | Use Case |

examples/TESTING_GUIDE.md ADDED Viewed

	@@ -0,0 +1,227 @@

+# Testing Guide - OktoScript v1.2
+This guide lists all available test scripts and explains how to use them to validate different OktoScript features.
+## 📋 Available Test Scripts
+### 1. `test-t5-basic.okt` - Basic Training
+**Goal:** Test simple training without advanced blocks
+**Model:** `t5-small`
+**Blocks used:**
+- PROJECT
+- ENV
+- DATASET
+- MODEL
+- TRAIN
+- EXPORT
+**How to test:**
+```bash
+okto validate examples/test-t5-basic.okt
+okto train examples/test-t5-basic.okt
+```
+**What to verify:**
+- ✅ Training starts without errors
+- ✅ The model is saved in `runs/test_t5_basic/`
+- ✅ Export works for `okm` and `safetensors`
+---
+### 2. `test-t5-monitor.okt` - Metrics Monitoring
+**Goal:** Test the MONITOR block with full tracking
+**Model:** `t5-small`
+**Blocks used:**
+- MONITOR (full)
+- Metrics: loss, val_loss, accuracy, perplexity, gpu_usage, ram_usage, throughput, latency
+**How to test:**
+```bash
+okto validate examples/test-t5-monitor.okt
+okto train examples/test-t5-monitor.okt
+okto logs test_t5_monitor
+```
+**What to verify:**
+- ✅ Metrics are collected during training
+- ✅ The file `logs/training_monitor.log` is created
+- ✅ Notifications are generated when conditions are met
+---
+### 3. `test-t5-control.okt` - Control and Decisions
+**Goal:** Test the CONTROL block with conditional logic
+**Model:** `t5-small`
+**Blocks used:**
+- CONTROL (full)
+- Events: on_step_end, on_epoch_end
+- Directives: IF, WHEN, EVERY, SET, STOP_TRAINING, DECREASE, SAVE, LOG
+**How to test:**
+```bash
+okto validate examples/test-t5-control.okt
+okto train examples/test-t5-control.okt
+okto logs test_t5_control
+```
+**What to verify:**
+- ✅ Logs are generated at every step/epoch
+- ✅ The learning rate is adjusted automatically when loss > 2.0
+- ✅ Training stops when val_loss > 2.5
+- ✅ Checkpoints are saved every 500 steps
+- ✅ The file `control_decisions.json` is created in `runs/test_t5_control/`
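To make the expected behaviour easier to check by hand, the three `IF` rules from `test-t5-control.okt` can be mirrored in plain Python. This is a rough sketch under assumptions, not how OktoEngine evaluates CONTROL blocks: the function name `apply_control_rules`, the `state` dictionary, and the top-to-bottom evaluation order are all hypothetical. The thresholds and the 50% reduction come from the script and its log message.

```python
def apply_control_rules(metrics, state):
    """Apply the IF rules of test-t5-control.okt to one set of metrics.

    `state` holds the current learning rate and a stop flag; the list of
    triggered directives is returned so it can be compared with the logs.
    """
    actions = []
    if metrics["loss"] > 2.0:
        state["lr"] = 0.00005          # SET LR = 0.00005
        actions.append("SET LR = 0.00005")
    if metrics["val_loss"] > 2.5:
        state["stop"] = True           # STOP_TRAINING
        actions.append("STOP_TRAINING")
    if metrics["accuracy"] < 0.4:
        state["lr"] *= 0.5             # DECREASE LR BY 0.5 (logged as "by 50%")
        actions.append("DECREASE LR BY 0.5")
    return actions


# Hypothetical metric values, chosen only to trigger the first rule.
state = {"lr": 0.0001, "stop": False}
print(apply_control_rules({"loss": 2.3, "val_loss": 1.9, "accuracy": 0.55}, state))
print(state)
```

Comparing the returned directives with the entries in `control_decisions.json` is one way to confirm that each rule fired when its condition was met.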
+---
+### 4. `test-flan-t5-complete.okt` - All Blocks
+**Goal:** Test all advanced blocks together
+**Model:** `google/flan-t5-base`
+**Blocks used:**
+- MONITOR (full)
+- CONTROL (full, with nested logic)
+- STABILITY
+- EXPORT
+**How to test:**
+```bash
+okto validate examples/test-flan-t5-complete.okt
+okto train examples/test-flan-t5-complete.okt
+okto logs test_flan_t5_complete
+```
+**What to verify:**
+- ✅ All blocks work together
+- ✅ Nested logic in CONTROL works (IF inside on_epoch_end)
+- ✅ STABILITY prevents NaN and divergence
+- ✅ Full metrics are collected
+---
+### 5. `test-flan-t5-inference.okt` - Inference with Governance
+**Goal:** Test inference with BEHAVIOR, GUARD, and INFERENCE
+**Model:** `google/flan-t5-base`
+**Blocks used:**
+- BEHAVIOR (personality, language, avoid)
+- GUARD (prevent, detect_using, on_violation)
+- INFERENCE (mode, format, params, nested CONTROL)
+**How to test:**
+```bash
+# Train first
+okto train examples/test-flan-t5-inference.okt
+# Test inference
+okto infer --model export/test_flan_t5_inference --text "Olá, como você está?"
+# Test interactive chat
+okto chat --model export/test_flan_t5_inference
+```
+**What to verify:**
+- ✅ The model respects BEHAVIOR (personality, language)
+- ✅ GUARD blocks toxic/inappropriate content
+- ✅ INFERENCE uses the correct format
+- ✅ CONTROL inside INFERENCE works (RETRY, REGENERATE)
+---
+### 6. `test-t5-explorer.okt` - Basic AutoML
+**Goal:** Test the EXPLORER block for hyperparameter search
+**Model:** `t5-small`
+**Blocks used:**
+- EXPLORER (try, max_tests, pick_best_by)
+- MONITOR
+**How to test:**
+```bash
+okto validate examples/test-t5-explorer.okt
+okto train examples/test-t5-explorer.okt
+```
+**What to verify:**
+- ✅ Multiple hyperparameter combinations are tested
+- ✅ The best model is selected by val_loss
+- ✅ Logs show the results of each test
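When reading the EXPLORER logs it helps to know how large the search space is. The Python sketch below enumerates the `try` grid from `test-t5-explorer.okt` and caps it at `max_tests`. How OktoEngine actually chooses which combinations to run is not documented here, so taking the first `max_tests` entries of the grid is purely an assumption, as are the helper names `candidate_configs` and `pick_best`.

```python
from itertools import product

# Values copied from the EXPLORER block in test-t5-explorer.okt.
search_space = {
    "lr": [0.001, 0.0005, 0.0001],
    "batch_size": [4, 8, 16],
    "optimizer": ["adamw", "sgd"],
}
max_tests = 5


def candidate_configs(space, limit):
    """Return (all combinations, the first `limit` of them) as dicts."""
    keys = list(space)
    combos = [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]
    return combos, combos[:limit]


def pick_best(results, metric="val_loss"):
    """Pick the trial with the lowest value of `metric` (lower is better)."""
    return min(results, key=lambda result: result[metric])


all_combos, trials = candidate_configs(search_space, max_tests)
print(len(all_combos), "possible combinations,", len(trials), "trials to run")
for trial in trials:
    print(trial)
```

The full grid has 3 × 3 × 2 = 18 combinations, so with `max_tests: 5` the logs should show at most five trials, each with its own `val_loss`.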
+---
+## 🧪 Recommended Test Sequence
+### Phase 1: Basic Tests
+1. `test-t5-basic.okt` - Validate the basic pipeline
+2. `test-t5-monitor.okt` - Validate monitoring
+### Phase 2: Control Tests
+3. `test-t5-control.okt` - Validate automatic decisions
+4. `test-flan-t5-complete.okt` - Validate full integration
+### Phase 3: Advanced Tests
+5. `test-flan-t5-inference.okt` - Validate governed inference
+6. `test-t5-explorer.okt` - Validate AutoML
+---
+## 📊 Validation Checklist
+For each test, check that:
+- [ ] The script validates without errors (`okto validate`)
+- [ ] Training starts correctly
+- [ ] The specific blocks work as expected
+- [ ] Logs are generated correctly
+- [ ] Export works for the specified format
+- [ ] Files are saved in the correct locations
+---
+## 🔍 Useful Commands
+```bash
+# Validate a script
+okto validate examples/test-t5-basic.okt
+# Train
+okto train examples/test-t5-basic.okt
+# View logs
+okto logs test_t5_basic
+# Inference
+okto infer --model export/test_t5_basic --text "Hello"
+# Interactive chat
+okto chat --model export/test_t5_basic
+# Show the script contents
+okto show examples/test-t5-basic.okt
+```
+---
+## 📝 Notes
+- All tests use `dataset/train.jsonl` and `dataset/val.jsonl`
+- Make sure you have test data before running
+- T5 models are smaller and faster for tests
+- Flan-T5 models are better for inference and chat
+- Adjust `batch_size` and `epochs` to suit your GPU
+---
+**Good luck with the tests! 🚀**

examples/TROUBLESHOOTING.md ADDED Viewed

	@@ -0,0 +1,114 @@

+# Troubleshooting - Common Problems
+## Error: "Failed to parse file"
+### Possible Causes:
+1. **File encoding**
+   - The file must be **UTF-8 without BOM**
+   - Avoid saving with Windows Notepad (it may add a BOM)
+   - Use VSCode, Notepad++, or another editor that supports UTF-8
+2. **Invisible characters**
+   - Copy/paste can introduce invisible characters
+   - Retype the file or use an editor that shows invisible characters
+3. **Problems with comments**
+   - Comments must start with `#` at the beginning of the line
+   - Avoid special characters in comments
+4. **Incorrect quotes**
+   - Use straight quotes `"`, not curly quotes `“` or `”`
+   - Check that every quote is closed
+### Solutions:
+#### 1. Validate the file first:
+```bash
+okto validate scripts/train.okt
+```
+This shows detailed errors.
+#### 2. Check the encoding in VSCode:
+- Open the file in VSCode
+- Look at the bottom-right corner: it should show "UTF-8"
+- If it shows another encoding, click it and select "Save with Encoding" → "UTF-8"
+#### 3. Create a clean file:
+```bash
+# Copy the contents of the example
+cp oktoscript/examples/test-t5-basic.okt scripts/train.okt
+# Or create it manually
+```
+#### 4. Check the basic syntax:
+- All strings must be in quotes: `"value"`
+- Arrays must use square brackets: `["okm", "safetensors"]`
+- Blocks must have braces: `{ ... }`
+- Do not use trailing commas in arrays or objects
+### Example of a correct file:
+```okt
+# okto_version: "1.2"
+PROJECT "test_t5_basic"
+DESCRIPTION "Basic test"
+ENV {
+  accelerator: "gpu"
+  min_memory: "4GB"
+  install_missing: true
+}
+DATASET {
+  train: "dataset/train.jsonl"
+  validation: "dataset/val.jsonl"
+}
+MODEL {
+  base: "t5-small"
+}
+TRAIN {
+  epochs: 3
+  batch_size: 8
+  learning_rate: 0.0001
+}
+EXPORT {
+  format: ["okm"]
+  path: "export/"
+}
+```
+### Checklist:
+- [ ] The file is in UTF-8
+- [ ] All quotes are closed
+- [ ] There are no invisible special characters
+- [ ] The syntax is correct (braces, brackets, etc.)
+- [ ] `okto validate` passes without errors
+### If it still does not work:
+1. Run with `--debug` (if available):
+```bash
+okto validate scripts/train.okt --debug
+```
+2. Check the file contents:
+```bash
+okto show scripts/train.okt
+```
+3. Compare with a working example:
+```bash
+okto validate oktoscript/examples/test-t5-basic.okt
+```

examples/pizzabot/okt.yaml CHANGED Viewed

@@ -10,7 +10,7 @@ structure:
   export_dir: "export/"
 default_language: "en"
-license: "Apache-2.0"

   export_dir: "export/"
 default_language: "en"
+license: "OktoScript License 1.0"

examples/test-flan-t5-complete.okt ADDED Viewed

	@@ -0,0 +1,112 @@

+# okto_version: "1.2"
+# Test 4: Complete Flan-T5 - All Blocks
+# Model: google/flan-t5-base
+# Goal: Test all advanced blocks together
+PROJECT "test_flan_t5_complete"
+DESCRIPTION "Complete Flan-T5 test with all v1.2 blocks"
+ENV {
+  accelerator: "gpu"
+  min_memory: "8GB"
+  precision: "fp16"
+  backend: "oktoseek"
+  install_missing: true
+}
+DATASET {
+  train: "dataset/train.jsonl"
+  validation: "dataset/val.jsonl"
+}
+MODEL {
+  base: "google/flan-t5-base"
+  device: "auto"
+}
+TRAIN {
+  epochs: 5
+  batch_size: 16
+  learning_rate: 0.0001
+  device: "auto"
+}
+MONITOR {
+  metrics: [
+    "loss",
+    "val_loss",
+    "accuracy",
+    "perplexity",
+    "gpu_usage",
+    "ram_usage",
+    "throughput",
+    "latency",
+    "confidence"
+  ]
+  notify_if {
+    loss > 2.0
+    val_loss > 2.5
+    gpu_usage > 90%
+    ram_usage > 80%
+  }
+  log_to: "logs/training_complete.log"
+}
+CONTROL {
+  on_step_end {
+    LOG loss
+  }
+  on_epoch_end {
+    SAVE model
+    LOG "Epoch completed"
+    IF loss > 1.5 {
+      SET LR = 0.00005
+      LOG "Loss still high after epoch - reducing LR"
+    }
+    IF accuracy > 0.9 {
+      SAVE "best_model"
+      LOG "High accuracy reached - saving best model"
+    }
+  }
+  validate_every: 200
+  IF loss > 2.0 {
+    SET LR = 0.00005
+    LOG "High loss detected"
+  }
+  IF val_loss > 2.5 {
+    STOP_TRAINING
+    LOG "Validation loss too high"
+  }
+  WHEN gpu_memory < 12GB {
+    SET batch_size = 8
+    LOG "Reducing batch size due to GPU pressure"
+  }
+  EVERY 1000 steps {
+    SAVE checkpoint
+    LOG "Periodic checkpoint"
+  }
+}
+STABILITY {
+  stop_if_nan: true
+  stop_if_diverges: true
+  min_improvement: 0.001
+}
+EXPORT {
+  format: ["okm", "safetensors"]
+  path: "export/"
+}

examples/test-flan-t5-inference.okt ADDED Viewed

	@@ -0,0 +1,94 @@

+# okto_version: "1.2"
+# Test 5: Flan-T5 with INFERENCE and BEHAVIOR
+# Model: google/flan-t5-base
+# Goal: Test inference with behavior control
+PROJECT "test_flan_t5_inference"
+DESCRIPTION "Flan-T5 test with INFERENCE, BEHAVIOR and GUARD"
+ENV {
+  accelerator: "gpu"
+  min_memory: "8GB"
+  precision: "fp16"
+  backend: "oktoseek"
+  install_missing: true
+}
+DATASET {
+  train: "dataset/train.jsonl"
+  validation: "dataset/val.jsonl"
+}
+MODEL {
+  base: "google/flan-t5-base"
+  device: "auto"
+}
+TRAIN {
+  epochs: 3
+  batch_size: 16
+  learning_rate: 0.0001
+  device: "auto"
+}
+BEHAVIOR {
+  mode: "chat"
+  personality: "friendly"
+  verbosity: "medium"
+  language: "pt-BR"
+  avoid: ["violence", "hate", "politics"]
+  fallback: "Como posso ajudar?"
+  prompt_style: "User: {input}\nAssistant:"
+}
+GUARD {
+  prevent {
+    hallucination
+    toxicity
+    bias
+    data_leak
+  }
+  detect_using: ["classifier", "regex", "rule_engine"]
+  on_violation {
+    REPLACE with_message: "Desculpe, essa solicitação não é permitida."
+  }
+}
+INFERENCE {
+  mode: "chat"
+  format: "User: {input}\nAssistant:"
+  exit_command: "/exit"
+  params {
+    max_length: 120
+    temperature: 0.7
+    top_p: 0.9
+    beams: 2
+    do_sample: true
+  }
+  CONTROL {
+    IF confidence < 0.3 {
+      RETRY
+      LOG "Low confidence - retrying"
+    }
+    IF repetition > 3 {
+      REGENERATE
+      LOG "High repetition detected - regenerating"
+    }
+  }
+}
+MONITOR {
+  metrics: ["loss", "val_loss", "accuracy", "confidence"]
+  log_to: "logs/inference_test.log"
+}
+EXPORT {
+  format: ["okm"]
+  path: "export/"
+}

examples/test-pizzaria-context.okt ADDED Viewed

	@@ -0,0 +1,46 @@

+# okto_version: "1.2"
+PROJECT "pizzaria_chatbot"
+DESCRIPTION "Pizzeria chatbot with context fields (menu, drinks, etc.)"
+ENV {
+    accelerator: "gpu"
+    min_memory: "4GB"
+    precision: "fp16"
+    backend: "oktoseek"
+    install_missing: true
+}
+DATASET {
+    train: "dataset/train.jsonl"
+    validation: "dataset/val.jsonl"
+    format: "jsonl"
+    type: "chat"
+    # Main fields
+    input_field: "input"
+    output_field: "target"
+    # Context fields that are automatically included in the prompt
+    context_fields: ["menu", "drinks", "promotions"]
+}
+MODEL {
+    base: "t5-small"
+    device: "auto"
+}
+TRAIN {
+    epochs: 3
+    batch_size: 8
+    learning_rate: 0.0001
+    device: "auto"
+}
+EXPORT {
+    format: ["okm", "safetensors"]
+    path: "export/"
+}

examples/test-t5-basic-clean.okt ADDED Viewed

	@@ -0,0 +1,38 @@

+# okto_version: "1.2"
+PROJECT "test_t5_basic"
+DESCRIPTION "Basic test with T5-small - no advanced blocks"
+ENV {
+  accelerator: "gpu"
+  min_memory: "4GB"
+  precision: "fp16"
+  backend: "oktoseek"
+  install_missing: true
+}
+DATASET {
+  train: "dataset/train.jsonl"
+  validation: "dataset/val.jsonl"
+}
+MODEL {
+  base: "t5-small"
+  device: "auto"
+}
+TRAIN {
+  epochs: 3
+  batch_size: 8
+  learning_rate: 0.0001
+  device: "auto"
+}
+EXPORT {
+  format: ["okm", "safetensors"]
+  path: "export/"
+}

examples/test-t5-basic.okt ADDED Viewed

	@@ -0,0 +1,39 @@

+# okto_version: "1.2"
+# Test 1: Basic T5 - Simple Training
+# Model: t5-small
+# Goal: Test basic training without advanced blocks
+PROJECT "test_t5_basic"
+DESCRIPTION "Basic test with T5-small - no advanced blocks"
+ENV {
+  accelerator: "gpu"
+  min_memory: "4GB"
+  precision: "fp16"
+  backend: "oktoseek"
+  install_missing: true
+}
+DATASET {
+  train: "dataset/train.jsonl"
+  validation: "dataset/val.jsonl"
+}
+MODEL {
+  base: "t5-small"
+  device: "auto"
+}
+TRAIN {
+  epochs: 3
+  batch_size: 8
+  learning_rate: 0.0001
+  device: "auto"
+}
+EXPORT {
+  format: ["okm", "safetensors"]
+  path: "export/"
+}

examples/test-t5-control.okt ADDED Viewed

	@@ -0,0 +1,77 @@

+# okto_version: "1.2"
+# Test 3: T5 with CONTROL - Automatic Decisions
+# Model: t5-small
+# Goal: Test the CONTROL block with conditional logic
+PROJECT "test_t5_control"
+DESCRIPTION "T5 test with the CONTROL block - automatic decisions during training"
+ENV {
+  accelerator: "gpu"
+  min_memory: "4GB"
+  precision: "fp16"
+  backend: "oktoseek"
+  install_missing: true
+}
+DATASET {
+  train: "dataset/train.jsonl"
+  validation: "dataset/val.jsonl"
+}
+MODEL {
+  base: "t5-small"
+  device: "auto"
+}
+TRAIN {
+  epochs: 5
+  batch_size: 8
+  learning_rate: 0.0001
+  device: "auto"
+}
+CONTROL {
+  on_step_end {
+    LOG loss
+  }
+  on_epoch_end {
+    SAVE model
+    LOG "Epoch completed"
+  }
+  validate_every: 100
+  IF loss > 2.0 {
+    SET LR = 0.00005
+    LOG "High loss detected - reducing learning rate"
+  }
+  IF val_loss > 2.5 {
+    STOP_TRAINING
+    LOG "Validation loss too high - stopping training"
+  }
+  IF accuracy < 0.4 {
+    DECREASE LR BY 0.5
+    LOG "Low accuracy - decreasing learning rate by 50%"
+  }
+  WHEN gpu_memory < 8GB {
+    SET batch_size = 4
+    LOG "Low GPU memory - reducing batch size"
+  }
+  EVERY 500 steps {
+    SAVE checkpoint
+    LOG "Checkpoint saved"
+  }
+}
+EXPORT {
+  format: ["okm"]
+  path: "export/"
+}

examples/test-t5-custom-fields.okt ADDED Viewed

	@@ -0,0 +1,42 @@

+# okto_version: "1.2"
+PROJECT "test_t5_custom_fields"
+DESCRIPTION "Example using custom dataset fields (input_field and output_field)"
+ENV {
+    accelerator: "gpu"
+    min_memory: "4GB"
+    precision: "fp16"
+    backend: "oktoseek"
+    install_missing: true
+}
+DATASET {
+    train: "dataset/train.jsonl"
+    validation: "dataset/val.jsonl"
+    format: "jsonl"
+    type: "chat"
+    # Custom fields: define which dataset columns to use
+    input_field: "input"
+    output_field: "target"
+}
+MODEL {
+    base: "t5-small"
+    device: "auto"
+}
+TRAIN {
+    epochs: 3
+    batch_size: 8
+    learning_rate: 0.0001
+    device: "auto"
+}
+EXPORT {
+    format: ["okm", "safetensors"]
+    path: "export/"
+}

examples/test-t5-explorer.okt ADDED Viewed

	@@ -0,0 +1,54 @@

+# okto_version: "1.2"
+# Test 6: T5 with EXPLORER - Basic AutoML
+# Model: t5-small
+# Goal: Test the EXPLORER block for hyperparameter search
+PROJECT "test_t5_explorer"
+DESCRIPTION "T5 test with EXPLORER - automatic hyperparameter search"
+ENV {
+  accelerator: "gpu"
+  min_memory: "4GB"
+  precision: "fp16"
+  backend: "oktoseek"
+  install_missing: true
+}
+DATASET {
+  train: "dataset/train.jsonl"
+  validation: "dataset/val.jsonl"
+}
+MODEL {
+  base: "t5-small"
+  device: "auto"
+}
+TRAIN {
+  epochs: 3
+  batch_size: 8
+  learning_rate: 0.0001
+  device: "auto"
+}
+EXPLORER {
+  try {
+    lr: [0.001, 0.0005, 0.0001]
+    batch_size: [4, 8, 16]
+    optimizer: ["adamw", "sgd"]
+  }
+  max_tests: 5
+  pick_best_by: "val_loss"
+}
+MONITOR {
+  metrics: ["loss", "val_loss", "accuracy"]
+  log_to: "logs/explorer_test.log"
+}
+EXPORT {
+  format: ["okm"]
+  path: "export/"
+}

examples/test-t5-monitor.okt ADDED Viewed

	@@ -0,0 +1,58 @@

+# okto_version: "1.2"
+# Test 2: T5 with MONITOR - Full Metrics
+# Model: t5-small
+# Goal: Test the MONITOR block with metrics tracking
+PROJECT "test_t5_monitor"
+DESCRIPTION "T5 test with the MONITOR block - full metrics tracking"
+ENV {
+  accelerator: "gpu"
+  min_memory: "4GB"
+  precision: "fp16"
+  backend: "oktoseek"
+  install_missing: true
+}
+DATASET {
+  train: "dataset/train.jsonl"
+  validation: "dataset/val.jsonl"
+}
+MODEL {
+  base: "t5-small"
+  device: "auto"
+}
+TRAIN {
+  epochs: 5
+  batch_size: 8
+  learning_rate: 0.0001
+  device: "auto"
+}
+MONITOR {
+  metrics: [
+    "loss",
+    "val_loss",
+    "accuracy",
+    "perplexity",
+    "gpu_usage",
+    "ram_usage",
+    "throughput",
+    "latency"
+  ]
+  notify_if {
+    loss > 2.0
+    val_loss > 2.5
+    gpu_usage > 90%
+  }
+  log_to: "logs/training_monitor.log"
+}
+EXPORT {
+  format: ["okm"]
+  path: "export/"
+}