Upload 12 files

Browse files

Files changed (12) hide show

kairox_hf_dataset_repo/.DS_Store +0 -0
kairox_hf_dataset_repo/DATASET_CARD.md +18 -0
kairox_hf_dataset_repo/README.md +48 -0
kairox_hf_dataset_repo/configs/model_selection.md +11 -0
kairox_hf_dataset_repo/data/test.jsonl +17 -0
kairox_hf_dataset_repo/data/train.jsonl +0 -0
kairox_hf_dataset_repo/data/validation.jsonl +16 -0
kairox_hf_dataset_repo/eval/validate_dataset.py +11 -0
kairox_hf_dataset_repo/requirements.txt +8 -0
kairox_hf_dataset_repo/scripts/build_dataset.py +8 -0
kairox_hf_dataset_repo/scripts/evaluate.py +32 -0
kairox_hf_dataset_repo/scripts/train_qlora.py +99 -0

kairox_hf_dataset_repo/.DS_Store ADDED Viewed

Binary file (6.15 kB). View file

kairox_hf_dataset_repo/DATASET_CARD.md ADDED Viewed

	@@ -0,0 +1,18 @@

+# Dataset Card: Kairox Instruction Dataset
+## Summary
+Automatically generated supervised fine-tuning dataset from the Kairox repository and uploaded role specifications.
+- Train examples: 150
+- Validation examples: 16
+- Test examples: 17
+- Format: chat JSONL with `messages` and `metadata`
+## Intended Use
+Fine-tuning an instruction model to understand Kairox architecture, coding-agent behavior, pandas-analysis conventions, and production-grade software-development response structure.
+## Limitations
+Artifact-derived examples are useful for bootstrapping style and repo knowledge, but should be augmented with human-authored task/answer pairs and a locked evaluation set.

kairox_hf_dataset_repo/README.md ADDED Viewed

	@@ -0,0 +1,48 @@

+# Kairox Hugging Face Fine-Tuning Dataset
+Generated from the uploaded Kairox project. It converts documentation, source modules, tests, built-in skills, and role specifications into chat-format JSONL for supervised fine-tuning.
+## Contents
+- `data/train.jsonl`: 150 examples
+- `data/validation.jsonl`: 16 examples
+- `data/test.jsonl`: 17 examples
+- `scripts/train_qlora.py`: QLoRA SFT training script
+- `scripts/evaluate.py`: smoke evaluation script
+- `configs/model_selection.md`: GPU-to-model guidance
+## Dataset Format
+Each line is chat JSONL:
+```json
+{"messages":[{"role":"system","content":"..."},{"role":"user","content":"..."},{"role":"assistant","content":"..."}],"metadata":{"source":"README.md","task_type":"repo_documentation"}}
+```
+## Setup
+```bash
+python -m venv .venv
+source .venv/bin/activate
+pip install -r requirements.txt
+```
+## Train
+```bash
+python scripts/train_qlora.py --model_name Qwen/Qwen2.5-7B-Instruct --data_dir data --output_dir outputs/kairox-qwen25-7b-qlora
+```
+## Evaluate
+```bash
+python scripts/evaluate.py --model outputs/kairox-qwen25-7b-qlora --test data/test.jsonl
+```
+## Recommended Model
+Start with `Qwen/Qwen2.5-7B-Instruct` on a 16 GB GPU. Use `Qwen/Qwen2.5-3B-Instruct` for 12 GB, and `Qwen/Qwen2.5-14B-Instruct` for 24 GB.
+## Limitation
+This is a bootstrap dataset automatically derived from project artifacts. Add human-written gold conversations before serious release training.

kairox_hf_dataset_repo/configs/model_selection.md ADDED Viewed

	@@ -0,0 +1,11 @@

+# Base Model Selection Guide
+| GPU / VRAM | Recommended base model | Method | Notes |
+|---|---|---|---|
+| CPU only / <=8 GB | Qwen/Qwen2.5-0.5B-Instruct or Qwen/Qwen2.5-1.5B-Instruct | LoRA or full fine-tune if feasible | Good for format imitation, weak for deep coding. |
+| 12 GB | Qwen/Qwen2.5-3B-Instruct | QLoRA 4-bit | Best practical starter tier. |
+| 16 GB | Qwen/Qwen2.5-7B-Instruct or mistralai/Mistral-7B-Instruct-v0.3 | QLoRA 4-bit | Recommended default for Kairox-style coding/agent behavior. |
+| 24 GB | Qwen/Qwen2.5-14B-Instruct | QLoRA 4-bit | Better reasoning, slower iteration. |
+| 48 GB+ | Qwen/Qwen2.5-32B-Instruct or Llama-3.1-70B-Instruct with multi-GPU | QLoRA / FSDP | Use only after dataset/eval loop is mature. |
+Default: start with `Qwen/Qwen2.5-7B-Instruct` using QLoRA if you have 16 GB VRAM. If your GPU is smaller, use the 3B or 1.5B tier first.

kairox_hf_dataset_repo/data/test.jsonl ADDED Viewed

	@@ -0,0 +1,17 @@

+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "When should the Kairox skill `__init__` be used?"}, {"role": "assistant", "content": "Use `__init__` when:   init  . Activate it only when the user request or tool context matches the trigger, then keep the response focused on the resulting guidance rather than exposing internal mechanics."}], "metadata": {"source": "src/kairox00/agent/skills/builtin/__init__.py", "task_type": "skill_behavior"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "When should the Kairox skill `advanced_planning` be used?"}, {"role": "assistant", "content": "Use `advanced_planning` when: Requires structured 6-section planning before any large task. Activate it only when the user request or tool context matches the trigger, then keep the response focused on the resulting guidance rather than exposing internal mechanics."}], "metadata": {"source": "src/kairox00/agent/skills/builtin/advanced_planning.py", "task_type": "skill_behavior"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "When should the Kairox skill `architecture` be used?"}, {"role": "assistant", "content": "Use `architecture` when: Show design/refactoring guidance for architecture-related prompts. Activate it only when the user request or tool context matches the trigger, then keep the response focused on the resulting guidance rather than exposing internal mechanics."}], "metadata": {"source": "src/kairox00/agent/skills/builtin/architecture.py", "task_type": "skill_behavior"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "When should the Kairox skill `autolint` be used?"}, {"role": "assistant", "content": "Use `autolint` when: Suggest linting commands after file saves. Activate it only when the user request or tool context matches the trigger, then keep the response focused on the resulting guidance rather than exposing internal mechanics."}], "metadata": {"source": "src/kairox00/agent/skills/builtin/autolint.py", "task_type": "skill_behavior"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "When should the Kairox skill `autotest` be used?"}, {"role": "assistant", "content": "Use `autotest` when: Suggest running tests when a Python file is modified. Activate it only when the user request or tool context matches the trigger, then keep the response focused on the resulting guidance rather than exposing internal mechanics."}], "metadata": {"source": "src/kairox00/agent/skills/builtin/autotest.py", "task_type": "skill_behavior"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "When should the Kairox skill `bug_alert` be used?"}, {"role": "assistant", "content": "Use `bug_alert` when: Show debugging tips when bug/error keywords are detected in messages or tool output. Activate it only when the user request or tool context matches the trigger, then keep the response focused on the resulting guidance rather than exposing internal mechanics."}], "metadata": {"source": "src/kairox00/agent/skills/builtin/bug_alert.py", "task_type": "skill_behavior"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "When should the Kairox skill `core_system_prompt` be used?"}, {"role": "assistant", "content": "Use `core_system_prompt` when: Base system prompt booster with core agent behavior rules. Activate it only when the user request or tool context matches the trigger, then keep the response focused on the resulting guidance rather than exposing internal mechanics."}], "metadata": {"source": "src/kairox00/agent/skills/builtin/core_system_prompt.py", "task_type": "skill_behavior"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "When should the Kairox skill `hf_workflow` be used?"}, {"role": "assistant", "content": "Use `hf_workflow` when: Guides HuggingFace model training, evaluation, and hub workflows. Activate it only when the user request or tool context matches the trigger, then keep the response focused on the resulting guidance rather than exposing internal mechanics."}], "metadata": {"source": "src/kairox00/agent/skills/builtin/hf_workflow.py", "task_type": "skill_behavior"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "When should the Kairox skill `kernel_output` be used?"}, {"role": "assistant", "content": "Use `kernel_output` when: Enforces KERNEL structured output format on all responses. Activate it only when the user request or tool context matches the trigger, then keep the response focused on the resulting guidance rather than exposing internal mechanics."}], "metadata": {"source": "src/kairox00/agent/skills/builtin/kernel_output.py", "task_type": "skill_behavior"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "When should the Kairox skill `multi_agent` be used?"}, {"role": "assistant", "content": "Use `multi_agent` when: Orchestrates Planner → Executor → Reviewer sub-agent workflow. Activate it only when the user request or tool context matches the trigger, then keep the response focused on the resulting guidance rather than exposing internal mechanics."}], "metadata": {"source": "src/kairox00/agent/skills/builtin/multi_agent.py", "task_type": "skill_behavior"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "When should the Kairox skill `performance` be used?"}, {"role": "assistant", "content": "Use `performance` when: Show performance optimization tips when performance keywords are detected. Activate it only when the user request or tool context matches the trigger, then keep the response focused on the resulting guidance rather than exposing internal mechanics."}], "metadata": {"source": "src/kairox00/agent/skills/builtin/performance.py", "task_type": "skill_behavior"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "When should the Kairox skill `persistent_memory` be used?"}, {"role": "assistant", "content": "Use `persistent_memory` when: Maintains and updates project memory file across sessions. Activate it only when the user request or tool context matches the trigger, then keep the response focused on the resulting guidance rather than exposing internal mechanics."}], "metadata": {"source": "src/kairox00/agent/skills/builtin/persistent_memory.py", "task_type": "skill_behavior"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "When should the Kairox skill `react_mode` be used?"}, {"role": "assistant", "content": "Use `react_mode` when: Enforces ReAct reasoning loop for all tool-use interactions. Activate it only when the user request or tool context matches the trigger, then keep the response focused on the resulting guidance rather than exposing internal mechanics."}], "metadata": {"source": "src/kairox00/agent/skills/builtin/react_mode.py", "task_type": "skill_behavior"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "When should the Kairox skill `safety_mode` be used?"}, {"role": "assistant", "content": "Use `safety_mode` when: Requires risk classification and user confirmation for high-risk tools. Activate it only when the user request or tool context matches the trigger, then keep the response focused on the resulting guidance rather than exposing internal mechanics."}], "metadata": {"source": "src/kairox00/agent/skills/builtin/safety_mode.py", "task_type": "skill_behavior"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "When should the Kairox skill `security` be used?"}, {"role": "assistant", "content": "Use `security` when: Alert on potentially unsafe patterns (secrets, eval, SQL injection, etc.). Activate it only when the user request or tool context matches the trigger, then keep the response focused on the resulting guidance rather than exposing internal mechanics."}], "metadata": {"source": "src/kairox00/agent/skills/builtin/security.py", "task_type": "skill_behavior"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "When should the Kairox skill `self_reflection` be used?"}, {"role": "assistant", "content": "Use `self_reflection` when: Runs structured post-action reflection after every tool call. Activate it only when the user request or tool context matches the trigger, then keep the response focused on the resulting guidance rather than exposing internal mechanics."}], "metadata": {"source": "src/kairox00/agent/skills/builtin/self_reflection.py", "task_type": "skill_behavior"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "When should the Kairox skill `surgical_editor` be used?"}, {"role": "assistant", "content": "Use `surgical_editor` when: Enforces minimal diff-based edits with automatic verification. Activate it only when the user request or tool context matches the trigger, then keep the response focused on the resulting guidance rather than exposing internal mechanics."}], "metadata": {"source": "src/kairox00/agent/skills/builtin/surgical_editor.py", "task_type": "skill_behavior"}}

kairox_hf_dataset_repo/data/train.jsonl ADDED Viewed

The diff for this file is too large to render. See raw diff

kairox_hf_dataset_repo/data/validation.jsonl ADDED Viewed

	@@ -0,0 +1,16 @@

+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "Review this module for maintainability and suggest safe improvements without changing behavior:\n\n```python\n\"\"\"MCTS-based risk classifier for Bash / FileWrite / FileEdit commands.\n\nInstead of a hand-written rule list, we frame risk estimation as a tiny\none-player search: the \"actions\" are command properties (read-only,\nwrites-to-existing, creates-new, mutates-path, contains-dangerous-words),\nand the leaf value is a risk score. Because the branching factor is tiny\n(< 8), a small MuZero search gives a smoothed, context-aware score.\n\nFor the \"best form\" build this is deliberately lightweight — the classifier\ncan be swapped for a learned model later without changing the permission\ninterface.\n\"\"\"\nfrom __future__ import annotations\n\nimport re\nimport numpy as np\n\nfrom kairox00.mcts import (\n    RecurrentFnOutput,\n    RootFnOutput,\n    gumbel_muzero_policy,\n)\n\n# Property actions used by the risk search model.\n_PROPS = [\n    \"read_only\",\n    \"write_existing\",\n    \"create_new\",\n    \"mutate_path\",\n    \"dangerous_word\",\n    \"recursive_delete\",\n    \"network_call\",\n    \"pipe_to_system\",\n]\n_NUM_PROPS = len(_PROPS)\n\ndef _extract_props(tool_name: str, args: dict) -> list[int]:\n    \"\"\"Return indices of properties that apply to `(tool_name, args)`.\"\"\"\n    props: set[int] = set()\n    cmd = \"\"\n    path = \"\"\n    if tool_name == \"Bash\":\n        cmd = str(args.get(\"command\", \"\")).lower()\n    elif tool_name in (\"FileWrite\", \"FileEdit\"):\n        path = str(args.get(\"path\", \"\")).lower()\n        cmd = str(args.get(\"content\", \"\")).lower()\n    else:\n        props.add(0)  # read_only\n        return sorted(props)\n\n    if tool_name == \"Bash\":\n        if re.search(r\"\\b(rm|mv|dd|chmod|chown|sudo|chattr|mkfs|mount|umount)\\b\", cmd):\n            props.add(3)\n        if re.search(r\"\\b(curl|wget|nc|nmap|ssh|socat|ncat|scp|rsync|telnet)\\b\", cmd):\n            props.add(6)\n        if re.search(r\"[>|&;]\", cmd):\n            props.add(7)\n        if re.search(r\"\\brm\\s+(-[a-z]*r[a-z]*\\s+.*|.*-[a-z]*r[a-z]*\\b)\", cmd) or \"rm -rf\" in cmd:\n            props.add(5)\n        if re.search(r\"\\b(eval|exec|__import__|os\\.system|subprocess)\\s*\\(\", cmd):\n            props.add(4)\n        props.add(3)  # mutate_path (all bash commands mutate something)\n    elif tool_name == \"FileWrite\":\n        props.add(2)  # create_new\n        if re.search(r\"\\b(eval|exec|__import__|os\\.system|subprocess)\\s*\\(\", cmd) or \"rm -rf\" in cmd:\n            props.add(4)\n    elif tool_name == \"FileEdit\":\n        props.add(1)  # write_existing\n        if re.search(r\"\\b(eval|exec|__import__|os\\.system|subprocess)\\s*\\(\", cmd) or \"rm -rf\" in cmd:\n            props.add(4)\n    return sorted(props)\n\ndef _risk_from_props(props: set[int]) -> float:\n    score = 0.0\n    if 5 in props:   # recursive_delete\n        score += 0.8\n    if 6 in props:   # network_call\n        score += 0.5\n    if 7 in props:   # pipe_to_system\n        score += 0.3\n    if 4 in props:   # dangerous_word\n        score += 0.6\n    if 3 in props:   # mutate_path\n        score += 0.2\n    if 2 in props:   # create_new\n        score += 0.1\n    if 1 in props:   # write_existing\n        score += 0.15\n    if 0 in props:   # read_only\n        score = max(0.0, score - 0.3)\n    return min(1.0, score)\n\ndef _make_recurrent_fn(props: set[int]):\n    def recurrent(_params, _rng, action: np.ndarray, embedding: list[set[int]]):\n        batch_size = action.shape[0]\n        new_emb: list[set[int]] = []\n        rewards = np.zeros(batch_size, dtype=np.float32)\n        values = []\n        for i in range(batch_size):\n            a = int(action[i])\n            new_gathered = set(embedding[i])\n            if a in props:\n                new_gathered.add(a)\n            \n            # Use absolute risk as the bootstrap value, 0 reward.\n            # This ensures the root value is just the average risk of explored states.\n            values.append(_risk_from_props(new_gathered))\n            new_emb.append(new_gathered)\n            \n        discounts = np.full(batch_size, 0.99, dtype=np.float32) # Standard discount\n        logits = np.full((batch_size, _NUM_PROPS), -1e9, dtype=np.float32)\n        for i in range(batch_size):\n            # Only allow properties in 'props' that haven't been added yet\n            for p in props:\n                if p not in new_emb[i]:\n                    logits[i, p] = 1.0\n        return (\n            RecurrentFnOutput(\n                reward=rewards,\n                discount=discounts,\n                prior_logits=logits,\n                value=np.array(values, dtype=np.float32),\n            ),\n            new_emb,\n        )\n\n    return recurrent\n\nde\n```"}, {"role": "assistant", "content": "### Maintainability Review\n\n- Preserve current public behavior and add or update tests before refactoring.\n- Extract helpers only when a function has multiple reasons to change.\n- Keep domain decisions explicit through typed inputs, structured return values, and narrow exception handling.\n- Avoid cosmetic churn; make each change traceable to readability, reliability, security, or performance.\n- Validate with relevant unit tests and at least one edge case for invalid input."}], "metadata": {"source": "src/kairox00/planning/risk_classifier.py", "task_type": "code_review"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "Derive a behavioral test specification from `tests/test_agent.py`."}, {"role": "assistant", "content": "### Behavioral Specification\n\n- `test_reads_file` verifies reads file.\n- `test_read_only` verifies read only.\n- `test_rejects_escape` verifies rejects escape.\n- `test_line_range` verifies line range.\n- `test_writes_file` verifies writes file.\n- `test_edits_unique_match` verifies edits unique match.\n- `test_rejects_non_unique` verifies rejects non unique.\n- `test_rejects_missing` verifies rejects missing.\n- `test_echo` verifies echo.\n- `test_denies_dangerous` verifies denies dangerous.\n- `test_not_read_only` verifies not read only.\n- `test_finds_pattern` verifies finds pattern.\n- `test_finds_regex` verifies finds regex.\n- `test_rejects_path_traversal` verifies rejects path traversal.\n- `test_invalid_regex` verifies invalid regex.\n- `test_rejects_nonexistent_subdir` verifies rejects nonexistent subdir.\n- `test_returns_plan` verifies returns plan.\n- `test_plan_explainable` verifies plan explainable.\n- `test_bypass` verifies bypass.\n- `test_read_only_fast_path` verifies read only fast path.\n- `test_plan_mode_blocks_writes` verifies plan mode blocks writes.\n- `test_allowlist` verifies allowlist.\n- `test_denylist` verifies denylist.\n- `test_echos_text` verifies echos text.\n- `test_triggers_bash_tool` verifies triggers bash tool.\n- `test_aborts` verifies aborts.\n- `test_detects_convergent_mode` verifies detects convergent mode.\n- `test_detects_divergent_mode` verifies detects divergent mode.\n- `test_override_replaces_everything` verifies override replaces everything.\n- `test_custom_replaces_default` verifies custom replaces default.\n- `test_default_fallback` verifies default fallback.\n- `test_append_always_added` verifies append always added.\n- `test_injected_sections_order` verifies injected sections order.\n- `test_empty_base_skipped` verifies empty base skipped.\n- `test_query_engine_uses_prompt_config` verifies query engine uses prompt config.\n- `test_help` verifies help.\n- `test_doctor` verifies doctor.\n- `test_mode` verifies mode.\n- `test_compact` verifies compact.\n- `test_exit` verifies exit.\n\nThese tests are regression constraints for future implementation or fine-tuning examples."}], "metadata": {"source": "tests/test_agent.py", "task_type": "test_specification"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "Derive a behavioral test specification from `tests/test_analyzer.py`."}, {"role": "assistant", "content": "### Behavioral Specification\n\n- `test_analyze_csv_generates_kpis_and_segment_summary` verifies analyze csv generates kpis and segment summary.\n- `test_render_report_contains_required_sections` verifies render report contains required sections.\n- `test_analyze_csv_missing_file` verifies analyze csv missing file.\n- `test_analyze_csv_empty_file` verifies analyze csv empty file.\n- `test_analyze_csv_empty_dataframe` verifies analyze csv empty dataframe.\n- `test_analyze_csv_malformed_csv` verifies analyze csv malformed csv.\n- `test_analyze_csv_auto_selects_metric_and_dimension` verifies analyze csv auto selects metric and dimension.\n- `test_analyze_csv_no_numeric_columns` verifies analyze csv no numeric columns.\n- `test_analyze_result_json_serializable` verifies analyze result json serializable.\n\nThese tests are regression constraints for future implementation or fine-tuning examples."}], "metadata": {"source": "tests/test_analyzer.py", "task_type": "test_specification"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "Derive a behavioral test specification from `tests/test_archive.py`."}, {"role": "assistant", "content": "### Behavioral Specification\n\n- `test_analyze_archive_parses_supported_sidecars` verifies analyze archive parses supported sidecars.\n- `test_analyze_archive_missing_path` verifies analyze archive missing path.\n- `test_render_archive_report_contains_sections` verifies render archive report contains sections.\n- `test_cli_archive_json_output` verifies cli archive json output.\n\nThese tests are regression constraints for future implementation or fine-tuning examples."}], "metadata": {"source": "tests/test_archive.py", "task_type": "test_specification"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "Derive a behavioral test specification from `tests/test_ask_user.py`."}, {"role": "assistant", "content": "### Behavioral Specification\n\n- `test_valid_single` verifies valid single.\n- `test_too_few_options` verifies too few options.\n- `test_too_many_options` verifies too many options.\n- `test_duplicate_questions` verifies duplicate questions.\n- `test_duplicate_labels` verifies duplicate labels.\n- `test_no_other_option` verifies no other option.\n- `test_not_a_list` verifies not a list.\n- `test_too_many_questions` verifies too many questions.\n- `test_single_select_by_index` verifies single select by index.\n- `test_single_select_by_label` verifies single select by label.\n- `test_other_custom_answer` verifies other custom answer.\n- `test_multi_select` verifies multi select.\n- `test_invalid_then_retry` verifies invalid then retry.\n- `test_multiple_questions` verifies multiple questions.\n- `test_user_cancel` verifies user cancel.\n- `test_read_only` verifies read only.\n\nThese tests are regression constraints for future implementation or fine-tuning examples."}], "metadata": {"source": "tests/test_ask_user.py", "task_type": "test_specification"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "Derive a behavioral test specification from `tests/test_commands.py`."}, {"role": "assistant", "content": "### Behavioral Specification\n\n- `test_clears_messages` verifies clears messages.\n- `test_preserves_permissions` verifies preserves permissions.\n- `test_shows_zero_cost` verifies shows zero cost.\n- `test_shows_updated_cost` verifies shows updated cost.\n- `test_no_todo_file` verifies no todo file.\n- `test_reads_todo_file` verifies reads todo file.\n- `test_unknown_command` verifies unknown command.\n- `test_clear_command` verifies clear command.\n- `test_non_slash_returns_none` verifies non slash returns none.\n\nThese tests are regression constraints for future implementation or fine-tuning examples."}], "metadata": {"source": "tests/test_commands.py", "task_type": "test_specification"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "Derive a behavioral test specification from `tests/test_mcp.py`."}, {"role": "assistant", "content": "### Behavioral Specification\n\n- `test_string` verifies string.\n- `test_integer` verifies integer.\n- `test_number` verifies number.\n- `test_boolean` verifies boolean.\n- `test_array` verifies array.\n- `test_object` verifies object.\n- `test_unknown` verifies unknown.\n- `test_union_prefers_first` verifies union prefers first.\n- `test_union_skips_null` verifies union skips null.\n- `test_load_valid` verifies load valid.\n- `test_missing_file` verifies missing file.\n- `test_none_path` verifies none path.\n- `test_discover` verifies discover.\n- `test_call_greet` verifies call greet.\n- `test_call_add` verifies call add.\n- `test_call_bad_tool` verifies call bad tool.\n- `test_schema_from_json_schema` verifies schema from json schema.\n- `test_is_read_only` verifies is read only.\n- `test_full_integration` verifies full integration.\n- `test_no_config` verifies no config.\n\nThese tests are regression constraints for future implementation or fine-tuning examples."}], "metadata": {"source": "tests/test_mcp.py", "task_type": "test_specification"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "Derive a behavioral test specification from `tests/test_mcts.py`."}, {"role": "assistant", "content": "### Behavioral Specification\n\n- `test_tree_instantiation_shapes` verifies tree instantiation shapes.\n- `test_qtransform_min_max_bounds` verifies qtransform min max bounds.\n- `test_seq_halving_table_monotonic` verifies seq halving table monotonic.\n- `test_seq_halving_score_penalizes_mismatch` verifies seq halving score penalizes mismatch.\n- `test_muzero_policy_runs_and_picks_rewarded_action` verifies muzero policy runs and picks rewarded action.\n- `test_gumbel_muzero_policy_runs` verifies gumbel muzero policy runs.\n- `test_invalid_actions_are_masked` verifies invalid actions are masked.\n- `test_qtransforms_finite` verifies qtransforms finite.\n- `test_search_batch_size_1_single_sim` verifies search batch size 1 single sim.\n- `test_search_multi_batch_parity` verifies search multi batch parity.\n- `test_backward_single_batch_propagates_to_root` verifies backward single batch propagates to root.\n- `test_analyzer_empty_csv_returns_err` verifies analyzer empty csv returns err.\n- `test_analyzer_single_row_csv` verifies analyzer single row csv.\n- `test_analyzer_all_missing_column` verifies analyzer all missing column.\n\nThese tests are regression constraints for future implementation or fine-tuning examples."}], "metadata": {"source": "tests/test_mcts.py", "task_type": "test_specification"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "Derive a behavioral test specification from `tests/test_notebook_edit.py`."}, {"role": "assistant", "content": "### Behavioral Specification\n\n- `test_replace_by_id` verifies replace by id.\n- `test_replace_by_index` verifies replace by index.\n- `test_replace_changes_cell_type` verifies replace changes cell type.\n- `test_replace_code_to_markdown` verifies replace code to markdown.\n- `test_insert_after_id` verifies insert after id.\n- `test_insert_at_beginning_no_id` verifies insert at beginning no id.\n- `test_insert_requires_cell_type` verifies insert requires cell type.\n- `test_delete_by_id` verifies delete by id.\n- `test_not_ipynb` verifies not ipynb.\n- `test_invalid_edit_mode` verifies invalid edit mode.\n- `test_invalid_cell_type` verifies invalid cell type.\n- `test_cell_id_required_for_replace` verifies cell id required for replace.\n- `test_cell_not_found` verifies cell not found.\n- `test_invalid_json` verifies invalid json.\n- `test_missing_file` verifies missing file.\n- `test_not_read_only` verifies not read only.\n\nThese tests are regression constraints for future implementation or fine-tuning examples."}], "metadata": {"source": "tests/test_notebook_edit.py", "task_type": "test_specification"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "Derive a behavioral test specification from `tests/test_risk_classifier.py`."}, {"role": "assistant", "content": "### Behavioral Specification\n\n- `test_extract_props_bash` verifies extract props bash.\n- `test_extract_props_file` verifies extract props file.\n- `test_risk_from_props` verifies risk from props.\n- `test_classify_risk_benchmarks` verifies classify risk benchmarks.\n- `test_classify_risk_edge_cases` verifies classify risk edge cases.\n\nThese tests are regression constraints for future implementation or fine-tuning examples."}], "metadata": {"source": "tests/test_risk_classifier.py", "task_type": "test_specification"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "Derive a behavioral test specification from `tests/test_skills.py`."}, {"role": "assistant", "content": "### Behavioral Specification\n\n- `test_event_mismatch` verifies event mismatch.\n- `test_tool_name_filter` verifies tool name filter.\n- `test_tool_name_filter_miss` verifies tool name filter miss.\n- `test_pattern_match` verifies pattern match.\n- `test_pattern_no_match` verifies pattern no match.\n- `test_no_skills_enabled` verifies no skills enabled.\n- `test_skill_handler_called_on_match` verifies skill handler called on match.\n- `test_disabled_skill_not_run` verifies disabled skill not run.\n- `test_cmd_skills_lists_all` verifies cmd skills lists all.\n- `test_cmd_skills_list_shows_enabled` verifies cmd skills list shows enabled.\n- `test_cmd_skill_enable_unknown` verifies cmd skill enable unknown.\n- `test_cmd_skill_disable_unknown` verifies cmd skill disable unknown.\n- `test_cmd_skill_enable_disable` verifies cmd skill enable disable.\n- `test_local_skill` verifies local skill.\n- `test_prompt_skill` verifies prompt skill.\n- `test_flag_default` verifies flag default.\n- `test_flag_toggle` verifies flag toggle.\n- `test_cmd_flag_lists` verifies cmd flag lists.\n- `test_cmd_flag_set` verifies cmd flag set.\n- `test_autotest_triggers_on_python_file_write` verifies autotest triggers on python file write.\n- `test_autotest_ignores_non_python` verifies autotest ignores non python.\n- `test_bug_alert_triggers_on_error_message` verifies bug alert triggers on error message.\n- `test_bug_alert_ignores_clean_message` verifies bug alert ignores clean message.\n- `test_performance_triggers_on_slow_keyword` verifies performance triggers on slow keyword.\n- `test_architecture_triggers_on_design_keyword` verifies architecture triggers on design keyword.\n- `test_security_triggers_on_secret_keyword` verifies security triggers on secret keyword.\n- `test_git_review_prompt_skill` verifies git review prompt skill.\n- `test_discover_skills_dirs` verifies discover skills dirs.\n- `test_unregister` verifies unregister.\n\nThese tests are regression constraints for future implementation or fine-tuning examples."}], "metadata": {"source": "tests/test_skills.py", "task_type": "test_specification"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "Derive a behavioral test specification from `tests/test_ssrf_hardened.py`."}, {"role": "assistant", "content": "### Behavioral Specification\n\n- `test_web_fetch_connection_validation_ssrf` verifies web fetch connection validation ssrf.\n- `test_web_fetch_success_public` verifies web fetch success public.\n\nThese tests are regression constraints for future implementation or fine-tuning examples."}], "metadata": {"source": "tests/test_ssrf_hardened.py", "task_type": "test_specification"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "Derive a behavioral test specification from `tests/test_task_tools.py`."}, {"role": "assistant", "content": "### Behavioral Specification\n\n- `test_create_basic` verifies create basic.\n- `test_create_with_description` verifies create with description.\n- `test_not_read_only` verifies not read only.\n- `test_empty` verifies empty.\n- `test_lists_tasks` verifies lists tasks.\n- `test_hides_deleted` verifies hides deleted.\n- `test_read_only` verifies read only.\n- `test_get_existing` verifies get existing.\n- `test_get_missing` verifies get missing.\n- `test_read_only` verifies read only.\n- `test_update_status` verifies update status.\n- `test_update_owner` verifies update owner.\n- `test_delete_task` verifies delete task.\n- `test_add_blocks` verifies add blocks.\n- `test_add_blocked_by` verifies add blocked by.\n- `test_invalid_status` verifies invalid status.\n- `test_not_read_only` verifies not read only.\n- `test_stop_task` verifies stop task.\n- `test_list_active_when_no_id` verifies list active when no id.\n- `test_no_active_tasks` verifies no active tasks.\n- `test_not_read_only` verifies not read only.\n\nThese tests are regression constraints for future implementation or fine-tuning examples."}], "metadata": {"source": "tests/test_task_tools.py", "task_type": "test_specification"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "Derive a behavioral test specification from `tests/test_todo_write.py`."}, {"role": "assistant", "content": "### Behavioral Specification\n\n- `test_round_trip` verifies round trip.\n- `test_load_missing_file` verifies load missing file.\n- `test_write_new_list` verifies write new list.\n- `test_clear_when_all_done` verifies clear when all done.\n- `test_update_existing` verifies update existing.\n- `test_invalid_status` verifies invalid status.\n- `test_not_list` verifies not list.\n- `test_not_read_only` verifies not read only.\n\nThese tests are regression constraints for future implementation or fine-tuning examples."}], "metadata": {"source": "tests/test_todo_write.py", "task_type": "test_specification"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "Derive a behavioral test specification from `tests/test_token_budget.py`."}, {"role": "assistant", "content": "### Behavioral Specification\n\n- `test_max_tokens` verifies max tokens.\n- `test_length` verifies length.\n- `test_end_turn` verifies end turn.\n- `test_tool_use` verifies tool use.\n- `test_first_continuation` verifies first continuation.\n- `test_stops_at_context_threshold` verifies stops at context threshold.\n- `test_diminishing_returns` verifies diminishing returns.\n- `test_max_continuations` verifies max continuations.\n- `test_continues_on_max_tokens` verifies continues on max tokens.\n- `test_stops_on_diminishing_returns` verifies stops on diminishing returns.\n- `test_does_not_continue_with_tool_calls` verifies does not continue with tool calls.\n\nThese tests are regression constraints for future implementation or fine-tuning examples."}], "metadata": {"source": "tests/test_token_budget.py", "task_type": "test_specification"}}
+{"messages": [{"role": "system", "content": "You are Kairox, a production-minded AI engineering assistant. Provide concise, structured, technically accurate answers grounded in the provided project context."}, {"role": "user", "content": "Derive a behavioral test specification from `tests/test_web_tools.py`."}, {"role": "assistant", "content": "### Behavioral Specification\n\n- `test_read_only` verifies read only.\n- `test_rejects_long_url` verifies rejects long url.\n- `test_rejects_bad_scheme` verifies rejects bad scheme.\n- `test_rejects_credentials` verifies rejects credentials.\n- `test_auto_upgrades_http` verifies auto upgrades http.\n- `test_fetch_plain_text` verifies fetch plain text.\n- `test_redirect_reported` verifies redirect reported.\n- `test_http_error` verifies http error.\n- `test_html_to_text_extraction` verifies html to text extraction.\n- `test_validate_url_ok` verifies validate url ok.\n- `test_validate_url_missing_host` verifies validate url missing host.\n- `test_rejects_private_ip_when_resolving` verifies rejects private ip when resolving.\n- `test_same_origin_redirect_is_followed_manually` verifies same origin redirect is followed manually.\n- `test_read_only` verifies read only.\n- `test_rejects_short_query` verifies rejects short query.\n- `test_rejects_both_domain_filters` verifies rejects both domain filters.\n- `test_search_success` verifies search success.\n- `test_search_no_results` verifies search no results.\n- `test_allowed_domains_filter` verifies allowed domains filter.\n- `test_blocked_domains_filter` verifies blocked domains filter.\n\nThese tests are regression constraints for future implementation or fine-tuning examples."}], "metadata": {"source": "tests/test_web_tools.py", "task_type": "test_specification"}}

kairox_hf_dataset_repo/eval/validate_dataset.py ADDED Viewed

	@@ -0,0 +1,11 @@

+import json
+from pathlib import Path
+def test_jsonl_schema():
+    for path in Path("data").glob("*.jsonl"):
+        assert path.stat().st_size > 0
+        for line in path.read_text(encoding="utf-8").splitlines():
+            row = json.loads(line)
+            assert [m["role"] for m in row["messages"]] == ["system", "user", "assistant"]
+            assert row["messages"][1]["content"].strip()
+            assert row["messages"][2]["content"].strip()

kairox_hf_dataset_repo/requirements.txt ADDED Viewed

	@@ -0,0 +1,8 @@

+torch>=2.3
+transformers>=4.45
+datasets>=2.20
+accelerate>=0.33
+peft>=0.12
+trl>=0.9
+bitsandbytes>=0.43
+safetensors>=0.4

kairox_hf_dataset_repo/scripts/build_dataset.py ADDED Viewed

	@@ -0,0 +1,8 @@

+from __future__ import annotations
+"""Regenerate data/*.jsonl from a local Kairox checkout.
+This generated repository already includes ready-to-train JSONL files.
+For a production dataset, adapt the generator used to create this package so
+it can run in your own CI and include human-authored gold examples.
+"""
+print("The generated data/*.jsonl files are ready. Adapt this script for your local Kairox checkout.")

kairox_hf_dataset_repo/scripts/evaluate.py ADDED Viewed

	@@ -0,0 +1,32 @@

+from __future__ import annotations
+import argparse
+import json
+from pathlib import Path
+from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--model", required=True)
+    parser.add_argument("--test", default="data/test.jsonl")
+    parser.add_argument("--out", default="eval/predictions.jsonl")
+    args = parser.parse_args()
+    tokenizer = AutoTokenizer.from_pretrained(args.model)
+    model = AutoModelForCausalLM.from_pretrained(args.model, device_map="auto")
+    generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
+    Path(args.out).parent.mkdir(parents=True, exist_ok=True)
+    with open(args.test, encoding="utf-8") as src, open(args.out, "w", encoding="utf-8") as dst:
+        for line in src:
+            row = json.loads(line)
+            prompt = tokenizer.apply_chat_template(
+                row["messages"][:-1], tokenize=False, add_generation_prompt=True
+            )
+            prediction = generator(prompt, max_new_tokens=512, do_sample=False)[0]["generated_text"]
+            dst.write(json.dumps({
+                "source": row.get("metadata", {}).get("source"),
+                "prediction": prediction,
+            }, ensure_ascii=False) + "\n")
+if __name__ == "__main__":
+    main()

kairox_hf_dataset_repo/scripts/train_qlora.py ADDED Viewed

	@@ -0,0 +1,99 @@

+from __future__ import annotations
+import argparse
+import torch
+from datasets import load_dataset
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
+from peft import LoraConfig, prepare_model_for_kbit_training
+from trl import SFTTrainer
+def parse_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--model_name", default="Qwen/Qwen2.5-7B-Instruct")
+    parser.add_argument("--data_dir", default="data")
+    parser.add_argument("--output_dir", default="outputs/kairox-qlora")
+    parser.add_argument("--max_seq_length", type=int, default=4096)
+    parser.add_argument("--epochs", type=float, default=2)
+    parser.add_argument("--batch_size", type=int, default=1)
+    parser.add_argument("--grad_accum", type=int, default=16)
+    parser.add_argument("--lr", type=float, default=2e-4)
+    return parser.parse_args()
+def main():
+    args = parse_args()
+    dataset = load_dataset(
+        "json",
+        data_files={
+            "train": f"{args.data_dir}/train.jsonl",
+            "validation": f"{args.data_dir}/validation.jsonl",
+        },
+    )
+    tokenizer = AutoTokenizer.from_pretrained(args.model_name, use_fast=True)
+    if tokenizer.pad_token is None:
+        tokenizer.pad_token = tokenizer.eos_token
+    def apply_template(row):
+        row["text"] = tokenizer.apply_chat_template(
+            row["messages"], tokenize=False, add_generation_prompt=False
+        )
+        return row
+    dataset = dataset.map(apply_template)
+    quantization = BitsAndBytesConfig(
+        load_in_4bit=True,
+        bnb_4bit_quant_type="nf4",
+        bnb_4bit_compute_dtype=torch.bfloat16,
+        bnb_4bit_use_double_quant=True,
+    )
+    model = AutoModelForCausalLM.from_pretrained(
+        args.model_name,
+        quantization_config=quantization,
+        device_map="auto",
+        torch_dtype=torch.bfloat16,
+    )
+    model = prepare_model_for_kbit_training(model)
+    lora = LoraConfig(
+        r=16,
+        lora_alpha=32,
+        lora_dropout=0.05,
+        bias="none",
+        task_type="CAUSAL_LM",
+        target_modules="all-linear",
+    )
+    training_args = TrainingArguments(
+        output_dir=args.output_dir,
+        num_train_epochs=args.epochs,
+        per_device_train_batch_size=args.batch_size,
+        per_device_eval_batch_size=1,
+        gradient_accumulation_steps=args.grad_accum,
+        learning_rate=args.lr,
+        lr_scheduler_type="cosine",
+        warmup_ratio=0.03,
+        logging_steps=10,
+        eval_strategy="steps",
+        eval_steps=100,
+        save_steps=100,
+        save_total_limit=2,
+        bf16=True,
+        optim="paged_adamw_8bit",
+        report_to="none",
+    )
+    trainer = SFTTrainer(
+        model=model,
+        tokenizer=tokenizer,
+        train_dataset=dataset["train"],
+        eval_dataset=dataset["validation"],
+        dataset_text_field="text",
+        max_seq_length=args.max_seq_length,
+        peft_config=lora,
+        args=training_args,
+    )
+    trainer.train()
+    trainer.save_model(args.output_dir)
+    tokenizer.save_pretrained(args.output_dir)
+if __name__ == "__main__":
+    main()