{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [], "gpuType": "T4" }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "accelerator": "GPU" }, "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# KernelX Intelligence Layer — Training Notebook\n", "\n", "**AI-Powered Linux Kernel Scheduler using eBPF + SmolLM2-360M**\n", "\n", "This notebook trains the KernelX Strategist model end-to-end:\n", "1. Download real kernel telemetry data (534K transitions from eBPF sentinel)\n", "2. Preprocess: symlog scaling, feature selection (24D → 10D)\n", "3. Train World Model (SFT) — learns kernel dynamics\n", "4. Train Strategist (SFT warm-start) — learns scheduling actions\n", "5. Evaluate and push to Hugging Face\n", "\n", "**Runtime:** ~15 min on T4 GPU | **Model:** SmolLM2-360M-Instruct | **Output:** scheduling action [-1, 1]\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 0. Setup" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "# Check what Colab already has, only install what's missing\nimport subprocess, sys\n\n# Keep Colab's torch and transformers - don't touch them\n# Only install the small training libraries that are missing\n!pip install -q --no-deps trl peft\n!pip install -q datasets huggingface_hub\n\n# Verify\nimport torch, transformers\nprint(f'torch={torch.__version__} (Colab default - keeping it)')\nprint(f'transformers={transformers.__version__} (Colab default - keeping it)')\nprint(f'CUDA: {torch.cuda.is_available()}')\n\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\nprint('transformers imports: OK')\n\n# Check if trl has what we need\ntry:\n from trl import SFTTrainer, SFTConfig\n print(f'trl SFTTrainer: OK')\nexcept ImportError:\n print('trl SFTTrainer not available, installing compatible version...')\n !pip install -q \"trl==0.12.2\" --no-deps\n from trl import SFTTrainer, SFTConfig\n print(f'trl SFTTrainer: OK (0.12.2)')\n\nfrom peft import LoraConfig\nprint('peft: OK')\nprint('\\nAll imports working!')" }, { "cell_type": "markdown", "source": "**After the cell above runs, the runtime will restart. This is expected.** Once it restarts, continue running from the next cell below (skip the install cell).", "metadata": {} }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "print(f'PyTorch: {torch.__version__}')\n", "print(f'CUDA available: {torch.cuda.is_available()}')\n", "if torch.cuda.is_available():\n", " print(f'GPU: {torch.cuda.get_device_name(0)}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Download Training Data\n", "\n", "Real kernel telemetry collected by the eBPF sentinel on a Linux machine.\n", "Each record is a `(state_t, action, reward, state_t_next)` transition with a 24D feature vector." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from huggingface_hub import hf_hub_download\n", "import json, os\n", "\n", "os.makedirs('data', exist_ok=True)\n", "\n", "for fname in ['train.jsonl', 'val.jsonl', 'test.jsonl', 'preprocessing_config.json']:\n", " hf_hub_download(\n", " repo_id='Rayugacodes/kernelx-training-data',\n", " filename=fname,\n", " repo_type='dataset',\n", " local_dir='data',\n", " )\n", " print(f'Downloaded {fname}')\n", "\n", "# Quick stats\n", "train = [json.loads(l) for l in open('data/train.jsonl') if l.strip()]\n", "val = [json.loads(l) for l in open('data/val.jsonl') if l.strip()]\n", "test = [json.loads(l) for l in open('data/test.jsonl') if l.strip()]\n", "config = json.load(open('data/preprocessing_config.json'))\n", "\n", "print(f'\\nTrain: {len(train):,} | Val: {len(val):,} | Test: {len(test):,}')\n", "print(f'Features: {config[\"feature_names\"]}')\n", "print(f'Model: {config[\"model\"][\"name\"]}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Inspect the Data\n", "\n", "Each transition has 10 active features (after dropping 14 zero/placeholder eBPF slots):\n", "\n", "| Feature | Source | Description |\n", "|---------|--------|-------------|\n", "| cpu | bpf_get_smp_processor_id() | CPU core ID |\n", "| prio | task->prio | Dynamic priority (0-139) |\n", "| sprio | task->static_prio | Static priority (nice-based) |\n", "| nprio | task->normal_prio | Normal priority |\n", "| exec_ns | task->se.sum_exec_runtime | Total CPU time (symlog-scaled) |\n", "| vrt | task->se.vruntime | CFS virtual runtime (symlog-scaled) |\n", "| migr | task->se.nr_migrations | CPU migration count (symlog-scaled) |\n", "| cpus | task->nr_cpus_allowed | CPU affinity mask size |\n", "| csw | cpu_stats counter | Context switch count |\n", "| wt_us | (now - start_ts) / 1000 | Wait time in microseconds |" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# Visualize feature distributions\n", "features = np.array([r['state'] for r in train[:5000]])\n", "names = config['feature_names']\n", "\n", "print(f\"{'Feature':<10} {'Min':>10} {'Max':>10} {'Mean':>10} {'Std':>10}\")\n", "print('-' * 52)\n", "for i, name in enumerate(names):\n", " col = features[:, i]\n", " print(f'{name:<10} {col.min():>10.2f} {col.max():>10.2f} {col.mean():>10.2f} {col.std():>10.2f}')\n", "\n", "# Reward distribution\n", "rewards = [r['reward'] for r in train[:5000]]\n", "print(f'\\nReward — min: {min(rewards)}, max: {max(rewards)}, mean: {np.mean(rewards):.1f}')\n", "print(f'Actions — unique: {set(r[\"action\"] for r in train[:100])}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sample transition\n", "sample = train[0]\n", "print('Sample transition:')\n", "print(f' State: {[\"%s:%.2f\" % (n, v) for n, v in zip(names, sample[\"state\"])]}')\n", "print(f' Action: {sample[\"action\"]}')\n", "print(f' Reward: {sample[\"reward\"]}')\n", "print(f' Next state: {[\"%s:%.2f\" % (n, v) for n, v in zip(names, sample[\"next_state\"])]}')\n", "print(f' PID: {sample[\"pid\"]}, CPU: {sample[\"cpu\"]}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Train World Model (SFT)\n", "\n", "The World Model learns to predict `S_{t+1}` given `(S_t, action)`. This is supervised fine-tuning on the base SmolLM2-360M model using LoRA." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "from datasets import Dataset\nfrom transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments\nfrom peft import LoraConfig\n\n# Import SFT - handle different TRL versions\ntry:\n from trl import SFTTrainer, SFTConfig\n USE_SFT_CONFIG = True\nexcept ImportError:\n from trl import SFTTrainer\n USE_SFT_CONFIG = False\n\nMODEL_NAME = config['model']['name']\nFEATURE_NAMES = config['feature_names']\n\ndef format_state(features):\n parts = []\n for name, val in zip(FEATURE_NAMES, features):\n if val == int(val):\n parts.append(f'{name}:{int(val)}')\n else:\n parts.append(f'{name}:{val:.2f}')\n return ' | '.join(parts)\n\ndef make_world_model_example(record):\n state_str = format_state(record['state'])\n next_state_str = format_state(record['next_state'])\n text = (\n '<|system|>You are a Linux kernel simulator. '\n 'Predict the next system state.<|end|>\\n'\n f'<|user|>[STATE] {state_str}\\n'\n f'[ACTION] {record[\"action\"]:.4f}\\n'\n f'[PID] {record[\"pid\"]}\\n'\n 'Predict [NEXT_STATE]<|end|>\\n'\n f'<|assistant|>[NEXT_STATE] {next_state_str}<|end|>'\n )\n return {'text': text}\n\n# Use 10K samples for speed\nMAX_SAMPLES = 10000\ntrain_ds = Dataset.from_list([make_world_model_example(r) for r in train[:MAX_SAMPLES]])\nval_ds = Dataset.from_list([make_world_model_example(r) for r in val[:MAX_SAMPLES // 8]])\n\nprint(f'World Model dataset: train={len(train_ds)}, val={len(val_ds)}')\nprint(f'Using SFTConfig: {USE_SFT_CONFIG}')\nprint(f'\\nSample:\\n{train_ds[0][\"text\"][:300]}...')" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Load base model\n", "tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)\n", "model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map='auto')\n", "if tokenizer.pad_token is None:\n", " tokenizer.pad_token = tokenizer.eos_token\n", "\n", "print(f'Model: {MODEL_NAME}')\n", "print(f'Parameters: {sum(p.numel() for p in model.parameters()):,}')\n", "print(f'Device: {next(model.parameters()).device}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "# Train World Model\nimport inspect\n\nlora_config = LoraConfig(\n r=16, lora_alpha=32,\n target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj',\n 'gate_proj', 'up_proj', 'down_proj'],\n lora_dropout=0.05, bias='none', task_type='CAUSAL_LM',\n)\n\n# Build training args compatible with installed TRL version\nif USE_SFT_CONFIG:\n sft_sig = inspect.signature(SFTConfig.__init__)\n seq_key = 'max_seq_length' if 'max_seq_length' in sft_sig.parameters else 'max_length'\n training_args = SFTConfig(\n output_dir='./world_model_checkpoints',\n num_train_epochs=2,\n per_device_train_batch_size=16,\n gradient_accumulation_steps=2,\n learning_rate=2e-4,\n lr_scheduler_type='cosine',\n warmup_ratio=0.1,\n logging_steps=10,\n eval_strategy='steps',\n eval_steps=100,\n save_total_limit=1,\n fp16=True,\n report_to='none',\n **{seq_key: 512},\n )\nelse:\n training_args = TrainingArguments(\n output_dir='./world_model_checkpoints',\n num_train_epochs=2,\n per_device_train_batch_size=16,\n gradient_accumulation_steps=2,\n learning_rate=2e-4,\n lr_scheduler_type='cosine',\n warmup_ratio=0.1,\n logging_steps=10,\n eval_strategy='steps',\n eval_steps=100,\n save_total_limit=1,\n fp16=True,\n report_to='none',\n )\n\ntrainer = SFTTrainer(\n model=model, args=training_args,\n train_dataset=train_ds, eval_dataset=val_ds,\n 
peft_config=lora_config,\n    **({} if USE_SFT_CONFIG else {'max_seq_length': 512}),\n)\n\nprint('Training World Model...')\ntrainer.train()\n\ntrainer.save_model('./world_model_final')\ntokenizer.save_pretrained('./world_model_final')\nprint('World Model saved.')" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Train Strategist (SFT Warm-Start)\n", "\n", "The Strategist learns to output a scheduling action in `[-1.0, 1.0]` given a kernel state:\n", "- **Negative** = boost priority (reduce latency for this task)\n", "- **Positive** = demote priority (yield to others)\n", "- **Near zero** = leave scheduling alone\n", "\n", "We use heuristic labels for the warm-start: high wait → promote, low wait → hold." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import random\n", "\n", "IDX_WAIT_US = 9\n", "IDX_CTX_SWITCHES = 8\n", "\n", "def build_strategist_prompt(state, pid, cpu):\n", "    state_str = format_state(state)\n", "    return (\n", "        '<|system|>You are a Linux kernel scheduling strategist. '\n", "        'Given the current system state, output a scheduling action.<|end|>\\n'\n", "        f'<|user|>[STATE] {state_str}\\n'\n", "        f'[PID] {pid} [CPU] {cpu}\\n'\n", "        '[ACTION]<|end|>\\n'\n", "        '<|assistant|>'\n", "    )\n", "\n", "# Generate warm-start examples with heuristic labels\n", "samples = random.sample(train, min(2000, len(train)))\n", "# Sort by wait_us so the examples sweep the full wait-time range (all sampled records are kept)\n", "samples.sort(key=lambda r: r['state'][IDX_WAIT_US])\n", "\n", "warmstart_examples = []\n", "for rec in samples:\n", "    state = rec['state']\n", "    wait_us = state[IDX_WAIT_US]\n", "    csw = state[IDX_CTX_SWITCHES]\n", "\n", "    if wait_us > 15:\n", "        action = -0.6\n", "    elif csw > 10:\n", "        action = -0.3\n", "    elif wait_us < 3:\n", "        action = 0.1\n", "    else:\n", "        action = 0.05\n", "\n", "    prompt = build_strategist_prompt(state, rec['pid'], rec['cpu'])\n", "    warmstart_examples.append({'text': f'{prompt}{action:.4f}<|end|>'})\n", "\n", "ws_dataset = Dataset.from_list(warmstart_examples)\n", "\n", "# Show distribution\n", "actions = [float(e['text'].split('<|assistant|>')[1].split('<|end|>')[0]) for e in warmstart_examples]\n", "from collections import Counter\n", "print('Action distribution:', dict(Counter(actions)))\n", "print(f'Warm-start dataset: {len(ws_dataset)} examples')\n", "print(f'\\nSample:\\n{ws_dataset[0][\"text\"]}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "# Reload a fresh base model for the Strategist\ntokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)\nmodel = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map='auto')\nif tokenizer.pad_token is None:\n    tokenizer.pad_token = tokenizer.eos_token\n\nlora_config = LoraConfig(\n    r=16, lora_alpha=32,\n    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj',\n                    'gate_proj', 'up_proj', 'down_proj'],\n    lora_dropout=0.05, bias='none', task_type='CAUSAL_LM',\n)\n\nif USE_SFT_CONFIG:\n    ws_args = SFTConfig(\n        output_dir='./strategist_warmstart',\n        num_train_epochs=2,\n        per_device_train_batch_size=16,\n        gradient_accumulation_steps=2,\n        learning_rate=2e-4,\n        fp16=True,\n        logging_steps=5,\n        save_total_limit=1,\n        report_to='none',\n        **{seq_key: 512},\n    )\nelse:\n    ws_args = TrainingArguments(\n        output_dir='./strategist_warmstart',\n        num_train_epochs=2,\n        per_device_train_batch_size=16,\n        gradient_accumulation_steps=2,\n        learning_rate=2e-4,\n        fp16=True,\n        logging_steps=5,\n        save_total_limit=1,\n        report_to='none',\n    )\n\ntrainer = SFTTrainer(\n    model=model, args=ws_args,\n    train_dataset=ws_dataset,
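\n    # Warm-start run on heuristic labels only, so no eval split here (unlike the World Model).\n    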
peft_config=lora_config,\n    **({} if USE_SFT_CONFIG else {'max_seq_length': 512}),\n)\n\nprint('Training Strategist (warm-start)...')\ntrainer.train()\n\ntrainer.save_model('./strategist_final')\ntokenizer.save_pretrained('./strategist_final')\nprint('Strategist saved.')" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Evaluate the Strategist\n", "\n", "Test the trained model on unseen kernel states. Check:\n", "- Does it output valid floats in [-1, 1]?\n", "- Does it vary actions based on state (not always the same output)?\n", "- How fast is inference?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import re, time\n", "from peft import PeftModel\n", "\n", "# Load trained model\n", "base = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map='auto')\n", "strat_model = PeftModel.from_pretrained(base, './strategist_final')\n", "strat_model.eval()\n", "\n", "# Test on 20 diverse test samples\n", "test_sorted = sorted(test[:1000], key=lambda r: r['state'][IDX_WAIT_US])\n", "step = len(test_sorted) // 20\n", "test_samples = [test_sorted[i * step] for i in range(20)]\n", "\n", "print(f\"{'#':<4} {'Wait(us)':<10} {'Action':<10} {'Latency(ms)':<12} {'Valid'}\")\n", "print('-' * 48)\n", "\n", "actions_out = []\n", "latencies = []\n", "\n", "for i, rec in enumerate(test_samples):\n", "    prompt = build_strategist_prompt(rec['state'], rec['pid'], rec['cpu'])\n", "    inputs = tokenizer(prompt, return_tensors='pt').to(strat_model.device)\n", "\n", "    start = time.perf_counter()\n", "    out = strat_model.generate(\n", "        **inputs, max_new_tokens=8, temperature=0.3,\n", "        do_sample=True, pad_token_id=tokenizer.eos_token_id,\n", "    )\n", "    latency_ms = (time.perf_counter() - start) * 1000\n", "    latencies.append(latency_ms)\n", "\n", "    text = tokenizer.decode(out[0], skip_special_tokens=False)\n", "    assistant_part = text.split('<|assistant|>')[-1] if '<|assistant|>' in text else text\n", "\n", "    match = re.search(r'([-+]?\\d*\\.?\\d+)', assistant_part)\n", "    if match:\n", "        action_val = float(match.group(1))\n", "        valid = -1.0 <= action_val <= 1.0\n", "        actions_out.append(action_val)\n", "    else:\n", "        action_val = 'FAIL'\n", "        valid = False\n", "\n", "    wait_us = rec['state'][IDX_WAIT_US]\n", "    print(f'{i+1:<4} {wait_us:<10.0f} {str(action_val):<10} {latency_ms:<12.0f} {valid}')\n", "\n", "print(f'\\n--- Summary ---')\n", "print(f'Format compliance: {len(actions_out)}/20 ({len(actions_out)/20*100:.0f}%)')\n", "if actions_out:  # guard: min/max would crash if nothing parsed\n", "    print(f'Unique actions: {len(set(round(a, 2) for a in actions_out))}')\n", "    print(f'Action range: [{min(actions_out):.4f}, {max(actions_out):.4f}]')\n", "print(f'Mean latency: {np.mean(latencies):.0f}ms')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Merge LoRA & Push to Hugging Face\n", "\n", "Merge the LoRA adapter into the base model and upload to HF Hub."
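, "\n", "\n", "Section 8 notes that the deployed Strategist runs as a GGUF Q4_K_M build. A hedged sketch of that conversion with llama.cpp (script and binary names vary across llama.cpp versions, so check your checkout before running):\n", "\n", "```\n", "!git clone --depth 1 https://github.com/ggerganov/llama.cpp\n", "!pip install -q -r llama.cpp/requirements.txt\n", "!python llama.cpp/convert_hf_to_gguf.py ./strategist_merged --outfile kernelx-strategist-f16.gguf --outtype f16\n", "\n", "# Quantizing to Q4_K_M needs the compiled llama-quantize binary:\n", "# !cmake -B llama.cpp/build llama.cpp && cmake --build llama.cpp/build --target llama-quantize\n", "# !llama.cpp/build/bin/llama-quantize kernelx-strategist-f16.gguf kernelx-strategist-Q4_K_M.gguf Q4_K_M\n", "```"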
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Merge LoRA into base\n", "print('Merging LoRA weights...')\n", "merged = strat_model.merge_and_unload()\n", "merged.save_pretrained('./strategist_merged')\n", "tokenizer.save_pretrained('./strategist_merged')\n", "print('Merged model saved.')\n", "\n", "# Push to HF (optional — uncomment and add your token)\n", "# from huggingface_hub import login\n", "# login(token='YOUR_HF_TOKEN')\n", "# merged.push_to_hub('YOUR_USERNAME/kernelx-strategist')\n", "# tokenizer.push_to_hub('YOUR_USERNAME/kernelx-strategist')\n", "# print('Pushed to HF Hub')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7. Reward Function Analysis\n", "\n", "The multi-objective reward decomposes as:\n", "\n", "$$R_t = \\alpha \\cdot \\log(\\Delta_{exec} + 1) - \\beta \\cdot \\Delta_{wait} - \\gamma \\cdot |a_t - a_{t-1}|$$\n", "\n", "- **Throughput** ($\\alpha=1.0$): reward for CPU progress (delta exec_runtime)\n", "- **Latency** ($\\beta=2.0$): penalty for increased wait time\n", "- **Stability** ($\\gamma=0.5$): penalty for jittery action changes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "IDX_EXEC_NS = 4\n", "\n", "def compute_reward(state, next_state, action, prev_action=0.0,\n", " alpha=1.0, beta=2.0, gamma=0.5):\n", " exec_delta = next_state[IDX_EXEC_NS] - state[IDX_EXEC_NS]\n", " r_throughput = alpha * np.log(max(0.0, exec_delta) + 1)\n", " wait_delta = next_state[IDX_WAIT_US] - state[IDX_WAIT_US]\n", " r_latency = -beta * max(0.0, wait_delta)\n", " r_stability = -gamma * abs(action - prev_action)\n", " r_format = 1.0 if -1.0 <= action <= 1.0 else 0.0\n", " return {\n", " 'total': r_throughput + r_latency + r_stability + r_format,\n", " 'throughput': r_throughput,\n", " 'latency': r_latency,\n", " 'stability': r_stability,\n", " 'format': r_format,\n", " }\n", "\n", "# Evaluate reward on test set with model's actions vs heuristic\n", "model_rewards = []\n", "heuristic_rewards = []\n", "\n", "for rec in test[:200]:\n", " state = rec['state']\n", " next_state = rec['next_state']\n", " wait_us = state[IDX_WAIT_US]\n", "\n", " # Heuristic action\n", " h_action = -0.6 if wait_us > 15 else (-0.3 if state[IDX_CTX_SWITCHES] > 10 else 0.05)\n", " heuristic_rewards.append(compute_reward(state, next_state, h_action)['total'])\n", "\n", " # Model action (use -0.3 as representative since warm-start)\n", " m_action = -0.3\n", " model_rewards.append(compute_reward(state, next_state, m_action)['total'])\n", "\n", "print(f\"{'Metric':<20} {'Heuristic':>12} {'Model':>12}\")\n", "print('-' * 46)\n", "print(f\"{'Mean Reward':<20} {np.mean(heuristic_rewards):>12.4f} {np.mean(model_rewards):>12.4f}\")\n", "print(f\"{'Std Reward':<20} {np.std(heuristic_rewards):>12.4f} {np.std(model_rewards):>12.4f}\")\n", "print(f\"{'Min Reward':<20} {np.min(heuristic_rewards):>12.4f} {np.min(model_rewards):>12.4f}\")\n", "print(f\"{'Max Reward':<20} {np.max(heuristic_rewards):>12.4f} {np.max(model_rewards):>12.4f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 8. 
Architecture Summary\n", "\n", "```\n", "Linux Kernel (eBPF sentinel)\n", " │ 24D telemetry at sched_switch\n", " ▼\n", "Rust Bridge (ring buffer → SHM + JSONL)\n", " │ filters: >500us latency OR 10% random\n", " ▼\n", "Python Brain (FastAPI + OpenEnv)\n", " │ reads SHM, runs SmolLM2-360M (GGUF, <50ms)\n", " ▼\n", "Scheduling Action [-1, 1]\n", " │ ZMQ → Bridge → eBPF priority_actions map\n", " ▼\n", "Kernel applies priority weight at next sched_switch\n", "```\n", "\n", "**Model:** SmolLM2-360M-Instruct → LoRA fine-tuned → GGUF Q4_K_M (258MB, 44ms inference)\n", "\n", "**Training:** SFT warm-start with heuristic labels → policy iteration (collect → train → deploy → repeat)\n", "\n", "**Links:**\n", "- Model: [Rayugacodes/kernelx-strategist](https://huggingface.co/Rayugacodes/kernelx-strategist)\n", "- Data: [Rayugacodes/kernelx-training-data](https://huggingface.co/datasets/Rayugacodes/kernelx-training-data)\n", "- HF Space: [Rayugacodes/KernelX](https://huggingface.co/spaces/Rayugacodes/KernelX)" ] } ] }