{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [], "gpuType": "T4" }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "accelerator": "GPU" }, "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# KernelX Intelligence Layer — Training Notebook\n", "\n", "**AI-Powered Linux Kernel Scheduler using eBPF + SmolLM2-360M**\n", "\n", "This notebook trains the KernelX Strategist model end-to-end:\n", "1. Download real kernel telemetry data (534K transitions from eBPF sentinel)\n", "2. Preprocess: symlog scaling, feature selection (24D → 10D)\n", "3. Train World Model (SFT) — learns kernel dynamics\n", "4. Train Strategist (SFT warm-start) — learns scheduling actions\n", "5. Evaluate and push to Hugging Face\n", "\n", "**Runtime:** ~15 min on T4 GPU | **Model:** SmolLM2-360M-Instruct | **Output:** scheduling action [-1, 1]\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 0. Setup" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "# Check what Colab already has, only install what's missing\nimport subprocess, sys\n\n# Keep Colab's torch and transformers - don't touch them\n# Only install the small training libraries that are missing\n!pip install -q --no-deps trl peft\n!pip install -q datasets huggingface_hub\n\n# Verify\nimport torch, transformers\nprint(f'torch={torch.__version__} (Colab default - keeping it)')\nprint(f'transformers={transformers.__version__} (Colab default - keeping it)')\nprint(f'CUDA: {torch.cuda.is_available()}')\n\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\nprint('transformers imports: OK')\n\n# Check if trl has what we need\ntry:\n from trl import SFTTrainer, SFTConfig\n print(f'trl SFTTrainer: OK')\nexcept ImportError:\n print('trl SFTTrainer not available, installing compatible version...')\n !pip install -q \"trl==0.12.2\" --no-deps\n from trl import SFTTrainer, SFTConfig\n print(f'trl SFTTrainer: OK (0.12.2)')\n\nfrom peft import LoraConfig\nprint('peft: OK')\nprint('\\nAll imports working!')" }, { "cell_type": "markdown", "source": "**After the cell above runs, the runtime will restart. This is expected.** Once it restarts, continue running from the next cell below (skip the install cell).", "metadata": {} }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "print(f'PyTorch: {torch.__version__}')\n", "print(f'CUDA available: {torch.cuda.is_available()}')\n", "if torch.cuda.is_available():\n", " print(f'GPU: {torch.cuda.get_device_name(0)}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Download Training Data\n", "\n", "Real kernel telemetry collected by the eBPF sentinel on a Linux machine.\n", "Each record is a `(state_t, action, reward, state_t_next)` transition with a 24D feature vector." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from huggingface_hub import hf_hub_download\n", "import json, os\n", "\n", "os.makedirs('data', exist_ok=True)\n", "\n", "for fname in ['train.jsonl', 'val.jsonl', 'test.jsonl', 'preprocessing_config.json']:\n", " hf_hub_download(\n", " repo_id='Rayugacodes/kernelx-training-data',\n", " filename=fname,\n", " repo_type='dataset',\n", " local_dir='data',\n", " )\n", " print(f'Downloaded {fname}')\n", "\n", "# Quick stats\n", "train = [json.loads(l) for l in open('data/train.jsonl') if l.strip()]\n", "val = [json.loads(l) for l in open('data/val.jsonl') if l.strip()]\n", "test = [json.loads(l) for l in open('data/test.jsonl') if l.strip()]\n", "config = json.load(open('data/preprocessing_config.json'))\n", "\n", "print(f'\\nTrain: {len(train):,} | Val: {len(val):,} | Test: {len(test):,}')\n", "print(f'Features: {config[\"feature_names\"]}')\n", "print(f'Model: {config[\"model\"][\"name\"]}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Inspect the Data\n", "\n", "Each transition has 10 active features (after dropping 14 zero/placeholder eBPF slots):\n", "\n", "| Feature | Source | Description |\n", "|---------|--------|-------------|\n", "| cpu | bpf_get_smp_processor_id() | CPU core ID |\n", "| prio | task->prio | Dynamic priority (0-139) |\n", "| sprio | task->static_prio | Static priority (nice-based) |\n", "| nprio | task->normal_prio | Normal priority |\n", "| exec_ns | task->se.sum_exec_runtime | Total CPU time (symlog-scaled) |\n", "| vrt | task->se.vruntime | CFS virtual runtime (symlog-scaled) |\n", "| migr | task->se.nr_migrations | CPU migration count (symlog-scaled) |\n", "| cpus | task->nr_cpus_allowed | CPU affinity mask size |\n", "| csw | cpu_stats counter | Context switch count |\n", "| wt_us | (now - start_ts) / 1000 | Wait time in microseconds |" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# Visualize feature distributions\n", "features = np.array([r['state'] for r in train[:5000]])\n", "names = config['feature_names']\n", "\n", "print(f\"{'Feature':<10} {'Min':>10} {'Max':>10} {'Mean':>10} {'Std':>10}\")\n", "print('-' * 52)\n", "for i, name in enumerate(names):\n", " col = features[:, i]\n", " print(f'{name:<10} {col.min():>10.2f} {col.max():>10.2f} {col.mean():>10.2f} {col.std():>10.2f}')\n", "\n", "# Reward distribution\n", "rewards = [r['reward'] for r in train[:5000]]\n", "print(f'\\nReward — min: {min(rewards)}, max: {max(rewards)}, mean: {np.mean(rewards):.1f}')\n", "print(f'Actions — unique: {set(r[\"action\"] for r in train[:100])}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sample transition\n", "sample = train[0]\n", "print('Sample transition:')\n", "print(f' State: {[\"%s:%.2f\" % (n, v) for n, v in zip(names, sample[\"state\"])]}')\n", "print(f' Action: {sample[\"action\"]}')\n", "print(f' Reward: {sample[\"reward\"]}')\n", "print(f' Next state: {[\"%s:%.2f\" % (n, v) for n, v in zip(names, sample[\"next_state\"])]}')\n", "print(f' PID: {sample[\"pid\"]}, CPU: {sample[\"cpu\"]}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Train World Model (SFT)\n", "\n", "The World Model learns to predict `S_{t+1}` given `(S_t, action)`. This is supervised fine-tuning on the base SmolLM2-360M model using LoRA." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "from datasets import Dataset\nfrom transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments\nfrom peft import LoraConfig\n\n# Import SFT - handle different TRL versions\ntry:\n from trl import SFTTrainer, SFTConfig\n USE_SFT_CONFIG = True\nexcept ImportError:\n from trl import SFTTrainer\n USE_SFT_CONFIG = False\n\nMODEL_NAME = config['model']['name']\nFEATURE_NAMES = config['feature_names']\n\ndef format_state(features):\n parts = []\n for name, val in zip(FEATURE_NAMES, features):\n if val == int(val):\n parts.append(f'{name}:{int(val)}')\n else:\n parts.append(f'{name}:{val:.2f}')\n return ' | '.join(parts)\n\ndef make_world_model_example(record):\n state_str = format_state(record['state'])\n next_state_str = format_state(record['next_state'])\n text = (\n '<|system|>You are a Linux kernel simulator. '\n 'Predict the next system state.<|end|>\\n'\n f'<|user|>[STATE] {state_str}\\n'\n f'[ACTION] {record[\"action\"]:.4f}\\n'\n f'[PID] {record[\"pid\"]}\\n'\n 'Predict [NEXT_STATE]<|end|>\\n'\n f'<|assistant|>[NEXT_STATE] {next_state_str}<|end|>'\n )\n return {'text': text}\n\n# Use 10K samples for speed\nMAX_SAMPLES = 10000\ntrain_ds = Dataset.from_list([make_world_model_example(r) for r in train[:MAX_SAMPLES]])\nval_ds = Dataset.from_list([make_world_model_example(r) for r in val[:MAX_SAMPLES // 8]])\n\nprint(f'World Model dataset: train={len(train_ds)}, val={len(val_ds)}')\nprint(f'Using SFTConfig: {USE_SFT_CONFIG}')\nprint(f'\\nSample:\\n{train_ds[0][\"text\"][:300]}...')" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Load base model\n", "tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)\n", "model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map='auto')\n", "if tokenizer.pad_token is None:\n", " tokenizer.pad_token = tokenizer.eos_token\n", "\n", "print(f'Model: {MODEL_NAME}')\n", "print(f'Parameters: {sum(p.numel() for p in model.parameters()):,}')\n", "print(f'Device: {next(model.parameters()).device}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "# Train World Model\nimport inspect\n\nlora_config = LoraConfig(\n r=16, lora_alpha=32,\n target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj',\n 'gate_proj', 'up_proj', 'down_proj'],\n lora_dropout=0.05, bias='none', task_type='CAUSAL_LM',\n)\n\n# Build training args compatible with installed TRL version\nif USE_SFT_CONFIG:\n sft_sig = inspect.signature(SFTConfig.__init__)\n seq_key = 'max_seq_length' if 'max_seq_length' in sft_sig.parameters else 'max_length'\n training_args = SFTConfig(\n output_dir='./world_model_checkpoints',\n num_train_epochs=2,\n per_device_train_batch_size=16,\n gradient_accumulation_steps=2,\n learning_rate=2e-4,\n lr_scheduler_type='cosine',\n warmup_ratio=0.1,\n logging_steps=10,\n eval_strategy='steps',\n eval_steps=100,\n save_total_limit=1,\n fp16=True,\n report_to='none',\n **{seq_key: 512},\n )\nelse:\n training_args = TrainingArguments(\n output_dir='./world_model_checkpoints',\n num_train_epochs=2,\n per_device_train_batch_size=16,\n gradient_accumulation_steps=2,\n learning_rate=2e-4,\n lr_scheduler_type='cosine',\n warmup_ratio=0.1,\n logging_steps=10,\n eval_strategy='steps',\n eval_steps=100,\n save_total_limit=1,\n fp16=True,\n report_to='none',\n )\n\ntrainer = SFTTrainer(\n model=model, args=training_args,\n train_dataset=train_ds, eval_dataset=val_ds,\n 
peft_config=lora_config,\n    **({} if USE_SFT_CONFIG else {'max_seq_length': 512}),\n)\n\nprint('Training World Model...')\ntrainer.train()\n\ntrainer.save_model('./world_model_final')\ntokenizer.save_pretrained('./world_model_final')\nprint('World Model saved.')" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Train Strategist (SFT Warm-Start)\n", "\n", "The Strategist learns to output a scheduling action in `[-1.0, 1.0]` given a kernel state:\n", "- **Negative** = boost priority (reduce latency for this task)\n", "- **Positive** = demote priority (yield to others)\n", "- **Near zero** = leave scheduling alone\n", "\n", "We use heuristic labels for the warm-start: high wait → promote, low wait → hold." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import random\n", "\n", "IDX_WAIT_US = 9\n", "IDX_CTX_SWITCHES = 8\n", "\n", "def build_strategist_prompt(state, pid, cpu):\n", "    state_str = format_state(state)\n", "    return (\n", "        '<|system|>You are a Linux kernel scheduling strategist. '\n", "        'Given the current system state, output a scheduling action.<|end|>\\n'\n", "        f'<|user|>[STATE] {state_str}\\n'\n", "        f'[PID] {pid} [CPU] {cpu}\\n'\n", "        '[ACTION]<|end|>\\n'\n", "        '<|assistant|>'\n", "    )\n", "\n", "# Generate warm-start examples with heuristic labels\n", "samples = random.sample(train, min(2000, len(train)))\n", "# Sort by wait_us so the examples sweep the full wait-time range (all sampled records are kept)\n", "samples.sort(key=lambda r: r['state'][IDX_WAIT_US])\n", "\n", "warmstart_examples = []\n", "for rec in samples:\n", "    state = rec['state']\n", "    wait_us = state[IDX_WAIT_US]\n", "    csw = state[IDX_CTX_SWITCHES]\n", "\n", "    if wait_us > 15:\n", "        action = -0.6\n", "    elif csw > 10:\n", "        action = -0.3\n", "    elif wait_us < 3:\n", "        action = 0.1\n", "    else:\n", "        action = 0.05\n", "\n", "    prompt = build_strategist_prompt(state, rec['pid'], rec['cpu'])\n", "    warmstart_examples.append({'text': f'{prompt}{action:.4f}<|end|>'})\n", "\n", "ws_dataset = Dataset.from_list(warmstart_examples)\n", "\n", "# Show distribution\n", "actions = [float(e['text'].split('<|assistant|>')[1].split('<|end|>')[0]) for e in warmstart_examples]\n", "from collections import Counter\n", "print('Action distribution:', dict(Counter(actions)))\n", "print(f'Warm-start dataset: {len(ws_dataset)} examples')\n", "print(f'\\nSample:\\n{ws_dataset[0][\"text\"]}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "# Reload a fresh base model for the Strategist\ntokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)\nmodel = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map='auto')\nif tokenizer.pad_token is None:\n    tokenizer.pad_token = tokenizer.eos_token\n\nlora_config = LoraConfig(\n    r=16, lora_alpha=32,\n    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj',\n                    'gate_proj', 'up_proj', 'down_proj'],\n    lora_dropout=0.05, bias='none', task_type='CAUSAL_LM',\n)\n\nif USE_SFT_CONFIG:\n    ws_args = SFTConfig(\n        output_dir='./strategist_warmstart',\n        num_train_epochs=2,\n        per_device_train_batch_size=16,\n        gradient_accumulation_steps=2,\n        learning_rate=2e-4,\n        fp16=True,\n        logging_steps=5,\n        save_total_limit=1,\n        report_to='none',\n        **{seq_key: 512},\n    )\nelse:\n    ws_args = TrainingArguments(\n        output_dir='./strategist_warmstart',\n        num_train_epochs=2,\n        per_device_train_batch_size=16,\n        gradient_accumulation_steps=2,\n        learning_rate=2e-4,\n        fp16=True,\n        logging_steps=5,\n        save_total_limit=1,\n        report_to='none',\n    )\n\ntrainer = SFTTrainer(\n    model=model, args=ws_args,\n    train_dataset=ws_dataset,
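\n    # Warm-start run on heuristic labels only, so no eval split here (unlike the World Model).\n    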
peft_config=lora_config,\n    **({} if USE_SFT_CONFIG else {'max_seq_length': 512}),\n)\n\nprint('Training Strategist (warm-start)...')\ntrainer.train()\n\ntrainer.save_model('./strategist_final')\ntokenizer.save_pretrained('./strategist_final')\nprint('Strategist saved.')" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Evaluate the Strategist\n", "\n", "Test the trained model on unseen kernel states. Check:\n", "- Does it output valid floats in [-1, 1]?\n", "- Does it vary actions based on state (not always the same output)?\n", "- How fast is inference?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import re, time\n", "from peft import PeftModel\n", "\n", "# Load trained model\n", "base = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map='auto')\n", "strat_model = PeftModel.from_pretrained(base, './strategist_final')\n", "strat_model.eval()\n", "\n", "# Test on 20 diverse test samples\n", "test_sorted = sorted(test[:1000], key=lambda r: r['state'][IDX_WAIT_US])\n", "step = len(test_sorted) // 20\n", "test_samples = [test_sorted[i * step] for i in range(20)]\n", "\n", "print(f\"{'#':<4} {'Wait(us)':<10} {'Action':<10} {'Latency(ms)':<12} {'Valid'}\")\n", "print('-' * 48)\n", "\n", "actions_out = []\n", "latencies = []\n", "\n", "for i, rec in enumerate(test_samples):\n", "    prompt = build_strategist_prompt(rec['state'], rec['pid'], rec['cpu'])\n", "    inputs = tokenizer(prompt, return_tensors='pt').to(strat_model.device)\n", "\n", "    start = time.perf_counter()\n", "    out = strat_model.generate(\n", "        **inputs, max_new_tokens=8, temperature=0.3,\n", "        do_sample=True, pad_token_id=tokenizer.eos_token_id,\n", "    )\n", "    latency_ms = (time.perf_counter() - start) * 1000\n", "    latencies.append(latency_ms)\n", "\n", "    text = tokenizer.decode(out[0], skip_special_tokens=False)\n", "    assistant_part = text.split('<|assistant|>')[-1] if '<|assistant|>' in text else text\n", "\n", "    match = re.search(r'([-+]?\\d*\\.?\\d+)', assistant_part)\n", "    if match:\n", "        action_val = float(match.group(1))\n", "        valid = -1.0 <= action_val <= 1.0\n", "        actions_out.append(action_val)\n", "    else:\n", "        action_val = 'FAIL'\n", "        valid = False\n", "\n", "    wait_us = rec['state'][IDX_WAIT_US]\n", "    print(f'{i+1:<4} {wait_us:<10.0f} {str(action_val):<10} {latency_ms:<12.0f} {valid}')\n", "\n", "print(f'\\n--- Summary ---')\n", "print(f'Format compliance: {len(actions_out)}/20 ({len(actions_out)/20*100:.0f}%)')\n", "if actions_out:  # guard: min/max would crash if nothing parsed\n", "    print(f'Unique actions: {len(set(round(a, 2) for a in actions_out))}')\n", "    print(f'Action range: [{min(actions_out):.4f}, {max(actions_out):.4f}]')\n", "print(f'Mean latency: {np.mean(latencies):.0f}ms')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Merge LoRA & Push to Hugging Face\n", "\n", "Merge the LoRA adapter into the base model and upload to HF Hub."
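, "\n", "\n", "Section 8 notes that the deployed Strategist runs as a GGUF Q4_K_M build. A hedged sketch of that conversion with llama.cpp (script and binary names vary across llama.cpp versions, so check your checkout before running):\n", "\n", "```\n", "!git clone --depth 1 https://github.com/ggerganov/llama.cpp\n", "!pip install -q -r llama.cpp/requirements.txt\n", "!python llama.cpp/convert_hf_to_gguf.py ./strategist_merged --outfile kernelx-strategist-f16.gguf --outtype f16\n", "\n", "# Quantizing to Q4_K_M needs the compiled llama-quantize binary:\n", "# !cmake -B llama.cpp/build llama.cpp && cmake --build llama.cpp/build --target llama-quantize\n", "# !llama.cpp/build/bin/llama-quantize kernelx-strategist-f16.gguf kernelx-strategist-Q4_K_M.gguf Q4_K_M\n", "```"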
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Merge LoRA into base\n", "print('Merging LoRA weights...')\n", "merged = strat_model.merge_and_unload()\n", "merged.save_pretrained('./strategist_merged')\n", "tokenizer.save_pretrained('./strategist_merged')\n", "print('Merged model saved.')\n", "\n", "# Push to HF (optional — uncomment and add your token)\n", "# from huggingface_hub import login\n", "# login(token='YOUR_HF_TOKEN')\n", "# merged.push_to_hub('YOUR_USERNAME/kernelx-strategist')\n", "# tokenizer.push_to_hub('YOUR_USERNAME/kernelx-strategist')\n", "# print('Pushed to HF Hub')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7. Reward Function Analysis\n", "\n", "The multi-objective reward decomposes as:\n", "\n", "$$R_t = \\alpha \\cdot \\log(\\Delta_{exec} + 1) - \\beta \\cdot \\Delta_{wait} - \\gamma \\cdot |a_t - a_{t-1}|$$\n", "\n", "- **Throughput** ($\\alpha=1.0$): reward for CPU progress (delta exec_runtime)\n", "- **Latency** ($\\beta=2.0$): penalty for increased wait time\n", "- **Stability** ($\\gamma=0.5$): penalty for jittery action changes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "IDX_EXEC_NS = 4\n", "\n", "def compute_reward(state, next_state, action, prev_action=0.0,\n", " alpha=1.0, beta=2.0, gamma=0.5):\n", " exec_delta = next_state[IDX_EXEC_NS] - state[IDX_EXEC_NS]\n", " r_throughput = alpha * np.log(max(0.0, exec_delta) + 1)\n", " wait_delta = next_state[IDX_WAIT_US] - state[IDX_WAIT_US]\n", " r_latency = -beta * max(0.0, wait_delta)\n", " r_stability = -gamma * abs(action - prev_action)\n", " r_format = 1.0 if -1.0 <= action <= 1.0 else 0.0\n", " return {\n", " 'total': r_throughput + r_latency + r_stability + r_format,\n", " 'throughput': r_throughput,\n", " 'latency': r_latency,\n", " 'stability': r_stability,\n", " 'format': r_format,\n", " }\n", "\n", "# Evaluate reward on test set with model's actions vs heuristic\n", "model_rewards = []\n", "heuristic_rewards = []\n", "\n", "for rec in test[:200]:\n", " state = rec['state']\n", " next_state = rec['next_state']\n", " wait_us = state[IDX_WAIT_US]\n", "\n", " # Heuristic action\n", " h_action = -0.6 if wait_us > 15 else (-0.3 if state[IDX_CTX_SWITCHES] > 10 else 0.05)\n", " heuristic_rewards.append(compute_reward(state, next_state, h_action)['total'])\n", "\n", " # Model action (use -0.3 as representative since warm-start)\n", " m_action = -0.3\n", " model_rewards.append(compute_reward(state, next_state, m_action)['total'])\n", "\n", "print(f\"{'Metric':<20} {'Heuristic':>12} {'Model':>12}\")\n", "print('-' * 46)\n", "print(f\"{'Mean Reward':<20} {np.mean(heuristic_rewards):>12.4f} {np.mean(model_rewards):>12.4f}\")\n", "print(f\"{'Std Reward':<20} {np.std(heuristic_rewards):>12.4f} {np.std(model_rewards):>12.4f}\")\n", "print(f\"{'Min Reward':<20} {np.min(heuristic_rewards):>12.4f} {np.min(model_rewards):>12.4f}\")\n", "print(f\"{'Max Reward':<20} {np.max(heuristic_rewards):>12.4f} {np.max(model_rewards):>12.4f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 8. 
Architecture Summary\n", "\n", "```\n", "Linux Kernel (eBPF sentinel)\n", " │ 24D telemetry at sched_switch\n", " ▼\n", "Rust Bridge (ring buffer → SHM + JSONL)\n", " │ filters: >500us latency OR 10% random\n", " ▼\n", "Python Brain (FastAPI + OpenEnv)\n", " │ reads SHM, runs SmolLM2-360M (GGUF, <50ms)\n", " ▼\n", "Scheduling Action [-1, 1]\n", " │ ZMQ → Bridge → eBPF priority_actions map\n", " ▼\n", "Kernel applies priority weight at next sched_switch\n", "```\n", "\n", "**Model:** SmolLM2-360M-Instruct → LoRA fine-tuned → GGUF Q4_K_M (258MB, 44ms inference)\n", "\n", "**Training:** SFT warm-start with heuristic labels → policy iteration (collect → train → deploy → repeat)\n", "\n", "**Links:**\n", "- Model: [Rayugacodes/kernelx-strategist](https://huggingface.co/Rayugacodes/kernelx-strategist)\n", "- Data: [Rayugacodes/kernelx-training-data](https://huggingface.co/datasets/Rayugacodes/kernelx-training-data)\n", "- HF Space: [Rayugacodes/KernelX](https://huggingface.co/spaces/Rayugacodes/KernelX)" ] } ] }