Rayugacodes committed on
Commit 7574947 · verified · 1 Parent(s): 0b6fd4f

Update README with latest

  1. README.md +148 -137
README.md CHANGED
@@ -1,199 +1,210 @@
1
- ---
2
- library_name: transformers
3
- tags: []
4
- ---
5
-
6
- # Model Card for Model ID
7
-
8
- <!-- Provide a quick summary of what the model is/does. -->
9
-
10
-
11
-
12
- ## Model Details
13
-
14
- ### Model Description
15
-
16
- <!-- Provide a longer summary of what this model is. -->
17
-
18
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
19
-
20
- - **Developed by:** [More Information Needed]
21
- - **Funded by [optional]:** [More Information Needed]
22
- - **Shared by [optional]:** [More Information Needed]
23
- - **Model type:** [More Information Needed]
24
- - **Language(s) (NLP):** [More Information Needed]
25
- - **License:** [More Information Needed]
26
- - **Finetuned from model [optional]:** [More Information Needed]
27
-
28
- ### Model Sources [optional]
29
-
30
- <!-- Provide the basic links for the model. -->
31
-
32
- - **Repository:** [More Information Needed]
33
- - **Paper [optional]:** [More Information Needed]
34
- - **Demo [optional]:** [More Information Needed]
35
-
36
- ## Uses
37
-
38
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
-
40
- ### Direct Use
41
-
42
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
 
44
- [More Information Needed]
45
 
46
- ### Downstream Use [optional]
47
 
48
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
 
50
- [More Information Needed]
51
 
52
- ### Out-of-Scope Use
 
 
 
 
 
 
 
 
53
 
54
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
55
 
56
- [More Information Needed]
57
 
58
- ## Bias, Risks, and Limitations
59
 
60
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
 
61
 
62
- [More Information Needed]
 
 
 
 
63
 
64
- ### Recommendations
65
 
66
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
 
68
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
69
 
70
- ## How to Get Started with the Model
71
 
72
- Use the code below to get started with the model.
73
 
74
- [More Information Needed]
75
 
76
- ## Training Details
77
 
78
- ### Training Data
 
 
 
 
 
 
 
 
 
 
 
79
 
80
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
 
82
- [More Information Needed]
 
 
 
 
83
 
84
- ### Training Procedure
85
 
86
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 
 
87
 
88
- #### Preprocessing [optional]
 
 
 
89
 
90
- [More Information Needed]
 
 
91
 
 
 
 
92
 
93
- #### Training Hyperparameters
 
 
 
94
 
95
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
 
97
- #### Speeds, Sizes, Times [optional]
98
 
99
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
 
 
100
 
101
- [More Information Needed]
 
 
 
 
 
102
 
103
- ## Evaluation
104
 
105
- <!-- This section describes the evaluation protocols and provides the results. -->
106
 
107
- ### Testing Data, Factors & Metrics
108
 
109
- #### Testing Data
110
 
111
- <!-- This should link to a Dataset Card if possible. -->
112
 
113
- [More Information Needed]
114
 
115
- #### Factors
116
 
117
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
 
119
- [More Information Needed]
120
 
121
- #### Metrics
 
 
122
 
123
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
 
124
 
125
- [More Information Needed]
 
 
126
 
127
- ### Results
 
128
 
129
- [More Information Needed]
 
 
130
 
131
- #### Summary
132
 
 
133
 
 
 
 
 
 
 
 
 
 
134
 
135
- ## Model Examination [optional]
136
 
137
- <!-- Relevant interpretability work for the model goes here -->
138
 
139
- [More Information Needed]
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
140
 
141
- ## Environmental Impact
142
 
143
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
 
145
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
 
147
- - **Hardware Type:** [More Information Needed]
148
- - **Hours used:** [More Information Needed]
149
- - **Cloud Provider:** [More Information Needed]
150
- - **Compute Region:** [More Information Needed]
151
- - **Carbon Emitted:** [More Information Needed]
152
 
153
- ## Technical Specifications [optional]
154
 
155
- ### Model Architecture and Objective
156
 
157
- [More Information Needed]
158
 
159
- ### Compute Infrastructure
 
 
 
 
 
 
 
160
 
161
- [More Information Needed]
162
 
163
- #### Hardware
164
 
165
- [More Information Needed]
166
-
167
- #### Software
168
-
169
- [More Information Needed]
170
-
171
- ## Citation [optional]
172
-
173
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
-
175
- **BibTeX:**
176
-
177
- [More Information Needed]
178
-
179
- **APA:**
180
-
181
- [More Information Needed]
182
-
183
- ## Glossary [optional]
184
-
185
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
-
187
- [More Information Needed]
188
-
189
- ## More Information [optional]
190
-
191
- [More Information Needed]
192
-
193
- ## Model Card Authors [optional]
194
-
195
- [More Information Needed]
196
-
197
- ## Model Card Contact
198
 
199
- [More Information Needed]
 
1
+ # KernelX
2
 
3
+ **An OpenEnv-compliant world-modeling environment for Linux kernel scheduling.**
4
 
5
+ KernelX teaches a 360-million-parameter language model to make Linux scheduling decisions in real time. An eBPF sentinel extracts a 24-dimensional state vector at every context switch, a learned World Model predicts the consequences of each action, and a GRPO-trained Strategist outputs scheduling nudges in 44 milliseconds on a laptop CPU.
6
 
7
+ Built for the Meta PyTorch OpenEnv Hackathon 2026, Theme 3.1 (World Modeling).
8
 
9
+ ## Try it now
10
 
11
+ | | |
12
+ |---|---|
13
+ | **Live environment** | [huggingface.co/spaces/Rayugacodes/KernelX](https://huggingface.co/spaces/Rayugacodes/KernelX) |
14
+ | **Training notebook (free T4)** | [KernelX_Training.ipynb](https://colab.research.google.com/github/pie-314/KernelX/blob/main/KernelX_Training.ipynb) |
15
+ | **Trained model** | [Rayugacodes/kernelx-strategist](https://huggingface.co/Rayugacodes/kernelx-strategist) |
16
+ | **Training data (534K transitions)** | [Rayugacodes/kernelx-training-data](https://huggingface.co/datasets/Rayugacodes/kernelx-training-data) |
17
+ | **Blog post** | *The Digital Traffic Jam.md* |
18
+ | **Demo video (2 min)** | *[YouTube link]* |
19
+ | **Performance report** | [training/PERFORMANCE.md](training/PERFORMANCE.md) |
20
 
21
+ ## What this environment is
22
 
23
+ KernelX gives an LLM agent a partially-observable view of a real Linux kernel and asks it to learn scheduling policy from interaction. The agent observes a 24-dimensional telemetry vector, takes a single scalar action between -1 and +1, and the next state comes from a World Model trained on real kernel transitions.
24
 
25
+ It is an OpenEnv environment. The standard `reset()` / `step(action)` / `state` interface works the way you expect. Plug in TRL, Stable Baselines, or any RL loop — the environment doesn't care.
26
 
27
+ ```python
28
+ from brain.client import KernelXClient
29
 
30
+ env = KernelXClient(url="https://your-space.hf.space")
31
+ obs = env.reset()
32
+ obs = env.step(action=0.5) # nudge a process priority
33
+ score = env.evaluate() # OpenEnv-compliant grading
34
+ ```
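The `reset()` / `step(action)` contract can be exercised without a running Space. The sketch below is illustrative only: `StubEnv`, `clamp_action`, and `run_episode` are hypothetical names that are not part of KernelX, and the stub's dynamics are invented purely so the loop has something to run against. It assumes, as in the snippet above, that `step` returns the next observation.

```python
import random
from typing import Callable, List, Sequence


class StubEnv:
    """Hypothetical stand-in exposing reset()/step(action); NOT the KernelX client."""

    def __init__(self, dims: int = 24, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._dims = dims
        self._state: List[float] = [0.0] * dims

    def reset(self) -> List[float]:
        # Start each episode from a random 24-dimensional observation.
        self._state = [self._rng.random() for _ in range(self._dims)]
        return list(self._state)

    def step(self, action: float) -> List[float]:
        # Made-up dynamics: shift every feature slightly in the action's direction.
        self._state = [x + 0.01 * action for x in self._state]
        return list(self._state)


def clamp_action(action: float) -> float:
    """Keep the scalar action inside the documented [-1, +1] range."""
    return max(-1.0, min(1.0, action))


def run_episode(env, policy: Callable[[Sequence[float]], float], steps: int) -> List[float]:
    """Run `steps` interactions and return the (clamped) actions that were sent."""
    obs = env.reset()
    actions: List[float] = []
    for _ in range(steps):
        action = clamp_action(policy(obs))
        obs = env.step(action)
        actions.append(action)
    return actions
```

Any object with the same two methods should slot into `run_episode` in place of `StubEnv`, for example `run_episode(StubEnv(), lambda obs: 0.5, steps=10)`.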
35
 
36
+ ## Why it's interesting to train an LLM on
37
 
38
+ Kernel scheduling is a domain where the "right" action is not obvious from the immediate observation, where mistakes cascade through subsequent states, and where the cost function (latency, throughput, fairness) involves real trade-offs. An agent that learns to schedule well must build a causal model of how its priority adjustments propagate through the scheduler's internal state — exactly the kind of world-modeling capability Theme 3.1 targets.
39
 
40
+ Compared to most RL environments LLMs get trained on, this one has three properties that we think make it useful:
41
 
42
+ The **state space is real**. The 24D observation is what an eBPF program actually extracts at `sched_switch`: priorities, virtual runtime, migration counts, wait time. We collected 534,134 of these from a real Linux machine under mixed workloads. There is no toy MDP underneath.
43
 
44
+ The **dynamics are learned**. The World Model is a SmolLM2-360M fine-tune that predicts `S_{t+1}` given `(S_t, a_t)`. The Strategist trains against the World Model, not against a recorded replay. This means the agent's actions actually drive state transitions during training — the standard RL contract.
45
 
46
+ The **reward decomposes**. We don't optimize a single number. The reward is the sum of a throughput term, a latency penalty, a stability penalty, and a format reward. Each component is independently inspectable, which makes debugging tractable and makes reward-hacking visible when it happens.
47
 
48
+ ## Architecture
49
 
50
+ ```
51
+ Linux kernel (eBPF sentinel)
52
+ ↓ 24D telemetry vector at every sched_switch
53
+ Rust bridge (lockless ring buffer → /dev/shm + JSONL)
54
+ ↓ filtered: wait_us > 500 OR 10% random sample
55
+ Python brain (FastAPI + OpenEnv server)
56
+ ↓ World Model predicts next state given (state, action)
57
+ ↓ Strategist outputs action ∈ [-1, +1]
58
+ ZMQ → Bridge → eBPF priority_actions map
59
+
60
+ Kernel applies the nudge at the next context switch
61
+ ```
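The filter rule in the diagram (`wait_us > 500 OR 10% random sample`) can be written out as a small predicate. This is a Python sketch of the rule only; the real bridge is written in Rust, and the function and constant names here are assumptions made for illustration.

```python
import random

WAIT_THRESHOLD_US = 500   # from the diagram: forward every event with wait_us > 500
SAMPLE_RATE = 0.10        # from the diagram: plus a 10% random sample of the rest


def should_forward(wait_us: int, rng: random.Random) -> bool:
    """Return True if a telemetry event passes the filter."""
    return wait_us > WAIT_THRESHOLD_US or rng.random() < SAMPLE_RATE
```

Passing the random generator in explicitly keeps the sampling reproducible in tests.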
62
 
63
+ Five components, each in its native language:
64
 
65
+ - `kernel/` — eBPF C program (`sentinel.bpf.c`) attached to `sched_wakeup` and raw `sched_switch` tracepoints. Extracts the 24D vector, ships it through a `BPF_MAP_TYPE_RINGBUF`. The actuator side reads from a `priority_actions` hash map.
66
+ - `bridge/` — Rust userspace process built on Aya. Reads the ring buffer, mirrors state to shared memory at sub-millisecond latency, persists trajectories to JSONL, listens on ZMQ for actions from the brain. Optionally writes through to RadishDB (the team's WAL-backed key-value store) for durable trajectory storage.
67
+ - `brain/` — Python OpenEnv server. Implements the `Environment` interface. Loads the trained GGUF Strategist, runs inference, talks to the bridge over ZMQ. Includes an `LLMGrader` for OpenEnv-compliant scoring and a `/reload-policy` endpoint for hot-swapping models without downtime.
68
+ - `training/` — Full ML pipeline. Preprocessing (symlog scaling, 10D active-feature extraction), World Model SFT, Strategist warm-start + GRPO, GGUF export, policy iteration, baseline comparison.
69
+ - `ui/` — Ratatui terminal HUD. Reads the same shared memory as the brain, renders live telemetry, AI reasoning, and reward sparklines at 10 Hz.
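The preprocessing step mentions symlog scaling. The README does not spell out the exact formula used, so the sketch below assumes the common definition `sign(x) * log(1 + |x|)` and its inverse. It compresses large values such as microsecond wait times while staying close to linear near zero.

```python
import math


def symlog(x: float) -> float:
    """Symmetric log transform: sign(x) * log(1 + |x|). Assumed definition."""
    return math.copysign(math.log1p(abs(x)), x)


def symexp(y: float) -> float:
    """Inverse of symlog: sign(y) * (exp(|y|) - 1)."""
    return math.copysign(math.expm1(abs(y)), y)
```

Under this definition a wait time of 89,000 microseconds maps to roughly 11.4, which keeps it on a scale comparable to the smaller features.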
70
 
71
+ ## The training pipeline
72
 
73
+ ```bash
74
+ # 1. Preprocess raw kernel transitions
75
+ python -m training.data.preprocess --input data/state_transitions.jsonl
76
 
77
+ # 2. Train the World Model (SFT — predicts S_{t+1} | S_t, a_t)
78
+ python -m training.models.train_world_model \
79
+ --train-data training/data/train.jsonl \
80
+ --val-data training/data/val.jsonl
81
 
82
+ # 3. Train the Strategist (warm-start SFT + GRPO against the World Model)
83
+ python -m training.models.train_strategist \
84
+ --train-data training/data/train.jsonl
85
 
86
+ # 4. Export to GGUF for sub-50ms CPU inference
87
+ python -m training.models.export_gguf \
88
+ --adapter-path training/models/strategist_final
89
 
90
+ # 5. Closed-loop policy iteration: collect → train → deploy → repeat
91
+ python -m training.policy_iteration \
92
+ --trajectories-path data/trajectories.jsonl
93
+ ```
94
 
95
+ The full pipeline runs on a free Colab T4. See [`KernelX_Training.ipynb`](KernelX_Training.ipynb).
96
 
97
+ ## Reward function
98
 
99
+ ```
100
+ R_t = α · log(Δ_exec + 1) − β · max(0, Δ_wait) − γ · |a_t − a_{t-1}| + format_reward
101
+ ```
102
 
103
+ | Component | Weight | Signal | Range |
104
+ |---|---|---|---|
105
+ | Throughput | α = 1.0 | log of CPU-time progress | [0, ~10] |
106
+ | Latency penalty | β = 2.0 | per-microsecond increase in wait time | (-∞, 0] |
107
+ | Stability penalty | γ = 0.5 | absolute action change between steps | [-1, 0] |
108
+ | Format reward | 1.0 | action ∈ [-1, +1] | {0, 1} |
109
 
110
+ The format reward is what stops the agent from outputting nonsense — every other component still applies if it does, but losing the format point is a hard signal during early GRPO. The stability term is what stops the agent from oscillating. The latency term is the actual objective. The throughput term keeps the agent from learning that "do nothing forever" is a local optimum.
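As a rough sketch, the formula and weights from the table can be written so that each component stays inspectable. The names below (`RewardParts`, `compute_reward`) are hypothetical, the action is assumed to be already parsed into a float, and flooring `Δ_exec` at zero before the logarithm is an assumption added to keep the log defined.

```python
import math
from dataclasses import dataclass

ALPHA, BETA, GAMMA = 1.0, 2.0, 0.5  # weights from the table above


@dataclass
class RewardParts:
    throughput: float
    latency: float
    stability: float
    fmt: float

    @property
    def total(self) -> float:
        return self.throughput + self.latency + self.stability + self.fmt


def compute_reward(delta_exec: float, delta_wait: float,
                   action: float, prev_action: float) -> RewardParts:
    """Evaluate R_t component by component."""
    # Assumption: negative exec deltas are floored at 0 so log(Δ_exec + 1) is defined.
    throughput = ALPHA * math.log1p(max(0.0, delta_exec))
    # Only increases in wait time are penalized.
    latency = -BETA * max(0.0, delta_wait)
    # Penalize changes in the action between consecutive steps.
    stability = -GAMMA * abs(action - prev_action)
    # Format point: awarded only when the action lies in [-1, +1].
    fmt = 1.0 if -1.0 <= action <= 1.0 else 0.0
    return RewardParts(throughput, latency, stability, fmt)
```

Returning the parts rather than only the sum makes it possible to log each term separately during training.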
111
 
112
+ ## Results
113
 
114
+ **World Model (Stage 2 SFT).** The model learns the kernel's default dynamics from 10K transitions in 2 epochs. Loss dropped from 2.05 to 0.29, and token-level prediction accuracy rose from 61% to 91%. *[Plot: training/plots/world_model_training.png]*
115
 
116
+ **Strategist warm-start (Stage 3a SFT).** Teaches the model the output format before RL begins. Loss fell from 2.13 to 0.28, with 100% format compliance. *[Plot: training/plots/strategist_warmstart_training.png]*
117
 
118
+ **Strategist GRPO (Stage 3b RL).** Trained against the World Model simulator. The trained policy achieves higher cumulative reward than both the random-action baseline and the hand-written heuristic policy on held-out test states. *[Plot: training/plots/grpo_training.png — to be regenerated against World-Model simulator]*
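For readers new to GRPO, its core idea is that each sampled action is scored relative to the other samples drawn for the same state, rather than against a learned value function. The sketch below shows that group-relative normalization in plain Python. It illustrates the general technique and is not taken from the KernelX training code; the use of the population standard deviation and the `eps` constant are choices made here.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize a group of rewards to zero mean and (roughly) unit spread."""
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a choice made for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]
```

When every sample in a group earns the same reward, all advantages are zero and that group contributes no learning signal.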
119
 
120
+ **Inference.** The Q4_K_M-quantized GGUF model is 258MB and runs in 44ms warm-cache on a laptop CPU.
121
 
122
+ For full numbers and per-iteration breakdowns: [`training/PERFORMANCE.md`](training/PERFORMANCE.md).
123
 
124
+ ## Running locally
125
 
126
+ The full kernel→bridge→brain stack requires a Linux machine with kernel BTF support and root access. The OpenEnv environment alone (which is what judges interact with) runs anywhere — the HF Space is the easiest path.
127
 
128
+ ```bash
129
+ # Step 1: Load the eBPF sentinel (Linux only, requires sudo)
130
+ cd kernel && sudo make load
131
 
132
+ # Step 2: Start the Rust bridge
133
+ cargo run --manifest-path bridge/Cargo.toml --release -- --record
134
 
135
+ # Step 3: Start the OpenEnv server
136
+ export PYTHONPATH=$PYTHONPATH:.
137
+ python3 -m brain.server.app
138
 
139
+ # Step 4: Run the autonomous policy loop
140
+ python3 -m brain.server.run_autonomous --steps 50 --verbose
141
 
142
+ # Step 5: Launch the HUD
143
+ cargo run --manifest-path ui/Cargo.toml --release
144
+ ```
145
 
146
+ If the eBPF stack isn't available, the brain server falls back to a simulator and the UI runs in `MOCK DEMO` mode.
147
 
148
+ ## Model details
149
 
150
+ | | |
151
+ |---|---|
152
+ | Base model | SmolLM2-360M-Instruct |
153
+ | Fine-tuning | LoRA (r=16, α=32) on q/k/v/o + gate/up/down |
154
+ | Quantization | GGUF Q4_K_M (258MB) |
155
+ | Inference latency | 44ms warm-cache, CPU |
156
+ | Action space | single float ∈ [-1.0, +1.0] |
157
+ | Observation | 10 active features extracted from 24D eBPF vector |
158
+ | Target hardware | i3 CPU laptop, sub-50ms decision budget |
159
 
160
+ ## Shared-memory contract
161
 
162
+ The UI and the brain both read from `/dev/shm/kernelx_state`:
163
 
164
+ ```rust
165
+ #[repr(C, packed)]
166
+ struct HUDState {
167
+ features: [u64; 24], // 24D telemetry vector
168
+ current_action: f32, // most recent AI action
169
+ active_pid: u32, // process being scheduled
170
+ is_clamped: u32, // safety auditor flag
171
+ reasoning: [u8; 128], // explanation string
172
+ p99_wait_us: u64, // P99 wait latency
173
+ core_heat: [f32; 4], // per-core utilization
174
+ model_confidence: f32,
175
+ world_model_drift: f32,
176
+ radish_wal_size: u64,
177
+ radish_dirty_pages: u32,
178
+ }
179
+ ```
180
 
181
+ Total: 376 bytes, packed C layout, byte-identical between Rust and Python.
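A Python reader for this layout might look like the following sketch, which uses the standard `struct` module. The format string mirrors the Rust fields in order with no padding. Little-endian byte order and a NUL-padded UTF-8 `reasoning` buffer are assumptions, and `parse_hud_state` is a hypothetical helper name.

```python
import struct

# "<" = little-endian with no padding (assumed; matches x86-64 and typical ARM64 Linux).
# 24Q features, f action, I pid, I clamped, 128s reasoning, Q p99, 4f core heat,
# f confidence, f drift, Q WAL size, I dirty pages.
HUD_STRUCT = struct.Struct("<24QfII128sQ4fffQI")


def parse_hud_state(buf: bytes) -> dict:
    """Decode one HUDState record from raw bytes."""
    if len(buf) < HUD_STRUCT.size:
        raise ValueError(f"need {HUD_STRUCT.size} bytes, got {len(buf)}")
    v = HUD_STRUCT.unpack_from(buf)
    return {
        "features": list(v[0:24]),
        "current_action": v[24],
        "active_pid": v[25],
        "is_clamped": bool(v[26]),
        # Assumption: the string is NUL-padded UTF-8.
        "reasoning": v[27].split(b"\0", 1)[0].decode("utf-8", errors="replace"),
        "p99_wait_us": v[28],
        "core_heat": list(v[29:33]),
        "model_confidence": v[33],
        "world_model_drift": v[34],
        "radish_wal_size": v[35],
        "radish_dirty_pages": v[36],
    }
```

`HUD_STRUCT.size` evaluates to 376, which agrees with the stated total. On a machine running the bridge, the input would be the bytes read from `/dev/shm/kernelx_state`.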
182
 
183
+ ## What we'd do with more time
184
 
185
+ **Reward normalization.** Wait-delta values can hit 89,000 microseconds, which dominates the reward and risks gradient explosion in GRPO. Clipping the latency penalty to a fixed range (or scaling by p95 wait time) would stabilize training.
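One way the clipping idea could look is sketched below. The cap of 1,000 microseconds is an arbitrary placeholder rather than a tuned value, and the function name is hypothetical.

```python
def clipped_latency_penalty(delta_wait_us: float, beta: float = 2.0,
                            max_wait_us: float = 1000.0) -> float:
    """Latency penalty with the wait delta bounded to [0, max_wait_us]."""
    bounded = min(max(0.0, delta_wait_us), max_wait_us)
    return -beta * bounded
```

With these placeholder values an 89,000-microsecond spike contributes -2,000 to the reward instead of -178,000.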
186
 
187
+ **PMU features.** Fourteen of the 24 feature slots are reserved for hardware performance counters (IPC, cache misses, branch mispredictions). Populating them via `perf_event_open` would give the agent much richer state, especially for distinguishing "CPU-bound but progressing" from "CPU-bound and thrashing."
 
 
 
 
188
 
189
+ **Multi-process reasoning.** The current Strategist acts on one PID at a time. A multi-agent extension where each PID has its own agent — or a centralized agent reasoning about process *interactions* — is the natural next step.
190
 
191
+ **Real GRPO on real telemetry.** The current setup trains GRPO against the learned World Model. With more compute, training could close the loop by collecting fresh trajectories under the trained policy and re-training — proper online RL on a real system.
192
 
193
+ ## Citation
194
 
195
+ ```
196
+ @misc{kernelx2026,
197
+ title = {KernelX: An OpenEnv World-Modeling Environment for Linux Kernel Scheduling},
198
+ author = {Naman Gupta and team},
199
+ year = {2026},
200
+ note = {Meta PyTorch OpenEnv Hackathon}
201
+ }
202
+ ```
203
 
204
+ ## License
205
 
206
+ MIT. RadishDB sub-component is also MIT (see `RadishDB/LICENSE`).
207
 
208
+ ---
209
 
210
+ *KernelX · Meta PyTorch OpenEnv Hackathon 2026 · Theme 3.1, World Modeling*