# LeanMigrate Task Reference LeanMigrate models a real engineering workflow: a developer must migrate production business logic from one language to another, then prove correctness against a Lean 4 formal specification. Each task is structured as a small function dependency graph. The agent must migrate each function in dependency order, run sample tests, and pass Lean verification before moving to the next. --- ## Overview | # | Task ID | Migration | Difficulty | Functions | Proof? | | -- | --------------------- | -------------------- | ---------- | --------- | ------ | | 1 | `rbac_auth` | Python → TypeScript | Easy | 3 | No | | 2 | `pricing_engine` | JavaScript → Python | Medium | 5 | No | | 3 | `path_canonicalizer` | C → Rust | Medium | 4 | Yes | | 4 | `expression_eval` | C → Rust | Medium | 3 | Yes | | 5 | `payment_saga` | JavaScript → Python | Hard | 4 | Yes | | 6 | `lru_cache` | C++ → Rust | Hard | 4 | Yes | | 7 | `shortest_path` | C++ → Rust | Hard | 3 | Yes | | 8 | `interval_scheduler` | C++ → Rust | Hard | 3 | Yes | --- ## Task 1 — RBAC Authorization Engine (`rbac_auth`) **Difficulty:** Easy | **Migration:** Python → TypeScript | **Max Steps:** 15 ### What It Is A role-based access control (RBAC) engine that models permissions as a directed inheritance graph. Roles can inherit permissions from parent roles, enabling hierarchical access policies (e.g. `admin` inherits from `editor` which inherits from `viewer`). ### Why It Matters RBAC is the access control backbone of most enterprise software. Migrating it correctly and proving its behavior is critical: a bug in permission checking is a security vulnerability. ### Functions to Migrate | Function | Description | Depends On | | -------------------- | ----------------------------------------------------------------- | ------------------------------------ | | `findRole` | Look up a role by name from the role list | — | | `hasDirectPermission`| Check if a role has a specific resource/action permission directly | — | | `canAccess` | Check permission with role inheritance traversal, depth limit 5 | `findRole`, `hasDirectPermission` | ### What Makes It Interesting The inheritance traversal is bounded (depth ≤ 5) to prevent cycles from causing infinite loops. The Lean spec expresses this via a `depth : Nat` countdown parameter. Agents must replicate this exact bounding behavior in TypeScript, using `function` declarations (not arrow functions) so tree-sitter can extract them for the verification IR. ### Lean Spec Excerpt ```lean def canAccess (roles : RoleMap) (roleName resource action : String) (depth : Nat := 5) : Bool := match depth, findRole roles roleName with | 0, _ => false | _, none => false | n + 1, some role => hasDirectPermission role resource action || role.inherits.any (fun parentName => canAccess roles parentName resource action n) ``` ### Scoring * 3 functions × (1/3) reward each on successful Lean verification * No proof obligation — functional correctness only --- ## Task 2 — E-Commerce Pricing Engine (`pricing_engine`) **Difficulty:** Medium | **Migration:** JavaScript → Python | **Max Steps:** 25 ### What It Is A JavaScript pricing engine that computes the final price of an e-commerce order in **integer cents** (no floating point). The engine handles line item subtotals, coupon discounts capped at 50% of subtotal, loyalty point discounts capped at 10% of subtotal, and region-based tax rates in basis points. ### Why It Matters Pricing bugs cost real money. Integer arithmetic and correct discount-capping semantics are notoriously tricky to migrate across languages. The Lean spec pins the exact integer semantics so the agent cannot accidentally introduce floating-point rounding. ### Functions to Migrate | Function | Description | Depends On | | --------------- | --------------------------------------------------------- | --------------------------------------------- | | `subtotal` | Sum of unit price × quantity for all line items | — | | `taxRateBps` | Tax rate in basis points for a given region ID | — | | `couponDiscount`| Coupon discount capped at 50% of subtotal | `subtotal` | | `loyaltyDiscount`| Loyalty points discount capped at 10% of subtotal | `subtotal` | | `finalPrice` | Final price = subtotal − discounts + tax | `subtotal`, `couponDiscount`, `loyaltyDiscount`, `taxRateBps` | ### What Makes It Interesting The discount cap logic uses integer division (`//`), not `round()` or `ceil()`. Agents that reach for floating-point will fail Lean verification. The dependency chain is linear (subtotal → discounts → tax → final price), giving a natural migration order that the `analyze_deps` action surfaces. ### Lean Spec Excerpt ```lean def couponDiscount (order : Order) : Int := let sub := subtotal order let rawDiscount := order.coupons.foldl (fun total coupon => total + (sub * coupon.discountPercent / 100)) 0 min rawDiscount (sub / 2) ``` ### Scoring * 5 functions × (1/5) reward each on successful Lean verification * No proof obligation — functional correctness only --- ## Task 3 — Path Canonicalizer (`path_canonicalizer`) **Difficulty:** Medium | **Migration:** C → Rust | **Max Steps:** 25 ### What It Is A C path normalization library migrated to Rust. Paths are represented as lists of string segments (not raw strings). The normalizer removes `.` segments and resolves `..` clamped at the root — you cannot go above `/`. ### Why It Matters Path canonicalization is a security-critical operation. Incorrect `..` resolution leads to path traversal vulnerabilities. The Lean spec formally proves idempotency: normalizing an already-normalized path produces the same path. ### Functions to Migrate | Function | Description | Depends On | | ------------------ | -------------------------------------------------------- | --------------- | | `normalizePath` | Remove `.`, resolve `..` clamped at root | — | | `joinPaths` | Join two segment lists: normalize(base ++ rel) | `normalizePath` | | `pathDepth` | Count the number of segments in a path | — | | `canonicalizeProof`| **Lean proof** that `normalize(normalize(p)) = normalize(p)` | `normalizePath` | ### What Makes It Interesting This is the first C → Rust task. Agents must write Rust with `serde_json` deserialization. The proof obligation (idempotency) is the first formal proof task — agents must reason about the list-based normalization algorithm in Lean rather than just implementing it. ### Lean Spec Excerpt ```lean def normalizePath (p : PathSpec.Path) : PathSpec.Path := normalizeAux [] p -- Proof obligation: idempotency -- ∀ p, normalizePath (normalizePath p) = normalizePath p ``` ### Scoring * 3 runtime functions × (1/4) reward each on Lean verification * 1 proof × (1/4) reward on proof acceptance * Proof obligation: `weight_proof = 0.2` --- ## Task 4 — Expression Evaluator (`expression_eval`) **Difficulty:** Medium | **Migration:** C → Rust | **Max Steps:** 25 ### What It Is A C recursive-descent expression evaluator migrated to Rust. Expressions are binary trees with literal leaves and binary operator nodes (`+`, `-`, `*`, `/`). Division by zero propagates as `None` (Rust) / `none` (Lean). ### Why It Matters Safe arithmetic evaluation is a common primitive in configuration languages, rule engines, and calculators. The Lean spec formalizes the `None`-propagation contract so the agent cannot silently swallow divide-by-zero. ### Functions to Migrate | Function | Description | Depends On | | --------------- | ---------------------------------------------------------- | ----------- | | `evalBinOp` | Evaluate a binary operation; return `None` for `/ 0` | — | | `evalExpr` | Recursively evaluate an expression tree; propagate `None` | `evalBinOp` | | `divisionProof` | **Lean proof** that `evalBinOp .Div a 0 = none` for all `a` | `evalBinOp` | ### What Makes It Interesting The proof obligation is compact and tractable (`by simp [evalBinOp]`), making it a good first proof for agents exploring the proof mechanics. The expression tree is passed as a `serde_json::Value` from the test harness, requiring careful recursive deserialization. ### Lean Spec Excerpt ```lean def evalBinOp (op : ExprSpec.Op) (a b : Int) : Option Int := match op with | .Add => some (a + b) | .Sub => some (a - b) | .Mul => some (a * b) | .Div => if b = 0 then none else some (a / b) -- Proof obligation: ∀ a, evalBinOp .Div a 0 = none ``` ### Scoring * 2 runtime functions × (1/3) reward each * 1 proof × (1/3) reward on proof acceptance * Proof obligation: `weight_proof = 0.2` --- ## Task 5 — Payment Saga State Machine (`payment_saga`) **Difficulty:** Hard | **Migration:** JavaScript → Python | **Max Steps:** 40 ### What It Is A JavaScript Express.js payment saga migrated to Python. A saga is a compensating transaction pattern: it progresses through states (`Idle → Reserved → Authorized → Captured → Settled`) and supports rollback transitions (`CompensateCapture → CompensateAuthorize → CompensateReserve → Compensated`). The agent must also prove the **no-double-charge theorem**: the happy path reaches `Settled` and `isCharged` returns `true` exactly once. ### Why It Matters Payment systems are among the most heavily regulated and correctness-critical software. Proving no-double-charge is not just a unit test — it is a machine-checked guarantee that the state machine cannot reach a charged state more than once via the happy path. ### Functions to Migrate | Function | Description | Depends On | | --------------------- | ---------------------------------------------------- | ---------------------------------- | | `transition` | Pure state transition for one event | — | | `runSaga` | Fold a sequence of events into the final saga state | `transition` | | `isCharged` | Return `true` for `Captured` or `Settled` | — | | `no_double_charge_proof` | **Lean proof** of happy-path no-double-charge theorem | `transition`, `runSaga`, `isCharged` | ### What Makes It Interesting This is the hardest proof in the easy/medium tier. The proof must reference `happyPath` and pre-proved lemmas (`happy_path_settles`) from the `SagaSpec` module. Agents that try to `native_decide` the entire proof will hit timeout; the intended approach uses `constructor`, `exact`, and `simp [isCharged]`. ### Lean Spec Excerpt ```lean theorem no_double_charge : runSaga happyPath = .Settled ∧ isCharged (runSaga happyPath) = true := by constructor · exact happy_path_settles · rw [happy_path_settles] simp [isCharged] ``` ### Scoring * 3 runtime functions × (1/4) reward each * 1 proof × (1/4) reward on proof acceptance * Proof obligation: `weight_proof = 0.3` --- ## Task 6 — LRU Cache (`lru_cache`) **Difficulty:** Hard | **Migration:** C++ → Rust | **Max Steps:** 40 ### What It Is A C++ LRU cache (backed by `std::list` + `unordered_map`) migrated to a Rust `Vec<(u64, u64)>` where the **front is MRU**. The agent must implement eviction, put (insert/update with MRU promotion), get (lookup with MRU promotion), and prove the capacity bound. ### Why It Matters LRU caches are ubiquitous in systems programming. The Lean spec pins the capacity invariant formally, ensuring the implementation never silently exceeds its capacity — a property that is hard to test exhaustively but trivial to prove once the algorithm is correct. ### Functions to Migrate | Function | Description | Depends On | | ------------ | ---------------------------------------------------------------- | ---------- | | `lruEvict` | Trim cache to `cap` entries (keep front = MRU, drop LRU tail) | — | | `lruPut` | Insert/update key-val; move to MRU front; evict if over capacity | `lruEvict` | | `lruGet` | Lookup key; on hit, move to MRU front; return `(value, cache)` | — | | `lruPutProof`| **Lean proof** that `(lruPut cache cap k v).length ≤ cap` | `lruPut` | ### What Makes It Interesting The Rust representation (`Vec<(u64,u64)>`) differs structurally from the C++ `std::list + unordered_map`. Agents must re-derive the algorithm from the Lean spec, not just translate the C++ line by line. The proof obligation is an inequality bound, requiring `omega` or `simp` with list length lemmas. ### Lean Spec Excerpt ```lean def lruPut (cache : LruSpec.CacheState) (cap : Nat) (key val : Nat) : LruSpec.CacheState := ((key, val) :: cache.filter (fun p => p.1 ≠ key)).take cap -- Proof obligation: ∀ cache cap k v, (lruPut cache cap k v).length ≤ cap ``` ### Scoring * 3 runtime functions × (1/4) reward each * 1 proof × (1/4) reward on proof acceptance * Proof obligation: `weight_proof = 0.2` --- ## Task 7 — Shortest Path / Dijkstra (`shortest_path`) **Difficulty:** Hard | **Migration:** C++ → Rust | **Max Steps:** 40 ### What It Is A C++ Dijkstra (Bellman-Ford relaxation loop) migrated to Rust. Input is a list of weighted directed edges `(from, to, weight)`. Output is a `Vec>` of length `n` where `None` means unreachable. The agent must also prove that the self-distance `shortestDist graph n src src = some 0` when `src < n`. ### Why It Matters Shortest-path algorithms appear in routing, logistics, game AI, and network analysis. The Lean spec uses a Bellman-Ford style relaxation (not a priority queue), making the formal model tractable while the Rust implementation can use any correct strategy. ### Functions to Migrate | Function | Description | Depends On | | -------------- | -------------------------------------------------------------------- | ---------- | | `dijkstra` | Single-source shortest paths; returns `Vec>` of length n | — | | `shortestDist` | Shortest distance from `src` to `dst`, or `None` if unreachable | `dijkstra` | | `relaxProof` | **Lean proof** that `shortestDist graph n src src = some 0` | `shortestDist` | ### What Makes It Interesting The Lean spec models Dijkstra as iterated Bellman-Ford relaxation — `n-1` passes over all edges. Agents writing Rust can use a proper priority queue for efficiency, but the formal model in Lean is the ground truth for correctness. The proof requires reasoning about the initial distance vector and the `src < n` guard. ### Lean Spec Excerpt ```lean def dijkstra (graph : ShortestPathSpec.Graph) (n src : Nat) : List (Option Nat) := let init := (List.range n).map (fun i => if i == src then some 0 else none) (List.range (n - 1)).foldl (fun d _ => relaxPass d graph) init -- Proof obligation: ∀ graph n src, src < n → shortestDist graph n src src = some 0 ``` ### Scoring * 2 runtime functions × (1/3) reward each * 1 proof × (1/3) reward on proof acceptance * Proof obligation: `weight_proof = 0.2` --- ## Task 8 — Interval Scheduler (`interval_scheduler`) **Difficulty:** Hard | **Migration:** C++ → Rust | **Max Steps:** 40 ### What It Is A C++ interval scheduling library migrated to Rust. Two algorithms: (1) **merge overlapping intervals** (sort by start, scan and merge), and (2) **greedy activity selection** (sort by end time, pick non-overlapping intervals to maximize count). The agent must also prove the merge output is sorted and non-overlapping. ### Why It Matters Interval scheduling is the foundation of calendar systems, resource allocators, and database range scans. The merge-then-prove pattern is a canonical example of verifying a sorting + merging algorithm — a class of problems where off-by-one errors are common and hard to catch with finite tests. ### Functions to Migrate | Function | Description | Depends On | | ---------------- | ---------------------------------------------------------------------- | ---------- | | `mergeIntervals` | Sort by start, merge overlapping/touching intervals | — | | `maxSchedule` | Greedy activity selection: sort by end time, pick non-overlapping | — | | `mergeProof` | **Lean proof** that `mergeIntervals` output is sorted + non-overlapping | `mergeIntervals` | ### What Makes It Interesting Both functions are independent, which means the agent can migrate them in either order. The proof obligation (`isSorted ∧ noOverlap`) is a conjunction that requires two separate sub-proofs, making it the most structurally complex proof in the suite. Agents must use the `constructor` tactic and prove each branch using the Lean definitions of `isSorted` and `noOverlap` from `IntervalSpec`. ### Lean Spec Excerpt ```lean def mergeIntervals (ivs : List IntervalSpec.Interval) : List IntervalSpec.Interval := (sortByStart ivs).foldl (fun acc iv => match acc.getLast? with | none => [iv] | some (ls, le) => if iv.1 ≤ le then dropLastInterval acc ++ [(ls, max le iv.2)] else acc ++ [iv]) [] -- Proof obligation: ∀ ivs, isSorted (mergeIntervals ivs) ∧ noOverlap (mergeIntervals ivs) ``` ### Scoring * 2 runtime functions × (1/3) reward each * 1 proof × (1/3) reward on proof acceptance * Proof obligation: `weight_proof = 0.2` --- ## Reward Design ### Philosophy Every action provides signal — there is no sparse-reward problem: 1. **Exploration signal**: `inspect` and `analyze_deps` give small positive rewards (+0.05) so agents learn to gather information before attempting implementation. 2. **Functional signal**: `run_tests` gives +0.10 on full pass, with proportional penalties on partial failure (−0.01 per failing case). This guides the agent toward correct behavior before spending Lean compilation time. 3. **Verification signal**: `submit` gives a large positive reward (`1 / n_functions`) when Lean accepts the code, and a small penalty (−0.05) on rejection. The penalty is small enough not to discourage reasonable attempts, but nonzero to penalize spamming. 4. **Proof signal**: Proof tasks split the reward between functional implementation and the proof, with explicit weight fields (`weight_functional`, `weight_property`, `weight_proof`) configurable per task. ### Score Clamping Internal episode scores are clamped to `[0.01, 0.99]` — this is the "open unit" convention from OpenEnv that keeps scores away from the extremes, which some validators interpret specially. Step reward logs clamp to `[0.0, 1.0]`. ### Episode Termination An episode ends when: * All functions are verified (`progress = 1.0`), **or** * The step limit is reached (`step_count ≥ max_steps`) The step limit scales with difficulty: 15 steps for easy tasks, 25 for medium, 40 for hard. --- ## Design Notes for Evaluators **Why Lean 4 as the verifier?** Lean 4 provides machine-checked guarantees that go beyond unit tests. A submission that passes sample cases but violates the formal spec is rejected. This makes it impossible to "cheat" with a hardcoded lookup table that only works on known inputs. **Why multiple language pairs?** Cross-language migration is harder than same-language refactoring. Each language pair exercises different aspects of the agent's capability: type system awareness, idiomatic pattern translation, and error handling conventions. **Why incremental function-level migration?** Submitting the entire file at once would give the agent no partial-credit signal and would make Lean errors much harder to diagnose. Function-level granularity mirrors real code review and makes the reward landscape smooth. **Difficulty calibration:** - Easy: one language pair, no proofs, short dependency chains, small functions - Medium: two language pairs (including C→Rust), optional proofs, medium chains - Hard: C++→Rust with complex data structures, mandatory proofs, long chains