# LeanMigrate Task Reference

LeanMigrate models a real engineering workflow: a developer must migrate production business logic from one language to another, then prove correctness against a Lean 4 formal specification. Each task is structured as a small function dependency graph. The agent must migrate each function in dependency order, run sample tests, and pass Lean verification before moving to the next.

---

## Overview

| #  | Task ID               | Migration            | Difficulty | Functions | Proof? |
| -- | --------------------- | -------------------- | ---------- | --------- | ------ |
| 1  | `rbac_auth`           | Python → TypeScript  | Easy       | 3         | No     |
| 2  | `pricing_engine`      | JavaScript → Python  | Medium     | 5         | No     |
| 3  | `path_canonicalizer`  | C → Rust             | Medium     | 4         | Yes    |
| 4  | `expression_eval`     | C → Rust             | Medium     | 3         | Yes    |
| 5  | `payment_saga`        | JavaScript → Python  | Hard       | 4         | Yes    |
| 6  | `lru_cache`           | C++ → Rust           | Hard       | 4         | Yes    |
| 7  | `shortest_path`       | C++ → Rust           | Hard       | 3         | Yes    |
| 8  | `interval_scheduler`  | C++ → Rust           | Hard       | 3         | Yes    |

---

## Task 1 — RBAC Authorization Engine (`rbac_auth`)

**Difficulty:** Easy | **Migration:** Python → TypeScript | **Max Steps:** 15

### What It Is

A role-based access control (RBAC) engine that models permissions as a directed inheritance graph. Roles can inherit permissions from parent roles, enabling hierarchical access policies (e.g. `admin` inherits from `editor` which inherits from `viewer`).

### Why It Matters

RBAC is the access control backbone of most enterprise software. Migrating it correctly and proving its behavior is critical: a bug in permission checking is a security vulnerability.

### Functions to Migrate

| Function             | Description                                                       | Depends On                           |
| -------------------- | ----------------------------------------------------------------- | ------------------------------------ |
| `findRole`           | Look up a role by name from the role list                         | —                                    |
| `hasDirectPermission`| Check if a role has a specific resource/action permission directly | —                                    |
| `canAccess`          | Check permission with role inheritance traversal, depth limit 5   | `findRole`, `hasDirectPermission`    |

### What Makes It Interesting

The inheritance traversal is bounded (depth ≤ 5) to prevent cycles from causing infinite loops. The Lean spec expresses this via a `depth : Nat` countdown parameter. Agents must replicate this exact bounding behavior in TypeScript, using `function` declarations (not arrow functions) so tree-sitter can extract them for the verification IR.

### Lean Spec Excerpt

```lean
def canAccess (roles : RoleMap) (roleName resource action : String) (depth : Nat := 5) : Bool :=
  match depth, findRole roles roleName with
  | 0, _ => false
  | _, none => false
  | n + 1, some role =>
    hasDirectPermission role resource action ||
    role.inherits.any (fun parentName =>
      canAccess roles parentName resource action n)
```

### Scoring

* 3 functions × (1/3) reward each on successful Lean verification
* No proof obligation — functional correctness only

---

## Task 2 — E-Commerce Pricing Engine (`pricing_engine`)

**Difficulty:** Medium | **Migration:** JavaScript → Python | **Max Steps:** 25

### What It Is

A JavaScript pricing engine that computes the final price of an e-commerce order in **integer cents** (no floating point). The engine handles line item subtotals, coupon discounts capped at 50% of subtotal, loyalty point discounts capped at 10% of subtotal, and region-based tax rates in basis points.

### Why It Matters

Pricing bugs cost real money. Integer arithmetic and correct discount-capping semantics are notoriously tricky to migrate across languages. The Lean spec pins the exact integer semantics so the agent cannot accidentally introduce floating-point rounding.

### Functions to Migrate

| Function        | Description                                               | Depends On                                    |
| --------------- | --------------------------------------------------------- | --------------------------------------------- |
| `subtotal`      | Sum of unit price × quantity for all line items           | —                                             |
| `taxRateBps`    | Tax rate in basis points for a given region ID            | —                                             |
| `couponDiscount`| Coupon discount capped at 50% of subtotal                 | `subtotal`                                    |
| `loyaltyDiscount`| Loyalty points discount capped at 10% of subtotal        | `subtotal`                                    |
| `finalPrice`    | Final price = subtotal − discounts + tax                  | `subtotal`, `couponDiscount`, `loyaltyDiscount`, `taxRateBps` |

### What Makes It Interesting

The discount cap logic uses integer division (`//`), not `round()` or `ceil()`. Agents that reach for floating-point will fail Lean verification. The dependency chain is linear (subtotal → discounts → tax → final price), giving a natural migration order that the `analyze_deps` action surfaces.

### Lean Spec Excerpt

```lean
def couponDiscount (order : Order) : Int :=
  let sub := subtotal order
  let rawDiscount := order.coupons.foldl (fun total coupon =>
    total + (sub * coupon.discountPercent / 100)) 0
  min rawDiscount (sub / 2)
```

### Scoring

* 5 functions × (1/5) reward each on successful Lean verification
* No proof obligation — functional correctness only

---

## Task 3 — Path Canonicalizer (`path_canonicalizer`)

**Difficulty:** Medium | **Migration:** C → Rust | **Max Steps:** 25

### What It Is

A C path normalization library migrated to Rust. Paths are represented as lists of string segments (not raw strings). The normalizer removes `.` segments and resolves `..` clamped at the root — you cannot go above `/`.

### Why It Matters

Path canonicalization is a security-critical operation. Incorrect `..` resolution leads to path traversal vulnerabilities. The Lean spec formally proves idempotency: normalizing an already-normalized path produces the same path.

### Functions to Migrate

| Function           | Description                                              | Depends On      |
| ------------------ | -------------------------------------------------------- | --------------- |
| `normalizePath`    | Remove `.`, resolve `..` clamped at root                 | —               |
| `joinPaths`        | Join two segment lists: normalize(base ++ rel)           | `normalizePath` |
| `pathDepth`        | Count the number of segments in a path                   | —               |
| `canonicalizeProof`| **Lean proof** that `normalize(normalize(p)) = normalize(p)` | `normalizePath` |

### What Makes It Interesting

This is the first C → Rust task. Agents must write Rust with `serde_json` deserialization. The proof obligation (idempotency) is the first formal proof task — agents must reason about the list-based normalization algorithm in Lean rather than just implementing it.

### Lean Spec Excerpt

```lean
def normalizePath (p : PathSpec.Path) : PathSpec.Path := normalizeAux [] p

-- Proof obligation: idempotency
-- ∀ p, normalizePath (normalizePath p) = normalizePath p
```

### Scoring

* 3 runtime functions × (1/4) reward each on Lean verification
* 1 proof × (1/4) reward on proof acceptance
* Proof obligation: `weight_proof = 0.2`

---

## Task 4 — Expression Evaluator (`expression_eval`)

**Difficulty:** Medium | **Migration:** C → Rust | **Max Steps:** 25

### What It Is

A C recursive-descent expression evaluator migrated to Rust. Expressions are binary trees with literal leaves and binary operator nodes (`+`, `-`, `*`, `/`). Division by zero propagates as `None` (Rust) / `none` (Lean).

### Why It Matters

Safe arithmetic evaluation is a common primitive in configuration languages, rule engines, and calculators. The Lean spec formalizes the `None`-propagation contract so the agent cannot silently swallow divide-by-zero.

### Functions to Migrate

| Function        | Description                                                | Depends On  |
| --------------- | ---------------------------------------------------------- | ----------- |
| `evalBinOp`     | Evaluate a binary operation; return `None` for `/ 0`       | —           |
| `evalExpr`      | Recursively evaluate an expression tree; propagate `None`  | `evalBinOp` |
| `divisionProof` | **Lean proof** that `evalBinOp .Div a 0 = none` for all `a` | `evalBinOp` |

### What Makes It Interesting

The proof obligation is compact and tractable (`by simp [evalBinOp]`), making it a good first proof for agents exploring the proof mechanics. The expression tree is passed as a `serde_json::Value` from the test harness, requiring careful recursive deserialization.

### Lean Spec Excerpt

```lean
def evalBinOp (op : ExprSpec.Op) (a b : Int) : Option Int :=
  match op with
  | .Add => some (a + b)
  | .Sub => some (a - b)
  | .Mul => some (a * b)
  | .Div => if b = 0 then none else some (a / b)

-- Proof obligation: ∀ a, evalBinOp .Div a 0 = none
```

### Scoring

* 2 runtime functions × (1/3) reward each
* 1 proof × (1/3) reward on proof acceptance
* Proof obligation: `weight_proof = 0.2`

---

## Task 5 — Payment Saga State Machine (`payment_saga`)

**Difficulty:** Hard | **Migration:** JavaScript → Python | **Max Steps:** 40

### What It Is

A JavaScript Express.js payment saga migrated to Python. A saga is a compensating transaction pattern: it progresses through states (`Idle → Reserved → Authorized → Captured → Settled`) and supports rollback transitions (`CompensateCapture → CompensateAuthorize → CompensateReserve → Compensated`). The agent must also prove the **no-double-charge theorem**: the happy path reaches `Settled` and `isCharged` returns `true` exactly once.

### Why It Matters

Payment systems are among the most heavily regulated and correctness-critical software. Proving no-double-charge is not just a unit test — it is a machine-checked guarantee that the state machine cannot reach a charged state more than once via the happy path.

### Functions to Migrate

| Function              | Description                                          | Depends On                         |
| --------------------- | ---------------------------------------------------- | ---------------------------------- |
| `transition`          | Pure state transition for one event                  | —                                  |
| `runSaga`             | Fold a sequence of events into the final saga state  | `transition`                       |
| `isCharged`           | Return `true` for `Captured` or `Settled`            | —                                  |
| `no_double_charge_proof` | **Lean proof** of happy-path no-double-charge theorem | `transition`, `runSaga`, `isCharged` |

### What Makes It Interesting

This is the hardest proof in the easy/medium tier. The proof must reference `happyPath` and pre-proved lemmas (`happy_path_settles`) from the `SagaSpec` module. Agents that try to `native_decide` the entire proof will hit timeout; the intended approach uses `constructor`, `exact`, and `simp [isCharged]`.

### Lean Spec Excerpt

```lean
theorem no_double_charge :
    runSaga happyPath = .Settled ∧ isCharged (runSaga happyPath) = true := by
  constructor
  · exact happy_path_settles
  · rw [happy_path_settles]
    simp [isCharged]
```

### Scoring

* 3 runtime functions × (1/4) reward each
* 1 proof × (1/4) reward on proof acceptance
* Proof obligation: `weight_proof = 0.3`

---

## Task 6 — LRU Cache (`lru_cache`)

**Difficulty:** Hard | **Migration:** C++ → Rust | **Max Steps:** 40

### What It Is

A C++ LRU cache (backed by `std::list` + `unordered_map`) migrated to a Rust `Vec<(u64, u64)>` where the **front is MRU**. The agent must implement eviction, put (insert/update with MRU promotion), get (lookup with MRU promotion), and prove the capacity bound.

### Why It Matters

LRU caches are ubiquitous in systems programming. The Lean spec pins the capacity invariant formally, ensuring the implementation never silently exceeds its capacity — a property that is hard to test exhaustively but trivial to prove once the algorithm is correct.

### Functions to Migrate

| Function     | Description                                                      | Depends On |
| ------------ | ---------------------------------------------------------------- | ---------- |
| `lruEvict`   | Trim cache to `cap` entries (keep front = MRU, drop LRU tail)    | —          |
| `lruPut`     | Insert/update key-val; move to MRU front; evict if over capacity | `lruEvict` |
| `lruGet`     | Lookup key; on hit, move to MRU front; return `(value, cache)`   | —          |
| `lruPutProof`| **Lean proof** that `(lruPut cache cap k v).length ≤ cap`        | `lruPut`   |

### What Makes It Interesting

The Rust representation (`Vec<(u64,u64)>`) differs structurally from the C++ `std::list + unordered_map`. Agents must re-derive the algorithm from the Lean spec, not just translate the C++ line by line. The proof obligation is an inequality bound, requiring `omega` or `simp` with list length lemmas.

### Lean Spec Excerpt

```lean
def lruPut (cache : LruSpec.CacheState) (cap : Nat) (key val : Nat) : LruSpec.CacheState :=
  ((key, val) :: cache.filter (fun p => p.1 ≠ key)).take cap

-- Proof obligation: ∀ cache cap k v, (lruPut cache cap k v).length ≤ cap
```

### Scoring

* 3 runtime functions × (1/4) reward each
* 1 proof × (1/4) reward on proof acceptance
* Proof obligation: `weight_proof = 0.2`

---

## Task 7 — Shortest Path / Dijkstra (`shortest_path`)

**Difficulty:** Hard | **Migration:** C++ → Rust | **Max Steps:** 40

### What It Is

A C++ Dijkstra (Bellman-Ford relaxation loop) migrated to Rust. Input is a list of weighted directed edges `(from, to, weight)`. Output is a `Vec<Option<u64>>` of length `n` where `None` means unreachable. The agent must also prove that the self-distance `shortestDist graph n src src = some 0` when `src < n`.

### Why It Matters

Shortest-path algorithms appear in routing, logistics, game AI, and network analysis. The Lean spec uses a Bellman-Ford style relaxation (not a priority queue), making the formal model tractable while the Rust implementation can use any correct strategy.

### Functions to Migrate

| Function       | Description                                                          | Depends On |
| -------------- | -------------------------------------------------------------------- | ---------- |
| `dijkstra`     | Single-source shortest paths; returns `Vec<Option<u64>>` of length n | —          |
| `shortestDist` | Shortest distance from `src` to `dst`, or `None` if unreachable      | `dijkstra` |
| `relaxProof`   | **Lean proof** that `shortestDist graph n src src = some 0`          | `shortestDist` |

### What Makes It Interesting

The Lean spec models Dijkstra as iterated Bellman-Ford relaxation — `n-1` passes over all edges. Agents writing Rust can use a proper priority queue for efficiency, but the formal model in Lean is the ground truth for correctness. The proof requires reasoning about the initial distance vector and the `src < n` guard.

### Lean Spec Excerpt

```lean
def dijkstra (graph : ShortestPathSpec.Graph) (n src : Nat) : List (Option Nat) :=
  let init := (List.range n).map (fun i => if i == src then some 0 else none)
  (List.range (n - 1)).foldl (fun d _ => relaxPass d graph) init

-- Proof obligation: ∀ graph n src, src < n → shortestDist graph n src src = some 0
```

### Scoring

* 2 runtime functions × (1/3) reward each
* 1 proof × (1/3) reward on proof acceptance
* Proof obligation: `weight_proof = 0.2`

---

## Task 8 — Interval Scheduler (`interval_scheduler`)

**Difficulty:** Hard | **Migration:** C++ → Rust | **Max Steps:** 40

### What It Is

A C++ interval scheduling library migrated to Rust. Two algorithms: (1) **merge overlapping intervals** (sort by start, scan and merge), and (2) **greedy activity selection** (sort by end time, pick non-overlapping intervals to maximize count). The agent must also prove the merge output is sorted and non-overlapping.

### Why It Matters

Interval scheduling is the foundation of calendar systems, resource allocators, and database range scans. The merge-then-prove pattern is a canonical example of verifying a sorting + merging algorithm — a class of problems where off-by-one errors are common and hard to catch with finite tests.

### Functions to Migrate

| Function         | Description                                                            | Depends On |
| ---------------- | ---------------------------------------------------------------------- | ---------- |
| `mergeIntervals` | Sort by start, merge overlapping/touching intervals                    | —          |
| `maxSchedule`    | Greedy activity selection: sort by end time, pick non-overlapping      | —          |
| `mergeProof`     | **Lean proof** that `mergeIntervals` output is sorted + non-overlapping | `mergeIntervals` |

### What Makes It Interesting

Both functions are independent, which means the agent can migrate them in either order. The proof obligation (`isSorted ∧ noOverlap`) is a conjunction that requires two separate sub-proofs, making it the most structurally complex proof in the suite. Agents must use the `constructor` tactic and prove each branch using the Lean definitions of `isSorted` and `noOverlap` from `IntervalSpec`.

### Lean Spec Excerpt

```lean
def mergeIntervals (ivs : List IntervalSpec.Interval) : List IntervalSpec.Interval :=
  (sortByStart ivs).foldl (fun acc iv =>
    match acc.getLast? with
    | none => [iv]
    | some (ls, le) =>
      if iv.1 ≤ le then dropLastInterval acc ++ [(ls, max le iv.2)]
      else acc ++ [iv]) []

-- Proof obligation: ∀ ivs, isSorted (mergeIntervals ivs) ∧ noOverlap (mergeIntervals ivs)
```

### Scoring

* 2 runtime functions × (1/3) reward each
* 1 proof × (1/3) reward on proof acceptance
* Proof obligation: `weight_proof = 0.2`

---

## Reward Design

### Philosophy

Every action provides signal — there is no sparse-reward problem:

1. **Exploration signal**: `inspect` and `analyze_deps` give small positive rewards (+0.05) so agents learn to gather information before attempting implementation.

2. **Functional signal**: `run_tests` gives +0.10 on full pass, with proportional penalties on partial failure (−0.01 per failing case). This guides the agent toward correct behavior before spending Lean compilation time.

3. **Verification signal**: `submit` gives a large positive reward (`1 / n_functions`) when Lean accepts the code, and a small penalty (−0.05) on rejection. The penalty is small enough not to discourage reasonable attempts, but nonzero to penalize spamming.

4. **Proof signal**: Proof tasks split the reward between functional implementation and the proof, with explicit weight fields (`weight_functional`, `weight_property`, `weight_proof`) configurable per task.

### Score Clamping

Internal episode scores are clamped to `[0.01, 0.99]` — this is the "open unit" convention from OpenEnv that keeps scores away from the extremes, which some validators interpret specially. Step reward logs clamp to `[0.0, 1.0]`.

### Episode Termination

An episode ends when:
* All functions are verified (`progress = 1.0`), **or**
* The step limit is reached (`step_count ≥ max_steps`)

The step limit scales with difficulty: 15 steps for easy tasks, 25 for medium, 40 for hard.

---

## Design Notes for Evaluators

**Why Lean 4 as the verifier?** Lean 4 provides machine-checked guarantees that go beyond unit tests. A submission that passes sample cases but violates the formal spec is rejected. This makes it impossible to "cheat" with a hardcoded lookup table that only works on known inputs.

**Why multiple language pairs?** Cross-language migration is harder than same-language refactoring. Each language pair exercises different aspects of the agent's capability: type system awareness, idiomatic pattern translation, and error handling conventions.

**Why incremental function-level migration?** Submitting the entire file at once would give the agent no partial-credit signal and would make Lean errors much harder to diagnose. Function-level granularity mirrors real code review and makes the reward landscape smooth.

**Difficulty calibration:**
- Easy: one language pair, no proofs, short dependency chains, small functions
- Medium: two language pairs (including C→Rust), optional proofs, medium chains
- Hard: C++→Rust with complex data structures, mandatory proofs, long chains