Polyglot-Optima Beginner + Technical Explanation
This document explains the project from zero, then gradually adds technical depth.
1) One-line idea
Polyglot-Optima is a training environment where an AI learns to convert Python functions into fast C++ without breaking correctness.
2) Why this project exists
Most code models can produce "fast-looking" code, but in real systems that is not enough.
Common failure modes:
- code compiles but gives wrong outputs,
- code is fast only on one machine but fails elsewhere,
- reward is easy to game (the model exploits the scoring instead of solving the task),
- model does not improve over multiple refinement rounds.
This project is built to fix those problems using:
- strict compile checks,
- fuzz-based correctness verification,
- cross-hardware portability checks,
- anti-gaming trap tasks,
- curriculum learning (easy -> hard),
- structured continuous reward.
3) Mental model (simple)
Think of this project as a game with rules:
- Input: a Python function + a hardware profile.
- Player (AI): can call tools to analyze and optimize.
- Goal: submit C++ that is fast and correct.
- Score (reward): combines speed, correctness, reasoning quality, and portability.
The AI plays this game many times and learns better strategies.
4) Core architecture
Main folders:
- models.py: Defines typed data objects for actions, observations, and state.
- server/environment.py: The main OpenEnv environment implementation (reset, step, state, close).
- server/tools/: Actual capability tools (compiler, verifier, profiling, portability, submit).
- server/rewards/: Reward rubrics and reward composition logic.
- server/scenarios/: Task generators, hardware profiles, trap library, and adaptive curriculum.
- tests/: Unit + integration tests validating behavior and quality.
5) Episode lifecycle (what happens in one training sample)
Each episode has 3 rounds.
Round flow
- Environment samples:
  - Python code task
  - hardware profile
  - hidden bottleneck labels (for diagnosis scoring)
- Model calls tools (analyze, compile, verify, etc.).
- Model eventually calls submit_optimization.
- Environment computes round reward.
- Repeat for rounds 2 and 3.
- Final episode reward is computed from round rewards.
Important implementation details
- max_calls_per_round is enforced.
- If the call budget is exhausted, the environment forces a submit for that round.
- Adaptive curriculum can update global difficulty after batch outcomes.
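The round flow and call budget described above can be sketched as a small loop. This is a minimal illustration, not the project's environment code: `run_round`, `run_episode`, `RoundLog`, and the `policy` callable are hypothetical names, and the sketch assumes the forced submit is appended after the budget is used up.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RoundLog:
    """Record of the tool calls made in one round (hypothetical structure)."""
    calls: List[str] = field(default_factory=list)
    forced_submit: bool = False


def run_round(policy: Callable[[int], str], max_calls_per_round: int) -> RoundLog:
    """Let the policy pick tool names until it submits or the budget runs out."""
    log = RoundLog()
    for call_index in range(max_calls_per_round):
        tool = policy(call_index)
        log.calls.append(tool)
        if tool == "submit_optimization":
            return log  # the model closed the round itself
    # Assumption: once the budget is exhausted, the environment submits on
    # the model's behalf so that the round still produces a reward.
    log.calls.append("submit_optimization")
    log.forced_submit = True
    return log


def run_episode(policy: Callable[[int], str],
                max_calls_per_round: int = 5,
                rounds: int = 3) -> List[RoundLog]:
    """Run the three rounds of one episode and return their logs."""
    return [run_round(policy, max_calls_per_round) for _ in range(rounds)]
```

A policy that never submits ends every round with a forced submit, while a policy that submits early leaves the remaining budget unused.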
6) The 9 tools (what the model can do)
The AI does not directly "guess" everything. It uses tools:
- get_hardware_profile
- profile_python_hotspots
- analyze_complexity
- check_memory_access
- compile_and_benchmark
- verify_equivalence
- check_portability
- get_bottleneck_report
- submit_optimization (round-closing action)
The most important tools for trustworthiness are:
- compile_and_benchmark (real compile/runtime behavior),
- verify_equivalence (catches wrong-but-fast code),
- check_portability (checks behavior across profiles).
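To make the idea behind verify_equivalence concrete, here is a minimal fuzz-comparison sketch. It is not the project's verifier: `fuzz_pass_rate` is a hypothetical helper, the candidate is a plain Python callable standing in for compiled C++, and the input generator (short lists of small integers) is an assumption.

```python
import random
from typing import Callable, List


def fuzz_pass_rate(reference: Callable[[List[int]], int],
                   candidate: Callable[[List[int]], int],
                   n_cases: int = 200,
                   seed: int = 0) -> float:
    """Return the fraction of random inputs on which candidate matches reference."""
    rng = random.Random(seed)  # fixed seed keeps the fuzz run reproducible
    passed = 0
    for _ in range(n_cases):
        length = rng.randint(0, 16)
        xs = [rng.randint(-1000, 1000) for _ in range(length)]
        try:
            # Pass copies so neither function can mutate the other's input.
            ok = candidate(list(xs)) == reference(list(xs))
        except Exception:
            ok = False  # a crash counts as a failed case
        if ok:
            passed += 1
    return passed / n_cases
```

A pass rate like this is a continuous quantity, which is the kind of signal a fuzz-based correctness score can be built from.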
7) Reward system explained simply
Reward is continuous, not just pass/fail.
That means:
- weak solutions get small score,
- better solutions get higher score,
- fully good solutions get top score.
This is important for RL because the model needs a graded signal to improve: a pure pass/fail reward cannot distinguish a near-miss from a complete failure.
Reward components
- SpeedupRubric: how much faster C++ is vs Python baseline
- CorrectnessRubric: fuzz pass-rate quality
- CompilationRubric: compile quality/status
- DiagnosisRubric: quality/coherence of bottleneck reasoning
- PortabilityRubric: cross-profile robustness
- SelfCorrectionRubric: improvement from earlier rounds
Composition
Reward is composed using rubric operators (Sequential, Gate, WeightedSum), so it is easier to reason about and tune than one large monolithic score function.
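As a rough illustration of composing a reward from parts, the sketch below combines a gate with a weighted sum. The function names, the choice of compilation as the gate, the threshold, and the weights are all assumptions made for the example; they are not the project's rubric classes or the OpenEnv API, and the Sequential operator is not modeled.

```python
from typing import Dict


def weighted_sum(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine the named scores using weights normalized to sum to 1."""
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight


def gated_reward(scores: Dict[str, float],
                 weights: Dict[str, float],
                 gate_name: str = "compilation",
                 threshold: float = 0.5) -> float:
    """Return 0 if the gate score is below threshold, else the weighted sum."""
    if scores[gate_name] < threshold:
        return 0.0  # assumption: failing the gate zeroes the whole reward
    return weighted_sum(scores, weights)
```

Keeping the gate separate from the weighted part means a submission that fails the gate cannot recover reward through the other components.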
8) Anti-gaming design
This project assumes the model will try shortcuts. So it includes defenses:
- Trap functions (overflow, NaN/Inf, aliasing, semantic edge cases)
- Adversarial fuzzing
- Correctness + adversarial pass-rate signals
- Portability checks across hardware profiles
- Reasoning/diagnosis quality signal
Net effect: "fast but wrong" should score poorly.
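An overflow trap can be illustrated without a compiler. Python integers have arbitrary precision, so a direct port that accumulates in a 32-bit integer can silently diverge on large inputs. The sketch below simulates two's-complement wraparound in Python; the function names are hypothetical, and in real C++ signed overflow is undefined behavior, so wraparound is only the commonly observed outcome.

```python
def sum_of_squares(values):
    """Reference Python function: exact for arbitrarily large integers."""
    return sum(v * v for v in values)


def wrap_int32(x: int) -> int:
    """Map an integer onto the signed 32-bit range with two's-complement wraparound."""
    return (x + 2**31) % 2**32 - 2**31


def naive_int32_port(values):
    """Simulates a port that stores every intermediate result in a 32-bit int."""
    total = 0
    for v in values:
        total = wrap_int32(total + wrap_int32(v * v))
    return total
```

On small inputs such as [1, 2, 3] both functions agree, so a weak test suite would pass the port. On [100000] the reference returns 10000000000 while the simulated port does not, which is the kind of mismatch a trap task is meant to expose.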
9) Curriculum learning (easy -> hard)
Difficulty axes include:
- function complexity tier,
- hardware difficulty class,
- verifier strictness,
- portability requirement.
Curriculum controller monitors success in batches and adjusts:
- high success -> increase difficulty,
- low success -> reduce difficulty,
- middle zone -> hold.
This stabilizes learning and prevents early collapse.
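The three-zone adjustment rule above can be sketched as a single function. The thresholds (0.8 and 0.3), the integer difficulty scale, and the name `next_difficulty` are assumptions for illustration; the project's controller may use different values and several difficulty axes.

```python
def next_difficulty(level: int,
                    success_rate: float,
                    raise_above: float = 0.8,
                    lower_below: float = 0.3,
                    min_level: int = 0,
                    max_level: int = 10) -> int:
    """Return the difficulty level for the next batch given the last batch's success rate."""
    if success_rate > raise_above:
        return min(level + 1, max_level)  # high success: make tasks harder
    if success_rate < lower_below:
        return max(level - 1, min_level)  # low success: make tasks easier
    return level                          # middle zone: hold
```

The hold zone between the two thresholds keeps the level from oscillating when the success rate hovers around a single cutoff.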
10) Adaptive traps (what was improved)
Adaptive traps now do two things:
- prioritize categories where the model recently failed,
- create semantic-preserving trap variants (not only naive renaming).
Why this matters:
- reduces memorization,
- improves robustness,
- increases novelty/innovation signal for judges.
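Prioritizing trap categories where the model recently failed can be sketched as weighted sampling. This is an assumed scheme, not the project's trap library: the helper names are hypothetical, and each category's weight is a base value plus its recent failure count.

```python
import random
from collections import Counter
from typing import List, Sequence


def trap_weights(categories: Sequence[str],
                 recent_failures: Sequence[str],
                 base_weight: float = 1.0) -> List[float]:
    """Give each category a base weight plus one per recent failure in it."""
    counts = Counter(recent_failures)
    return [base_weight + counts[c] for c in categories]


def sample_trap_category(categories: Sequence[str],
                         recent_failures: Sequence[str],
                         rng: random.Random) -> str:
    """Draw one category, favoring those with more recent failures."""
    weights = trap_weights(categories, recent_failures)
    return rng.choices(list(categories), weights=weights, k=1)[0]
```

The base weight keeps every category reachable, so categories the model already handles well are still sampled occasionally.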
11) What "good performance" means here
Not just one high speedup number.
A good policy should show:
- increasing reward trend,
- high correctness/adversarial pass-rate,
- high compile success,
- better portability over time,
- stable behavior on held-out/edge-case tasks.
12) How to run and verify locally
From polyglot_optima/:
python -m ruff check .
python -m pytest -q
Smoke test (LLM-in-the-loop):
python tests/smoke_llm_hf.py
Cursor/OpenAI-compatible mode (the commands below use Windows cmd syntax; on Unix shells use export instead of set):
set LLM_PROVIDER=cursor
set CURSOR_API_KEY=...
set CURSOR_MODEL=gpt-4.1-nano
python tests/smoke_llm_hf.py
13) Training workflow for beginners
Use training/openenv_hackathon_training.ipynb:
- Configure model + episodes + logging.
- Run baseline eval first (fixed seeds).
- Run RL training (TRL scaffold cell).
- Run post-training eval with same seed protocol.
- Export plots to docs/plots.
- Add results to README.md.
Track at least:
- reward,
- correctness pass rate,
- compile success rate,
- portability metrics.
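One way to turn per-episode logs into the tracked metrics is a small summary function. The record fields used here (`reward`, `pass_rate`, `compiled`, `portability`) are hypothetical names chosen for the sketch, not the notebook's actual schema.

```python
from statistics import mean
from typing import Dict, List


def summarize(episodes: List[Dict]) -> Dict[str, float]:
    """Average the per-episode records into the four tracked metrics."""
    return {
        "mean_reward": mean(e["reward"] for e in episodes),
        "correctness_pass_rate": mean(e["pass_rate"] for e in episodes),
        "compile_success_rate": mean(1.0 if e["compiled"] else 0.0 for e in episodes),
        "mean_portability": mean(e["portability"] for e in episodes),
    }
```

Running the same summary on the baseline and post-training evaluations, with the same seeds, gives directly comparable numbers.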
14) How this maps to hackathon judging
The project can score well if you clearly show:
- Innovation: adaptive curriculum + anti-gaming traps + structured reward
- Storytelling: clear problem -> method -> before/after outcome
- Improvement evidence: baseline vs trained plots
- Pipeline quality: reproducible notebook/script + OpenEnv-compliant deployment
15) Most important files to read next
Recommended reading order:
- README.md
- models.py
- server/environment.py
- server/tools/submit.py
- server/tools/cpp_compiler.py
- server/tools/verifier.py
- server/rewards/__init__.py
- server/scenarios/dataset_loader.py
- tests/test_skeleton.py
16) Beginner takeaway
If you remember one thing:
This is not just "code generation."
It is a full RL environment that teaches an AI to do correct, robust, hardware-aware optimization under realistic constraints.