Spike 008 — Streaming DiLoCo outer-loop smoke
Closes: V2 (DiLoCo "deferred to v0.2") in docs/VISION_VALIDATION.md.
Goal
Bolt the DiLoCo outer-loop pseudo-gradient sync onto the framework using
torchft.local_sgd.DiLoCo (see docs/adrs/ADR-003-diloco-impl.md).
Verify:
- Two in-process replicas converge to identical parameters after outer sync.
- Outer Nesterov momentum is actually populated (i.e. the outer optimizer ran).
- The pseudo-gradient sign convention is what we expect (sign flip detected by an explicit unit test).
- Importing torchft does not regress Spike 005's existing 38 tests.
The smoke runs in a single process without NCCL. A mock Manager.allreduce
performs real cross-replica averaging through a shared buffer.
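As a rough illustration of how an in-process mock can still perform real averaging, the sketch below collects one tensor per replica in a shared list and writes the mean back into every contribution once all replicas have reported. The class name `SharedBufferAllreduce` and its `allreduce(tensor)` signature are hypothetical. They are not taken from this repository and do not mirror the `torchft.Manager` API.

```python
import torch


class SharedBufferAllreduce:
    """Hypothetical in-process stand-in for a cross-replica allreduce."""

    def __init__(self, world_size):
        self.world_size = world_size
        self._pending = []  # shared buffer: one tensor per replica
        self.calls = 0      # lets a test count how often sync fired

    def allreduce(self, tensor):
        # Each replica deposits its tensor. Nothing is averaged until
        # every replica has contributed for the current round.
        self.calls += 1
        self._pending.append(tensor)
        if len(self._pending) == self.world_size:
            mean = torch.stack(self._pending).mean(dim=0)
            # Write the average back in place so every replica sees it.
            for contribution in self._pending:
                contribution.copy_(mean)
            self._pending.clear()


# Two "replicas" calling one after another in a single thread.
shared = SharedBufferAllreduce(world_size=2)
replica_a = torch.tensor([1.0, 3.0])
replica_b = torch.tensor([3.0, 5.0])
shared.allreduce(replica_a)
shared.allreduce(replica_b)
```

Because the calls are sequential, the first replica's tensor only holds the average after the last replica has called in, so a test built on this pattern should read results after the full round completes.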
Files
- composer_diloco.py: make_diloco_outer_loop(...) wrapper around torchft.local_sgd.DiLoCo. Documents the sign convention.
- tests/test_diloco_smoke.py: 3 acceptance tests.
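To make the sign convention concrete, here is a minimal sketch that assumes the convention from the DiLoCo paper: the pseudo-gradient is the old global weights minus the local weights, averaged over replicas, and the outer optimizer descends along it. Whether composer_diloco.py documents exactly this convention is an assumption. The helpers `outer_step` and `run` and the `flip_sign` flag are hypothetical names introduced only for illustration.

```python
import torch


def outer_step(global_param, local_params, outer_opt, flip_sign=False):
    """Run one outer round on a single parameter tensor."""
    # Assumed convention: pseudo-gradient = old global weights - local weights.
    deltas = [global_param.detach() - local for local in local_params]
    pseudo_grad = torch.stack(deltas).mean(dim=0)
    if flip_sign:
        # Deliberately wrong sign, to show what a test should catch.
        pseudo_grad = -pseudo_grad
    global_param.grad = pseudo_grad
    outer_opt.step()
    outer_opt.zero_grad()


def run(flip_sign):
    theta = torch.nn.Parameter(torch.tensor([1.0, 1.0]))
    # Plain SGD with lr=1 and no momentum, so the arithmetic is easy to follow.
    opt = torch.optim.SGD([theta], lr=1.0)
    replicas = [torch.tensor([0.0, 2.0]), torch.tensor([0.5, 3.0])]
    outer_step(theta, replicas, opt, flip_sign=flip_sign)
    return theta.detach().clone()


correct = run(flip_sign=False)  # moves onto the replica mean [0.25, 2.5]
flipped = run(flip_sign=True)   # moves away from the replicas [1.75, -0.5]
```

With an outer learning rate of 1 and no momentum, the correct sign places the global weights exactly on the replica average, while a flipped sign pushes them the same distance in the opposite direction. That asymmetry is the kind of behavior an explicit sign test can assert on.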
Acceptance
| Criterion | Status |
|---|---|
| 2 replicas converge after 2 outer rounds | ✓ test 1 |
| Nesterov momentum state populated | ✓ test 1 |
| Sync fires once per outer round per replica | ✓ test 1 |
| Pseudo-gradient sign convention verified | ✓ test 2 |
| No regression in Spike 005 imports | ✓ test 3 |
| Spike 005's 38 tests still pass after this wave | (verified separately) |
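One way to check the "Nesterov momentum state populated" criterion is to inspect the outer optimizer's state after a step. The sketch below assumes the outer optimizer is `torch.optim.SGD` with `nesterov=True`, which stores a `momentum_buffer` per parameter after its first step. The helper name `outer_momentum_populated` and the hyperparameter values are illustrative, not taken from the repository's tests.

```python
import torch


def outer_momentum_populated(optimizer):
    """Return True if every parameter has a non-zero momentum buffer."""
    for group in optimizer.param_groups:
        for param in group["params"]:
            buf = optimizer.state.get(param, {}).get("momentum_buffer")
            if buf is None or not torch.any(buf != 0):
                return False
    return True


theta = torch.nn.Parameter(torch.zeros(3))
# Illustrative outer optimizer settings (assumed, not from the repository).
outer_opt = torch.optim.SGD([theta], lr=0.7, momentum=0.9, nesterov=True)

before = outer_momentum_populated(outer_opt)  # no step yet, so no buffer

theta.grad = torch.ones(3)  # stand-in for an averaged pseudo-gradient
outer_opt.step()

after = outer_momentum_populated(outer_opt)
```

An empty or all-zero buffer after an outer round would suggest the outer optimizer never received a pseudo-gradient, which is the failure this criterion guards against.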
Future work (v0.2 Streaming DiLoCo)
- fragment_sync_delay > 0 requires CUDA streams. Spike 008 uses fragment_sync_delay=0 (vanilla DiLoCo) for the smoke.
- Multiple fragments via model_fragments=[frag_0, frag_1, ...] are configured by make_diloco_outer_loop() but not exercised in the smoke.
- Real torch.distributed backend (NCCL) for multi-node training is one config switch away (replace mock Manager with real torchft.Manager).
Cost / time
- Pure CPU, single process, no GPU.
- Tests run in <2 seconds total.
Dependencies added
- torchft-nightly (BSD-3, Meta-maintained, pip install torchft-nightly)