Baladithya Balamurugan
Wave 20: fix SageMaker smoke — torch-2.7 DLC + drop vllm pin (the real conflict)
a578ad9

# AWS SageMaker Quickstart: the runnable-now GRPO smoke

The minimum path to running the Composer-replication RL inner loop on a real
GPU, end-to-end, for **under $1**. Implements F3 (`research/design-F3-rl-sagemaker.md`).

## Live account facts (verified 2026-06-09, acct 386931836011, us-west-2)

| Fact | Value |
|---|---|
| `ml.g5.2xlarge` training-job quota | **1** (live, code `L-2D6DEB3C`) → no quota ticket needed |
| Execution role | `arn:aws:iam::386931836011:role/service-role/AmazonSageMaker-ExecutionRole-20250725T133247` |
| Bucket (rendezvous + output) | `amazon-sagemaker-386931836011-us-west-2-7597bf4d9a3d` |
| PyTorch DLC base image | `763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:2.7.1-gpu-py312-cu128-ubuntu22.04-sagemaker-v1.26` |

> **The base image MUST be the torch-2.7 DLC** (learned from two live runs on
> 2026-06-09). The dependency chain forces it:
> `ComposerReplicationTrainer → trl 1.5.x → transformers>=4.56.2 →
> torch.float8_e8m0fnu (MXFP4, torch>=2.7)`. The torch-**2.6** DLC fails
> `AutoModel.from_pretrained` with `AttributeError: module 'torch' has no
> attribute 'float8_e8m0fnu'`, and pinning transformers *down* is impossible
> (trl 1.5's floor is 4.56.2). Resolve the tag against the live registry: the
> AWS docs page lists wrong/stale tags (it showed a cu124 2.6 tag that doesn't
> exist; real tags are cu126 for 2.6, cu128 for 2.7, each with a mandatory
> `-vX.Y` build suffix and no bare floating tag):
> ```bash
> aws ecr describe-images --registry-id 763104351884 \
>   --repository-name pytorch-training --region us-west-2 \
>   --query "reverse(sort_by(imageDetails,&imagePushedAt))[].imageTags" --output text \
>   | tr '\t' '\n' | grep -E '^2\.7\.[0-9]+-gpu-py312-cu128-.*-sagemaker-v[0-9.]+$' | head -1
> ```
>
> **vLLM is OFF by default** in the smoke: `vllm==0.8.5` hard-pins `torch==2.6.0`,
> which fights the torch-2.7 base. The smoke uses `model.generate` rollout (what
> it proves is trainer-on-GPU + reward, not rollout speed). For colocated vLLM,
> bake a torch-2.7-matched `vllm>=0.9` into an image and pass `--image <ecr> --vllm`.
>
> **SDK pin:** the smoke launcher uses the **sagemaker SDK v2** Estimator API.
> SDK **v3 is an API rewrite** that dropped `sagemaker.estimator.Estimator` and
> `sagemaker.pytorch`, so install with `pip install 'sagemaker>=2.200,<3'`.
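
The tag rules in the note above (torch 2.7 or newer, mandatory `-vX.Y` build suffix) can be checked before a job is submitted. The following is a minimal sketch, not part of the repository: the helper names `parse_dlc_tag` and `torch_is_new_enough` are hypothetical, and the regular expression assumes the tag shape shown in the table above.

```python
import re

# Assumed tag shape, taken from the DLC tag quoted in this document, e.g.
# 2.7.1-gpu-py312-cu128-ubuntu22.04-sagemaker-v1.26
TAG_RE = re.compile(
    r"^(?P<torch>\d+\.\d+\.\d+)-gpu-py(?P<py>\d+)-cu(?P<cuda>\d+)"
    r"-(?P<os>[a-z]+[\d.]+)-sagemaker-v(?P<build>[\d.]+)$"
)

# torch.float8_e8m0fnu needs torch>=2.7 according to the note above.
MIN_TORCH = (2, 7)


def parse_dlc_tag(tag: str) -> dict:
    """Split a DLC tag into its fields; reject tags without a -vX.Y suffix."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"unrecognised DLC tag: {tag!r}")
    return match.groupdict()


def torch_is_new_enough(tag: str) -> bool:
    """True when the torch version encoded in the tag is at least MIN_TORCH."""
    major, minor, _patch = (int(p) for p in parse_dlc_tag(tag)["torch"].split("."))
    return (major, minor) >= MIN_TORCH
```

Running the check on the output of the `aws ecr describe-images` command is one way to fail fast locally instead of discovering the `AttributeError` inside a paid training job.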

## Run it (no local Docker build)

```bash
pip install 'sagemaker>=2.200,<3'
export AWS_REGION=us-west-2
python examples/gsm8k_grpo/run_sagemaker_launch.py --max-steps 20
```

This uses the **stock PyTorch DLC directly** as the training image and ships the
framework + entry script via `source_dir`; `examples/gsm8k_grpo/requirements.txt`
(trl + the RL stack) installs at job start. No 15 GB local image build, no ECR
push. The script trains `Qwen/Qwen2.5-0.5B-Instruct` with GRPO + the GSM8K
`#### NUMBER` RLVR reward, using `model.generate` rollout (vLLM is off by
default; see the torch-pin note above).
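
To make the `#### NUMBER` reward concrete, here is a minimal sketch of a verifiable reward of that kind. It is not the repository's implementation. It assumes completions arrive as plain strings, that the gold answers are passed in a column named `answer`, and that an exact numeric match scores 1.0 and anything else 0.0; the function names are hypothetical.

```python
import re
from typing import List, Optional

# GSM8K gold answers end with a line of the form "#### 42".
_FINAL_RE = re.compile(r"####\s*(-?[\d,]*\.?\d+)")


def extract_final_number(text: str) -> Optional[float]:
    """Return the last '#### NUMBER' value in the text, or None if absent."""
    matches = _FINAL_RE.findall(text)
    if not matches:
        return None
    # Strip thousands separators such as "1,234" before converting.
    return float(matches[-1].replace(",", ""))


def gsm8k_reward(completions: List[str], answer: List[str], **kwargs) -> List[float]:
    """Score each completion 1.0 if its final number equals the gold one."""
    rewards = []
    for completion, gold in zip(completions, answer):
        predicted = extract_final_number(completion)
        target = extract_final_number(gold)
        matched = predicted is not None and predicted == target
        rewards.append(1.0 if matched else 0.0)
    return rewards
```

A binary reward like this gives GRPO a clean signal within each group of sampled completions, at the cost of giving no partial credit for a correct derivation with a formatting slip.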

Flags: `--no-wait` (submit + poll later), `--spot` (managed spot, quota=1 too),
`--vllm` (enable colocated vLLM, only with a baked `--image` carrying a
torch-2.7 vllm), `--image <ecr-uri>` (use a prebuilt baked image instead of the DLC).
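
The flags above could map onto SageMaker SDK v2 `Estimator` arguments roughly as follows. This is a sketch under stated assumptions, not the contents of `run_sagemaker_launch.py`: the helper `build_estimator_kwargs`, the `entry_point` and `source_dir` values, the hyperparameter names, and the time limits are all hypothetical. Only the `Estimator` keyword names themselves come from the SDK v2 API.

```python
from typing import Any, Dict, Optional

# Stock DLC quoted in the account-facts table of this document.
DLC_IMAGE = (
    "763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:"
    "2.7.1-gpu-py312-cu128-ubuntu22.04-sagemaker-v1.26"
)


def build_estimator_kwargs(
    role: str,
    max_steps: int = 20,
    image: Optional[str] = None,
    spot: bool = False,
    vllm: bool = False,
) -> Dict[str, Any]:
    """Translate launcher-style flags into Estimator keyword arguments."""
    if vllm and image is None:
        # Mirrors the rule above: vLLM only with a baked torch-2.7 image.
        raise ValueError("--vllm needs a baked --image carrying a torch-2.7 vllm")
    kwargs: Dict[str, Any] = {
        "image_uri": image or DLC_IMAGE,
        "role": role,
        "instance_count": 1,
        "instance_type": "ml.g5.2xlarge",
        "source_dir": ".",          # hypothetical: directory shipped to the job
        "entry_point": "train.py",  # hypothetical entry script name
        "hyperparameters": {"max-steps": max_steps, "vllm": str(vllm).lower()},
        "max_run": 3600,            # hypothetical one-hour cap, in seconds
    }
    if spot:
        kwargs["use_spot_instances"] = True
        kwargs["max_wait"] = 7200   # must be at least max_run for spot jobs
    return kwargs


# Usage with the v2 SDK (requires AWS credentials, so left as a comment):
# from sagemaker.estimator import Estimator
# Estimator(**build_estimator_kwargs(role="<execution-role-arn>")).fit(wait=True)
```

Passing `wait=False` to `fit` would correspond to the `--no-wait` behavior of submitting and polling later.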

**Cost:** `ml.g5.2xlarge` ≈ $1.52/hr on-demand; a 20-step 0.5B smoke is
~15–25 min ⇒ **well under $1**. Spot ≈ $0.45–0.60/hr ⇒ pennies.
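
The cost claim follows from simple arithmetic on the quoted rates and durations. The sketch below reproduces it, assuming billing is proportional to runtime and ignoring storage and data-transfer charges; `job_cost` is a hypothetical helper.

```python
def job_cost(rate_per_hour: float, minutes: float) -> float:
    """Cost in dollars for a job billed proportionally to its runtime."""
    return rate_per_hour * minutes / 60.0


# On-demand: $1.52/hr for a 15 to 25 minute job.
on_demand_low = job_cost(1.52, 15)
on_demand_high = job_cost(1.52, 25)

# Spot: $0.45/hr at the cheap end, $0.60/hr at the expensive end.
spot_low = job_cost(0.45, 15)
spot_high = job_cost(0.60, 25)

print(f"on-demand: ${on_demand_low:.2f} to ${on_demand_high:.2f}")  # $0.38 to $0.63
print(f"spot:      ${spot_low:.2f} to ${spot_high:.2f}")            # $0.11 to $0.25
```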

## The repeatable path (baked image)

For runs where the ~5–10 min per-job pip-install is unwanted (and for the
DiLoCo N-replica `SageMakerExecutor`, which passes `ContainerEntrypoint` and
needs the framework baked in), build the image once:

```bash
scripts/build_and_push_ecr.sh   # creates ECR repo, builds, pushes composer-rl:smoke
python examples/gsm8k_grpo/run_sagemaker_launch.py \
  --image 386931836011.dkr.ecr.us-west-2.amazonaws.com/composer-rl:smoke
```

On an Apple-Silicon host the build targets `--platform linux/amd64`, so the
~15 GB GPU DLC is built under emulation, which is slow; prefer a linux/amd64
host or CodeBuild.

## Gotchas (load-bearing)

- **`EnableNetworkIsolation` stays False** (the default) so the container can
  reach `huggingface.co` (model + GSM8K download) and S3.
- **`vllm_gpu_memory_utilization=0.3`** is the load-bearing knob on a 24 GB
  A10G when vLLM is enabled (`--vllm`): too high → OOM when the policy + grads
  also need the GPU; too low → tiny KV cache. If a vLLM wheel/CUDA mismatch
  surfaces, fall back to the default `model.generate` rollout (`--no-vllm`).
- **Warm pools are off**: the `g5 training warm pool usage` quota is 0 in this
  account, so each job pays ~3–6 min cold-start. Request a warm-pool quota bump
  for iterative dev, or move the long inner loop to HyperPod (persistent).
- **"Waiting for capacity"** in `SecondaryStatus` is transient g5 capacity
  contention in the region, not an error; the job proceeds when capacity frees.
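
Since capacity waits and cold starts are normal, a job submitted with `--no-wait` is best followed by polling until a terminal status. The following is a minimal sketch, not the launcher's own polling code: `wait_for_job` is a hypothetical helper that takes any callable returning a `describe_training_job`-shaped dict, which keeps it testable without AWS access.

```python
import time
from typing import Callable, Dict

# Terminal values of TrainingJobStatus; anything else means keep polling.
TERMINAL = {"Completed", "Failed", "Stopped"}


def wait_for_job(
    describe: Callable[[], Dict],
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job is terminal, printing each SecondaryStatus change."""
    last_secondary = None
    while True:
        desc = describe()
        secondary = desc.get("SecondaryStatus")
        if secondary != last_secondary:
            # A capacity wait shows up here as a transition, not a failure.
            print(f"SecondaryStatus: {secondary}")
            last_secondary = secondary
        status = desc["TrainingJobStatus"]
        if status in TERMINAL:
            return status
        sleep(poll_seconds)


# Usage with boto3 (requires AWS credentials, so left as a comment):
# import boto3
# sm = boto3.client("sagemaker", region_name="us-west-2")
# wait_for_job(lambda: sm.describe_training_job(TrainingJobName="<job-name>"))
```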

## Next: DiLoCo N-replica (the `SageMakerExecutor` path)

`examples/diloco_sagemaker/run.py` (driver, F3 §4.3) drives N independent
single-instance Training Jobs sharing one `s3://.../rendezvous/` prefix via
`ObjectStoreAllReduce`, with no cross-job NCCL. N=1 runs today; N=2–4 needs a
`ml.g5.2xlarge for training job usage` quota increase. The DiLoCo math, loss,
trainer, and `ObjectStoreAllReduce` are unchanged from the smoke: the S3
rendezvous is the entire portability contract (validated on `file://` and live
`s3://`; see `test_serverless_local.py::test_s3_rendezvous_allreduce_across_replicas`).
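
The idea of an all-reduce that needs only a shared storage prefix can be illustrated with a local directory standing in for the rendezvous location. This is a sketch of the general technique, not the repository's `ObjectStoreAllReduce`: the function name `allreduce_mean`, the `round-*/rank-*.json` layout, and the polling interval are all hypothetical, and a real S3 version would replace the file operations with put, list, and get calls.

```python
import json
import time
from pathlib import Path
from typing import List


def allreduce_mean(
    prefix: Path,
    round_id: int,
    rank: int,
    world_size: int,
    values: List[float],
    timeout_s: float = 60.0,
) -> List[float]:
    """Average `values` across replicas that share only a storage prefix."""
    round_dir = prefix / f"round-{round_id:06d}"
    round_dir.mkdir(parents=True, exist_ok=True)

    # Write under a temporary name, then rename, so that readers never
    # observe a partially written contribution.
    tmp = round_dir / f"rank-{rank}.json.tmp"
    tmp.write_text(json.dumps(values))
    tmp.replace(round_dir / f"rank-{rank}.json")

    # Poll the prefix until every replica has published its contribution.
    deadline = time.monotonic() + timeout_s
    while True:
        shards = sorted(round_dir.glob("rank-*.json"))
        if len(shards) >= world_size:
            break
        if time.monotonic() > deadline:
            raise TimeoutError(f"round {round_id}: {len(shards)}/{world_size} replicas")
        time.sleep(0.05)

    # Every replica reads the same files, so all compute the same mean.
    contributions = [json.loads(p.read_text()) for p in shards]
    return [sum(column) / world_size for column in zip(*contributions)]
```

Because replicas communicate only through the prefix, the same logic works whether they are threads on one host or separate Training Jobs, which is what makes a storage rendezvous portable.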
| ``` | |