DeepCoder Training Examples
This directory contains examples for training and running DeepCoder, a code reasoning LLM fine-tuned from DeepSeek-R1-Distill-Qwen-14B using distributed reinforcement learning (RL).
Our examples uses the following:
- DeepSeek-R1-Distill-Qwen-14B as the base model
- agentica-org/DeepCoder-Preview-Dataset (lcbv5 subset) for training and evaluation
Model Hosting
Option 1: Using vLLM
Start a vLLM server with OpenAI-compatible API:
python -m vllm.entrypoints.openai.api_server \
--model agentica-org/DeepCoder-14B-Preview \
--host 0.0.0.0 \
--port 30000 \
--dtype bfloat16 \
--max-model-len 65536
Option 2: Using SGLang
python -m sglang_router.launch_server \
--model-path agentica-org/DeepCoder-14B-Preview \
--dp-size 1 \
--dtype bfloat16
# increase dp_size to enable data-parallel processing on multi-GPU
The server should be accessible at http://localhost:30000/v1
Dataset Preparation
Prepare the DeepCoder Preview Dataset:
cd examples/deepcoder
python prepare_deepcoder_data.py
This will:
- Download the agentica-org/DeepCoder-Preview-Dataset (lcbv5 subset)
- Register both train/test splits with the RLLM DatasetRegistry
Running Inference
Once your model server is running and datasets are prepared, you can run inference:
cd examples/deepcoder
python run_deepcoder.py
Configuration Options
You can modify the inference script parameters:
n_parallel_agents: Number of parallel agents (default: 64)model_name: Model to use (default: "agentica-org/DeepCoder-14B-Preview")base_url: API server URL (default: "http://localhost:30000/v1")max_response_length: Maximum response length (default: 64000)max_prompt_length: Maximum prompt length (default: 2048)temperature: Sampling temperature (default: 0.6)top_p: Top-p sampling (default: 0.95)
The script will:
- Load the DeepCoder Preview test dataset
- Run parallel and async trajectory collection using the agent execution engine
- Evaluate results and report accuracy metrics
Training
Basic Training
To train DeepCoder with iterative context lengthening (16K -> 32K -> 64K):
bash examples/deepcoder/train_deepcoder_16k.sh
# modify MODEL_PATH to the 16k checkpoint path before running the script.
bash examples/deepcoder/train_deepcoder_32k.sh