| # DeepCoder Training Examples | |
| This directory contains examples for training and running DeepCoder, a code reasoning LLM fine-tuned from DeepSeek-R1-Distill-Qwen-14B using distributed reinforcement learning (RL). | |
| Our examples uses the following: | |
| * DeepSeek-R1-Distill-Qwen-14B as the base model | |
| * agentica-org/DeepCoder-Preview-Dataset (lcbv5 subset) for training and evaluation | |
| ## Model Hosting | |
| ### Option 1: Using vLLM | |
| Start a vLLM server with OpenAI-compatible API: | |
| ```bash | |
| python -m vllm.entrypoints.openai.api_server \ | |
| --model agentica-org/DeepCoder-14B-Preview \ | |
| --host 0.0.0.0 \ | |
| --port 30000 \ | |
| --dtype bfloat16 \ | |
| --max-model-len 65536 | |
| ``` | |
| ### Option 2: Using SGLang | |
| ```bash | |
| python -m sglang_router.launch_server \ | |
| --model-path agentica-org/DeepCoder-14B-Preview \ | |
| --dp-size 1 \ | |
| --dtype bfloat16 | |
| # increase dp_size to enable data-parallel processing on multi-GPU | |
| ``` | |
| The server should be accessible at `http://localhost:30000/v1` | |
| ## Dataset Preparation | |
| Prepare the DeepCoder Preview Dataset: | |
| ```bash | |
| cd examples/deepcoder | |
| python prepare_deepcoder_data.py | |
| ``` | |
| This will: | |
| - Download the agentica-org/DeepCoder-Preview-Dataset (lcbv5 subset) | |
| - Register both train/test splits with the RLLM DatasetRegistry | |
| ## Running Inference | |
| Once your model server is running and datasets are prepared, you can run inference: | |
| ```bash | |
| cd examples/deepcoder | |
| python run_deepcoder.py | |
| ``` | |
| ### Configuration Options | |
| You can modify the inference script parameters: | |
| - `n_parallel_agents`: Number of parallel agents (default: 64) | |
| - `model_name`: Model to use (default: "agentica-org/DeepCoder-14B-Preview") | |
| - `base_url`: API server URL (default: "http://localhost:30000/v1") | |
| - `max_response_length`: Maximum response length (default: 64000) | |
| - `max_prompt_length`: Maximum prompt length (default: 2048) | |
| - `temperature`: Sampling temperature (default: 0.6) | |
| - `top_p`: Top-p sampling (default: 0.95) | |
| The script will: | |
| 1. Load the DeepCoder Preview test dataset | |
| 2. Run parallel and async trajectory collection using the agent execution engine | |
| 3. Evaluate results and report accuracy metrics | |
| ## Training | |
| ### Basic Training | |
| To train DeepCoder with iterative context lengthening (16K -> 32K -> 64K): | |
| ```bash | |
| bash examples/deepcoder/train_deepcoder_16k.sh | |
| # modify MODEL_PATH to the 16k checkpoint path before running the script. | |
| bash examples/deepcoder/train_deepcoder_32k.sh | |
| ``` | |