# DeepCoder Training Examples

This directory contains examples for training and running DeepCoder, a code reasoning LLM fine-tuned from DeepSeek-R1-Distill-Qwen-14B using distributed reinforcement learning (RL). 

Our examples uses the following:
* DeepSeek-R1-Distill-Qwen-14B as the base model
* agentica-org/DeepCoder-Preview-Dataset (lcbv5 subset) for training and evaluation



## Model Hosting

### Option 1: Using vLLM

Start a vLLM server with OpenAI-compatible API:

```bash
python -m vllm.entrypoints.openai.api_server \
    --model agentica-org/DeepCoder-14B-Preview \
    --host 0.0.0.0 \
    --port 30000 \
    --dtype bfloat16 \
    --max-model-len 65536
```

### Option 2: Using SGLang

```bash
python -m sglang_router.launch_server \
    --model-path agentica-org/DeepCoder-14B-Preview \ 
    --dp-size 1 \
    --dtype bfloat16
# increase dp_size to enable data-parallel processing on multi-GPU 
```

The server should be accessible at `http://localhost:30000/v1`

## Dataset Preparation

Prepare the DeepCoder Preview Dataset:

```bash
cd examples/deepcoder
python prepare_deepcoder_data.py
```

This will:
- Download the agentica-org/DeepCoder-Preview-Dataset (lcbv5 subset)
- Register both train/test splits with the RLLM DatasetRegistry

## Running Inference

Once your model server is running and datasets are prepared, you can run inference:

```bash
cd examples/deepcoder
python run_deepcoder.py
```

### Configuration Options

You can modify the inference script parameters:

- `n_parallel_agents`: Number of parallel agents (default: 64)
- `model_name`: Model to use (default: "agentica-org/DeepCoder-14B-Preview")
- `base_url`: API server URL (default: "http://localhost:30000/v1")
- `max_response_length`: Maximum response length (default: 64000)
- `max_prompt_length`: Maximum prompt length (default: 2048)
- `temperature`: Sampling temperature (default: 0.6)
- `top_p`: Top-p sampling (default: 0.95)

The script will:
1. Load the DeepCoder Preview test dataset
2. Run parallel and async trajectory collection using the agent execution engine
3. Evaluate results and report accuracy metrics

## Training

### Basic Training

To train DeepCoder with iterative context lengthening (16K -> 32K -> 64K):

```bash
bash examples/deepcoder/train_deepcoder_16k.sh

# modify MODEL_PATH to the 16k checkpoint path before running the script.
bash examples/deepcoder/train_deepcoder_32k.sh
```