Spaces:

JSCPPProgrammer
/

gensearcher-firered

Paused

File size: 2,500 Bytes

80b7188

# DeepCoder Training Examples

This directory contains examples for training and running DeepCoder, a code reasoning LLM fine-tuned from DeepSeek-R1-Distill-Qwen-14B using distributed reinforcement learning (RL). 

Our examples uses the following:
* DeepSeek-R1-Distill-Qwen-14B as the base model
* agentica-org/DeepCoder-Preview-Dataset (lcbv5 subset) for training and evaluation



## Model Hosting

### Option 1: Using vLLM

Start a vLLM server with OpenAI-compatible API:

```bash

python -m vllm.entrypoints.openai.api_server \

    --model agentica-org/DeepCoder-14B-Preview \

    --host 0.0.0.0 \

    --port 30000 \

    --dtype bfloat16 \

    --max-model-len 65536

```

### Option 2: Using SGLang

```bash

python -m sglang_router.launch_server \

    --model-path agentica-org/DeepCoder-14B-Preview \ 

    --dp-size 1 \

    --dtype bfloat16

# increase dp_size to enable data-parallel processing on multi-GPU 

```

The server should be accessible at `http://localhost:30000/v1`

## Dataset Preparation

Prepare the DeepCoder Preview Dataset:

```bash

cd examples/deepcoder

python prepare_deepcoder_data.py

```

This will:
- Download the agentica-org/DeepCoder-Preview-Dataset (lcbv5 subset)
- Register both train/test splits with the RLLM DatasetRegistry

## Running Inference

Once your model server is running and datasets are prepared, you can run inference:

```bash

cd examples/deepcoder

python run_deepcoder.py

```

### Configuration Options

You can modify the inference script parameters:

- `n_parallel_agents`: Number of parallel agents (default: 64)
- `model_name`: Model to use (default: "agentica-org/DeepCoder-14B-Preview")
- `base_url`: API server URL (default: "http://localhost:30000/v1")
- `max_response_length`: Maximum response length (default: 64000)
- `max_prompt_length`: Maximum prompt length (default: 2048)
- `temperature`: Sampling temperature (default: 0.6)
- `top_p`: Top-p sampling (default: 0.95)

The script will:
1. Load the DeepCoder Preview test dataset
2. Run parallel and async trajectory collection using the agent execution engine
3. Evaluate results and report accuracy metrics

## Training

### Basic Training

To train DeepCoder with iterative context lengthening (16K -> 32K -> 64K):

```bash

bash examples/deepcoder/train_deepcoder_16k.sh



# modify MODEL_PATH to the 16k checkpoint path before running the script.

bash examples/deepcoder/train_deepcoder_32k.sh

```