JSCPPProgrammer's picture
Initial: GenSearcher workflow + FireRed /generate adapter + Gradio
80b7188 verified

DeepCoder Training Examples

This directory contains examples for training and running DeepCoder, a code reasoning LLM fine-tuned from DeepSeek-R1-Distill-Qwen-14B using distributed reinforcement learning (RL).

Our examples uses the following:

  • DeepSeek-R1-Distill-Qwen-14B as the base model
  • agentica-org/DeepCoder-Preview-Dataset (lcbv5 subset) for training and evaluation

Model Hosting

Option 1: Using vLLM

Start a vLLM server with OpenAI-compatible API:

python -m vllm.entrypoints.openai.api_server \
    --model agentica-org/DeepCoder-14B-Preview \
    --host 0.0.0.0 \
    --port 30000 \
    --dtype bfloat16 \
    --max-model-len 65536

Option 2: Using SGLang

python -m sglang_router.launch_server \
    --model-path agentica-org/DeepCoder-14B-Preview \ 
    --dp-size 1 \
    --dtype bfloat16
# increase dp_size to enable data-parallel processing on multi-GPU 

The server should be accessible at http://localhost:30000/v1

Dataset Preparation

Prepare the DeepCoder Preview Dataset:

cd examples/deepcoder
python prepare_deepcoder_data.py

This will:

  • Download the agentica-org/DeepCoder-Preview-Dataset (lcbv5 subset)
  • Register both train/test splits with the RLLM DatasetRegistry

Running Inference

Once your model server is running and datasets are prepared, you can run inference:

cd examples/deepcoder
python run_deepcoder.py

Configuration Options

You can modify the inference script parameters:

  • n_parallel_agents: Number of parallel agents (default: 64)
  • model_name: Model to use (default: "agentica-org/DeepCoder-14B-Preview")
  • base_url: API server URL (default: "http://localhost:30000/v1")
  • max_response_length: Maximum response length (default: 64000)
  • max_prompt_length: Maximum prompt length (default: 2048)
  • temperature: Sampling temperature (default: 0.6)
  • top_p: Top-p sampling (default: 0.95)

The script will:

  1. Load the DeepCoder Preview test dataset
  2. Run parallel and async trajectory collection using the agent execution engine
  3. Evaluate results and report accuracy metrics

Training

Basic Training

To train DeepCoder with iterative context lengthening (16K -> 32K -> 64K):

bash examples/deepcoder/train_deepcoder_16k.sh

# modify MODEL_PATH to the 16k checkpoint path before running the script.
bash examples/deepcoder/train_deepcoder_32k.sh