Spaces:

JSCPPProgrammer
/

gensearcher-firered

Paused

App Files Files Community

gensearcher-firered / vendor /rllm /examples /deepcoder /README.md

JSCPPProgrammer

Initial: GenSearcher workflow + FireRed /generate adapter + Gradio

80b7188 verified 2 months ago

preview code

raw

history blame contribute delete

2.5 kB

	# DeepCoder Training Examples

	This directory contains examples for training and running DeepCoder, a code reasoning LLM fine-tuned from DeepSeek-R1-Distill-Qwen-14B using distributed reinforcement learning (RL).

	Our examples uses the following:
	* DeepSeek-R1-Distill-Qwen-14B as the base model
	* agentica-org/DeepCoder-Preview-Dataset (lcbv5 subset) for training and evaluation



	## Model Hosting

	### Option 1: Using vLLM

	Start a vLLM server with OpenAI-compatible API:

	```bash
	python -m vllm.entrypoints.openai.api_server \
	--model agentica-org/DeepCoder-14B-Preview \
	--host 0.0.0.0 \
	--port 30000 \
	--dtype bfloat16 \
	--max-model-len 65536
	```

	### Option 2: Using SGLang

	```bash
	python -m sglang_router.launch_server \
	--model-path agentica-org/DeepCoder-14B-Preview \
	--dp-size 1 \
	--dtype bfloat16
	# increase dp_size to enable data-parallel processing on multi-GPU
	```

	The server should be accessible at `http://localhost:30000/v1`

	## Dataset Preparation

	Prepare the DeepCoder Preview Dataset:

	```bash
	cd examples/deepcoder
	python prepare_deepcoder_data.py
	```

	This will:
	- Download the agentica-org/DeepCoder-Preview-Dataset (lcbv5 subset)
	- Register both train/test splits with the RLLM DatasetRegistry

	## Running Inference

	Once your model server is running and datasets are prepared, you can run inference:

	```bash
	cd examples/deepcoder
	python run_deepcoder.py
	```

	### Configuration Options

	You can modify the inference script parameters:

	- `n_parallel_agents`: Number of parallel agents (default: 64)
	- `model_name`: Model to use (default: "agentica-org/DeepCoder-14B-Preview")
	- `base_url`: API server URL (default: "http://localhost:30000/v1")
	- `max_response_length`: Maximum response length (default: 64000)
	- `max_prompt_length`: Maximum prompt length (default: 2048)
	- `temperature`: Sampling temperature (default: 0.6)
	- `top_p`: Top-p sampling (default: 0.95)

	The script will:
	1. Load the DeepCoder Preview test dataset
	2. Run parallel and async trajectory collection using the agent execution engine
	3. Evaluate results and report accuracy metrics

	## Training

	### Basic Training

	To train DeepCoder with iterative context lengthening (16K -> 32K -> 64K):

	```bash
	bash examples/deepcoder/train_deepcoder_16k.sh

	# modify MODEL_PATH to the 16k checkpoint path before running the script.
	bash examples/deepcoder/train_deepcoder_32k.sh
	```