Student0809's picture
Add files using upload-large-folder tool
cb2428f verified
# README: GRPO Internal Mode Execution Scripts
---
## Known Issues
Bugs in **vLLM >= 0.8**
1. DeepSpeed ZeRO-3 Mode :
When using DeepSpeed's ZeRO-3 configuration, gradients may become zero during training.
2. Async Mode
In certain scenarios, the asynchronous mode (Async Mode) may hang, causing the program to become unresponsive.
To ensure stability and compatibility, it is recommended to use **vLLM 0.7.3** to avoid the above issues.
## **Introduction**
The GRPO (Gradient-based Reinforcement Policy Optimization) training framework supports integrating high-performance inference engines like vLLM to accelerate the sampling process. The **Internal Mode** allows the inference service to be directly launched within the Trainer, reducing external dependencies and simplifying deployment.
This folder contains scripts and instructions for running GRPO in **Internal Mode**, where the model training and inference are tightly integrated with flexible resource allocation strategies.
## **Resource Allocation Strategies**
GRPO provides two resource allocation strategies under the Internal mode:
### 1. **Colocate Mode**
- **Description**: Training and inference share GPU resources.
- **Recommended Setting**:
- Set `sleep_level=1` to release vLLM memory during training steps.
- **Resource Allocation Rules**:
```plaintext
NPROC_PER_NODE = Total number of GPUs
num_infer_workers = Total number of GPUs
```
### 2. **Async Mode**
- **Description**: Training and inference use independent GPU resources.
- **Recommended Setting**:
- Set `sleep_level=1` to release vLLM memory during training steps.
- **Resource Allocation Rules**:
```plaintext
NPROC_PER_NODE = Number of training GPUs
num_infer_workers = Number of inference GPUs
Must satisfy: Number of training GPUs + Number of inference GPUs = Total GPU count
```