| | --- |
| | license: apache-2.0 |
| | base_model: |
| | - Qwen/Qwen3-32B |
| | --- |
| | |
| | The missing "base model" of Qwen3-32B. This model serves as the foundation for our R1-0528 distillation work. |
| |
|
| | This model is the result of continued pre-training of Qwen3-32B on a multilingual dataset of mixed code and text. |
| |
|
| | The goal of this training is to provide a model that is close to a "pre-trained" state, which reduces the influence of the original Qwen3's linguistic style on subsequent fine-tuning. |
| |
|
| | We are providing this model to the community as a base model for further SFT. It is not intended for direct inference. |
| |
|