This model is a fine-tuned version of Qwen/Qwen3-8B on the /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--neulab-code-feedback-sandboxes_glm_4.7_traces_jupiter/snapshots/e815aba2c9ff5d91161edf385c1deba77cd72e9e_thinking_preprocessed dataset.
## Model description

More information needed
## Intended uses & limitations

More information needed
## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 4e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 16
- total_train_batch_size: 16
- total_eval_batch_size: 128
- optimizer: adamw_torch_fused with betas=(0.9, 0.98) and epsilon=1e-08; no additional optimizer arguments
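As a sanity check, the total batch sizes above follow from the per-device batch sizes and the 16-GPU setup. The sketch below restates the hyperparameters as a `transformers.TrainingArguments`-style dictionary; the argument names are assumptions based on the standard Trainer API, since the actual training script is not included in this card.

```python
# Hypothetical restatement of the hyperparameters above in
# transformers.TrainingArguments naming (names assumed; the real
# training script is not part of this model card).
config = {
    "learning_rate": 4e-05,
    "per_device_train_batch_size": 1,   # train_batch_size above
    "per_device_eval_batch_size": 8,    # eval_batch_size above
    "seed": 42,
    "optim": "adamw_torch_fused",
    "adam_beta1": 0.9,
    "adam_beta2": 0.98,
    "adam_epsilon": 1e-08,
}

num_devices = 16  # multi-GPU distributed run

# With no gradient accumulation, the effective batch size is simply
# the per-device batch size times the number of devices.
total_train_batch_size = config["per_device_train_batch_size"] * num_devices
total_eval_batch_size = config["per_device_eval_batch_size"] * num_devices

print(total_train_batch_size)  # 16, matching total_train_batch_size above
print(total_eval_batch_size)   # 128, matching total_eval_batch_size above
```

This accounting only holds if no gradient accumulation was used; a `gradient_accumulation_steps` value above 1 would multiply the effective train batch size further.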