# Chemistry Model - Fine-tuned Qwen2.5-3B-Instruct (Fixed)

This is a fine-tuned version of Qwen2.5-3B-Instruct trained for chemistry-related tasks using GRPO (Group Relative Policy Optimization). The model was saved at global step 70.

⚠️ **This is a fixed version** - the original upload contained distributed tensor metadata that caused loading issues. This version has been properly consolidated.

## Model Details

- **Base Model**: Qwen/Qwen2.5-3B-Instruct
- **Architecture**: Qwen2ForCausalLM
- **Training Algorithm**: GRPO with vLLM async rollouts
- **Training Step**: 70
- **Framework**: PyTorch + Transformers
- **Original checkpoint**: ckpts/global_step_70

## Training Configuration

This model was trained using the chemistry environment from skyrl-gym with the following key parameters:

- Learning rate: 1.0e-6
- Train batch size: 1024
- Max generation length: 1024 tokens
- Environment: ChemGuesser (molecular similarity scoring)

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("runrl/chemistry-step-70")
tokenizer = AutoTokenizer.from_pretrained("runrl/chemistry-step-70")

# Example usage for chemistry tasks. Since the base model is
# instruction-tuned, format the prompt with the chat template.
prompt = "Predict the molecular structure for the compound with SMILES: "
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the prompt
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
```

## Training Environment

This model was trained specifically for chemistry tasks involving molecular structure prediction and similarity scoring; a sketch of this kind of scoring is given at the end of this card.

## Technical Notes

- Consolidated from a 4-rank FSDP2 checkpoint
- DTensors properly converted to regular PyTorch tensors (see the sketch below)
- FSDP2 sharded parameters reconstructed into a full model
- Compatible with standard Transformers loading
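For readers who hit the same issue with their own checkpoints, the snippet below is a minimal sketch of the consolidation step described in the notes above. It assumes the sharded state dict is loaded inside the original multi-rank `torch.distributed` job, and that the shards are `DTensor`s as produced by FSDP2 (PyTorch 2.4+ exposes `torch.distributed.tensor`); the paths and setup are illustrative, not the exact code used for this upload.

```python
import torch
from torch.distributed.tensor import DTensor

def consolidate_state_dict(sharded_state_dict):
    """Gather every DTensor shard into a plain torch.Tensor."""
    full_state_dict = {}
    for name, param in sharded_state_dict.items():
        if isinstance(param, DTensor):
            # full_tensor() all-gathers the shards across ranks
            param = param.full_tensor()
        full_state_dict[name] = param.detach().cpu()
    return full_state_dict

# On rank 0, save the consolidated weights in a Transformers-loadable form:
# full_sd = consolidate_state_dict(model.state_dict())
# if torch.distributed.get_rank() == 0:
#     model.save_pretrained("consolidated/", state_dict=full_sd)
```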
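As referenced in the Training Environment section, the reward in this kind of task is a molecular similarity score. The following is a minimal sketch of one common way to compute such a score, using RDKit Morgan fingerprints and Tanimoto similarity; the actual reward function in skyrl-gym's ChemGuesser environment may differ.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def tanimoto_similarity(smiles_a: str, smiles_b: str) -> float:
    """Tanimoto similarity between two molecules, or 0.0 if a SMILES fails to parse."""
    mol_a, mol_b = Chem.MolFromSmiles(smiles_a), Chem.MolFromSmiles(smiles_b)
    if mol_a is None or mol_b is None:
        return 0.0
    fp_a = AllChem.GetMorganFingerprintAsBitVect(mol_a, 2, nBits=2048)
    fp_b = AllChem.GetMorganFingerprintAsBitVect(mol_b, 2, nBits=2048)
    return DataStructs.TanimotoSimilarity(fp_a, fp_b)

# e.g. ethanol vs. methanol
print(tanimoto_similarity("CCO", "CO"))
```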