# Chemistry Model - Fine-tuned Qwen2.5-3B-Instruct (Fixed)

This is a fine-tuned version of Qwen2.5-3B-Instruct trained on chemistry-related tasks using GRPO (Group Relative Policy Optimization). The checkpoint was saved at global step 70.

⚠️ **This is a fixed version** - the original upload contained distributed-tensor (DTensor) metadata that caused loading failures. This version has been properly consolidated into plain tensors.
## Model Details

- **Base Model**: Qwen/Qwen2.5-3B-Instruct
- **Architecture**: Qwen2ForCausalLM
- **Training Algorithm**: GRPO with vLLM async rollouts
- **Training Step**: 70
- **Framework**: PyTorch + Transformers
- **Original checkpoint**: ckpts/global_step_70
## Training Configuration

This model was trained using the chemistry environment from skyrl-gym with the following key parameters:

- Learning rate: 1.0e-6
- Train batch size: 1024
- Max generation length: 1024 tokens
- Environment: ChemGuesser (molecular similarity scoring)
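Similarity-scoring environments like this typically reward the model by comparing its predicted molecule against a target, often via Tanimoto (Jaccard) similarity over molecular fingerprints (e.g. RDKit Morgan fingerprints). The exact reward used by ChemGuesser is not documented here; the following is a dependency-free sketch that uses character bigrams of a SMILES string as a crude stand-in for a real fingerprint:

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def char_ngrams(smiles: str, n: int = 2) -> set:
    """Crude stand-in for a molecular fingerprint: character n-grams of a SMILES string."""
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}

# Identical strings score 1.0; unrelated strings score near 0.0.
score = tanimoto(char_ngrams("CCO"), char_ngrams("CCO"))  # -> 1.0
```

A production reward would use chemically meaningful fingerprints instead of raw string n-grams, since two different SMILES strings can denote the same molecule.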
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("runrl/chemistry-step-70")
tokenizer = AutoTokenizer.from_pretrained("runrl/chemistry-step-70")

# Example usage for chemistry tasks. As an Instruct model, it expects
# chat-formatted input, so wrap the prompt with the chat template.
prompt = "Predict the molecular structure for the compound with SMILES: "
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")

# max_new_tokens bounds only the generated continuation, unlike max_length,
# which also counts the prompt.
outputs = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
```
## Training Environment

This model was specifically trained for chemistry tasks involving molecular structure prediction and similarity scoring.
## Technical Notes

- Consolidated from a 4-rank FSDP2 checkpoint
- DTensors properly converted to regular PyTorch tensors
- FSDP2 sharded parameters reconstructed into a full model
- Compatible with standard Transformers loading
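The consolidation step above can be sketched as follows. This is a minimal illustration, not the exact script used: it assumes the checkpoint's state dict may contain `torch.distributed.tensor.DTensor` entries, whose `full_tensor()` method all-gathers the shards into a regular tensor, while plain tensors pass through unchanged:

```python
import torch

def consolidate_state_dict(sharded_state_dict):
    """Convert a (possibly DTensor-containing) state dict into plain CPU tensors."""
    full = {}
    for name, param in sharded_state_dict.items():
        # DTensor exposes full_tensor(), which gathers all shards into one
        # regular torch.Tensor; already-plain tensors lack that attribute.
        if hasattr(param, "full_tensor"):
            param = param.full_tensor()
        full[name] = param.detach().cpu()
    return full
```

The resulting dict can then be loaded into a freshly instantiated `Qwen2ForCausalLM` and saved with `save_pretrained`, yielding a checkpoint that standard Transformers loading accepts.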