# Chemistry Model - Fine-tuned Qwen2.5-3B-Instruct (Fixed)
This is a fine-tuned version of Qwen2.5-3B-Instruct trained for chemistry-related tasks using GRPO (Group Relative Policy Optimization). The model was saved at global step 70.
⚠️ **This is a fixed version** - the original upload contained distributed tensor metadata that caused loading issues. This version has been properly consolidated.
## Model Details
- **Base Model**: Qwen/Qwen2.5-3B-Instruct
- **Architecture**: Qwen2ForCausalLM
- **Training Algorithm**: GRPO with vLLM async rollouts
- **Training Step**: 70
- **Framework**: PyTorch + Transformers
- **Original checkpoint**: ckpts/global_step_70
## Training Configuration
This model was trained using the chemistry environment from skyrl-gym with the following key parameters:
- Learning rate: 1.0e-6
- Train batch size: 1024
- Max generation length: 1024
- Environment: ChemGuesser (molecular similarity scoring; see the sketch below)
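
The kind of similarity scoring this environment performs can be illustrated with RDKit. Below is a minimal sketch assuming Tanimoto similarity over Morgan fingerprints; the actual ChemGuesser reward function may differ:

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def tanimoto_similarity(smiles_a: str, smiles_b: str) -> float:
    """Score two molecules in [0, 1] by fingerprint overlap."""
    mol_a = Chem.MolFromSmiles(smiles_a)
    mol_b = Chem.MolFromSmiles(smiles_b)
    if mol_a is None or mol_b is None:
        return 0.0  # unparseable SMILES gets the minimum score
    fp_a = AllChem.GetMorganFingerprintAsBitVect(mol_a, 2, nBits=2048)
    fp_b = AllChem.GetMorganFingerprintAsBitVect(mol_b, 2, nBits=2048)
    return DataStructs.TanimotoSimilarity(fp_a, fp_b)

print(tanimoto_similarity("CCO", "CCN"))  # ethanol vs. ethylamine
```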
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "runrl/chemistry-step-70"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Example usage for chemistry tasks ("CCO" is an illustrative SMILES for ethanol)
prompt = "Predict the molecular structure for the compound with SMILES: CCO"
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
```
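
Because the base model is instruction-tuned, prompts should be passed through the tokenizer's chat template rather than tokenized raw, and `max_new_tokens` bounds only the generated continuation regardless of prompt length. Loading with `device_map="auto"` requires the `accelerate` package.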
## Training Environment
This model was specifically trained for chemistry tasks involving molecular structure prediction and similarity scoring.
## Technical Notes
- Consolidated from 4-rank FSDP2 checkpoint
- DTensors properly converted to regular PyTorch tensors (see the sketch below)
- FSDP2 sharded parameters reconstructed into full model
- Compatible with standard Transformers loading
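
As an illustration of the consolidation described above, here is a minimal sketch, assuming a recent PyTorch where `DTensor` lives in `torch.distributed.tensor`; the actual conversion script used for this checkpoint is not included in this repo:

```python
from torch.distributed.tensor import DTensor

def consolidate_state_dict(sharded_state_dict: dict) -> dict:
    """Materialize FSDP2 DTensor shards into plain CPU tensors."""
    full_state_dict = {}
    for name, param in sharded_state_dict.items():
        if isinstance(param, DTensor):
            # full_tensor() is a collective: it gathers shards across ranks,
            # so every rank of the 4-rank job must call it before rank 0 saves.
            param = param.full_tensor()
        full_state_dict[name] = param.detach().cpu()
    return full_state_dict
```

The returned dict contains only regular `torch.Tensor`s, which is what makes the checkpoint loadable with standard Transformers.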