# Chemistry Model - Fine-tuned Qwen2.5-3B-Instruct (Fixed)

This is a fine-tuned version of Qwen2.5-3B-Instruct trained on chemistry-related tasks using GRPO (Group Relative Policy Optimization). The checkpoint was saved at global step 70.

⚠️ **This is a fixed version** - the original upload contained distributed-tensor (DTensor) metadata that caused loading failures. This version has been properly consolidated into plain tensors.
## Model Details

- **Base Model**: Qwen/Qwen2.5-3B-Instruct
- **Architecture**: Qwen2ForCausalLM
- **Training Algorithm**: GRPO with vLLM async rollouts
- **Training Step**: 70
- **Framework**: PyTorch + Transformers
- **Original checkpoint**: ckpts/global_step_70
## Training Configuration

This model was trained using the chemistry environment from skyrl-gym with the following key parameters:

- Learning rate: 1.0e-6
- Train batch size: 1024
- Max generation length: 1024 tokens
- Environment: ChemGuesser (molecular similarity scoring)
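Similarity-scoring environments like this typically reward the model by comparing its predicted molecule against a target, often via Tanimoto (Jaccard) similarity over molecular fingerprints (e.g. RDKit Morgan fingerprints). The exact reward used by ChemGuesser is not documented here; the following is a dependency-free sketch that uses character bigrams of a SMILES string as a crude stand-in for a real fingerprint:

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def char_ngrams(smiles: str, n: int = 2) -> set:
    """Crude stand-in for a molecular fingerprint: character n-grams of a SMILES string."""
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}

# Identical strings score 1.0; unrelated strings score near 0.0.
score = tanimoto(char_ngrams("CCO"), char_ngrams("CCO"))  # -> 1.0
```

A production reward would use chemically meaningful fingerprints instead of raw string n-grams, since two different SMILES strings can denote the same molecule.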
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("runrl/chemistry-step-70")
tokenizer = AutoTokenizer.from_pretrained("runrl/chemistry-step-70")

# Example usage for chemistry tasks. As an Instruct model, it expects
# chat-formatted input, so wrap the prompt with the chat template.
prompt = "Predict the molecular structure for the compound with SMILES: "
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")

# max_new_tokens bounds only the generated continuation, unlike max_length,
# which also counts the prompt.
outputs = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
```
## Training Environment

This model was specifically trained for chemistry tasks involving molecular structure prediction and similarity scoring.
## Technical Notes

- Consolidated from a 4-rank FSDP2 checkpoint
- DTensors properly converted to regular PyTorch tensors
- FSDP2 sharded parameters reconstructed into a full model
- Compatible with standard Transformers loading
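The consolidation step above can be sketched as follows. This is a minimal illustration, not the exact script used: it assumes the checkpoint's state dict may contain `torch.distributed.tensor.DTensor` entries, whose `full_tensor()` method all-gathers the shards into a regular tensor, while plain tensors pass through unchanged:

```python
import torch

def consolidate_state_dict(sharded_state_dict):
    """Convert a (possibly DTensor-containing) state dict into plain CPU tensors."""
    full = {}
    for name, param in sharded_state_dict.items():
        # DTensor exposes full_tensor(), which gathers all shards into one
        # regular torch.Tensor; already-plain tensors lack that attribute.
        if hasattr(param, "full_tensor"):
            param = param.full_tensor()
        full[name] = param.detach().cpu()
    return full
```

The resulting dict can then be loaded into a freshly instantiated `Qwen2ForCausalLM` and saved with `save_pretrained`, yielding a checkpoint that standard Transformers loading accepts.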