Duplicated from ayushozha/replicalab-scientist-grpo-lora

openenv-community/replicalab-scientist-grpo-lora

Text Generation
PEFT
Safetensors
Transformers
English
grpo
lora
trl
unsloth
reinforcement-learning
multi-agent
scientific-reasoning
replicalab
conversational
replicalab-scientist-grpo-lora
46.3 MB
  • 2 contributors
History: 3 commits
maxxie114
Update README.md
814ce97 verified 25 days ago
  • plots
    Duplicate from ayushozha/replicalab-scientist-grpo-lora 25 days ago
  • reports
    Duplicate from ayushozha/replicalab-scientist-grpo-lora 25 days ago
  • .gitattributes
    1.63 kB
    Duplicate from ayushozha/replicalab-scientist-grpo-lora 25 days ago
  • README.md
    4.07 kB
    Update README.md 25 days ago
  • adapter_config.json
    1.21 kB
    Duplicate from ayushozha/replicalab-scientist-grpo-lora 25 days ago
  • adapter_model.safetensors
    25.6 MB
    xet
    Duplicate from ayushozha/replicalab-scientist-grpo-lora 25 days ago
  • chat_template.jinja
    7.82 kB
    Duplicate from ayushozha/replicalab-scientist-grpo-lora 25 days ago
  • processor_config.json
    1.3 kB
    Duplicate from ayushozha/replicalab-scientist-grpo-lora 25 days ago
  • tokenizer.json
    20 MB
    xet
    Duplicate from ayushozha/replicalab-scientist-grpo-lora 25 days ago
  • tokenizer_config.json
    1.17 kB
    Duplicate from ayushozha/replicalab-scientist-grpo-lora 25 days ago