andrewlngdn/dsl-debug-7b-rl-only-step30
Tags: Text Generation, Safetensors, custom, English, qwen2, debugging, tool-use, multi-turn, reinforcement-learning, grpo, conversational
License: mit
Files and versions (branch: main)
1 contributor, History: 4 commits
Latest commit: andrewlngdn, "Update model card: add blog link, GitHub link, related models" (7351c81, verified), about 2 months ago
File                              Scan  Size       Storage  Last commit                                                    Updated
.gitattributes                    Safe  1.52 kB    -        initial commit                                                 3 months ago
README.md                         Safe  1.79 kB    -        Update model card: add blog link, GitHub link, related models  about 2 months ago
config.json                       Safe  1.32 kB    -        Upload rl_only_step30 (publishable run)                        3 months ago
generation_config.json            Safe  121 Bytes  -        Upload rl_only_step30 (publishable run)                        3 months ago
merges.txt                        Safe  1.67 MB    -        Upload rl_only_step30 (publishable run)                        3 months ago
model-00001-of-00004.safetensors  -     4.88 GB    xet      Upload rl_only_step30 (publishable run)                        3 months ago
model-00002-of-00004.safetensors  Safe  4.93 GB    xet      Upload rl_only_step30 (publishable run)                        3 months ago
model-00003-of-00004.safetensors  Safe  4.33 GB    xet      Upload rl_only_step30 (publishable run)                        3 months ago
model-00004-of-00004.safetensors  Safe  1.09 GB    xet      Upload rl_only_step30 (publishable run)                        3 months ago
model.safetensors.index.json      Safe  27.8 kB    -        Upload rl_only_step30 (publishable run)                        3 months ago
tokenizer.json                    Safe  7.03 MB    -        Upload rl_only_step30 (publishable run)                        3 months ago
tokenizer_config.json             Safe  7.31 kB    -        Upload rl_only_step30 (publishable run)                        3 months ago
vocab.json                        Safe  2.78 MB    -        Upload rl_only_step30 (publishable run)                        3 months ago