PursuitOfDataScience/Llama-3.2-1B-GRPO

Text Generation

chain-of-thought

reinforcement-learning

Llama-3.2-1B-GRPO (2.49 GB)

1 contributor

History: 2 commits

PursuitOfDataScience: Upload Llama-3.2-1B-GRPO: CoT + GRPO trained model (14e5493, verified, 6 months ago)

File                      Size        Storage
.gitattributes            1.57 kB
README.md                 5.88 kB
config.json               859 Bytes
generation_config.json    216 Bytes
model.safetensors         2.47 GB     xet
special_tokens_map.json   449 Bytes
tokenizer.json            17.2 MB     xet
tokenizer_config.json     50.6 kB
training_args.bin         7.25 kB     xet

Last commit for each file: Upload Llama-3.2-1B-GRPO: CoT + GRPO trained model (6 months ago)