AlignmentResearch/pineapple-policy-annah_grpo

530 MB

1 contributor

History: 2 commits

Latest commit by annahbanannah: Upload trained grpo model (56c6a1c, verified, 11 months ago)

Name                      Size          Last commit                  Updated
policy                    -             Upload trained grpo model    11 months ago
reference                 -             Upload trained grpo model    11 months ago
reward                    -             Upload trained grpo model    11 months ago
.gitattributes            1.57 kB       Upload trained grpo model    11 months ago
README.md                 5.09 kB       Upload trained grpo model    11 months ago
added_tokens.json         707 Bytes     Upload trained grpo model    11 months ago
chat_template.jinja       4.17 kB       Upload trained grpo model    11 months ago
merges.txt                1.67 MB       Upload trained grpo model    11 months ago
special_tokens_map.json   613 Bytes     Upload trained grpo model    11 months ago
tokenizer.json            11.4 MB (xet) Upload trained grpo model    11 months ago
tokenizer_config.json     5.4 kB        Upload trained grpo model    11 months ago
training_args.bin         6.52 kB (xet) Upload trained grpo model    11 months ago
vocab.json                2.78 MB       Upload trained grpo model    11 months ago

training_args.bin: Detected Pickle imports (11)
- "torch.device",
- "accelerate.state.PartialState",
- "transformers.trainer_utils.SaveStrategy",
- "transformers.trainer_utils.HubStrategy",
- "trl.trainer.grpo_config.GRPOConfig",
- "transformers.trainer_utils.FSDPOption",
- "transformers.trainer_utils.IntervalStrategy",
- "accelerate.utils.dataclasses.DistributedType",
- "transformers.trainer_utils.SchedulerType",
- "transformers.trainer_pt_utils.AcceleratorConfig",
- "transformers.training_args.OptimizerNames"
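
The pickle imports reported for training_args.bin can be re-derived locally without unpickling (and so without executing) the file. The sketch below is an illustration, not the Hub's scanner: it walks the pickle opcodes with the standard-library pickletools module and collects every referenced global. The helper names read_pickle_bytes and list_pickle_imports are hypothetical, and the handling of zip archives assumes the file was written by torch.save in its zip-based format, where the pickle stream sits in a member ending in "data.pkl".

```python
import pickle
import pickletools
import zipfile
from collections import OrderedDict, defaultdict


def read_pickle_bytes(path):
    """Return the raw pickle stream stored at `path` (hypothetical helper).

    Assumption: a zip-based torch.save file keeps its pickle in a member
    whose name ends with "data.pkl"; any other file is read as a plain pickle.
    """
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            member = next(n for n in archive.namelist() if n.endswith("data.pkl"))
            return archive.read(member)
    with open(path, "rb") as handle:
        return handle.read()


def list_pickle_imports(data):
    """Collect 'module.name' for each global a pickle references, without loading it."""
    imports = set()
    memo = {}        # memo index -> string (only strings are tracked)
    memo_count = 0   # number of MEMOIZE opcodes seen so far
    recent = []      # strings pushed so far, most recent last
    last = None      # string pushed by the previous opcode, if any
    for opcode, arg, _pos in pickletools.genops(data):
        name = opcode.name
        if name in ("GLOBAL", "INST"):
            # Older protocols carry "module name" directly in the opcode argument.
            module, attr = arg.split(" ", 1)
            imports.add(f"{module}.{attr}")
            last = None
        elif name == "STACK_GLOBAL":
            # Protocol 4+: module and name are the two strings pushed just before.
            if len(recent) >= 2:
                imports.add(f"{recent[-2]}.{recent[-1]}")
            last = None
        elif name == "MEMOIZE":
            if last is not None:
                memo[memo_count] = last
            memo_count += 1
        elif name in ("PUT", "BINPUT", "LONG_BINPUT"):
            if last is not None:
                memo[arg] = last
        elif name in ("GET", "BINGET", "LONG_BINGET"):
            # A reused string (for example a repeated module name) comes from the memo.
            last = memo.get(arg)
            if last is not None:
                recent.append(last)
        elif isinstance(arg, str):
            last = arg
            recent.append(arg)
        else:
            last = None
    return imports


# Small in-memory demonstration on a harmless pickle.
demo_stream = pickle.dumps([OrderedDict(a=1), defaultdict(list)], protocol=4)
found = list_pickle_imports(demo_stream)
```

With a local copy of the file, sorted(list_pickle_imports(read_pickle_bytes("training_args.bin"))) should give a list comparable to the one shown above. The string tracking for STACK_GLOBAL is a heuristic that matches how the standard pickler emits globals; a hand-crafted pickle could defeat it, so treat the result as a first look rather than a security guarantee.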