"""composer_replication.pipeline: the Stage-0 dataset-pipeline contract + driver.

This package exposes the single reconciled dataset contract (it supersedes the
two divergent layouts in research/design-F1 and design-F2; see deepread finding
V8/D-7), the pragmatic near-duplicate detector, and the local stage-driver that
turns (tasks, env, policy) into a carded, deduped, holdout-split corpus.
"""
from composer_replication.pipeline.build_corpus import build_corpus
from composer_replication.pipeline.dedup import (
    dedup,
    find_near_duplicates,
    jaccard_estimate,
    minhash_signature,
)
from composer_replication.pipeline.s3_contract import (
    RunLayout,
    RunManifest,
    write_dataset_card,
    write_dpo_rows,
    write_sft_rows,
    write_tasks,
    write_tasks_full,
)
__all__ = [
    "RunLayout",
    "RunManifest",
    "build_corpus",
    "dedup",
    "find_near_duplicates",
    "jaccard_estimate",
    "minhash_signature",
    "write_dataset_card",
    "write_dpo_rows",
    "write_sft_rows",
    "write_tasks",
    "write_tasks_full",
]
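The module docstring mentions a "pragmatic near-duplicate detector", and the exported names `minhash_signature`, `jaccard_estimate`, and `find_near_duplicates` suggest a MinHash-based approach. The signatures and behavior of those functions are not shown here, so the following is only a minimal standalone sketch of the general technique, not the package's implementation. Every name in it (`shingles`, `sketch_minhash_signature`, `sketch_jaccard_estimate`, `sketch_find_near_duplicates`) and every parameter (`num_perm`, `k`, `threshold`) is hypothetical and chosen for illustration.

```python
import hashlib


def shingles(text: str, k: int = 5) -> set[str]:
    """Return the set of character k-shingles of whitespace-normalized text."""
    norm = " ".join(text.split())
    if len(norm) <= k:
        return {norm}
    return {norm[i:i + k] for i in range(len(norm) - k + 1)}


def _hash64(shingle: str, seed: int) -> int:
    """Seeded 64-bit hash of one shingle (BLAKE2b with a per-seed salt)."""
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(shingle.encode("utf-8"), digest_size=8, salt=salt)
    return int.from_bytes(digest.digest(), "big")


def sketch_minhash_signature(text: str, num_perm: int = 64, k: int = 5) -> list[int]:
    """Hypothetical stand-in: one minimum hash value per seeded hash function."""
    sh = shingles(text, k)
    return [min(_hash64(s, seed) for s in sh) for seed in range(num_perm)]


def sketch_jaccard_estimate(sig_a: list[int], sig_b: list[int]) -> float:
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def sketch_find_near_duplicates(
    texts: list[str], threshold: float = 0.8
) -> list[tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose estimated similarity >= threshold.

    This compares all pairs, which is quadratic in the number of texts; a
    larger corpus would normally add LSH banding in front of this step.
    """
    sigs = [sketch_minhash_signature(t) for t in texts]
    return [
        (i, j)
        for i in range(len(sigs))
        for j in range(i + 1, len(sigs))
        if sketch_jaccard_estimate(sigs[i], sigs[j]) >= threshold
    ]
```

As a usage illustration under the same assumptions, calling `sketch_find_near_duplicates` on a list of task prompts returns the index pairs to collapse before a holdout split, so that near-identical items do not end up on both sides of the split. The all-pairs comparison keeps the sketch short and is only reasonable for small corpora.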