---
base_model:
- princeton-nlp/Llama-3-8B-ProLong-512k-Instruct
license: apache-2.0
language:
- en
datasets:
- chtmp223/CLIPPER-WritingPrompts
---

# ProLong-512k-8B-WritingPrompts

ProLong-512k-8B-WritingPrompts is a fine-tuned version of [princeton-nlp/Llama-3-8B-ProLong-512k-Instruct](https://huggingface.co/princeton-nlp/Llama-3-8B-ProLong-512k-Instruct), trained with supervised fine-tuning on the [chtmp223/CLIPPER-WritingPrompts](https://huggingface.co/datasets/chtmp223/CLIPPER-WritingPrompts) dataset.
Please see [our paper](https://arxiv.org/abs/2502.14854) for more details on the method.

## Model Details

### Model Description

- **Language(s) (NLP):** English
- **License:** Apache-2.0
- **Finetuned from model:** [princeton-nlp/Llama-3-8B-ProLong-512k-Instruct](https://huggingface.co/princeton-nlp/Llama-3-8B-ProLong-512k-Instruct)

### Model Sources

- **Repository:** [GitHub repository](https://github.com/chtmp223/CLIPPER)
- **Paper:** [https://arxiv.org/abs/2502.14854](https://arxiv.org/abs/2502.14854)

## Training Details

### Training Data

[chtmp223/CLIPPER-WritingPrompts](https://huggingface.co/datasets/chtmp223/CLIPPER-WritingPrompts)

### Training Procedure

| **Configuration**                 | **Value**   |
|-----------------------------------|-------------|
| Hardware (training and inference) | 8x A100     |
| Tracking                          | wandb       |
| batch_size                        | 16          |
| gradient_checkpointing            | True        |
| learning_rate                     | 1.0e-5      |
| lr_scheduler_type                 | cosine      |
| max_length                        | 131072      |
| num_train_epochs                  | 1           |
| optim                             | adamw_torch |

#### Software

Training code is adapted from [ProLong](https://github.com/princeton-nlp/ProLong).
|
| | ## π€ Inference |
| | Inference is done with [vLLM](https://github.com/vllm-project/vllm) on 1 A100-80GB. |
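A minimal sketch of the vLLM setup described above. The repository id, prompt, and sampling parameters below are illustrative assumptions, not values taken from the paper:

```python
# Minimal vLLM inference sketch (assumes a single A100-80GB, as in this card).
from vllm import LLM, SamplingParams

llm = LLM(
    model="chtmp223/ProLong-512k-8B-WritingPrompts",  # assumed repo id; substitute the actual one
    max_model_len=131072,  # matches the max_length used during fine-tuning
)
sampling = SamplingParams(temperature=0.7, max_tokens=1024)

prompt = "Summarize the plot of the following story in three sentences.\n\n<story text>"
outputs = llm.generate([prompt], sampling)
print(outputs[0].outputs[0].text)
```

Setting `max_model_len` explicitly keeps the KV-cache allocation within a single 80GB GPU; for longer inputs, tensor parallelism across more GPUs may be needed.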

## Citation

```bibtex
@misc{pham2025clippercompressionenableslongcontext,
      title={CLIPPER: Compression enables long-context synthetic data generation},
      author={Chau Minh Pham and Yapei Chang and Mohit Iyyer},
      year={2025},
      eprint={2502.14854},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.14854},
}
```