---
base_model:
- princeton-nlp/Llama-3-8B-ProLong-512k-Instruct
license: apache-2.0
language:
- en
datasets:
- chtmp223/CLIPPER-WritingPrompts
---

# ProLong-512k-8B-WritingPrompts

ProLong-512k-8B-WritingPrompts is a fine-tuned version of [princeton-nlp/Llama-3-8B-ProLong-512k-Instruct](https://huggingface.co/princeton-nlp/Llama-3-8B-ProLong-512k-Instruct), trained with supervised fine-tuning on the [chtmp223/CLIPPER-WritingPrompts](https://huggingface.co/datasets/chtmp223/CLIPPER-WritingPrompts) dataset.
Please see [our paper](https://arxiv.org/abs/2502.14854) for more details on the method.

## Model Details

### Model Description

- **Language(s) (NLP):** English
- **License:** Apache-2.0
- **Finetuned from model:** [princeton-nlp/Llama-3-8B-ProLong-512k-Instruct](https://huggingface.co/princeton-nlp/Llama-3-8B-ProLong-512k-Instruct)

### Model Sources

- **Repository:** [GitHub repository](https://github.com/chtmp223/CLIPPER)
- **Paper:** [https://arxiv.org/abs/2502.14854](https://arxiv.org/abs/2502.14854)

## Training Details

### Training Data

[chtmp223/CLIPPER-WritingPrompts](https://huggingface.co/datasets/chtmp223/CLIPPER-WritingPrompts)

### Training Procedure

| **Configuration**                 | **Value**   |
|-----------------------------------|-------------|
| Hardware (training and inference) | 8x A100     |
| Tracking                          | wandb       |
| batch_size                        | 16          |
| gradient_checkpointing            | True        |
| learning_rate                     | 1.0e-5      |
| lr_scheduler_type                 | cosine      |
| max_length                        | 131072      |
| num_train_epochs                  | 1           |
| optim                             | adamw_torch |

#### Software

Training code is adapted from [ProLong](https://github.com/princeton-nlp/ProLong).
|
| | ## π€ Inference |
| | Inference is done with [vLLM](https://github.com/vllm-project/vllm) on 1 A100-80GB. |
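A minimal sketch of the vLLM setup described above. The repository id, prompt, and sampling parameters below are illustrative assumptions, not values taken from the paper:

```python
# Minimal vLLM inference sketch (assumes a single A100-80GB, as in this card).
from vllm import LLM, SamplingParams

llm = LLM(
    model="chtmp223/ProLong-512k-8B-WritingPrompts",  # assumed repo id; substitute the actual one
    max_model_len=131072,  # matches the max_length used during fine-tuning
)
sampling = SamplingParams(temperature=0.7, max_tokens=1024)

prompt = "Summarize the plot of the following story in three sentences.\n\n<story text>"
outputs = llm.generate([prompt], sampling)
print(outputs[0].outputs[0].text)
```

Setting `max_model_len` explicitly keeps the KV-cache allocation within a single 80GB GPU; for longer inputs, tensor parallelism across more GPUs may be needed.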

## Citation

```bibtex
@misc{pham2025clippercompressionenableslongcontext,
      title={CLIPPER: Compression enables long-context synthetic data generation},
      author={Chau Minh Pham and Yapei Chang and Mohit Iyyer},
      year={2025},
      eprint={2502.14854},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.14854},
}
```