---
license: apache-2.0
base_model: EleutherAI/pythia-1.4b
tags:
- generated_from_trainer
- sft
- ultrafeedback
datasets:
- activeDap/sft-hh-data
language:
- en
library_name: transformers
---

# pythia-1.4b Fine-tuned on sft-hh-data

This model is a fine-tuned version of [EleutherAI/pythia-1.4b](https://huggingface.co/EleutherAI/pythia-1.4b) on the [activeDap/sft-hh-data](https://huggingface.co/datasets/activeDap/sft-hh-data) dataset.

## Training Results
| |  |
| |
|

### Training Statistics

| Metric | Value |
|--------|-------|
| Total Steps | 20 |
| Final Training Loss | 34.0845 |
| Min Training Loss | 34.0845 |
| Training Runtime | 6.19 seconds |
| Samples/Second | 197.71 |

## Training Configuration

| Parameter | Value |
|-----------|-------|
| Base Model | EleutherAI/pythia-1.4b |
| Dataset | activeDap/sft-hh-data |
| Number of Epochs | 1.0 |
| Per Device Batch Size | 16 |
| Gradient Accumulation Steps | 1 |
| Total Batch Size | 64 (16 per device × 4 GPUs) |
| Learning Rate | 2e-05 |
| LR Scheduler | cosine |
| Warmup Ratio | 0.1 |
| Max Sequence Length | 512 |
| Optimizer | adamw_torch_fused |
| Mixed Precision | BF16 |

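
The training script itself is not included in this repository; as a rough illustration, the hyperparameters above correspond to a TRL `SFTConfig` along these lines (the output directory is a placeholder, and the sequence-length argument is named `max_length` in newer TRL releases):

```python
from trl import SFTConfig

# Illustrative reconstruction of the configuration listed above.
training_args = SFTConfig(
    output_dir="pythia-1.4b_sft-hh-data",  # placeholder
    num_train_epochs=1.0,
    per_device_train_batch_size=16,        # 16 x 4 GPUs x 1 accumulation step = 64
    gradient_accumulation_steps=1,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_seq_length=512,
    optim="adamw_torch_fused",
    bf16=True,
)
```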

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "activeDap/pythia-1.4b_sft-hh-data"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Format input with prompt template
prompt = "What is machine learning?\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate response
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
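
For more varied outputs, sampling can be enabled when calling `generate`; the values below are illustrative rather than tuned for this checkpoint:

```python
# Sampling-based decoding (illustrative hyperparameters)
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```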

## Training Framework

- **Library:** Transformers + TRL
- **Training Type:** Supervised Fine-Tuning (SFT)
- **Format:** Prompt-completion with assistant-only loss (see the sketch below)

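
Assistant-only loss means the cross-entropy loss is computed only on the completion tokens, while the prompt tokens are masked with the ignore index. The sketch below illustrates the idea; the `build_example` helper is hypothetical (TRL's `SFTTrainer` applies equivalent masking internally for prompt-completion data), and the exact prompt template is assumed:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1.4b")

def build_example(prompt: str, completion: str, max_len: int = 512):
    """Tokenize a prompt-completion pair, masking the prompt in the labels."""
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    completion_ids = tokenizer(completion, add_special_tokens=False)["input_ids"]

    input_ids = (prompt_ids + completion_ids)[:max_len]
    # -100 is the ignore index for PyTorch cross-entropy, so prompt positions
    # contribute no loss; only the assistant completion is learned.
    labels = ([-100] * len(prompt_ids) + completion_ids)[:max_len]
    return {"input_ids": input_ids, "labels": labels}

example = build_example(
    "What is machine learning?\nAssistant:",
    " Machine learning is the study of algorithms that improve from data.",
)
```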

## Citation

If you use this model, please cite the original base model and dataset:

```bibtex
@misc{ultrafeedback2023,
  title={UltraFeedback: Boosting Language Models with High-quality Feedback},
  author={Ganqu Cui and Lifan Yuan and Ning Ding and others},
  year={2023},
  eprint={2310.01377},
  archivePrefix={arXiv}
}
```
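
For the base model, the Pythia suite paper is the standard reference:

```bibtex
@misc{biderman2023pythia,
  title={Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling},
  author={Stella Biderman and Hailey Schoelkopf and Quentin Anthony and others},
  year={2023},
  eprint={2304.01373},
  archivePrefix={arXiv}
}
```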