# BLOOMZ-1B1 Fine-tuned with Ray Train + DeepSpeed ZeRO-3

This model is a fine-tuned version of bigscience/bloomz-1b1, trained with Ray Train and DeepSpeed ZeRO-3 for scalable distributed training with minimal configuration overhead.

It was fine-tuned on the IMDB dataset using 2 × NVIDIA T4 (16 GB) GPUs, reducing training loss by roughly 11% (from ~3.6 to ~3.2), with Ray handling the distributed orchestration automatically.
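
The setup described above can be sketched as follows: Ray Train's `TorchTrainer` launches one worker per GPU, and each worker wraps the model in a DeepSpeed ZeRO-3 engine that shards parameters, gradients, and optimizer state. This is a minimal illustrative sketch, not the exact training script; the `train_func` internals, batch size, learning rate, and other DeepSpeed config values are assumptions, and the full implementation lives in the project repository.

```python
import deepspeed
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer
from transformers import AutoModelForCausalLM


def train_func():
    # Each Ray worker loads the base model; DeepSpeed then shards parameters,
    # gradients, and optimizer state across workers (ZeRO stage 3).
    model = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-1b1")
    ds_config = {
        "train_micro_batch_size_per_gpu": 1,       # assumed value
        "fp16": {"enabled": True},                 # matches the F16 tensor type
        "zero_optimization": {"stage": 3},         # ZeRO-3 sharding
        "optimizer": {"type": "AdamW", "params": {"lr": 2e-5}},  # assumed lr
    }
    engine, optimizer, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config=ds_config,
    )
    # Tokenized IMDB batches would be fed to the engine here, e.g.:
    #   loss = engine(**batch).loss; engine.backward(loss); engine.step()


trainer = TorchTrainer(
    train_func,
    scaling_config=ScalingConfig(num_workers=2, use_gpu=True),  # 2 x T4 GPUs
)
result = trainer.fit()
```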

For detailed implementation, Ray configurations, and distributed training setup, please check out the project repository.
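
Once published, the checkpoint can be loaded like any other causal LM on the Hub. A minimal usage sketch; the repo id below is hypothetical, so substitute the actual model id:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/bloomz-1b1-imdb"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

inputs = tokenizer("This movie was absolutely", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```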

Model format: Safetensors · Model size: 1B params · Tensor type: F16