BLOOMZ-1B1 Fine-tuned with Ray Train + DeepSpeed ZeRO-3
This model is a fine-tuned version of bigscience/bloomz-1b1, trained with Ray Train and DeepSpeed ZeRO-3 for scalable distributed training with minimal configuration overhead.
It was fine-tuned on the IMDB dataset using 2 × T4 16GB GPUs, achieving an ~11% reduction in training loss (from ~3.6 to ~3.2) with fully automated distributed orchestration.
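The sketch below shows roughly how such a run is wired together: a Hugging Face Trainer with a DeepSpeed ZeRO-3 config runs inside a Ray Train worker loop, with one worker per GPU. This is a minimal illustration under assumed hyperparameters and an assumed DeepSpeed config, not the exact script used for this model; see the project repository for the real configuration.

```python
# Minimal sketch: fine-tuning bloomz-1b1 with Ray Train + DeepSpeed ZeRO-3.
# Hyperparameters and the DeepSpeed config are illustrative assumptions.
import ray.train
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer
from ray.train.huggingface.transformers import (
    RayTrainReportCallback,
    prepare_trainer,
)


def train_func():
    from datasets import load_dataset
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        Trainer,
        TrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained("bigscience/bloomz-1b1")
    model = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-1b1")

    # Tokenize IMDB reviews for causal-LM fine-tuning.
    dataset = load_dataset("imdb", split="train")

    def tokenize(batch):
        out = tokenizer(batch["text"], truncation=True, max_length=512)
        out["labels"] = out["input_ids"].copy()
        return out

    dataset = dataset.map(
        tokenize, batched=True, remove_columns=dataset.column_names
    )

    # ZeRO-3 shards parameters, gradients, and optimizer state across
    # workers, which is what lets a 1.1B model train on 16 GB T4s.
    ds_config = {
        "zero_optimization": {"stage": 3},
        "fp16": {"enabled": True},
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
    }

    args = TrainingArguments(
        output_dir="out",
        per_device_train_batch_size=1,
        num_train_epochs=1,
        fp16=True,
        deepspeed=ds_config,
        report_to="none",
    )

    trainer = Trainer(model=model, args=args, train_dataset=dataset)
    trainer.add_callback(RayTrainReportCallback())  # report metrics to Ray
    trainer = prepare_trainer(trainer)  # wire the Trainer into Ray Train
    trainer.train()


# One Ray worker per GPU: 2 × T4, as described above.
ray_trainer = TorchTrainer(
    train_func,
    scaling_config=ScalingConfig(num_workers=2, use_gpu=True),
)
result = ray_trainer.fit()
```

Ray Train handles process group setup and GPU placement, so the same script scales to more workers by changing only `num_workers` in the `ScalingConfig`.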
For detailed implementation, Ray configurations, and distributed training setup, please check out the project repository.
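Once fine-tuned and pushed to the Hub, the checkpoint should load through the standard transformers causal-LM interface like any BLOOMZ model. The repo id below is a placeholder, not this model's actual Hub path:

```python
# Usage sketch: "your-username/bloomz-1b1-imdb" is a placeholder repo id;
# substitute this model's actual Hugging Face Hub path.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/bloomz-1b1-imdb"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The movie was", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```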