Float32 training
Hello!
First of all, thank you for your work; I find it very interesting and useful!
While reading through the documentation, I noticed that you trained the model in float32:
"We use DeepSpeed with full float32 training."
I find this surprising, as most models are trained with bfloat16. I was curious about why you made this decision. Was the model performance better with float32 than bfloat16?
Hello!
Thank you for your interest in our model.
At the beginning of the project, we observed some training instability due to DeepSpeed and fp16. Since fp32 didn't cause any memory bottlenecks in our configuration, we decided to train the model in fp32.
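For readers wondering why fp16 can be unstable at all: the reply does not say what caused the instability in this project, so the following is only a generic illustration of one common culprit, the narrow numeric range of IEEE 754 half precision. It is a minimal sketch using Python's standard `struct` module (format code `e`); the helper names `to_fp16` and `fits_fp16` and the example values are hypothetical.

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round-trip a Python float through half precision (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fits_fp16(x: float) -> bool:
    """Return True if x can be packed as a finite half-precision value."""
    try:
        to_fp16(x)
        return True
    except (OverflowError, struct.error):
        # struct refuses to pack values beyond the fp16 range
        return False


# Illustrative (made-up) magnitudes, not measurements from this model.
small_grad = 1e-8      # a tiny gradient value
large_act = 1e5        # a large activation value

underflowed = to_fp16(small_grad)        # too small for fp16: rounds to zero
overflows = not fits_fp16(large_act)     # too large for fp16

# Loss scaling: multiply before casting, divide after, so that small
# gradients survive the trip through fp16.
scale = 2.0 ** 16
recovered = to_fp16(small_grad * scale) / scale

print("underflowed:", underflowed)
print("overflows:", overflows)
print("recovered with scaling:", recovered)
```

Full fp32 training sidesteps both failure modes at the cost of more memory per value, which matches the trade-off described in the reply above.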
We continued working on the problems, and they are solved now. Future models will be trained using fp16 instead.
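As a rough sketch of what switching precision looks like in practice, the snippet below builds three DeepSpeed-style configuration dictionaries. The `fp16` and `bf16` sections and their keys follow DeepSpeed's documented JSON config format, but the concrete values (batch size, loss-scaling settings) are assumptions for illustration and are not the configuration used for this model. The helper `precision_of` is hypothetical.

```python
import json

# Hypothetical shared setting, for illustration only.
COMMON = {"train_micro_batch_size_per_gpu": 1}

# Full fp32 training: both reduced-precision modes disabled.
fp32_config = {
    **COMMON,
    "fp16": {"enabled": False},
    "bf16": {"enabled": False},
}

# fp16 mixed precision with dynamic loss scaling ("loss_scale": 0 means dynamic).
fp16_config = {
    **COMMON,
    "fp16": {
        "enabled": True,
        "loss_scale": 0,
        "initial_scale_power": 16,   # initial scale is 2**16
        "loss_scale_window": 1000,
        "hysteresis": 2,
        "min_loss_scale": 1,
    },
}

# bf16 mixed precision: same exponent range as fp32, so no loss scaling section.
bf16_config = {
    **COMMON,
    "bf16": {"enabled": True},
}


def precision_of(config: dict) -> str:
    """Report which training precision a config dictionary selects."""
    if config.get("fp16", {}).get("enabled"):
        return "fp16"
    if config.get("bf16", {}).get("enabled"):
        return "bf16"
    return "fp32"


for name, cfg in [("fp32", fp32_config), ("fp16", fp16_config), ("bf16", bf16_config)]:
    print(name, "->", json.dumps(cfg))
```

Any of these dictionaries could be written to a JSON file and passed to a DeepSpeed launcher; which options are appropriate depends on the hardware and on the rest of the training setup.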