Tips on loading model with low memory

by ryanramos - opened Jan 27, 2023

Jan 27, 2023

Was just wondering if anyone's been able to load this model in something akin to a free Colab runtime, e.g. ~12GB RAM and a Tesla T4? I've tried the code snippet for loading the model in 8-bit precision (so I've got device_map set to "auto") and had no luck. Luckily for me the model is already sharded (I normally can't load an 11B T5 without sharding), but I'm guessing I still can't handle the current shard size.
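For a sense of scale, here is a rough back-of-the-envelope sketch (not from the original thread) of how much memory the raw weights of an 11B-parameter model take at different precisions. It assumes exactly 11e9 parameters and ignores activations, the CUDA context, framework overhead and any temporary copies made while loading, so real usage will be higher.

```python
# Rough estimate of the memory needed just to hold model weights.
# Assumption: 11e9 parameters (the "11B T5" mentioned above). Activations,
# CUDA context, framework overhead and temporary copies made during
# loading are all ignored, so actual usage will be higher.

BYTES_PER_PARAM = {
    "fp32": 4,
    "fp16": 2,
    "int8": 1,
}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the size of the raw weights in GiB for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    n_params = 11e9
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gib(n_params, dtype):.1f} GiB")
```

Even at int8 the weights alone come to roughly 10 GiB, which leaves very little headroom in ~12GB of RAM, so a load that also needs temporary buffers may still run out of memory.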

Muennighoff

BigScience Workshop org Feb 7, 2023

If it's just inference, something like https://huggingface.co/bigscience/bloomz/discussions/28 may work!

ryanramos

Feb 8, 2023

Thanks! I actually completely forgot about Petals. Might even use this for a different research project; thanks again!

Muennighoff

BigScience Workshop org Feb 8, 2023

👍 cc @borzunov

christopher changed discussion status to closed Jul 3, 2024
