Please share the fine-tuning code
Hi, I came across your model card and it's very relevant to my current research. Could you please share the training code?
Nice to hear that! I'm currently refining a paper built on these adapters. I hope to have it out soon, and I'll share the code on GitHub then!
Sure. I'm unable to load the base 4B model with transformers 5.x and torch 2.10; the build of mamba-ssm fails.
Could you please share the package versions for torch, transformers, and mamba-ssm that you used for fine-tuning?
Unfortunately, you cannot use a 30B adapter with a 4B base model. Adapters are parameter-specific: the weight matrices in the 30B adapter were trained for a model with a much larger hidden dimension and layer count, so they are physically incompatible with the 4B architecture.
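To make the incompatibility concrete, here is a minimal pure-Python sketch (the hidden sizes are hypothetical, just for illustration) of why a LoRA-style adapter trained against a larger base model cannot attach to a smaller one:

```python
# Hypothetical hidden sizes for the two base models (illustrative only)
hidden_30b = 5120   # hidden dimension the adapter was trained against
hidden_4b = 2560    # hidden dimension of the 4B base model
rank = 16           # LoRA rank

# A LoRA adapter adds B @ A to a base weight W of shape (hidden, hidden).
adapter_A_shape = (rank, hidden_30b)       # trained for the 30B hidden dim
base_weight_4b_shape = (hidden_4b, hidden_4b)

# The delta B @ A would have shape (hidden_30b, hidden_30b), which cannot
# be added to a (hidden_4b, hidden_4b) matrix, so loading fails with a
# size-mismatch error.
compatible = adapter_A_shape[1] == base_weight_4b_shape[1]
print(compatible)  # False
```

In practice, libraries like PEFT raise a size-mismatch error at load time for exactly this reason.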
The build failure is expected with those versions. I recommend these packages:
PyTorch: 2.12.0.dev20260407 (or at least 2.8+)
Transformers: 5.5.0
mamba-ssm: 2.3.1
causal-conv1d: 1.6.1
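The versions above could be pinned with something like the following (package names are the standard PyPI ones; this is a sketch, not a verified lockfile):

```shell
# ninja speeds up / enables building the mamba-ssm CUDA kernels
pip install ninja

# pin the versions listed above
pip install "transformers==5.5.0" "causal-conv1d==1.6.1" "mamba-ssm==2.3.1"

# for a nightly torch build, use the install command from pytorch.org that
# matches your CUDA version; any stable torch >= 2.8 should also work:
pip install "torch>=2.8"
```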
Build Tips:
Make sure you have ninja installed (pip install ninja) before building mamba-ssm.
Ensure your nvcc version matches your PyTorch CUDA version.
torch 2.10 is likely too old for the latest mamba-ssm kernels; upgrading to a newer torch release should resolve the build failure.
My bad, I was trying to fine-tune the 4B model when I found your model card. My question was about the 4B model, not the 30B one.
I'll try the versions you shared. Thanks!
No worries! I don't see any fine-tuned adapters/models for the 4B base model. Let me know if such an adapter would significantly help your research, and I'll train a similar one for the 4B model and open-source it.