Please share the fine-tuning code
Hi, I came across your model card and it's very relevant to my current research. Could you please share the training code?
Nice to hear that! I'm currently refining a paper built on these adapters. I hope to have it out soon, and I'll share the code on GitHub then!
Sure. I'm unable to load the base 4B model with transformers 5.x and torch 2.10; the build of mamba-ssm fails.
Could you please share the package versions for torch, transformers, and mamba-ssm that you used for fine-tuning?
Unfortunately, you cannot use a 30B adapter with a 4B base model. Adapters are parameter-specific: the weight matrices in the 30B adapter were trained for a model with a much larger hidden dimension and layer count, so they are physically incompatible with the 4B architecture.
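To make the incompatibility concrete, here is a minimal pure-Python sketch (the hidden sizes are hypothetical, just for illustration) of why a LoRA-style adapter trained against a larger base model cannot attach to a smaller one:

```python
# Hypothetical hidden sizes for the two base models (illustrative only)
hidden_30b = 5120   # hidden dimension the adapter was trained against
hidden_4b = 2560    # hidden dimension of the 4B base model
rank = 16           # LoRA rank

# A LoRA adapter adds B @ A to a base weight W of shape (hidden, hidden).
adapter_A_shape = (rank, hidden_30b)       # trained for the 30B hidden dim
base_weight_4b_shape = (hidden_4b, hidden_4b)

# The delta B @ A would have shape (hidden_30b, hidden_30b), which cannot
# be added to a (hidden_4b, hidden_4b) matrix, so loading fails with a
# size-mismatch error.
compatible = adapter_A_shape[1] == base_weight_4b_shape[1]
print(compatible)  # False
```

In practice, libraries like PEFT raise a size-mismatch error at load time for exactly this reason.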
The build failure is expected with those versions. I recommend these packages:
PyTorch: 2.12.0.dev20260407 (or at least 2.8+)
Transformers: 5.5.0
mamba-ssm: 2.3.1
causal-conv1d: 1.6.1
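The versions above could be pinned with something like the following (package names are the standard PyPI ones; this is a sketch, not a verified lockfile):

```shell
# ninja speeds up / enables building the mamba-ssm CUDA kernels
pip install ninja

# pin the versions listed above
pip install "transformers==5.5.0" "causal-conv1d==1.6.1" "mamba-ssm==2.3.1"

# for a nightly torch build, use the install command from pytorch.org that
# matches your CUDA version; any stable torch >= 2.8 should also work:
pip install "torch>=2.8"
```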
Build Tips:
Make sure you have ninja installed (pip install ninja) before building mamba-ssm.
Ensure your nvcc version matches your PyTorch CUDA version.
torch 2.10 is likely too old for the latest mamba-ssm kernels; upgrading to a newer torch release should resolve the build failure.
My bad, I was trying to fine-tune the 4B model when I found your model card. My question was about the 4B model, not the 30B one.
I'll try the versions you shared. Thanks!
No worries! I don't see any fine-tuned adapters/models for the 4B base model. Let me know if such an adapter would significantly help your research, and I'll train a similar one for the 4B model and open-source it.