Let's Talk about AI
Hello, here is an open space for everyone to talk, share, ask and show anything about AI.
Has anyone pre-trained an LLM from scratch? If yes, please share your experience: things to consider while training, notes, tips, etc.
Hi, I am also interested in LLMs. I am about to start this research next week, so please share any inputs.
Hey @Shashank2k3, if you want your own LLM, you first need a huge amount of data. You can start by fine-tuning already available good LLMs like Gemma, Phi, LLaMA, Mistral, etc. with your dataset. Start with small models in the 4B to 7B parameter range. Pre-training an LLM from scratch needs enormous data, serious resources like heavy-duty GPUs and CPUs, and knowledge of training techniques, NLP, etc. You can always brainstorm with ChatGPT to learn more.
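The fine-tuning path described above is usually done with a parameter-efficient method such as LoRA, where the pretrained weight stays frozen and only a small low-rank update is trained. A minimal pure-Python sketch of the LoRA math (tiny hypothetical dimensions, no real model or library involved):

```python
# Minimal LoRA sketch: y = x @ (W + (alpha / r) * A @ B)
# W is the frozen pretrained weight; only A and B would be trained.

def matmul(X, Y):
    # Naive matrix multiply for small lists-of-lists.
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_forward(x, W, A, B, alpha=1.0):
    r = len(A[0])                      # LoRA rank
    base = matmul(x, W)                # frozen pretrained path
    update = matmul(matmul(x, A), B)   # low-rank trainable path
    scale = alpha / r
    return [[b + scale * u for b, u in zip(rb, ru)]
            for rb, ru in zip(base, update)]

# Tiny example: a 4-dim layer with a rank-2 adapter.
W = [[0.1 * (i + j) for j in range(4)] for i in range(4)]  # frozen
A = [[0.5] * 2 for _ in range(4)]                          # trainable
B = [[0.0] * 4 for _ in range(2)]                          # init to zero
x = [[1.0, 2.0, 3.0, 4.0]]

# With B initialised to zeros, the adapted output equals the base
# output, so fine-tuning starts exactly at the pretrained behaviour.
print(lora_forward(x, W, A, B) == matmul(x, W))  # True
```

In a real run you would use a library like PEFT on top of Transformers instead of hand-rolled matrices; this only shows why so few parameters need training.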
Hey @kalashshah19, thanks for the input! I already have a solid foundation in these areas from my Bachelor's degree in AIML, and now I'm looking to dive deeper into the world of LLMs.
Great!
Yup, so what do you guys do? I mean, what's your profession?
I am an Associate Data Scientist at Casepoint.
What about you?
https://huggingface.co/Shaligram-Dewangan/Dhi-5B-Base
My senior (3rd year) trained this model from scratch.
There's no quantized version available?
How do you run it? There are no config or modelling files, so it's useless until we get them.
Quantise it ourselves, bruh, either to 4-bit or 8-bit for smaller weights; it's only 16 GB.
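The 4-bit / 8-bit quantization being suggested is, at its core, mapping float weights to small integers plus a scale factor. A minimal pure-Python sketch of symmetric 8-bit quantization (illustrative only; quantizing a real checkpoint would use tooling like bitsandbytes or llama.cpp's GGUF converters):

```python
# Symmetric 8-bit quantization: w ~= q * scale, with q an int in [-127, 127].

def quantize_8bit(weights):
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

weights = [0.42, -1.27, 0.0, 0.91, -0.05]
q, scale = quantize_8bit(weights)

# Each weight now fits in one byte instead of 2-4 bytes (fp16/fp32),
# which is why quantized checkpoints are so much smaller on disk.
restored = dequantize(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(max_err <= scale / 2 + 1e-9)  # True: error within half a step
```

Real 4-bit schemes (e.g. NF4) are more sophisticated, with per-block scales, but the size/accuracy trade-off works the same way.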
The repo does have config files, but the packaging is incomplete right now; it's not an inference-ready HF release.
Actually, the config file has nothing, it's empty. At the very least he should have provided a modelling file so we can use the model.
We need to convert them to safetensors; only then can we run it, and we'll have to restructure the config.json and the tokenizers.
Sorry guys, I think it's pointless to talk to you; you don't even have basic knowledge of LLMs.
This is not how things work. Safetensors is just a format for saving tensors; the config is the same for .pth and .safetensors.
Lol
Bro, open the repository and actually look before lecturing. The repo has Dhi-5B-Base.pt (a raw PyTorch file), and the config.json is only 111 bytes. You can't define a 5B model's architecture in 111 bytes. Until you convert that .pt file to .safetensors and manually restructure (rewrite) the config file, the Hugging Face pipeline won't even load it. Learn basic model loading and deployment first, then teach others.
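For context on what "restructure the config" means: a loadable HF checkpoint needs a config.json that actually describes the architecture. A hypothetical sketch of the kind of fields a LLaMA-style config carries (the values below are illustrative; the real ones for Dhi-5B would have to come from its training code, not guesswork):

```json
{
  "architectures": ["LlamaForCausalLM"],
  "model_type": "llama",
  "hidden_size": 4096,
  "intermediate_size": 11008,
  "num_hidden_layers": 32,
  "num_attention_heads": 32,
  "vocab_size": 32000,
  "max_position_embeddings": 4096,
  "torch_dtype": "float16"
}
```

A 111-byte config.json clearly cannot hold this, which is why `AutoModel.from_pretrained` has nothing to build the model from.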
So how do you think .pt files get loaded without a config file? Even if we don't have a config file, we need a modelling file with all the hyperparameters, like the number of layers and the dimensions. Even if we try to load it with torch directly, we still need that modelling file. So .pt vs .safetensors doesn't decide whether we need a config file or not.
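Both sides here are partly right: a .pt checkpoint is essentially a pickled dict of named tensors, so it carries the weights but not the code or hyperparameters needed to rebuild the module. A small pure-Python sketch of why, using plain pickle to stand in for torch.save/torch.load:

```python
import pickle

# A checkpoint is just named weight arrays, like a torch state_dict.
state_dict = {
    "embed.weight": [[0.1, 0.2], [0.3, 0.4]],
    "lm_head.weight": [[0.5, 0.6], [0.7, 0.8]],
}
blob = pickle.dumps(state_dict)   # roughly what torch.save does
loaded = pickle.loads(blob)       # roughly what torch.load gives back

print(sorted(loaded))  # only parameter names and raw values come back
# No layer count, no hidden size, no attention config: to turn these
# weights back into a runnable model you still need the modelling code
# (or a config.json describing the architecture). That is true whether
# the weights sit in a .pt file or a .safetensors file.
```

So converting to safetensors fixes the storage format (and removes the pickle security risk), but the missing config/modelling files are a separate problem that conversion alone cannot solve.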