Let's Talk about AI

#1
by kalashshah19 - opened
Indian AI Developers org
•
edited Aug 22, 2025

Hello, here is an open space for everyone to talk, share, ask, and show anything about AI.

kalashshah19 pinned discussion
Indian AI Developers org

Has anyone pre-trained an LLM from scratch? If yes, please share your experience: things to consider while training, notes, tips, etc.

Indian AI Developers org

Hi, I am also interested in LLMs. I am about to start this research next week; please share any inputs.

Indian AI Developers org

Hi, I am also interested in LLMs. I am about to start this research next week; please share any inputs.

Hey @Shashank2k3, if you want your own LLM, you first need huge amounts of data. You can start by fine-tuning already available, good LLMs like Gemma, Phi, Llama, Mistral, etc. on your dataset. Start with small models in the 4B to 7B parameter range. Pre-training an LLM from scratch requires enormous data, serious resources like heavy-duty GPUs and CPUs, and knowledge of training techniques, NLP, etc. You can always brainstorm with ChatGPT to learn more.
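As a rough illustration of why fine-tuning is so much cheaper than pre-training: parameter-efficient methods like LoRA freeze the pre-trained weights and train only a small low-rank update on top. A toy sketch in plain PyTorch (real workflows would use the `peft` library with a model like Gemma or Phi; all names and sizes here are illustrative):

```python
import torch
import torch.nn as nn

# Toy LoRA-style adapter: freeze a base linear layer and learn a
# low-rank update, so the effective weight is W + B @ A.
class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # frozen pre-trained weights
        # A starts small, B starts at zero, so training begins at the base model.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x):
        return self.base(x) + x @ self.A.T @ self.B.T

layer = LoRALinear(nn.Linear(16, 16))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable} / {total}")  # only the low-rank A and B train
```

Only 128 of the 400 parameters here are trainable; at 7B scale the same idea shrinks the trainable set by orders of magnitude, which is what makes fine-tuning feasible on a single GPU.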

Indian AI Developers org

Hey @kalashshah19, thanks for the input! I already have a solid foundation in these areas from my Bachelor's degree in AIML, and now I'm looking to dive deeper into the world of LLMs.

Indian AI Developers org

Hey @kalashshah19, thanks for the input! I already have a solid foundation in these areas from my Bachelor's degree in AIML, and now I'm looking to dive deeper into the world of LLMs.

Great!

Indian AI Developers org

Yup, so what do you all do? I mean, professionally?

Indian AI Developers org

Yup, so what do you all do? I mean, professionally?

I am an Associate Data Scientist at Casepoint.
What about you?

Indian AI Developers org

https://huggingface.co/Shaligram-Dewangan/Dhi-5B-Base

My senior (a 3rd-year student) trained this model from scratch.

Is there no quantized version available?

How do we run it? There are no config or modelling files, so it's unusable until we get them.

Quantize it yourself, bro, to either 4-bit or 8-bit for smaller weights; it's only 16 GB.
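For intuition on what 8-bit quantization actually does: symmetric per-tensor quantization just maps float weights to int8 with a single scale factor, cutting fp32 storage by 4x. Libraries like bitsandbytes do this (plus much more) under the hood; this is only a self-contained NumPy sketch of the basic idea:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w is approximated by q * scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct an approximate float tensor from int8 values and the scale."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)   # stand-in for a weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs error:", np.abs(w - w_hat).max())  # bounded by half the scale
```

Each element costs 1 byte instead of 4, at the price of a small rounding error per weight; 4-bit schemes push the same trade-off further with per-block scales.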

Indian AI Developers org

The repo does have config files, but the packaging is incomplete right now; it's not an inference-ready HF release.

Indian AI Developers org

The repo does have config files, but the packaging is incomplete right now; it's not an inference-ready HF release.

Actually, the config file has nothing in it; it's empty. At the least, he should have provided a modelling file so the model can be used.

Indian AI Developers org

We need to convert them to safetensors; only then can we run it. And we need to restructure the config.json and tokenizers.

Indian AI Developers org

We need to convert them to safetensors; only then can we run it. And we need to restructure the config.json and tokenizers.

Sorry guys, I think it's pointless to talk; you all don't even have basic knowledge of LLMs.

Indian AI Developers org

We need to convert them to safetensors; only then can we run it. And we need to restructure the config.json and tokenizers.

This is not how things work. Safetensors is just a format for saving tensors; the config is the same for .pth and .safetensors.

Indian AI Developers org
โ€ข
edited 3 minutes ago

We need to convert them to safetensors; only then can we run it. And we need to restructure the config.json and tokenizers.

This is not how things work. Safetensors is just a format for saving tensors; the config is the same for .pth and .safetensors.

Lol 😅

Bro, open the repository and take a look before lecturing others. The repo contains Dhi-5B-Base.pt (a raw PyTorch file), and the config.json is only 111 bytes; you can't define a 5B model's architecture in 111 bytes. Until you convert that .pt file to .safetensors and re-structure (rewrite) the config file by hand, the Hugging Face pipeline won't even load it. Learn basic model loading and deployment first, then teach others 😂

Indian AI Developers org

We need to convert them to safetensors; only then can we run it. And we need to restructure the config.json and tokenizers.

This is not how things work. Safetensors is just a format for saving tensors; the config is the same for .pth and .safetensors.

Bro, open the repository and take a look before lecturing others. The repo contains Dhi-5B-Base.pt (a raw PyTorch file), and the config.json is only 111 bytes; you can't define a 5B model's architecture in 111 bytes. Until you convert that .pt file to .safetensors and re-structure (rewrite) the config file by hand, the Hugging Face pipeline won't even load it. Learn basic model loading and deployment first, then teach others 😂

So how do you think .pt files are loaded without a config file? Even if we don't have a config file, we need a modelling file with all the hyperparameters, like the number of layers and the dimensions. Even if we try to load it using torch, we will need that modelling file. So whether the file is .pt or .safetensors doesn't decide whether we need a config file or not.
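The point above can be shown concretely: a .pt checkpoint is just named tensors, so loading it requires re-creating the architecture from code or a config with matching hyperparameters. A minimal sketch, assuming plain PyTorch (the model class and sizes are made up for illustration):

```python
import torch
import torch.nn as nn

class TinyMLP(nn.Module):
    def __init__(self, dim: int = 8):  # hyperparameters live in code/config, not in the .pt
        super().__init__()
        self.fc = nn.Linear(dim, dim)

model = TinyMLP(dim=8)
torch.save(model.state_dict(), "tiny.pt")

# The checkpoint contains only tensors keyed by parameter name:
state = torch.load("tiny.pt", map_location="cpu")
print(list(state))  # no architecture information, just parameter names

# Loading requires rebuilding the architecture with matching shapes.
model2 = TinyMLP(dim=8)
model2.load_state_dict(state)            # works: shapes match
try:
    TinyMLP(dim=4).load_state_dict(state)  # wrong hyperparameters
except RuntimeError:
    print("shape mismatch: need the right dim to load this checkpoint")
```

This is exactly why a modelling file (or a real config.json) is needed regardless of whether the weights sit in a .pt or a .safetensors file.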
