VishalPreetham/VISDOM

@@ -1,39 +1,31 @@
 model_name: VISDOM-32M
 license: mit
 language:
-- en
-  library_name: pytorch
-  license_name: MIT
-  tags:
-- causal-lm
-- gpt
-- pytorch
-- custom-code
-- sentencepiece
-- reinforcement-learning
-  pipeline_tag: text-generation
----
-# VISDOM-23M
-VISDOM-23M is a small decoder only GPT language model trained from scratch using pure PyTorch. The project also supports optional post training with supervised fine tuning, reward model training, and PPO reinforcement learning.
-This model is part of the VISDOM-23M project. It is intended for learning, experimentation, and small scale local inference, not production deployment.
-## Model Details
-Model type: decoder only causal language model
 Architecture: custom GPT Transformer implemented in PyTorch
-Parameter count: approximately 23M to 32M depending on tokenizer vocabulary and model configuration
 Context length: 256 tokens
@@ -41,53 +33,53 @@ Tokenizer: SentencePiece BPE trained within this project
 Training framework: pure PyTorch
-Intended use: text generation, instruction following experiments, and alignment experiments on a small local model
-## Training Summary
-The base model is trained from scratch on a local text corpus using next token prediction.
-Optional post training stages in this project include:
-1. Supervised fine tuning on prompt and response pairs
-2. Reward model training on chosen and rejected preference pairs
-3. PPO reinforcement learning using a frozen reference model and learned reward model
 If you are publishing a specific checkpoint, update this section to match what you uploaded.
-Base checkpoint: `checkpoints/best.pt`
-SFT checkpoint: `checkpoints/sft/best.pt`
-RL checkpoint: `checkpoints/rl/best.pt`
 Recommended note to keep or edit:
-`This Hugging Face repo currently contains a custom code checkpoint from the VISDOM-23M project. It is not a standard Transformers checkpoint unless explicitly converted.`
-## Training Data
-The model is trained on user provided local text data and optional post training datasets prepared inside the repo.
 Potential data sources used in this project may include:
-1. Local raw text corpora for base pretraining
-2. Instruction tuning prompt and response pairs for SFT
-3. Preference datasets with chosen and rejected responses for reward model training
 Before publishing, replace this section with the exact datasets you used, including corpus names, collection dates, filtering steps, cleaning steps, approximate size, licensing details, and redistribution constraints.
-## Intended Uses
-This model is intended for educational use, small scale experimentation, custom training pipeline testing, and studying the effects of SFT, reward modeling, and reinforcement learning on a compact model.
-This model is not intended for high stakes decision making, medical advice, legal advice, financial advice, safety critical systems, or production assistant behavior.
-## Limitations
 Small models of this size are much weaker than modern large language models.
-Output quality depends heavily on the training corpus and post training data.
 The model may hallucinate, repeat itself, or produce brittle responses.
@@ -95,25 +87,22 @@ Alignment behavior is limited by dataset size, reward model quality, and the lig
 Because this is a custom architecture package, downstream users may need this repo code to load and run the checkpoint.
-## Bias, Risks, and Safety
 This model can reflect biases, errors, and undesirable patterns present in its training data. It may generate incorrect, harmful, or misleading text, especially when prompted about sensitive topics.
 Use caution when sharing generations publicly or using this model in any workflow that could affect people materially.
-## How to Use
-This checkpoint is typically loaded with the VISDOM-23M project code rather than directly through `transformers`.
 Example local inference command:
-```bash
 python generate.py --checkpoint checkpoints/rl/best.pt --prompt "Explain entropy simply."
-```
 If this model repo includes the project files, a typical Python loading flow looks like this:
-```python
 import torch
 from src.model import GPTLanguageModel, config_from_dict
@@ -126,39 +115,49 @@ tokenizer = VisdomTokenizer("data/processed/visdom_tokenizer.model")
 model = GPTLanguageModel(config_from_dict(cfg))
 model.load_state_dict(checkpoint["model_state_dict"])
 model.eval()
-```
-## Repository Contents
-To make this Hugging Face repo usable by others, include the model checkpoint file, tokenizer model file, `meta.json`, config file, model code, tokenizer code, generation script or demo script, and this model card.
-## Evaluation
-This project currently focuses more on end to end training and experimentation than benchmark reporting.
 If you have evaluation results, add them here.
 Suggested items to report:
-1. Validation loss after base training
-2. Validation loss after SFT
-3. Reward model validation accuracy
-4. Sample generations
-5. Qualitative before and after comparisons
-## Citation
 If you publish this model, you can cite the project like this:
-```bibtex
-@misc{visdom23m,
-  title        = {VISDOM-23M: Train Your Own LLM From Scratch on an NVIDIA RTX GPU},
   author       = {YOUR_NAME_HERE},
   year         = {2026},
-  howpublished = {\url{https://huggingface.co/YOUR_USERNAME/VISDOM-23M}}
-}
-```
-## Maintainer Notes
-Before uploading to Hugging Face, update the model name, author name, Hugging Face username or organization, exact checkpoint type, exact datasets used, license, and evaluation numbers.

 model_name: VISDOM-32M
 license: mit
 language:
+en
+library_name: pytorch
+license_name: mit
+tags:
+causal-lm
+gpt
+pytorch
+custom-code
+sentencepiece
+reinforcement-learning
+pipeline_tag: text-generation
+VISDOM-32M
+VISDOM-32M is a small decoder-only GPT language model trained from scratch using pure PyTorch. The project also supports optional post-training with supervised fine-tuning, reward model training, and PPO reinforcement learning.
+This model is part of the VISDOM-32M project. It is intended for learning, experimentation, and small-scale local inference, not production deployment.
+Model Details
+Model type: decoder-only causal language model
 Architecture: custom GPT Transformer implemented in PyTorch
+Parameter count: approximately 32M, depending on tokenizer vocabulary and model configuration
 Context length: 256 tokens
 Training framework: pure PyTorch
+Intended use: text generation, instruction-following experiments, and alignment experiments on a small local model
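
The parameter count varies with tokenizer vocabulary and model configuration because the token embedding table scales with vocabulary size, while each Transformer block scales with the square of the embedding width. The sketch below estimates the total for a GPT-2-style layout. The function name and the example sizes (`vocab_size=16000`, `n_embd=384`, `n_layer=8`) are hypothetical and are not taken from the VISDOM configuration; only the 256-token context length comes from this card.

```python
def estimate_gpt_params(vocab_size, n_embd, n_layer, block_size, tie_weights=True):
    """Rough parameter count for a GPT-2-style decoder with biases and LayerNorms.

    Assumes a 4x MLP expansion and learned positional embeddings. The real
    VISDOM layout may differ, so treat the result as an estimate only.
    """
    tok_emb = vocab_size * n_embd   # token embedding table
    pos_emb = block_size * n_embd   # learned positional embeddings
    # Per block: QKV (3d^2 + 3d), attention output (d^2 + d),
    # MLP up (4d^2 + 4d), MLP down (4d^2 + d), two LayerNorms (4d).
    per_block = 12 * n_embd ** 2 + 13 * n_embd
    final_ln = 2 * n_embd
    # An untied output head adds a second vocab-sized matrix.
    lm_head = 0 if tie_weights else vocab_size * n_embd
    return tok_emb + pos_emb + n_layer * per_block + final_ln + lm_head


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to show how the vocabulary moves the total.
    for vocab in (16000, 32000):
        total = estimate_gpt_params(vocab, n_embd=384, n_layer=8, block_size=256)
        print(f"vocab={vocab}: {total / 1e6:.1f}M parameters")
```

Under these assumptions, doubling the vocabulary adds `vocab_size * n_embd` parameters when the output head shares weights with the embedding table, and twice that when it does not.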
+Training Summary
+The base model is trained from scratch on a local text corpus using next-token prediction.
+Optional post-training stages in this project include:
+Supervised fine-tuning on prompt and response pairs
+Reward model training on chosen and rejected preference pairs
+PPO reinforcement learning using a frozen reference model and learned reward model
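
The reward model and PPO stages above can be illustrated with two small formulas. The sketch below uses a common formulation (a Bradley-Terry pairwise loss and a KL-penalized reward); it is an assumption about how such stages are usually set up, not a copy of this project's code. The function names and the `kl_coef` default are hypothetical.

```python
import math


def pairwise_reward_loss(chosen_score, rejected_score):
    """Pairwise preference loss: -log(sigmoid(chosen - rejected)).

    The loss is small when the reward model scores the chosen response
    well above the rejected one.
    """
    margin = chosen_score - rejected_score
    # Numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def kl_shaped_reward(reward, logp_policy, logp_reference, kl_coef=0.1):
    """Reward-model score minus a penalty for drifting from the frozen reference.

    logp_policy and logp_reference are log-probabilities of the same sampled
    response under the trained policy and the frozen reference model.
    """
    return reward - kl_coef * (logp_policy - logp_reference)


if __name__ == "__main__":
    print(pairwise_reward_loss(2.0, 0.5))      # chosen ranked higher: lower loss
    print(pairwise_reward_loss(0.5, 2.0))      # chosen ranked lower: higher loss
    print(kl_shaped_reward(1.0, -12.0, -15.0))  # policy drifted, reward is reduced
```

The frozen reference model matters because, without the penalty term, the policy can maximize the learned reward by producing text that the base model would consider very unlikely.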
 If you are publishing a specific checkpoint, update this section to match what you uploaded.
+Base checkpoint: checkpoints/best.pt
+SFT checkpoint: checkpoints/sft/best.pt
+RL checkpoint: checkpoints/rl/best.pt
 Recommended note to keep or edit:
+This Hugging Face repo currently contains a custom code checkpoint from the VISDOM-32M project. It is not a standard Transformers checkpoint unless explicitly converted.
+Training Data
+The model is trained on user-provided local text data and optional post-training datasets prepared inside the repo.
 Potential data sources used in this project may include:
+Local raw text corpora for base pretraining
+Instruction-tuning prompt and response pairs for SFT
+Preference datasets with chosen and rejected responses for reward model training
 Before publishing, replace this section with the exact datasets you used, including corpus names, collection dates, filtering steps, cleaning steps, approximate size, licensing details, and redistribution constraints.
+Intended Uses
+This model is intended for educational use, small-scale experimentation, custom training pipeline testing, and studying the effects of SFT, reward modeling, and reinforcement learning on a compact model.
+This model is not intended for high-stakes decision making, medical advice, legal advice, financial advice, safety-critical systems, or production assistant behavior.
+Limitations
 Small models of this size are much weaker than modern large language models.
+Output quality depends heavily on the training corpus and post-training data.
 The model may hallucinate, repeat itself, or produce brittle responses.
 Because this is a custom architecture package, downstream users may need this repo code to load and run the checkpoint.
+Bias, Risks, and Safety
 This model can reflect biases, errors, and undesirable patterns present in its training data. It may generate incorrect, harmful, or misleading text, especially when prompted about sensitive topics.
 Use caution when sharing generations publicly or using this model in any workflow that could affect people materially.
+How to Use
+This checkpoint is typically loaded with the VISDOM-32M project code rather than directly through transformers.
 Example local inference command:
 python generate.py --checkpoint checkpoints/rl/best.pt --prompt "Explain entropy simply."
 If this model repo includes the project files, a typical Python loading flow looks like this:
 import torch
 from src.model import GPTLanguageModel, config_from_dict
 model = GPTLanguageModel(config_from_dict(cfg))
 model.load_state_dict(checkpoint["model_state_dict"])
 model.eval()
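
Because the context length is 256 tokens, a prompt plus generated text that grows beyond that length has to be cropped before each forward pass. The helper below is a minimal sketch of that step, assuming token IDs are held in a plain Python list; `crop_to_context` is a hypothetical name, not a function from this repo.

```python
BLOCK_SIZE = 256  # context length stated in this model card


def crop_to_context(token_ids, block_size=BLOCK_SIZE):
    """Keep only the most recent `block_size` token IDs."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return token_ids[-block_size:]


if __name__ == "__main__":
    long_prompt = list(range(300))      # stand-in for 300 token IDs
    window = crop_to_context(long_prompt)
    print(len(window), window[0])       # the oldest 44 tokens are dropped
```

Cropping from the left keeps the most recent tokens, which is the usual choice for autoregressive generation.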
+Repository Contents
+To make this Hugging Face repo usable by others, include the model checkpoint file, tokenizer model file, meta.json, config file, model code, tokenizer code, generation script or demo script, and this model card.
+Recommended files:
+README.md
+config.yaml
+meta.json
+generate.py
+requirements.txt
+checkpoints/
+  best.pt
+  sft/
+    best.pt
+  rl/
+    best.pt
+data/
+  processed/
+    visdom_tokenizer.model
+src/
+  model.py
+  tokenizer.py
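
Before uploading, it can help to check that the recommended files are actually present. The script below is a small sketch that uses only the standard library; the paths are the ones listed above, and `missing_files` is a hypothetical helper name. If you publish only a base checkpoint, remove the SFT and RL entries from the list.

```python
from pathlib import Path

# Paths copied from the recommended layout in this model card.
RECOMMENDED_FILES = [
    "README.md",
    "config.yaml",
    "meta.json",
    "generate.py",
    "requirements.txt",
    "checkpoints/best.pt",
    "checkpoints/sft/best.pt",
    "checkpoints/rl/best.pt",
    "data/processed/visdom_tokenizer.model",
    "src/model.py",
    "src/tokenizer.py",
]


def missing_files(repo_root, expected=RECOMMENDED_FILES):
    """Return the expected relative paths that are not files under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing before upload:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All recommended files are present.")
```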
+Evaluation
+This project currently focuses more on end-to-end training and experimentation than benchmark reporting.
 If you have evaluation results, add them here.
 Suggested items to report:
+Validation loss after base training
+Validation loss after SFT
+Reward model validation accuracy
+Sample generations
+Qualitative before and after comparisons
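
Two of the suggested numbers are easy to derive from values a training run already logs. The sketch below assumes the validation loss is a mean cross-entropy in nats per token (the default for PyTorch's cross-entropy loss) and that the reward model emits one scalar score per response. Both function names are hypothetical.

```python
import math


def perplexity(mean_cross_entropy):
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy)


def preference_accuracy(chosen_scores, rejected_scores):
    """Fraction of pairs where the chosen response scores strictly higher."""
    if len(chosen_scores) != len(rejected_scores):
        raise ValueError("score lists must have the same length")
    if not chosen_scores:
        raise ValueError("at least one preference pair is required")
    correct = sum(c > r for c, r in zip(chosen_scores, rejected_scores))
    return correct / len(chosen_scores)


if __name__ == "__main__":
    # Placeholder values for illustration only, not measured results.
    print(perplexity(3.2))
    print(preference_accuracy([1.0, 0.2, 3.0, 0.5], [0.5, 0.4, 1.0, 0.1]))
```

Reporting perplexity alongside the raw loss makes base and SFT checkpoints easier to compare, provided both are evaluated on the same validation split and tokenizer.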
+Citation
 If you publish this model, you can cite the project like this:
+@misc{visdom32m,
+  title        = {VISDOM-32M: Train Your Own LLM From Scratch on an NVIDIA RTX GPU},
   author       = {YOUR_NAME_HERE},
   year         = {2026},
+  howpublished = {\url{https://huggingface.co/YOUR_USERNAME/VISDOM-32M}}
+}