---
model_name: VISDOM-32M
license: mit
language:
- en
library_name: pytorch
license_name: mit
tags:
- causal-lm
- gpt
- pytorch
- custom-code
- sentencepiece
- reinforcement-learning
pipeline_tag: text-generation
---

# VISDOM-32M

VISDOM-32M is a small decoder-only GPT language model trained from scratch in pure PyTorch. The project also supports optional post-training: supervised fine-tuning (SFT), reward model training, and reinforcement learning with PPO.

This model is part of the VISDOM-32M project. It is intended for learning, experimentation, and small-scale local inference, not for production deployment.

## Model Details

Model type: decoder-only causal language model

Architecture: custom GPT Transformer implemented in PyTorch

Parameter count: approximately 32M; the exact count depends on the tokenizer vocabulary size and model configuration

Context length: 256 tokens

Tokenizer: SentencePiece BPE, trained as part of this project

Training framework: pure PyTorch

Intended use: text generation, instruction-following experiments, and alignment experiments on a small local model
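
The parameter count above moves with the vocabulary size and layer configuration. The function below gives a rough estimate, assuming a GPT-2-style decoder: learned token and position embeddings, biased linear layers, LayerNorm, a 4× MLP expansion, and an output head tied to the token embedding. Those architectural details and the example configuration are assumptions for illustration, not the published VISDOM-32M settings. The real values are stored in the checkpoint's `config` entry.

```python
def estimate_gpt_params(vocab_size: int, block_size: int, n_layer: int, n_embd: int) -> int:
    """Rough parameter count for a GPT-2-style decoder.

    Assumes learned token and position embeddings, biased Linear layers,
    LayerNorm with weight and bias, 4x MLP expansion, and an output head
    that shares (ties) its weights with the token embedding.
    """
    d = n_embd
    embeddings = vocab_size * d + block_size * d
    per_layer = (
        2 * d                  # LayerNorm before attention
        + 3 * d * d + 3 * d    # fused query/key/value projection
        + d * d + d            # attention output projection
        + 2 * d                # LayerNorm before MLP
        + d * 4 * d + 4 * d    # MLP up-projection
        + 4 * d * d + d        # MLP down-projection
    )
    final_norm = 2 * d
    return embeddings + n_layer * per_layer + final_norm


# Hypothetical configuration for illustration, not the published one.
print(estimate_gpt_params(vocab_size=16000, block_size=256, n_layer=6, n_embd=512))  # 27238400
```

With a tied head, each extra vocabulary entry adds `n_embd` parameters (twice that if the head is untied). This is why the total depends on the tokenizer.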

## Training Summary

The base model is trained from scratch on a local text corpus using next-token prediction.
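
In next-token prediction, each position learns to predict the token that follows it. The targets are the inputs shifted left by one, and the loss is the mean cross-entropy over positions. The pure-Python sketch below shows this on toy values. The real training code presumably does the same with PyTorch tensors (for example via `torch.nn.functional.cross_entropy`), but the token IDs and logits here are made up.

```python
import math


def shift_for_next_token(token_ids: list[int]) -> tuple[list[int], list[int]]:
    """Split a token sequence into (inputs, targets) for next-token prediction."""
    return token_ids[:-1], token_ids[1:]


def cross_entropy(logits: list[float], target: int) -> float:
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def next_token_loss(logits_per_position: list[list[float]], targets: list[int]) -> float:
    """Mean cross-entropy over all positions, as in standard LM pretraining."""
    losses = [cross_entropy(lg, t) for lg, t in zip(logits_per_position, targets)]
    return sum(losses) / len(losses)


# Toy example: a 4-token sequence over a vocabulary of 3 token IDs.
inputs, targets = shift_for_next_token([0, 2, 1, 2])
print(inputs, targets)  # [0, 2, 1] [2, 1, 2]
uniform = [[0.0, 0.0, 0.0]] * len(targets)
print(next_token_loss(uniform, targets))  # log(3) ≈ 1.0986 for a uniform model
```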

Optional post-training stages in this project include:

1. Supervised fine-tuning (SFT) on prompt and response pairs
2. Reward model training on chosen and rejected preference pairs
3. PPO reinforcement learning using a frozen reference model and the learned reward model
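
Stages 2 and 3 usually rest on a few standard objectives:

- For reward model training on chosen/rejected pairs, a common choice is the pairwise Bradley-Terry loss.
- PPO-based RLHF commonly subtracts a KL penalty against the frozen reference model from the reward.
- PPO then optimizes a clipped surrogate.

The sketch below writes these out in pure Python on scalar values. Whether this repo uses exactly these forms is an assumption, and so are the coefficient names `beta` and `clip_eps` and their defaults.

```python
import math


def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry style loss: -log(sigmoid(r_chosen - r_rejected))."""
    margin = r_chosen - r_rejected
    # softplus(-margin) = log(1 + exp(-margin)), computed stably for large |margin|
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def kl_shaped_reward(reward: float, logp_policy: float, logp_ref: float, beta: float = 0.1) -> float:
    """Reward minus a per-sample KL estimate against the frozen reference model."""
    return reward - beta * (logp_policy - logp_ref)


def ppo_clipped_objective(logp_new: float, logp_old: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO clipped surrogate for one action (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)  # probability ratio new/old policy
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped * advantage)
```

With a zero margin, the pairwise loss is `log(2) ≈ 0.693`. It approaches 0 as the chosen response outscores the rejected one by a wider margin.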

If you are publishing a specific checkpoint, update this section to match what you uploaded.

Base checkpoint: `checkpoints/best.pt`

SFT checkpoint: `checkpoints/sft/best.pt`

RL checkpoint: `checkpoints/rl/best.pt`

Recommended note to keep or edit:

`This Hugging Face repo currently contains a custom-code checkpoint from the VISDOM-32M project. It is not a standard Transformers checkpoint unless explicitly converted.`

## Training Data

The model is trained on user-provided local text data and on optional post-training datasets prepared inside the repo.

Data sources used in this project may include:

1. Local raw text corpora for base pretraining
2. Instruction-tuning prompt and response pairs for SFT
3. Preference datasets with chosen and rejected responses for reward model training
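
The sketch below shows one way these three kinds of data might be stored and validated. It assumes one JSON object per line (JSONL) with hypothetical field names: `text` for pretraining, `prompt`/`response` for SFT, and `prompt`/`chosen`/`rejected` for preferences. Check the repo's data-preparation scripts for the actual schema before reusing it.

```python
import json

# Hypothetical required fields per dataset type; not taken from the repo.
REQUIRED_FIELDS = {
    "pretrain": ("text",),
    "sft": ("prompt", "response"),
    "preference": ("prompt", "chosen", "rejected"),
}


def load_records(lines, kind):
    """Parse JSONL lines and check that each record has string fields for `kind`."""
    fields = REQUIRED_FIELDS[kind]
    records = []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        missing = [f for f in fields if not isinstance(record.get(f), str)]
        if missing:
            raise ValueError(f"line {line_no}: missing or non-string fields {missing}")
        records.append(record)
    return records


example = [
    '{"prompt": "Explain entropy simply.", "chosen": "A short, clear answer.", "rejected": "An off-topic answer."}',
]
print(len(load_records(example, "preference")))  # 1
```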

Before publishing, replace this section with the exact datasets you used, including corpus names, collection dates, filtering and cleaning steps, approximate size, licensing details, and redistribution constraints.

## Intended Uses

This model is intended for educational use, small-scale experimentation, testing custom training pipelines, and studying the effects of SFT, reward modeling, and reinforcement learning on a compact model.

This model is not intended for high-stakes decision making, medical, legal, or financial advice, safety-critical systems, or production assistant behavior.

## Limitations

Models of this size are much weaker than modern large language models.

Output quality depends heavily on the training corpus and post-training data.

The model may hallucinate, repeat itself, or produce brittle responses.

Alignment behavior is limited by dataset size, reward model quality, and the lightweight PPO loop used in this repo.

Because this is a custom architecture, downstream users may need this repo's code to load and run the checkpoint.

## Bias, Risks, and Safety

This model can reflect biases, errors, and undesirable patterns present in its training data. It may generate incorrect, harmful, or misleading text, especially when prompted about sensitive topics.

Use caution when sharing generations publicly or when using this model in any workflow that could materially affect people.

## How to Use

This checkpoint is typically loaded with the VISDOM-32M project code rather than directly through `transformers`.

Example local inference command:

```bash
python generate.py --checkpoint checkpoints/rl/best.pt --prompt "Explain entropy simply."
```

If this model repo includes the project files, a typical Python loading flow looks like this:

```python
import torch

from src.model import GPTLanguageModel, config_from_dict
from src.tokenizer import VisdomTokenizer

# Load onto the CPU so this works without a GPU. Since PyTorch 2.6, torch.load
# defaults to weights_only=True, which only unpickles tensors and plain containers.
checkpoint = torch.load("checkpoints/rl/best.pt", map_location="cpu")
cfg = checkpoint["config"]  # hyperparameters saved alongside the weights

tokenizer = VisdomTokenizer("data/processed/visdom_tokenizer.model")
model = GPTLanguageModel(config_from_dict(cfg))  # rebuild the architecture from the saved config
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()  # inference mode: disables dropout
```
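
Once the model is loaded, generation is an autoregressive loop:

1. Feed the most recent tokens, up to the 256-token context length.
2. Take the logits for the last position.
3. Sample a token and append it.
4. Repeat.

The plain-Python sketch below implements this loop with top-k sampling. It calls a hypothetical `next_token_logits(ids)` function that stands in for the model's forward pass. The repo's `generate.py` may differ in its sampling details and in how it calls the model and tokenizer.

```python
import math
import random
from typing import Callable, List

CONTEXT_LENGTH = 256  # from the model card: the window the model can attend to


def sample_tokens(
    next_token_logits: Callable[[List[int]], List[float]],  # hypothetical model wrapper
    prompt_ids: List[int],
    max_new_tokens: int = 50,
    temperature: float = 0.8,  # must be > 0
    top_k: int = 40,
    seed: int = 0,
) -> List[int]:
    """Autoregressive top-k sampling with the context cropped to CONTEXT_LENGTH."""
    rng = random.Random(seed)
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        context = ids[-CONTEXT_LENGTH:]  # drop the oldest tokens beyond the window
        logits = next_token_logits(context)
        # Keep only the top_k highest-scoring token IDs.
        candidates = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
        scaled = [logits[i] / temperature for i in candidates]
        m = max(scaled)
        weights = [math.exp(s - m) for s in scaled]  # unnormalized softmax weights
        ids.append(rng.choices(candidates, weights=weights, k=1)[0])
    return ids
```

With `top_k=1`, the loop reduces to greedy decoding. Larger `top_k` and `temperature` values trade coherence for variety.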

## Repository Contents

To make this Hugging Face repo usable by others, include the model checkpoint file, tokenizer model file, `meta.json`, config file, model code, tokenizer code, generation script or demo script, and this model card.

Recommended files:

```text
README.md
config.yaml
meta.json
generate.py
requirements.txt
checkpoints/
  best.pt
  sft/
    best.pt
  rl/
    best.pt
data/
  processed/
    visdom_tokenizer.model
src/
  model.py
  tokenizer.py
```
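
Before uploading, a quick local check that the recommended files exist can prevent a broken upload. The helper below is a hypothetical convenience script, not part of the project. It uses `pathlib` to compare a directory against the layout listed above.

```python
from pathlib import Path

# Mirrors the recommended layout above; edit it to match what you actually publish.
RECOMMENDED_FILES = [
    "README.md",
    "config.yaml",
    "meta.json",
    "generate.py",
    "requirements.txt",
    "checkpoints/best.pt",
    "checkpoints/sft/best.pt",
    "checkpoints/rl/best.pt",
    "data/processed/visdom_tokenizer.model",
    "src/model.py",
    "src/tokenizer.py",
]


def missing_files(repo_dir: str) -> list[str]:
    """Return the recommended files that do not exist under repo_dir."""
    root = Path(repo_dir)
    return [rel for rel in RECOMMENDED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    print("All recommended files present." if not missing else f"Missing: {missing}")
```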

## Evaluation

This project currently focuses more on end-to-end training and experimentation than on benchmark reporting.

If you have evaluation results, add them here.

Suggested items to report:

1. Validation loss after base training
2. Validation loss after SFT
3. Reward model validation accuracy
4. Sample generations
5. Qualitative before-and-after comparisons
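
Two simple derived numbers are common for the first three items:

- Perplexity is `exp` of the mean per-token validation cross-entropy, measured in nats.
- Reward model accuracy is the fraction of held-out pairs where the chosen response scores strictly higher than the rejected one.

The sketch below computes both from values you would collect during evaluation. The inputs are placeholders, not results.

```python
import math


def perplexity(mean_token_loss: float) -> float:
    """Perplexity from mean per-token cross-entropy measured in nats."""
    return math.exp(mean_token_loss)


def reward_accuracy(pairs: list[tuple[float, float]]) -> float:
    """Fraction of (r_chosen, r_rejected) pairs where chosen scores strictly higher."""
    if not pairs:
        raise ValueError("need at least one preference pair")
    wins = sum(1 for r_chosen, r_rejected in pairs if r_chosen > r_rejected)
    return wins / len(pairs)


# Placeholder values for illustration only.
print(round(perplexity(math.log(20.0)), 6))  # 20.0
print(reward_accuracy([(1.2, 0.3), (0.1, 0.4), (2.0, -1.0), (0.5, 0.5)]))  # 0.5
```

Ties count as failures here. Whichever convention you choose, state it next to the reported number.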

## Citation

If you publish this model, you can cite the project like this:

```bibtex
@misc{visdom32m,
  title        = {VISDOM-32M: Train Your Own LLM From Scratch on an NVIDIA RTX GPU},
  author       = {YOUR_NAME_HERE},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/YOUR_USERNAME/VISDOM-32M}}
}
```

## Maintainer Notes

Before uploading to Hugging Face, update the model name, author name, Hugging Face username or organization, exact checkpoint type, exact datasets used, license, and evaluation numbers.