VishalPreetham committed · Commit 97aa0af · verified · 1 Parent(s): 5340bf6

Update README.md

Files changed (1)
  1. README.md +55 -38
README.md CHANGED
@@ -1,25 +1,27 @@
  model_name: VISDOM-32M
  license: mit
  language:
-
- en
  library_name: pytorch
  license_name: mit
  tags:
- causal-lm
- gpt
- pytorch
- custom-code
- sentencepiece
- reinforcement-learning
  pipeline_tag: text-generation
- VISDOM-32M

  VISDOM-32M is a small decoder-only GPT language model trained from scratch using pure PyTorch. The project also supports optional post-training with supervised fine-tuning, reward model training, and PPO reinforcement learning.

  This model is part of the VISDOM-32M project. It is intended for learning, experimentation, and small-scale local inference, not production deployment.

- Model Details

  Model type: decoder-only causal language model

@@ -35,47 +37,47 @@ Training framework: pure PyTorch

  Intended use: text generation, instruction-following experiments, and alignment experiments on a small local model

- Training Summary

  The base model is trained from scratch on a local text corpus using next-token prediction.

  Optional post-training stages in this project include:

- Supervised fine-tuning on prompt and response pairs
- Reward model training on chosen and rejected preference pairs
- PPO reinforcement learning using a frozen reference model and learned reward model

  If you are publishing a specific checkpoint, update this section to match what you uploaded.

- Base checkpoint: checkpoints/best.pt

- SFT checkpoint: checkpoints/sft/best.pt

- RL checkpoint: checkpoints/rl/best.pt

  Recommended note to keep or edit:

- This Hugging Face repo currently contains a custom code checkpoint from the VISDOM-32M project. It is not a standard Transformers checkpoint unless explicitly converted.

- Training Data

  The model is trained on user-provided local text data and optional post-training datasets prepared inside the repo.

  Potential data sources used in this project may include:

- Local raw text corpora for base pretraining
- Instruction-tuning prompt and response pairs for SFT
- Preference datasets with chosen and rejected responses for reward model training

  Before publishing, replace this section with the exact datasets you used, including corpus names, collection dates, filtering steps, cleaning steps, approximate size, licensing details, and redistribution constraints.

- Intended Uses

  This model is intended for educational use, small-scale experimentation, custom training pipeline testing, and studying the effects of SFT, reward modeling, and reinforcement learning on a compact model.

  This model is not intended for high-stakes decision making, medical advice, legal advice, financial advice, safety-critical systems, or production assistant behavior.

- Limitations

  Small models of this size are much weaker than modern large language models.

@@ -87,22 +89,25 @@ Alignment behavior is limited by dataset size, reward model quality, and the lig

  Because this is a custom architecture package, downstream users may need this repo code to load and run the checkpoint.

- Bias, Risks, and Safety

  This model can reflect biases, errors, and undesirable patterns present in its training data. It may generate incorrect, harmful, or misleading text, especially when prompted about sensitive topics.

  Use caution when sharing generations publicly or using this model in any workflow that could affect people materially.

- How to Use

- This checkpoint is typically loaded with the VISDOM-32M project code rather than directly through transformers.

  Example local inference command:

  python generate.py --checkpoint checkpoints/rl/best.pt --prompt "Explain entropy simply."

  If this model repo includes the project files, a typical Python loading flow looks like this:

  import torch

  from src.model import GPTLanguageModel, config_from_dict
@@ -115,12 +120,15 @@ tokenizer = VisdomTokenizer("data/processed/visdom_tokenizer.model")
  model = GPTLanguageModel(config_from_dict(cfg))
  model.load_state_dict(checkpoint["model_state_dict"])
  model.eval()
- Repository Contents

- To make this Hugging Face repo usable by others, include the model checkpoint file, tokenizer model file, meta.json, config file, model code, tokenizer code, generation script or demo script, and this model card.

  Recommended files:

  README.md
  config.yaml
  meta.json
@@ -138,7 +146,9 @@ data/
  src/
  model.py
  tokenizer.py
- Evaluation

  This project currently focuses more on end-to-end training and experimentation than benchmark reporting.

@@ -146,18 +156,25 @@ If you have evaluation results, add them here.

  Suggested items to report:

- Validation loss after base training
- Validation loss after SFT
- Reward model validation accuracy
- Sample generations
- Qualitative before and after comparisons
- Citation

  If you publish this model, you can cite the project like this:

  @misc{visdom32m,
  title = {VISDOM-32M: Train Your Own LLM From Scratch on an NVIDIA RTX GPU},
  author = {YOUR_NAME_HERE},
  year = {2026},
- howpublished = {\url{https://huggingface.co/YOUR_USERNAME/VISDOM-32M}}
- }

@@ -1,25 +1,27 @@
+ ---
  model_name: VISDOM-32M
  license: mit
  language:
+ - en
  library_name: pytorch
  license_name: mit
  tags:
+ - causal-lm
+ - gpt
+ - pytorch
+ - custom-code
+ - sentencepiece
+ - reinforcement-learning
  pipeline_tag: text-generation
+ ---
+
+ # VISDOM-32M

  VISDOM-32M is a small decoder-only GPT language model trained from scratch using pure PyTorch. The project also supports optional post-training with supervised fine-tuning, reward model training, and PPO reinforcement learning.

  This model is part of the VISDOM-32M project. It is intended for learning, experimentation, and small-scale local inference, not production deployment.

+ ## Model Details

  Model type: decoder-only causal language model

@@ -35,47 +37,47 @@ Training framework: pure PyTorch

  Intended use: text generation, instruction-following experiments, and alignment experiments on a small local model

+ ## Training Summary

  The base model is trained from scratch on a local text corpus using next-token prediction.
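
As a rough illustration of the next-token prediction objective named in the line above, the sketch below shifts a token sequence by one position and averages the negative log-likelihood of the true next tokens. It is not the project's training code: the helper names `make_next_token_pairs` and `mean_cross_entropy`, the token IDs, and the probabilities are all hypothetical, and plain Python stands in for PyTorch tensors.

```python
import math


def make_next_token_pairs(token_ids):
    """Shift a sequence by one position so each input token predicts its successor."""
    return token_ids[:-1], token_ids[1:]


def mean_cross_entropy(target_probs):
    """Average negative log-likelihood of the probabilities given to the true next tokens."""
    return -sum(math.log(p) for p in target_probs) / len(target_probs)


# Toy token IDs (hypothetical): inputs are all but the last, targets all but the first.
inputs, targets = make_next_token_pairs([5, 9, 2, 7])

# Hypothetical probabilities a model assigned to each correct next token.
loss = mean_cross_entropy([0.5, 0.25, 0.125])
```

A lower loss means the model placed more probability on the tokens that actually came next.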

  Optional post-training stages in this project include:

+ 1. Supervised fine-tuning on prompt and response pairs
+ 2. Reward model training on chosen and rejected preference pairs
+ 3. PPO reinforcement learning using a frozen reference model and learned reward model
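
The README does not say how the reward model or the PPO stage is implemented, so the following is only a sketch of two formulations commonly used for these stages. It assumes a Bradley-Terry style pairwise loss for chosen versus rejected responses, and a reward shaped by a penalty on the log-probability gap between the policy and the frozen reference model. The function names and the `beta` coefficient are assumptions for illustration.

```python
import math


def pairwise_preference_loss(chosen_score, rejected_score):
    """Bradley-Terry style loss: -log(sigmoid(chosen - rejected))."""
    margin = chosen_score - rejected_score
    # log(1 + exp(-margin)) equals -log(sigmoid(margin)).
    return math.log1p(math.exp(-margin))


def kl_penalized_reward(reward, policy_logprob, reference_logprob, beta=0.1):
    """Reduce the reward when the policy drifts away from the frozen reference model."""
    return reward - beta * (policy_logprob - reference_logprob)


# The loss is small when the chosen response scores higher, large when it scores lower.
loss_good = pairwise_preference_loss(2.0, 0.0)
loss_bad = pairwise_preference_loss(0.0, 2.0)

# Hypothetical numbers: the policy is more confident than the reference, so the reward shrinks.
shaped = kl_penalized_reward(1.0, policy_logprob=-1.0, reference_logprob=-2.0, beta=0.5)
```

The penalty term is what makes the frozen reference model useful: it discourages the policy from exploiting the learned reward model by moving far from its starting behavior.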

  If you are publishing a specific checkpoint, update this section to match what you uploaded.

+ Base checkpoint: `checkpoints/best.pt`

+ SFT checkpoint: `checkpoints/sft/best.pt`

+ RL checkpoint: `checkpoints/rl/best.pt`

  Recommended note to keep or edit:

+ `This Hugging Face repo currently contains a custom code checkpoint from the VISDOM-32M project. It is not a standard Transformers checkpoint unless explicitly converted.`

+ ## Training Data

  The model is trained on user-provided local text data and optional post-training datasets prepared inside the repo.

  Potential data sources used in this project may include:

+ 1. Local raw text corpora for base pretraining
+ 2. Instruction-tuning prompt and response pairs for SFT
+ 3. Preference datasets with chosen and rejected responses for reward model training

  Before publishing, replace this section with the exact datasets you used, including corpus names, collection dates, filtering steps, cleaning steps, approximate size, licensing details, and redistribution constraints.

+ ## Intended Uses

  This model is intended for educational use, small-scale experimentation, custom training pipeline testing, and studying the effects of SFT, reward modeling, and reinforcement learning on a compact model.

  This model is not intended for high-stakes decision making, medical advice, legal advice, financial advice, safety-critical systems, or production assistant behavior.

+ ## Limitations

  Small models of this size are much weaker than modern large language models.

@@ -87,22 +89,25 @@ Alignment behavior is limited by dataset size, reward model quality, and the lig

  Because this is a custom architecture package, downstream users may need this repo code to load and run the checkpoint.

+ ## Bias, Risks, and Safety

  This model can reflect biases, errors, and undesirable patterns present in its training data. It may generate incorrect, harmful, or misleading text, especially when prompted about sensitive topics.

  Use caution when sharing generations publicly or using this model in any workflow that could affect people materially.

+ ## How to Use

+ This checkpoint is typically loaded with the VISDOM-32M project code rather than directly through `transformers`.

  Example local inference command:

+ ```bash
  python generate.py --checkpoint checkpoints/rl/best.pt --prompt "Explain entropy simply."
+ ```

  If this model repo includes the project files, a typical Python loading flow looks like this:

+ ```python
  import torch

  from src.model import GPTLanguageModel, config_from_dict
@@ -115,12 +120,15 @@ tokenizer = VisdomTokenizer("data/processed/visdom_tokenizer.model")
  model = GPTLanguageModel(config_from_dict(cfg))
  model.load_state_dict(checkpoint["model_state_dict"])
  model.eval()
+ ```

+ ## Repository Contents
+
+ To make this Hugging Face repo usable by others, include the model checkpoint file, tokenizer model file, `meta.json`, config file, model code, tokenizer code, generation script or demo script, and this model card.

  Recommended files:

+ ```text
  README.md
  config.yaml
  meta.json
@@ -138,7 +146,9 @@ data/
  src/
  model.py
  tokenizer.py
+ ```
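
One way to check a repo against the recommended file list before uploading is a small script like the sketch below. The list here is partial: it covers only the entries visible in this diff, and it assumes that `model.py` and `tokenizer.py` sit under `src/`. The helper name `missing_files` is hypothetical, so extend `EXPECTED_FILES` to match the actual repo.

```python
from pathlib import Path

# Partial list built from the entries visible in the diff; the src/ nesting is an assumption.
EXPECTED_FILES = [
    "README.md",
    "config.yaml",
    "meta.json",
    "src/model.py",
    "src/tokenizer.py",
]


def missing_files(repo_root, expected=EXPECTED_FILES):
    """Return the expected relative paths that are not regular files under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).is_file()]
```

Calling `missing_files(".")` from the repo root returns an empty list when every listed file is present.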
+
+ ## Evaluation

  This project currently focuses more on end-to-end training and experimentation than benchmark reporting.

@@ -146,18 +156,25 @@ If you have evaluation results, add them here.

  Suggested items to report:

+ 1. Validation loss after base training
+ 2. Validation loss after SFT
+ 3. Reward model validation accuracy
+ 4. Sample generations
+ 5. Qualitative before and after comparisons
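
Two of the suggested numbers are simple to derive once raw values exist. The sketch below assumes the validation loss is a mean per-token cross-entropy in nats, so its exponential gives perplexity. It also assumes reward model accuracy means the fraction of preference pairs in which the chosen response scores higher than the rejected one. The function names and sample values are hypothetical.

```python
import math


def perplexity(validation_loss):
    """Perplexity is the exponential of the mean per-token cross-entropy (in nats)."""
    return math.exp(validation_loss)


def preference_accuracy(scored_pairs):
    """Fraction of (chosen_score, rejected_score) pairs where the chosen score is higher."""
    correct = sum(1 for chosen, rejected in scored_pairs if chosen > rejected)
    return correct / len(scored_pairs)


# Hypothetical values for illustration only.
ppl = perplexity(math.log(20.0))
acc = preference_accuracy([(1.2, 0.3), (0.1, 0.4), (2.0, -1.0), (0.5, 0.2)])
```

Reporting perplexity next to the raw loss makes the base and SFT checkpoints easier to compare at a glance.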
+
+ ## Citation

  If you publish this model, you can cite the project like this:

+ ```bibtex
  @misc{visdom32m,
  title = {VISDOM-32M: Train Your Own LLM From Scratch on an NVIDIA RTX GPU},
  author = {YOUR_NAME_HERE},
  year = {2026},
+ howpublished = {https://huggingface.co/YOUR_USERNAME/VISDOM-32M}
+ }
+ ```
+
+ ## Maintainer Notes
+
+ Before uploading to Hugging Face, update the model name, author name, Hugging Face username or organization, exact checkpoint type, exact datasets used, license, and evaluation numbers.