VishalPreetham committed on
Commit 5340bf6 · verified · 1 Parent(s): 39f2e83

Update README.md

Files changed (1)
  1. README.md +77 -78
README.md CHANGED
@@ -1,39 +1,31 @@
-
-
  model_name: VISDOM-32M
  license: mit
  language:

- - en
- library_name: pytorch
- license_name: MIT
- tags:
- - causal-lm
- - gpt
- - pytorch
- - custom-code
- - sentencepiece
- - reinforcement-learning
- pipeline_tag: text-generation
-
-
-
-
- ---
-
- # VISDOM-23M

- VISDOM-23M is a small decoder only GPT language model trained from scratch using pure PyTorch. The project also supports optional post training with supervised fine tuning, reward model training, and PPO reinforcement learning.

- This model is part of the VISDOM-23M project. It is intended for learning, experimentation, and small scale local inference, not production deployment.

- ## Model Details

- Model type: decoder only causal language model

  Architecture: custom GPT Transformer implemented in PyTorch

- Parameter count: approximately 23M to 32M depending on tokenizer vocabulary and model configuration

  Context length: 256 tokens

@@ -41,53 +33,53 @@ Tokenizer: SentencePiece BPE trained within this project

  Training framework: pure PyTorch

- Intended use: text generation, instruction following experiments, and alignment experiments on a small local model

- ## Training Summary

- The base model is trained from scratch on a local text corpus using next token prediction.

- Optional post training stages in this project include:

- 1. Supervised fine tuning on prompt and response pairs
- 2. Reward model training on chosen and rejected preference pairs
- 3. PPO reinforcement learning using a frozen reference model and learned reward model

  If you are publishing a specific checkpoint, update this section to match what you uploaded.

- Base checkpoint: `checkpoints/best.pt`

- SFT checkpoint: `checkpoints/sft/best.pt`

- RL checkpoint: `checkpoints/rl/best.pt`

  Recommended note to keep or edit:

- `This Hugging Face repo currently contains a custom code checkpoint from the VISDOM-23M project. It is not a standard Transformers checkpoint unless explicitly converted.`

- ## Training Data

- The model is trained on user provided local text data and optional post training datasets prepared inside the repo.

  Potential data sources used in this project may include:

- 1. Local raw text corpora for base pretraining
- 2. Instruction tuning prompt and response pairs for SFT
- 3. Preference datasets with chosen and rejected responses for reward model training

  Before publishing, replace this section with the exact datasets you used, including corpus names, collection dates, filtering steps, cleaning steps, approximate size, licensing details, and redistribution constraints.

- ## Intended Uses

- This model is intended for educational use, small scale experimentation, custom training pipeline testing, and studying the effects of SFT, reward modeling, and reinforcement learning on a compact model.

- This model is not intended for high stakes decision making, medical advice, legal advice, financial advice, safety critical systems, or production assistant behavior.

- ## Limitations

  Small models of this size are much weaker than modern large language models.

- Output quality depends heavily on the training corpus and post training data.

  The model may hallucinate, repeat itself, or produce brittle responses.

@@ -95,25 +87,22 @@ Alignment behavior is limited by dataset size, reward model quality, and the lig

  Because this is a custom architecture package, downstream users may need this repo code to load and run the checkpoint.

- ## Bias, Risks, and Safety

  This model can reflect biases, errors, and undesirable patterns present in its training data. It may generate incorrect, harmful, or misleading text, especially when prompted about sensitive topics.

  Use caution when sharing generations publicly or using this model in any workflow that could affect people materially.

- ## How to Use

- This checkpoint is typically loaded with the VISDOM-23M project code rather than directly through `transformers`.

  Example local inference command:

- ```bash
  python generate.py --checkpoint checkpoints/rl/best.pt --prompt "Explain entropy simply."
- ```

  If this model repo includes the project files, a typical Python loading flow looks like this:

- ```python
  import torch

  from src.model import GPTLanguageModel, config_from_dict
@@ -126,39 +115,49 @@ tokenizer = VisdomTokenizer("data/processed/visdom_tokenizer.model")
  model = GPTLanguageModel(config_from_dict(cfg))
  model.load_state_dict(checkpoint["model_state_dict"])
  model.eval()
- ```
-
- ## Repository Contents
-
- To make this Hugging Face repo usable by others, include the model checkpoint file, tokenizer model file, `meta.json`, config file, model code, tokenizer code, generation script or demo script, and this model card.
-
- ## Evaluation
-
- This project currently focuses more on end to end training and experimentation than benchmark reporting.

  If you have evaluation results, add them here.

  Suggested items to report:

- 1. Validation loss after base training
- 2. Validation loss after SFT
- 3. Reward model validation accuracy
- 4. Sample generations
- 5. Qualitative before and after comparisons
-
- ## Citation

  If you publish this model, you can cite the project like this:

- ```bibtex
- @misc{visdom23m,
- title = {VISDOM-23M: Train Your Own LLM From Scratch on an NVIDIA RTX GPU},
  author = {YOUR_NAME_HERE},
  year = {2026},
- howpublished = {\url{https://huggingface.co/YOUR_USERNAME/VISDOM-23M}}
- }
- ```
-
- ## Maintainer Notes
-
- Before uploading to Hugging Face, update the model name, author name, Hugging Face username or organization, exact checkpoint type, exact datasets used, license, and evaluation numbers.


  model_name: VISDOM-32M
  license: mit
  language:

+ en
+ library_name: pytorch
+ license_name: mit
+ tags:
+ causal-lm
+ gpt
+ pytorch
+ custom-code
+ sentencepiece
+ reinforcement-learning
+ pipeline_tag: text-generation
+ VISDOM-32M

+ VISDOM-32M is a small decoder-only GPT language model trained from scratch using pure PyTorch. The project also supports optional post-training with supervised fine-tuning, reward model training, and PPO reinforcement learning.

+ This model is part of the VISDOM-32M project. It is intended for learning, experimentation, and small-scale local inference, not production deployment.

+ Model Details

+ Model type: decoder-only causal language model

  Architecture: custom GPT Transformer implemented in PyTorch

+ Parameter count: approximately 32M, depending on the tokenizer vocabulary size and model configuration

  Context length: 256 tokens
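The parameter count depends on vocabulary size because the token embedding table alone holds `vocab_size × d_model` weights. The estimator below is a back-of-the-envelope sketch. It assumes a GPT-2-style block: learned position embeddings, a fused QKV projection, a 4× MLP, biases, two LayerNorms per block, and an output head tied to the token embeddings. The real `GPTLanguageModel` may differ. The `d_model` and `n_layers` values in the usage lines are hypothetical, not this checkpoint's configuration.

```python
def estimate_gpt_params(vocab_size: int, d_model: int, n_layers: int,
                        context_length: int = 256, mlp_ratio: int = 4,
                        tie_embeddings: bool = True) -> int:
    """Rough parameter count for a GPT-2-style decoder-only Transformer (assumed layout)."""
    token_emb = vocab_size * d_model          # token embedding table
    pos_emb = context_length * d_model        # learned absolute position embeddings
    d_ff = mlp_ratio * d_model
    # Fused Q/K/V projection plus attention output projection, both with biases.
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # MLP up- and down-projections, both with biases.
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    norms = 2 * 2 * d_model                   # two LayerNorms per block (weight + bias)
    per_layer = attn + mlp + norms            # equals 12*d^2 + 13*d for mlp_ratio=4
    final_norm = 2 * d_model
    lm_head = 0 if tie_embeddings else vocab_size * d_model  # a tied head reuses token_emb
    return token_emb + pos_emb + n_layers * per_layer + final_norm + lm_head


if __name__ == "__main__":
    # Hypothetical configuration; only the vocabulary size changes between the two runs.
    for vocab in (16_000, 32_000):
        print(vocab, estimate_gpt_params(vocab, d_model=384, n_layers=12))
    # 16000 27536640
    # 32000 33680640
```

With these made-up settings, doubling the vocabulary from 16k to 32k adds about 6M parameters. This illustrates why the count depends on the tokenizer.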

  Training framework: pure PyTorch

+ Intended use: text generation, instruction-following experiments, and alignment experiments on a small local model

+ Training Summary

+ The base model is trained from scratch on a local text corpus using next-token prediction.
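In next-token prediction, each position is trained to predict the token that follows it. The sketch below shows how (input, target) pairs could be cut from a tokenized corpus using the 256-token context length listed above. It assumes the corpus is already a flat list of token IDs and uses non-overlapping windows by default. The project's own data loader may sample random offsets instead.

```python
from typing import Iterator, List, Optional, Tuple


def next_token_windows(token_ids: List[int], block_size: int = 256,
                       stride: Optional[int] = None) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (inputs, targets) pairs in which targets are the inputs shifted left by one token."""
    stride = stride or block_size
    # Each window needs block_size + 1 tokens: block_size inputs plus one extra target.
    for start in range(0, len(token_ids) - block_size, stride):
        chunk = token_ids[start:start + block_size + 1]
        yield chunk[:-1], chunk[1:]


if __name__ == "__main__":
    ids = list(range(10))  # stand-in for tokenizer output
    for inputs, targets in next_token_windows(ids, block_size=4):
        print(inputs, "->", targets)
    # [0, 1, 2, 3] -> [1, 2, 3, 4]
    # [4, 5, 6, 7] -> [5, 6, 7, 8]
```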

+ Optional post-training stages in this project include:

+ Supervised fine-tuning on prompt and response pairs
+ Reward model training on chosen and rejected preference pairs
+ PPO reinforcement learning using a frozen reference model and learned reward model
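In the usual RLHF recipe for this stage, three pieces work together:

- The learned reward model scores each sampled response.
- A per-token KL penalty keeps the policy close to the frozen reference model.
- PPO's clipped surrogate objective limits how far one update can move the policy.

The sketch below shows only the KL-penalized reward and the clipped loss, in plain Python. The values of `beta` and `eps`, and adding the reward-model score at the final token, are common choices assumed here, not settings read from this project. Advantage estimation (for example GAE) and the value head are omitted.

```python
import math
from typing import List


def kl_penalized_rewards(reward_score: float, policy_logprobs: List[float],
                         ref_logprobs: List[float], beta: float = 0.1) -> List[float]:
    """Per-token rewards: -beta * (log pi(a) - log pi_ref(a)), plus the reward-model score on the last token."""
    if not policy_logprobs or len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("need matching, non-empty per-token log-probabilities")
    rewards = [-beta * (lp - lr) for lp, lr in zip(policy_logprobs, ref_logprobs)]
    rewards[-1] += reward_score  # the reward model scores the whole response once
    return rewards


def ppo_clipped_loss(new_logprobs: List[float], old_logprobs: List[float],
                     advantages: List[float], eps: float = 0.2) -> float:
    """Negated PPO clipped surrogate: -mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A))."""
    terms = []
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(new_lp - old_lp)              # pi_new(a) / pi_old(a)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        terms.append(min(ratio * adv, clipped * adv))  # pessimistic (lower) bound
    return -sum(terms) / len(terms)


if __name__ == "__main__":
    # Made-up log-probabilities for a 3-token response.
    print(kl_penalized_rewards(0.8, [-1.2, -0.7, -2.0], [-1.0, -0.9, -2.1]))
    print(ppo_clipped_loss([-1.0, -0.5], [-1.1, -0.4], [0.5, -0.3]))
```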

  If you are publishing a specific checkpoint, update this section to match what you uploaded.

+ Base checkpoint: checkpoints/best.pt

+ SFT checkpoint: checkpoints/sft/best.pt

+ RL checkpoint: checkpoints/rl/best.pt
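Each checkpoint path corresponds to one training stage. The sketch below is a hypothetical helper, not part of the project. It maps a stage name to its path and builds the `generate.py` command line, using only the `--checkpoint` and `--prompt` flags from this card's example command. It assumes it is run from the repository root.

```python
import shlex
from typing import Dict, List

# Stage -> checkpoint path, as listed in this card; edit to match what you upload.
CHECKPOINTS: Dict[str, str] = {
    "base": "checkpoints/best.pt",
    "sft": "checkpoints/sft/best.pt",
    "rl": "checkpoints/rl/best.pt",
}


def generate_command(stage: str, prompt: str) -> List[str]:
    """Build an argv list for generate.py (hypothetical helper; run from the repo root)."""
    if stage not in CHECKPOINTS:
        raise ValueError(f"unknown stage {stage!r}; expected one of {sorted(CHECKPOINTS)}")
    return ["python", "generate.py", "--checkpoint", CHECKPOINTS[stage], "--prompt", prompt]


if __name__ == "__main__":
    # The argv list can be passed to subprocess.run(); shlex.join shows the shell form.
    print(shlex.join(generate_command("rl", "Explain entropy simply.")))
    # python generate.py --checkpoint checkpoints/rl/best.pt --prompt 'Explain entropy simply.'
```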

  Recommended note to keep or edit:

+ This Hugging Face repo currently contains a custom code checkpoint from the VISDOM-32M project. It is not a standard Transformers checkpoint unless explicitly converted.

+ Training Data

+ The model is trained on user-provided local text data and optional post-training datasets prepared inside the repo.

  Potential data sources used in this project may include:

+ Local raw text corpora for base pretraining
+ Instruction-tuning prompt and response pairs for SFT
+ Preference datasets with chosen and rejected responses for reward model training
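Reward models trained on chosen/rejected pairs are commonly fit with the pairwise Bradley-Terry loss, `-log(sigmoid(r_chosen - r_rejected))`. They are checked by how often the chosen response scores higher, which is the reward model validation accuracy suggested under Evaluation. The sketch below computes both in plain Python. It assumes the reward model returns one scalar per response; the project's own objective may differ.

```python
import math
from typing import List, Tuple


def pairwise_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry loss -log(sigmoid(r_chosen - r_rejected)), written as a stable softplus."""
    margin = r_chosen - r_rejected
    # softplus(-margin) = max(-margin, 0) + log1p(exp(-|margin|)), which never overflows exp().
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def preference_accuracy(pairs: List[Tuple[float, float]]) -> float:
    """Fraction of (chosen_score, rejected_score) pairs where the chosen response scores higher."""
    if not pairs:
        raise ValueError("need at least one pair")
    return sum(chosen > rejected for chosen, rejected in pairs) / len(pairs)


if __name__ == "__main__":
    scores = [(1.3, 0.2), (0.1, 0.4), (2.0, -0.5)]  # made-up reward-model outputs
    print(sum(pairwise_loss(c, r) for c, r in scores) / len(scores))
    print(preference_accuracy(scores))  # 0.6666666666666666
```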

  Before publishing, replace this section with the exact datasets you used, including corpus names, collection dates, filtering steps, cleaning steps, approximate size, licensing details, and redistribution constraints.

+ Intended Uses

+ This model is intended for educational use, small-scale experimentation, custom training pipeline testing, and studying the effects of SFT, reward modeling, and reinforcement learning on a compact model.

+ This model is not intended for high-stakes decision making, medical advice, legal advice, financial advice, safety-critical systems, or production assistant behavior.

+ Limitations

  Small models of this size are much weaker than modern large language models.

+ Output quality depends heavily on the training corpus and post-training data.

  The model may hallucinate, repeat itself, or produce brittle responses.

  Because this is a custom architecture package, downstream users may need this repo code to load and run the checkpoint.

+ Bias, Risks, and Safety

  This model can reflect biases, errors, and undesirable patterns present in its training data. It may generate incorrect, harmful, or misleading text, especially when prompted about sensitive topics.

  Use caution when sharing generations publicly or using this model in any workflow that could affect people materially.

+ How to Use

+ This checkpoint is typically loaded with the VISDOM-32M project code rather than directly through transformers.

  Example local inference command:

  python generate.py --checkpoint checkpoints/rl/best.pt --prompt "Explain entropy simply."
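This card does not show how `generate.py` works internally. Autoregressive generation with a 256-token context typically repeats three steps: crop the running sequence to its last 256 tokens, sample the next token from temperature-scaled logits, and append it. The framework-free sketch below implements that loop. `logits_fn` is a hypothetical stand-in for the model's forward pass, and `temperature`, `eos_id` and `seed` are assumed parameters.

```python
import math
import random
from typing import Callable, List, Optional


def sample_next(logits: List[float], temperature: float, rng: random.Random) -> int:
    """Pick a token index from softmax(logits / temperature); temperature <= 0 means greedy."""
    if temperature <= 0:
        return max(range(len(logits)), key=logits.__getitem__)
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(x - top) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(logits_fn: Callable[[List[int]], List[float]], prompt_ids: List[int],
             max_new_tokens: int, block_size: int = 256, temperature: float = 1.0,
             eos_id: Optional[int] = None, seed: int = 0) -> List[int]:
    """Autoregressively extend prompt_ids, feeding the model at most block_size tokens."""
    rng = random.Random(seed)
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        context = ids[-block_size:]  # crop to the model's context window
        next_id = sample_next(logits_fn(context), temperature, rng)
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break
    return ids


if __name__ == "__main__":
    def toy_logits(context: List[int]) -> List[float]:
        # Toy stand-in for a model: strongly prefers (last token + 1) mod 5.
        nxt = (context[-1] + 1) % 5
        return [10.0 if i == nxt else 0.0 for i in range(5)]

    print(generate(toy_logits, [0], max_new_tokens=6, temperature=0))  # [0, 1, 2, 3, 4, 0, 1]
```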

  If this model repo includes the project files, a typical Python loading flow looks like this:

  import torch

  from src.model import GPTLanguageModel, config_from_dict

  model = GPTLanguageModel(config_from_dict(cfg))
  model.load_state_dict(checkpoint["model_state_dict"])
  model.eval()
+ Repository Contents
+
+ To make this Hugging Face repo usable by others, include the model checkpoint file, tokenizer model file, meta.json, config file, model code, tokenizer code, generation script or demo script, and this model card.
+
+ Recommended files:
+
+ README.md
+ config.yaml
+ meta.json
+ generate.py
+ requirements.txt
+ checkpoints/
+   best.pt
+   sft/
+     best.pt
+   rl/
+     best.pt
+ data/
+   processed/
+     visdom_tokenizer.model
+ src/
+   model.py
+   tokenizer.py
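Before uploading, a quick presence check can catch a missing tokenizer or checkpoint. The stdlib-only sketch below assumes a hypothetical helper, `missing_files`. Its path list mirrors the recommended layout; trim the checkpoint entries if you publish only one stage.

```python
from pathlib import Path
from typing import Iterable, List

# Mirrors the recommended layout; drop checkpoint stages you do not publish.
RECOMMENDED_FILES = (
    "README.md",
    "config.yaml",
    "meta.json",
    "generate.py",
    "requirements.txt",
    "checkpoints/best.pt",
    "checkpoints/sft/best.pt",
    "checkpoints/rl/best.pt",
    "data/processed/visdom_tokenizer.model",
    "src/model.py",
    "src/tokenizer.py",
)


def missing_files(repo_root: str, required: Iterable[str] = RECOMMENDED_FILES) -> List[str]:
    """Return the required relative paths that are not regular files under repo_root."""
    root = Path(repo_root)
    return [rel for rel in required if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    print("all recommended files present" if not missing else f"missing: {missing}")
```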
+ Evaluation
+
+ This project currently focuses more on end-to-end training and experimentation than benchmark reporting.

  If you have evaluation results, add them here.

  Suggested items to report:

+ Validation loss after base training
+ Validation loss after SFT
+ Reward model validation accuracy
+ Sample generations
+ Qualitative before and after comparisons
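Validation losses are easier to compare as perplexity. This assumes the reported loss is mean per-token cross-entropy in nats, which is the default for PyTorch's `torch.nn.functional.cross_entropy`; check how the training scripts compute it. Under that assumption, perplexity is `exp(loss)`. A useful sanity check: an untrained model should score near `ln(vocab_size)`, the loss of a uniform guess over the vocabulary. A minimal sketch:

```python
import math


def perplexity(mean_ce_loss_nats: float) -> float:
    """Perplexity = exp(mean per-token cross-entropy, measured in nats)."""
    return math.exp(mean_ce_loss_nats)


def uniform_baseline_loss(vocab_size: int) -> float:
    """Cross-entropy of a model that assigns equal probability to every token."""
    return math.log(vocab_size)


if __name__ == "__main__":
    vocab_size = 32_000  # hypothetical; use the tokenizer's actual vocabulary size
    baseline = uniform_baseline_loss(vocab_size)
    print(f"uniform-guess loss {baseline:.3f} nats, perplexity {perplexity(baseline):.0f}")
```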
+ Citation

  If you publish this model, you can cite the project like this:

+ @misc{visdom32m,
+ title = {VISDOM-32M: Train Your Own LLM From Scratch on an NVIDIA RTX GPU},
  author = {YOUR_NAME_HERE},
  year = {2026},
+ howpublished = {\url{https://huggingface.co/YOUR_USERNAME/VISDOM-32M}}
+ }