marksverdhei (Claude) committed
Commit e63df1f · 1 Parent(s): c982305

Update README with working generation example


- Add Quick Start section with random initialization example
- Show actual generation code that works
- Include example output (random tokens)
- Update model configuration to reflect 24 layers
- Add note that config matches gpt-oss-20b
- Show ~2.4B parameters

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Files changed (1): README.md (+33 −17)

README.md CHANGED
@@ -28,28 +28,39 @@ GptOssDense is a dense variant of the GptOss model architecture. While GptOss us
 
 ## Usage
 
-### Loading the Configuration and Model Class
+### Quick Start - Random Initialization
 
-Since this repository contains the model architecture but not pre-trained weights, you can load the config and initialize a model:
+Try the model with randomly initialized weights (outputs will be random):
 
 ```python
-from transformers import AutoConfig, AutoModelForCausalLM
+from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
+import torch
 
-# Load config from Hub
-config = AutoConfig.from_pretrained(
-    "marksverdhei/gpt-oss-dense",
-    trust_remote_code=True
-)
+# Load config and tokenizer
+config = AutoConfig.from_pretrained("marksverdhei/gpt-oss-dense", trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained("marksverdhei/gpt-oss-dense")
 
 # Initialize model with random weights
-model = AutoModelForCausalLM.from_config(
-    config,
-    trust_remote_code=True
-)
-
-# If you have trained weights, save them:
-# model.save_pretrained("path/to/save")
-# Then upload: huggingface-cli upload marksverdhei/gpt-oss-dense path/to/save
+model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
+model.eval()
+
+# Generate text (will be random since model is not trained)
+prompt = "Hello, how are you?"
+inputs = tokenizer(prompt, return_tensors="pt")
+
+with torch.no_grad():
+    outputs = model.generate(
+        inputs.input_ids,
+        max_new_tokens=20,
+        do_sample=True,
+        temperature=1.0,
+        top_k=50,
+        pad_token_id=tokenizer.pad_token_id
+    )
+
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+# Example output: "Hello, how are you? pronunci bhithCiudadstdafxipseігlanders導 conveyoruviainn"
+# (random tokens since model is not trained)
 ```
 
 ### Loading Pre-trained Weights (when available)
@@ -92,16 +103,21 @@ model = GptOssDenseForCausalLM(config)
 
 ## Model Configuration
 
+Matches `openai/gpt-oss-20b` configuration (dense variant):
+
 - **Hidden size**: 2880
 - **Intermediate size**: 2880
-- **Number of layers**: 36
+- **Number of layers**: 24
 - **Number of attention heads**: 64
 - **Number of key-value heads**: 8
 - **Head dimension**: 64
 - **Vocabulary size**: 201,088
 - **Max position embeddings**: 131,072
+- **Initial context length**: 4,096
 - **Sliding window**: 128
 - **RoPE type**: YaRN with factor 32.0
+- **SwiGLU limit**: 7.0
+- **Total parameters**: ~2.4B
 
 ## License
123