Update README with working generation example

- Add Quick Start section with random initialization example
- Show actual generation code that works
- Include example output (random tokens)
- Update model configuration to reflect 24 layers
- Add note that config matches gpt-oss-20b
- List the total parameter count (~2.4B)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Files changed (1)

README.md +33 -17

@@ -28,28 +28,39 @@ GptOssDense is a dense variant of the GptOss model architecture. While GptOss us
 ## Usage
-### Loading the Configuration and Model Class
-Since this repository contains the model architecture but not pre-trained weights, you can load the config and initialize a model:
 ```python
-from transformers import AutoConfig, AutoModelForCausalLM
-# Load config from Hub
-config = AutoConfig.from_pretrained(
-    "marksverdhei/gpt-oss-dense",
-    trust_remote_code=True
-)
 # Initialize model with random weights
-model = AutoModelForCausalLM.from_config(
-    config,
-    trust_remote_code=True
-)
-# If you have trained weights, save them:
-# model.save_pretrained("path/to/save")
-# Then upload: huggingface-cli upload marksverdhei/gpt-oss-dense path/to/save
 ```
 ### Loading Pre-trained Weights (when available)
@@ -92,16 +103,21 @@ model = GptOssDenseForCausalLM(config)
 ## Model Configuration
 - **Hidden size**: 2880
 - **Intermediate size**: 2880
-- **Number of layers**: 36
 - **Number of attention heads**: 64
 - **Number of key-value heads**: 8
 - **Head dimension**: 64
 - **Vocabulary size**: 201,088
 - **Max position embeddings**: 131,072
 - **Sliding window**: 128
 - **RoPE type**: YaRN with factor 32.0
 ## License

 ## Usage
+### Quick Start - Random Initialization
+Try the model with randomly initialized weights (outputs will be random):
 ```python
+from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
+import torch
+# Load config and tokenizer
+config = AutoConfig.from_pretrained("marksverdhei/gpt-oss-dense", trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained("marksverdhei/gpt-oss-dense")
 # Initialize model with random weights
+model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
+model.eval()
+# Generate text (will be random since model is not trained)
+prompt = "Hello, how are you?"
+inputs = tokenizer(prompt, return_tensors="pt")
+with torch.no_grad():
+    outputs = model.generate(
+        inputs.input_ids,
+        max_new_tokens=20,
+        do_sample=True,
+        temperature=1.0,
+        top_k=50,
+        pad_token_id=tokenizer.pad_token_id
+    )
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+# Example output: "Hello, how are you? pronunci bhithCiudadstdafxipseігlanders導 conveyoruviainn"
+# (random tokens since model is not trained)
 ```
 ### Loading Pre-trained Weights (when available)
 ## Model Configuration
+Matches `openai/gpt-oss-20b` configuration (dense variant):
 - **Hidden size**: 2880
 - **Intermediate size**: 2880
+- **Number of layers**: 24
 - **Number of attention heads**: 64
 - **Number of key-value heads**: 8
 - **Head dimension**: 64
 - **Vocabulary size**: 201,088
 - **Max position embeddings**: 131,072
+- **Initial context length**: 4,096
 - **Sliding window**: 128
 - **RoPE type**: YaRN with factor 32.0
+- **SwiGLU limit**: 7.0
+- **Total parameters**: ~2.4B
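
As a sanity check on the ~2.4B figure, the count can be roughly reproduced from the values listed above. This is a back-of-the-envelope sketch, not the repository's own accounting: it assumes a dense SwiGLU MLP with gate, up, and down projections, untied input and output embeddings, and it ignores biases, norms, and any other small tensors. The variable names are illustrative and are not taken from the model code.

```python
# Rough parameter estimate from the listed configuration values.
hidden_size = 2880
intermediate_size = 2880
num_layers = 24
num_heads = 64
num_kv_heads = 8
head_dim = 64
vocab_size = 201_088

# Attention: Q and O projections span all heads; K and V span only the KV heads.
q_dim = num_heads * head_dim        # 4096
kv_dim = num_kv_heads * head_dim    # 512
attn_params = 2 * hidden_size * q_dim + 2 * hidden_size * kv_dim

# Assumption: dense SwiGLU MLP with gate, up, and down projections.
mlp_params = 3 * hidden_size * intermediate_size

# Assumption: input embedding and output head are separate (untied) matrices.
embed_params = 2 * vocab_size * hidden_size

total_params = num_layers * (attn_params + mlp_params) + embed_params
print(f"{total_params / 1e9:.2f}B")  # prints 2.39B

# The context figures are consistent with each other as well:
# initial context length times the YaRN factor gives the max position embeddings.
initial_context = 4096
yarn_factor = 32.0
print(int(initial_context * yarn_factor))  # prints 131072
```

Under these assumptions, the 24 layers contribute about 1.23B weights and the two embedding matrices about 1.16B, so the large vocabulary accounts for nearly half of the total.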
 ## License