tperes committed on
Commit 2f67765 · verified · 1 Parent(s): 567462f

Update README.md

Files changed (1)
  1. README.md +1 -174
README.md CHANGED
@@ -1,177 +1,4 @@
- ---
- license: apache-2.0
- base_model:
- - palmyra-mini-thinking-a
- tags:
- - mlx
- - qwen2
- - palmyra
- - thinking
- - reasoning
- ---
-
- # Palmyra Mini Thinking A - MLX BF16
-
- ## Model Description
-
- This is a bfloat16 precision version of the [palmyra-mini-thinking-a model](https://huggingface.co/Writer/palmyra-mini-thinking-a), optimized for Apple Silicon using the MLX framework. This model is based on the Qwen2 architecture and is specifically designed for reasoning tasks with explicit thinking capabilities through special `<think>` and `</think>` tokens.
-
- ## Quick Start
-
- ### Installation
-
- ```bash
- pip install mlx-lm
- ```
-
- ### Usage
-
- ```python
- from mlx_lm import load, generate
-
- # Load the model
- model, tokenizer = load("/Users/thomas/Documents/Model Weights/SPW2 Mini Launch/palmyra-mini-thinking-a/MLX")
-
- # Generate text with thinking
- prompt = "Solve this step by step: What is 15% of 240?"
- response = generate(model, tokenizer, prompt=prompt, verbose=True, max_tokens=512)
- print(response)
- ```
-
- ## Technical Specifications
-
- ### Model Architecture
- - **Model Type**: `qwen2` (Qwen2 Architecture)
- - **Architecture**: `Qwen2ForCausalLM`
- - **Parameters**: ~1.7 billion parameters
- - **Precision**: bfloat16
- - **Specialization**: Reasoning and thinking tasks
-
- ### Core Parameters
- | Parameter | Value |
- |-----------|-------|
- | Hidden Size | 1,536 |
- | Intermediate Size | 8,960 |
- | Number of Layers | 28 |
- | Attention Heads | 12 |
- | Key-Value Heads | 2 |
- | Head Dimension | 128 |
- | Vocabulary Size | 151,665 |
-
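The core parameters above are internally consistent: 12 attention heads × a head dimension of 128 reconstructs the 1,536 hidden size, and 12 query heads over 2 key-value heads gives a 6:1 grouped-query attention sharing ratio. A quick sanity check (plain Python; the variable names mirror the table rows and are illustrative, not config keys):

```python
# Sanity-check the core parameters listed above (values from the table).
hidden_size = 1536
num_attention_heads = 12
num_key_value_heads = 2
head_dim = 128

# Query heads times head dimension must reconstruct the hidden size.
assert num_attention_heads * head_dim == hidden_size

# Grouped-query attention: each of the 2 KV heads serves 6 query heads.
gqa_group_size = num_attention_heads // num_key_value_heads
print(gqa_group_size)  # 6
```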
- ### Attention Mechanism
- - **Attention Type**: Full attention across all 28 layers
- - **Max Position Embeddings**: 131,072 tokens
- - **Attention Dropout**: 0.0
- - **Sliding Window**: Not used
- - **Max Window Layers**: 21
-
- ### RoPE (Rotary Position Embedding) Configuration
- - **RoPE Theta**: 10,000
- - **RoPE Scaling**: None
-
- ### Thinking Capabilities
- - **Thinking Tokens**: `<think>` (151648) and `</think>` (151649)
- - **Reasoning Mode**: Explicit step-by-step reasoning
- - **Chat Template**: Automatically adds `<think>` tag for generation prompts
-
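Because the template opens the `<think>` block for the model, generated text typically contains a reasoning trace terminated by `</think>`, followed by the final answer. A minimal post-processing sketch (plain Python; `split_thinking` is an illustrative helper, not part of mlx-lm — only the `<think>`/`</think>` tag strings come from the model card):

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split generated text into (reasoning trace, final answer).

    The chat template opens the <think> block for the model, so the
    closing </think> tag can appear without a leading <think>.
    """
    open_tag, close_tag = "<think>", "</think>"
    body = text.removeprefix(open_tag)
    if close_tag in body:
        thinking, _, answer = body.partition(close_tag)
        return thinking.strip(), answer.strip()
    # No closing tag yet: the whole text is (unfinished) reasoning.
    return body.strip(), ""

thinking, answer = split_thinking("15% of 240 is 0.15 * 240 = 36.</think>The answer is 36.")
print(answer)  # The answer is 36.
```

Routing `generate()` output through a helper like this keeps the trace available for logging while surfacing only the answer.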
- ### File Structure
- ```
- palmyra-mini-thinking-a/MLX/
- ├── config.json                   # Model configuration
- ├── model.safetensors             # Model weights (3.3GB)
- ├── model.safetensors.index.json  # Model sharding index
- ├── tokenizer.json                # Tokenizer configuration
- ├── tokenizer_config.json         # Tokenizer settings
- ├── special_tokens_map.json       # Special tokens mapping
- ├── chat_template.jinja           # Chat template with thinking
- └── README.md                     # Model documentation
- ```
-
- ## Performance Characteristics
-
- ### Hardware Requirements
- - **Platform**: Apple Silicon (M1, M2, M3, M4 series)
- - **Memory**: ~3.3GB for model weights
- - **Recommended RAM**: 12GB+ for optimal performance
- - **Precision**: Full bfloat16 precision
-
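The ~3.3GB figure follows directly from the parameter count: bfloat16 stores two bytes per parameter, so ~1.7B parameters occupy roughly 3.2 GiB before any KV-cache or activation overhead. A back-of-envelope check (plain Python; the 1.7e9 count is the approximate figure quoted above):

```python
# Back-of-envelope weight memory for bfloat16 storage.
num_params = 1.7e9       # ~1.7 billion parameters (approximate, from above)
bytes_per_param = 2      # bfloat16 = 16 bits = 2 bytes
weight_gib = num_params * bytes_per_param / 1024**3
print(f"{weight_gib:.2f} GiB")  # 3.17 GiB
```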
- ### Layer Configuration
- All 28 layers use the full attention mechanism, as specified in the `layer_types` configuration, providing consistent attention patterns across the entire model depth.
-
- ## Training Details
-
- ### Tokenizer
- - **Type**: LlamaTokenizerFast with a 151,665-token vocabulary
- - **Special Tokens**:
- - BOS Token ID: 151646
- - EOS Token ID: 151643
- - Pad Token ID: 151643
- - Think Start: 151648 (`<think>`)
- - Think End: 151649 (`</think>`)
-
- ### Model Configuration
- - **Hidden Activation**: SiLU (Swish)
- - **Normalization**: RMSNorm (ε = 1e-06)
- - **Initializer Range**: 0.02
- - **Attention Dropout**: 0.0
- - **Word Embeddings**: Not tied
- - **Use Cache**: False (optimized for thinking tasks)
-
- ### Chat Template
- The model uses a specialized chat template that automatically initiates thinking mode:
- - User messages: wrapped in the user turn special token
- - Assistant messages: `<|Assistant|><think>\n` (automatically adds the thinking prompt)
- - Tool calling support with `<tool_call>` and `</tool_call>` tokens
- - Vision and multimodal tokens included
-
- ## Usage Examples
-
- ### Reasoning Task
- ```python
- prompt = """
- A train travels 120 miles in 2 hours. If it maintains the same speed, how far will it travel in 5 hours?
- <|Assistant|><think>
- """
-
- response = generate(model, tokenizer, prompt=prompt, max_tokens=300)
- ```
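For reference, the answer the prompt above should reason its way to is plain rate arithmetic: 120 miles ÷ 2 hours = 60 mph, and 60 mph × 5 hours = 300 miles:

```python
# Ground-truth arithmetic for the worked example above.
speed_mph = 120 / 2             # 60 mph
distance_miles = speed_mph * 5  # distance covered in 5 hours
print(distance_miles)  # 300.0
```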
-
- ### Problem Solving
- ```python
- prompt = """
- Explain why the sky appears blue during the day.
- <|Assistant|><think>
- """
-
- response = generate(model, tokenizer, prompt=prompt, max_tokens=400)
- ```
-
- ## Known Limitations
-
- 1. **Platform Dependency**: Optimized specifically for Apple Silicon; may not run on other platforms
- 2. **Memory Requirements**: Requires significant memory due to full precision weights
- 3. **Thinking Overhead**: Explicit thinking may increase response length and generation time
- 4. **Cache Disabled**: Model has `use_cache: false`, which may impact inference speed
-
- ## Compatibility
-
- - **MLX-LM**: Requires a recent version with Qwen2 support
- - **Apple Silicon**: M1, M2, M3, M4 series processors
- - **macOS**: Compatible with recent macOS versions supporting MLX
- - **Transformers**: Version 4.52.4+
-
- ## License
-
- Apache 2.0
-
- ------
-
- # Original model Card: palmyra-mini-thinking-a
+ # Model: palmyra-mini-thinking-a
  
  ## Model Details