Update README.md

README.md (CHANGED)

---
license: apache-2.0
base_model:
- Writer/palmyra-mini-thinking-a
tags:
- mlx
- qwen2
- palmyra
- thinking
- reasoning
---

# Palmyra Mini Thinking A - MLX BF16

## Model Description

This is a bfloat16-precision version of the [palmyra-mini-thinking-a model](https://huggingface.co/Writer/palmyra-mini-thinking-a), optimized for Apple Silicon using the MLX framework. The model is based on the Qwen2 architecture and is designed for reasoning tasks, emitting explicit step-by-step thinking between the special `<think>` and `</think>` tokens.

## Quick Start

### Installation

```bash
pip install mlx-lm
```

### Usage

```python
from mlx_lm import load, generate

# Load the model (replace the placeholder with your local MLX weights
# directory or the Hugging Face repo id of this model)
model, tokenizer = load("path/to/palmyra-mini-thinking-a/MLX")

# Generate text with thinking
prompt = "Solve this step by step: What is 15% of 240?"
response = generate(model, tokenizer, prompt=prompt, verbose=True, max_tokens=512)
print(response)
```

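Because the chat template automatically appends a `<think>` tag to generation prompts (see the Chat Template section below), it is usually more reliable to format prompts with `apply_chat_template` than to concatenate raw strings. A minimal sketch, reusing the placeholder model path from above:

```python
from mlx_lm import load, generate

model, tokenizer = load("path/to/palmyra-mini-thinking-a/MLX")

# Format the conversation through the chat template so the role tokens
# and the <think> prefix are inserted exactly as the model expects
messages = [{"role": "user", "content": "What is 15% of 240?"}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)

response = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(response)
```
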
## Technical Specifications

### Model Architecture
- **Model Type**: `qwen2` (Qwen2 architecture)
- **Architecture**: `Qwen2ForCausalLM`
- **Parameters**: ~1.7 billion
- **Precision**: bfloat16
- **Specialization**: Reasoning and thinking tasks

### Core Parameters

| Parameter | Value |
|-----------|-------|
| Hidden Size | 1,536 |
| Intermediate Size | 8,960 |
| Number of Layers | 28 |
| Attention Heads | 12 |
| Key-Value Heads | 2 |
| Head Dimension | 128 |
| Vocabulary Size | 151,665 |

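As a quick sanity check, these dimensions are consistent with the ~1.7 billion parameter figure above. A back-of-the-envelope estimate, assuming the standard Qwen2 layout (untied embeddings, grouped-query attention, gated MLP) and ignoring biases and norm weights:

```python
hidden, inter, layers, vocab = 1536, 8960, 28, 151665
kv_heads, head_dim = 2, 128

attn = hidden * hidden * 2                  # q_proj + o_proj
attn += hidden * kv_heads * head_dim * 2    # k_proj + v_proj (GQA)
mlp = hidden * inter * 3                    # gate, up, and down projections
embed = vocab * hidden * 2                  # input embeddings + untied lm_head

total = layers * (attn + mlp) + embed
print(f"{total / 1e9:.2f}B parameters")     # ≈ 1.78B, i.e. the quoted ~1.7B
```
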
### Attention Mechanism
- **Attention Type**: Full attention across all 28 layers
- **Max Position Embeddings**: 131,072 tokens
- **Attention Dropout**: 0.0
- **Sliding Window**: Not used
- **Max Window Layers**: 21 (inactive, since sliding-window attention is disabled)

### RoPE (Rotary Position Embedding) Configuration
- **RoPE Theta**: 10,000
- **RoPE Scaling**: None

### Thinking Capabilities
- **Thinking Tokens**: `<think>` (ID 151648) and `</think>` (ID 151649)
- **Reasoning Mode**: Explicit step-by-step reasoning
- **Chat Template**: Automatically appends a `<think>` tag to generation prompts

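Since the chain of thought is emitted inline, downstream code usually wants only the text after the closing `</think>`. A minimal sketch of such a post-processing helper (the function is ours for illustration, not part of the model's tooling):

```python
import re

def strip_thinking(text: str) -> str:
    """Remove <think>...</think> spans, keeping only the final answer."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

raw = "<think>15% of 240 is 0.15 * 240 = 36.</think>The answer is 36."
print(strip_thinking(raw))  # -> The answer is 36.
```
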
### File Structure

```
palmyra-mini-thinking-a/MLX/
├── config.json                   # Model configuration
├── model.safetensors             # Model weights (3.3GB)
├── model.safetensors.index.json  # Model sharding index
├── tokenizer.json                # Tokenizer configuration
├── tokenizer_config.json         # Tokenizer settings
├── special_tokens_map.json       # Special tokens mapping
├── chat_template.jinja           # Chat template with thinking
└── README.md                     # Model documentation
```

## Performance Characteristics

### Hardware Requirements
- **Platform**: Apple Silicon (M1, M2, M3, M4 series)
- **Memory**: ~3.3GB for model weights
- **Recommended RAM**: 12GB+ for optimal performance
- **Precision**: Full bfloat16 precision

### Layer Configuration
All 28 layers use the full attention mechanism, as specified in the `layer_types` configuration, providing consistent attention patterns across the entire model depth.

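This is easy to verify against the shipped `config.json`; a small sketch, assuming the placeholder path from the Quick Start and that the entries are the string `"full_attention"`:

```python
import json

# Inspect the per-layer attention types recorded in the config
with open("path/to/palmyra-mini-thinking-a/MLX/config.json") as f:
    cfg = json.load(f)

layer_types = cfg["layer_types"]
print(len(layer_types))                                 # expect 28
print(all(t == "full_attention" for t in layer_types))  # expect True
```
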
## Training Details

### Tokenizer
- **Type**: LlamaTokenizerFast with a 151,665-token vocabulary
- **Special Tokens**:
  - BOS Token ID: 151646
  - EOS Token ID: 151643
  - Pad Token ID: 151643
  - Think Start: 151648 (`<think>`)
  - Think End: 151649 (`</think>`)

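The special-token IDs above can be confirmed from the loaded tokenizer; a minimal sketch, again with the placeholder model path (the `mlx_lm` tokenizer wrapper forwards Hugging Face tokenizer methods):

```python
from mlx_lm import load

model, tokenizer = load("path/to/palmyra-mini-thinking-a/MLX")

print(tokenizer.convert_tokens_to_ids("<think>"))   # expect 151648
print(tokenizer.convert_tokens_to_ids("</think>"))  # expect 151649
print(tokenizer.eos_token_id)                       # expect 151643
```
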
### Model Configuration
- **Hidden Activation**: SiLU (Swish)
- **Normalization**: RMSNorm (ε = 1e-06)
- **Initializer Range**: 0.02
- **Attention Dropout**: 0.0
- **Word Embeddings**: Not tied
- **Use Cache**: `false` in the shipped config (see Known Limitations)

### Chat Template
The model uses a specialized chat template that automatically initiates thinking mode:
- User messages: prefixed with `<|User|>`
- Assistant messages: prefixed with `<|Assistant|><think>\n` (the thinking prompt is added automatically)
- Tool calling support with `<tool_call>` and `</tool_call>` tokens
- Vision and multimodal tokens included

## Usage Examples

### Reasoning Task
```python
prompt = """
A train travels 120 miles in 2 hours. If it maintains the same speed, how far will it travel in 5 hours?
<|Assistant|><think>
"""

response = generate(model, tokenizer, prompt=prompt, max_tokens=300)
```

### Problem Solving
```python
prompt = """
Explain why the sky appears blue during the day.
<|Assistant|><think>
"""

response = generate(model, tokenizer, prompt=prompt, max_tokens=400)
```

## Known Limitations

1. **Platform Dependency**: Optimized specifically for Apple Silicon; may not run on other platforms
2. **Memory Requirements**: Requires significant memory due to full-precision weights
3. **Thinking Overhead**: Explicit thinking may increase response length and generation time
4. **Cache Disabled**: The config ships with `use_cache: false`, which may reduce inference speed

## Compatibility

- **MLX-LM**: Requires a recent version with Qwen2 support
- **Apple Silicon**: M1, M2, M3, M4 series processors
- **macOS**: Compatible with recent macOS versions supporting MLX
- **Transformers**: Version 4.52.4+

## License

Apache 2.0

---

# Original Model Card: palmyra-mini-thinking-a

## Model Details