---
metrics:
name: arc:easy
value: 27.36
---
---
# Qwen3 16M Model with Falcon-H1-0.5B-Instruct Tokenizer

## Model Description
This is a Qwen3-architecture model in the 16M size class that uses the Falcon-H1-0.5B-Instruct tokenizer (32K vocabulary).

- **Architecture**: Qwen3 (Grouped Query Attention, RMS Normalization, Q/K Normalization, RoPE)
- **Tokenizer**: Falcon-H1-0.5B-Instruct (32K vocab)
- **Parameters**: 11,014,272
- **Precision**: BF16
- **Format**: SafeTensors
- **Vocabulary Size**: 32768
- **Use Case**: Desktop applications, balanced performance (nominal 16M size class)

## Configuration
- vocab_size: 32768
- hidden_size: 128
- num_attention_heads: 16
- num_key_value_heads: 4
- num_hidden_layers: 8
- intermediate_size: 512
- head_dim: 128
- max_position_embeddings: 8192

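
The configuration values can be cross-checked against the listed parameter count with a little arithmetic. The sketch below is not taken from the model's code: the function name `qwen3_param_count` is hypothetical, and it assumes bias-free attention and MLP projections, a gated MLP with three projection matrices, one RMSNorm weight vector per norm, and (by default) input embeddings tied to the output head. Under those assumptions it reproduces 11,014,272.

```python
# Configuration values copied from the list above.
CONFIG = {
    "vocab_size": 32768,
    "hidden_size": 128,
    "num_attention_heads": 16,
    "num_key_value_heads": 4,
    "num_hidden_layers": 8,
    "intermediate_size": 512,
    "head_dim": 128,
}


def qwen3_param_count(cfg, tie_embeddings=True):
    """Estimate the parameter count (hypothetical helper, assumptions in the text)."""
    h = cfg["hidden_size"]
    d = cfg["head_dim"]
    q_dim = cfg["num_attention_heads"] * d       # width of the query projection
    kv_dim = cfg["num_key_value_heads"] * d      # width of each key/value projection

    attention = h * q_dim + 2 * h * kv_dim + q_dim * h  # q, k, v, o projections
    qk_norm = 2 * d                                      # Q/K RMSNorm weights
    mlp = 3 * h * cfg["intermediate_size"]               # gate, up, down projections
    layer_norms = 2 * h                                  # pre-attention and pre-MLP norms
    per_layer = attention + qk_norm + mlp + layer_norms

    embeddings = cfg["vocab_size"] * h
    lm_head = 0 if tie_embeddings else embeddings        # separate head only if untied
    final_norm = h
    return embeddings + cfg["num_hidden_layers"] * per_layer + final_norm + lm_head


print(qwen3_param_count(CONFIG))                        # 11014272
print(qwen3_param_count(CONFIG, tie_embeddings=False))  # 15208576
```

Note that `head_dim` is set explicitly to 128 rather than `hidden_size / num_attention_heads`, so the query projection is 2048 wide. With an untied output head the same formula gives 15,208,576, which is closer to the nominal 16M figure.
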
## Special Tokens
- BOS: `<|begin_of_text|>` (id: 17)
- EOS: `<|end_of_text|>` (id: 11)
- PAD: `<|pad|>` (id: 0)

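
Because the tokenizer comes from a different model family than the architecture, it can be worth confirming that the loaded tokenizer maps these strings to the IDs listed above. The following is a minimal sketch: `find_special_token_mismatches` and `EXPECTED_SPECIAL_TOKENS` are hypothetical names, and the only tokenizer method it relies on is `convert_tokens_to_ids`.

```python
# Token strings and IDs copied from the list above.
EXPECTED_SPECIAL_TOKENS = {
    "<|begin_of_text|>": 17,
    "<|end_of_text|>": 11,
    "<|pad|>": 0,
}


def find_special_token_mismatches(tokenizer, expected=EXPECTED_SPECIAL_TOKENS):
    """Return {token: (expected_id, actual_id)} for every token whose ID differs."""
    mismatches = {}
    for token, expected_id in expected.items():
        actual_id = tokenizer.convert_tokens_to_ids(token)
        if actual_id != expected_id:
            mismatches[token] = (expected_id, actual_id)
    return mismatches
```

An empty result from `find_special_token_mismatches(tokenizer)` means the loaded tokenizer agrees with the table; any entry points to a token whose ID should be checked before training or generation.
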
## Usage
```python
import torch
from transformers import Qwen3ForCausalLM, AutoTokenizer

model = Qwen3ForCausalLM.from_pretrained("./workspace/16m-falcon-tokenizer")
tokenizer = AutoTokenizer.from_pretrained("./workspace/16m-falcon-tokenizer")
model.eval()

# Generate text from a single prompt
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Batch processing (start small)
# Decoder-only models continue from the last position, so pad on the left.
tokenizer.padding_side = "left"
texts = ["Hello", "How are you", "Good morning"]
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```

## Important Notes
- The model uses the Qwen3 architecture with the Falcon tokenizer (32K vocabulary).
- All token IDs must be < 32768, the size of the embedding table; out-of-range IDs fail in the embedding lookup, which surfaces as CUDA errors on GPU.
- Start with small batch sizes (1-4) and increase gradually.
- Pad batched inputs (`padding=True`) so every sequence in a batch has the same length; this prevents dimension mismatches.
- The model is initialized with random weights and requires fine-tuning.
- The model is compatible with Qwen3 APIs but uses the Falcon vocabulary.
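
The token ID constraint above can be checked on the CPU before a batch reaches the model, which gives a clearer message than a device-side error. This is a minimal sketch with a hypothetical helper, `find_out_of_range_ids`; it works on plain nested lists, so a tensor would first be converted with `.tolist()`.

```python
VOCAB_SIZE = 32768  # vocabulary size from the configuration


def find_out_of_range_ids(batch, vocab_size=VOCAB_SIZE):
    """Return (row, column, token_id) for every ID outside [0, vocab_size)."""
    bad = []
    for row, sequence in enumerate(batch):
        for col, token_id in enumerate(sequence):
            if not 0 <= token_id < vocab_size:
                bad.append((row, col, token_id))
    return bad


# Example with plain lists: the second sequence contains one invalid ID.
print(find_out_of_range_ids([[17, 5, 11], [17, 32768, 0]]))  # [(1, 1, 32768)]
```

With the tokenizer output from the usage section, the call would look like `find_out_of_range_ids(inputs["input_ids"].tolist())`; an empty list means the batch is safe to pass to the model.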