# Qwen3 8M Model with Falcon-H1-0.5B-Instruct Tokenizer

## Model Description

This is an 8M-parameter Qwen3 model combined with the Falcon-H1-0.5B-Instruct tokenizer.

- **Architecture**: Qwen3 (transformer with Grouped-Query Attention, RMSNorm, Q/K normalization, RoPE)
- **Tokenizer**: Falcon-H1-0.5B-Instruct
- **Parameters**: 2,183,552
- **Precision**: BF16
- **Format**: SafeTensors
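
As a quick sanity check, here is a minimal sketch (assuming the checkpoint lives at the hypothetical local path used in the Usage section below) that loads the model in BF16 and prints the parameter count and dtype reported above:

```python
import torch
from transformers import Qwen3ForCausalLM

# Hypothetical local path; point this at wherever the checkpoint is stored.
model = Qwen3ForCausalLM.from_pretrained(
    "./workspace/qwen3-8m-falcon-tokenizer",
    torch_dtype=torch.bfloat16,
)

# Total parameter count and storage dtype should match the values listed above.
print(f"parameters: {sum(p.numel() for p in model.parameters()):,}")  # 2,183,552
print(f"dtype: {next(model.parameters()).dtype}")                     # torch.bfloat16
```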
## Configuration

- `vocab_size`: 32768
- `hidden_size`: 64
- `num_attention_heads`: 4
- `num_key_value_heads`: 2
- `num_hidden_layers`: 2
- `intermediate_size`: 160
- `head_dim`: 16
- `max_position_embeddings`: 4096
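
To rebuild the architecture from scratch (for example, to re-initialize the weights), the values above can be mapped onto `Qwen3Config` from transformers roughly as follows. This is a sketch assuming a transformers release with Qwen3 support; any field not set here keeps the library default:

```python
from transformers import Qwen3Config, Qwen3ForCausalLM

# Assumed mapping of the configuration listed above onto Qwen3Config.
config = Qwen3Config(
    vocab_size=32768,
    hidden_size=64,
    num_attention_heads=4,
    num_key_value_heads=2,
    num_hidden_layers=2,
    intermediate_size=160,
    head_dim=16,
    max_position_embeddings=4096,
)

# Instantiates a randomly initialized model with this shape (no pretrained weights).
model = Qwen3ForCausalLM(config)
print(sum(p.numel() for p in model.parameters()))
```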
## Usage

```python
from transformers import Qwen3ForCausalLM, AutoTokenizer

model = Qwen3ForCausalLM.from_pretrained("./workspace/qwen3-8m-falcon-tokenizer")
tokenizer = AutoTokenizer.from_pretrained("./workspace/qwen3-8m-falcon-tokenizer")

# Generate text
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Notes

- This model uses the Qwen3 architecture but pairs it with the Falcon-H1-0.5B-Instruct tokenizer instead of the standard Qwen tokenizer.
- The model is initialized with random weights and should be fine-tuned for specific tasks; see the minimal fine-tuning sketch below.
- Compatible with the Qwen3 model family APIs and interfaces.
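
As a starting point for that fine-tuning, here is a minimal sketch in plain PyTorch. It assumes the hypothetical local checkpoint path from the Usage section and a toy in-memory dataset; a real run would use a proper dataset, data loader, and training schedule:

```python
import torch
from transformers import Qwen3ForCausalLM, AutoTokenizer

path = "./workspace/qwen3-8m-falcon-tokenizer"  # hypothetical local path
model = Qwen3ForCausalLM.from_pretrained(path)
tokenizer = AutoTokenizer.from_pretrained(path)

# The tokenizer may not define a pad token; fall back to EOS if so (assumption).
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Toy training data standing in for a real dataset.
texts = [
    "The capital of France is Paris.",
    "Water boils at 100 degrees Celsius at sea level.",
]
batch = tokenizer(texts, return_tensors="pt", padding=True)

# For causal LM training, labels are the input ids with padding masked out.
labels = batch["input_ids"].clone()
labels[batch["attention_mask"] == 0] = -100
batch["labels"] = labels

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

model.train()
for step in range(10):  # a few steps just to show the loop shape
    outputs = model(**batch)
    loss = outputs.loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"step {step}: loss {loss.item():.4f}")
```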