# Qwen3 8M Model with Falcon-H1-0.5B-Instruct Tokenizer

## Model Description

This is an 8M Qwen3 model architecture combined with the Falcon-H1-0.5B-Instruct tokenizer.

- **Architecture**: Qwen3 (transformer with Grouped Query Attention, RMS Normalization, Q/K Normalization, RoPE)
- **Tokenizer**: Falcon-H1-0.5B-Instruct
- **Parameters**: 2,183,552
- **Precision**: BF16
- **Format**: SafeTensors

## Configuration

- vocab_size: 32768
- hidden_size: 64
- num_attention_heads: 4
- num_key_value_heads: 2
- num_hidden_layers: 2
- intermediate_size: 160
- head_dim: 16
- max_position_embeddings: 4096

## Usage

```python
from transformers import Qwen3ForCausalLM, AutoTokenizer

model = Qwen3ForCausalLM.from_pretrained("./workspace/qwen3-8m-falcon-tokenizer")
tokenizer = AutoTokenizer.from_pretrained("./workspace/qwen3-8m-falcon-tokenizer")

# Generate text
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Notes

- This model uses the Qwen3 architecture but with the Falcon-H1-0.5B-Instruct tokenizer
- The model is initialized with random weights and should be fine-tuned for specific tasks
- Compatible with the Qwen3 model family APIs and interfaces
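## Parameter Count Check

The stated total of 2,183,552 parameters can be reproduced from the configuration above. This is a back-of-the-envelope sketch based on the standard Qwen3 layer layout (q/k/v/o projections without bias, per-head Q/K RMSNorm of size `head_dim`, gate/up/down MLP projections, and two RMSNorms per layer); it assumes tied input/output embeddings, which the stated total implies but the card does not state explicitly.

```python
# Back-of-the-envelope parameter count from the config values above.
# Assumption: input embedding and lm_head share weights (tied embeddings).
vocab_size, hidden, layers = 32768, 64, 2
n_heads, n_kv_heads, head_dim = 4, 2, 16
intermediate = 160

embedding = vocab_size * hidden  # 2,097,152 (shared with lm_head when tied)

attn = (hidden * n_heads * head_dim          # q_proj
        + 2 * hidden * n_kv_heads * head_dim  # k_proj and v_proj
        + n_heads * head_dim * hidden         # o_proj
        + 2 * head_dim)                       # per-head Q/K RMSNorm weights
mlp = 3 * hidden * intermediate               # gate, up, down projections
norms = 2 * hidden                            # pre-attention and pre-MLP RMSNorm
per_layer = attn + mlp + norms                # 43,168

total = embedding + layers * per_layer + hidden  # + final RMSNorm
print(f"{total:,}")  # 2,183,552 — matches the model card
```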
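## Grouped Query Attention

With 4 attention heads but only 2 key/value heads, this configuration halves the KV cache relative to full multi-head attention: pairs of query heads share one KV head. A minimal sketch of the head-sharing scheme (the index arithmetic is the standard GQA layout, not code from this repository):

```python
# Sketch of GQA head sharing with this model's head counts.
num_attention_heads = 4
num_key_value_heads = 2
group_size = num_attention_heads // num_key_value_heads  # query heads per KV head

# Query head h attends using the keys/values of KV head h // group_size,
# so the KV projections (and KV cache) are half the size of the query side.
kv_head_for_query = [h // group_size for h in range(num_attention_heads)]
print(kv_head_for_query)  # [0, 0, 1, 1]
```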