# Qwen3 8M Model with Falcon-H1-0.5B-Instruct Tokenizer
## Model Description
This model pairs a small Qwen3 architecture with the Falcon-H1-0.5B-Instruct tokenizer (32K vocabulary).
- **Architecture**: Qwen3 (Grouped Query Attention, RMS Normalization, Q/K Normalization, RoPE)
- **Tokenizer**: Falcon-H1-0.5B-Instruct (32K vocab)
- **Parameters**: 2,183,552 (≈2.2M)
- **Precision**: BF16
- **Format**: SafeTensors
- **Vocabulary Size**: 32768
## Configuration
- `vocab_size`: 32768
- `hidden_size`: 64
- `num_attention_heads`: 4
- `num_key_value_heads`: 2
- `num_hidden_layers`: 2
- `intermediate_size`: 160
- `head_dim`: 16
- `max_position_embeddings`: 4096
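If you want to rebuild the architecture from scratch rather than load the checkpoint, the values above map directly onto `Qwen3Config`. A minimal sketch; note that `tie_word_embeddings=True` is an assumption on our part (with tied input/output embeddings this configuration works out to exactly 2,183,552 parameters):

```python
from transformers import Qwen3Config, Qwen3ForCausalLM

# Values copied from the configuration above; tie_word_embeddings=True is an
# assumption -- with tied embeddings the parameter count matches 2,183,552.
config = Qwen3Config(
    vocab_size=32768,
    hidden_size=64,
    num_attention_heads=4,
    num_key_value_heads=2,
    num_hidden_layers=2,
    intermediate_size=160,
    head_dim=16,
    max_position_embeddings=4096,
    tie_word_embeddings=True,
)
model = Qwen3ForCausalLM(config)
print(model.num_parameters())  # expected: 2,183,552
```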
## Special Tokens
- BOS: `<|begin_of_text|>` (id: 17)
- EOS: `<|end_of_text|>` (id: 11)
- PAD: `<|pad|>` (id: 0)
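Because the tokenizer was swapped in from a different model family, it is worth verifying the special-token mapping once after loading. A small sanity check, using the same local path as the Usage section below:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./workspace/qwen3-8m-falcon-tokenizer")

# Confirm the special-token mapping matches the table above.
assert tokenizer.bos_token_id == 17
assert tokenizer.eos_token_id == 11
assert tokenizer.pad_token_id == 0
print(tokenizer.convert_ids_to_tokens([17, 11, 0]))
# expected: ['<|begin_of_text|>', '<|end_of_text|>', '<|pad|>']
```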
## Usage
```python
import torch
from transformers import Qwen3ForCausalLM, AutoTokenizer

model = Qwen3ForCausalLM.from_pretrained("./workspace/qwen3-8m-falcon-tokenizer")
tokenizer = AutoTokenizer.from_pretrained("./workspace/qwen3-8m-falcon-tokenizer")

# Generate text
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Batch processing (start small); decoder-only models should be left-padded
# for generation so new tokens are appended directly after each prompt.
tokenizer.padding_side = "left"
texts = ["Hello", "How are you", "Good morning"]
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=20)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```
## Important Notes
- The model uses the Qwen3 architecture with the Falcon tokenizer (32K vocabulary).
- All token IDs must be < 32768; out-of-range IDs trigger a device-side assert in the CUDA embedding lookup (see the guard sketch after this list).
- Start with small batch sizes (1-4) and increase gradually.
- Pad batched inputs properly to avoid dimension mismatches; use left padding for batched generation.
- The model is initialized with random weights and must be trained or fine-tuned before it produces useful output (a minimal training sketch follows).
- Compatible with the Qwen3 APIs but uses the Falcon vocabulary.
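Out-of-range token IDs typically surface on GPU only as an opaque device-side assert, so a cheap CPU-side check before moving inputs to the device can save debugging time. A minimal sketch; the helper name `check_token_ids` is ours, not part of any library:

```python
import torch
from transformers import AutoTokenizer

VOCAB_SIZE = 32768  # from the configuration above

def check_token_ids(input_ids: torch.Tensor, vocab_size: int = VOCAB_SIZE) -> None:
    """Fail fast on the CPU instead of hitting a CUDA device-side assert later."""
    lo, hi = int(input_ids.min()), int(input_ids.max())
    if lo < 0 or hi >= vocab_size:
        raise ValueError(f"token ids must lie in [0, {vocab_size}), got [{lo}, {hi}]")

tokenizer = AutoTokenizer.from_pretrained("./workspace/qwen3-8m-falcon-tokenizer")
inputs = tokenizer("Hello, world!", return_tensors="pt")
check_token_ids(inputs["input_ids"])  # raises before the model ever sees bad ids
```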
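Since the checkpoint ships with random weights, here is a minimal causal-LM training loop to get started. The toy corpus, learning rate, and step count are placeholders for illustration, not recommendations; substitute a real dataset and tune the hyperparameters:

```python
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, Qwen3ForCausalLM

path = "./workspace/qwen3-8m-falcon-tokenizer"
tokenizer = AutoTokenizer.from_pretrained(path)
model = Qwen3ForCausalLM.from_pretrained(path)
model.train()

# Toy corpus -- replace with a real dataset.
texts = ["Hello, world!", "Good morning.", "How are you today?"]
batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

optimizer = AdamW(model.parameters(), lr=3e-4)  # placeholder hyperparameter
for step in range(10):  # a real run needs far more data and steps
    # For causal LM the model shifts labels internally; mask padding
    # positions with -100 so they do not contribute to the loss.
    labels = batch["input_ids"].masked_fill(batch["attention_mask"] == 0, -100)
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"step {step}: loss {outputs.loss.item():.4f}")
```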