# Qwen3 Model with Falcon Tokenizer - Usage Guide

## Model Details

- **Architecture**: Qwen3 (Grouped Query Attention, RMS Norm, Q/K Norm, RoPE)
- **Tokenizer**: Falcon-H1-0.5B-Instruct (32K vocabulary)
- **Special Tokens** (verified in the sketch after this list):
  - BOS: `<|begin_of_text|>` (id: 17)
  - EOS: `<|end_of_text|>` (id: 11)
  - PAD: `<|pad|>` (id: 0)
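
A minimal sketch for confirming the tokenizer setup matches the values above. The `tiiuae/Falcon-H1-0.5B-Instruct` hub ID is an assumption; point `from_pretrained` at wherever this model's tokenizer actually lives:

```python
from transformers import AutoTokenizer

# Assumed hub ID; substitute the local path to this model's tokenizer if needed.
tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon-H1-0.5B-Instruct")

# These should match the special-token IDs documented above.
print(tokenizer.bos_token, tokenizer.bos_token_id)  # expected: <|begin_of_text|> 17
print(tokenizer.eos_token, tokenizer.eos_token_id)  # expected: <|end_of_text|> 11
print(tokenizer.pad_token, tokenizer.pad_token_id)  # expected: <|pad|> 0
print(len(tokenizer))                               # expected: 32768 (32K)
```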

## Important Notes

1. This model combines the Qwen3 architecture with the Falcon tokenizer.
2. The vocabulary size is 32K (the Falcon standard).
3. The model uses Qwen3-specific features such as q_norm/k_norm layers.
4. All token IDs must fall within the 0-32767 range (see the validation sketch below).
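
Because the embedding table only covers IDs 0 through 32767, a CPU-side range check before any forward pass is far easier to debug than the device-side assert an out-of-range ID triggers on the GPU. A minimal sketch (the helper name is mine, not part of any API):

```python
import torch

VOCAB_SIZE = 32768  # Falcon 32K vocabulary

def check_token_ids(input_ids: torch.Tensor) -> None:
    """Raise early if any token ID falls outside the model's embedding table."""
    if input_ids.numel() == 0:
        return
    lo, hi = int(input_ids.min()), int(input_ids.max())
    if lo < 0 or hi >= VOCAB_SIZE:
        raise ValueError(f"token IDs must lie in [0, {VOCAB_SIZE - 1}], got [{lo}, {hi}]")
```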

## Batch Processing Tips

- Use conservative batch sizes (start with 1-4 and scale up as memory allows)
- Ensure all sequences in a batch are padded to the same length
- Monitor CUDA memory usage (e.g., with `torch.cuda.memory_allocated()`)
- Wrap inference in `torch.no_grad()`, as shown in the sketch after this list
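
Putting these tips together, here is a sketch of a batched generation loop under stated assumptions: the checkpoint and tokenizer paths are placeholders, a CUDA device is available, and the generation parameters are illustrative rather than tuned:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder paths; point these at the actual checkpoint and tokenizer.
model = AutoModelForCausalLM.from_pretrained("path/to/qwen3-falcon-model").cuda().eval()
tokenizer = AutoTokenizer.from_pretrained("path/to/falcon-tokenizer")

prompts = ["Hello, world!", "Explain grouped query attention."]

# Left padding keeps each prompt adjacent to its generated continuation
# when decoding a batch with a causal LM.
tokenizer.padding_side = "left"
batch = tokenizer(prompts, return_tensors="pt", padding=True).to("cuda")

with torch.no_grad():  # inference only: skip autograd bookkeeping
    output_ids = model.generate(
        **batch,
        max_new_tokens=64,
        pad_token_id=tokenizer.pad_token_id,
    )

print(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
```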

## Common Issues

- If CUDA indexing errors occur, check that every token ID is < 32768 (the locator sketch after this list can pinpoint offenders)
- Ensure padded positions use the `<|pad|>` token (id 0)
- Tokenize all sequences in a batch with the same tokenizer and settings
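
If the range check above fails on a batch, a locator like the sketch below (the function name is hypothetical) reports exactly which row and position hold an out-of-range ID, which is usually faster than bisecting the batch by hand:

```python
import torch

VOCAB_SIZE = 32768

def find_bad_tokens(input_ids: torch.Tensor) -> list[tuple[int, int, int]]:
    """Return (row, position, token_id) for every out-of-range ID in a 2-D batch."""
    bad = ((input_ids < 0) | (input_ids >= VOCAB_SIZE)).nonzero(as_tuple=False)
    return [(int(r), int(c), int(input_ids[r, c])) for r, c in bad]

# Example: row 1, column 1 holds an ID beyond the 32K vocabulary.
ids = torch.tensor([[17, 5, 9, 11], [17, 40000, 3, 11]])
print(find_bad_tokens(ids))  # [(1, 1, 40000)]
```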