C10X
/

test50

Model card Files Files and versions

test50 / README.md

C10X's picture

Update README.md

83aab6c verified 4 months ago

|

history blame contribute delete

1.94 kB

	---
	metrics:
	name: arc:easy
	value: 27.36
	---
	---
	# Qwen3 16M Model with Falcon-H1-0.5B-Instruct Tokenizer

	## Model Description
	This is a 16M parameter Qwen3 model architecture combined with the Falcon-H1-0.5B-Instruct tokenizer (32K vocabulary).

	- Architecture: Qwen3 (Grouped Query Attention, RMS Normalization, Q/K Normalization, RoPE)
	- Tokenizer: Falcon-H1-0.5B-Instruct (32K vocab)
	- Parameters: 11,014,272
	- Precision: BF16
	- Format: SafeTensors
	- Vocabulary Size: 32768
	- Use Case: Desktop applications, balanced performance (true 16M params)

	## Configuration
	- vocab_size: 32768
	- hidden_size: 128
	- num_attention_heads: 16
	- num_key_value_heads: 4
	- num_hidden_layers: 8
	- intermediate_size: 512
	- head_dim: 128
	- max_position_embeddings: 8192

	## Special Tokens
	- BOS: <\|begin_of_text\|> (id: 17)
	- EOS: <\|end_of_text\|> (id: 11)
	- PAD: <\|pad\|> (id: 0)

	## Usage
	```python
	from transformers import Qwen3ForCausalLM, AutoTokenizer

	model = Qwen3ForCausalLM.from_pretrained("./workspace/16m-falcon-tokenizer")
	tokenizer = AutoTokenizer.from_pretrained("./workspace/16m-falcon-tokenizer")

	# Generate text
	inputs = tokenizer("Hello, world!", return_tensors="pt")
	outputs = model.generate(**inputs, max_new_tokens=50)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))

	# Batch processing (start small)
	texts = ["Hello", "How are you", "Good morning"]
	inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
	with torch.no_grad():
	outputs = model.generate(**inputs, max_new_tokens=20)
	```

	## Important Notes
	- Model uses Qwen3 architecture with Falcon tokenizer (32K vocabulary)
	- All token IDs must be < 32768 to avoid CUDA errors
	- Start with small batch sizes (1-4) and gradually increase
	- Use proper padding to prevent dimension mismatches
	- Model initialized with random weights - requires fine-tuning
	- Compatible with Qwen3 APIs but uses Falcon vocabulary