# Qwen3 8M Model with Falcon-H1-0.5B-Instruct Tokenizer

## Model Description

This is an 8M-parameter Qwen3 model combined with the Falcon-H1-0.5B-Instruct tokenizer.

- **Architecture**: Qwen3 (transformer with Grouped-Query Attention, RMSNorm, Q/K normalization, RoPE)
- **Tokenizer**: Falcon-H1-0.5B-Instruct
- **Parameters**: 2,183,552
- **Precision**: BF16
- **Format**: SafeTensors
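
As a quick sanity check, here is a minimal sketch (assuming the checkpoint lives at the hypothetical local path used in the Usage section below) that loads the model in BF16 and prints the parameter count and dtype reported above:

```python
import torch
from transformers import Qwen3ForCausalLM

# Hypothetical local path; point this at wherever the checkpoint is stored.
model = Qwen3ForCausalLM.from_pretrained(
    "./workspace/qwen3-8m-falcon-tokenizer",
    torch_dtype=torch.bfloat16,
)

# Total parameter count and storage dtype should match the values listed above.
print(f"parameters: {sum(p.numel() for p in model.parameters()):,}")  # 2,183,552
print(f"dtype: {next(model.parameters()).dtype}")                     # torch.bfloat16
```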
## Configuration

- `vocab_size`: 32768
- `hidden_size`: 64
- `num_attention_heads`: 4
- `num_key_value_heads`: 2
- `num_hidden_layers`: 2
- `intermediate_size`: 160
- `head_dim`: 16
- `max_position_embeddings`: 4096
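
To rebuild the architecture from scratch (for example, to re-initialize the weights), the values above can be mapped onto `Qwen3Config` from transformers roughly as follows. This is a sketch assuming a transformers release with Qwen3 support; any field not set here keeps the library default:

```python
from transformers import Qwen3Config, Qwen3ForCausalLM

# Assumed mapping of the configuration listed above onto Qwen3Config.
config = Qwen3Config(
    vocab_size=32768,
    hidden_size=64,
    num_attention_heads=4,
    num_key_value_heads=2,
    num_hidden_layers=2,
    intermediate_size=160,
    head_dim=16,
    max_position_embeddings=4096,
)

# Instantiates a randomly initialized model with this shape (no pretrained weights).
model = Qwen3ForCausalLM(config)
print(sum(p.numel() for p in model.parameters()))
```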
## Usage

```python
from transformers import Qwen3ForCausalLM, AutoTokenizer

model = Qwen3ForCausalLM.from_pretrained("./workspace/qwen3-8m-falcon-tokenizer")
tokenizer = AutoTokenizer.from_pretrained("./workspace/qwen3-8m-falcon-tokenizer")

# Generate text
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Notes

- This model uses the Qwen3 architecture but pairs it with the Falcon-H1-0.5B-Instruct tokenizer instead of the standard Qwen tokenizer.
- The model is initialized with random weights and should be fine-tuned for specific tasks; see the minimal fine-tuning sketch below.
- Compatible with the Qwen3 model family APIs and interfaces.
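
As a starting point for that fine-tuning, here is a minimal sketch in plain PyTorch. It assumes the hypothetical local checkpoint path from the Usage section and a toy in-memory dataset; a real run would use a proper dataset, data loader, and training schedule:

```python
import torch
from transformers import Qwen3ForCausalLM, AutoTokenizer

path = "./workspace/qwen3-8m-falcon-tokenizer"  # hypothetical local path
model = Qwen3ForCausalLM.from_pretrained(path)
tokenizer = AutoTokenizer.from_pretrained(path)

# The tokenizer may not define a pad token; fall back to EOS if so (assumption).
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Toy training data standing in for a real dataset.
texts = [
    "The capital of France is Paris.",
    "Water boils at 100 degrees Celsius at sea level.",
]
batch = tokenizer(texts, return_tensors="pt", padding=True)

# For causal LM training, labels are the input ids with padding masked out.
labels = batch["input_ids"].clone()
labels[batch["attention_mask"] == 0] = -100
batch["labels"] = labels

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

model.train()
for step in range(10):  # a few steps just to show the loop shape
    outputs = model(**batch)
    loss = outputs.loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"step {step}: loss {loss.item():.4f}")
```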