Qwen3-8B-Instruct
This model is the language model component extracted from Qwen/Qwen3-VL-8B-Instruct, a vision-language model.
The vision components have been removed, leaving only the pure text-generation LLM, which can be used independently for text-only tasks.
Model Details
- Base Model: Qwen3-VL-8B-Instruct (language component only)
- Model Type: Qwen3ForCausalLM
- Parameters: ~8.2B (8,190,735,360)
- Model Size: ~16GB
- Precision: bfloat16
- License: Apache 2.0
Architecture
- Hidden Size: 4096
- Intermediate Size: 12288
- Number of Layers: 36
- Attention Heads: 32 (8 KV heads, GQA)
- Head Dimension: 128
- Vocabulary Size: 151,936
- Max Position Embeddings: 262,144
- RoPE Theta: 5,000,000
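The parameter count above can be roughly cross-checked from these architecture numbers. The sketch below is a back-of-the-envelope estimate assuming a standard Qwen3-style layout (GQA attention projections, a gated three-matrix MLP, and untied embeddings and LM head); it deliberately ignores the small normalization weights, so it lands slightly under the exact figure:

```python
# Back-of-the-envelope parameter count from the architecture table.
# Assumes untied embeddings/lm_head and ignores norm weights (small).
hidden = 4096
intermediate = 12288
layers = 36
heads, kv_heads, head_dim = 32, 8, 128
vocab = 151936

attn = hidden * heads * head_dim           # q_proj
attn += 2 * hidden * kv_heads * head_dim   # k_proj, v_proj (GQA: 8 KV heads)
attn += heads * head_dim * hidden          # o_proj
mlp = 3 * hidden * intermediate            # gate_proj, up_proj, down_proj
per_layer = attn + mlp

embeddings = vocab * hidden                # input token embeddings
lm_head = vocab * hidden                   # output head (assumed untied)

total = layers * per_layer + embeddings + lm_head
print(f"{total:,} parameters")             # within ~0.01% of 8,190,735,360
print(f"~{total * 2 / 1e9:.1f} GB in bfloat16 (2 bytes/param)")
```

The small gap to the exact count (8,190,735,360) is accounted for by the per-layer RMSNorm weights, which this estimate omits. The bfloat16 size also lines up with the ~16GB figure above.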
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "alexchen4ai/Qwen3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)
# Decode only the newly generated tokens, skipping the echoed prompt
response = tokenizer.decode(
    outputs[0][inputs.input_ids.shape[-1]:],
    skip_special_tokens=True,
)
print(response)
```
Extraction Process
This model was extracted from Qwen3-VL-8B-Instruct by:
- Loading all safetensors shards from the original model
- Filtering and extracting only the `model.language_model.*` weights
- Renaming keys to standard Qwen3 format (`model.*`)
- Preserving the `lm_head` for token prediction
- Creating a compatible Qwen3ForCausalLM config
- Copying tokenizer files and generation config
Differences from Original
- Removed: All vision encoder components (`model.visual.*`)
- Removed: Vision-language projection layers
- Kept: Pure language model transformer layers
- Kept: Token embeddings and LM head
- Kept: All tokenizer files
Use Cases
This extracted model is suitable for:
- Pure text generation tasks
- Instruction following
- Chat applications
- Fine-tuning on text-only datasets
- Integration with frameworks expecting standard causal LMs
- Lower memory usage compared to the full VL model
Limitations
- This model does not support vision inputs (images/videos)
- For vision-language tasks, use the original Qwen3-VL-8B-Instruct
Citation
If you use this model, please cite the original Qwen3-VL work:
```bibtex
@article{qwen3vl,
  title={Qwen3-VL: Towards Versatile Vision-Language Understanding},
  author={Qwen Team},
  year={2024}
}
```
Acknowledgments
- Original model by Qwen Team / Alibaba Cloud
- Extraction performed for easier deployment in text-only scenarios