Qwen3-8B-Instruct

This model is the language model component extracted from Qwen/Qwen3-VL-8B-Instruct, a vision-language model.

The vision components have been removed, leaving only the pure text-generation LLM, which can be used independently for text-only tasks.

Model Details

  • Base Model: Qwen3-VL-8B-Instruct (language component only)
  • Model Type: Qwen3ForCausalLM
  • Parameters: ~8.2B (8,190,735,360)
  • Model Size: ~16 GB
  • Precision: bfloat16
  • License: Apache 2.0

Architecture

  • Hidden Size: 4096
  • Intermediate Size: 12288
  • Number of Layers: 36
  • Attention Heads: 32 (8 KV heads, GQA)
  • Head Dimension: 128
  • Vocabulary Size: 151,936
  • Max Position Embeddings: 262,144
  • RoPE Theta: 5,000,000

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "alexchen4ai/Qwen3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

# Decode only the newly generated tokens, not the echoed prompt
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True
)
print(response)

Extraction Process

This model was extracted from Qwen3-VL-8B-Instruct by:

  1. Loading all safetensors shards from the original model
  2. Filtering and extracting only the model.language_model.* weights
  3. Renaming keys to standard Qwen3 format (model.*)
  4. Preserving the lm_head for token prediction
  5. Creating a compatible Qwen3ForCausalLM config
  6. Copying tokenizer files and generation config

Differences from Original

  • Removed: All vision encoder components (model.visual.*)
  • Removed: Vision-language projection layers
  • Kept: Pure language model transformer layers
  • Kept: Token embeddings and LM head
  • Kept: All tokenizer files

Use Cases

This extracted model is suitable for:

  • Pure text generation tasks
  • Instruction following
  • Chat applications
  • Fine-tuning on text-only datasets
  • Integration with frameworks expecting standard causal LMs
  • Lower memory usage compared to the full VL model

Limitations

  • This model does not support vision inputs (images/videos)
  • For vision-language tasks, use the original Qwen3-VL-8B-Instruct

Citation

If you use this model, please cite the original Qwen3-VL work:

@article{qwen3vl,
  title={Qwen3-VL: Towards Versatile Vision-Language Understanding},
  author={Qwen Team},
  year={2024}
}

Acknowledgments

  • Original model by Qwen Team / Alibaba Cloud
  • Extraction performed for easier deployment in text-only scenarios