Qwen3-8B-Instruct

This model is the language model component extracted from Qwen/Qwen3-VL-8B-Instruct, a vision-language model.

The vision components have been removed, leaving only the pure text-generation LLM, which can be used independently for text-only tasks.

Model Details

  • Base Model: Qwen3-VL-8B-Instruct (language component only)
  • Model Type: Qwen3ForCausalLM
  • Parameters: ~8.2B (8,190,735,360)
  • Model Size: ~16 GB
  • Precision: bfloat16
  • License: Apache 2.0

Architecture

  • Hidden Size: 4096
  • Intermediate Size: 12288
  • Number of Layers: 36
  • Attention Heads: 32 (8 KV heads, GQA)
  • Head Dimension: 128
  • Vocabulary Size: 151,936
  • Max Position Embeddings: 262,144
  • RoPE Theta: 5,000,000

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "alexchen4ai/Qwen3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

# Decode only the newly generated tokens, not the echoed prompt
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True
)
print(response)

Extraction Process

This model was extracted from Qwen3-VL-8B-Instruct by:

  1. Loading all safetensors shards from the original model
  2. Filtering and extracting only the model.language_model.* weights
  3. Renaming keys to standard Qwen3 format (model.*)
  4. Preserving the lm_head for token prediction
  5. Creating a compatible Qwen3ForCausalLM config
  6. Copying tokenizer files and generation config

Differences from Original

  • Removed: All vision encoder components (model.visual.*)
  • Removed: Vision-language projection layers
  • Kept: Pure language model transformer layers
  • Kept: Token embeddings and LM head
  • Kept: All tokenizer files

Use Cases

This extracted model is suitable for:

  • Pure text generation tasks
  • Instruction following
  • Chat applications
  • Fine-tuning on text-only datasets
  • Integration with frameworks expecting standard causal LMs
  • Lower memory usage compared to the full VL model

Limitations

  • This model does not support vision inputs (images/videos)
  • For vision-language tasks, use the original Qwen3-VL-8B-Instruct

Citation

If you use this model, please cite the original Qwen3-VL work:

@article{qwen3vl,
  title={Qwen3-VL: Towards Versatile Vision-Language Understanding},
  author={Qwen Team},
  year={2024}
}

Acknowledgments

  • Original model by Qwen Team / Alibaba Cloud
  • Extraction performed for easier deployment in text-only scenarios