+---
+license: apache-2.0
+base_model: Qwen/Qwen3-VL-8B-Instruct
+tags:
+- qwen3
+- text-generation
+- llm
+- extracted
+language:
+- en
+- zh
+pipeline_tag: text-generation
+---
+# Qwen3-8B-Instruct
+This model is the **language model component** extracted from [Qwen/Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct), a vision-language model.
+The vision components have been removed, leaving only the pure text-generation LLM, which can be used independently for text-only tasks.
+## Model Details
+- **Base Model**: Qwen3-VL-8B-Instruct (language component only)
+- **Model Type**: Qwen3ForCausalLM
+- **Parameters**: ~8.2B (8,190,735,360)
+- **Model Size**: ~16 GB
+- **Precision**: bfloat16
+- **License**: Apache 2.0
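
The ~16 GB figure follows from the parameter count and the precision listed above: bfloat16 stores each parameter in 2 bytes. A quick check, counting weights only (activations and the KV cache at inference time come on top of this):

```python
# Weight-only size estimate from the numbers in Model Details.
num_parameters = 8_190_735_360
bytes_per_parameter = 2  # bfloat16 is 16 bits wide

size_bytes = num_parameters * bytes_per_parameter
size_gb = size_bytes / 1e9     # decimal gigabytes
size_gib = size_bytes / 2**30  # binary gibibytes

print(f"{size_gb:.2f} GB ({size_gib:.2f} GiB)")
```
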
+## Architecture
+- **Hidden Size**: 4096
+- **Intermediate Size**: 12288
+- **Number of Layers**: 36
+- **Attention Heads**: 32 (8 KV heads, GQA)
+- **Head Dimension**: 128
+- **Vocabulary Size**: 151,936
+- **Max Position Embeddings**: 262,144
+- **RoPE Theta**: 5,000,000
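
These hyperparameters are consistent with the parameter count listed under Model Details. The sketch below recomputes it under a few assumptions that this card does not state: bias-free linear layers, a gated MLP with three projections, two RMSNorm weight vectors per layer, per-head RMSNorm on queries and keys, and an LM head that is not tied to the token embeddings. The variable names are illustrative.

```python
hidden_size = 4096
intermediate_size = 12288
num_layers = 36
num_heads = 32
num_kv_heads = 8
head_dim = 128
vocab_size = 151_936

# Attention: Q and O projections use all 32 heads; K and V use the 8 shared KV heads (GQA).
q_proj = hidden_size * num_heads * head_dim
kv_proj = 2 * hidden_size * num_kv_heads * head_dim
o_proj = num_heads * head_dim * hidden_size
qk_norm = 2 * head_dim  # assumption: per-head RMSNorm on queries and keys

# Assumption: gated MLP with gate, up, and down projections, no biases.
mlp = 3 * hidden_size * intermediate_size

# Assumption: one RMSNorm before attention and one before the MLP.
layer_norms = 2 * hidden_size

per_layer = q_proj + kv_proj + o_proj + qk_norm + mlp + layer_norms

embeddings = vocab_size * hidden_size
lm_head = vocab_size * hidden_size  # assumption: not tied to the embeddings
final_norm = hidden_size

total = num_layers * per_layer + embeddings + lm_head + final_norm
print(f"{total:,}")  # 8,190,735,360
```

Under these assumptions the total matches the 8,190,735,360 figure exactly, with about 1.24B of it in the embedding and LM head matrices.
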
+## Usage
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+model_name = "alexchen4ai/Qwen3-8B-Instruct"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype=torch.bfloat16,
+    device_map="auto"
+)
+messages = [
+    {"role": "system", "content": "You are a helpful assistant."},
+    {"role": "user", "content": "What is the capital of France?"}
+]
+# Render the conversation into a single prompt string using the model's chat template.
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+inputs = tokenizer([text], return_tensors="pt").to(model.device)
+outputs = model.generate(
+    **inputs,
+    max_new_tokens=512,
+    temperature=0.7,
+    top_p=0.9,
+    do_sample=True
+)
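+# outputs[0] contains the prompt followed by the reply; decode only the new tokens.
+prompt_length = inputs["input_ids"].shape[1]
+response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
+print(response)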
+```
+## Extraction Process
+This model was extracted from Qwen3-VL-8B-Instruct by:
+1. Loading all safetensors shards from the original model
+2. Keeping only the `model.language_model.*` weights
+3. Renaming keys to standard Qwen3 format (`model.*`)
+4. Preserving the `lm_head` for token prediction
+5. Creating a compatible Qwen3ForCausalLM config
+6. Copying tokenizer files and generation config
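
Steps 2 to 4 amount to a key filter and rename over the checkpoint's tensor names. The following is a minimal sketch, not the script actually used for this extraction: `extract_language_model` is a hypothetical helper, the `lm_head.` prefix is an assumption about where the output head is stored, and the vision key in the toy example is made up for illustration. It works on any mapping from tensor names to tensors, such as the merged contents of the safetensors shards.

```python
from typing import Any, Dict

SOURCE_PREFIX = "model.language_model."  # language model weights in the VL checkpoint
TARGET_PREFIX = "model."                 # standard Qwen3 prefix
LM_HEAD_PREFIX = "lm_head."              # assumed location of the output head


def extract_language_model(state_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the language model weights, renamed to the Qwen3 layout."""
    extracted = {}
    for key, tensor in state_dict.items():
        if key.startswith(SOURCE_PREFIX):
            # model.language_model.layers.0... -> model.layers.0...
            extracted[TARGET_PREFIX + key[len(SOURCE_PREFIX):]] = tensor
        elif key.startswith(LM_HEAD_PREFIX):
            # Keep the output head under its original name.
            extracted[key] = tensor
        # Everything else (for example model.visual.*) is dropped.
    return extracted


# Toy example with placeholder strings instead of real tensors.
example = {
    "model.language_model.embed_tokens.weight": "embeddings",
    "model.language_model.layers.0.self_attn.q_proj.weight": "q_proj",
    "model.visual.patch_embed.proj.weight": "vision",  # illustrative key name
    "lm_head.weight": "head",
}
print(sorted(extract_language_model(example)))
```

Matching on prefixes rather than substrings keeps the filter from accidentally picking up a vision key that happens to contain `language_model` somewhere in its name.
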
+## Differences from Original
+- **Removed**: All vision encoder components (`model.visual.*`)
+- **Removed**: Vision-language projection layers
+- **Kept**: Pure language model transformer layers
+- **Kept**: Token embeddings and LM head
+- **Kept**: All tokenizer files
+## Use Cases
+This extracted model is suitable for:
+- Pure text generation tasks
+- Instruction following
+- Chat applications
+- Fine-tuning on text-only datasets
+- Integration with frameworks expecting standard causal LMs
+- Deployments that need lower memory usage than the full VL model
+## Limitations
+- This model does **not** support vision inputs (images/videos)
+- For vision-language tasks, use the original [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct)
+## Citation
+If you use this model, please cite the original Qwen3-VL work:
+```bibtex
+@article{qwen3vl,
+  title={Qwen3-VL: Towards Versatile Vision-Language Understanding},
+  author={Qwen Team},
+  year={2025}
+}
+```
+## Acknowledgments
+- Original model by Qwen Team / Alibaba Cloud
+- Extraction performed for easier deployment in text-only scenarios