---
license: apache-2.0
base_model: Qwen/Qwen2.5-0.5B-Instruct
tags:
- qwen
- qwen2.5
- instruct
- runpod
- serverless
language:
- en
- zh
pipeline_tag: text-generation
---

# Qwen2.5-0.5B-Instruct (Customizable Copy)

This is a copy of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) for customization and fine-tuning.

## 📋 Model Details

- **Base Model:** Qwen/Qwen2.5-0.5B-Instruct
- **Size:** 0.5B parameters (about 1 GB of weights in 16-bit precision)
- **Type:** Instruction-tuned language model
- **License:** Apache 2.0

## 🎯 Purpose

This repository contains a **modifiable copy** of Qwen2.5-0.5B-Instruct for:
- Fine-tuning on custom datasets
- Experimentation and testing
- RunPod serverless deployment
- Model modifications

## 🚀 Usage

### Direct Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "marcosremar2/runpod_serverless_n2"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "What is artificial intelligence?"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

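# generate() returns the prompt followed by the completion; keep only the new tokens
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]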
print(response)
```

### RunPod Serverless Deployment

```yaml
Environment Variables:
  MODEL_NAME: marcosremar2/runpod_serverless_n2
  HF_TOKEN: YOUR_TOKEN_HERE
  MAX_MODEL_LEN: 4096
  TRUST_REMOTE_CODE: true

GPU: RTX 4090 (24GB)
Min Workers: 0
Max Workers: 1
```
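
Once the endpoint is running, a client can send it a chat request over HTTP. The sketch below is a minimal example that assumes the endpoint runs RunPod's vLLM worker, which accepts `messages` and `sampling_params` under `input`. The endpoint ID and API key are placeholders, and the helper names `build_request` and `send` are hypothetical. Check the payload shape against the worker you actually deploy.

```python
import json
import os
import urllib.request

# Placeholders (assumptions): replace with your own endpoint ID and API key.
ENDPOINT_ID = os.environ.get("RUNPOD_ENDPOINT_ID", "YOUR_ENDPOINT_ID")
API_KEY = os.environ.get("RUNPOD_API_KEY", "YOUR_API_KEY")


def build_request(prompt, max_tokens=512, endpoint_id=ENDPOINT_ID, api_key=API_KEY):
    """Build (without sending) a synchronous request for a RunPod serverless endpoint."""
    # Payload shape assumed from RunPod's vLLM worker; other workers may differ.
    payload = {
        "input": {
            "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": prompt},
            ],
            "sampling_params": {"max_tokens": max_tokens},
        }
    }
    return urllib.request.Request(
        url=f"https://api.runpod.ai/v2/{endpoint_id}/runsync",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=120):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `send(build_request("What is artificial intelligence?"))` issues the request. With `Min Workers: 0`, the first request includes a cold start, so a generous timeout helps.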

## 🔧 Fine-tuning

To fine-tune this model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer

model = AutoModelForCausalLM.from_pretrained("marcosremar2/runpod_serverless_n2")
tokenizer = AutoTokenizer.from_pretrained("marcosremar2/runpod_serverless_n2")

# Your fine-tuning code here
# ...

# Push back to your repo
model.push_to_hub("marcosremar2/runpod_serverless_n2")
tokenizer.push_to_hub("marcosremar2/runpod_serverless_n2")
```
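
The placeholder above leaves the training data format open. One common starting point is to convert prompt/response pairs into the same `messages` structure used in the inference example, so that `tokenizer.apply_chat_template` renders training and inference text consistently. The sketch below is only an illustration: `to_chat_example` and the sample pairs are hypothetical and not part of this repository.

```python
def to_chat_example(prompt, response, system_prompt="You are a helpful assistant."):
    """Wrap one prompt/response pair in the chat `messages` structure."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": response},
    ]


# Hypothetical training pairs, for illustration only.
pairs = [
    ("What is RunPod?", "RunPod is a cloud platform for GPU workloads."),
    ("Translate 'hello' to Chinese.", "你好"),
]

# One list of messages per training example.
dataset = [to_chat_example(prompt, response) for prompt, response in pairs]

# Each entry can then be rendered to training text with the tokenizer, e.g.:
#   text = tokenizer.apply_chat_template(messages, tokenize=False)
```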

## 📊 Performance

| Metric | Value |
|--------|-------|
| Parameters | 0.5B |
| Context Length | 32,768 tokens (up to 8,192 generated) |
| VRAM Required | ~1-2GB |
| Inference Speed | 200-300 tokens/sec (RTX 4090) |
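
The VRAM figure can be sanity-checked with simple arithmetic: weight memory is the parameter count times bytes per parameter, and the KV cache grows linearly with context length. The sketch below assumes the architecture values listed on the upstream Qwen2.5-0.5B-Instruct model card (0.49B parameters, 24 layers, 2 key/value heads), a head dimension of 64, and 16-bit weights and cache. It ignores activations and framework overhead, so treat the result as a lower bound.

```python
# Assumed architecture values (upstream model card / config), not measured here.
NUM_PARAMS = 0.49e9
NUM_LAYERS = 24
NUM_KV_HEADS = 2
HEAD_DIM = 64
BYTES_PER_VALUE = 2  # 16-bit weights and KV cache

GIB = 1024 ** 3


def weight_bytes(num_params=NUM_PARAMS, bytes_per_value=BYTES_PER_VALUE):
    """Memory needed to hold the model weights."""
    return num_params * bytes_per_value


def kv_cache_bytes(num_tokens):
    """Memory for cached keys and values (factor 2) across all layers and KV heads."""
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * num_tokens


for context in (4096, 32768):
    total = weight_bytes() + kv_cache_bytes(context)
    print(f"{context:>6} tokens: {total / GIB:.2f} GiB")
```

Under these assumptions the estimate is just under 1 GiB at 4,096 tokens and about 1.3 GiB at the full 32,768-token context for a single sequence, which is consistent with the ~1-2GB range in the table.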

## 🔗 Original Model

This repository is based on [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct).

For more information about the Qwen2.5 series, visit the original repository.

## 📄 License

Apache 2.0, the same license as the original Qwen2.5-0.5B-Instruct model.

## 🙏 Credits

- **Original Model:** Qwen Team @ Alibaba Cloud
- **Repository:** Custom copy for modification and deployment