---
license: apache-2.0
base_model: Qwen/Qwen2.5-0.5B-Instruct
tags:
- qwen
- qwen2.5
- instruct
- runpod
- serverless
language:
- en
- zh
pipeline_tag: text-generation
---

# Qwen2.5-0.5B-Instruct (Customizable Copy)

This is a copy of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) for customization and fine-tuning.

## πŸ“‹ Model Details

- **Base Model:** Qwen/Qwen2.5-0.5B-Instruct
- **Size:** 0.5B parameters (~1GB)
- **Type:** Instruction-tuned language model
- **License:** Apache 2.0

## 🎯 Purpose

This repository contains a **modifiable copy** of Qwen 2.5 for:
- Fine-tuning on custom datasets
- Experimentation and testing
- RunPod serverless deployment
- Model modifications

## πŸš€ Usage

### Direct Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "marcosremar2/runpod_serverless_n2"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "What is artificial intelligence?"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the model's reply is decoded
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

### RunPod Serverless Deployment

```yaml
Environment Variables:
  MODEL_NAME: marcosremar2/runpod_serverless_n2
  HF_TOKEN: YOUR_TOKEN_HERE
  MAX_MODEL_LEN: 4096
  TRUST_REMOTE_CODE: true

GPU: RTX 4090 (24GB)
Min Workers: 0
Max Workers: 1
```
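Once deployed, the endpoint is called over RunPod's HTTP API. The sketch below assumes a vLLM-style serverless worker that accepts a `prompt` and `sampling_params` inside the `input` payload; the endpoint ID, API key, and exact input schema depend on your worker image and are assumptions here, not guarantees:

```python
import json
import urllib.request

def build_payload(prompt, max_tokens=256, temperature=0.7):
    # Input shape expected by a typical vLLM-based RunPod worker (assumption)
    return {
        "input": {
            "prompt": prompt,
            "sampling_params": {
                "max_tokens": max_tokens,
                "temperature": temperature,
            },
        }
    }

def run_sync(endpoint_id, api_key, prompt):
    # POST to RunPod's synchronous /runsync route; blocks until the worker replies
    req = urllib.request.Request(
        f"https://api.runpod.ai/v2/{endpoint_id}/runsync",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With Min Workers set to 0, the first request after idle will incur a cold start while a worker spins up and loads the model.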

## πŸ”§ Fine-tuning

To fine-tune this model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer

model = AutoModelForCausalLM.from_pretrained("marcosremar2/runpod_serverless_n2")
tokenizer = AutoTokenizer.from_pretrained("marcosremar2/runpod_serverless_n2")

# Your fine-tuning code here
# ...

# Push back to your repo
model.push_to_hub("marcosremar2/runpod_serverless_n2")
tokenizer.push_to_hub("marcosremar2/runpod_serverless_n2")
```
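Whatever training framework fills in the elided step above, causal-LM fine-tuning usually begins by packing tokenized text into fixed-length blocks. A minimal, hypothetical sketch of that packing step (pure Python, independent of any library):

```python
def pack_blocks(tokenized_examples, block_size=512):
    """Concatenate token-id sequences and split them into fixed-length
    training blocks, dropping the ragged tail (a common causal-LM
    preprocessing step before handing data to a Trainer)."""
    flat = [tok for seq in tokenized_examples for tok in seq]
    usable = (len(flat) // block_size) * block_size
    return [flat[i:i + block_size] for i in range(0, usable, block_size)]

# Example: two tokenized examples packed into blocks of 4 ids each
blocks = pack_blocks([[1, 2, 3, 4, 5], [6, 7, 8, 9]], block_size=4)
# blocks == [[1, 2, 3, 4], [5, 6, 7, 8]]
```

For real training data, replace the hand-written id lists with the output of `tokenizer(...)["input_ids"]` over your dataset.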

## πŸ“Š Performance

| Metric | Value |
|--------|-------|
| Parameters | 0.5B |
| Context Length | 32K tokens |
| VRAM Required | ~1-2GB |
| Inference Speed | 200-300 tokens/sec (RTX 4090) |

## πŸ”— Original Model

This model is based on [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct).

For more information about the Qwen2.5 series, visit the original repository.

## πŸ“„ License

Apache 2.0 - Same as the original Qwen model.

## πŸ™ Credits

- **Original Model:** Qwen Team @ Alibaba Cloud
- **Repository:** Custom copy for modification and deployment