---
license: apache-2.0
base_model: Qwen/Qwen2.5-0.5B-Instruct
tags:
- qwen
- qwen2.5
- instruct
- runpod
- serverless
language:
- en
- zh
pipeline_tag: text-generation
---
# Qwen2.5-0.5B-Instruct (Customizable Copy)
This is a copy of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) for customization and fine-tuning.
## Model Details
- **Base Model:** Qwen/Qwen2.5-0.5B-Instruct
- **Size:** 0.5B parameters (~1GB)
- **Type:** Instruction-tuned language model
- **License:** Apache 2.0
## Purpose
This repository contains a **modifiable copy** of Qwen2.5-0.5B-Instruct for:
- Fine-tuning on custom datasets
- Experimentation and testing
- RunPod serverless deployment
- Model modifications
## Usage
### Direct Inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "marcosremar2/runpod_serverless_n2"

# Load the weights and tokenizer; dtype and device placement are chosen automatically.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "What is artificial intelligence?"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so that only the newly generated reply is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
### RunPod Serverless Deployment
```yaml
Environment Variables:
  MODEL_NAME: marcosremar2/runpod_serverless_n2
  HF_TOKEN: YOUR_TOKEN_HERE
  MAX_MODEL_LEN: 4096
  TRUST_REMOTE_CODE: true
GPU: RTX 4090 (24GB)
Min Workers: 0
Max Workers: 1
```
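Once an endpoint with these settings is running, it can be called over HTTP. The following is a minimal sketch that only builds the request and does not send it. It assumes the endpoint runs RunPod's vLLM worker and that the worker accepts an `input` object with `messages` and `sampling_params`; `ENDPOINT_ID` and `RUNPOD_API_KEY` are placeholders, and `build_runsync_request` is a hypothetical helper name. Check the worker's documentation for the exact input schema.

```python
import json
import urllib.request


def build_runsync_request(endpoint_id, api_key, user_prompt, max_tokens=256):
    """Build (but do not send) a synchronous RunPod serverless request."""
    # Assumed input schema for the vLLM worker; verify against its docs.
    payload = {
        "input": {
            "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": user_prompt},
            ],
            "sampling_params": {"max_tokens": max_tokens},
        }
    }
    return urllib.request.Request(
        url=f"https://api.runpod.ai/v2/{endpoint_id}/runsync",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder credentials; substitute your own endpoint ID and API key.
request = build_runsync_request(
    "ENDPOINT_ID", "RUNPOD_API_KEY", "What is artificial intelligence?"
)
print(request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the request. With `Min Workers: 0`, the first call after an idle period may wait for a worker to start.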
## Fine-tuning
To fine-tune this model:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer
model = AutoModelForCausalLM.from_pretrained("marcosremar2/runpod_serverless_n2")
tokenizer = AutoTokenizer.from_pretrained("marcosremar2/runpod_serverless_n2")
# Your fine-tuning code here
# ...
# Push back to your repo
model.push_to_hub("marcosremar2/runpod_serverless_n2")
tokenizer.push_to_hub("marcosremar2/runpod_serverless_n2")
```
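The placeholder above leaves data preparation open. One common first step for an instruction-tuned model is to convert raw records into the same chat-message format used for inference. The sketch below assumes each record has `instruction` and `response` fields; those field names, the toy records, and the helper `to_chat_messages` are hypothetical and not part of this repository.

```python
def to_chat_messages(example, system_prompt="You are a helpful assistant."):
    """Convert one {"instruction", "response"} record into chat-format messages."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": example["instruction"]},
        {"role": "assistant", "content": example["response"]},
    ]


# Hypothetical toy dataset; replace with your own records.
raw_examples = [
    {"instruction": "Translate 'hello' to French.", "response": "Bonjour."},
    {"instruction": "What is 2 + 2?", "response": "4"},
]

chat_examples = [to_chat_messages(example) for example in raw_examples]
print(len(chat_examples), chat_examples[0][-1]["role"])
```

Each message list can then be rendered to training text with `tokenizer.apply_chat_template(messages, tokenize=False)` before tokenization.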
## Performance
| Metric | Value |
|--------|-------|
| Parameters | 0.5B |
| Context Length | 32K tokens |
| VRAM Required | ~1-2GB |
| Inference Speed | 200-300 tokens/sec (RTX 4090) |
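The VRAM figure is consistent with a back-of-the-envelope estimate: memory for the weights is roughly the parameter count times the bytes per parameter. The sketch below covers weights only and assumes decimal gigabytes; it ignores the KV cache, activations, and framework overhead, all of which add to real usage. The helper name `weight_memory_gb` is illustrative.

```python
def weight_memory_gb(num_parameters, bytes_per_parameter):
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_parameters * bytes_per_parameter / 1e9


NUM_PARAMETERS = 0.5e9  # parameter count quoted in the table above

for label, width in [("float32", 4), ("float16/bfloat16", 2), ("int8", 1)]:
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMETERS, width):.1f} GB")
```

At 16-bit precision this gives about 1 GB, and about 2 GB at 32-bit.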
## Original Model
This is based on: [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)
For more information about the Qwen2.5 series, visit the original repository.
## License
Apache 2.0 - Same as the original Qwen model.
## Credits
- **Original Model:** Qwen Team @ Alibaba Cloud
- **Repository:** Custom copy for modification and deployment