---
license: apache-2.0
base_model: Qwen/Qwen2.5-0.5B-Instruct
tags:
- qwen
- qwen2.5
- instruct
- runpod
- serverless
language:
- en
- zh
pipeline_tag: text-generation
---

# Qwen2.5-0.5B-Instruct (Customizable Copy)

This is a copy of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) for customization and fine-tuning.

## Model Details

- **Base Model:** Qwen/Qwen2.5-0.5B-Instruct
- **Size:** 0.5B parameters (~1 GB of weights in BF16)
- **Type:** Instruction-tuned causal language model
- **License:** Apache 2.0

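The ~1 GB size follows from simple arithmetic: weight memory is roughly the parameter count times the bytes per parameter. The sketch below assumes a round 0.5B parameters (the exact count is slightly lower) and ignores activations, KV cache, and runtime overhead; `weight_memory_gb` is a hypothetical helper.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumes a round 0.5B parameters; ignores activations, KV cache and runtime overhead.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return approximate weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in ("bfloat16", "float32"):
        print(f"{dtype}: ~{weight_memory_gb(0.5e9, dtype):.1f} GB")
```

Depending on the `transformers` version, calling `from_pretrained` without `torch_dtype` may load the weights in float32, which roughly doubles this figure.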
## Purpose

This repository contains a **modifiable copy** of Qwen2.5-0.5B-Instruct for:

- Fine-tuning on custom datasets
- Experimentation and testing
- RunPod serverless deployment
- Model modifications

## Usage

### Direct Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "marcosremar2/runpod_serverless_n2"

# device_map="auto" requires the `accelerate` package.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "What is artificial intelligence?"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template and open an assistant turn.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# generate() returns prompt + completion; keep only the newly generated tokens.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

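Qwen2.5 uses a ChatML-style chat template. The function below is a simplified, hand-written illustration of the string that `apply_chat_template(..., add_generation_prompt=True)` should produce for the system and user messages above. The authoritative template ships with the tokenizer and also handles tools and a default system prompt, which this sketch omits; `to_chatml` is a hypothetical helper, not part of `transformers`.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Simplified ChatML rendering (hypothetical helper, for illustration only)."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues as the assistant.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is artificial intelligence?"},
]
print(to_chatml(messages))
```

Comparing this output with `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is a quick way to confirm your copy still carries the expected template after modifications.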
### RunPod Serverless Deployment

Example settings for a RunPod serverless endpoint:

```yaml
Environment Variables:
  MODEL_NAME: marcosremar2/runpod_serverless_n2
  HF_TOKEN: YOUR_TOKEN_HERE     # needed for private or gated repositories
  MAX_MODEL_LEN: 4096           # caps context below the model's 32K limit
  TRUST_REMOTE_CODE: true       # not required for Qwen2.5, which is natively supported

GPU: RTX 4090 (24GB)
Min Workers: 0                  # scale to zero when idle (expect cold starts)
Max Workers: 1
```

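Once the endpoint is running, it can be called over HTTP. The sketch below uses only the Python standard library and RunPod's synchronous `/runsync` route. The `{"input": {"messages": ..., "sampling_params": ...}}` body is an assumption based on RunPod's vLLM worker; if you deploy a custom handler, adjust it to whatever your handler expects. The environment variable names `RUNPOD_ENDPOINT_ID` and `RUNPOD_API_KEY` are hypothetical.

```python
import json
import os
import urllib.request


def build_request(endpoint_id, api_key, messages, max_tokens=512):
    """Build URL, JSON payload and headers for a synchronous RunPod request.

    Assumption: the worker accepts {"input": {"messages", "sampling_params"}},
    as RunPod's vLLM worker does; adjust for a custom handler.
    """
    url = f"https://api.runpod.ai/v2/{endpoint_id}/runsync"
    payload = {
        "input": {
            "messages": messages,
            "sampling_params": {"max_tokens": max_tokens},
        }
    }
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return url, payload, headers


def send(url, payload, headers, timeout=120):
    """POST the payload and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical variable names; only send when both are set.
    endpoint_id = os.environ.get("RUNPOD_ENDPOINT_ID")
    api_key = os.environ.get("RUNPOD_API_KEY")
    messages = [{"role": "user", "content": "What is artificial intelligence?"}]
    if endpoint_id and api_key:
        print(send(*build_request(endpoint_id, api_key, messages)))
    else:
        print("Set RUNPOD_ENDPOINT_ID and RUNPOD_API_KEY to send a request.")
```

With `Min Workers: 0`, the first request after an idle period may wait for a cold start, so a generous timeout is useful.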
## Fine-tuning

To fine-tune this model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer

model = AutoModelForCausalLM.from_pretrained("marcosremar2/runpod_serverless_n2")
tokenizer = AutoTokenizer.from_pretrained("marcosremar2/runpod_serverless_n2")

# Your fine-tuning code here (e.g. prepare a dataset, then TrainingArguments + Trainer)
# ...

# Push back to your repo (requires a token with write access, e.g. via `huggingface-cli login`)
model.push_to_hub("marcosremar2/runpod_serverless_n2")
tokenizer.push_to_hub("marcosremar2/runpod_serverless_n2")
```

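One common piece of the elided fine-tuning code is building labels so that the loss is computed only on the assistant's response. The sketch below assumes you already have token IDs for the prompt and the response; it masks the prompt positions with `-100`, the default `ignore_index` of `torch.nn.CrossEntropyLoss`, which Hugging Face causal-LM models use when computing loss from `labels`. `build_example` is a hypothetical helper.

```python
# -100 is ignored by the causal-LM loss, so masked positions contribute nothing.
IGNORE_INDEX = -100


def build_example(prompt_ids, response_ids, max_length=None):
    """Concatenate prompt and response token IDs and mask the prompt in the labels."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    if max_length is not None:
        # Truncate inputs and labels together so they stay aligned.
        input_ids, labels = input_ids[:max_length], labels[:max_length]
    return {
        "input_ids": input_ids,
        "labels": labels,
        "attention_mask": [1] * len(input_ids),
    }
```

The labels stay aligned with `input_ids` because the model shifts them internally when computing the loss. In practice, `prompt_ids` could come from `tokenizer.apply_chat_template(..., add_generation_prompt=True)` and `response_ids` from tokenizing the assistant reply followed by `<|im_end|>`.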
## Performance

| Metric | Value |
|--------|-------|
| Parameters | 0.5B |
| Context Length | 32K tokens (32,768) |
| VRAM Required | ~1-2 GB (weights plus a small KV cache) |
| Inference Speed | ~200-300 tokens/sec on an RTX 4090 (approximate; varies with backend and settings) |

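The KV cache is the main memory cost beyond the weights, and it grows linearly with context length. The sketch below estimates it per sequence in BF16; the layer count, KV-head count, and head dimension are assumed from the published Qwen2.5-0.5B `config.json`, so check them against the config in your copy.

```python
# Architecture values assumed from the published Qwen2.5-0.5B config.json.
NUM_LAYERS = 24
NUM_KV_HEADS = 2   # grouped-query attention: 14 query heads share 2 KV heads
HEAD_DIM = 64      # hidden_size 896 / 14 attention heads


def kv_cache_bytes(num_tokens, bytes_per_value=2, batch_size=1):
    """Keys and values (factor 2) for every layer, KV head and head dimension."""
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_value
    return per_token * num_tokens * batch_size


if __name__ == "__main__":
    for n in (4096, 32768):  # MAX_MODEL_LEN above vs. the full context length
        print(f"{n} tokens: {kv_cache_bytes(n) / 2**20:.0f} MiB")
```

This gives about 48 MiB at 4,096 tokens and 384 MiB at 32,768 tokens per sequence. Serving engines such as vLLM preallocate a fixed fraction of GPU memory for the cache (0.9 by default), so observed usage on a 24 GB GPU will be far higher than this estimate.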
## Original Model

This is based on: [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)

For more information about the Qwen2.5 series, see the original model card.

## License

Apache 2.0, the same license as the original Qwen model.

## Credits

- **Original Model:** Qwen Team @ Alibaba Cloud
- **Repository:** Custom copy for modification and deployment