---
license: apache-2.0
base_model: Qwen/Qwen2.5-0.5B-Instruct
tags:
- qwen
- qwen2.5
- instruct
- runpod
- serverless
language:
- en
- zh
pipeline_tag: text-generation
---

# Qwen2.5-0.5B-Instruct (Customizable Copy)

This is a copy of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) for customization and fine-tuning.

## 📋 Model Details

- **Base Model:** Qwen/Qwen2.5-0.5B-Instruct
- **Size:** 0.5B parameters (~1GB)
- **Type:** Instruction-tuned language model
- **License:** Apache 2.0

## 🎯 Purpose

This repository contains a **modifiable copy** of Qwen 2.5 for:

- Fine-tuning on custom datasets
- Experimentation and testing
- RunPod serverless deployment
- Model modifications

## 🚀 Usage

### Direct Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "marcosremar2/runpod_serverless_n2"

# Load the weights in their stored dtype and place them on the available device(s)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "What is artificial intelligence?"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# Drop the prompt tokens so that only the newly generated text is decoded
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

### RunPod Serverless Deployment

```yaml
Environment Variables:
  MODEL_NAME: marcosremar2/runpod_serverless_n2
  HF_TOKEN: YOUR_TOKEN_HERE
  MAX_MODEL_LEN: 4096
  TRUST_REMOTE_CODE: true

GPU: RTX 4090 (24GB)
Min Workers: 0
Max Workers: 1
```

## 🔧 Fine-tuning

To fine-tune this model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer

model = AutoModelForCausalLM.from_pretrained("marcosremar2/runpod_serverless_n2")
tokenizer = AutoTokenizer.from_pretrained("marcosremar2/runpod_serverless_n2")

# Your fine-tuning code here
# ...

# Push back to your repo
model.push_to_hub("marcosremar2/runpod_serverless_n2")
tokenizer.push_to_hub("marcosremar2/runpod_serverless_n2")
```

## 📊 Performance

| Metric | Value |
|--------|-------|
| Parameters | 0.5B |
| Context Length | 32K tokens |
| VRAM Required | ~1-2GB |
| Inference Speed | 200-300 tokens/sec (RTX 4090) |

## 🔗 Original Model

This is based on: [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)

For more information about the Qwen2.5 series, visit the original repository.

## 📄 License

Apache 2.0 - Same as the original Qwen model.

## 🙏 Credits

- **Original Model:** Qwen Team @ Alibaba Cloud
- **Repository:** Custom copy for modification and deployment
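
## 🧮 Memory Estimate

As a rough cross-check of the size and VRAM figures quoted in Model Details and Performance, the sketch below estimates the memory taken by the weights alone. It assumes the nominal 0.5B parameter count stated in this card and standard per-parameter byte widths (4 for float32, 2 for bfloat16/float16, 1 for int8). The helper name `weight_memory_gib` is hypothetical, and the estimate ignores activations, KV cache, and framework overhead, so actual usage will be higher.

```python
# Back-of-the-envelope weight-memory estimate (illustrative only).
# Assumption: nominal 0.5B parameters, as stated in this card.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    # Print one estimate per dtype for the nominal parameter count
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>8}: {weight_memory_gib(0.5e9, dtype):.2f} GiB")
```

At 16-bit precision this works out to roughly 0.93 GiB for the weights, which lines up with the ~1GB size listed in Model Details. The ~1-2GB VRAM figure in the Performance table leaves headroom for the KV cache and runtime overhead.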