	---
	license: apache-2.0
	base_model: Qwen/Qwen2.5-0.5B-Instruct
	tags:
	- qwen
	- qwen2.5
	- instruct
	- runpod
	- serverless
	language:
	- en
	- zh
	pipeline_tag: text-generation
	---

	# Qwen2.5-0.5B-Instruct (Customizable Copy)

	This is a copy of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) for customization and fine-tuning.

	## 📋 Model Details

	- Base Model: Qwen/Qwen2.5-0.5B-Instruct
	- Size: 0.5B parameters (~1 GB of weights at 16-bit precision)
	- Type: Instruction-tuned language model
	- License: Apache 2.0

	## 🎯 Purpose

	This repository contains a modifiable copy of Qwen2.5-0.5B-Instruct for:
	- Fine-tuning on custom datasets
	- Experimentation and testing
	- RunPod serverless deployment
	- Model modifications

	## 🚀 Usage

	### Direct Inference

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "marcosremar2/runpod_serverless_n2"

	# Load weights and tokenizer; device_map="auto" requires the accelerate package
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype="auto",
	    device_map="auto",
	)
	tokenizer = AutoTokenizer.from_pretrained(model_name)

	prompt = "What is artificial intelligence?"
	messages = [
	    {"role": "system", "content": "You are a helpful assistant."},
	    {"role": "user", "content": prompt},
	]

	# Render the conversation with the model's chat template
	text = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True,
	)

	model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

	generated_ids = model.generate(
	    **model_inputs,
	    max_new_tokens=512,
	)

	# generate() returns prompt + completion; drop the prompt tokens before decoding
	generated_ids = [
	    output_ids[len(input_ids):]
	    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
	]

	response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
	print(response)
	```

	### RunPod Serverless Deployment

	```yaml
	Environment Variables:
	  MODEL_NAME: marcosremar2/runpod_serverless_n2
	  HF_TOKEN: YOUR_TOKEN_HERE
	  MAX_MODEL_LEN: 4096
	  TRUST_REMOTE_CODE: true

	GPU: RTX 4090 (24GB)
	Min Workers: 0
	Max Workers: 1
	```
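
Once an endpoint is running with the settings above, it can be called over HTTP. The sketch below builds a synchronous request against RunPod's `/runsync` route using only the standard library. This README does not say which worker image is deployed; the environment variable names resemble RunPod's vLLM worker, so the `prompt` / `sampling_params` input shape is an assumption and may need adjusting. `build_runsync_request`, `YOUR_ENDPOINT_ID`, and `YOUR_RUNPOD_API_KEY` are placeholders introduced for illustration.

```python
import json
import urllib.request


def build_runsync_request(endpoint_id, api_key, prompt, max_tokens=256):
    """Build (but do not send) a synchronous RunPod serverless request."""
    url = f"https://api.runpod.ai/v2/{endpoint_id}/runsync"
    # Assumed input schema (vLLM-worker style); other workers may differ
    payload = {
        "input": {
            "prompt": prompt,
            "sampling_params": {"max_tokens": max_tokens},
        }
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_runsync_request(
    "YOUR_ENDPOINT_ID",
    "YOUR_RUNPOD_API_KEY",
    "What is artificial intelligence?",
)

# To actually send it (requires a live endpoint and a valid API key):
# with urllib.request.urlopen(request, timeout=120) as resp:
#     print(json.load(resp))
```

With `Min Workers: 0`, the first request after an idle period may wait for a worker to cold-start, so a generous timeout is a reasonable default.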

	## 🔧 Fine-tuning

	To fine-tune this model:

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer

	model = AutoModelForCausalLM.from_pretrained("marcosremar2/runpod_serverless_n2")
	tokenizer = AutoTokenizer.from_pretrained("marcosremar2/runpod_serverless_n2")

	# Your fine-tuning code here
	# ...

	# Push back to your repo
	model.push_to_hub("marcosremar2/runpod_serverless_n2")
	tokenizer.push_to_hub("marcosremar2/runpod_serverless_n2")
	```
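
The placeholder above leaves data preparation open. One possible first step, sketched here under assumptions, is to store training examples in the same `messages` layout used for inference, so the tokenizer's chat template can render them later. `to_chat_record`, `to_jsonl`, and the sample pairs are hypothetical and only illustrate the shape of the data.

```python
import json


def to_chat_record(instruction, response, system_prompt="You are a helpful assistant."):
    """Wrap one instruction/response pair in the chat `messages` layout."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": instruction},
            {"role": "assistant", "content": response},
        ]
    }


def to_jsonl(records):
    """Serialize records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)


# Hypothetical sample data, for illustration only
pairs = [
    ("Translate 'hello' to Chinese.", "你好"),
    ("What is 2 + 2?", "4"),
]
records = [to_chat_record(instruction, response) for instruction, response in pairs]
jsonl = to_jsonl(records)
print(jsonl)
```

Each record can then be passed through `tokenizer.apply_chat_template(record["messages"], tokenize=False)` to produce the training text.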

	## 📊 Performance

	| Metric | Value |
	|--------|-------|
	| Parameters | 0.5B |
	| Context Length | 32K tokens |
	| VRAM Required | ~1-2 GB |
	| Inference Speed | 200-300 tokens/sec (RTX 4090) |
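
The VRAM and speed rows can be sanity-checked with back-of-the-envelope arithmetic. The sketch below estimates weight memory as parameters times bytes per parameter, and generation time as tokens divided by throughput. It is a rough estimate only: it ignores the KV cache, activations, and framework overhead, and the function names are illustrative.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Memory for model weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


def generation_seconds(num_tokens, tokens_per_second):
    """Time to generate num_tokens at a steady throughput."""
    return num_tokens / tokens_per_second


params = 0.5e9  # parameter count from the table above

bf16_gb = weight_memory_gb(params, 2)  # 16-bit weights: 2 bytes per parameter
fp32_gb = weight_memory_gb(params, 4)  # 32-bit weights: 4 bytes per parameter

# Time for a 512-token completion at the table's throughput range
slow_s = generation_seconds(512, 200)
fast_s = generation_seconds(512, 300)

print(f"bf16 weights: {bf16_gb:.1f} GB, fp32 weights: {fp32_gb:.1f} GB")
print(f"512 tokens: {fast_s:.1f}-{slow_s:.1f} s")
```

The 1 GB (16-bit) to 2 GB (32-bit) range for weights lines up with the "~1-2 GB" figure in the table; actual usage grows with context length and batch size.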

	## 🔗 Original Model

	This is based on: [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)

	For more information about the Qwen2.5 series, visit the original repository.

	## 📄 License

	Apache 2.0 - Same as the original Qwen model.

	## 🙏 Credits

	- Original Model: Qwen Team @ Alibaba Cloud
	- Repository: Custom copy for modification and deployment