---
base_model: unsloth/qwen2.5-math-1.5b
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:unsloth/qwen2.5-math-1.5b
- lora
- sft
- transformers
- trl
- unsloth
license: apache-2.0
title: TAV (CPU Version)
sdk: gradio
emoji: π
colorFrom: green
colorTo: red
sdk_version: 5.49.1
hf_oauth: true
---

# Model Card for TAV CPU Version

## Model Details

### Model Description

TAV (CPU version) is a text-generation model based on `unsloth/qwen2.5-math-1.5b` and fine-tuned with PEFT (LoRA) adapters. It is set up to run in CPU-only environments, with no 4-bit quantization and no bitsandbytes dependency.

- **Developed by:** [Your Name / Organization]
- **Shared by:** [Your Name / Organization]
- **Model type:** Causal language model (text generation)
- **Language(s):** English (with math/technical capability)
- **License:** Apache-2.0
- **Finetuned from model:** unsloth/qwen2.5-math-1.5b

### Model Sources

- **Repository:** [Hugging Face Model Link]
- **Demo:** [Hugging Face Space Link]

## Uses

### Direct Use

- Generate math/technical answers in English.
- Use as a chatbot for educational purposes.
- Integrate into CPU-only environments.

### Downstream Use

- Can be further fine-tuned for domain-specific tasks.
- Suitable for research or teaching applications.

### Out-of-Scope Use

- Not optimized for GPU-heavy inference or extremely long sequences (>1024 tokens).
- Not suitable for real-time production under heavy load.

## Bias, Risks, and Limitations

- May produce biased or incorrect answers.
- CPU inference is slower than GPU inference.
- Limited context window due to CPU memory constraints.

### Recommendations

- Use moderate token limits to avoid long processing times.
- Not intended for high-throughput production environments.
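
To make "moderate token limits" concrete, here is a minimal sketch of conservative generation settings for CPU use; every value is an illustrative assumption, not a tuned default for this model.

```python
# Conservative generation settings for CPU inference; all values are
# illustrative assumptions, chosen to bound per-request latency.
cpu_generation_kwargs = {
    "max_new_tokens": 128,  # short outputs keep CPU latency manageable
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.9,
}

# These kwargs would be forwarded to a transformers text-generation
# pipeline call, e.g. generator(prompt, **cpu_generation_kwargs).
print(cpu_generation_kwargs["max_new_tokens"])
```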

## How to Get Started

Use a CPU-compatible `transformers` pipeline in Python:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("unsloth/qwen2.5-math-1.5b")
model = AutoModelForCausalLM.from_pretrained("unsloth/qwen2.5-math-1.5b", device_map="cpu")

# Note: this loads the base model only. To use the fine-tuned TAV weights,
# apply the PEFT adapter from this repository on top of it
# (e.g. with peft.PeftModel.from_pretrained).

generator = pipeline("text-generation", model=model, tokenizer=tokenizer, device=-1)

output = generator("Hi, how are you?", max_new_tokens=128, do_sample=True)
print(output[0]["generated_text"])
```