---
base_model: unsloth/qwen2.5-math-1.5b
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:unsloth/qwen2.5-math-1.5b
- lora
- sft
- transformers
- trl
- unsloth
license: apache-2.0
title: TAV (CPU Version)
sdk: gradio
emoji: π
colorFrom: green
colorTo: red
sdk_version: 5.49.1
hf_oauth: true
---
# Model Card for TAV CPU Version

## Model Details

### Model Description
TAV is a causal language model for text-generation tasks, fine-tuned from unsloth/qwen2.5-math-1.5b using PEFT (LoRA) adapters. It is built to run in CPU-only environments, with no 4-bit quantization and no bitsandbytes dependency.
- Developed by: [Your Name / Organization]
- Shared by: [Your Name / Organization]
- Model type: Causal Language Model (Text Generation)
- Language(s): English (with math/technical capability)
- License: Apache-2.0
- Finetuned from model: unsloth/qwen2.5-math-1.5b
### Model Sources
- Repository: [Hugging Face Model Link]
- Demo: [Hugging Face Space Link]
## Uses

### Direct Use
- Generate math/technical answers in English.
- Use as a chatbot for educational purposes.
- Integrate into CPU-only environments.
### Downstream Use
- Can be further fine-tuned for domain-specific tasks.
- Suitable for research or teaching applications.
### Out-of-Scope Use
- Not optimized for GPU-heavy inference or extremely long sequences (>1024 tokens).
- Not suitable for real-time production under heavy load.
## Bias, Risks, and Limitations
- May produce biased or incorrect answers.
- CPU inference is slower than GPU inference.
- Limited context window due to CPU memory constraints.
### Recommendations
- Use with moderate token limits to avoid long processing times.
- Not intended for high-throughput production environments.
## How to Get Started
Use the CPU-compatible pipeline in Python:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load the base model and tokenizer on CPU.
tokenizer = AutoTokenizer.from_pretrained("unsloth/qwen2.5-math-1.5b")
model = AutoModelForCausalLM.from_pretrained("unsloth/qwen2.5-math-1.5b", device_map="cpu")

# device=-1 keeps the pipeline on CPU as well.
generator = pipeline("text-generation", model=model, tokenizer=tokenizer, device=-1)
output = generator("Hi, how are you?", max_new_tokens=128, do_sample=True)
print(output[0]["generated_text"])
```