---
base_model: unsloth/qwen2.5-math-1.5b
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:unsloth/qwen2.5-math-1.5b
- lora
- sft
- transformers
- trl
- unsloth
license: apache-2.0
title: TAV (CPU Version)
sdk: gradio
emoji: 👀
colorFrom: green
colorTo: red
sdk_version: 5.49.1
hf_oauth: true
---

# Model Card for TAV (CPU Version)

## Model Details

### Model Description

TAV is a CPU-compatible text-generation model based on `unsloth/qwen2.5-math-1.5b` and fine-tuned with PEFT (LoRA) adapters. It is configured to run in CPU-only environments, with no 4-bit quantization or bitsandbytes dependency.

- **Developed by:** [Your Name / Organization]
- **Shared by:** [Your Name / Organization]
- **Model type:** Causal language model (text generation)
- **Language(s):** English (with mathematical/technical capability)
- **License:** Apache-2.0
- **Finetuned from model:** `unsloth/qwen2.5-math-1.5b`

### Model Sources

- **Repository:** [Hugging Face Model Link]
- **Demo:** [Hugging Face Space Link]

## Uses

### Direct Use

- Generating math and technical answers in English.
- Serving as a chatbot for educational purposes.
- Integration into CPU-only environments.

### Downstream Use

- Can be further fine-tuned for domain-specific tasks.
- Suitable for research or teaching applications.

### Out-of-Scope Use

- Not optimized for GPU-heavy inference or long sequences (over 1024 tokens).
- Not suitable for real-time production under heavy load.

## Bias, Risks, and Limitations

- May produce biased or incorrect answers.
- CPU inference is considerably slower than GPU inference.
- The usable context window is limited by CPU memory constraints.

### Recommendations

- Use moderate token limits to keep generation times reasonable.
- Not intended for high-throughput production environments.
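The token-limit recommendation above can be made concrete with a small helper that caps `max_new_tokens` against a wall-clock budget. This is a sketch: the helper name and the default decoding speed (~8 tokens/s) are assumptions, not measured values for this model; benchmark your own hardware and pass in the real figure.

```python
def cap_max_new_tokens(budget_seconds, tokens_per_second=8.0, hard_limit=1024):
    """Pick a max_new_tokens value that should fit a wall-clock budget.

    tokens_per_second is an assumed CPU decoding speed; measure it on
    your own machine. hard_limit mirrors the ~1024-token ceiling noted
    under Out-of-Scope Use.
    """
    estimate = int(budget_seconds * tokens_per_second)
    return max(1, min(estimate, hard_limit))

# e.g. a 30-second budget at an assumed ~8 tokens/s on CPU:
print(cap_max_new_tokens(30))  # 240
```

The returned value can be passed directly as `max_new_tokens` to the generation pipeline shown below.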
## How to Get Started

Use the CPU-compatible pipeline in Python:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("unsloth/qwen2.5-math-1.5b")
model = AutoModelForCausalLM.from_pretrained("unsloth/qwen2.5-math-1.5b", device_map="cpu")

# device=-1 keeps the pipeline on the CPU
generator = pipeline("text-generation", model=model, tokenizer=tokenizer, device=-1)

output = generator("Hi, how are you?", max_new_tokens=128, do_sample=True)
print(output[0]["generated_text"])
```