---
base_model: unsloth/qwen2.5-math-1.5b
library_name: peft
pipeline_tag: text-generation
tags:
  - base_model:unsloth/qwen2.5-math-1.5b
  - lora
  - sft
  - transformers
  - trl
  - unsloth
license: apache-2.0
title: TAV (CPU Version)
sdk: gradio
emoji: 👀
colorFrom: green
colorTo: red
sdk_version: 5.49.1
hf_oauth: true
---

# Model Card for TAV (CPU Version)

## Model Details

### Model Description

TAV is a CPU-compatible model for text-generation tasks.
It is based on unsloth/qwen2.5-math-1.5b and fine-tuned with PEFT (LoRA) adapters.
It is optimized to run in CPU environments and requires neither 4-bit quantization nor bitsandbytes.

- **Developed by:** [Your Name / Organization]
- **Shared by:** [Your Name / Organization]
- **Model type:** Causal Language Model (Text Generation)
- **Language(s):** English (with math/technical capability)
- **License:** Apache-2.0
- **Finetuned from model:** unsloth/qwen2.5-math-1.5b

### Model Sources

- **Repository:** [Hugging Face Model Link]
- **Demo:** [Hugging Face Space Link]

## Uses

### Direct Use

- Generate math/technical answers in English.
- Serve as a chatbot for educational purposes.
- Integrate into CPU-only environments.

### Downstream Use

- Can be further fine-tuned for domain-specific tasks.
- Suitable for research or teaching applications.

### Out-of-Scope Use

- Not optimized for GPU-heavy inference or very long sequences (>1024 tokens).
- Not suitable for real-time production use under heavy load.

## Bias, Risks, and Limitations

- May produce biased or incorrect answers.
- CPU inference is slower than GPU inference.
- Limited context window due to CPU memory constraints.

### Recommendations

- Use moderate token limits (`max_new_tokens`) to avoid long processing times.
- Not intended for high-throughput production environments.

## How to Get Started

Use the CPU-compatible `transformers` pipeline in Python:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load the base model and tokenizer explicitly on CPU
tokenizer = AutoTokenizer.from_pretrained("unsloth/qwen2.5-math-1.5b")
model = AutoModelForCausalLM.from_pretrained("unsloth/qwen2.5-math-1.5b", device_map="cpu")

# device=-1 keeps the pipeline on CPU
generator = pipeline("text-generation", model=model, tokenizer=tokenizer, device=-1)

# Keep max_new_tokens moderate: CPU generation is slow for long outputs
output = generator("Hi, how are you?", max_new_tokens=128, do_sample=True)
print(output[0]["generated_text"])
```