---
base_model: unsloth/qwen2.5-math-1.5b
library_name: peft
pipeline_tag: text-generation
tags:
  - base_model:unsloth/qwen2.5-math-1.5b
  - lora
  - sft
  - transformers
  - trl
  - unsloth
license: apache-2.0
title: TAV (CPU Version)
sdk: gradio
emoji: 👀
colorFrom: green
colorTo: red
sdk_version: 5.49.1
hf_oauth: true
---

# Model Card for TAV (CPU Version)

## Model Details

### Model Description

TAV is a CPU-compatible model for text-generation tasks.
It is based on unsloth/qwen2.5-math-1.5b and fine-tuned with PEFT (LoRA) adapters.
It is optimized to run in CPU environments and requires neither 4-bit quantization nor bitsandbytes.

- **Developed by:** [Your Name / Organization]
- **Shared by:** [Your Name / Organization]
- **Model type:** Causal Language Model (Text Generation)
- **Language(s):** English (with math/technical capability)
- **License:** Apache-2.0
- **Finetuned from model:** unsloth/qwen2.5-math-1.5b

### Model Sources

- **Repository:** [Hugging Face Model Link]
- **Demo:** [Hugging Face Space Link]

## Uses

### Direct Use

- Generate math/technical answers in English.
- Serve as a chatbot for educational purposes.
- Integrate into CPU-only environments.

### Downstream Use

- Can be further fine-tuned for domain-specific tasks.
- Suitable for research or teaching applications.

### Out-of-Scope Use

- Not optimized for GPU-heavy inference or very long sequences (>1024 tokens).
- Not suitable for real-time production use under heavy load.

## Bias, Risks, and Limitations

- May produce biased or incorrect answers.
- CPU inference is slower than GPU inference.
- Limited context window due to CPU memory constraints.

### Recommendations

- Use moderate token limits (`max_new_tokens`) to avoid long processing times.
- Not intended for high-throughput production environments.

## How to Get Started

Use the CPU-compatible `transformers` pipeline in Python:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load the base model and tokenizer explicitly on CPU
tokenizer = AutoTokenizer.from_pretrained("unsloth/qwen2.5-math-1.5b")
model = AutoModelForCausalLM.from_pretrained("unsloth/qwen2.5-math-1.5b", device_map="cpu")

# device=-1 keeps the pipeline on CPU
generator = pipeline("text-generation", model=model, tokenizer=tokenizer, device=-1)

# Keep max_new_tokens moderate: CPU generation is slow for long outputs
output = generator("Hi, how are you?", max_new_tokens=128, do_sample=True)
print(output[0]["generated_text"])
```