---
base_model: unsloth/qwen2.5-math-1.5b
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:unsloth/qwen2.5-math-1.5b
- lora
- sft
- transformers
- trl
- unsloth
license: apache-2.0
title: TAV (CPU Version)
sdk: gradio
emoji: π
colorFrom: green
colorTo: red
sdk_version: 5.49.1
hf_oauth: true
---

# Model Card for TAV CPU Version

## Model Details

### Model Description

TAV (CPU version) is a text-generation model based on `unsloth/qwen2.5-math-1.5b` and fine-tuned with PEFT (LoRA) adapters. It is set up to run in CPU-only environments, with no 4-bit quantization and no bitsandbytes dependency.

- **Developed by:** [Your Name / Organization]
- **Shared by:** [Your Name / Organization]
- **Model type:** Causal language model (text generation)
- **Language(s):** English (with math/technical capability)
- **License:** Apache-2.0
- **Finetuned from model:** unsloth/qwen2.5-math-1.5b

### Model Sources

- **Repository:** [Hugging Face Model Link]
- **Demo:** [Hugging Face Space Link]

## Uses

### Direct Use

- Generate math/technical answers in English.
- Use as a chatbot for educational purposes.
- Integrate into CPU-only environments.

### Downstream Use

- Can be further fine-tuned for domain-specific tasks.
- Suitable for research or teaching applications.

### Out-of-Scope Use

- Not optimized for GPU-heavy inference or extremely long sequences (>1024 tokens).
- Not suitable for real-time production under heavy load.

## Bias, Risks, and Limitations

- May produce biased or incorrect answers.
- CPU inference is slower than GPU inference.
- Limited context window due to CPU memory constraints.

### Recommendations

- Use moderate token limits to avoid long processing times.
- Not intended for high-throughput production environments.
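
To make "moderate token limits" concrete, here is a minimal sketch of conservative generation settings for CPU use; every value is an illustrative assumption, not a tuned default for this model.

```python
# Conservative generation settings for CPU inference; all values are
# illustrative assumptions, chosen to bound per-request latency.
cpu_generation_kwargs = {
    "max_new_tokens": 128,  # short outputs keep CPU latency manageable
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.9,
}

# These kwargs would be forwarded to a transformers text-generation
# pipeline call, e.g. generator(prompt, **cpu_generation_kwargs).
print(cpu_generation_kwargs["max_new_tokens"])
```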

## How to Get Started

Use a CPU-compatible `transformers` pipeline in Python:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("unsloth/qwen2.5-math-1.5b")
model = AutoModelForCausalLM.from_pretrained("unsloth/qwen2.5-math-1.5b", device_map="cpu")

# Note: this loads the base model only. To use the fine-tuned TAV weights,
# apply the PEFT adapter from this repository on top of it
# (e.g. with peft.PeftModel.from_pretrained).

generator = pipeline("text-generation", model=model, tokenizer=tokenizer, device=-1)

output = generator("Hi, how are you?", max_new_tokens=128, do_sample=True)
print(output[0]["generated_text"])
```