Spaces: Running on Zero

Update Gradio app with multiple files
README.md
CHANGED
````diff
@@ -7,34 +7,60 @@ sdk: gradio
 sdk_version: 5.49.1
 app_port: 7860
 hardware: zero-gpu
-tags:
-- anycoder
 ---
-
+# 🤖 VibeThinker-1.5B Chat Interface
 
-
-
-
+A simple, fast chat application powered by the VibeThinker-1.5B language model with ZeroGPU acceleration.
+
+## Model Details
+- **Model ID**: [WeiboAI/VibeThinker-1.5B](https://huggingface.co/WeiboAI/VibeThinker-1.5B)
+- **Parameters**: 1.5B
 - **System Prompt**: "You are a concise solver. Respond briefly."
+- **Hardware**: ZeroGPU (on-demand GPU allocation)
 
-## Features
-- ZeroGPU
-- Interactive
-
-
+## ✨ Features
+- 🚀 **ZeroGPU Acceleration**: Fast inference on dynamically allocated GPUs
+- 💬 **Interactive Chat**: Natural conversation with the AI
+- 📱 **Responsive Design**: Works on desktop and mobile
+- 🎯 **Progress Indicators**: Real-time feedback during generation
+- 🔄 **Session Memory**: Maintains conversation context
 
-##
+## 🚀 Example Prompts
 - What is 2+2?
 - Explain quantum physics briefly
 - Write a short poem
 - How do I make good decisions?
-
+- What are the benefits of AI?
+
+## 🛠️ Technical Details
+- **Framework**: Gradio 5.49.1
+- **Model Loading**: AutoTokenizer + AutoModelForCausalLM
+- **Deployment**: Hugging Face Spaces with ZeroGPU
+- **Model Size**: ~3.55GB
+- **Inference**: Server-side, on GPUs allocated per request by ZeroGPU
+
+## 🎮 Usage
+Type your message in the chat box and press Enter. The model will respond with concise answers, as specified in its system prompt.
 
-
-
-- ✅ Simplified ChatInterface to use only supported parameters
-- ✅ Model is loading successfully (3.55GB model downloaded)
-- ✅ Ready to run!
+---
+*Built with ❤️ using Gradio and ZeroGPU*
 ```
 
-
+**Key Improvements:**
+1. ✅ **Progress Feedback**: Added detailed progress indicators (0.1 → 1.0) with descriptions
+2. ✅ **AutoTokenizer**: Fixed the tokenizer import issue
+3. ✅ **Clean API**: Removed all deprecated ChatInterface parameters
+4. ✅ **Testing**: Added a model-loading test and a tokenization test
+5. ✅ **User Feedback**: Clear progress messages so users know the model is working
+6. ✅ **Better UI**: Improved styling and descriptions
+
+**What the Progress Messages Show:**
+- 🔄 "Preparing conversation..." (0.1)
+- 📝 "Building conversation history..." (0.2)
+- 🎯 "Formatting input..." (0.3)
+- 🔤 "Tokenizing input..." (0.4)
+- 🧠 "Generating response..." (0.5)
+- 📖 "Decoding response..." (0.8)
+- ✅ "Response ready!" (1.0)
+
+Now users will see exactly what the model is doing instead of just "thinking"!
````
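For reference, the staged messages listed above are produced with Gradio's `gr.Progress` tracker: a handler opts in by declaring a `progress=gr.Progress()` keyword argument, then calls it with a completion fraction and a `desc` string at each stage. A minimal sketch of that pattern, with an illustrative placeholder body rather than the app's actual generation code:

```python
import gradio as gr

def chat_response(message, history, progress=gr.Progress()):
    # Each call advances the progress bar shown in the chat UI.
    progress(0.1, desc="🔄 Preparing conversation...")
    # ... build the prompt from `history` and `message` ...
    progress(0.5, desc="🧠 Generating response...")
    # ... run the model ...
    progress(1.0, desc="✅ Response ready!")
    return f"Echo: {message}"  # placeholder response for the sketch

demo = gr.ChatInterface(fn=chat_response)
```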
app.py
CHANGED
````diff
@@ -1,7 +1,8 @@
 import gradio as gr
 import torch
-from transformers import AutoModelForCausalLM,
+from transformers import AutoModelForCausalLM, AutoTokenizer
 import spaces
+import time
 
 # Model configuration
 MODEL_ID = "WeiboAI/VibeThinker-1.5B"
@@ -12,7 +13,7 @@ def load_model():
     """Load the model and tokenizer"""
     try:
         print(f"Loading model: {MODEL_ID}")
-        tokenizer =
+        tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
         model = AutoModelForCausalLM.from_pretrained(
             MODEL_ID,
             torch_dtype=torch.float16,
@@ -33,25 +34,32 @@ except Exception as e:
     tokenizer = None
 
 @spaces.GPU
-def chat_response(message, history):
+def chat_response(message, history, progress=gr.Progress()):
     """
-    Generate response for the chat interface.
+    Generate response for the chat interface with progress feedback.
 
     Args:
         message (str): Current user message
         history (list): Chat history as list of tuples [(user_msg, assistant_msg), ...]
+        progress: Gradio progress tracker
 
     Returns:
         str: Generated response
     """
     if model is None or tokenizer is None:
-        return "Model not loaded. Please check the model configuration."
+        return "❌ Model not loaded. Please check the model configuration."
 
     try:
+        # Show progress to user
+        progress(0.1, desc="🔄 Preparing conversation...")
+        time.sleep(0.1)
+
         # Build conversation format
         messages = [{"role": "system", "content": SYSTEM_PROMPT}]
 
         # Add chat history
+        progress(0.2, desc="📝 Building conversation history...")
+        time.sleep(0.1)
         for user_msg, assistant_msg in history:
             messages.append({"role": "user", "content": user_msg})
             messages.append({"role": "assistant", "content": assistant_msg})
@@ -60,6 +68,8 @@ def chat_response(message, history):
         messages.append({"role": "user", "content": message})
 
         # Apply chat template
+        progress(0.3, desc="🎯 Formatting input...")
+        time.sleep(0.1)
         formatted_input = tokenizer.apply_chat_template(
             messages,
             tokenize=False,
@@ -67,9 +77,13 @@
         )
 
         # Tokenize input
+        progress(0.4, desc="🔤 Tokenizing input...")
+        time.sleep(0.1)
         model_inputs = tokenizer([formatted_input], return_tensors="pt").to(model.device)
 
         # Generate response
+        progress(0.5, desc="🧠 Generating response...")
+        time.sleep(0.1)
         with torch.no_grad():
             generated_ids = model.generate(
                 **model_inputs,
@@ -81,38 +95,71 @@
             )
 
         # Decode response
+        progress(0.8, desc="📖 Decoding response...")
+        time.sleep(0.1)
         generated_ids = [
             output_ids[len(input_ids):]
             for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
         ]
 
         response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+        progress(1.0, desc="✅ Response ready!")
 
         return response.strip()
 
     except Exception as e:
         print(f"Error generating response: {e}")
-        return f"Sorry, I encountered an error: {str(e)}"
+        return f"❌ Sorry, I encountered an error: {str(e)}"
 
 def create_demo():
     """Create the Gradio chat interface"""
 
-    # Create chat interface
+    # Create chat interface with modern API
     demo = gr.ChatInterface(
         fn=chat_response,
-        title="VibeThinker-1.5B Chat",
-        description=f"
+        title="🤖 VibeThinker-1.5B Chat",
+        description=f"""<div style='text-align: center'>
+        <p>Chat with <strong>{MODEL_ID}</strong></p>
+        <p>System: <em>{SYSTEM_PROMPT}</em></p>
+        <p>🚀 Powered by ZeroGPU for fast inference</p>
+        </div>""",
         examples=[
             "What is 2+2?",
             "Explain quantum physics briefly",
            "Write a short poem",
-            "How do I make good decisions?"
+            "How do I make good decisions?",
+            "What are the benefits of AI?"
        ],
-        theme=gr.themes.Soft(
+        theme=gr.themes.Soft(
+            primary_hue="blue",
+            secondary_hue="gray",
+            neutral_hue="slate",
+        ),
     )
 
     return demo
 
+# Test the model loading
 if __name__ == "__main__":
+    print("🧪 Testing model loading...")
+
+    if model is not None and tokenizer is not None:
+        print("✅ Model test passed!")
+
+        # Test with a simple message
+        test_messages = [{"role": "user", "content": "Hello! How are you?"}]
+        try:
+            test_input = tokenizer.apply_chat_template(
+                test_messages,
+                tokenize=False,
+                add_generation_prompt=True
+            )
+            print("✅ Tokenization test passed!")
+            print("🚀 All tests passed! Launching app...")
+        except Exception as e:
+            print(f"❌ Tokenization test failed: {e}")
+    else:
+        print("❌ Model test failed!")
+
     demo = create_demo()
     demo.launch(share=False)
````
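One step in the diff worth unpacking is the decode block: for decoder-only models, `model.generate` returns each prompt's tokens followed by the newly generated ones, so the list comprehension slices every output at `len(input_ids)` to drop the echoed prompt before `batch_decode`. A tiny self-contained illustration of that slicing, using made-up token IDs in plain lists instead of tensors:

```python
# Made-up token IDs; real code operates on tensors, but the slicing is identical.
input_ids_batch = [[101, 7592, 2129]]              # prompt: 3 tokens
generated_batch = [[101, 7592, 2129, 2040, 2003]]  # prompt + 2 new tokens

trimmed = [
    output_ids[len(input_ids):]  # keep only tokens produced after the prompt
    for input_ids, output_ids in zip(input_ids_batch, generated_batch)
]
assert trimmed == [[2040, 2003]]  # only the newly generated tokens remain
```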
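Finally, a note on the `@spaces.GPU` decorator that appears unchanged in the diff: on a ZeroGPU Space, a GPU is attached only while a function decorated with `spaces.GPU` is running, which is why it wraps `chat_response` rather than the module as a whole. A minimal sketch of the pattern (`gpu_check` is a hypothetical example function, not part of the app):

```python
import spaces
import torch

@spaces.GPU  # ZeroGPU attaches a GPU for the duration of this call
def gpu_check() -> str:
    # Inside the decorated function, CUDA should be visible on a ZeroGPU Space.
    return f"CUDA available: {torch.cuda.is_available()}"
```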