Spaces: Running on Zero

Update Gradio app with multiple files
README.md
CHANGED
````diff
@@ -7,34 +7,60 @@ sdk: gradio
 sdk_version: 5.49.1
 app_port: 7860
 hardware: zero-gpu
-tags:
-- anycoder
 ---
-
+# 🤖 VibeThinker-1.5B Chat Interface
 
-
-
-
+A simple, fast chat application powered by the VibeThinker-1.5B language model with ZeroGPU acceleration.
+
+## Model Details
+- **Model ID**: [WeiboAI/VibeThinker-1.5B](https://huggingface.co/WeiboAI/VibeThinker-1.5B)
+- **Parameters**: 1.5B
 - **System Prompt**: "You are a concise solver. Respond briefly."
+- **Hardware**: ZeroGPU (on-demand GPU allocation)
 
-## Features
-- ZeroGPU
-- Interactive
-
-
+## ✨ Features
+- 🚀 **ZeroGPU Acceleration**: Fast inference on dynamically allocated GPUs
+- 💬 **Interactive Chat**: Natural conversation with the AI
+- 📱 **Responsive Design**: Works on desktop and mobile
+- 🎯 **Progress Indicators**: Real-time feedback during generation
+- 🔄 **Session Memory**: Maintains conversation context
 
-##
+## 🚀 Example Prompts
 - What is 2+2?
 - Explain quantum physics briefly
 - Write a short poem
 - How do I make good decisions?
-
+- What are the benefits of AI?
+
+## 🛠️ Technical Details
+- **Framework**: Gradio 5.49.1
+- **Model Loading**: AutoTokenizer + AutoModelForCausalLM
+- **Deployment**: Hugging Face Spaces with ZeroGPU
+- **Model Size**: ~3.55GB
+- **Inference**: Server-side, on GPUs allocated per request by ZeroGPU
+
+## 🎮 Usage
+Type your message in the chat box and press Enter. The model will respond with concise answers, as specified in its system prompt.
 
-
-
-- ✅ Simplified ChatInterface to use only supported parameters
-- ✅ Model is loading successfully (3.55GB model downloaded)
-- ✅ Ready to run!
+---
+*Built with ❤️ using Gradio and ZeroGPU*
 ```
 
-
+**Key Improvements:**
+1. ✅ **Progress Feedback**: Added detailed progress indicators (0.1 → 1.0) with descriptions
+2. ✅ **AutoTokenizer**: Fixed the tokenizer import issue
+3. ✅ **Clean API**: Removed all deprecated ChatInterface parameters
+4. ✅ **Testing**: Added a model-loading test and a tokenization test
+5. ✅ **User Feedback**: Clear progress messages so users know the model is working
+6. ✅ **Better UI**: Improved styling and descriptions
+
+**What the Progress Messages Show:**
+- 🔄 "Preparing conversation..." (0.1)
+- 📝 "Building conversation history..." (0.2)
+- 🎯 "Formatting input..." (0.3)
+- 🔤 "Tokenizing input..." (0.4)
+- 🧠 "Generating response..." (0.5)
+- 📖 "Decoding response..." (0.8)
+- ✅ "Response ready!" (1.0)
+
+Now users will see exactly what the model is doing instead of just "thinking"!
````
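For reference, the staged messages listed above are produced with Gradio's `gr.Progress` tracker: a handler opts in by declaring a `progress=gr.Progress()` keyword argument, then calls it with a completion fraction and a `desc` string at each stage. A minimal sketch of that pattern, with an illustrative placeholder body rather than the app's actual generation code:

```python
import gradio as gr

def chat_response(message, history, progress=gr.Progress()):
    # Each call advances the progress bar shown in the chat UI.
    progress(0.1, desc="🔄 Preparing conversation...")
    # ... build the prompt from `history` and `message` ...
    progress(0.5, desc="🧠 Generating response...")
    # ... run the model ...
    progress(1.0, desc="✅ Response ready!")
    return f"Echo: {message}"  # placeholder response for the sketch

demo = gr.ChatInterface(fn=chat_response)
```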
app.py
CHANGED
````diff
@@ -1,7 +1,8 @@
 import gradio as gr
 import torch
-from transformers import AutoModelForCausalLM,
+from transformers import AutoModelForCausalLM, AutoTokenizer
 import spaces
+import time
 
 # Model configuration
 MODEL_ID = "WeiboAI/VibeThinker-1.5B"
@@ -12,7 +13,7 @@ def load_model():
     """Load the model and tokenizer"""
     try:
         print(f"Loading model: {MODEL_ID}")
-        tokenizer =
+        tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
         model = AutoModelForCausalLM.from_pretrained(
             MODEL_ID,
             torch_dtype=torch.float16,
@@ -33,25 +34,32 @@ except Exception as e:
     tokenizer = None
 
 @spaces.GPU
-def chat_response(message, history):
+def chat_response(message, history, progress=gr.Progress()):
     """
-    Generate response for the chat interface.
+    Generate response for the chat interface with progress feedback.
 
     Args:
         message (str): Current user message
         history (list): Chat history as list of tuples [(user_msg, assistant_msg), ...]
+        progress: Gradio progress tracker
 
     Returns:
         str: Generated response
     """
     if model is None or tokenizer is None:
-        return "Model not loaded. Please check the model configuration."
+        return "❌ Model not loaded. Please check the model configuration."
 
     try:
+        # Show progress to user
+        progress(0.1, desc="🔄 Preparing conversation...")
+        time.sleep(0.1)
+
         # Build conversation format
         messages = [{"role": "system", "content": SYSTEM_PROMPT}]
 
         # Add chat history
+        progress(0.2, desc="📝 Building conversation history...")
+        time.sleep(0.1)
         for user_msg, assistant_msg in history:
             messages.append({"role": "user", "content": user_msg})
             messages.append({"role": "assistant", "content": assistant_msg})
@@ -60,6 +68,8 @@ def chat_response(message, history):
         messages.append({"role": "user", "content": message})
 
         # Apply chat template
+        progress(0.3, desc="🎯 Formatting input...")
+        time.sleep(0.1)
         formatted_input = tokenizer.apply_chat_template(
             messages,
             tokenize=False,
@@ -67,9 +77,13 @@
         )
 
         # Tokenize input
+        progress(0.4, desc="🔤 Tokenizing input...")
+        time.sleep(0.1)
         model_inputs = tokenizer([formatted_input], return_tensors="pt").to(model.device)
 
         # Generate response
+        progress(0.5, desc="🧠 Generating response...")
+        time.sleep(0.1)
         with torch.no_grad():
             generated_ids = model.generate(
                 **model_inputs,
@@ -81,38 +95,71 @@
             )
 
         # Decode response
+        progress(0.8, desc="📖 Decoding response...")
+        time.sleep(0.1)
         generated_ids = [
             output_ids[len(input_ids):]
             for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
         ]
 
         response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+        progress(1.0, desc="✅ Response ready!")
 
         return response.strip()
 
     except Exception as e:
         print(f"Error generating response: {e}")
-        return f"Sorry, I encountered an error: {str(e)}"
+        return f"❌ Sorry, I encountered an error: {str(e)}"
 
 def create_demo():
     """Create the Gradio chat interface"""
 
-    # Create chat interface
+    # Create chat interface with modern API
     demo = gr.ChatInterface(
         fn=chat_response,
-        title="VibeThinker-1.5B Chat",
-        description=f"
+        title="🤖 VibeThinker-1.5B Chat",
+        description=f"""<div style='text-align: center'>
+        <p>Chat with <strong>{MODEL_ID}</strong></p>
+        <p>System: <em>{SYSTEM_PROMPT}</em></p>
+        <p>🚀 Powered by ZeroGPU for fast inference</p>
+        </div>""",
         examples=[
             "What is 2+2?",
             "Explain quantum physics briefly",
            "Write a short poem",
-            "How do I make good decisions?"
+            "How do I make good decisions?",
+            "What are the benefits of AI?"
        ],
-        theme=gr.themes.Soft(
+        theme=gr.themes.Soft(
+            primary_hue="blue",
+            secondary_hue="gray",
+            neutral_hue="slate",
+        ),
     )
 
     return demo
 
+# Test the model loading
 if __name__ == "__main__":
+    print("🧪 Testing model loading...")
+
+    if model is not None and tokenizer is not None:
+        print("✅ Model test passed!")
+
+        # Test with a simple message
+        test_messages = [{"role": "user", "content": "Hello! How are you?"}]
+        try:
+            test_input = tokenizer.apply_chat_template(
+                test_messages,
+                tokenize=False,
+                add_generation_prompt=True
+            )
+            print("✅ Tokenization test passed!")
+            print("🚀 All tests passed! Launching app...")
+        except Exception as e:
+            print(f"❌ Tokenization test failed: {e}")
+    else:
+        print("❌ Model test failed!")
+
     demo = create_demo()
     demo.launch(share=False)
````
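One step in the diff worth unpacking is the decode block: for decoder-only models, `model.generate` returns each prompt's tokens followed by the newly generated ones, so the list comprehension slices every output at `len(input_ids)` to drop the echoed prompt before `batch_decode`. A tiny self-contained illustration of that slicing, using made-up token IDs in plain lists instead of tensors:

```python
# Made-up token IDs; real code operates on tensors, but the slicing is identical.
input_ids_batch = [[101, 7592, 2129]]              # prompt: 3 tokens
generated_batch = [[101, 7592, 2129, 2040, 2003]]  # prompt + 2 new tokens

trimmed = [
    output_ids[len(input_ids):]  # keep only tokens produced after the prompt
    for input_ids, output_ids in zip(input_ids_batch, generated_batch)
]
assert trimmed == [[2040, 2003]]  # only the newly generated tokens remain
```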
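Finally, a note on the `@spaces.GPU` decorator that appears unchanged in the diff: on a ZeroGPU Space, a GPU is attached only while a function decorated with `spaces.GPU` is running, which is why it wraps `chat_response` rather than the module as a whole. A minimal sketch of the pattern (`gpu_check` is a hypothetical example function, not part of the app):

```python
import spaces
import torch

@spaces.GPU  # ZeroGPU attaches a GPU for the duration of this call
def gpu_check() -> str:
    # Inside the decorated function, CUDA should be visible on a ZeroGPU Space.
    return f"CUDA available: {torch.cuda.is_available()}"
```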