Spaces: thisisam/fara-7b-chat-test (status: Runtime error)

thisisam committed on Dec 3, 2025

Commit faf508c (1 parent: a4a4f9a)

Enable vision-language capabilities with transformers format

Files changed (4)

LOCAL_TESTING.md +0 -110
README.md +111 -41
app.py +357 -158
requirements.txt +2 -1

LOCAL_TESTING.md DELETED

@@ -1,110 +0,0 @@
-# Local Testing Instructions
-## Test Your Space Locally Before Deploying
-Before deploying to Hugging Face, you can test the app on your local machine.
-### Prerequisites
-1. Python 3.8 or higher installed
-2. Your Hugging Face token ready
-### Steps
-#### 1. Install Dependencies
-Open PowerShell/Terminal and navigate to this folder:
-```bash
-cd "c:/Users/Amir/OneDrive - Digital Health CRC Limited/Projects/url2md/fara-7b-space"
-```
-Install required packages:
-```bash
-pip install -r requirements.txt
-```
-#### 2. Set Your HuggingFace Token
-Create a `.env` file in this folder (it's already in .gitignore, so it won't be committed):
-```bash
-# PowerShell command to create .env file
-echo "HF_TOKEN=your_token_here" > .env
-```
-Replace `your_token_here` with your actual Hugging Face token.
-#### 3. Update app.py to Load .env (Temporary)
-For local testing only, add these lines at the top of `app.py`:
-```python
-from dotenv import load_dotenv
-load_dotenv()  # Load .env file
-```
-And install python-dotenv:
-```bash
-pip install python-dotenv
-```
-#### 4. Run the App Locally
-```bash
-python app.py
-```
-You should see output like:
-```
-Running on local URL:  http://127.0.0.1:7860
-```
-Open that URL in your browser to test!
-#### 5. Test the Chat
-- Type a message
-- Verify you get responses from Fara-7B
-- Test different temperatures and max_tokens settings
-- Check if streaming works properly
-### Important Notes
-⚠️ **Before Deploying:**
-- Remove the `load_dotenv()` code from `app.py` (Spaces use secrets, not .env)
-- Don't commit your `.env` file (already in .gitignore)
-- The Space will use the `HF_TOKEN` secret instead
-### Troubleshooting Local Testing
-**Import Error for dotenv:**
-```bash
-pip install python-dotenv
-```
-**Token Error:**
-- Check your token is correct in `.env`
-- Ensure no extra spaces or quotes
-- Verify token has inference permissions
-**Port Already in Use:**
-```bash
-# Kill the process or run on different port
-python app.py --server-port 7861
-```
-### Alternative: Quick Test Without .env
-You can also temporarily hardcode your token (FOR TESTING ONLY):
-```python
-client = InferenceClient(token="your_token_here")  # TEMPORARY - REMOVE BEFORE DEPLOYING
-```
-⚠️ **NEVER commit hardcoded tokens to git!**
----
-Once local testing works, you're ready to deploy to Hugging Face Spaces! See `DEPLOYMENT_GUIDE.md` for deployment instructions.
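The deleted guide has contributors add `load_dotenv()` to app.py and then remember to remove it before deploying, and its troubleshooting tip `python app.py --server-port 7861` passes a flag that app.py never parses, so it would be ignored. A minimal sketch of an alternative, assuming python-dotenv is installed only on the local machine: guard the import so the same file runs unchanged on Spaces, and read the port from `GRADIO_SERVER_PORT`, the environment variable Gradio's `launch()` consults when no `server_port` is passed. The helper names `load_local_env` and `resolve_settings` are hypothetical.

```python
import os
from typing import Mapping, Optional, Tuple

try:
    # Assumption: python-dotenv is installed only for local testing. On Spaces the
    # import fails and HF_TOKEN comes from the Space secret instead.
    from dotenv import find_dotenv, load_dotenv
except ImportError:
    find_dotenv = load_dotenv = None


def load_local_env() -> bool:
    """Load the nearest .env file (searching from the working directory) if python-dotenv is present."""
    if load_dotenv is None:
        return False
    # override=False is the default, so variables that are already set
    # (for example Space secrets) are never replaced by .env values.
    load_dotenv(find_dotenv(usecwd=True))
    return True


def resolve_settings(env: Mapping[str, str] = os.environ) -> Tuple[Optional[str], int]:
    """Return (HF token or None, port), defaulting to Gradio's usual port 7860."""
    token = env.get("HF_TOKEN")
    port = int(env.get("GRADIO_SERVER_PORT", "7860"))
    return token, port


if __name__ == "__main__":
    load_local_env()
    token, port = resolve_settings()
    print(f"HF_TOKEN set: {token is not None}; port: {port}")
```

In app.py this would run before `InferenceClient(token=...)` is created; passing `server_port=port` to `demo.launch()`, or simply exporting `GRADIO_SERVER_PORT=7861`, which Gradio reads on its own, takes the place of the `--server-port` flag.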

README.md CHANGED

@@ -1,5 +1,5 @@
 ---
-title: Fara-7B Computer Use Agent
 emoji: 🤖
 colorFrom: purple
 colorTo: blue
@@ -8,75 +8,145 @@ sdk_version: 5.0.2
 app_file: app.py
 pinned: false
 license: mit
-short_description: Chat interface for Microsoft Fara-7B agentic model
 ---
-# Fara-7B: Computer Use Agent Chat Interface
-This Space provides a chat interface to interact with **Microsoft Fara-7B**, an efficient agentic model designed for computer use and web automation.
 ## 🌟 Features
-- **Interactive Chat**: Converse with Fara-7B about web automation tasks
-- **Streaming Responses**: Real-time response generation
-- **Customizable Parameters**: Adjust temperature and max tokens
-- **Clean UI**: Modern, user-friendly interface built with Gradio
 ## 🚀 About Fara-7B
-Fara-7B is Microsoft's first agentic small language model (SLM) designed specifically for computer use. With only 7 billion parameters, it achieves state-of-the-art performance for:
-- 🛒 **Shopping automation**
-- ✈️ **Travel booking research**
-- 🍽️ **Restaurant reservations**
-- 📧 **Account workflows**
-- 🔍 **Information seeking**
-## ⚙️ Setup Instructions
-### For Space Owners:
-1. **Fork/Duplicate this Space** to your account
-2. Go to **Settings** → **Variables and secrets**
-3. Add a new secret:
-   - **Name**: `HF_TOKEN`
-   - **Value**: Your Hugging Face token (get it from [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens))
-4. Ensure your token has **inference** permissions
-5. Restart the Space
-### Getting a Hugging Face Token:
-1. Go to [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
-2. Click **New token**
-3. Select **Read** access (sufficient for inference)
-4. Copy the token and add it to Space secrets
-## 🎯 Usage
-Simply type your request in the chat box! Examples:
-- "Help me find Italian restaurants in Seattle"
-- "What steps would I take to book a flight to London?"
-- "How can I search for running shoes on an e-commerce site?"
-⚠️ **Note**: This is a text-only chat interface. For full computer use capabilities with screenshots and browser automation, check out the [Magentic-UI framework](https://github.com/microsoft/magentic-ui).
 ## 📚 Resources
-- [Model Card](https://huggingface.co/microsoft/Fara-7B)
-- [Microsoft Research](https://www.microsoft.com/en-us/research/)
-- [Magentic-UI](https://github.com/microsoft/magentic-ui) - Full computer use framework
-## 📝 License
-MIT License - See [microsoft/Fara-7B](https://huggingface.co/microsoft/Fara-7B) for model license details.
 ## 🤝 Credits
 - **Model**: Microsoft Research
 - **Interface**: Built with Gradio
-- **Inference**: Hugging Face Inference API
 ---
-*This Space demonstrates the capabilities of Fara-7B through a simple chat interface. For production use cases requiring actual computer control, integrate with the full Magentic-UI framework.*

 ---
+title: Fara-7B Chat
 emoji: 🤖
 colorFrom: purple
 colorTo: blue
 app_file: app.py
 pinned: false
 license: mit
+short_description: Chat interface for Microsoft Fara-7B web automation agent
 ---
+# Fara-7B: Web Automation Agent Chat Interface
+This Space provides a chat interface to interact with **Microsoft Fara-7B**, a 7B parameter vision-language model designed for web automation and computer use.
 ## 🌟 Features
+- **Vision-Language Model**: Upload browser screenshots with your tasks
+- **Web Automation Planning**: Describes step-by-step actions for web tasks
+- **Safety-First**: Stops at "Critical Points" (checkout, personal info)
+- **Flexible Usage**: Works with or without screenshots
 ## 🚀 About Fara-7B
+Fara-7B is Microsoft's specialized agentic model for computer use. With 7 billion parameters, it can:
+- 📸 Understand browser screenshots
+- 🎯 Plan multi-step web automation tasks
+- 🔧 Use browser tools (click, type, scroll)
+- 🛑 Stop before sensitive actions (Critical Points)
+- 💡 Handle tasks like shopping, travel, research, and more
+### Key Capabilities
+- 🛒 **Shopping automation**: Find products, add to cart
+- ✈️ **Travel booking**: Search flights and hotels
+- 🍽️ **Restaurant search**: Find dining options
+- 📊 **Information extraction**: Research and data gathering
+- 🏛️ **Government portals**: Navigate and extract grant/funding info
+## 🎯 How to Use
+### Simple Text Tasks
+Just describe what you want to accomplish:
+- "Find healthcare grants on the NSW government website"
+- "Search for running shoes under $100"
+- "Look up Italian restaurants in Seattle with 4+ stars"
+### Advanced: With Screenshots
+1. Take a screenshot of the browser/website you're working with
+2. Upload the screenshot
+3. Describe your task
+4. Fara-7B will analyze the screenshot and plan the next actions
+## ⚙️ Setup
+### For This Space
+1. **Request Model Access**:
+   - Visit [microsoft/Fara-7B](https://huggingface.co/microsoft/Fara-7B)
+   - Click "Request access" if it's gated
+   - Wait for approval
+2. **Set HF_TOKEN** (Space owners only):
+   - Go to Space Settings → Variables and secrets
+   - Add secret: `HF_TOKEN` = your HuggingFace token
+   - Get token from [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
+### Use Locally with Transformers
+```python
+from transformers import pipeline
+pipe = pipeline("image-text-to-text", model="microsoft/Fara-7B")
+messages = [
+    {
+        "role": "user",
+        "content": [
+            {"type": "image", "url": "screenshot.jpg"},
+            {"type": "text", "text": "Find running shoes under $100"}
+        ]
+    },
+]
+result = pipe(text=messages)
+```
+### Full Browser Automation (vLLM + CLI)
+For actual browser control with live automation:
+```bash
+# 1. Clone repository
+git clone https://github.com/microsoft/fara.git
+cd fara
+# 2. Setup environment
+python3 -m venv .venv
+source .venv/bin/activate  # Windows: .venv\Scripts\activate
+pip install -e .
+playwright install
+# 3. Host the model
+vllm serve "microsoft/Fara-7B" --port 5000 --dtype auto
+# 4. Run tasks (in another terminal)
+fara-cli --task "your web automation task"
+```
+**System Requirements**:
+- GPU with 16GB+ VRAM
+- Or use `--tensor-parallel-size 2` if limited memory
 ## 📚 Resources
+- **Model Card**: [microsoft/Fara-7B](https://huggingface.co/microsoft/Fara-7B)
+- **GitHub Repository**: [microsoft/fara](https://github.com/microsoft/fara)
+- **Microsoft Research**: [Research Page](https://www.microsoft.com/en-us/research/)
+## ⚠️ Important Notes
+### Inference API Limitations
+This Space attempts to use the HuggingFace Inference API, but:
+- The API may not be fully available for Fara-7B
+- If unavailable, demo responses will be provided instead
+- For full functionality, host locally with vLLM (see above)
+### Critical Points
+Fara-7B is designed to stop at "Critical Points":
+- **Checkout/Purchase**: Stops before payment
+- **Booking**: Stops before entering personal info
+- **Account Creation**: Stops before submitting sensitive data
+- **Communication**: Stops before making calls or sending emails
+This ensures safety and gives you control over sensitive actions.
 ## 🤝 Credits
 - **Model**: Microsoft Research
 - **Interface**: Built with Gradio
+- **Infrastructure**: HuggingFace Spaces
+## 📝 License
+MIT License - See [microsoft/Fara-7B](https://huggingface.co/microsoft/Fara-7B) for model license details.
 ---
+*Experience web automation AI with Fara-7B. For production use cases requiring actual browser control, integrate with the full vLLM setup or use the Magentic-UI framework.*
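The README's "Full Browser Automation" steps host the model with `vllm serve "microsoft/Fara-7B" --port 5000`. vLLM exposes an OpenAI-compatible HTTP API, so a single screenshot-plus-task request can be sketched with the standard library. This is a minimal sketch, assuming the server from those steps is reachable at `http://localhost:5000/v1` and accepts OpenAI-style `image_url` parts carrying base64 data URLs; it sends one raw chat request and does not reproduce the system prompt, tool schema, or action loop that `fara-cli` uses. `build_chat_payload` and `send_to_vllm` are hypothetical names.

```python
import base64
import json
import urllib.request
from typing import Optional


def build_chat_payload(task: str, screenshot_png: Optional[bytes] = None,
                       model: str = "microsoft/Fara-7B", max_tokens: int = 512) -> dict:
    """Build an OpenAI-style chat request with an optional PNG screenshot."""
    content = []
    if screenshot_png is not None:
        # The image travels inline as a base64 data URL inside an image_url part.
        encoded = base64.b64encode(screenshot_png).decode("ascii")
        content.append({"type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{encoded}"}})
    content.append({"type": "text", "text": task})
    return {"model": model,
            "messages": [{"role": "user", "content": content}],
            "max_tokens": max_tokens}


def send_to_vllm(payload: dict, base_url: str = "http://localhost:5000/v1",
                 timeout: float = 120.0) -> str:
    """POST the payload to /chat/completions and return the first choice's text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `send_to_vllm(build_chat_payload("Find running shoes under $100", open("screenshot.png", "rb").read()))` would return the model's reply as plain text; keeping payload construction separate from the network call makes the request shape easy to inspect without a GPU.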

app.py CHANGED

@@ -1,211 +1,410 @@
 import gradio as gr
 from huggingface_hub import InferenceClient
 import os
-import json
 # Initialize the Inference Client
 client = InferenceClient(token=os.getenv("HF_TOKEN"))
-def chat_with_fara(message, history):
     """
-    Interact with Fara-7B using the correct format for agent tasks
     """
     try:
-        # Build the prompt in the expected format for Fara-7B
-        # Fara-7B is designed for web automation tasks with specific structure
-        system_prompt = """You are Fara, a web automation agent. You help users with web-based tasks by providing step-by-step guidance for browser automation."""
-        # Format messages for the model
         messages = [
-            {"role": "system", "content": system_prompt},
-            {"role": "user", "content": message}
         ]
-        # Use the conversational endpoint which is more appropriate
-        response = client.conversational(
-            text=message,
-            model="microsoft/Fara-7B",
-            max_length=500,
-            temperature=0.7
-        )
-        # Extract the response
-        if hasattr(response, 'generated_text'):
-            return response.generated_text
-        elif isinstance(response, str):
-            return response
-        else:
-            return str(response)
-    except Exception as e:
-        error_msg = f"❌ Error: {str(e)}"
-        # Provide specific guidance based on common errors
-        if "401" in str(e):
-            error_msg += "\n\n🔐 Authentication failed. Please check:"
-            error_msg += "\n- Your HF_TOKEN is set in Space secrets"
-            error_msg += "\n- You have requested access to microsoft/Fara-7B"
-            error_msg += "\n- Your token has the necessary permissions"
-        elif "404" in str(e):
-            error_msg += "\n\n🔍 Model not found. The model might be:"
-            error_msg += "\n- Private and requiring access request"
-            error_msg += "\n- Temporarily unavailable"
-        elif "403" in str(e):
-            error_msg += "\n\n🚫 Access forbidden. You need to:"
-            error_msg += "\n- Visit https://huggingface.co/microsoft/Fara-7B"
-            error_msg += "\n- Click 'Access repository' to request access"
-            error_msg += "\n- Wait for approval from Microsoft"
-        return error_msg
-# Alternative: Use text generation with proper formatting
-def chat_with_fara_text_generation(message, history):
-    """
-    Alternative approach using text generation with proper prompt formatting
-    """
-    try:
-        # Format prompt for agent tasks
-        prompt = f"""<|system|>
-You are Fara, a web automation agent designed to help users with web-based tasks.
-When responding:
-1. Break down complex web tasks into steps
-2. Suggest specific actions that could be automated
-3. Identify potential challenges in web automation
-4. Provide practical guidance for browser automation
-<|user|>
-{message}
-<|assistant|>
-"""
-        response = client.text_generation(
-            prompt=prompt,
-            model="microsoft/Fara-7B",
-            max_new_tokens=500,
-            temperature=0.7,
-            do_sample=True
-        )
-        # Clean the response
-        if "<|assistant|>" in response:
-            response = response.split("<|assistant|>")[-1].strip()
-        return response
     except Exception as e:
-        return f"❌ Text generation error: {str(e)}"
-# Fallback function for when Fara-7B is not accessible
-def fallback_chat(message, history):
     """
-    Fallback when Fara-7B is not accessible
     """
-    fallback_responses = {
-        "web automation": "For web automation tasks like the NSW grants search, you would typically:\n\n1. Navigate to https://www.nsw.gov.au/grants-and-funding\n2. Use search functionality to filter for 'healthcare' grants\n3. Extract the list of available funding opportunities\n4. Provide summaries with eligibility criteria and deadlines",
-        "general": "I'd be happy to help with web automation tasks! For tasks like finding grants on government websites, the process involves:\n- Website navigation\n- Search and filtering\n- Data extraction\n- Result organization"
-    }
-    # Simple keyword-based fallback
     message_lower = message.lower()
-    if any(keyword in message_lower for keyword in ['grant', 'funding', 'nsw', 'healthcare']):
-        return fallback_responses["web automation"]
-    else:
-        return fallback_responses["general"]
-def smart_chat_handler(message, history):
-    """
-    Smart handler that tries multiple approaches
-    """
-    # First try the conversational API
-    try:
-        response = chat_with_fara(message, history)
-        if "Error" not in response and "error" not in response.lower():
-            return response
-    except:
-        pass
-    # Then try text generation
-    try:
-        response = chat_with_fara_text_generation(message, history)
-        if "Error" not in response and "error" not in response.lower():
-            return response
-    except:
-        pass
-    # Finally use fallback
-    return fallback_chat(message, history)
 # Create the Gradio interface
-with gr.Blocks(theme=gr.themes.Soft()) as demo:
     gr.Markdown(
         """
-        # 🤖 Fara-7B Web Automation Assistant
-        **Microsoft's specialized agent for web automation tasks**
-        This interface connects to the Fara-7B model designed for:
-        - Web navigation and automation
-        - Task planning for browser actions
-        - Step-by-step guidance for web tasks
-        ⚠️ **Note**: Access to Fara-7B requires permission from Microsoft.
         """
     )
-    # Add access information
-    with gr.Accordion("🔐 Access Requirements", open=False):
         gr.Markdown("""
-        To use Fara-7B, you need:
-        1. **Access Request**: Visit [the model page](https://huggingface.co/microsoft/Fara-7B) and click "Access repository"
-        2. **HF_TOKEN**: Add your Hugging Face token in Space secrets
-        3. **Wait for Approval**: Microsoft needs to approve your access request
-        If you don't have access yet, this demo will show how Fara-7B would respond to web automation tasks.
         """)
     chatbot = gr.Chatbot(
         height=500,
-        label="Web Automation Chat",
-        show_label=True
     )
     with gr.Row():
-        msg = gr.Textbox(
-            label="Web Task Description",
-            placeholder="Example: Go to NSW grants website and find healthcare funding...",
-            lines=2,
-            scale=4
-        )
-        send_btn = gr.Button("Execute Task", scale=1, variant="primary")
     with gr.Row():
         clear_btn = gr.Button("Clear Chat")
-        method_btn = gr.Button("Check Access Status")
-    output_status = gr.Textbox(label="Status", visible=False)
-    def respond(message, chat_history):
-        response = smart_chat_handler(message, chat_history)
-        chat_history.append((message, response))
-        return "", chat_history
-    def check_access():
-        try:
-            # Simple test to check if model is accessible
-            test_client = InferenceClient(token=os.getenv("HF_TOKEN"))
-            test_response = test_client.model_status("microsoft/Fara-7B")
-            return "✅ Fara-7B is accessible!"
-        except Exception as e:
-            return f"❌ Access issue: {str(e)}"
-    msg.submit(respond, [msg, chatbot], [msg, chatbot])
-    send_btn.click(respond, [msg, chatbot], [msg, chatbot])
-    clear_btn.click(lambda: ([], ""), outputs=[chatbot, msg])
-    method_btn.click(check_access, outputs=output_status)
 if __name__ == "__main__":
-    # On HuggingFace Spaces, we don't need to specify server settings
-    # The platform handles this automatically
     demo.launch()
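The old `fallback_chat` routes messages with substring checks such as `'nsw' in message_lower`, and `generate_demo_response` in the new version of the file keeps the same pattern. Substring matching fires inside unrelated words: "answer" contains "nsw", "border" contains "order", and "notebook" contains "book". A minimal sketch of whole-word routing, assuming the same keyword lists and the same first-match order as `generate_demo_response`; `route_message` and the category labels are hypothetical.

```python
import re

# Hypothetical routing table mirroring the keyword lists in generate_demo_response.
# Order matters: the first category whose pattern matches wins.
_KEYWORDS = [
    ("shopping", ["buy", "shop", "purchase", "order", "cart", "shoes", "product"]),
    ("travel", ["flight", "hotel", "travel", "book", "trip"]),
    ("restaurant", ["restaurant", "food", "dining", "reservation"]),
    ("grants", ["grant", "funding", "government", "nsw", "healthcare"]),
]


def _compile(keywords):
    # \b anchors each keyword at word boundaries; "s?" also accepts simple plurals
    # such as "grants" or "flights", which plain whole-word matching would miss.
    alternatives = "|".join(re.escape(word) for word in keywords)
    return re.compile(rf"\b(?:{alternatives})s?\b")


ROUTES = [(name, _compile(words)) for name, words in _KEYWORDS]


def route_message(message: str) -> str:
    """Return the first category with a whole-word keyword hit, else "general"."""
    text = message.lower()
    for category, pattern in ROUTES:
        if pattern.search(text):
            return category
    return "general"
```

With this routing, "Can you answer a question about the border?" falls through to the general reply instead of being treated as a shopping task, while the example tasks from the UI still land in their intended categories.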

 import gradio as gr
 from huggingface_hub import InferenceClient
 import os
+from PIL import Image
+import requests
+from io import BytesIO
 # Initialize the Inference Client
 client = InferenceClient(token=os.getenv("HF_TOKEN"))
+def create_demo_screenshot(task_type="general"):
     """
+    Create a simple placeholder screenshot for demo purposes
+    In actual use, this would be a real browser screenshot
+    """
+    # For now, return None - we'll use text-only mode
+    return None
+def chat_with_fara(message, history, image=None):
+    """
+    Interact with Fara-7B using the vision-language model API
     """
     try:
+        # Build the proper message format for Fara-7B
+        system_prompt = """You are a web automation agent that performs actions on websites to fulfill user requests by calling various tools.
+You should stop execution at Critical Points. A Critical Point occurs in tasks like:
+- Checkout, Book, Purchase, Call, Email, Order
+A Critical Point requires the user's permission or personal/sensitive information (name, email, credit card, address, payment information, resume, etc.) to complete a transaction (purchase, reservation, sign-up, etc.), or to communicate as a human would (call, email, apply to a job, etc.).
+Guideline: Solve the task as far as possible up until a Critical Point.
+Examples:
+- If the task is to "call a restaurant to make a reservation," do not actually make the call. Instead, navigate to the restaurant's page and find the phone number.
+- If the task is to "order new size 12 running shoes," do not place the order. Instead, search for the right shoes that meet the criteria and add them to the cart.
+Some tasks, like answering questions, may not encounter a Critical Point at all."""
+        # Prepare messages in the format expected by Fara-7B
         messages = [
+            {"role": "system", "content": system_prompt}
         ]
+        # Add history
+        if history:
+            for h in history:
+                if h["role"] in ["user", "assistant"]:
+                    messages.append(h)
+        # Add current message
+        user_content = []
+        # Add image if provided
+        if image is not None:
+            user_content.append({"type": "image", "image": image})
+        # Add text
+        user_content.append({"type": "text", "text": message})
+        messages.append({
+            "role": "user",
+            "content": user_content if len(user_content) > 1 else message
+        })
+        # Try to use the Inference API
+        try:
+            response = client.chat_completion(
+                messages=messages,
+                model="microsoft/Fara-7B",
+                max_tokens=512,
+                temperature=0.7,
+            )
+            # Extract the response
+            if hasattr(response, 'choices') and len(response.choices) > 0:
+                return response.choices[0].message.content
+            else:
+                raise Exception("Unexpected response format")
+        except Exception as api_error:
+            error_str = str(api_error).lower()
+            # Check for specific errors
+            if "no api" in error_str or "not found" in error_str or "404" in error_str:
+                # Model doesn't have Inference API - provide helpful demo response
+                return generate_demo_response(message)
+            elif "401" in error_str or "unauthorized" in error_str:
+                return """❌ **Authentication Error**
+Please check:
+1. Your `HF_TOKEN` is set in Space secrets
+2. You have requested access to [microsoft/Fara-7B](https://huggingface.co/microsoft/Fara-7B)
+3. Your token has read permissions
+To use Fara-7B locally instead:
+```bash
+git clone https://github.com/microsoft/fara.git
+cd fara
+pip install -e .
+playwright install
+vllm serve "microsoft/Fara-7B" --port 5000
+```
+"""
+            elif "403" in error_str or "forbidden" in error_str:
+                return """❌ **Access Forbidden**
+You need to request access to the model:
+1. Visit: https://huggingface.co/microsoft/Fara-7B
+2. Click "Request access to this repository"
+3. Wait for Microsoft to approve your request
+Once approved, make sure your `HF_TOKEN` is set in Space secrets.
+"""
+            else:
+                # Unknown error - try demo mode
+                return f"⚠️ API Error: {str(api_error)}\n\n**Demo Response:**\n\n" + generate_demo_response(message)
     except Exception as e:
+        return f"❌ Error: {str(e)}\n\nPlease check the Space logs for more details."
+def generate_demo_response(message):
     """
+    Generate a helpful demo response when the API is not available
     """
     message_lower = message.lower()
+    # Shopping/E-commerce tasks
+    if any(word in message_lower for word in ['buy', 'shop', 'purchase', 'order', 'cart', 'shoes', 'product']):
+        return """🛒 **Task: Shopping/Purchase**
+**Action Plan:**
+1. 🔍 Navigate to e-commerce website
+2. 🔎 Search for: [extracted product from your query]
+3. 📋 Apply filters: price, rating, availability
+4. ✅ Select best match
+5. ➕ Add to cart
+6. 🛑 **STOP** - Critical Point: Checkout requires payment info
+**What I would do with a screenshot:**
+- Identify search bar location
+- Read product listings
+- Click appropriate buttons
+- Navigate to cart
+**Next steps for you:**
+- Review cart
+- Complete checkout manually
+💡 *Note: The Inference API may not be available for this model. For full functionality, host locally with vLLM.*
+"""
+    # Travel/booking tasks
+    elif any(word in message_lower for word in ['flight', 'hotel', 'travel', 'book', 'trip']):
+        return """✈️ **Task: Travel Booking**
+**Action Plan:**
+1. 🌐 Navigate to travel site
+2. 📅 Enter dates and destination
+3. 🔍 Search options
+4. 💰 Sort by price/rating
+5. 📊 Compare top results
+6. 🛑 **STOP** - Critical Point: Booking requires personal info
+**What I would do with a screenshot:**
+- Find date pickers
+- Enter search criteria
+- Click search button
+- Read results table
+**Next steps for you:**
+- Review options
+- Complete booking manually
+💡 *Note: The Inference API may not be available for this model. For full functionality, host locally with vLLM.*
+"""
+    # Restaurant tasks
+    elif any(word in message_lower for word in ['restaurant', 'food', 'dining', 'reservation']):
+        return """🍽️ **Task: Restaurant Search**
+**Action Plan:**
+1. 🔎 Search for restaurants
+2. 📍 Filter by location and cuisine
+3. ⭐ Check ratings and reviews
+4. 📞 Find contact info
+5. 🛑 **STOP** - Critical Point: Reservation requires personal info
+**What I would do with a screenshot:**
+- Identify search results
+- Read restaurant details
+- Extract phone number
+- Locate reservation link
+**Next steps for you:**
+- Call or book reservation manually
+💡 *Note: The Inference API may not be available for this model. For full functionality, host locally with vLLM.*
+"""
+    # Government/grants (your specific use case!)
+    elif any(word in message_lower for word in ['grant', 'funding', 'government', 'nsw', 'healthcare']):
+        return """🏛️ **Task: Government Grants Research**
+**Action Plan:**
+1. 🌐 Navigate to government grants portal
+2. 🔎 Use search functionality
+3. 📋 Filter by: healthcare, eligibility, deadline
+4. 📊 Extract grant information
+5. ✅ **COMPLETE** - No Critical Point
+**What I would do with a screenshot:**
+- Locate search bar
+- Read grant listings
+- Extract key details:
+  - Grant title
+  - Funding amount
+  - Eligibility criteria
+  - Application deadline
+  - Contact information
+**Example output:**
+```
+Grant: Healthcare Innovation Fund
+Amount: $50,000 - $500,000
+Eligibility: Registered healthcare providers
+Deadline: March 31, 2024
+Link: [grant URL]
+```
+💡 *Note: The Inference API may not be available for this model. For full functionality, host locally with vLLM.*
+"""
+    # General response
+    else:
+        return """🤖 **Fara-7B Web Automation Agent**
+I help with web automation tasks! I can:
+✅ Shopping & e-commerce
+✅ Travel & booking
+✅ Restaurant search
+✅ Information extraction
+✅ Government portals & grants
+✅ Account navigation
+**How I work:**
+1. 📸 Analyze browser screenshot (when provided)
+2. 🎯 Understand your goal
+3. 📝 Plan step-by-step actions
+4. 🔧 Use browser tools (click, type, scroll)
+5. 🛑 Stop at Critical Points (checkout, personal info)
+**Example tasks:**
+- "Find running shoes under $100"
+- "Search for flights to Tokyo"
+- "Find healthcare grants on the NSW government website"
+- "Look up Italian restaurants in Seattle"
+**To use with screenshots:**
+Upload a browser screenshot and describe your task!
+💡 *Note: The Inference API may not be available for this model. For full functionality, host locally with vLLM:*
+```bash
+vllm serve "microsoft/Fara-7B" --port 5000 --dtype auto
+```
+"""
 # Create the Gradio interface
+with gr.Blocks(theme=gr.themes.Soft(), title="Fara-7B Chat") as demo:
     gr.Markdown(
         """
+        # 🤖 Fara-7B Web Automation Agent
+        **Microsoft's specialized vision-language model for web automation**
+        Fara-7B can analyze browser screenshots and plan web automation tasks.
+        💡 **How to use:**
+        - Upload a browser screenshot (optional)
+        - Describe your web automation task
+        - Fara-7B will plan the actions needed
+        ⚠️ **Note**: The Inference API may not be fully available for this model. For complete functionality including actual browser control, host locally with vLLM (see instructions below).
         """
     )
+    with gr.Accordion("📚 About Fara-7B & Setup Instructions", open=False):
         gr.Markdown("""
+        ### What is Fara-7B?
+        Fara-7B is a 7B parameter vision-language model designed for computer use. It can:
+        - Understand browser screenshots
+        - Plan multi-step web automation tasks
+        - Use tools (click, type, scroll, etc.)
+        - Stop at "Critical Points" for safety
+        ### Using Transformers Library (Colab/Local)
+        ```python
+        from transformers import pipeline
+        pipe = pipeline("image-text-to-text", model="microsoft/Fara-7B")
+        messages = [
+            {
+                "role": "user",
+                "content": [
+                    {"type": "image", "url": "screenshot.jpg"},
+                    {"type": "text", "text": "Find running shoes"}
+                ]
+            },
+        ]
+        result = pipe(text=messages)
+        ```
+        ### Full Browser Automation (Local)
+        ```bash
+        # Clone repository
+        git clone https://github.com/microsoft/fara.git
+        cd fara
+        # Setup environment
+        python3 -m venv .venv
+        source .venv/bin/activate
+        pip install -e .
+        playwright install
+        # Host model
+        vllm serve "microsoft/Fara-7B" --port 5000 --dtype auto
+        # Run tasks
+        fara-cli --task "your task here"
+        ```
+        **Resources:**
+        - Model: https://huggingface.co/microsoft/Fara-7B
+        - GitHub: https://github.com/microsoft/fara
         """)
     chatbot = gr.Chatbot(
         height=500,
+        label="Chat",
+        show_label=True,
+        type="messages"
     )
     with gr.Row():
+        with gr.Column(scale=3):
+            msg = gr.Textbox(
+                label="Task Description",
+                placeholder="Example: Find healthcare grants on the NSW government website...",
+                lines=2
+            )
+        with gr.Column(scale=1):
+            image_input = gr.Image(
+                label="Browser Screenshot (Optional)",
+                type="pil",
+                height=100
+            )
     with gr.Row():
+        send_btn = gr.Button("Send", variant="primary")
         clear_btn = gr.Button("Clear Chat")
+    gr.Markdown("""
+    ### 💡 Tips for Best Results
+    - **With screenshot**: Upload a browser screenshot and describe what you want to accomplish
+    - **Without screenshot**: Describe the web task, and Fara-7B will plan the approach
+    - **Be specific**: Include details like website, search criteria, budget, etc.
+    - **Critical Points**: Fara-7B will stop before checkout, booking, or entering personal info
+    ### 🎯 Example Tasks
+    - "Find healthcare grants for digital health projects in Australia"
+    - "Search for running shoes under $100 on this e-commerce page"
+    - "Look up restaurants in Seattle with 4+ stars for Italian food"
+    - "Find the contact information on this website"
+    """)
+    def respond(message, image, chat_history):
+        if not message.strip():
+            return chat_history, None
+        # Add user message to history
+        user_msg = {"role": "user", "content": message}
+        chat_history.append(user_msg)
+        # Get response from Fara
+        response = chat_with_fara(message, chat_history, image)
+        # Add assistant response to history
+        assistant_msg = {"role": "assistant", "content": response}
+        chat_history.append(assistant_msg)
+        return chat_history, None
+    def clear_chat():
+        return [], None
+    msg.submit(respond, [msg, image_input, chatbot], [chatbot, image_input]).then(
+        lambda: ("", None), None, [msg, image_input]
+    )
+    send_btn.click(respond, [msg, image_input, chatbot], [chatbot, image_input]).then(
+        lambda: ("", None), None, [msg, image_input]
+    )
+    clear_btn.click(clear_chat, outputs=[chatbot, image_input])
 if __name__ == "__main__":
     demo.launch()
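In the new `respond`, the user turn is appended to `chat_history` before `chat_with_fara(message, chat_history, image)` is called, and `chat_with_fara` then copies that history and appends the same message again, so the model receives the current request twice. The screenshot is also added as `{"type": "image", "image": image}` with a PIL object, which is not the `image_url` part used in OpenAI-style chat-completion requests, and a PIL object cannot be JSON-encoded as is. A minimal sketch of one way to assemble the request, assuming the screenshot has already been converted to PNG bytes (with the `type="pil"` input, `image.save(buffer, format="PNG")` on an `io.BytesIO` buffer would do that) and that `call_model` wraps `client.chat_completion`; `build_messages`, `image_bytes_to_data_url`, `SYSTEM_PROMPT`, and this version of `respond` are hypothetical.

```python
import base64
from typing import Any, Callable, Dict, List, Optional

Message = Dict[str, Any]

# Hypothetical stand-in for the full Critical Point prompt defined in chat_with_fara.
SYSTEM_PROMPT = "You are a web automation agent that stops at Critical Points."


def image_bytes_to_data_url(png_bytes: bytes) -> str:
    """Encode PNG bytes as a data URL usable in an image_url content part."""
    return "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")


def build_messages(history: List[Message], message: str,
                   image_data_url: Optional[str] = None) -> List[Message]:
    """Build chat-completion messages; `history` holds previous turns only."""
    messages: List[Message] = [{"role": "system", "content": SYSTEM_PROMPT}]
    for turn in history:
        if turn.get("role") in ("user", "assistant"):
            # Copy only role/content; Gradio entries may carry extra keys such as metadata.
            messages.append({"role": turn["role"], "content": turn["content"]})
    if image_data_url is None:
        messages.append({"role": "user", "content": message})
    else:
        messages.append({"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": image_data_url}},
            {"type": "text", "text": message},
        ]})
    return messages


def respond(message: str, png_bytes: Optional[bytes], chat_history: List[Message],
            call_model: Callable[[List[Message]], str]) -> List[Message]:
    """Query the model with prior turns, then return the history extended by one exchange."""
    data_url = image_bytes_to_data_url(png_bytes) if png_bytes is not None else None
    reply = call_model(build_messages(chat_history, message, data_url))
    return chat_history + [{"role": "user", "content": message},
                           {"role": "assistant", "content": reply}]
```

The design choice is to build the request from the history as it stood before the new turn and only then extend the Gradio history, so the current message appears exactly once per request.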

requirements.txt CHANGED

@@ -1,2 +1,3 @@
 gradio==5.0.2
-huggingface-hub==0.26.2

 gradio==5.0.2
+huggingface-hub==0.26.2
+Pillow