thisisam committed on
Commit
faf508c
·
1 Parent(s): a4a4f9a

Enable vision-language capabilities with transformers format

Files changed (4)
  1. LOCAL_TESTING.md +0 -110
  2. README.md +111 -41
  3. app.py +357 -158
  4. requirements.txt +2 -1
LOCAL_TESTING.md DELETED
@@ -1,110 +0,0 @@
1
- # Local Testing Instructions
2
-
3
- ## Test Your Space Locally Before Deploying
4
-
5
- Before deploying to Hugging Face, you can test the app on your local machine.
6
-
7
- ### Prerequisites
8
-
9
- 1. Python 3.8 or higher installed
10
- 2. Your Hugging Face token ready
11
-
12
- ### Steps
13
-
14
- #### 1. Install Dependencies
15
-
16
- Open PowerShell/Terminal and navigate to this folder:
17
-
18
- ```bash
19
- cd "c:/Users/Amir/OneDrive - Digital Health CRC Limited/Projects/url2md/fara-7b-space"
20
- ```
21
-
22
- Install required packages:
23
-
24
- ```bash
25
- pip install -r requirements.txt
26
- ```
27
-
28
- #### 2. Set Your HuggingFace Token
29
-
30
- Create a `.env` file in this folder (it's already in .gitignore, so it won't be committed):
31
-
32
- ```bash
33
- # PowerShell command to create .env file
34
- echo "HF_TOKEN=your_token_here" > .env
35
- ```
36
-
37
- Replace `your_token_here` with your actual Hugging Face token.
38
-
39
- #### 3. Update app.py to Load .env (Temporary)
40
-
41
- For local testing only, add these lines at the top of `app.py`:
42
-
43
- ```python
44
- from dotenv import load_dotenv
45
- load_dotenv() # Load .env file
46
- ```
47
-
48
- And install python-dotenv:
49
- ```bash
50
- pip install python-dotenv
51
- ```
52
-
53
- #### 4. Run the App Locally
54
-
55
- ```bash
56
- python app.py
57
- ```
58
-
59
- You should see output like:
60
- ```
61
- Running on local URL: http://127.0.0.1:7860
62
- ```
63
-
64
- Open that URL in your browser to test!
65
-
66
- #### 5. Test the Chat
67
-
68
- - Type a message
69
- - Verify you get responses from Fara-7B
70
- - Test different temperatures and max_tokens settings
71
- - Check if streaming works properly
72
-
73
- ### Important Notes
74
-
75
- ⚠️ **Before Deploying:**
76
- - Remove the `load_dotenv()` code from `app.py` (Spaces use secrets, not .env)
77
- - Don't commit your `.env` file (already in .gitignore)
78
- - The Space will use the `HF_TOKEN` secret instead
79
-
80
- ### Troubleshooting Local Testing
81
-
82
- **Import Error for dotenv:**
83
- ```bash
84
- pip install python-dotenv
85
- ```
86
-
87
- **Token Error:**
88
- - Check your token is correct in `.env`
89
- - Ensure no extra spaces or quotes
90
- - Verify token has inference permissions
91
-
92
- **Port Already in Use:**
93
- ```bash
94
- # Kill the process or run on a different port (Gradio reads GRADIO_SERVER_PORT)
95
- GRADIO_SERVER_PORT=7861 python app.py
96
- ```
97
-
98
- ### Alternative: Quick Test Without .env
99
-
100
- You can also temporarily hardcode your token (FOR TESTING ONLY):
101
-
102
- ```python
103
- client = InferenceClient(token="your_token_here") # TEMPORARY - REMOVE BEFORE DEPLOYING
104
- ```
105
-
106
- ⚠️ **NEVER commit hardcoded tokens to git!**
107
-
108
- ---
109
-
110
- Once local testing works, you're ready to deploy to Hugging Face Spaces! See `DEPLOYMENT_GUIDE.md` for deployment instructions.
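Review note: the removed guide asked for `load_dotenv()` to be pasted into `app.py` for local runs and deleted again before deploying. A guarded import avoids that edit cycle. The sketch below is an assumption about how this could be done, not code from the commit: `load_local_env` and `get_hf_token` are hypothetical helper names, and `python-dotenv` is treated as an optional dependency.

```python
import os
from typing import Optional


def load_local_env() -> bool:
    """Load a local .env file when python-dotenv is installed; otherwise do nothing."""
    try:
        from dotenv import load_dotenv  # optional, only needed for local runs
    except ImportError:
        return False
    # load_dotenv() does not override variables that are already set.
    return bool(load_dotenv())


def get_hf_token() -> Optional[str]:
    """Return HF_TOKEN with stray whitespace and quotes removed, or None if unset."""
    load_local_env()
    token = os.getenv("HF_TOKEN", "").strip().strip('"').strip("'")
    return token or None
```

On Spaces, where `python-dotenv` is not listed in `requirements.txt`, the import fails quietly and the `HF_TOKEN` secret is read from the environment as before. The quote stripping covers the "no extra spaces or quotes" item from the troubleshooting list.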
README.md CHANGED
@@ -1,5 +1,5 @@
1
  ---
2
- title: Fara-7B Computer Use Agent
3
  emoji: 🤖
4
  colorFrom: purple
5
  colorTo: blue
@@ -8,75 +8,145 @@ sdk_version: 5.0.2
8
  app_file: app.py
9
  pinned: false
10
  license: mit
11
- short_description: Chat interface for Microsoft Fara-7B agentic model
12
  ---
13
 
14
- # Fara-7B: Computer Use Agent Chat Interface
15
 
16
- This Space provides a chat interface to interact with **Microsoft Fara-7B**, an efficient agentic model designed for computer use and web automation.
17
 
18
  ## 🌟 Features
19
 
20
- - **Interactive Chat**: Converse with Fara-7B about web automation tasks
21
- - **Streaming Responses**: Real-time response generation
22
- - **Customizable Parameters**: Adjust temperature and max tokens
23
- - **Clean UI**: Modern, user-friendly interface built with Gradio
24
 
25
  ## 🚀 About Fara-7B
26
 
27
- Fara-7B is Microsoft's first agentic small language model (SLM) designed specifically for computer use. With only 7 billion parameters, it achieves state-of-the-art performance for:
28
 
29
- - 🛒 **Shopping automation**
30
- - ✈️ **Travel booking research**
31
- - 🍽️ **Restaurant reservations**
32
- - 📧 **Account workflows**
33
- - 🔍 **Information seeking**
34
 
35
- ## ⚙️ Setup Instructions
36
 
37
- ### For Space Owners:
38
 
39
- 1. **Fork/Duplicate this Space** to your account
40
- 2. Go to **Settings** → **Variables and secrets**
41
- 3. Add a new secret:
42
- - **Name**: `HF_TOKEN`
43
- - **Value**: Your Hugging Face token (get it from [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens))
44
- 4. Ensure your token has **inference** permissions
45
- 5. Restart the Space
46
 
47
- ### Getting a Hugging Face Token:
48
 
49
- 1. Go to [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
50
- 2. Click **New token**
51
- 3. Select **Read** access (sufficient for inference)
52
- 4. Copy the token and add it to Space secrets
53
 
54
- ## 🎯 Usage
55
 
56
- Simply type your request in the chat box! Examples:
57
 
58
- - "Help me find Italian restaurants in Seattle"
59
- - "What steps would I take to book a flight to London?"
60
- - "How can I search for running shoes on an e-commerce site?"
61
 
62
- ⚠️ **Note**: This is a text-only chat interface. For full computer use capabilities with screenshots and browser automation, check out the [Magentic-UI framework](https://github.com/microsoft/magentic-ui).
63
 
64
  ## 📚 Resources
65
 
66
- - [Model Card](https://huggingface.co/microsoft/Fara-7B)
67
- - [Microsoft Research](https://www.microsoft.com/en-us/research/)
68
- - [Magentic-UI](https://github.com/microsoft/magentic-ui) - Full computer use framework
69
 
70
- ## 📝 License
71
 
72
- MIT License - See [microsoft/Fara-7B](https://huggingface.co/microsoft/Fara-7B) for model license details.
73
 
74
  ## 🤝 Credits
75
 
76
  - **Model**: Microsoft Research
77
  - **Interface**: Built with Gradio
78
- - **Inference**: Hugging Face Inference API
79
 
80
  ---
81
 
82
- *This Space demonstrates the capabilities of Fara-7B through a simple chat interface. For production use cases requiring actual computer control, integrate with the full Magentic-UI framework.*
 
1
  ---
2
+ title: Fara-7B Chat
3
  emoji: 🤖
4
  colorFrom: purple
5
  colorTo: blue
 
8
  app_file: app.py
9
  pinned: false
10
  license: mit
11
+ short_description: Chat interface for Microsoft Fara-7B web automation agent
12
  ---
13
 
14
+ # Fara-7B: Web Automation Agent Chat Interface
15
 
16
+ This Space provides a chat interface to interact with **Microsoft Fara-7B**, a 7B parameter vision-language model designed for web automation and computer use.
17
 
18
  ## 🌟 Features
19
 
20
+ - **Vision-Language Model**: Upload browser screenshots with your tasks
21
+ - **Web Automation Planning**: Describes step-by-step actions for web tasks
22
+ - **Safety-First**: Stops at "Critical Points" (checkout, personal info)
23
+ - **Flexible Usage**: Works with or without screenshots
24
 
25
  ## 🚀 About Fara-7B
26
 
27
+ Fara-7B is Microsoft's specialized agentic model for computer use. With 7 billion parameters, it can:
28
 
29
+ - 📸 Understand browser screenshots
30
+ - 🎯 Plan multi-step web automation tasks
31
+ - 🔧 Use browser tools (click, type, scroll)
32
+ - 🛑 Stop before sensitive actions (Critical Points)
33
+ - 💡 Handle tasks like shopping, travel, research, and more
34
 
35
+ ### Key Capabilities
36
 
37
+ - 🛒 **Shopping automation**: Find products, add to cart
38
+ - ✈️ **Travel booking**: Search flights and hotels
39
+ - 🍽️ **Restaurant search**: Find dining options
40
+ - 📊 **Information extraction**: Research and data gathering
41
+ - 🏛️ **Government portals**: Navigate and extract grant/funding info
42
 
43
+ ## 🎯 How to Use
44
 
45
+ ### Simple Text Tasks
46
+ Just describe what you want to accomplish:
47
+ - "Find healthcare grants on the NSW government website"
48
+ - "Search for running shoes under $100"
49
+ - "Look up Italian restaurants in Seattle with 4+ stars"
50
 
51
+ ### Advanced: With Screenshots
52
+ 1. Take a screenshot of the browser/website you're working with
53
+ 2. Upload the screenshot
54
+ 3. Describe your task
55
+ 4. Fara-7B will analyze the screenshot and plan the next actions
56
 
57
+ ## ⚙️ Setup
58
 
59
+ ### For This Space
60
 
61
+ 1. **Request Model Access**:
62
+ - Visit [microsoft/Fara-7B](https://huggingface.co/microsoft/Fara-7B)
63
+ - Click "Request access" if it's gated
64
+ - Wait for approval
65
 
66
+ 2. **Set HF_TOKEN** (Space owners only):
67
+ - Go to Space Settings → Variables and secrets
68
+ - Add secret: `HF_TOKEN` = your Hugging Face token
69
+ - Get token from [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
70
+
71
+ ### Use Locally with Transformers
72
+
73
+ ```python
74
+ from transformers import pipeline
75
+
76
+ pipe = pipeline("image-text-to-text", model="microsoft/Fara-7B")
77
+ messages = [
78
+ {
79
+ "role": "user",
80
+ "content": [
81
+ {"type": "image", "url": "screenshot.jpg"},
82
+ {"type": "text", "text": "Find running shoes under $100"}
83
+ ]
84
+ },
85
+ ]
86
+ result = pipe(text=messages)
87
+ ```
88
+
89
+ ### Full Browser Automation (vLLM + CLI)
90
+
91
+ For actual browser control with live automation:
92
+
93
+ ```bash
94
+ # 1. Clone repository
95
+ git clone https://github.com/microsoft/fara.git
96
+ cd fara
97
+
98
+ # 2. Setup environment
99
+ python3 -m venv .venv
100
+ source .venv/bin/activate # Windows: .venv\Scripts\activate
101
+ pip install -e .
102
+ playwright install
103
+
104
+ # 3. Host the model
105
+ vllm serve "microsoft/Fara-7B" --port 5000 --dtype auto
106
+
107
+ # 4. Run tasks (in another terminal)
108
+ fara-cli --task "your web automation task"
109
+ ```
110
+
111
+ **System Requirements**:
112
+ - GPU with 16GB+ VRAM
113
+ - Or shard the model across two GPUs with `--tensor-parallel-size 2` when a single GPU has too little memory
114
 
115
  ## 📚 Resources
116
 
117
+ - **Model Card**: [microsoft/Fara-7B](https://huggingface.co/microsoft/Fara-7B)
118
+ - **GitHub Repository**: [microsoft/fara](https://github.com/microsoft/fara)
119
+ - **Microsoft Research**: [Research Page](https://www.microsoft.com/en-us/research/)
120
 
121
+ ## ⚠️ Important Notes
122
 
123
+ ### Inference API Limitations
124
+
125
+ This Space attempts to use the HuggingFace Inference API, but:
126
+ - The API may not be fully available for Fara-7B
127
+ - If unavailable, demo responses will be provided instead
128
+ - For full functionality, host locally with vLLM (see above)
129
+
130
+ ### Critical Points
131
+
132
+ Fara-7B is designed to stop at "Critical Points":
133
+ - **Checkout/Purchase**: Stops before payment
134
+ - **Booking**: Stops before entering personal info
135
+ - **Account Creation**: Stops before submitting sensitive data
136
+ - **Communication**: Stops before making calls or sending emails
137
+
138
+ This ensures safety and gives you control over sensitive actions.
139
 
140
  ## 🤝 Credits
141
 
142
  - **Model**: Microsoft Research
143
  - **Interface**: Built with Gradio
144
+ - **Infrastructure**: HuggingFace Spaces
145
+
146
+ ## 📝 License
147
+
148
+ MIT License - See [microsoft/Fara-7B](https://huggingface.co/microsoft/Fara-7B) for model license details.
149
 
150
  ---
151
 
152
+ *Experience web automation AI with Fara-7B. For production use cases requiring actual browser control, integrate with the full vLLM setup or use the Magentic-UI framework.*
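Review note: the "16GB+ VRAM" figure in the new README is consistent with simple arithmetic on the weights alone. A rough sketch, assuming 16-bit weights (2 bytes per parameter) and ignoring KV cache, activations, and framework overhead, all of which add to the real footprint. `weight_memory_gib` is a hypothetical helper, and the even split across GPUs is an idealization of tensor parallelism.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2, num_gpus: int = 1) -> float:
    """Approximate per-GPU memory for model weights only, in GiB.

    Assumes weights are sharded evenly across `num_gpus`; real deployments
    also need room for the KV cache and runtime overhead.
    """
    total_bytes = num_params * bytes_per_param
    return total_bytes / num_gpus / 2**30


single_gpu = weight_memory_gib(7e9)              # about 13 GiB of weights on one GPU
two_gpus = weight_memory_gib(7e9, num_gpus=2)    # about 6.5 GiB of weights per GPU
print(f"1 GPU: {single_gpu:.1f} GiB, 2 GPUs: {two_gpus:.1f} GiB each")
```

Roughly 13 GiB of weights leaves little headroom on a 16 GB card, which is why sharding across two GPUs is offered as the alternative.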
app.py CHANGED
@@ -1,211 +1,410 @@
1
  import gradio as gr
2
  from huggingface_hub import InferenceClient
3
  import os
4
- import json
5
 
6
  # Initialize the Inference Client
7
  client = InferenceClient(token=os.getenv("HF_TOKEN"))
8
 
9
- def chat_with_fara(message, history):
10
  """
11
- Interact with Fara-7B using the correct format for agent tasks
12
  """
13
  try:
14
- # Build the prompt in the expected format for Fara-7B
15
- # Fara-7B is designed for web automation tasks with specific structure
16
- system_prompt = """You are Fara, a web automation agent. You help users with web-based tasks by providing step-by-step guidance for browser automation."""
17
-
18
- # Format messages for the model
19
  messages = [
20
- {"role": "system", "content": system_prompt},
21
- {"role": "user", "content": message}
22
  ]
23
 
24
- # Use the conversational endpoint which is more appropriate
25
- response = client.conversational(
26
- text=message,
27
- model="microsoft/Fara-7B",
28
- max_length=500,
29
- temperature=0.7
30
- )
31
-
32
- # Extract the response
33
- if hasattr(response, 'generated_text'):
34
- return response.generated_text
35
- elif isinstance(response, str):
36
- return response
37
- else:
38
- return str(response)
39
-
40
- except Exception as e:
41
- error_msg = f"❌ Error: {str(e)}"
42
-
43
- # Provide specific guidance based on common errors
44
- if "401" in str(e):
45
- error_msg += "\n\n🔐 Authentication failed. Please check:"
46
- error_msg += "\n- Your HF_TOKEN is set in Space secrets"
47
- error_msg += "\n- You have requested access to microsoft/Fara-7B"
48
- error_msg += "\n- Your token has the necessary permissions"
49
- elif "404" in str(e):
50
- error_msg += "\n\n🔍 Model not found. The model might be:"
51
- error_msg += "\n- Private and requiring access request"
52
- error_msg += "\n- Temporarily unavailable"
53
- elif "403" in str(e):
54
- error_msg += "\n\n🚫 Access forbidden. You need to:"
55
- error_msg += "\n- Visit https://huggingface.co/microsoft/Fara-7B"
56
- error_msg += "\n- Click 'Access repository' to request access"
57
- error_msg += "\n- Wait for approval from Microsoft"
58
-
59
- return error_msg
60
-
61
- # Alternative: Use text generation with proper formatting
62
- def chat_with_fara_text_generation(message, history):
63
- """
64
- Alternative approach using text generation with proper prompt formatting
65
- """
66
- try:
67
- # Format prompt for agent tasks
68
- prompt = f"""<|system|>
69
- You are Fara, a web automation agent designed to help users with web-based tasks.
70
-
71
- When responding:
72
- 1. Break down complex web tasks into steps
73
- 2. Suggest specific actions that could be automated
74
- 3. Identify potential challenges in web automation
75
- 4. Provide practical guidance for browser automation
76
-
77
- <|user|>
78
- {message}
79
- <|assistant|>
80
- """
81
 
82
- response = client.text_generation(
83
- prompt=prompt,
84
- model="microsoft/Fara-7B",
85
- max_new_tokens=500,
86
- temperature=0.7,
87
- do_sample=True
88
- )
89
 
90
- # Clean the response
91
- if "<|assistant|>" in response:
92
- response = response.split("<|assistant|>")[-1].strip()
93
 
94
- return response
95
 
96
  except Exception as e:
97
- return f"❌ Text generation error: {str(e)}"
98
 
99
- # Fallback function for when Fara-7B is not accessible
100
- def fallback_chat(message, history):
101
  """
102
- Fallback when Fara-7B is not accessible
103
  """
104
- fallback_responses = {
105
- "web automation": "For web automation tasks like the NSW grants search, you would typically:\n\n1. Navigate to https://www.nsw.gov.au/grants-and-funding\n2. Use search functionality to filter for 'healthcare' grants\n3. Extract the list of available funding opportunities\n4. Provide summaries with eligibility criteria and deadlines",
106
-
107
- "general": "I'd be happy to help with web automation tasks! For tasks like finding grants on government websites, the process involves:\n- Website navigation\n- Search and filtering\n- Data extraction\n- Result organization"
108
- }
109
-
110
- # Simple keyword-based fallback
111
  message_lower = message.lower()
112
- if any(keyword in message_lower for keyword in ['grant', 'funding', 'nsw', 'healthcare']):
113
- return fallback_responses["web automation"]
114
- else:
115
- return fallback_responses["general"]
116
 
117
- def smart_chat_handler(message, history):
118
- """
119
- Smart handler that tries multiple approaches
120
- """
121
- # First try the conversational API
122
- try:
123
- response = chat_with_fara(message, history)
124
- if "Error" not in response and "error" not in response.lower():
125
- return response
126
- except:
127
- pass
128
 
129
- # Then try text generation
130
- try:
131
- response = chat_with_fara_text_generation(message, history)
132
- if "Error" not in response and "error" not in response.lower():
133
- return response
134
- except:
135
- pass
136
 
137
- # Finally use fallback
138
- return fallback_chat(message, history)
139
 
140
  # Create the Gradio interface
141
- with gr.Blocks(theme=gr.themes.Soft()) as demo:
142
  gr.Markdown(
143
  """
144
- # 🤖 Fara-7B Web Automation Assistant
145
 
146
- **Microsoft's specialized agent for web automation tasks**
147
 
148
- This interface connects to the Fara-7B model designed for:
149
- - Web navigation and automation
150
- - Task planning for browser actions
151
- - Step-by-step guidance for web tasks
152
 
153
- ⚠️ **Note**: Access to Fara-7B requires permission from Microsoft.
154
  """
155
  )
156
 
157
- # Add access information
158
- with gr.Accordion("🔐 Access Requirements", open=False):
159
  gr.Markdown("""
160
- To use Fara-7B, you need:
161
- 1. **Access Request**: Visit [the model page](https://huggingface.co/microsoft/Fara-7B) and click "Access repository"
162
- 2. **HF_TOKEN**: Add your Hugging Face token in Space secrets
163
- 3. **Wait for Approval**: Microsoft needs to approve your access request
164
 
165
- If you don't have access yet, this demo will show how Fara-7B would respond to web automation tasks.
166
  """)
167
 
168
  chatbot = gr.Chatbot(
169
  height=500,
170
- label="Web Automation Chat",
171
- show_label=True
172
  )
173
 
174
  with gr.Row():
175
- msg = gr.Textbox(
176
- label="Web Task Description",
177
- placeholder="Example: Go to NSW grants website and find healthcare funding...",
178
- lines=2,
179
- scale=4
180
- )
181
- send_btn = gr.Button("Execute Task", scale=1, variant="primary")
182
 
183
  with gr.Row():
184
  clear_btn = gr.Button("Clear Chat")
185
- method_btn = gr.Button("Check Access Status")
186
 
187
- output_status = gr.Textbox(label="Status", visible=False)
188
 
189
- def respond(message, chat_history):
190
- response = smart_chat_handler(message, chat_history)
191
- chat_history.append((message, response))
192
- return "", chat_history
193
 
194
- def check_access():
195
- try:
196
- # Simple test to check if model is accessible
197
- test_client = InferenceClient(token=os.getenv("HF_TOKEN"))
198
- test_response = test_client.model_status("microsoft/Fara-7B")
199
- return "✅ Fara-7B is accessible!"
200
- except Exception as e:
201
- return f"❌ Access issue: {str(e)}"
202
-
203
- msg.submit(respond, [msg, chatbot], [msg, chatbot])
204
- send_btn.click(respond, [msg, chatbot], [msg, chatbot])
205
- clear_btn.click(lambda: ([], ""), outputs=[chatbot, msg])
206
- method_btn.click(check_access, outputs=output_status)
207
 
208
  if __name__ == "__main__":
209
- # On HuggingFace Spaces, we don't need to specify server settings
210
- # The platform handles this automatically
211
  demo.launch()
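Review note: both versions of `app.py` detect failures by searching strings. The old `smart_chat_handler` treats any reply containing "error" as a failed call, so a legitimate model answer that mentions errors falls through to the canned fallback, and the handlers look for "401", "403", or "404" anywhere in the exception text. A minimal sketch of classifying by status code instead, assuming the raised exception carries a requests-style `response` object with a `status_code` attribute. `status_code_of` and `classify_api_error` are hypothetical names, not part of the commit.

```python
from typing import Optional


def status_code_of(error: Exception) -> Optional[int]:
    """Return the HTTP status attached to an exception, if there is one."""
    # Assumption: HTTP errors expose `error.response.status_code`,
    # as requests-style exceptions do.
    response = getattr(error, "response", None)
    code = getattr(response, "status_code", None)
    return code if isinstance(code, int) else None


def classify_api_error(error: Exception) -> str:
    """Map an API failure to one of: auth, forbidden, unavailable, unknown."""
    code = status_code_of(error)
    if code == 401:
        return "auth"
    if code == 403:
        return "forbidden"
    if code == 404:
        return "unavailable"
    return "unknown"
```

With this shape, an exception whose message merely contains the digits "404" is reported as `unknown` rather than being mistaken for a missing model.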
 
1
  import gradio as gr
2
  from huggingface_hub import InferenceClient
3
  import os
4
+ from PIL import Image
5
+ import requests
6
+ from io import BytesIO
7
 
8
  # Initialize the Inference Client
9
  client = InferenceClient(token=os.getenv("HF_TOKEN"))
10
 
11
+ def create_demo_screenshot(task_type="general"):
12
  """
13
+ Create a simple placeholder screenshot for demo purposes
14
+ In actual use, this would be a real browser screenshot
15
+ """
16
+ # For now, return None - we'll use text-only mode
17
+ return None
18
+
19
+ def chat_with_fara(message, history, image=None):
20
+ """
21
+ Interact with Fara-7B using the vision-language model API
22
  """
23
  try:
24
+ # Build the proper message format for Fara-7B
25
+ system_prompt = """You are a web automation agent that performs actions on websites to fulfill user requests by calling various tools.
26
+ You should stop execution at Critical Points. A Critical Point occurs in tasks like:
27
+ - Checkout, Book, Purchase, Call, Email, Order
28
+
29
+ A Critical Point requires the user's permission or personal/sensitive information (name, email, credit card, address, payment information, resume, etc.) to complete a transaction (purchase, reservation, sign-up, etc.), or to communicate as a human would (call, email, apply to a job, etc.).
30
+
31
+ Guideline: Solve the task as far as possible up until a Critical Point.
32
+
33
+ Examples:
34
+ - If the task is to "call a restaurant to make a reservation," do not actually make the call. Instead, navigate to the restaurant's page and find the phone number.
35
+ - If the task is to "order new size 12 running shoes," do not place the order. Instead, search for the right shoes that meet the criteria and add them to the cart.
36
+
37
+ Some tasks, like answering questions, may not encounter a Critical Point at all."""
38
+
39
+ # Prepare messages in the format expected by Fara-7B
40
  messages = [
41
+ {"role": "system", "content": system_prompt}
42
  ]
43
 
44
+ # Add history
45
+ if history:
46
+ for h in history:
47
+ if h["role"] in ["user", "assistant"]:
48
+ messages.append(h)
49
+
50
+ # Add current message
51
+ user_content = []
52
 
53
+ # Add image if provided
54
+ if image is not None:
55
+ user_content.append({"type": "image", "image": image})
56
 
57
+ # Add text
58
+ user_content.append({"type": "text", "text": message})
59
 
60
+ messages.append({
61
+ "role": "user",
62
+ "content": user_content if len(user_content) > 1 else message
63
+ })
64
 
65
+ # Try to use the Inference API
66
+ try:
67
+ response = client.chat_completion(
68
+ messages=messages,
69
+ model="microsoft/Fara-7B",
70
+ max_tokens=512,
71
+ temperature=0.7,
72
+ )
73
+
74
+ # Extract the response
75
+ if hasattr(response, 'choices') and len(response.choices) > 0:
76
+ return response.choices[0].message.content
77
+ else:
78
+ raise Exception("Unexpected response format")
79
+
80
+ except Exception as api_error:
81
+ error_str = str(api_error).lower()
82
+
83
+ # Check for specific errors
84
+ if "no api" in error_str or "not found" in error_str or "404" in error_str:
85
+ # Model doesn't have Inference API - provide helpful demo response
86
+ return generate_demo_response(message)
87
+ elif "401" in error_str or "unauthorized" in error_str:
88
+ return """❌ **Authentication Error**
89
+
90
+ Please check:
91
+ 1. Your `HF_TOKEN` is set in Space secrets
92
+ 2. You have requested access to [microsoft/Fara-7B](https://huggingface.co/microsoft/Fara-7B)
93
+ 3. Your token has read permissions
94
+
95
+ To use Fara-7B locally instead:
96
+ ```bash
97
+ git clone https://github.com/microsoft/fara.git
98
+ cd fara
99
+ pip install -e .
100
+ playwright install
101
+ vllm serve "microsoft/Fara-7B" --port 5000
102
+ ```
103
+ """
104
+ elif "403" in error_str or "forbidden" in error_str:
105
+ return """❌ **Access Forbidden**
106
+
107
+ You need to request access to the model:
108
+ 1. Visit: https://huggingface.co/microsoft/Fara-7B
109
+ 2. Click "Request access to this repository"
110
+ 3. Wait for Microsoft to approve your request
111
+
112
+ Once approved, make sure your `HF_TOKEN` is set in Space secrets.
113
+ """
114
+ else:
115
+ # Unknown error - try demo mode
116
+ return f"⚠️ API Error: {str(api_error)}\n\n**Demo Response:**\n\n" + generate_demo_response(message)
117
+
118
  except Exception as e:
119
+ return f"❌ Error: {str(e)}\n\nPlease check the Space logs for more details."
120
 
121
+ def generate_demo_response(message):
122
  """
123
+ Generate a helpful demo response when the API is not available
124
  """
125
  message_lower = message.lower()
126
+
127
+ # Shopping/E-commerce tasks
128
+ if any(word in message_lower for word in ['buy', 'shop', 'purchase', 'order', 'cart', 'shoes', 'product']):
129
+ return """🛒 **Task: Shopping/Purchase**
130
 
131
+ **Action Plan:**
132
+ 1. 🔍 Navigate to e-commerce website
133
+ 2. 🔎 Search for: [extracted product from your query]
134
+ 3. 📋 Apply filters: price, rating, availability
135
+ 4. ✅ Select best match
136
+ 5. ➕ Add to cart
137
+ 6. 🛑 **STOP** - Critical Point: Checkout requires payment info
138
+
139
+ **What I would do with a screenshot:**
140
+ - Identify search bar location
141
+ - Read product listings
142
+ - Click appropriate buttons
143
+ - Navigate to cart
144
+
145
+ **Next steps for you:**
146
+ - Review cart
147
+ - Complete checkout manually
148
+
149
+ 💡 *Note: The Inference API may not be available for this model. For full functionality, host locally with vLLM.*
150
+ """
151
 
152
+ # Travel/booking tasks
153
+ elif any(word in message_lower for word in ['flight', 'hotel', 'travel', 'book', 'trip']):
154
+ return """✈️ **Task: Travel Booking**
155
+
156
+ **Action Plan:**
157
+ 1. 🌐 Navigate to travel site
158
+ 2. 📅 Enter dates and destination
159
+ 3. 🔍 Search options
160
+ 4. 💰 Sort by price/rating
161
+ 5. 📊 Compare top results
162
+ 6. 🛑 **STOP** - Critical Point: Booking requires personal info
163
+
164
+ **What I would do with a screenshot:**
165
+ - Find date pickers
166
+ - Enter search criteria
167
+ - Click search button
168
+ - Read results table
169
+
170
+ **Next steps for you:**
171
+ - Review options
172
+ - Complete booking manually
173
+
174
+ 💡 *Note: The Inference API may not be available for this model. For full functionality, host locally with vLLM.*
175
+ """
176
 
177
+ # Restaurant tasks
178
+ elif any(word in message_lower for word in ['restaurant', 'food', 'dining', 'reservation']):
179
+ return """🍽️ **Task: Restaurant Search**
180
+
181
+ **Action Plan:**
182
+ 1. 🔎 Search for restaurants
183
+ 2. 📍 Filter by location and cuisine
184
+ 3. ⭐ Check ratings and reviews
185
+ 4. 📞 Find contact info
186
+ 5. 🛑 **STOP** - Critical Point: Reservation requires personal info
187
+
188
+ **What I would do with a screenshot:**
189
+ - Identify search results
190
+ - Read restaurant details
191
+ - Extract phone number
192
+ - Locate reservation link
193
+
194
+ **Next steps for you:**
195
+ - Call or book reservation manually
196
+
197
+ 💡 *Note: The Inference API may not be available for this model. For full functionality, host locally with vLLM.*
198
+ """
199
+
200
+ # Government/grants (your specific use case!)
201
+ elif any(word in message_lower for word in ['grant', 'funding', 'government', 'nsw', 'healthcare']):
202
+ return """🏛️ **Task: Government Grants Research**
203
+
204
+ **Action Plan:**
205
+ 1. 🌐 Navigate to government grants portal
206
+ 2. 🔎 Use search functionality
207
+ 3. 📋 Filter by: healthcare, eligibility, deadline
208
+ 4. 📊 Extract grant information
209
+ 5. ✅ **COMPLETE** - No Critical Point
210
+
211
+ **What I would do with a screenshot:**
212
+ - Locate search bar
213
+ - Read grant listings
214
+ - Extract key details:
215
+ - Grant title
216
+ - Funding amount
217
+ - Eligibility criteria
218
+ - Application deadline
219
+ - Contact information
220
+
221
+ **Example output:**
222
+ ```
223
+ Grant: Healthcare Innovation Fund
224
+ Amount: $50,000 - $500,000
225
+ Eligibility: Registered healthcare providers
226
+ Deadline: March 31, 2024
227
+ Link: [grant URL]
228
+ ```
229
+
230
+ 💡 *Note: The Inference API may not be available for this model. For full functionality, host locally with vLLM.*
231
+ """
232
+
233
+ # General response
234
+ else:
235
+ return """🤖 **Fara-7B Web Automation Agent**
236
+
237
+ I help with web automation tasks! I can:
238
+
239
+ ✅ Shopping & e-commerce
240
+ ✅ Travel & booking
241
+ ✅ Restaurant search
242
+ ✅ Information extraction
243
+ ✅ Government portals & grants
244
+ ✅ Account navigation
245
+
246
+ **How I work:**
247
+ 1. 📸 Analyze browser screenshot (when provided)
248
+ 2. 🎯 Understand your goal
249
+ 3. 📝 Plan step-by-step actions
250
+ 4. 🔧 Use browser tools (click, type, scroll)
251
+ 5. 🛑 Stop at Critical Points (checkout, personal info)
252
+
253
+ **Example tasks:**
254
+ - "Find running shoes under $100"
255
+ - "Search for flights to Tokyo"
256
+ - "Find healthcare grants on the NSW government website"
257
+ - "Look up Italian restaurants in Seattle"
258
+
259
+ **To use with screenshots:**
260
+ Upload a browser screenshot and describe your task!
261
+
262
+ 💡 *Note: The Inference API may not be available for this model. For full functionality, host locally with vLLM:*
263
+ ```bash
264
+ vllm serve "microsoft/Fara-7B" --port 5000 --dtype auto
265
+ ```
266
+ """
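Review note: the keyword checks in `generate_demo_response` use substring tests, so `'book'` also fires on "Facebook" or "notebook" and `'shop'` on "workshop". A minimal sketch of the same routing anchored at word starts, using the keyword lists from the commit. `ROUTES` and `route_task` are hypothetical names, and the returned labels stand in for the canned responses.

```python
import re

# Keyword lists copied from generate_demo_response; order decides priority.
ROUTES = [
    ("shopping", ["buy", "shop", "purchase", "order", "cart", "shoes", "product"]),
    ("travel", ["flight", "hotel", "travel", "book", "trip"]),
    ("restaurant", ["restaurant", "food", "dining", "reservation"]),
    ("grants", ["grant", "funding", "government", "nsw", "healthcare"]),
]

# A leading \b requires the keyword to begin a word, so "booking" and
# "grants" still match while "Facebook" and "workshop" do not.
_COMPILED = [
    (name, re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + ")", re.IGNORECASE))
    for name, words in ROUTES
]


def route_task(message: str) -> str:
    """Return the first route whose keyword starts a word in the message."""
    for name, pattern in _COMPILED:
        if pattern.search(message):
            return name
    return "general"
```

Prefix matching is a deliberate compromise: it keeps plurals and "-ing" forms working without a stemmer, at the cost of occasional false hits such as "cartoon" for `cart`.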
267
 
268
  # Create the Gradio interface
269
+ with gr.Blocks(theme=gr.themes.Soft(), title="Fara-7B Chat") as demo:
270
  gr.Markdown(
271
  """
272
+ # 🤖 Fara-7B Web Automation Agent
273
 
274
+ **Microsoft's specialized vision-language model for web automation**
275
 
276
+ Fara-7B can analyze browser screenshots and plan web automation tasks.
277
 
278
+ 💡 **How to use:**
279
+ - Upload a browser screenshot (optional)
280
+ - Describe your web automation task
281
+ - Fara-7B will plan the actions needed
282
+
283
+ ⚠️ **Note**: The Inference API may not be fully available for this model. For complete functionality including actual browser control, host locally with vLLM (see instructions below).
284
  """
285
  )
286
 
287
+ with gr.Accordion("📚 About Fara-7B & Setup Instructions", open=False):
288
  gr.Markdown("""
289
+ ### What is Fara-7B?
290
+
291
+ Fara-7B is a 7B parameter vision-language model designed for computer use. It can:
292
+ - Understand browser screenshots
293
+ - Plan multi-step web automation tasks
294
+ - Use tools (click, type, scroll, etc.)
295
+ - Stop at "Critical Points" for safety
296
 
297
+ ### Using Transformers Library (Colab/Local)
298
+
299
+ ```python
300
+ from transformers import pipeline
301
+
302
+ pipe = pipeline("image-text-to-text", model="microsoft/Fara-7B")
303
+ messages = [
304
+ {
305
+ "role": "user",
306
+ "content": [
307
+ {"type": "image", "url": "screenshot.jpg"},
308
+ {"type": "text", "text": "Find running shoes"}
309
+ ]
310
+ },
311
+ ]
312
+ result = pipe(text=messages)
313
+ ```
314
+
315
+ ### Full Browser Automation (Local)
316
+
317
+ ```bash
318
+ # Clone repository
319
+ git clone https://github.com/microsoft/fara.git
320
+ cd fara
321
+
322
+ # Setup environment
323
+ python3 -m venv .venv
324
+ source .venv/bin/activate
325
+ pip install -e .
326
+ playwright install
327
+
328
+ # Host model
329
+ vllm serve "microsoft/Fara-7B" --port 5000 --dtype auto
330
+
331
+ # Run tasks
332
+ fara-cli --task "your task here"
333
+ ```
334
+
335
+ **Resources:**
336
+ - Model: https://huggingface.co/microsoft/Fara-7B
337
+ - GitHub: https://github.com/microsoft/fara
338
  """)
339
 
340
  chatbot = gr.Chatbot(
341
  height=500,
342
+ label="Chat",
343
+ show_label=True,
344
+ type="messages"
345
  )
346
 
347
  with gr.Row():
348
+ with gr.Column(scale=3):
349
+ msg = gr.Textbox(
350
+ label="Task Description",
351
+ placeholder="Example: Find healthcare grants on the NSW government website...",
352
+ lines=2
353
+ )
354
+ with gr.Column(scale=1):
355
+ image_input = gr.Image(
356
+ label="Browser Screenshot (Optional)",
357
+ type="pil",
358
+ height=100
359
+ )
360
 
361
  with gr.Row():
362
+ send_btn = gr.Button("Send", variant="primary")
363
  clear_btn = gr.Button("Clear Chat")
364
 
365
+ gr.Markdown("""
366
+ ### 💡 Tips for Best Results
367
 
368
+ - **With screenshot**: Upload a browser screenshot and describe what you want to accomplish
369
+ - **Without screenshot**: Describe the web task, and Fara-7B will plan the approach
370
+ - **Be specific**: Include details like website, search criteria, budget, etc.
371
+ - **Critical Points**: Fara-7B will stop before checkout, booking, or entering personal info
372
 
373
+ ### 🎯 Example Tasks
374
+
375
+ - "Find healthcare grants for digital health projects in Australia"
376
+ - "Search for running shoes under $100 on this e-commerce page"
377
+ - "Look up restaurants in Seattle with 4+ stars for Italian food"
378
+ - "Find the contact information on this website"
379
+ """)
380
+
381
+ def respond(message, image, chat_history):
382
+ if not message.strip():
383
+ return chat_history, None
384
+
385
+ # Add user message to history
386
+ user_msg = {"role": "user", "content": message}
387
+ chat_history.append(user_msg)
388
+
389
+ # Get response from Fara
390
+ response = chat_with_fara(message, chat_history[:-1], image)
391
+
392
+ # Add assistant response to history
393
+ assistant_msg = {"role": "assistant", "content": response}
394
+ chat_history.append(assistant_msg)
395
+
396
+ return chat_history, None
397
+
398
+ def clear_chat():
399
+ return [], None
400
+
401
+ msg.submit(respond, [msg, image_input, chatbot], [chatbot, image_input]).then(
402
+ lambda: ("", None), None, [msg, image_input]
403
+ )
404
+ send_btn.click(respond, [msg, image_input, chatbot], [chatbot, image_input]).then(
405
+ lambda: ("", None), None, [msg, image_input]
406
+ )
407
+ clear_btn.click(clear_chat, outputs=[chatbot, image_input])
408
 
409
  if __name__ == "__main__":
410
  demo.launch()
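Review note: `chat_with_fara` places the PIL image object directly into the message (`{"type": "image", "image": image}`), which cannot be serialized to JSON for a remote call, and it forwards Gradio history entries unchanged. A hedged sketch of building the payload instead, assuming the endpoint accepts OpenAI-style `image_url` content parts with a base64 data URL. `image_bytes_to_data_url` and `build_messages` are hypothetical helpers, and the image is taken as PNG bytes so the sketch needs only the standard library.

```python
import base64
from typing import Any, Dict, List, Optional


def image_bytes_to_data_url(data: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a data URL usable in an image_url content part."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_messages(
    system_prompt: str,
    history: List[Dict[str, Any]],
    message: str,
    image_png: Optional[bytes] = None,
) -> List[Dict[str, Any]]:
    """Assemble chat messages: system prompt, prior turns, then the current turn.

    `history` must hold only earlier turns; the current message is appended
    here, so passing it in history as well would send it twice.
    """
    messages: List[Dict[str, Any]] = [{"role": "system", "content": system_prompt}]
    for turn in history:
        # Keep only role/content so UI-only keys (for example metadata) are dropped.
        if turn.get("role") in ("user", "assistant") and isinstance(turn.get("content"), str):
            messages.append({"role": turn["role"], "content": turn["content"]})
    if image_png is None:
        content: Any = message
    else:
        content = [
            {"type": "image_url", "image_url": {"url": image_bytes_to_data_url(image_png)}},
            {"type": "text", "text": message},
        ]
    messages.append({"role": "user", "content": content})
    return messages
```

With the `type="pil"` image component used in this commit, the PNG bytes could be produced by saving the image into a `BytesIO` buffer with `image.save(buffer, format="PNG")` and reading `buffer.getvalue()`, which would also give the `BytesIO` import added in this commit a use.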
requirements.txt CHANGED
@@ -1,2 +1,3 @@
1
  gradio==5.0.2
2
- huggingface-hub==0.26.2
 
 
1
  gradio==5.0.2
2
+ huggingface-hub==0.26.2
3
+ Pillow
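Review note: `Pillow` is added without a version pin while the other two entries are pinned, so rebuilds of the Space may pick up different Pillow releases over time. A small sketch for spotting unpinned entries; `parse_requirements` and `unpinned` are hypothetical helpers that handle only bare names and `==` pins, not the full requirements syntax.

```python
from typing import List, Optional, Tuple


def parse_requirements(text: str) -> List[Tuple[str, Optional[str]]]:
    """Parse 'name==version' or bare 'name' lines, skipping comments and blanks."""
    entries: List[Tuple[str, Optional[str]]] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("==")
        entries.append((name.strip(), version.strip() if sep else None))
    return entries


def unpinned(text: str) -> List[str]:
    """Return the names of requirements that carry no exact version pin."""
    return [name for name, version in parse_requirements(text) if version is None]


requirements = "gradio==5.0.2\nhuggingface-hub==0.26.2\nPillow\n"
print(unpinned(requirements))
```

`app.py` in this commit also imports `requests`, which is not listed; it presumably arrives as a transitive dependency, so listing it explicitly would be the safer choice if the import is kept.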