Spaces: Liorlsa9/market-analysis

Liorlsa9 committed on May 10, 2025

Commit a5be4c1 (1 parent: ee97e75)

added files methods and cleanup


Files changed (4)

.gitignore +1 -0
README.md +44 -1
app.py +46 -75
requirements.txt +6 -1

.gitignore ADDED

@@ -0,0 +1 @@
+.env

README.md CHANGED

@@ -10,4 +10,47 @@ pinned: false
 license: mit
 ---
-An example chatbot using [Gradio](https://gradio.app), [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/v0.22.2/en/index), and the [Hugging Face Inference API](https://huggingface.co/docs/api-inference/index).

+# Market Analysis Tool (Hugging Face Spaces)
+This app provides competitive intelligence for small businesses using Gradio and OpenAI. It finds competitors in a given city and business category, scrapes their websites, and provides actionable business improvement suggestions.
+## Setup Instructions
+1. **Clone or upload this repository to Hugging Face Spaces.**
+2. **Create a `.env` file at the project root with your API keys:**
+```
+OPENAI_API_KEY=your_openai_api_key_here
+GEO_API_KEY=your_geoapify_api_key_here
+```
+(You can copy `.env.example` as a template.)
+3. **Install dependencies:**
+Hugging Face Spaces will automatically install from `requirements.txt`. If running locally:
+```
+pip install -r requirements.txt
+```
+4. **Run the app:**
+```
+python app.py
+```
+On Hugging Face Spaces, the app launches automatically.
+## Usage
+- Enter your business name and city (currently supports Netivot, Israel).
+- The app will find competitors, analyze their websites, and suggest improvements for your business.
+## Environment Variables
+- `OPENAI_API_KEY`: Your OpenAI API key
+- `GEO_API_KEY`: Your Geoapify API key
+## License
+MIT
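
Review note: the README lists two required keys, and `.env` is added to `.gitignore` in this same commit, so on a hosted Space the values presumably have to arrive through the process environment (for example, as Space secrets) rather than from a committed file. Below is a minimal sketch of a startup check that fails early with a readable message. `require_env` is a hypothetical helper and is not part of `app.py`.

```python
import os


def require_env(names, env=None):
    """Return {name: value} for every required variable; raise if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}


if __name__ == "__main__":
    # Variable names are the two documented in the README.
    try:
        require_env(["OPENAI_API_KEY", "GEO_API_KEY"])
        print("All required keys are set.")
    except RuntimeError as err:
        print(err)
```

Calling a check like this right after `load_dotenv()` would surface a missing key at launch instead of as a failed API request later in a chat turn.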

app.py CHANGED

@@ -1,33 +1,13 @@
-# -*- coding: utf-8 -*-
-"""synthetic_data_generator.ipynb
-Automatically generated by Colab.
-Original file is located at
-    https://colab.research.google.com/drive/1Favva8SJYH_uFh8AuoVhRZnmyjJrTP8c
-# Week 3 project - Create dataset about competitors
-# Brief - a research tool for businesses about their client in their area
-The tool will:
-1.   Find businesses across the same location using google maps.
-2.   Compare business plans and services
-3.   Advise and help the client improve their business according to their competitors
-# imports and installations
-"""
-!pip install bs4 openai google-api-python-client gradio
-"""**Define categories**"""
 from openai import OpenAI
-from google.colab import userdata
 import json
 import gradio as gr
 categories = """
 `accommodation`
@@ -748,32 +728,31 @@ categories = """
 """
-openai_key = userdata.get('OPENAI_API_KEY')
-geo_api_key = userdata.get('GEO_API_KEY')
 import requests
 from requests.structures import CaseInsensitiveDict
-def get_competitors_data(category="commercial",limit=50,place_id="51d8aaf091586a414059288705ad76154040f00102f9015f13990300000000c002089203084865727a6c697961"):
-  print(f"get_competitors_data: category-{category} place_id={place_id}")
-  url = f"https://api.geoapify.com/v2/places?categories={category}&filter=place:{place_id}&limit={limit}&apiKey={geo_api_key}"
-  response = requests.get(url)
-  result = response.json()
-  websites = []
-  print(f"result: {result}")
-  print(result.get("features"))
-  for item in result["features"]:
-    if "website" in item["properties"] and item["properties"]["website"]:
-      websites.append(item["properties"]["website"])
-  return websites
-def get_place_id(city):
-  print(f"get_place_id city: {city}")
-  url = f"https://api.geoapify.com/v1/geocode/search?text={city}&filter=countrycode:il&apiKey={geo_api_key}"
-  response = requests.get(url)
-  place_id = response.json()['features'][0].get("properties")['place_id']
-  return place_id
 import re
 from urllib.parse import urlparse
@@ -812,14 +791,14 @@ from bs4 import BeautifulSoup
 from urllib.parse import urlparse, urljoin
 def extract_data(websites):
-  print(f"extract_data: {websites}")
-  websites_data = []
-  for website in websites:
-    if is_business_website(website):
-      homepage = get_homepage_url(website)
-      data = extract_and_clean_website_data(homepage, base_url=None)
-      websites_data = {"url":website, "data":data}
-  return websites_data
 def extract_and_clean_website_data(url, base_url=None):
@@ -852,7 +831,6 @@ def extract_and_clean_website_data(url, base_url=None):
     if base_url is None:
         base_url = urlparse(url).netloc
-        print(base_url)
         if not base_url.startswith("http"):
             base_url = f"{urlparse(url).scheme}://{base_url}"
@@ -878,24 +856,14 @@ You are a market analysis agent specializing in competitive intelligence for sma
  Here is a comprehensive list of supported categories from the Geoapify API. When calling the tool,
  choose the most appropriate category that best describes the user's business to find relevant competitors in their area.
-**Geoapify API Supported Categories:**
 # %s
-**Example Usage:**
-If the user's business is a "pizza place," you would use the category `catering.restaurant.pizza,catering.restaurant.italian,catering.restaurant` (use several categories to find more businesses) with the `get_competitors_data` tool. If it's a "clothing store for women," you would use `commercial.clothing.women`.
 Remember to choose the most specific and relevant category for the user's business to get the most accurate competitor data. If you are unsure, you can ask the user for clarification on their business type.
-```
-<tools>
-'get_place_id': This tool retrieves a unique place identifier based on a provided city name. The 'city' parameter should be a string representing the target city (e.g., "Netivot"). The output is a string representing the place ID (e.g., "ChIJD6pJnvN9AhURN9WyDAkoA_Y" for Netivot).
-'get_competitors_data': This tool identifies and retrieves relevant data for competitors within the specified geographical area (obtained using 'get_place_id') and business category. The business category should be inferred from the user's business name. This tool will utilize a Geoapify API category to search for competitors. The output is a list of dictionaries, where each dictionary contains competitor information, including their website URL (e.g., [{"website": "https://competitor1.com", "location": {...}}, {"website": "https://competitor2.com", "location": {...}}]).
-'extract_data': This tool scrapes and extracts textual content from a list of competitor websites provided as input (a list of URLs from the 'get_competitors_data' output). The output is a list of dictionaries, where each dictionary contains the original URL and the extracted data from that website (e.g., [{"url": "https://competitor1.com", "data": "Extracted content from competitor 1's website."}, {"url": "https://competitor2.com", "data": "Extracted content from competitor 2's website."}]).
-</tools>
 Your workflow should be as follows:
@@ -906,7 +874,7 @@ Your workflow should be as follows:
 5.  Call the 'extract_data' tool with the list of competitor website URLs obtained in step 4 to scrape and extract content from each website.
 6.  Analyze the extracted data from the competitor websites to identify their strengths, weaknesses, offerings, and strategies.
 7.  Based on your analysis of the competitive landscape and the user's presumed business, generate a concise, actionable list of major improvements the client can implement to enhance their business and attract more customers. Ensure these recommendations are strategic and directly address potential areas for competitive advantage.
-```""" % (categories)
 # Define the function as a tool for the Assistant
 get_place_id_tool = {
@@ -981,6 +949,8 @@ extract_data_tool = {
 tools = [get_place_id_tool, get_competitors_data_tool, extract_data_tool]
 def message_to_gpt(message, history):
     messages = [{"role": "system", "content": system_message}]
     # Build the message history
@@ -1010,12 +980,16 @@ def message_to_gpt(message, history):
             print(f"Unexpected finish reason: {response.choices[0].finish_reason}")
             break  # Or handle differently based on your needs
     # Return the assistant's final response content
     return response.choices[0].message.content
 def handle_tool_call(message):
     tool_call = message.tool_calls[0]
-    print(f"Inside handle_tool_call with this tool: {tool_call.function.name}")
     arguments = json.loads(tool_call.function.arguments)
     if tool_call.function.name == "extract_data":
@@ -1030,7 +1004,6 @@ def handle_tool_call(message):
     elif tool_call.function.name == "get_place_id":
         city = arguments.get("city")
         tool_result = get_place_id(city)
-        print(f"tool_result: {tool_result}")
         response = {
             "role": "tool",
             "content": json.dumps({"place_id": tool_result}),  # Return place_id as a JSON object
@@ -1058,8 +1031,6 @@ def handle_tool_call(message):
     return response
 if __name__ == "__main__":
-gr.ChatInterface(fn=message_to_gpt, type="messages").launch(debug=True)

+# This app is ready for Hugging Face Spaces. Environment variables are loaded from a .env file.
+# Usage: Set OPENAI_API_KEY and GEO_API_KEY in your environment or in a .env file at the project root.
 from openai import OpenAI
 import json
 import gradio as gr
+import os
+from dotenv import load_dotenv
+load_dotenv()
 categories = """
 `accommodation`
 """
+openai_key = os.environ.get('OPENAI_API_KEY')
+geo_api_key = os.environ.get('GEO_API_KEY')
 import requests
 from requests.structures import CaseInsensitiveDict
+chain_of_thought = []
+def get_competitors_data(category="commercial", limit=50, place_id="51d8aaf091586a414059288705ad76154040f00102f9015f13990300000000c002089203084865727a6c697961"):
+    chain_of_thought.append(f"Calling get_competitors_data with category='{category}', limit={limit}, place_id='{place_id}'")
+    url = f"https://api.geoapify.com/v2/places?categories={category}&filter=place:{place_id}&limit={limit}&apiKey={geo_api_key}"
+    response = requests.get(url)
+    result = response.json()
+    websites = []
+    for item in result["features"]:
+        if "website" in item["properties"] and item["properties"]["website"]:
+            websites.append(item["properties"]["website"])
+    return websites
+def get_place_id(city):
+    chain_of_thought.append(f"Calling get_place_id with city='{city}'")
+    url = f"https://api.geoapify.com/v1/geocode/search?text={city}&filter=countrycode:il&apiKey={geo_api_key}"
+    response = requests.get(url)
+    place_id = response.json()['features'][0].get("properties")['place_id']
+    return place_id
 import re
 from urllib.parse import urlparse
 from urllib.parse import urlparse, urljoin
 def extract_data(websites):
+    chain_of_thought.append(f"Calling extract_data for {len(websites)} websites")
+    websites_data = []
+    for website in websites:
+        if is_business_website(website):
+            homepage = get_homepage_url(website)
+            data = extract_and_clean_website_data(homepage, base_url=None)
+            websites_data.append({"url": website, "data": data})
+    return websites_data
 def extract_and_clean_website_data(url, base_url=None):
     if base_url is None:
         base_url = urlparse(url).netloc
         if not base_url.startswith("http"):
             base_url = f"{urlparse(url).scheme}://{base_url}"
  Here is a comprehensive list of supported categories from the Geoapify API. When calling the tool,
  choose the most appropriate category that best describes the user's business to find relevant competitors in their area.
+Geoapify API Supported Categories:
 # %s
+Example Usage:
+If the user's business is a "pizza place," you would use the category catering.restaurant.pizza,catering.restaurant.italian,catering.restaurant (use several categories to find more businesses) with the get_competitors_data tool. If it's a "clothing store for women," you would use commercial.clothing.women.
 Remember to choose the most specific and relevant category for the user's business to get the most accurate competitor data. If you are unsure, you can ask the user for clarification on their business type.
 Your workflow should be as follows:
 5.  Call the 'extract_data' tool with the list of competitor website URLs obtained in step 4 to scrape and extract content from each website.
 6.  Analyze the extracted data from the competitor websites to identify their strengths, weaknesses, offerings, and strategies.
 7.  Based on your analysis of the competitive landscape and the user's presumed business, generate a concise, actionable list of major improvements the client can implement to enhance their business and attract more customers. Ensure these recommendations are strategic and directly address potential areas for competitive advantage.
+""" % (categories)
 # Define the function as a tool for the Assistant
 get_place_id_tool = {
 tools = [get_place_id_tool, get_competitors_data_tool, extract_data_tool]
 def message_to_gpt(message, history):
+    global chain_of_thought
+    chain_of_thought = []
     messages = [{"role": "system", "content": system_message}]
     # Build the message history
             print(f"Unexpected finish reason: {response.choices[0].finish_reason}")
             break  # Or handle differently based on your needs
+    # Print the chain of thought for debugging/inspection
+    print("Chain of Thought:")
+    for step in chain_of_thought:
+        print(step)
     # Return the assistant's final response content
     return response.choices[0].message.content
 def handle_tool_call(message):
     tool_call = message.tool_calls[0]
     arguments = json.loads(tool_call.function.arguments)
     if tool_call.function.name == "extract_data":
     elif tool_call.function.name == "get_place_id":
         city = arguments.get("city")
         tool_result = get_place_id(city)
         response = {
             "role": "tool",
             "content": json.dumps({"place_id": tool_result}),  # Return place_id as a JSON object
     return response
+# This app is ready for Hugging Face Spaces. Environment variables are loaded from a .env file.
 if __name__ == "__main__":
+    gr.ChatInterface(fn=message_to_gpt, type="messages").launch(debug=True)
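
Review note: `get_place_id` interpolates the city name straight into the query string and indexes `['features'][0]` without checking for an empty result. Below is a hedged sketch of one way to harden both steps using only the standard library. `build_geocode_url` and `first_place_id` are hypothetical names, and the response shape is assumed from how the code in this commit reads it.

```python
from urllib.parse import urlencode

# Endpoint as it appears in app.py.
GEOCODE_ENDPOINT = "https://api.geoapify.com/v1/geocode/search"


def build_geocode_url(city, api_key, country_code="il"):
    """Build the geocoding URL with every query parameter percent-encoded."""
    query = urlencode({
        "text": city,
        "filter": f"countrycode:{country_code}",
        "apiKey": api_key,
    })
    return f"{GEOCODE_ENDPOINT}?{query}"


def first_place_id(payload):
    """Return the first feature's place_id, or None when nothing matched."""
    features = payload.get("features") or []
    if not features:
        return None
    return (features[0].get("properties") or {}).get("place_id")


if __name__ == "__main__":
    # Spaces and non-ASCII characters in the city name are encoded safely.
    print(build_geocode_url("Tel Aviv", "YOUR_KEY"))
    # An empty result yields None instead of raising IndexError.
    print(first_place_id({"features": []}))
```

Returning `None` for an unknown city would let `handle_tool_call` pass a clear "not found" result back to the model rather than crashing the chat turn.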

requirements.txt CHANGED

@@ -1 +1,6 @@
-huggingface_hub==0.25.2
+huggingface_hub==0.25.2
+gradio
+openai
+requests
+beautifulsoup4
+python-dotenv
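
Review note: two of the added packages install under a different name than they import (`beautifulsoup4` imports as `bs4`, `python-dotenv` imports as `dotenv`). Below is a small sketch for confirming that the environment matches `requirements.txt` before launching. `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec

# Import names corresponding to the packages listed in requirements.txt.
REQUIRED_MODULES = ["huggingface_hub", "gradio", "openai", "requests", "bs4", "dotenv"]


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```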