Commit c2e2262 (verified) · committed by TRUBETSKOY · 1 Parent(s): 85f7dc7

Upload folder using huggingface_hub

Files changed (6)
  1. README.md +92 -12
  2. amazon_scraper.py +332 -0
  3. mcp.json +11 -0
  4. pyproject.toml +9 -0
  5. requirements.txt +0 -0
  6. uv.lock +0 -0
README.md CHANGED
@@ -1,12 +1,92 @@
----
-title: Amazon Mcp Server
-emoji: 📈
-colorFrom: pink
-colorTo: red
-sdk: gradio
-sdk_version: 5.47.2
-app_file: app.py
-pinned: false
----
-
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+---
+title: amazon-mcp-server
+app_file: amazon_scraper.py
+sdk: gradio
+sdk_version: 5.47.2
+---
+# Amazon MCP Server
+
+This is a Model Context Protocol (MCP) server for scraping Amazon products and searching for products on Amazon.
+
+## Setup
+
+1. **Clone the repository:**
+   ```bash
+   git clone https://github.com/r123singh/amazon-mcp-server.git
+   ```
+2. **Navigate to the project directory:**
+   ```bash
+   cd amazon-mcp-server
+   ```
+3. **Create a virtual environment:**
+   ```bash
+   python -m venv venv
+   ```
+4. **Activate the virtual environment:**
+   - On Linux/macOS:
+     ```bash
+     source venv/bin/activate
+     ```
+   - On Windows:
+     ```bash
+     venv\Scripts\activate
+     ```
+5. **Install dependencies:**
+   ```bash
+   pip install -r requirements.txt
+   ```
+
+6. **No API keys or tokens are required.**
+
+7. **Configure MCP JSON:**
+   Create a `mcp.json` file with:
+   ```json
+   {
+     "mcpServers": {
+       "amazon": {
+         "command": "{PATH_TO_DIRECTORY}\\amazon-mcp-server\\venv\\Scripts\\python.exe",
+         "args": [
+           "{PATH_TO_DIRECTORY}\\amazon-mcp-server\\server.py"
+         ]
+       }
+     }
+   }
+   ```
+   Replace `{PATH_TO_DIRECTORY}` with the absolute path to this directory (use `pwd` or `cd` to get the path).
+
+## Available Tools
+
+The server provides the following tools for interacting with Amazon:
+
+- **Scrape a product:**
+  `scrape_product(product_url)`
+  Scrape product details (name, price, image, rating, reviews, availability, description) from a given Amazon product URL.
+
+- **Search for products:**
+  `search_products(query, max_results)`
+  Search for products on Amazon by keyword and return a list of results.
+
+## Usage
+
+Once configured, the MCP server can be started using the standard MCP client configuration. The server provides a natural language interface to interact with Amazon through the available tools.
+
+**Example usage:**
+- "Get details for this Amazon product: [product URL]"
+- "Search Amazon for 'wireless headphones', show top 3 results"
+
+## Notes
+
+- No API key or authentication is required.
+- The server scrapes publicly available Amazon product and search pages.
+- For best results, use valid Amazon product URLs and clear search queries.
+
+## Contributing
+
+Contributions are welcome! Please open an issue or submit a pull request.
+
+## License
+
+This project is licensed under the MIT License. See the LICENSE file for details.
+
+
+
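Review note on step 7 of the new README: the sample `mcp.json` hard-codes Windows-style paths and names `server.py` as the entry script, while the file added in this commit is `amazon_scraper.py`. The following is a minimal sketch, not part of the commit, of how the same config could be generated for either platform. The helper name `build_mcp_config` and its parameters are hypothetical; the default `entry_script` simply mirrors the README and may need to be changed.

```python
import json
import os
from pathlib import Path


def build_mcp_config(project_dir, entry_script="server.py", windows=None):
    """Return an mcp.json-style dict with one 'amazon' server entry.

    Hypothetical helper: the key names mirror the README sample, nothing else
    about the MCP client is assumed.
    """
    if windows is None:
        windows = os.name == "nt"
    root = Path(project_dir).resolve()
    # A venv keeps its interpreter in Scripts\python.exe on Windows
    # and in bin/python elsewhere.
    python = root / "venv" / ("Scripts" if windows else "bin") / (
        "python.exe" if windows else "python"
    )
    return {
        "mcpServers": {
            "amazon": {
                "command": str(python),
                "args": [str(root / entry_script)],
            }
        }
    }


if __name__ == "__main__":
    # Print a config for the current directory; redirect to mcp.json if desired.
    print(json.dumps(build_mcp_config("."), indent=2))
```

Because `json.dumps` escapes backslashes itself, the Windows output matches the doubled `\\` form shown in the README without manual editing.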
amazon_scraper.py ADDED
@@ -0,0 +1,332 @@
+# amazon_scraper.py
+# This is an Amazon products scraper compatible with Gradio.
+# It can be run as a standalone Gradio app or its functions can be loaded as tools.
+
+import httpx
+import re
+import gradio as gr
+from bs4 import BeautifulSoup
+from urllib.parse import urlparse
+from typing import List, Dict
+
+# --- Helper Functions for Web Scraping ---
+
+async def fetch_amazon_page(url: str) -> str:
+    """Helper function to fetch Amazon product page"""
+    headers = {
+        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
+        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
+        'Accept-Language': 'en-US,en;q=0.5',
+        'Accept-Encoding': 'gzip, deflate',
+        'Connection': 'keep-alive',
+        'Upgrade-Insecure-Requests': '1',
+    }
+
+    async with httpx.AsyncClient() as client:
+        response = await client.get(url, headers=headers, timeout=15.0)
+        response.raise_for_status()
+        return response.text
+
+def clean_price(price_text: str) -> str:
+    """
+    Cleans and extracts the numerical price from a string.
+    """
+    if not price_text:
+        return "Price not available"
+    # Find the first occurrence of a currency symbol followed by numbers
+    match = re.search(r'([\$\£\€]?\d[\d,.]*)', price_text)
+    if match:
+        return match.group(1)
+    return "Price not available"
+def extract_product_data(html_content: str, url: str) -> dict:
+    """Extract product information from Amazon page HTML"""
+    soup = BeautifulSoup(html_content, 'html.parser')
+
+    # Initialize product data
+    product_data = {
+        'name': 'Product name not found',
+        'price': 'Price not available',
+        'image_url': 'Image not found',
+        'rating': 'Rating not available',
+        'reviews_count': 'Reviews not available',
+        'availability': 'Availability not found',
+        'description': 'Description not available',
+        'url': url
+    }
+
+    try:
+        # Extract product name
+        name_selectors = [
+            '#productTitle',
+            'h1.a-size-large',
+            '.a-size-large.product-title-word-break',
+            'h1[data-automation-id="product-title"]'
+        ]
+
+        for selector in name_selectors:
+            name_elem = soup.select_one(selector)
+            if name_elem:
+                product_data['name'] = name_elem.get_text().strip()
+                break
+
+        # Extract price
+        price_selectors = [
+            '.a-price-whole',
+            '.a-price .a-offscreen',
+            '.a-price-range .a-price-range-min .a-offscreen',
+            '.a-price .a-price-symbol + span',
+            '[data-a-color="price"] .a-offscreen'
+        ]
+
+        for selector in price_selectors:
+            price_elem = soup.select_one(selector)
+            if price_elem:
+                product_data['price'] = clean_price(price_elem.get_text())
+                break
+
+        # Extract image URL
+        image_selectors = [
+            '#landingImage',
+            '#imgBlkFront',
+            '.a-dynamic-image',
+            '[data-old-hires]'
+        ]
+
+        for selector in image_selectors:
+            img_elem = soup.select_one(selector)
+            if img_elem:
+                img_url = img_elem.get('src') or img_elem.get('data-old-hires')
+                if img_url:
+                    if img_url.startswith('//'):
+                        img_url = 'https:' + img_url
+                    product_data['image_url'] = img_url
+                    break
+
+        # Extract rating
+        rating_selectors = [
+            '.a-icon-alt',
+            '[data-hook="rating-out-of-text"]',
+            '.a-icon-star-small .a-icon-alt'
+        ]
+
+        for selector in rating_selectors:
+            rating_elem = soup.select_one(selector)
+            if rating_elem:
+                rating_text = rating_elem.get_text()
+                rating_match = re.search(r'(\d+\.?\d*)', rating_text)
+                if rating_match:
+                    product_data['rating'] = f"{rating_match.group(1)} out of 5"
+                    break
+
+        # Extract reviews count
+        reviews_selectors = [
+            '#acrCustomerReviewText',
+            '[data-hook="total-review-count"]',
+            '.a-size-base.s-underline-text'
+        ]
+
+        for selector in reviews_selectors:
+            reviews_elem = soup.select_one(selector)
+            if reviews_elem:
+                reviews_text = reviews_elem.get_text()
+                reviews_match = re.search(r'(\d+(?:,\d+)*)', reviews_text)
+                if reviews_match:
+                    product_data['reviews_count'] = f"{reviews_match.group(1)} reviews"
+                    break
+
+        # Extract availability
+        availability_selectors = [
+            '#availability .a-size-medium',
+            '#availability span',
+            '.a-size-medium.a-color-success'
+        ]
+
+        for selector in availability_selectors:
+            avail_elem = soup.select_one(selector)
+            if avail_elem:
+                product_data['availability'] = avail_elem.get_text().strip()
+                break
+
+        # Extract description
+        desc_selectors = [
+            '#productDescription p',
+            '#feature-bullets .a-list-item',
+            '.a-expander-content p'
+        ]
+
+        for selector in desc_selectors:
+            desc_elem = soup.select_one(selector)
+            if desc_elem:
+                product_data['description'] = desc_elem.get_text().strip()
+                break
+
+    except Exception as e:
+        product_data['error'] = f"Error parsing product data: {str(e)}"
+
+    return product_data
+
+def extract_search_results(html_content: str, max_results: int) -> list:
+    """Extract product information from Amazon search results"""
+    soup = BeautifulSoup(html_content, 'html.parser')
+    products = []
+
+    # Find product containers
+    product_containers = soup.select('[data-component-type="s-search-result"]')
+
+    for container in product_containers[:max_results]:
+        try:
+            product = {
+                'name': 'Product name not found',
+                'price': 'Price not available',
+                'image_url': 'Image not found',
+                'rating': 'Rating not available',
+                'url': 'URL not found'
+            }
+
+            # Extract product name
+            name_elem = container.select_one('a h2 span')
+            if name_elem:
+                product['name'] = name_elem.get_text().strip()
+
+            # Extract product URL
+            url_elem = container.select_one('a')
+            if url_elem:
+                product_url = url_elem.get('href')
+                if product_url:
+                    if product_url.startswith('/'):
+                        product_url = 'https://www.amazon.com' + product_url
+                    product['url'] = product_url
+
+            # Extract price
+            price_elem = container.select_one('.a-price-whole')
+            if price_elem:
+                product['price'] = clean_price(price_elem.get_text())
+
+            # Extract image
+            img_elem = container.select_one('img.s-image')
+            if img_elem:
+                img_url = img_elem.get('src')
+                if img_url:
+                    product['image_url'] = img_url
+
+            # Extract rating
+            rating_elem = container.select_one('.a-icon-alt')
+            if rating_elem:
+                rating_text = rating_elem.get_text()
+                rating_match = re.search(r'(\d+\.?\d*)', rating_text)
+                if rating_match:
+                    product['rating'] = f"{rating_match.group(1)} out of 5"
+
+            products.append(product)
+
+        except Exception as e:
+            print(f"Error extracting product data: {str(e)}")
+
+    return products
+
+# --- Formatting Functions for Display ---
+
+def format_product_details(product: dict) -> str:
+    """Formats a single product's details into a Markdown string."""
+    return (
+        f"## {product.get('name', 'N/A')}\n"
+        f"**Price:** {product.get('price', 'N/A')}\n\n"
+        f"![Product Image]({product.get('image_url', '')})\n\n"
+        f"**URL:** {product.get('url', 'N/A')}"
+    )
+
+def format_search_results(products: list, query: str) -> str:
+    """Formats a list of search results into a single Markdown string."""
+    if not products:
+        return f"No products found for '{query}'."
+
+    result = f"# Search Results for '{query}'\n\n---\n\n"
+    for product in products:
+        result += (
+            f"### {product.get('name', 'N/A')}\n"
+            f"**Price:** {product.get('price', 'N/A')}\n"
+            f"**URL:** <{product.get('url', 'N/A')}>\n\n---\n\n"
+        )
+    return result
+
+# --- Gradio Tool Functions ---
+
+async def scrape_product(product_url: str) -> str:
+    """
+    Scrapes product information from a single Amazon product URL.
+
+    Args:
+        product_url: The full URL of the Amazon product page.
+
+    Returns:
+        A Markdown formatted string with the product's name, price, image, and URL.
+    """
+    try:
+        parsed_url = urlparse(product_url)
+        if 'amazon' not in parsed_url.netloc:
+            return "Error: Please provide a valid Amazon product URL."
+
+        html_content = await fetch_amazon_page(product_url)
+        product_data = extract_product_data(html_content, product_url)
+        return format_product_details(product_data)
+
+    except httpx.HTTPStatusError as e:
+        return f"HTTP Error: {e.response.status_code}. Amazon may have blocked the request."
+    except Exception as e:
+        return f"An error occurred: {str(e)}"
+
+async def search_products(query: str, max_results: int = 5) -> str:
+    """
+    Searches for products on Amazon and returns a list of results.
+
+    Args:
+        query: The search term (e.g., "laptop stand").
+        max_results: The maximum number of results to return.
+
+    Returns:
+        A Markdown formatted string with the search results.
+    """
+    try:
+        search_url = f"https://www.amazon.com/s?k={query.replace(' ', '+')}"
+        html_content = await fetch_amazon_page(search_url)
+        products = extract_search_results(html_content, max_results)
+        return format_search_results(products, query)
+
+    except Exception as e:
+        return f"An error occurred during search: {str(e)}"
+
+# --- Gradio Interface (for standalone execution) ---
+
+if __name__ == "__main__":
+    print("Starting Amazon Scraper Gradio App...")
+
+    with gr.Blocks(theme=gr.themes.Soft(), title="Amazon Scraper") as demo:
+        gr.Markdown("# 🤖 Amazon Product Scraper")
+        gr.Markdown("Use the tools below to search for products or scrape a specific product URL.")
+
+        with gr.Tabs():
+            with gr.TabItem("Search Products"):
+                with gr.Row():
+                    search_query_input = gr.Textbox(label="Search Query", placeholder="e.g., mechanical keyboard")
+                    max_results_input = gr.Number(label="Max Results", value=5, step=1, minimum=1, maximum=20)
+                search_button = gr.Button("Search", variant="primary")
+                search_output = gr.Markdown(label="Search Results")
+
+            with gr.TabItem("Scrape Product by URL"):
+                url_input = gr.Textbox(label="Amazon Product URL", placeholder="Paste a full Amazon URL here...")
+                scrape_button = gr.Button("Scrape", variant="primary")
+                scrape_output = gr.Markdown(label="Product Details")
+
+        search_button.click(
+            fn=search_products,
+            inputs=[search_query_input, max_results_input],
+            outputs=search_output
+        )
+
+        scrape_button.click(
+            fn=scrape_product,
+            inputs=[url_input],
+            outputs=scrape_output
+        )
+
+    demo.launch(mcp_server=True, share=True)
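Review note on `search_products` and `scrape_product`: the search URL is built with `query.replace(' ', '+')`, which leaves characters such as `&` or `#` unescaped, and the URL check accepts any host whose name merely contains the substring `amazon`. A possible tightening, sketched here with the standard library only, is shown below. The helper names `build_search_url` and `is_amazon_url` and the host pattern are assumptions for illustration and are not part of the commit.

```python
import re
from urllib.parse import quote_plus, urlparse

# Assumed pattern: "amazon.<tld>" or a subdomain of it, with an optional
# two-letter country suffix (www.amazon.com, amazon.co.uk, ...).
_AMAZON_HOST = re.compile(r"(^|\.)amazon\.[a-z]{2,3}(\.[a-z]{2})?$")


def build_search_url(query: str) -> str:
    """Build an Amazon search URL with the query fully percent-encoded."""
    # quote_plus turns spaces into '+' and escapes reserved characters.
    return f"https://www.amazon.com/s?k={quote_plus(query)}"


def is_amazon_url(url: str) -> bool:
    """Return True only when the URL's hostname looks like an Amazon domain."""
    host = (urlparse(url).hostname or "").lower()
    return bool(_AMAZON_HOST.search(host))


if __name__ == "__main__":
    print(build_search_url("usb c & hdmi"))
    print(is_amazon_url("https://www.amazon.com/dp/B000"))
    print(is_amazon_url("https://notamazon.com/dp/B000"))
```

Matching on `hostname` rather than `netloc` also ignores any port or credentials embedded in the URL.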
mcp.json ADDED
@@ -0,0 +1,11 @@
+{
+  "mcpServers": {
+    "trello": {
+      "command": "python",
+      "args": ["-m", "mcp.server.fastmcp", "server.py"],
+      "env": {
+        "PYTHONPATH": "."
+      }
+    }
+  }
+}
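Review note on `mcp.json`: the server is registered under the key `trello` and points at `server.py`, while the only script added in this commit is `amazon_scraper.py`. Whether a `server.py` exists elsewhere in the repository is not visible from this diff. A small check along the following lines could flag script arguments that do not resolve to a file; it is a sketch, and the helper name `missing_script_args` is hypothetical.

```python
import tempfile
from pathlib import Path


def missing_script_args(config: dict, base_dir: str) -> dict:
    """Map each server name to its '.py' arguments that are absent under base_dir."""
    missing = {}
    for name, server in config.get("mcpServers", {}).items():
        absent = [
            arg
            for arg in server.get("args", [])
            if arg.endswith(".py") and not (Path(base_dir) / arg).is_file()
        ]
        if absent:
            missing[name] = absent
    return missing


if __name__ == "__main__":
    # Same structure as the mcp.json in this commit, checked against an
    # empty temporary directory so the script is reported as missing.
    config = {
        "mcpServers": {
            "trello": {
                "command": "python",
                "args": ["-m", "mcp.server.fastmcp", "server.py"],
            }
        }
    }
    with tempfile.TemporaryDirectory() as tmp:
        print(missing_script_args(config, tmp))
```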
pyproject.toml ADDED
@@ -0,0 +1,9 @@
+[project]
+name = "amazon-mcp-server"
+version = "0.1.0"
+description = "Add your description here"
+readme = "README.md"
+requires-python = ">=3.13"
+dependencies = [
+    "gradio[mcp]>=5.47.2",
+]
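Review note on `pyproject.toml`: only `gradio[mcp]` is declared, while `amazon_scraper.py` also imports `httpx` and `bs4`. The contents of `requirements.txt` are not visible here because it is rendered as a binary file. As a hedged sketch, the third-party imports can be checked in the active environment before launching; the helper name `missing_modules` is hypothetical, and the module list is taken from the import block of `amazon_scraper.py`.

```python
from importlib.util import find_spec

# Third-party top-level modules imported by amazon_scraper.py in this commit.
REQUIRED_MODULES = ["httpx", "gradio", "bs4"]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    absent = missing_modules(REQUIRED_MODULES)
    if absent:
        print("Missing modules:", ", ".join(absent))
    else:
        print("All required modules are importable.")
```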
requirements.txt ADDED
Binary file (2.55 kB).
 
uv.lock ADDED
The diff for this file is too large to render. See raw diff