Upload workflows/local_business_hunter.md with huggingface_hub
# Workflow: Local Business Hunter (Ideal)

**Objective**: Automatically discover and research local business leads in a specific niche and location, with a budget of zero.

## Required Inputs

- `NICHE`: The industry or business type (e.g., "Dentist", "Plumber", "HVAC").
- `LOCATION`: The city or area to target (e.g., "Austin, TX", "London, UK").
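Both inputs are used twice: once to form the Google Maps search and once to name the output file. The sketch below is a minimal illustration built on assumptions. The "`NICHE` in `LOCATION`" query format, the helper names and the filename slug rules are all hypothetical. It also builds a search link in Google's documented Maps URLs format (`https://www.google.com/maps/search/?api=1&query=...`), which may differ from whatever `google_maps_scraper.py` actually navigates to.

```python
import re
from urllib.parse import urlencode


def build_query(niche: str, location: str) -> str:
    """Combine the two required inputs into one search phrase (hypothetical format)."""
    niche, location = niche.strip(), location.strip()
    if not niche or not location:
        raise ValueError("Both NICHE and LOCATION are required")
    return f"{niche} in {location}"


def maps_search_url(niche: str, location: str) -> str:
    """Build a Google Maps search link in the public Maps URLs format."""
    params = urlencode({"api": 1, "query": build_query(niche, location)})
    return f"https://www.google.com/maps/search/?{params}"


def output_filename(niche: str, location: str) -> str:
    """Turn the inputs into a filesystem-safe leads_NICHE_LOCATION.csv name."""

    def slug(text: str) -> str:
        # Lowercase, then collapse every run of non-alphanumerics into one "-".
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return f"leads_{slug(niche)}_{slug(location)}.csv"


if __name__ == "__main__":
    print(maps_search_url("Dentist", "Austin, TX"))
    print(output_filename("Dentist", "Austin, TX"))
```

Slugging keeps commas and spaces out of the filename. For example, `Dentist` in `Austin, TX` becomes `leads_dentist_austin-tx.csv`.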

## Steps

1. **Discovery (`google_maps_scraper.py`)**
   - Launch a stealth browser with Playwright (configured to reduce bot detection).
   - Search Google Maps for `NICHE` in `LOCATION`.
   - Scroll through the results to collect up to 50 businesses.
   - Extract each business's **Name**, **Website**, **Phone**, **Rating**, and **Review Count**.
   - Save the results to `.tmp/raw_leads.json`.

2. **Enrichment (`contact_enricher.py`)**
   - Read `.tmp/raw_leads.json`.
   - For each business with a website:
     - Fetch the site with `httpx` and parse it with `BeautifulSoup`.
     - Scrape the homepage and the "Contact" and "About" pages for:
       - **Emails** (matched with a regular expression).
       - **Facebook URL**.
       - **Instagram URL**.
       - **LinkedIn URL**.
   - Filter out duplicate and junk entries.

3. **Synthesis**
   - Combine the Google Maps data with the enriched contact info.
   - Save the final dataset as `leads_NICHE_LOCATION.csv`, with `NICHE` and `LOCATION` replaced by the input values.
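The Enrichment and Synthesis steps can be sketched with the standard library alone. This is a minimal, hypothetical version with several substitutions and assumptions:

- It uses `html.parser.HTMLParser` in place of BeautifulSoup so it runs without extra dependencies.
- It works on HTML that has already been fetched, so the `httpx` requests are left out.
- The email regex, the junk filter, the `/contact` and `/about` paths, and the field names are assumptions. They are not taken from `contact_enricher.py`.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Permissive email pattern (an assumption); findall over the raw HTML also
# catches addresses inside mailto: links.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# "Emails" that are really asset names such as logo@2x.png.
JUNK_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")
SOCIAL_DOMAINS = {
    "facebook": "facebook.com",
    "instagram": "instagram.com",
    "linkedin": "linkedin.com",
}


class LinkCollector(HTMLParser):
    """Collect every <a href> on a page (standing in for BeautifulSoup)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href" and value)


def candidate_pages(website: str) -> list[str]:
    """Homepage plus guessed Contact/About URLs (hypothetical paths)."""
    return [website] + [urljoin(website, path) for path in ("/contact", "/about")]


def extract_contacts(html: str) -> dict:
    """Find emails and the first Facebook/Instagram/LinkedIn link on one page."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    # Lowercase for deduplication, then drop image-filename false positives.
    emails = {e.lower() for e in EMAIL_RE.findall(html)}
    emails = {e for e in emails if not e.endswith(JUNK_SUFFIXES)}
    contacts: dict = {"emails": sorted(emails)}
    for key, domain in SOCIAL_DOMAINS.items():
        for href in collector.hrefs:
            host = urlparse(href).netloc.lower()
            if host == domain or host.endswith("." + domain):
                contacts[key] = href  # keep only the first match per network
                break
    return contacts


def merge_lead(raw: dict, pages_html: list[str]) -> dict:
    """Combine one Google Maps record with contacts found across its pages."""
    emails: set[str] = set()
    socials = {key: "" for key in SOCIAL_DOMAINS}
    for html in pages_html:
        found = extract_contacts(html)
        emails.update(found["emails"])
        for key in socials:
            socials[key] = socials[key] or found.get(key, "")  # first hit wins
    return {**raw, **socials, "emails": ";".join(sorted(emails))}
```

In the full workflow, each string in `pages_html` would be the body of an `httpx` response for one of the `candidate_pages` URLs. Pages that fail to load would be skipped or handled by the dead-link rule.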

## Edge Cases

- **No Website**: If a lead has no website, flag it as a "High Opportunity" for web design services and skip enrichment.
- **CAPTCHA**: If scraping is blocked by a CAPTCHA, the agent should alert the user to check the IP address or proxy, or to wait before retrying.
- **Dead Links**: Mark sites that return a 404 or a connection error as "Inactive".
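The three edge cases mostly come down to labelling leads and stopping cleanly before the final CSV is written. This is a minimal sketch built on assumptions:

- `status_code` and `connection_error` stand for the outcome of the `httpx` request made during enrichment.
- The `status` column, the `"Active"` default label, the `CaptchaBlocked` exception and the column names are hypothetical.
- The CAPTCHA check assumes Google redirects blocked traffic to a URL under `/sorry/`. That is a heuristic, not a documented guarantee.

```python
from __future__ import annotations

import csv
from pathlib import Path
from urllib.parse import urlparse

# Assumed column order: Google Maps fields, enrichment fields, then status.
FIELDS = ["name", "website", "phone", "rating", "review_count",
          "emails", "facebook", "instagram", "linkedin", "status"]


class CaptchaBlocked(RuntimeError):
    """Hypothetical signal that scraping hit a CAPTCHA and the user must act."""


def check_for_captcha(current_url: str) -> None:
    # Assumption: Google sends suspected bots to a page under /sorry/.
    if urlparse(current_url).path.startswith("/sorry/"):
        raise CaptchaBlocked("CAPTCHA hit: check the IP/proxy or wait before retrying.")


def classify(lead: dict, status_code: int | None = None,
             connection_error: bool = False) -> str:
    """Label a lead according to the No Website and Dead Links rules."""
    if not lead.get("website"):
        return "High Opportunity"  # pitch web design; enrichment is skipped
    if connection_error or status_code == 404:
        return "Inactive"
    return "Active"  # hypothetical label for sites that responded normally


def write_leads(leads: list[dict], path: str | Path) -> Path:
    """Write merged leads to CSV, leaving blanks for any missing fields."""
    path = Path(path)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, restval="", extrasaction="ignore")
        writer.writeheader()
        writer.writerows(leads)
    return path
```

Raising an exception instead of retrying keeps the agent from repeatedly hitting Google while blocked. This matches the instruction to alert the user rather than push on.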