Here's a ready-to-use "meta-prompt" you can feed into your AI agent to kick off the local build of your Flashscore scraper:

You are a Senior JavaScript Automation Engineer. Your task is to scaffold and implement, step by step, a local Flashscore data-scraping tool in Node.js, using Playwright (or Puppeteer) and Cheerio. Follow these requirements exactly:

1. **Project Initialization**
   - Create a new npm project (`npm init -y`).
   - Install dependencies:
     ```bash
     npm install playwright cheerio axios dotenv fs-extra node-cron
     ```

2. **File Structure**
   Build this directory tree:
   ```
   flashscore-scraper/
   ├── src/
   │   ├── scrapers/
   │   │   ├── base-scraper.js    # launches browser, handles sessions, stealth
   │   │   ├── match-summary.js   # extracts match info & events
   │   │   └── lineups.js         # extracts formations & lineups
   │   ├── utils/
   │   │   ├── browser-manager.js # singleton browser/context manager
   │   │   ├── data-processor.js  # cleans & normalizes scraped data
   │   │   └── proxy-manager.js   # rotates proxies & delays
   │   ├── models/
   │   │   ├── match-data.js      # JS class/schema for match summary
   │   │   └── team-data.js       # JS class/schema for lineup data
   │   └── index.js               # CLI entrypoint & cron scheduler
   ├── config/
   │   └── settings.js            # base URL, selectors, proxy list, cron schedule
   ├── data/
   │   ├── matches/               # JSON output files
   │   └── cache/                 # temporary HTML snapshots
   └── package.json
   ```

3. **Stealth & Throttling**
   In `base-scraper.js`, implement:
   - A realistic `User-Agent` and random delays (2–8 s) between actions.
   - The Puppeteer extra stealth plugin or Playwright stealth options.
   - Proxy rotation every 50 requests.
   - Blocking of images & ads via request interception.

4. **Scraper Modules**
   - **match-summary.js**: Navigate to a match URL, wait for the `.match-summary` selector, and scrape:
     - Teams, final score, date & time, half-time score.
     - An events array: goals (scorer/time/assist), cards, substitutions, injuries.
   - **lineups.js**: Navigate to `/lineups`, wait for the lineup container, and scrape:
     - Starting XI, substitutes, coaching staff, formation map.

5. **Data Models & Processing**
   - Define `MatchData` and `TeamData` classes with clear fields.
   - In `data-processor.js`, normalize timestamps, convert date strings to ISO, and validate numeric scores.

6. **Scheduling & CLI**
   - In `index.js`, read a match URL from the CLI or `.env`.
   - Schedule daily runs via `node-cron` (configurable cron expression).
   - Save JSON to `data/matches/<matchId>.json`.

7. **Error Handling & Logging**
   - Retry up to 3 times on network or selector errors, with exponential backoff.
   - Log successes and failures to a rotating log file in `data/logs/`.

8. **Next Steps (after MVP)**
   - Add an Express API wrapper (`/api/match/:id`).
   - Build a simple dashboard to visualize scraped stats.
   - Integrate a caching layer (Redis or file-based) for repeated queries.

Please generate all boilerplate code accordingly, with comments explaining each major section. Start by creating `src/utils/browser-manager.js` and `src/scrapers/base-scraper.js`. Proceed one module at a time, and after each file run a quick example invocation to verify connectivity to Flashscore.com.
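To make the throttling requirement in step 3 concrete, here is a minimal sketch of the randomized-delay helper that `base-scraper.js` could expose between page actions. The names `randomDelayMs` and `humanPause` are illustrative, not part of any existing API:

```javascript
// Hypothetical throttling helpers for base-scraper.js (names are illustrative).

// Pick a random whole-millisecond delay inside the 2-8 s window from step 3.
function randomDelayMs(minMs = 2000, maxMs = 8000) {
  return minMs + Math.floor(Math.random() * (maxMs - minMs + 1));
}

// Awaitable pause to insert between scraping actions; resolves with the delay used.
function humanPause(minMs = 2000, maxMs = 8000) {
  const ms = randomDelayMs(minMs, maxMs);
  return new Promise((resolve) => setTimeout(() => resolve(ms), ms));
}

module.exports = { randomDelayMs, humanPause };
```

In the scraper you would `await humanPause()` before each navigation or click so request timing never looks machine-regular.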
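Step 5's normalization duties in `data-processor.js` can be sketched as a pure function. The input field names (`rawDate`, `homeScore`, `awayScore`) are assumptions for illustration, not Flashscore's actual markup:

```javascript
// Hypothetical normalizer for data-processor.js; field names are illustrative.
// Converts a raw scraped record into an ISO date plus validated integer scores.
function normalizeMatch(raw) {
  const date = new Date(raw.rawDate);
  if (Number.isNaN(date.getTime())) {
    throw new Error(`Unparseable date: ${raw.rawDate}`);
  }

  // Scores arrive as strings from the DOM; coerce and reject anything
  // that is not a non-negative integer.
  const toScore = (value) => {
    const n = Number(value);
    if (!Number.isInteger(n) || n < 0) {
      throw new Error(`Invalid score: ${value}`);
    }
    return n;
  };

  return {
    date: date.toISOString(),
    homeScore: toScore(raw.homeScore),
    awayScore: toScore(raw.awayScore),
  };
}

module.exports = { normalizeMatch };
```

Failing fast on bad dates or scores keeps malformed records out of `data/matches/` instead of silently writing garbage JSON.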
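The retry policy from step 7 (3 attempts, exponential backoff) is also easy to sketch as a generic wrapper. `withRetry` is a hypothetical helper name; any scraper call can be passed through it:

```javascript
// Hypothetical retry wrapper for step 7 (name is illustrative).
// Runs `task` up to `attempts` times, doubling the wait after each failure.
async function withRetry(task, { attempts = 3, baseDelayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < attempts) {
        const delay = baseDelayMs * 2 ** (attempt - 1); // 1 s, 2 s, 4 s, ...
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError; // all attempts exhausted; surface the final failure
}

module.exports = { withRetry };
```

Usage would look like `await withRetry(() => page.goto(matchUrl))`, so transient network or selector timeouts are absorbed while persistent failures still propagate to the logger.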