import React, { useState } from 'react';
import ReactMarkdown from 'react-markdown';
import remarkGfm from 'remark-gfm';
import {
  Book,
  Search,
  ExternalLink,
  Home,
  Cpu,
  Plug,
  Database,
  Terminal,
} from 'lucide-react';
import { classNames } from '@/utils/helpers';

interface DocsPageProps {
  className?: string;
}

interface DocSection {
  id: string;
  title: string;
  icon: React.ElementType;
  content: string;
}

// Documentation content
const userGuideContent = `
# ScrapeRL Documentation

Welcome to ScrapeRL - an advanced Reinforcement Learning-powered web scraping environment.

---

## Getting Started

### What is ScrapeRL?

ScrapeRL is an intelligent web scraping system that uses Reinforcement Learning (RL) to learn and adapt scraping strategies. Unlike traditional scrapers, ScrapeRL can:

- **Learn from experience** - Improve scraping strategies over time
- **Adapt to changes** - Handle website structure changes automatically
- **Coordinate multiple agents** - Use specialized agents for different tasks
- **Use memory** - Remember patterns and optimize future runs

### Quick Start

1. **Enter a Target URL** - Provide the webpage you want to scrape
2. **Write an Instruction** - Describe what data you want to extract
3. **Configure Options** - Select model, agents, and plugins
4. **Start Episode** - Click Start and follow progress in the logs and extracted-data panels

### Example Task

\`\`\`
URL: https://example.com/products
Instruction: Extract all product names, prices, and descriptions
Task Type: Medium
\`\`\`

---

## Dashboard Overview

The dashboard is your command center for monitoring and controlling scraping operations.

### Layout Structure

| Section | Description |
|---------|-------------|
| **Input Bar** | Enter URL, instruction, and configure task |
| **Left Sidebar** | View active agents, MCPs, skills, and tools |
| **Center Area** | Main visualization and current observation |
| **Right Sidebar** | Memory stats, extracted data, recent actions |
| **Bottom Logs** | Real-time terminal-style log output |

### Task Types

| Type | Description | Use Case |
|------|-------------|----------|
| 🟢 **Low** | Simple single-page scraping | Product page, article text |
| 🟡 **Medium** | Multi-page with navigation | Search results, listings |
| 🔴 **High** | Complex interactive tasks | Login-required, forms |

---

## Agents

ScrapeRL uses a multi-agent architecture where specialized agents handle different aspects of scraping.

### Available Agents

| Agent | Role | Description |
|-------|------|-------------|
| **Coordinator** | 🎯 Orchestrator | Manages all other agents |
| **Scraper** | 📄 Extractor | Extracts data from content |
| **Navigator** | 🧭 Navigation | Handles page navigation |
| **Analyzer** | 🔍 Analysis | Analyzes data patterns |
| **Validator** | ✅ Validation | Validates data quality |

---

## Plugins

Extend ScrapeRL's capabilities with plugins.

### Categories

- **MCPs** - Browser automation (Browser Use, Puppeteer, Playwright)
- **Skills** - Task capabilities (Web Scraping, Data Extraction)
- **APIs** - External services (Firecrawl, Jina Reader, Serper)
- **Vision** - Visual AI (GPT-4V, Gemini Vision, Claude Vision)

---

## Memory System

| Layer | Purpose | Retention |
|-------|---------|-----------|
| **Working** | Current task | Session |
| **Episodic** | Experiences | Persistent |
| **Semantic** | Patterns | Persistent |
| **Procedural** | Actions | Persistent |

---

## API Keys

Configure in **Settings > API Keys**:

| Provider | Models |
|----------|--------|
| Groq | GPT-OSS 120B (Default) |
| Google | Gemini 2.5 Flash |
| OpenAI | GPT-4 Turbo |
| Anthropic | Claude 3 Opus |

---

## Keyboard Shortcuts

| Shortcut | Action |
|----------|--------|
| \`Ctrl + Enter\` | Start/Stop episode |
| \`Ctrl + L\` | Clear logs |
| \`Escape\` | Close popups |
`;

const agentsContent = `
# Agents Documentation

## Multi-Agent Architecture

ScrapeRL uses a multi-agent system in which each agent specializes in one part of the scraping workflow.

### Coordinator Agent

The brain of the operation. It:
- Decides which agents to activate
- Plans the scraping strategy
- Handles error recovery
- Optimizes resource usage

### Scraper Agent

Responsible for data extraction:
- HTML parsing and element selection
- Text content extraction
- Structured data identification
- Pattern recognition

### Navigator Agent

Handles all page interactions:
- URL navigation
- Link clicking
- Form submissions
- Pagination handling

### Analyzer Agent

Processes and analyzes data:
- Data validation
- Pattern detection
- Quality assessment
- Anomaly detection

### Validator Agent

Ensures data quality:
- Schema validation
- Completeness checks
- Duplicate detection
- Format verification

## Agent Communication

Agents communicate through a shared memory system:

\`\`\`
Coordinator -> Scraper: "Extract product data"
Scraper -> Memory: "Store extracted items"
Memory -> Analyzer: "New data available"
Analyzer -> Validator: "Validate these records"
Validator -> Coordinator: "Validation complete"
\`\`\`
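
The exchange above can be sketched in TypeScript as a tiny publish/subscribe store. This is only an illustration: the \`SharedMemory\` class, the topic names, and the message shape are assumptions made for the example, not ScrapeRL's actual implementation.

\`\`\`typescript
// Hypothetical message shape; field names are illustrative only.
interface AgentMessage {
  from: string;
  topic: string;
  payload: string;
}

type Handler = (message: AgentMessage) => void;

// Minimal shared memory: agents publish to a topic and subscribers are notified in order.
class SharedMemory {
  private handlers: Record<string, Handler[]> = {};
  readonly log: AgentMessage[] = [];

  subscribe(topic: string, handler: Handler): void {
    this.handlers[topic] = (this.handlers[topic] ?? []).concat(handler);
  }

  publish(message: AgentMessage): void {
    this.log.push(message);
    for (const handler of this.handlers[message.topic] ?? []) {
      handler(message);
    }
  }
}

const memory = new SharedMemory();

// Scraper reacts to the Coordinator's task and stores what it extracted.
memory.subscribe('task.assigned', () =>
  memory.publish({ from: 'Scraper', topic: 'items.stored', payload: 'Store extracted items' })
);
// Analyzer reacts to new data and asks the Validator to check it.
memory.subscribe('items.stored', () =>
  memory.publish({ from: 'Analyzer', topic: 'records.validate', payload: 'Validate these records' })
);
// Validator reports back to the Coordinator.
memory.subscribe('records.validate', () =>
  memory.publish({ from: 'Validator', topic: 'validation.done', payload: 'Validation complete' })
);

memory.publish({ from: 'Coordinator', topic: 'task.assigned', payload: 'Extract product data' });

// Order in which agents wrote to the shared log.
const senders = memory.log.map((m) => m.from);
\`\`\`

In this sketch agents never call each other directly; each one only reads from and writes to the shared store, which is the property the message flow above describes.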
`;

const pluginsContent = `
# Plugins Documentation

## Plugin Categories

### MCPs (Model Context Protocol servers)

Browser automation tools that integrate with AI models.

#### Browser Use
- AI-powered browser control
- Natural language commands
- Visual understanding
- Automatic element detection

#### Puppeteer MCP
- Headless Chrome automation
- Screenshot capture
- PDF generation
- Network interception

#### Playwright MCP
- Cross-browser support
- Mobile emulation
- Video recording
- Trace viewer

### Skills

Specialized capabilities for specific tasks.

#### Web Scraping
- CSS/XPath selectors
- Data extraction patterns
- Pagination handling
- Rate limiting

#### Data Extraction
- JSON/XML parsing
- Table extraction
- List processing
- Content classification

### APIs

External service integrations.

#### Firecrawl
- High-performance crawling
- JavaScript rendering
- Proxy rotation
- Rate limiting

#### Jina Reader
- Content extraction API
- Clean text output
- Structured data
- Multi-format support

### Vision Models

Visual understanding capabilities.

#### GPT-4 Vision
- Image analysis
- Screenshot understanding
- UI element detection
- Text extraction from images

## Installing Plugins

1. Navigate to Plugins page
2. Browse categories
3. Click Install on desired plugin
4. Configure API keys if required
`;

const memoryContent = `
# Memory System Documentation

## Hierarchical Memory Architecture

ScrapeRL uses a four-layer memory system inspired by human cognitive architecture.

### Working Memory

**Purpose:** Active task context

- Current URL and page state
- Active extraction targets
- Temporary calculations
- Session-specific data

**Retention:** Cleared after each episode

### Episodic Memory

**Purpose:** Experience records

- Past scraping sessions
- Success/failure patterns
- Timing data
- Action sequences

**Retention:** Persistent across sessions

### Semantic Memory

**Purpose:** Learned knowledge

- Website patterns
- Extraction rules
- Domain knowledge
- Best practices

**Retention:** Long-term persistent

### Procedural Memory

**Purpose:** Action sequences

- Navigation patterns
- Interaction sequences
- Recovery procedures
- Optimization strategies

**Retention:** Long-term persistent

## Memory Operations

### Store
\`\`\`json
{
  "content": "Product prices on example.com follow pattern...",
  "memory_type": "semantic",
  "metadata": {
    "domain": "example.com",
    "confidence": 0.95
  }
}
\`\`\`

### Query
\`\`\`json
{
  "query": "price extraction patterns",
  "memory_type": "semantic",
  "limit": 10
}
\`\`\`
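
As a rough TypeScript sketch, the two payloads above could be typed and tried out against an in-memory list. The interface names, the \`MemoryType\` union, and the word-matching logic are assumptions for illustration; how the real backend matches a query is not specified here.

\`\`\`typescript
type MemoryType = 'working' | 'episodic' | 'semantic' | 'procedural';

// Mirrors the Store payload shown above.
interface MemoryEntry {
  content: string;
  memory_type: MemoryType;
  metadata?: Record<string, string | number>;
}

// Mirrors the Query payload shown above.
interface MemoryQuery {
  query: string;
  memory_type: MemoryType;
  limit: number;
}

// Naive stand-in for the backend: keep entries of the requested type
// whose content contains every word of the query, up to the limit.
function queryMemories(entries: MemoryEntry[], q: MemoryQuery): MemoryEntry[] {
  const words = q.query.toLowerCase().split(' ').filter((w) => w.length > 0);
  return entries
    .filter((e) => e.memory_type === q.memory_type)
    .filter((e) => words.every((w) => e.content.toLowerCase().indexOf(w) !== -1))
    .slice(0, q.limit);
}

const stored: MemoryEntry[] = [
  {
    content: 'Price extraction patterns for example.com product pages',
    memory_type: 'semantic',
    metadata: { domain: 'example.com', confidence: 0.95 },
  },
  { content: 'Visited https://example.com/products', memory_type: 'episodic' },
];

const matches = queryMemories(stored, {
  query: 'price extraction patterns',
  memory_type: 'semantic',
  limit: 10,
});
\`\`\`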

### Consolidation

Automatic promotion of important memories:
- Working → Episodic: At episode end
- Episodic → Semantic: Pattern detection
- Episodic → Procedural: Action sequences
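
One way to picture these promotion rules is the TypeScript sketch below. The \`MemoryRecord\` shape and its \`kind\` field are assumptions made for the example; the actual criteria ScrapeRL uses to detect patterns or action sequences are not documented here.

\`\`\`typescript
type Layer = 'working' | 'episodic' | 'semantic' | 'procedural';

// Hypothetical record shape used only for this sketch.
interface MemoryRecord {
  layer: Layer;
  kind: 'observation' | 'pattern' | 'action_sequence';
  content: string;
}

// Applies each promotion rule once, as a single pass at the end of an episode.
function consolidate(records: MemoryRecord[]): MemoryRecord[] {
  return records.map((record): MemoryRecord => {
    if (record.layer === 'working') {
      // Working -> Episodic: the finished episode is archived.
      return { ...record, layer: 'episodic' };
    }
    if (record.layer === 'episodic' && record.kind === 'pattern') {
      // Episodic -> Semantic: detected patterns become learned knowledge.
      return { ...record, layer: 'semantic' };
    }
    if (record.layer === 'episodic' && record.kind === 'action_sequence') {
      // Episodic -> Procedural: reusable action sequences.
      return { ...record, layer: 'procedural' };
    }
    return record;
  });
}

const consolidated = consolidate([
  { layer: 'working', kind: 'observation', content: 'Current page has 24 product cards' },
  { layer: 'episodic', kind: 'pattern', content: 'Prices appear in a span next to the title' },
  { layer: 'episodic', kind: 'action_sequence', content: 'Open listing, click next, repeat' },
  { layer: 'semantic', kind: 'pattern', content: 'Listing pages paginate with a page parameter' },
]);

const layers = consolidated.map((r) => r.layer);
\`\`\`

Because the pass runs once, a record promoted from working to episodic memory stays episodic until a later consolidation.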
`;

const apiContent = `
# API Reference

## Base URL

\`\`\`
http://localhost:7860/api
\`\`\`

## Health Check

### GET /health

Check system status.

**Response:**
\`\`\`json
{
  "status": "healthy",
  "version": "0.1.0",
  "timestamp": "2026-03-28T00:00:00Z"
}
\`\`\`

## Episode Endpoints

### POST /episode/reset

Start a new episode.

**Request:**
\`\`\`json
{
  "task_id": "scrape-products"
}
\`\`\`

### POST /episode/step

Execute an action.

**Request:**
\`\`\`json
{
  "action": "navigate",
  "params": { "url": "https://example.com" }
}
\`\`\`
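
A minimal TypeScript sketch of building these two requests is shown below. It assumes the endpoint paths are relative to the base URL above and that the server expects a JSON \`Content-Type\` header; \`buildPost\` and \`JsonRequest\` are hypothetical helper names, and the response format is not covered.

\`\`\`typescript
// Base URL from this page; adjust it if the server runs elsewhere.
const BASE_URL = 'http://localhost:7860/api';

interface JsonRequest {
  url: string;
  init: { method: 'POST'; headers: { 'Content-Type': string }; body: string };
}

// Builds a JSON POST request for a path under the base URL.
function buildPost(path: string, body: object): JsonRequest {
  return {
    url: BASE_URL + path,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    },
  };
}

// Request bodies copied from the examples above.
const resetRequest = buildPost('/episode/reset', { task_id: 'scrape-products' });
const stepRequest = buildPost('/episode/step', {
  action: 'navigate',
  params: { url: 'https://example.com' },
});
\`\`\`

Each result can then be sent with \`fetch(request.url, request.init)\`, calling reset before the first step.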

### GET /episode/state

Get current state.

## Memory Endpoints

### POST /memory/store

Store a memory entry.

### POST /memory/query

Query memories.

### GET /memory/stats/overview

Get memory statistics.

## Plugin Endpoints

### GET /plugins/

List all plugins.

### POST /plugins/install

Install a plugin.

### POST /plugins/uninstall

Uninstall a plugin.

## Settings Endpoints

### GET /settings/

Get current settings.

### POST /settings/api-key

Update API key.

### POST /settings/model

Select active model.
`;

const docs: DocSection[] = [
  { id: 'guide', title: 'User Guide', icon: Home, content: userGuideContent },
  { id: 'agents', title: 'Agents', icon: Cpu, content: agentsContent },
  { id: 'plugins', title: 'Plugins', icon: Plug, content: pluginsContent },
  { id: 'memory', title: 'Memory System', icon: Database, content: memoryContent },
  { id: 'api', title: 'API Reference', icon: Terminal, content: apiContent },
];

export const DocsPage: React.FC<DocsPageProps> = ({ className }) => {
  const [activeDoc, setActiveDoc] = useState<string>('guide');
  const [searchQuery, setSearchQuery] = useState('');

  const currentDoc = docs.find((d) => d.id === activeDoc) || docs[0];

  return (
    <div className={classNames('flex h-[calc(100vh-120px)]', className)}>
      {/* Left Sidebar - Navigation */}
      <div className="w-64 flex-shrink-0 bg-gray-800/30 border-r border-gray-700/50 flex flex-col">
        <div className="p-4 border-b border-gray-700/50">
          <h2 className="text-lg font-semibold text-white flex items-center gap-2">
            <Book className="w-5 h-5 text-cyan-400" />
            Documentation
          </h2>
          <p className="text-xs text-gray-500 mt-1">Learn how to use ScrapeRL</p>
        </div>

        {/* Search */}
        <div className="p-3 border-b border-gray-700/50">
          <div className="relative">
            <Search className="absolute left-3 top-1/2 -translate-y-1/2 w-4 h-4 text-gray-500" />
            <input
              type="text"
              placeholder="Search docs..."
              value={searchQuery}
              onChange={(e) => setSearchQuery(e.target.value)}
              className="w-full pl-9 pr-3 py-2 bg-gray-900/50 border border-gray-700/50 rounded-lg text-sm text-white placeholder-gray-500 focus:outline-none focus:ring-2 focus:ring-cyan-500/50"
            />
          </div>
        </div>

        {/* Navigation */}
        <nav className="flex-1 p-3 space-y-1 overflow-y-auto">
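          {docs
            // Show only sections whose title or content matches the search box (all sections when it is empty).
            .filter((doc) =>
              `${doc.title} ${doc.content}`.toLowerCase().includes(searchQuery.trim().toLowerCase())
            )
            .map((doc) => {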
            const Icon = doc.icon;
            const isActive = activeDoc === doc.id;
            return (
              <button
                key={doc.id}
                onClick={() => setActiveDoc(doc.id)}
                className={classNames(
                  'w-full flex items-center gap-3 px-3 py-2.5 rounded-lg text-left transition-all',
                  isActive
                    ? 'bg-cyan-500/20 border border-cyan-500/30 text-cyan-400'
                    : 'hover:bg-gray-700/50 text-gray-400 hover:text-gray-200'
                )}
              >
                <Icon className={classNames('w-4 h-4', isActive ? 'text-cyan-400' : 'text-gray-500')} />
                <span className="text-sm font-medium">{doc.title}</span>
              </button>
            );
          })}
        </nav>

        {/* Footer */}
        <div className="p-4 border-t border-gray-700/50">
          <a
            href="https://github.com/NeerajCodz/scrapeRL"
            target="_blank"
            rel="noopener noreferrer"
            className="flex items-center gap-2 text-xs text-gray-500 hover:text-gray-300 transition-colors"
          >
            <ExternalLink className="w-3 h-3" />
            View on GitHub
          </a>
        </div>
      </div>

      {/* Main Content - Markdown Viewer */}
      <div className="flex-1 overflow-y-auto">
        <div className="max-w-4xl mx-auto p-8">
          <article className="prose prose-invert prose-sm max-w-none">
            <ReactMarkdown
              remarkPlugins={[remarkGfm]}
              components={{
                h1: ({ children }) => (
                  <h1 className="text-3xl font-bold text-white mb-6 pb-4 border-b border-gray-700/50">
                    {children}
                  </h1>
                ),
                h2: ({ children }) => (
                  <h2 className="text-2xl font-semibold text-white mt-8 mb-4">{children}</h2>
                ),
                h3: ({ children }) => (
                  <h3 className="text-xl font-semibold text-gray-200 mt-6 mb-3">{children}</h3>
                ),
                h4: ({ children }) => (
                  <h4 className="text-lg font-medium text-gray-300 mt-4 mb-2">{children}</h4>
                ),
                p: ({ children }) => <p className="text-gray-400 mb-4 leading-relaxed">{children}</p>,
                ul: ({ children }) => <ul className="list-disc list-inside text-gray-400 mb-4 space-y-1">{children}</ul>,
                ol: ({ children }) => <ol className="list-decimal list-inside text-gray-400 mb-4 space-y-1">{children}</ol>,
                li: ({ children }) => <li className="text-gray-400">{children}</li>,
                strong: ({ children }) => <strong className="text-white font-semibold">{children}</strong>,
                em: ({ children }) => <em className="text-gray-300">{children}</em>,
                code: ({ children, className }) => {
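                  // Fenced blocks without a language tag have no className, so also treat multi-line code as a block.
                  const isBlock = className?.includes('language-') || String(children).includes('\n');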
                  if (isBlock) {
                    return (
                      <code className="block bg-gray-900 rounded-lg p-4 text-sm font-mono text-gray-300 overflow-x-auto">
                        {children}
                      </code>
                    );
                  }
                  return (
                    <code className="bg-gray-800 text-cyan-400 px-1.5 py-0.5 rounded text-sm font-mono">
                      {children}
                    </code>
                  );
                },
                pre: ({ children }) => <pre className="mb-4">{children}</pre>,
                blockquote: ({ children }) => (
                  <blockquote className="border-l-4 border-cyan-500/50 pl-4 italic text-gray-400 my-4">
                    {children}
                  </blockquote>
                ),
                table: ({ children }) => (
                  <div className="overflow-x-auto mb-4">
                    <table className="w-full border-collapse">{children}</table>
                  </div>
                ),
                thead: ({ children }) => <thead className="bg-gray-800/50">{children}</thead>,
                th: ({ children }) => (
                  <th className="px-4 py-2 text-left text-xs font-semibold text-gray-300 border-b border-gray-700">
                    {children}
                  </th>
                ),
                td: ({ children }) => (
                  <td className="px-4 py-2 text-sm text-gray-400 border-b border-gray-800">{children}</td>
                ),
                hr: () => <hr className="border-gray-700/50 my-8" />,
                a: ({ href, children }) => (
                  <a
                    href={href}
                    className="text-cyan-400 hover:text-cyan-300 underline underline-offset-2"
                    target="_blank"
                    rel="noopener noreferrer"
                  >
                    {children}
                  </a>
                ),
              }}
            >
              {currentDoc.content}
            </ReactMarkdown>
          </article>
        </div>
      </div>
    </div>
  );
};

export default DocsPage;