Commit 2ebd518 by Ruhivig65 (verified) · 1 parent: c149fe9

Upload 3 files

Files changed (3)
  1. Dockerfile +78 -0
  2. README.md +42 -0
  3. requirements.txt +31 -0
Dockerfile ADDED
@@ -0,0 +1,78 @@
+ # ============================================
+ # Dockerfile for Hugging Face Spaces
+ # Novel Scraper with Playwright
+ # ============================================
+
+ # --- Base image: Python 3.10 slim (Debian bookworm) ---
+ FROM python:3.10-slim-bookworm
+
+ # --- Hugging Face Spaces expects the app on port 7860 by default ---
+ ENV PORT=7860
+ ENV PYTHONUNBUFFERED=1
+ ENV PYTHONDONTWRITEBYTECODE=1
+ ENV DEBIAN_FRONTEND=noninteractive
+
+ # --- Create non-root user (HF requirement) ---
+ RUN useradd -m -u 1000 user
+
+ # --- Install system dependencies for Playwright ---
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     # Playwright browser dependencies
+     libnss3 \
+     libnspr4 \
+     libatk1.0-0 \
+     libatk-bridge2.0-0 \
+     libcups2 \
+     libdrm2 \
+     libdbus-1-3 \
+     libxkbcommon0 \
+     libatspi2.0-0 \
+     libxcomposite1 \
+     libxdamage1 \
+     libxfixes3 \
+     libxrandr2 \
+     libgbm1 \
+     libpango-1.0-0 \
+     libcairo2 \
+     libasound2 \
+     libwayland-client0 \
+     # Additional libs often needed
+     libglib2.0-0 \
+     libgtk-3-0 \
+     libx11-xcb1 \
+     fonts-liberation \
+     fonts-noto-cjk \
+     wget \
+     ca-certificates \
+     && rm -rf /var/lib/apt/lists/*
+
+ # --- Set working directory ---
+ WORKDIR /home/user/app
+
+ # --- Copy requirements first (Docker cache optimization) ---
+ COPY --chown=user:user requirements.txt .
+
+ # --- Install Python dependencies ---
+ RUN pip install --no-cache-dir --upgrade pip && \
+     pip install --no-cache-dir -r requirements.txt
+
+ # --- Install Playwright browsers (Chromium only to save space) ---
+ # Must run as the user who will execute the app
+ USER user
+ RUN playwright install chromium
+
+ # --- Copy application code ---
+ COPY --chown=user:user . .
+
+ # --- Create screenshots directory ---
+ RUN mkdir -p /home/user/app/app/static/screenshots
+
+ # --- Expose the port ---
+ EXPOSE 7860
+
+ # --- Health check (fails on connection errors and non-2xx responses) ---
+ HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
+     CMD python -c "import httpx; httpx.get('http://localhost:7860/health').raise_for_status()" || exit 1
+
+ # --- Start the application ---
+ CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "7860", "--workers", "1", "--timeout-keep-alive", "120"]
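The health check above only helps if a non-2xx answer counts as a failure, not just a refused connection. A minimal sketch of that probe logic using only the standard library follows; the function name `is_healthy` is hypothetical and not part of this commit, and the `/health` route is assumed to exist in `app.main`, which is not shown here.

```python
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 10.0) -> bool:
    """Return True only if `url` answers with a 2xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # urlopen raises HTTPError (a URLError subclass) for 4xx/5xx answers;
        # refused connections and timeouts surface as URLError or OSError.
        return False
```

Inside the container, a script could call `is_healthy("http://localhost:7860/health")` and exit non-zero when it returns `False`, which is what Docker's `HEALTHCHECK` treats as unhealthy.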
README.md ADDED
@@ -0,0 +1,42 @@
+ # 📚 Novel Scraper Pro
+
+ Concurrent web novel scraper with captcha intervention support.
+ Runs on Hugging Face Spaces with a Render PostgreSQL database.
+
+ ## 🚀 Quick Deploy to Hugging Face
+
+ ### Step 1: Create Render Database
+ 1. Go to [render.com](https://render.com)
+ 2. Create a new PostgreSQL database (Free tier)
+ 3. Copy the **External Database URL**
+ 4. In the copied URL, replace the `postgresql://` prefix with `postgresql+asyncpg://` so that SQLAlchemy uses the asyncpg driver
+
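The prefix swap in Step 1 can also be done in code instead of by hand. This is a minimal sketch, not something this commit contains: `to_asyncpg_url` is a hypothetical helper, and accepting the `postgres://` alias that some providers emit is an added assumption.

```python
def to_asyncpg_url(database_url: str) -> str:
    """Rewrite a PostgreSQL URL so SQLAlchemy selects the asyncpg driver."""
    target = "postgresql+asyncpg://"
    if database_url.startswith(target):
        return database_url  # already converted, leave untouched
    # "postgres://" is an assumption: an alias some providers hand out.
    for prefix in ("postgresql://", "postgres://"):
        if database_url.startswith(prefix):
            return target + database_url[len(prefix):]
    raise ValueError("expected a URL starting with postgresql://")
```

Only the scheme changes; user, password, host, port, and database name are passed through as-is.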
+ ### Step 2: Create Hugging Face Space
+ 1. Go to [huggingface.co/spaces](https://huggingface.co/spaces)
+ 2. Click "Create new Space"
+ 3. Choose **Docker** as the SDK
+ 4. Set visibility to **Private** (recommended)
+
+ ### Step 3: Set Environment Variables
+ In your Space settings, add:
+
+ | Variable | Value |
+ |----------|-------|
+ | `DATABASE_URL` | `postgresql+asyncpg://user:pass@host:5432/dbname` |
+ | `MAX_CONCURRENT_BROWSERS` | `3` |
+ | `MIN_DELAY_SECONDS` | `3` |
+ | `MAX_DELAY_SECONDS` | `8` |
+
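How the application reads these variables is not part of this commit. The sketch below shows one way they might be loaded and validated with the standard library. `ScraperSettings`, `load_settings`, and `pick_delay` are hypothetical names, the defaults simply mirror the table values, and treating the two delay variables as the bounds of a random wait is an assumption.

```python
import os
import random
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ScraperSettings:
    database_url: str
    max_concurrent_browsers: int = 3
    min_delay_seconds: float = 3.0
    max_delay_seconds: float = 8.0


def load_settings(env: Mapping[str, str] = os.environ) -> ScraperSettings:
    """Build settings from environment variables, failing fast on bad values."""
    database_url = env.get("DATABASE_URL", "")
    if not database_url:
        raise ValueError("DATABASE_URL must be set")
    settings = ScraperSettings(
        database_url=database_url,
        max_concurrent_browsers=int(env.get("MAX_CONCURRENT_BROWSERS", "3")),
        min_delay_seconds=float(env.get("MIN_DELAY_SECONDS", "3")),
        max_delay_seconds=float(env.get("MAX_DELAY_SECONDS", "8")),
    )
    if settings.max_concurrent_browsers < 1:
        raise ValueError("MAX_CONCURRENT_BROWSERS must be at least 1")
    if not 0 <= settings.min_delay_seconds <= settings.max_delay_seconds:
        raise ValueError("need 0 <= MIN_DELAY_SECONDS <= MAX_DELAY_SECONDS")
    return settings


def pick_delay(settings: ScraperSettings) -> float:
    """Assumed usage: wait a random time between the two configured bounds."""
    return random.uniform(settings.min_delay_seconds, settings.max_delay_seconds)
```

Validating at startup means a missing `DATABASE_URL` or a swapped delay range surfaces in the Space build logs rather than partway through a scrape.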
+ ### Step 4: Upload Files
+ Upload all project files to your Space repository.
+ The Space will automatically build and deploy.
+
+ ### Step 5: Use It!
+ 1. Open your Space URL
+ 2. Add novel URLs in the form
+ 3. Click "Start All"
+ 4. Watch the dashboard for progress
+ 5. Solve captchas when alerted
+ 6. Download completed novels as .txt
+
+ ## 📁 Project Structure
requirements.txt ADDED
@@ -0,0 +1,31 @@
+ # ============================================
+ # Novel Scraper - Exact Stable Versions
+ # Tested & Compatible with Python 3.10
+ # ============================================
+
+ # --- Web Framework ---
+ fastapi==0.110.0
+ uvicorn[standard]==0.27.1
+ python-multipart==0.0.9
+ jinja2==3.1.3
+
+ # --- Browser Automation ---
+ playwright==1.42.0
+
+ # --- Database (Async PostgreSQL) ---
+ SQLAlchemy[asyncio]==2.0.28
+ asyncpg==0.29.0
+ psycopg2-binary==2.9.9
+
+ # --- Environment & Config ---
+ pydantic==2.6.3
+ pydantic-settings==2.2.1
+ python-dotenv==1.0.1
+
+ # --- Utilities ---
+ aiofiles==23.2.1
+ httpx==0.27.0
+ Pillow==10.2.0
+
+ # --- For .epub generation (optional, .txt by default) ---
+ # ebooklib==0.18
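Because every dependency is pinned with `==`, a running environment can be compared against the file. The following is a rough sketch under stated assumptions: `parse_pins` and `find_mismatches` are hypothetical helpers, and the parser only understands the simple `name[extras]==version` lines used here, not the full requirements syntax.

```python
import re
from importlib import metadata
from typing import Dict, Optional, Tuple

# Matches "name==version" and "name[extras]==version"; nothing more elaborate.
_PIN = re.compile(r"^([A-Za-z0-9_.-]+)(?:\[[^\]]*\])?==(\S+)$")


def parse_pins(requirements_text: str) -> Dict[str, str]:
    """Map lower-cased package names to their pinned versions."""
    pins: Dict[str, str] = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        match = _PIN.match(line)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


def find_mismatches(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (pinned, installed)} for packages that differ or are missing."""
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, wanted in pins.items():
        try:
            installed: Optional[str] = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

Running `find_mismatches(parse_pins(open("requirements.txt").read()))` inside the built image should return an empty dict if the install step honoured the pins.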