Adeen committed
Commit ae14296 · 0 Parent(s)

Deploy: OCR support for PDF/Images/DOCX

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .dockerignore +10 -0
  2. .env +3 -0
  3. .gitignore +0 -0
  4. .lovable/plan.md +32 -0
  5. Dockerfile +23 -0
  6. README.md +14 -0
  7. bun.lock +0 -0
  8. components.json +20 -0
  9. dist/assets/KaTeX_AMS-Regular-BQhdFMY1.woff2 +0 -0
  10. dist/assets/KaTeX_AMS-Regular-DMm9YOAa.woff +0 -0
  11. dist/assets/KaTeX_AMS-Regular-DRggAlZN.ttf +0 -0
  12. dist/assets/KaTeX_Caligraphic-Bold-ATXxdsX0.ttf +0 -0
  13. dist/assets/KaTeX_Caligraphic-Bold-BEiXGLvX.woff +0 -0
  14. dist/assets/KaTeX_Caligraphic-Bold-Dq_IR9rO.woff2 +0 -0
  15. dist/assets/KaTeX_Caligraphic-Regular-CTRA-rTL.woff +0 -0
  16. dist/assets/KaTeX_Caligraphic-Regular-Di6jR-x-.woff2 +0 -0
  17. dist/assets/KaTeX_Caligraphic-Regular-wX97UBjC.ttf +0 -0
  18. dist/assets/KaTeX_Fraktur-Bold-BdnERNNW.ttf +0 -0
  19. dist/assets/KaTeX_Fraktur-Bold-BsDP51OF.woff +0 -0
  20. dist/assets/KaTeX_Fraktur-Bold-CL6g_b3V.woff2 +0 -0
  21. dist/assets/KaTeX_Fraktur-Regular-CB_wures.ttf +0 -0
  22. dist/assets/KaTeX_Fraktur-Regular-CTYiF6lA.woff2 +0 -0
  23. dist/assets/KaTeX_Fraktur-Regular-Dxdc4cR9.woff +0 -0
  24. dist/assets/KaTeX_Main-Bold-Cx986IdX.woff2 +0 -0
  25. dist/assets/KaTeX_Main-Bold-Jm3AIy58.woff +0 -0
  26. dist/assets/KaTeX_Main-Bold-waoOVXN0.ttf +0 -0
  27. dist/assets/KaTeX_Main-BoldItalic-DxDJ3AOS.woff2 +0 -0
  28. dist/assets/KaTeX_Main-BoldItalic-DzxPMmG6.ttf +0 -0
  29. dist/assets/KaTeX_Main-BoldItalic-SpSLRI95.woff +0 -0
  30. dist/assets/KaTeX_Main-Italic-3WenGoN9.ttf +0 -0
  31. dist/assets/KaTeX_Main-Italic-BMLOBm91.woff +0 -0
  32. dist/assets/KaTeX_Main-Italic-NWA7e6Wa.woff2 +0 -0
  33. dist/assets/KaTeX_Main-Regular-B22Nviop.woff2 +0 -0
  34. dist/assets/KaTeX_Main-Regular-Dr94JaBh.woff +0 -0
  35. dist/assets/KaTeX_Main-Regular-ypZvNtVU.ttf +0 -0
  36. dist/assets/KaTeX_Math-BoldItalic-B3XSjfu4.ttf +0 -0
  37. dist/assets/KaTeX_Math-BoldItalic-CZnvNsCZ.woff2 +0 -0
  38. dist/assets/KaTeX_Math-BoldItalic-iY-2wyZ7.woff +0 -0
  39. dist/assets/KaTeX_Math-Italic-DA0__PXp.woff +0 -0
  40. dist/assets/KaTeX_Math-Italic-flOr_0UB.ttf +0 -0
  41. dist/assets/KaTeX_Math-Italic-t53AETM-.woff2 +0 -0
  42. dist/assets/KaTeX_SansSerif-Bold-CFMepnvq.ttf +0 -0
  43. dist/assets/KaTeX_SansSerif-Bold-D1sUS0GD.woff2 +0 -0
  44. dist/assets/KaTeX_SansSerif-Bold-DbIhKOiC.woff +0 -0
  45. dist/assets/KaTeX_SansSerif-Italic-C3H0VqGB.woff2 +0 -0
  46. dist/assets/KaTeX_SansSerif-Italic-DN2j7dab.woff +0 -0
  47. dist/assets/KaTeX_SansSerif-Italic-YYjJ1zSn.ttf +0 -0
  48. dist/assets/KaTeX_SansSerif-Regular-BNo7hRIc.ttf +0 -0
  49. dist/assets/KaTeX_SansSerif-Regular-CS6fqUqJ.woff +0 -0
  50. dist/assets/KaTeX_SansSerif-Regular-DDBCnlJ7.woff2 +0 -0
.dockerignore ADDED
@@ -0,0 +1,10 @@
+ node_modules
+ dist
+ .git
+ .gitignore
+ .env
+ README.md
+ Dockerfile
+ .dockerignore
+ .lovable
+ supabase
.env ADDED
@@ -0,0 +1,3 @@
+ VITE_SUPABASE_PROJECT_ID="momxyrzmibxccqkmqbay"
+ VITE_SUPABASE_PUBLISHABLE_KEY="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJlZiI6Im1vbXh5cnptaWJ4Y2Nxa21xYmF5Iiwicm9sZSI6ImFub24iLCJpYXQiOjE3NzY2MTcwNjksImV4cCI6MjA5MjE5MzA2OX0.en2svoWyapwx4olnlOdlVGVy9_2zC4z_05hh-zL7elI"
+ VITE_SUPABASE_URL="https://momxyrzmibxccqkmqbay.supabase.co"
.gitignore ADDED
Binary file (333 Bytes).
.lovable/plan.md ADDED
@@ -0,0 +1,32 @@
+
+ # Revised Plan: Local Whisper Note
+
+ ## The constraint
+ You want `openai-whisper` (the local Python package) instead of the OpenAI Whisper API for audio/video transcription.
+
+ ## Why this can't run inside Lovable's stack
+ Lovable apps run as **React + Vite (browser)** + **Supabase Edge Functions (Deno runtime)**. There is no Python process and no GPU, and Edge Functions have a ~150s timeout and small memory limits, so local Whisper (a multi-GB PyTorch model, CPU/GPU heavy) cannot run there.
+
+ ## Two viable paths
+
+ **Option A: Self-hosted Whisper microservice (true local Whisper)**
+ - You run a tiny FastAPI service somewhere you control (your laptop, a VPS, RunPod, Modal, Fly.io GPU, etc.) that exposes `POST /transcribe`.
+ - That service installs the package (`pip install -U openai-whisper`), loads a model with `whisper.load_model(...)` (`"base"`, `"small"`, or `"medium"`), and returns `{ text }`.
+ - Our Supabase Edge Function `ingest` uploads the audio file to that endpoint via its public URL and a shared secret, then stores `raw_text`.
+ - You give us: `WHISPER_ENDPOINT_URL` + `WHISPER_AUTH_TOKEN` (stored as Lovable Cloud secrets).
+ - Pros: real local Whisper, zero per-minute cost. Cons: you maintain the box.
+
+ **Option B: Hosted Whisper-compatible API (no API-key vendor lock)**
+ - Use a provider that runs a Whisper model, or a comparable speech-to-text model, server-side (Replicate, Groq Whisper-large-v3, Deepgram Nova, Fal.ai). One key, no infra.
+ - Same code path as A, just a different URL.
+
+ ## Recommended
+ Go with **Option A**. I'll wire the Edge Function to call a configurable `WHISPER_ENDPOINT_URL`, and provide you a ready-to-run `whisper_server.py` (FastAPI + openai-whisper) plus a `requirements.txt` and run instructions, so you can run `pip install -U openai-whisper && python whisper_server.py` on any machine you control. If you'd rather start with a hosted provider for now, swap the URL later with no code change.
+
+ ## Plan changes vs. previous
+ - Remove "OpenAI Whisper API key" from required secrets.
+ - Add `WHISPER_ENDPOINT_URL` + `WHISPER_AUTH_TOKEN` secrets.
+ - Slice 5 (Audio sources) deliverables now include: the standalone `whisper_server.py` script + README, plus the Edge Function client that POSTs the audio bytes and handles long-running transcription via the existing `jobs` table + Realtime progress.
+ - YouTube path unchanged (still needs a transcript API since `yt-dlp` also can't run in Edge Functions). If you'd prefer, the same self-hosted box can run `yt-dlp` + Whisper and we route YouTube through it too; tell me and I'll fold that in.
+
+ Everything else in the previously approved plan (dark UI, Notes/Flashcards/Quiz/Podcast/Chat tabs, pgvector RAG, jobs + Realtime, hash de-dup, Lovable AI Gateway for Gemini) stays the same.
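The Edge Function client described in the plan (POST the audio bytes to the self-hosted service, authenticate with a shared secret, read back `{ text }`) could be sketched roughly as follows. This is a minimal sketch, not the project's actual code: the type names, the bearer-token header, the `application/octet-stream` content type, and treating `WHISPER_ENDPOINT_URL` as the service's base URL are all assumptions. Only `WHISPER_ENDPOINT_URL`, `WHISPER_AUTH_TOKEN`, `POST /transcribe`, and the `{ text }` response shape come from the plan.

```typescript
// Hypothetical types: the plan names only the two secrets, POST /transcribe,
// and a `{ text }` response.
interface WhisperConfig {
  endpointUrl: string; // value of WHISPER_ENDPOINT_URL (assumed to be a base URL)
  authToken: string;   // value of WHISPER_AUTH_TOKEN
}

interface TranscribeRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: Uint8Array;
}

// Minimal structural stand-ins so the sketch does not depend on DOM typings.
interface MinimalResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: Uint8Array },
) => Promise<MinimalResponse>;

// Build the request without sending it, so its shape can be checked in isolation.
function buildTranscribeRequest(config: WhisperConfig, audio: Uint8Array): TranscribeRequest {
  const base = config.endpointUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${base}/transcribe`,
    method: "POST",
    headers: {
      // Assumption: the shared secret travels as a bearer token.
      Authorization: `Bearer ${config.authToken}`,
      "Content-Type": "application/octet-stream",
    },
    body: audio,
  };
}

// Send the audio and return the transcript, failing loudly on unexpected replies.
async function transcribe(
  config: WhisperConfig,
  audio: Uint8Array,
  fetchImpl: FetchLike,
): Promise<string> {
  const req = buildTranscribeRequest(config, audio);
  const res = await fetchImpl(req.url, {
    method: req.method,
    headers: req.headers,
    body: req.body,
  });
  if (!res.ok) {
    throw new Error(`Whisper endpoint returned HTTP ${res.status}`);
  }
  const payload = await res.json();
  if (
    typeof payload !== "object" ||
    payload === null ||
    typeof (payload as { text?: unknown }).text !== "string"
  ) {
    throw new Error("Whisper endpoint response has no string `text` field");
  }
  return (payload as { text: string }).text;
}
```

Injecting the fetch function keeps the sketch testable without a network. Because Edge Functions have a ~150s timeout, a real client would likely record progress in the `jobs` table instead of awaiting a long transcription inline; that part is not shown here.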
Dockerfile ADDED
@@ -0,0 +1,23 @@
+ # Build stage
+ FROM node:20-alpine AS build
+ WORKDIR /app
+ COPY package*.json ./
+ RUN npm install
+ COPY . .
+ # These will be provided by Hugging Face Secrets during build
+ # Vite automatically picks up VITE_ prefixed variables from the environment
+ RUN npm run build
+
+ # Production stage
+ FROM nginx:stable-alpine
+ COPY --from=build /app/dist /usr/share/nginx/html
+ RUN echo 'server { \
+     listen 7860; \
+     location / { \
+         root /usr/share/nginx/html; \
+         index index.html; \
+         try_files $uri $uri/ /index.html; \
+     } \
+ }' > /etc/nginx/conf.d/default.conf
+ EXPOSE 7860
+ CMD ["nginx", "-g", "daemon off;"]
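The Dockerfile comments rely on Vite reading `VITE_`-prefixed variables from the build environment, and `.dockerignore` keeps `.env` out of the image, so the three Supabase values have to arrive through the build itself. Vite inlines these values at build time, which means a missing variable only surfaces in the browser. A small guard like the one below could make that failure explicit. It is a hypothetical helper, not part of this commit; only the three variable names are taken from the `.env` file. Whether the variables actually reach the build stage depends on how the Space passes them (for example as build arguments that the Dockerfile declares), which this diff does not show.

```typescript
// Hypothetical helper; only the three variable names come from the commit's .env file.
const REQUIRED_KEYS = [
  "VITE_SUPABASE_PROJECT_ID",
  "VITE_SUPABASE_PUBLISHABLE_KEY",
  "VITE_SUPABASE_URL",
] as const;

type RequiredKey = (typeof REQUIRED_KEYS)[number];
type SupabaseEnv = Record<RequiredKey, string>;

// Return the required values, or throw one error that lists every missing key.
function readSupabaseEnv(env: Record<string, string | undefined>): SupabaseEnv {
  const missing = REQUIRED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing build-time variables: ${missing.join(", ")}`);
  }
  const result = {} as SupabaseEnv;
  for (const key of REQUIRED_KEYS) {
    result[key] = env[key] as string;
  }
  return result;
}
```

In a Vite app the function would typically be called once at startup with `import.meta.env` as its argument, before the Supabase client is created.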
README.md ADDED
@@ -0,0 +1,14 @@
+ ---
+ title: SOURCE.IO
+ emoji: 🚀
+ colorFrom: indigo
+ colorTo: blue
+ sdk: docker
+ app_port: 7860
+ ---
+
+ # SOURCE.AI
+
+ SOURCE TO YOUR STUDIES
+
+ This project is a React/Vite application deployed on Hugging Face Spaces.
bun.lock ADDED
The diff for this file is too large to render.
components.json ADDED
@@ -0,0 +1,20 @@
+ {
+   "$schema": "https://ui.shadcn.com/schema.json",
+   "style": "default",
+   "rsc": false,
+   "tsx": true,
+   "tailwind": {
+     "config": "tailwind.config.ts",
+     "css": "src/index.css",
+     "baseColor": "slate",
+     "cssVariables": true,
+     "prefix": ""
+   },
+   "aliases": {
+     "components": "@/components",
+     "utils": "@/lib/utils",
+     "ui": "@/components/ui",
+     "lib": "@/lib",
+     "hooks": "@/hooks"
+   }
+ }
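The `aliases` block above maps logical names to `@/`-prefixed import paths. How `@` itself resolves is configured elsewhere (typically in the TypeScript and Vite configuration, which this view does not show). As a rough illustration only, assuming `@` points at `src`, the mapping could be expanded like this; `resolveAlias` is a hypothetical helper and not part of the commit.

```typescript
// The alias values are copied from components.json.
const aliases: Record<string, string> = {
  components: "@/components",
  utils: "@/lib/utils",
  ui: "@/components/ui",
  lib: "@/lib",
  hooks: "@/hooks",
};

// Assumption: "@" maps to the "src" directory; the commit does not show that mapping.
function resolveAlias(specifier: string, root: string = "src"): string {
  if (specifier === "@") {
    return root;
  }
  // Only "@/..." specifiers are rewritten; package imports pass through unchanged.
  return specifier.startsWith("@/") ? `${root}/${specifier.slice(2)}` : specifier;
}

// Expand every alias from components.json into a project-relative path.
const resolved: Record<string, string> = {};
for (const name of Object.keys(aliases)) {
  resolved[name] = resolveAlias(aliases[name] as string);
}
```

Under that assumption, the `ui` alias would point at `src/components/ui` and `utils` at `src/lib/utils`.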
dist/assets/KaTeX_AMS-Regular-BQhdFMY1.woff2 ADDED
Binary file (28.1 kB).
dist/assets/KaTeX_AMS-Regular-DMm9YOAa.woff ADDED
Binary file (33.5 kB).
dist/assets/KaTeX_AMS-Regular-DRggAlZN.ttf ADDED
Binary file (63.6 kB).
dist/assets/KaTeX_Caligraphic-Bold-ATXxdsX0.ttf ADDED
Binary file (12.4 kB).
dist/assets/KaTeX_Caligraphic-Bold-BEiXGLvX.woff ADDED
Binary file (7.72 kB).
dist/assets/KaTeX_Caligraphic-Bold-Dq_IR9rO.woff2 ADDED
Binary file (6.91 kB).
dist/assets/KaTeX_Caligraphic-Regular-CTRA-rTL.woff ADDED
Binary file (7.66 kB).
dist/assets/KaTeX_Caligraphic-Regular-Di6jR-x-.woff2 ADDED
Binary file (6.91 kB).
dist/assets/KaTeX_Caligraphic-Regular-wX97UBjC.ttf ADDED
Binary file (12.3 kB).
dist/assets/KaTeX_Fraktur-Bold-BdnERNNW.ttf ADDED
Binary file (19.6 kB).
dist/assets/KaTeX_Fraktur-Bold-BsDP51OF.woff ADDED
Binary file (13.3 kB).
dist/assets/KaTeX_Fraktur-Bold-CL6g_b3V.woff2 ADDED
Binary file (11.3 kB).
dist/assets/KaTeX_Fraktur-Regular-CB_wures.ttf ADDED
Binary file (19.6 kB).
dist/assets/KaTeX_Fraktur-Regular-CTYiF6lA.woff2 ADDED
Binary file (11.3 kB).
dist/assets/KaTeX_Fraktur-Regular-Dxdc4cR9.woff ADDED
Binary file (13.2 kB).
dist/assets/KaTeX_Main-Bold-Cx986IdX.woff2 ADDED
Binary file (25.3 kB).
dist/assets/KaTeX_Main-Bold-Jm3AIy58.woff ADDED
Binary file (29.9 kB).
dist/assets/KaTeX_Main-Bold-waoOVXN0.ttf ADDED
Binary file (51.3 kB).
dist/assets/KaTeX_Main-BoldItalic-DxDJ3AOS.woff2 ADDED
Binary file (16.8 kB).
dist/assets/KaTeX_Main-BoldItalic-DzxPMmG6.ttf ADDED
Binary file (33 kB).
dist/assets/KaTeX_Main-BoldItalic-SpSLRI95.woff ADDED
Binary file (19.4 kB).
dist/assets/KaTeX_Main-Italic-3WenGoN9.ttf ADDED
Binary file (33.6 kB).
dist/assets/KaTeX_Main-Italic-BMLOBm91.woff ADDED
Binary file (19.7 kB).
dist/assets/KaTeX_Main-Italic-NWA7e6Wa.woff2 ADDED
Binary file (17 kB).
dist/assets/KaTeX_Main-Regular-B22Nviop.woff2 ADDED
Binary file (26.3 kB).
dist/assets/KaTeX_Main-Regular-Dr94JaBh.woff ADDED
Binary file (30.8 kB).
dist/assets/KaTeX_Main-Regular-ypZvNtVU.ttf ADDED
Binary file (53.6 kB).
dist/assets/KaTeX_Math-BoldItalic-B3XSjfu4.ttf ADDED
Binary file (31.2 kB).
dist/assets/KaTeX_Math-BoldItalic-CZnvNsCZ.woff2 ADDED
Binary file (16.4 kB).
dist/assets/KaTeX_Math-BoldItalic-iY-2wyZ7.woff ADDED
Binary file (18.7 kB).
dist/assets/KaTeX_Math-Italic-DA0__PXp.woff ADDED
Binary file (18.7 kB).
dist/assets/KaTeX_Math-Italic-flOr_0UB.ttf ADDED
Binary file (31.3 kB).
dist/assets/KaTeX_Math-Italic-t53AETM-.woff2 ADDED
Binary file (16.4 kB).
dist/assets/KaTeX_SansSerif-Bold-CFMepnvq.ttf ADDED
Binary file (24.5 kB).
dist/assets/KaTeX_SansSerif-Bold-D1sUS0GD.woff2 ADDED
Binary file (12.2 kB).
dist/assets/KaTeX_SansSerif-Bold-DbIhKOiC.woff ADDED
Binary file (14.4 kB).
dist/assets/KaTeX_SansSerif-Italic-C3H0VqGB.woff2 ADDED
Binary file (12 kB).
dist/assets/KaTeX_SansSerif-Italic-DN2j7dab.woff ADDED
Binary file (14.1 kB).
dist/assets/KaTeX_SansSerif-Italic-YYjJ1zSn.ttf ADDED
Binary file (22.4 kB).
dist/assets/KaTeX_SansSerif-Regular-BNo7hRIc.ttf ADDED
Binary file (19.4 kB).
dist/assets/KaTeX_SansSerif-Regular-CS6fqUqJ.woff ADDED
Binary file (12.3 kB).
dist/assets/KaTeX_SansSerif-Regular-DDBCnlJ7.woff2 ADDED
Binary file (10.3 kB).