Thanmay Mohandas Das committed on
Commit c092a08 · unverified · 2 Parent(s): 8a7ed3369068b7

Merge pull request #2 from tAnboyy/feature/notebookcrud-supabase-frontend
.gitignore CHANGED
@@ -37,3 +37,6 @@ logs/
 # Gradio temp files
 gradio_cached_examples/
 flagged/
+
+# User data storage
+data/
HANDOFF_NOTEBOOKS.md ADDED
@@ -0,0 +1,192 @@
# NotebookLM Clone - Handoff Document

## Stack

- **Auth:** Hugging Face OAuth (`gr.LoginButton`, `user_id` = HF username)
- **Metadata:** Supabase (notebooks, messages, artifacts)
- **Files:** Supabase Storage bucket `notebooklm`
- **Vectors:** Supabase pgvector (chunks table)

## Setup

### 1. Supabase

- Run `db/schema.sql` in the SQL Editor
- Create a Storage bucket: **Storage** → **New bucket** → name it `notebooklm`, set public/private as needed
- Add RLS policies for the bucket if using private access

### 2. HF Space

- Add `hf_oauth: true` in README (already done)
- Add `SUPABASE_URL` and `SUPABASE_KEY` (service role) as Space secrets
- Optional: `SUPABASE_BUCKET` (default: `notebooklm`)

### 3. Local

- Set the `HF_TOKEN` env var or run `huggingface-cli login` (required for the OAuth mock)
- Create a `.env` with `SUPABASE_URL` and `SUPABASE_KEY`
- `pip install gradio[oauth]` (or `itsdangerous`) for the LoginButton
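For reference, a local `.env` might look like the following (placeholder values; `SUPABASE_BUCKET` is optional and defaults to `notebooklm`):

```
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your-service-role-key
SUPABASE_BUCKET=notebooklm
HF_TOKEN=hf_your_token
```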
## Storage (Supabase Storage)

```python
from backend.storage import get_sources_path, save_file, load_file

# Ingestion: save an uploaded PDF
prefix = get_sources_path(user_id, notebook_id)  # "user_id/notebook_id/sources"
path = f"{prefix}/document.pdf"
save_file(path, file_bytes)

# Load it back
data = load_file(path)
```

Paths: `{user_id}/{notebook_id}/{sources|embeddings|chats|artifacts}/(unknown)`

## Notebook API

- `create_notebook(user_id, name)`
- `list_notebooks(user_id)`
- `rename_notebook(user_id, notebook_id, new_name)`
- `delete_notebook(user_id, notebook_id)`

## Chat (Supabase messages table)

- `save_message(notebook_id, role, content)`
- `load_chat(notebook_id)`

## Embeddings (pgvector)

Table `chunks`: id, notebook_id, source_id, content, embedding vector(1536), metadata, created_at.

Ingestion team: embed chunks, insert them into `chunks`, and filter by `notebook_id` for retrieval.

---

## Handover: Ingestion & RAG Builders

### Where to Write Your Code

| Responsibility | File / Location | Purpose |
|----------------|-----------------|---------|
| **Ingestion** | `backend/ingestion_service.py` (create this) | Parse uploaded files, chunk text, compute embeddings, insert into `chunks` |
| **RAG** | `backend/rag_service.py` (create this) | Embed query → similarity search → build context → call LLM → return answer |
| **Storage** | `backend/storage.py` (existing) | Save/load files in Supabase Storage; do not modify |
| **Chat** | `backend/chat_service.py` (existing) | Save/load messages; RAG calls `save_message` and `load_chat` |
| **UI** | `app.py` | Add upload component + chat interface; wire to ingestion and RAG |

---

### Ingestion Builder

**Write your code in:** `backend/ingestion_service.py`

**Flow:**
1. Receive: `user_id`, `notebook_id`, uploaded file bytes, and filename.
2. Save the raw file via storage:
   ```python
   from backend.storage import get_sources_path, save_file
   prefix = get_sources_path(user_id, notebook_id)  # → "user_id/notebook_id/sources"
   path = f"{prefix}/(unknown)"
   save_file(path, file_bytes)
   ```
3. Parse the file (PDF, DOCX, TXT, etc.) and extract text.
4. Chunk the text (e.g., 512–1024 tokens with overlap).
5. Compute embeddings (e.g., OpenAI `text-embedding-3-small` → 1536 dims, or compatible).
6. Insert rows into `chunks`:
   ```python
   supabase.table("chunks").insert({
       "notebook_id": notebook_id,
       "source_id": path,  # or your source identifier
       "content": chunk_text,
       "embedding": embedding_list,  # list of 1536 floats
       "metadata": {"page": 1, "chunk_idx": 0}  # optional
   }).execute()
   ```
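The chunking step (4) is left open. A minimal sliding-window sketch, using characters as a stand-in for tokens (`chunk_text` here is an illustrative helper, not part of the existing backend):

```python
def chunk_text(text: str, chunk_size: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into overlapping windows (character-based stand-in for token chunking)."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        # Each window starts `chunk_size - overlap` after the previous one,
        # so consecutive chunks share `overlap` characters of context.
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Swap in a token-based splitter (e.g. via a tokenizer library) if you need the 512–1024 token sizing mentioned above.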
**Integrate in app:**
- Add `gr.File` or `gr.Upload` in `app.py` for the selected notebook.
- On upload, call `ingest_file(user_id, notebook_id, file_bytes, filename)` from your new service.

**Existing helpers:** `backend/storage` (`save_file`, `load_file`, `list_files`, `get_sources_path`).

---

### RAG Builder

**Write your code in:** `backend/rag_service.py`

**Flow:**
1. Receive: `notebook_id` and the user query.
2. Embed the query (same model/dims as ingestion, e.g. 1536).
3. Similarity search in `chunks`:
   ```python
   # Supabase pgvector example (cosine similarity)
   result = supabase.rpc(
       "match_chunks",
       {"query_embedding": embedding, "match_count": 5, "p_notebook_id": notebook_id}
   ).execute()
   ```
   - You must add a Supabase function `match_chunks` that filters by `notebook_id` and runs vector similarity (or use raw SQL).
   - Alternative: use `supabase.table("chunks").select("*").eq("notebook_id", notebook_id)` and do the similarity in Python (less efficient).
4. Build context from the top-k chunks.
5. Call an LLM (Hugging Face Inference API, OpenAI, etc.) with context + history.
6. Persist messages via `chat_service`:
   ```python
   from backend.chat_service import save_message, load_chat
   save_message(notebook_id, "user", query)
   save_message(notebook_id, "assistant", answer)
   ```
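If you take the Python-side alternative instead of `match_chunks`, steps 3–4 could be sketched as below (pure Python, over rows as returned by the select; note that depending on the client version, pgvector columns may come back JSON/string-encoded and need parsing into lists of floats first):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_context(rows: list[dict], query_embedding: list[float], k: int = 5) -> str:
    """Rank chunk rows by similarity to the query and join the top-k contents."""
    ranked = sorted(
        rows,
        key=lambda r: cosine_similarity(r["embedding"], query_embedding),
        reverse=True,
    )
    return "\n\n".join(r["content"] for r in ranked[:k])
```

This is fine for small notebooks; for anything larger, prefer the server-side `match_chunks` RPC so the database does the ranking.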
**Integrate in app:**
- Add a chat block in `app.py` (Chatbot component) tied to `selected_notebook_id`.
- On submit: call `rag_chat(notebook_id, query, chat_history)` → returns the assistant reply; update history using `load_chat(notebook_id)` or append locally.
**Existing helpers:** `backend/chat_service` (`save_message`, `load_chat`), `backend/db` (`supabase`).

---

### Schema Reference (for both)

```sql
-- chunks table (db/schema.sql)
chunks (
  id uuid,
  notebook_id uuid,
  source_id text,
  content text,
  embedding vector(1536),
  metadata jsonb,
  created_at timestamptz
)
```

**Required:** `embedding` must be 1536 dimensions (or update the schema if using a different model).

---

### Suggested RPC for RAG (optional)

Add this in the Supabase SQL Editor if you prefer server-side similarity:

```sql
create or replace function match_chunks(
  query_embedding vector(1536),
  match_count int,
  p_notebook_id uuid
)
returns table (id uuid, content text, metadata jsonb, similarity float)
language plpgsql as $$
begin
  return query
  select c.id, c.content, c.metadata,
         1 - (c.embedding <=> query_embedding) as similarity
  from chunks c
  where c.notebook_id = p_notebook_id
  order by c.embedding <=> query_embedding
  limit match_count;
end;
$$;
```

Ingestion writes to `chunks`; RAG reads via `match_chunks` or equivalent.
README.md CHANGED
@@ -8,6 +8,12 @@ sdk_version: "4.44.0"
 python_version: "3.10"
 app_file: app.py
 pinned: false
+
+hf_oauth: true
+hf_oauth_expiration_minutes: 480
+hf_oauth_scopes:
+  - email
+  - read-repos
 ---
 
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
app.py CHANGED
@@ -1,7 +1,232 @@
+from pathlib import Path
+
+from dotenv import load_dotenv
+
+# Load .env from project root (parent of NotebookLM-Clone) so HF_TOKEN etc. are available
+load_dotenv(Path(__file__).resolve().parent.parent / ".env")
+load_dotenv(Path(__file__).resolve().parent / ".env")
+
 import gradio as gr
 
-def greet(name):
-    return "Hello " + name + "!!. Testing Hugging Face deployment!"
+from backend.notebook_service import create_notebook, list_notebooks, rename_notebook, delete_notebook
+
+# Theme: adapts to light/dark mode
+theme = gr.themes.Soft(
+    primary_hue="blue",
+    secondary_hue="slate",
+    font=gr.themes.GoogleFont("Inter"),
+)
+
+CUSTOM_CSS = """
+.container { max-width: 720px; margin: 0 auto; padding: 0 24px; }
+.login-center { display: flex; flex-direction: column; align-items: center; justify-content: center; gap: 12px; padding: 24px 0; }
+.login-center .login-btn-wrap { display: flex; justify-content: center; width: 100%; }
+.login-center .login-btn-wrap button { display: inline-flex; align-items: center; gap: 8px; }
+.hero { font-size: 1.5rem; font-weight: 600; color: #1e293b; margin-bottom: 8px; }
+.sub { font-size: 0.875rem; color: #64748b; margin-bottom: 24px; }
+.nb-row { display: flex; align-items: center; gap: 12px; padding: 10px 0; border-bottom: 1px solid #e2e8f0; }
+.nb-row:last-child { border-bottom: none; }
+.gr-button { min-height: 36px !important; padding: 0 16px !important; font-weight: 500 !important; border-radius: 8px !important; }
+.gr-input { min-height: 40px !important; border-radius: 8px !important; }
+.status { font-size: 0.875rem; color: #64748b; margin-top: 16px; padding: 12px 16px; background: #f8fafc; border-radius: 8px; }
+@media (prefers-color-scheme: dark) {
+    .hero { color: #f1f5f9 !important; }
+    .sub { color: #94a3b8 !important; }
+    .nb-row { border-color: #334155 !important; }
+    .status { color: #94a3b8 !important; background: #1e293b !important; }
+}
+.dark .hero { color: #f1f5f9 !important; }
+.dark .sub { color: #94a3b8 !important; }
+.dark .nb-row { border-color: #334155 !important; }
+.dark .status { color: #94a3b8 !important; background: #1e293b !important; }
+"""
+
+MAX_NOTEBOOKS = 20
+
+
+def _user_id(profile: gr.OAuthProfile | None) -> str | None:
+    """Extract user_id from HF OAuth profile. None if not logged in."""
+    return profile.name if profile else None
+
+
+def _get_notebooks(user_id: str | None):
+    if not user_id:
+        return []
+    return list_notebooks(user_id)
+
+
+def _safe_create(new_name, state, selected_id, profile: gr.OAuthProfile | None):
+    """Create notebook with name from text box."""
+    try:
+        user_id = _user_id(profile)
+        if not user_id:
+            return gr.skip(), gr.skip(), gr.skip(), "Please sign in with Hugging Face", *([gr.skip()] * (MAX_NOTEBOOKS * 2))
+        name = (new_name or "").strip() or "Untitled Notebook"
+        nb = create_notebook(user_id, name)
+        if nb:
+            notebooks = _get_notebooks(user_id)
+            state = [(n["notebook_id"], n["name"]) for n in notebooks]
+            updates = _build_row_updates(notebooks)
+            new_selected = nb["notebook_id"]
+            status = f"Created: {nb['name']}"
+            return "", state, new_selected, status, *updates
+        return gr.skip(), gr.skip(), gr.skip(), "Failed to create", *([gr.skip()] * (MAX_NOTEBOOKS * 2))
+    except Exception as e:
+        return gr.skip(), gr.skip(), gr.skip(), f"Error: {e}", *([gr.skip()] * (MAX_NOTEBOOKS * 2))
+
+
+def _safe_rename(idx, new_name, state, selected_id, profile: gr.OAuthProfile | None):
+    """Rename notebook at index. Returns values for [nb_state, selected, status] + row outputs."""
+    try:
+        if idx is None or idx < 0 or idx >= len(state):
+            return gr.skip(), gr.skip(), gr.skip(), *([gr.skip()] * (MAX_NOTEBOOKS * 2))
+        nb_id, _ = state[idx]
+        name = (new_name or "").strip()
+        if not name:
+            return gr.skip(), gr.skip(), "Enter a name.", *([gr.skip()] * (MAX_NOTEBOOKS * 2))
+        user_id = _user_id(profile)
+        if not user_id:
+            return gr.skip(), gr.skip(), "Please sign in", *([gr.skip()] * (MAX_NOTEBOOKS * 2))
+        ok = rename_notebook(user_id, nb_id, name)
+        if ok:
+            notebooks = _get_notebooks(user_id)
+            state = [(n["notebook_id"], n["name"]) for n in notebooks]
+            updates = _build_row_updates(notebooks)
+            return state, selected_id, f"Renamed to: {name}", *updates
+        return gr.skip(), gr.skip(), "Failed to rename", *([gr.skip()] * (MAX_NOTEBOOKS * 2))
+    except Exception as e:
+        return gr.skip(), gr.skip(), f"Error: {e}", *([gr.skip()] * (MAX_NOTEBOOKS * 2))
+
+
+def _safe_delete(idx, state, selected_id, profile: gr.OAuthProfile | None):
+    """Delete notebook at index. Returns values for [nb_state, selected, status] + row outputs."""
+    try:
+        if idx is None or idx < 0 or idx >= len(state):
+            return gr.skip(), gr.skip(), gr.skip(), *([gr.skip()] * (MAX_NOTEBOOKS * 2))
+        nb_id, _ = state[idx]
+        user_id = _user_id(profile)
+        if not user_id:
+            return gr.skip(), gr.skip(), "Please sign in", *([gr.skip()] * (MAX_NOTEBOOKS * 2))
+        ok = delete_notebook(user_id, nb_id)
+        if ok:
+            notebooks = _get_notebooks(user_id)
+            state = [(n["notebook_id"], n["name"]) for n in notebooks]
+            updates = _build_row_updates(notebooks)
+            new_selected = notebooks[0]["notebook_id"] if notebooks else None
+            return state, new_selected, "Notebook deleted", *updates
+        return gr.skip(), gr.skip(), "Failed to delete", *([gr.skip()] * (MAX_NOTEBOOKS * 2))
+    except Exception as e:
+        return gr.skip(), gr.skip(), f"Error: {e}", *([gr.skip()] * (MAX_NOTEBOOKS * 2))
+
+
+def _select_notebook(idx, state):
+    """Set selected notebook when user interacts with a row."""
+    if idx is None or idx < 0 or idx >= len(state):
+        return gr.skip()
+    return state[idx][0]
+
+
+def _initial_load(profile: gr.OAuthProfile | None):
+    """Load notebooks on app load. Uses HF OAuth profile for user_id."""
+    user_id = _user_id(profile)
+    notebooks = _get_notebooks(user_id)
+    state = [(n["notebook_id"], n["name"]) for n in notebooks]
+    selected = notebooks[0]["notebook_id"] if notebooks else None
+    updates = _build_row_updates(notebooks)
+    status = f"Signed in as {user_id}" if user_id else "Sign in with Hugging Face to manage notebooks."
+    return state, selected, status, *updates
+
+
+def _build_row_updates(notebooks):
+    """Return gr.update values for each row: visibility, then text value."""
+    out = []
+    for i in range(MAX_NOTEBOOKS):
+        visible = i < len(notebooks)
+        name = notebooks[i]["name"] if visible else ""
+        out.append(gr.update(visible=visible))
+        out.append(gr.update(value=name, visible=visible))
+    return out
+
+
+with gr.Blocks(
+    title="NotebookLM Clone - Notebooks",
+    theme=theme,
+    css=CUSTOM_CSS,
+) as demo:
+    gr.HTML('<div class="container"><p class="hero">Notebook Manager</p><p class="sub">Create a notebook below, then manage it with Rename and Delete</p></div>')
+
+    with gr.Row(elem_classes=["login-center"]):
+        gr.Markdown("**Sign in with Hugging Face to access your notebooks**")
+        with gr.Row(elem_classes=["login-btn-wrap"]):
+            login_btn = gr.LoginButton(value="🤗 Login with Hugging Face", size="lg")
+
+    nb_state = gr.State([])
+    selected_notebook_id = gr.State(None)
+
+    # Create section: text box + Create button
+    with gr.Row():
+        create_txt = gr.Textbox(
+            label="Create notebook",
+            placeholder="Enter new notebook name",
+            value="",
+            scale=3,
+        )
+        create_btn = gr.Button("Create", variant="primary", scale=1)
+
+    gr.Markdown("---")
+    gr.Markdown("**Your notebooks** (selected notebook used for chat/ingestion)")
+
+    # Rows: each notebook has [name] [Rename] [Delete] [Select]
+    row_components = []
+    row_outputs = []
+    for i in range(MAX_NOTEBOOKS):
+        with gr.Row(visible=False) as row:
+            name_txt = gr.Textbox(
+                value="",
+                show_label=False,
+                scale=3,
+                min_width=200,
+            )
+            rename_btn = gr.Button("Rename", scale=1, min_width=80)
+            delete_btn = gr.Button("Delete", variant="stop", scale=1, min_width=80)
+            select_btn = gr.Button("Select", scale=1, min_width=70)
+        row_components.append({"row": row, "name": name_txt, "rename": rename_btn, "delete": delete_btn, "select": select_btn})
+        row_outputs.extend([row, name_txt])
+
+    status = gr.Markdown("Sign in with Hugging Face to manage notebooks.", elem_classes=["status"])
+
+    demo.load(_initial_load, inputs=None, outputs=[nb_state, selected_notebook_id, status] + row_outputs)
+
+    # Create button
+    create_btn.click(
+        _safe_create,
+        inputs=[create_txt, nb_state, selected_notebook_id],
+        outputs=[create_txt, nb_state, selected_notebook_id, status] + row_outputs,
+    )
+
+    # Per-row: Rename, Delete, Select (profile injected by Gradio for OAuth)
+    for i in range(MAX_NOTEBOOKS):
+        rename_btn = row_components[i]["rename"]
+        delete_btn = row_components[i]["delete"]
+        select_btn = row_components[i]["select"]
+        name_txt = row_components[i]["name"]
+
+        rename_btn.click(
+            _safe_rename,
+            inputs=[gr.State(i), name_txt, nb_state, selected_notebook_id],
+            outputs=[nb_state, selected_notebook_id, status] + row_outputs,
+        )
+        delete_btn.click(
+            _safe_delete,
+            inputs=[gr.State(i), nb_state, selected_notebook_id],
+            outputs=[nb_state, selected_notebook_id, status] + row_outputs,
+        )
+
+        def _on_select():
+            return "Selected notebook updated. Use this for chat/ingestion."
+
+        select_btn.click(
+            _select_notebook,
+            inputs=[gr.State(i), nb_state],
+            outputs=[selected_notebook_id],
+        ).then(_on_select, None, [status])
 
-demo = gr.Interface(fn=greet, inputs="text", outputs="text")
 demo.launch()
backend/__init__.py ADDED
@@ -0,0 +1 @@
"""Backend services: storage, notebooks, chat."""
backend/artifacts_service.py ADDED
@@ -0,0 +1,31 @@
"""Artifacts - store references to generated reports, quizzes, podcasts, etc."""

from backend.db import supabase


def create_artifact(notebook_id: str, type: str, storage_path: str) -> dict | None:
    """Create artifact record. Returns {id, notebook_id, type, storage_path, created_at} or None."""
    try:
        result = supabase.table("artifacts").insert({
            "notebook_id": notebook_id,
            "type": type,
            "storage_path": storage_path,
        }).execute()
        return result.data[0] if result.data else None
    except Exception:
        return None


def list_artifacts(notebook_id: str) -> list[dict]:
    """List artifacts for notebook. Returns [{id, type, storage_path, created_at}, ...]."""
    try:
        result = (
            supabase.table("artifacts")
            .select("id, type, storage_path, created_at")
            .eq("notebook_id", notebook_id)
            .order("created_at", desc=True)
            .execute()
        )
        return result.data or []
    except Exception:
        return []
backend/chat_service.py ADDED
@@ -0,0 +1,25 @@
"""Chat persistence - save/load messages via Supabase messages table."""

from backend.db import supabase


def save_message(notebook_id: str, role: str, content: str) -> None:
    """Append a message to the messages table."""
    supabase.table("messages").insert({
        "notebook_id": notebook_id,
        "role": role,
        "content": content,
    }).execute()


def load_chat(notebook_id: str) -> list[dict]:
    """Load chat history. Returns [{role, content, timestamp}, ...]."""
    result = (
        supabase.table("messages")
        .select("role, content, created_at")
        .eq("notebook_id", notebook_id)
        .order("created_at")
        .execute()
    )
    rows = result.data or []
    return [{"role": r["role"], "content": r["content"], "timestamp": r["created_at"]} for r in rows]
backend/db.py ADDED
@@ -0,0 +1,17 @@
"""Shared Supabase client."""

import os
from pathlib import Path

from dotenv import load_dotenv
from supabase import create_client, Client

load_dotenv(Path(__file__).resolve().parent.parent.parent / ".env")
load_dotenv(Path(__file__).resolve().parent.parent / ".env")

url = os.getenv("SUPABASE_URL")
key = os.getenv("SUPABASE_KEY")
if not url or not key:
    raise ValueError("SUPABASE_URL and SUPABASE_KEY must be set")

supabase: Client = create_client(url, key)
backend/notebook_service.py ADDED
@@ -0,0 +1,75 @@
"""Notebook CRUD service - spec-aligned API, Supabase + storage."""

import logging
from datetime import datetime, timezone

from backend.db import supabase
from backend.storage import ensure_notebook_dirs

log = logging.getLogger(__name__)


def _to_spec(row: dict) -> dict:
    """Map DB row to spec format."""
    return {
        "notebook_id": str(row["id"]),
        "name": row["name"],
        "created_at": row.get("created_at"),
    }


def create_notebook(user_id: str, name: str = "Untitled Notebook") -> dict | None:
    """Create notebook. Returns {notebook_id, name, created_at} or None on error."""
    try:
        data = {"user_id": user_id, "name": name}
        result = supabase.table("notebooks").insert(data).execute()
        rows = result.data
        if not rows:
            return None
        row = rows[0]
        nb_id = str(row["id"])
        ensure_notebook_dirs(user_id, nb_id)
        return _to_spec(row)
    except Exception:
        log.exception("create_notebook failed")
        return None


def list_notebooks(user_id: str) -> list[dict]:
    """List notebooks for user. Returns [{notebook_id, name, created_at}, ...]."""
    try:
        result = (
            supabase.table("notebooks")
            .select("*")
            .eq("user_id", user_id)
            .order("updated_at", desc=True)
            .execute()
        )
        return [_to_spec(r) for r in (result.data or [])]
    except Exception:
        log.exception("list_notebooks failed")
        return []


def rename_notebook(user_id: str, notebook_id: str, new_name: str) -> bool:
    """Rename notebook. Returns success."""
    try:
        result = (
            supabase.table("notebooks")
            .update({"name": new_name, "updated_at": datetime.now(timezone.utc).isoformat()})
            .eq("id", notebook_id)
            .eq("user_id", user_id)
            .execute()
        )
        return len(result.data or []) > 0
    except Exception:
        return False


def delete_notebook(user_id: str, notebook_id: str) -> bool:
    """Delete notebook. Returns success."""
    try:
        supabase.table("notebooks").delete().eq("id", notebook_id).eq("user_id", user_id).execute()
        return True
    except Exception:
        return False
backend/storage.py ADDED
@@ -0,0 +1,88 @@
"""
Storage layer - Supabase Storage for files.
Path structure: {user_id}/{notebook_id}/sources/, embeddings/, chats/, artifacts/
"""

import os

from backend.db import supabase

BUCKET = os.getenv("SUPABASE_BUCKET", "notebooklm")


def _validate_segment(s: str) -> bool:
    """Reject path traversal and invalid chars."""
    if not s or ".." in s or "/" in s or "\\" in s:
        return False
    return True


def _base_path(user_id: str, notebook_id: str) -> str:
    """Return base path for notebook. Raises on invalid input."""
    if not _validate_segment(user_id) or not _validate_segment(notebook_id):
        raise ValueError("Invalid user_id or notebook_id (path safety)")
    return f"{user_id}/{notebook_id}"


def get_sources_path(user_id: str, notebook_id: str) -> str:
    """Path prefix for notebook sources. Ingestion saves uploads here."""
    return f"{_base_path(user_id, notebook_id)}/sources"


def get_embeddings_path(user_id: str, notebook_id: str) -> str:
    """Path prefix for embeddings."""
    return f"{_base_path(user_id, notebook_id)}/embeddings"


def get_chats_path(user_id: str, notebook_id: str) -> str:
    """Path prefix for chat files."""
    return f"{_base_path(user_id, notebook_id)}/chats"


def get_artifacts_path(user_id: str, notebook_id: str) -> str:
    """Path prefix for artifacts."""
    return f"{_base_path(user_id, notebook_id)}/artifacts"


def ensure_notebook_dirs(user_id: str, notebook_id: str) -> None:
    """No-op for Supabase Storage - paths are created on first upload."""
    _base_path(user_id, notebook_id)


def save_file(storage_path: str, content: bytes | str) -> None:
    """Save content to Supabase Storage. Path must be within bucket (no leading /)."""
    if ".." in storage_path or storage_path.startswith("/"):
        raise ValueError("Invalid storage path")
    data = content.encode("utf-8") if isinstance(content, str) else content
    supabase.storage.from_(BUCKET).upload(
        path=storage_path,
        file=data,
        file_options={"upsert": "true"},
    )


def load_file(storage_path: str) -> bytes:
    """Load file from Supabase Storage. Returns bytes."""
    if ".." in storage_path or storage_path.startswith("/"):
        raise ValueError("Invalid storage path")
    return supabase.storage.from_(BUCKET).download(storage_path)


def list_files(prefix: str) -> list[str]:
    """List file paths under prefix (recurses into folders)."""
    try:
        result = supabase.storage.from_(BUCKET).list(prefix.rstrip("/"))
        paths = []
        for item in result:
            name = item.get("name") if isinstance(item, dict) else getattr(item, "name", None)
            if not name or name == ".emptyFolderPlaceholder":
                continue
            path = f"{prefix.rstrip('/')}/{name}"
            if isinstance(item, dict) and item.get("id") is None:  # folders have no id
                paths.extend(list_files(path + "/"))
            else:
                paths.append(path)
        return paths
    except Exception:
        return []
db/schema.sql ADDED
@@ -0,0 +1,48 @@
-- Run this in Supabase SQL Editor

-- notebooks (existing)
create table if not exists notebooks (
  id uuid primary key default gen_random_uuid(),
  user_id text not null,
  name varchar(255) not null default 'Untitled Notebook',
  created_at timestamptz default now(),
  updated_at timestamptz default now()
);
create index if not exists idx_notebooks_user_id on notebooks(user_id);

-- messages
create table if not exists messages (
  id uuid primary key default gen_random_uuid(),
  notebook_id uuid not null references notebooks(id) on delete cascade,
  role text not null,
  content text not null,
  created_at timestamptz default now()
);
create index if not exists idx_messages_notebook_id on messages(notebook_id);

-- artifacts
create table if not exists artifacts (
  id uuid primary key default gen_random_uuid(),
  notebook_id uuid not null references notebooks(id) on delete cascade,
  type text not null,
  storage_path text not null,
  created_at timestamptz default now()
);
create index if not exists idx_artifacts_notebook_id on artifacts(notebook_id);

-- pgvector extension for embeddings
create extension if not exists vector;

-- chunks with embeddings (for RAG)
create table if not exists chunks (
  id uuid primary key default gen_random_uuid(),
  notebook_id uuid not null references notebooks(id) on delete cascade,
  source_id text,
  content text not null,
  embedding vector(1536),
  metadata jsonb,
  created_at timestamptz default now()
);
create index if not exists idx_chunks_notebook_id on chunks(notebook_id);
-- Vector index (run after you have data; ivfflat requires rows):
-- create index idx_chunks_embedding on chunks using ivfflat (embedding vector_cosine_ops) with (lists = 100);
requirements.txt CHANGED
@@ -1,2 +1,5 @@
-gradio==4.44.0
+gradio[oauth]==4.44.0
 huggingface_hub==0.24.7
+supabase>=2.0.0
+python-dotenv>=1.0.0
+realtime==2.3.0