# Data Collection Guide
Everything you need to collect your saved content from each source before running the ingest pipeline.
## 1. Raindrop.io
OpenMark pulls all your Raindrop collections automatically via the official REST API. You just need a token.
Steps:
- Go to app.raindrop.io/settings/integrations
- Under "For Developers" → click Create new app
- Copy the Test token (permanent, no expiry)
- Add to `.env`: `RAINDROP_TOKEN=your-token-here`
The pipeline fetches every collection, every sub-collection, and every unsorted raindrop automatically. No manual export needed.
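Under the hood, the pull amounts to paging through the REST API with the token as a Bearer credential. A minimal sketch, assuming Raindrop's standard `raindrops/0` endpoint (collection 0 spans all raindrops); `fetch_page` and `extract_items` are illustrative names, not the pipeline's actual functions:

```python
import json
import urllib.request

# Collection 0 is Raindrop's pseudo-collection covering every raindrop.
RAINDROP_API = "https://api.raindrop.io/rest/v1/raindrops/0"

def fetch_page(token: str, page: int = 0, perpage: int = 50) -> dict:
    """Fetch one page of raindrops using the Test token as a Bearer token."""
    req = urllib.request.Request(
        f"{RAINDROP_API}?page={page}&perpage={perpage}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def extract_items(payload: dict) -> list:
    """Reduce an API payload to the title/URL pairs the pipeline needs."""
    return [
        {"title": item.get("title", ""), "url": item.get("link", "")}
        for item in payload.get("items", [])
    ]
```

Looping `page` until `extract_items` returns an empty list walks the whole account.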
## 2. Browser Bookmarks (Edge / Chrome / Firefox)
Export your bookmarks as an HTML file in the Netscape bookmark format (all browsers support this).
Edge:
Settings → Favourites → ··· (three dots) → Export favourites → save as `favorites.html`
Chrome:
Bookmarks Manager (`Ctrl+Shift+O`) → ··· → Export bookmarks → save as `bookmarks.html`
Firefox:
Bookmarks → Manage Bookmarks → Import and Backup → Export Bookmarks to HTML
After exporting:
- Place the HTML file(s) in your `raindrop-mission` folder (or wherever `RAINDROP_MISSION_DIR` points)
- The pipeline (`merge.py`) looks for `favorites_*.html` and `bookmarks_*.html` patterns
- It parses the Netscape format and extracts URLs + titles + folder structure
Tip: Export fresh before every ingest to capture new bookmarks.
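The Netscape format is ordinary HTML with `<DT><A HREF=…>` entries, so Python's stdlib parser is enough to recover URLs and titles. A rough sketch of the kind of parsing `merge.py` performs (illustrative, not its actual code):

```python
from html.parser import HTMLParser

class NetscapeBookmarkParser(HTMLParser):
    """Collect (title, url) pairs from a Netscape-format bookmark export."""
    def __init__(self):
        super().__init__()
        self.bookmarks = []
        self._href = None   # set while we are inside an <A> tag
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href:
            self.bookmarks.append(("".join(self._text).strip(), self._href))
            self._href = None

def parse_bookmarks(html: str) -> list:
    parser = NetscapeBookmarkParser()
    parser.feed(html)
    return parser.bookmarks
```

Folder structure comes from the surrounding `<H3>`/`<DL>` nesting and can be tracked the same way with a stack.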
## 3. LinkedIn Saved Posts
LinkedIn has no public API for saved posts. OpenMark uses LinkedIn's internal Voyager GraphQL API, the same API the LinkedIn web app uses internally.
This is the exact endpoint used:
```
https://www.linkedin.com/voyager/api/graphql
  ?variables=(start:0,count:10,paginationToken:null,
    query:(flagshipSearchIntent:SEARCH_MY_ITEMS_SAVED_POSTS))
  &queryId=voyagerSearchDashClusters.05111e1b90ee7fea15bebe9f9410ced9
```
How to get your session cookie:
- Log into LinkedIn in your browser
- Open DevTools (`F12`) → Application tab → Cookies → `https://www.linkedin.com`
- Find the cookie named `li_at` → copy its value
- Also find `JSESSIONID` → copy its value (used as the CSRF token, format: `ajax:XXXXXXXXXXXXXXXXXX`)
Run the fetch script:
```
python raindrop-mission/linkedin_fetch.py
```
Paste your `li_at` value when prompted.
Output: `raindrop-mission/linkedin_saved.json`, containing 1,260 saved posts with author, content, and URL.
Pagination: LinkedIn returns 10 posts per page. The script detects the end of results when no `nextPageToken` is returned. With 1,260 posts that's ~126 pages.
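Each page request is the endpoint above rebuilt with a fresh `start`/`paginationToken`, plus the two cookies. A sketch of how one such request can be assembled, assuming the usual Voyager convention that `JSESSIONID` doubles as the `csrf-token` header (`build_request` is an illustrative helper, not the script's actual function):

```python
import urllib.request

GRAPHQL = "https://www.linkedin.com/voyager/api/graphql"
QUERY_ID = "voyagerSearchDashClusters.05111e1b90ee7fea15bebe9f9410ced9"

def build_request(li_at: str, jsessionid: str,
                  start: int = 0, page_token: str = "null"):
    """Build one Voyager saved-posts page request (10 results per page)."""
    variables = (
        f"(start:{start},count:10,paginationToken:{page_token},"
        "query:(flagshipSearchIntent:SEARCH_MY_ITEMS_SAVED_POSTS))"
    )
    url = f"{GRAPHQL}?variables={variables}&queryId={QUERY_ID}"
    headers = {
        "cookie": f'li_at={li_at}; JSESSIONID="{jsessionid}"',
        "csrf-token": jsessionid,  # assumption: JSESSIONID doubles as CSRF token
        "restli-protocol-version": "2.0.0",
    }
    return urllib.request.Request(url, headers=headers)
```

The paging loop then increments `start` by 10 and carries forward the token from each response until none is returned.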
Important: The `queryId` (`voyagerSearchDashClusters.05111e1b90ee7fea15bebe9f9410ced9`) is hardcoded in LinkedIn's JavaScript bundle and can change with LinkedIn deployments. If the script returns 0 results, intercept a fresh request from your browser's Network tab: filter for `voyagerSearchDashClusters` and copy the new `queryId`.
Personal use only. This method is not officially supported by LinkedIn. Do not use for scraping at scale.
## 4. YouTube
Uses the official YouTube Data API v3 via OAuth 2.0. Collects liked videos, watch later playlist, and any saved playlists.
One-time setup:
- Go to Google Cloud Console
- Create a new project (e.g. "OpenMark")
- Enable YouTube Data API v3 (APIs & Services → Enable APIs)
- Create credentials: OAuth 2.0 Client ID → Desktop App
- Download the JSON file → rename it to `client_secret.json` and place it in `raindrop-mission/`
- Go to OAuth consent screen → Test users → add your Google account email
Run the fetch script:
```
python raindrop-mission/youtube_fetch.py
```
A browser window opens for Google sign-in. After auth, a token is cached locally, so you won't need to authenticate again.
Output: `raindrop-mission/youtube_MASTER.json` with:
- `liked_videos` – videos you've liked (up to ~3,200 via API limit)
- `watch_later` – requires Google Takeout (see below)
- `playlists` – saved playlists
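Liked videos come back in pages from the API's `videos.list(part="snippet", myRating="like")` call; the only transformation needed is flattening those pages into title/URL records. A sketch of that pure step (the fetch itself requires `google-api-python-client` after OAuth, so it appears only as a comment):

```python
def collect_liked(pages) -> list:
    """Flatten paged YouTube videos.list(myRating="like") responses into
    the {title, url} records stored in youtube_MASTER.json."""
    records = []
    for page in pages:
        for item in page.get("items", []):
            records.append({
                "title": item["snippet"]["title"],
                "url": f"https://www.youtube.com/watch?v={item['id']}",
            })
    return records

# The real call, sketched (needs google-api-python-client + OAuth creds):
#   youtube = build("youtube", "v3", credentials=creds)
#   request = youtube.videos().list(part="snippet", myRating="like", maxResults=50)
```

Iterating until the response carries no `nextPageToken` yields every liked video the API will expose.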
Watch Later via Google Takeout: YouTube's API does not expose Watch Later directly. Export it via takeout.google.com:
- Select only YouTube → Playlists → Download
- Extract the CSV file named `Watch later-videos.csv`
- Place it in `raindrop-mission/`
- The `youtube_organize.py` script fetches video titles via API and includes them in `youtube_MASTER.json`
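The Takeout CSV carries little beyond video IDs, and its exact header can vary between export versions, so treat the `Video ID` column name below as an assumption. A sketch of the ID-extraction step that feeds the title lookup:

```python
import csv
import io

def watch_later_ids(csv_text: str) -> list:
    """Pull video IDs out of the Takeout playlist CSV.
    Assumes a 'Video ID' column, as in recent Takeout exports."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["Video ID"].strip()
        for row in reader
        if row.get("Video ID", "").strip()
    ]
```

Each ID can then be resolved to a title via the API's `videos.list` call and merged into `youtube_MASTER.json`.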
## 5. daily.dev Bookmarks
daily.dev does not provide a public API. Use the included browser console script to extract bookmarks directly from the page.
Steps:
- Go to app.daily.dev β Bookmarks
- Scroll all the way down to load all bookmarks
- Open DevTools β Console tab
- Paste and run `raindrop-mission/dailydev_console_script.js`
- The script copies a JSON array to your clipboard
- Paste into a file named `dailydev_bookmarks.json` in `raindrop-mission/`
The script filters for `/posts/` URLs only; it ignores profile links, squad links, and other noise.
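If you want to sanity-check the exported JSON on the Python side rather than trust the console script's filter alone, the same `/posts/` rule is a one-liner (illustrative helpers, not part of the pipeline):

```python
import json

def filter_posts(entries: list) -> list:
    """Keep only daily.dev post links; drop profile, squad, and other URLs."""
    return [e for e in entries if "/posts/" in e.get("url", "")]

def load_bookmarks(path: str) -> list:
    """Load dailydev_bookmarks.json and re-apply the post filter."""
    with open(path, encoding="utf-8") as f:
        return filter_posts(json.load(f))
```

Running `load_bookmarks("raindrop-mission/dailydev_bookmarks.json")` should return the same entries the console script produced.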
## Summary
| Source | Method | Output file |
|---|---|---|
| Raindrop | REST API (auto) | pulled live |
| Edge/Chrome bookmarks | HTML export | `favorites.html` / `bookmarks.html` |
| LinkedIn saved posts | Voyager GraphQL + session cookie | `linkedin_saved.json` |
| YouTube liked/playlists | YouTube Data API v3 + OAuth | `youtube_MASTER.json` |
| YouTube watch later | Google Takeout CSV | included in `youtube_MASTER.json` |
| daily.dev bookmarks | Browser console script | `dailydev_bookmarks.json` |
Once all files are in place, run:
```
python scripts/ingest.py
```