# HuggingFace Publishing Guide

OpenMark publishes two things on HuggingFace:

1. **Space** – the live Gradio demo at `OthmanAdi/OpenMark`
2. **Dataset** – the categorized bookmarks at `OthmanAdi/openmark-bookmarks`

---

## Prerequisites

You need a HuggingFace account and a **write-access token**:

1. Go to [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
2. Create a new token with **Write** access
3. Add it to your `.env`:

```
HF_TOKEN=hf_your_token_here
```
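The scripts below all read this token via `load_dotenv()` and `os.getenv('HF_TOKEN')`. As a quick sanity check before running them, you can verify the value looks like a HuggingFace user token (they conventionally start with `hf_`); the helper below is a local sketch, not part of the `huggingface_hub` API:

```python
def check_hf_token(token) -> bool:
    """HF user access tokens conventionally start with 'hf_'."""
    return bool(token) and token.startswith('hf_')

# In the scripts below the token comes from os.getenv('HF_TOKEN')
# after python-dotenv's load_dotenv(); here we only check the format.
print(check_hf_token('hf_example123'))  # → True
print(check_hf_token(None))             # → False (missing token)
```

Failing fast on a missing or malformed token saves a confusing 401 later in `create_repo`.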
---

## 1. HuggingFace Space (Gradio Demo)

The Space hosts the Gradio UI publicly (or privately until you're ready).

**Create the Space:**

```bash
pip install huggingface_hub
python -c "
from huggingface_hub import HfApi
import os
from dotenv import load_dotenv
load_dotenv()
api = HfApi(token=os.getenv('HF_TOKEN'))
api.create_repo(
    repo_id='OthmanAdi/OpenMark',
    repo_type='space',
    space_sdk='gradio',
    private=True,
)
print('Space created: https://huggingface.co/spaces/OthmanAdi/OpenMark')
"
```
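Gradio Spaces read their runtime configuration from YAML front matter at the top of the repo's `README.md`. A minimal example is sketched below; the `title`, `emoji`, color, and `sdk_version` values are placeholders to adapt, and `app_file` must point at your actual Gradio entry point:

```yaml
---
title: OpenMark
emoji: 🔖
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---
```

Without this front matter the Space won't know which SDK to launch, so include a `README.md` like this in the folder you push in the next step.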
**Push the code to the Space:**

```bash
python -c "
from huggingface_hub import HfApi
import os
from dotenv import load_dotenv
load_dotenv()
api = HfApi(token=os.getenv('HF_TOKEN'))
api.upload_folder(
    folder_path='.',
    repo_id='OthmanAdi/OpenMark',
    repo_type='space',
    ignore_patterns=['.env', 'data/chroma_db/*', '__pycache__/*', '.git/*'],
)
"
```
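The `ignore_patterns` above are glob-style patterns. If you want to preview locally which files would be skipped before uploading, a rough approximation with the standard library's `fnmatch` looks like this (the exact matching semantics inside `upload_folder` may differ in edge cases, e.g. deeply nested paths):

```python
from fnmatch import fnmatch

# Patterns mirrored from the upload_folder call above.
IGNORE = ['.env', 'data/chroma_db/*', '__pycache__/*', '.git/*']

def is_ignored(path: str) -> bool:
    """Rough local approximation of glob-style ignore matching."""
    return any(fnmatch(path, pat) for pat in IGNORE)

for p in ['.env', 'data/chroma_db/index.bin', 'app.py',
          '__pycache__/app.cpython-311.pyc']:
    print(p, '->', 'ignored' if is_ignored(p) else 'uploaded')
```

This keeps secrets (`.env`) and the local vector store out of the public Space repo.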
> **Note:** The Space version requires your ChromaDB and Neo4j data to be pre-loaded. For a public demo, you would host a sample dataset; for private use, the full local setup is better.
---

## 2. HuggingFace Dataset

The dataset repo publishes your 8,000+ categorized bookmarks as a reusable dataset for RAG experiments.

**What's in the dataset:**

- URL, title, category (19 categories), tags, score (1-10), source
- Sources: Raindrop, Edge browser, LinkedIn, YouTube, daily.dev
- ~8,007 unique items after deduplication
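The per-record shape implied by the field list above can be sketched and validated as follows. The key names and example values here are assumptions for illustration; check `CATEGORIZED.json` for the exact schema:

```python
# Hypothetical record shape inferred from the field list above;
# the actual keys in CATEGORIZED.json may differ.
sample = {
    'url': 'https://example.com/article',
    'title': 'Example article',
    'category': 'AI/ML',           # one of the 19 categories
    'tags': ['rag', 'retrieval'],
    'score': 8,                    # 1-10 relevance score
    'source': 'raindrop',          # raindrop | edge | linkedin | youtube | daily.dev
}

REQUIRED = {'url', 'title', 'category', 'tags', 'score', 'source'}

def validate(record: dict) -> bool:
    """Check the required keys and the documented 1-10 score range."""
    return REQUIRED <= record.keys() and 1 <= record.get('score', 0) <= 10

print(validate(sample))  # → True
```

A quick validation pass like this before uploading catches malformed records that would otherwise surface as confusing errors for dataset consumers.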
**Create the dataset repo:**

```bash
python -c "
from huggingface_hub import HfApi
import os
from dotenv import load_dotenv
load_dotenv()
api = HfApi(token=os.getenv('HF_TOKEN'))
# Create private dataset repo
api.create_repo(
    repo_id='OthmanAdi/openmark-bookmarks',
    repo_type='dataset',
    private=True,
)
# Upload dataset card
api.upload_file(
    path_or_fileobj='docs/dataset_card.md',
    path_in_repo='README.md',
    repo_id='OthmanAdi/openmark-bookmarks',
    repo_type='dataset',
)
# Upload the data (RAINDROP_MISSION_DIR/CATEGORIZED.json)
api.upload_file(
    path_or_fileobj=os.path.join(os.getenv('RAINDROP_MISSION_DIR'), 'CATEGORIZED.json'),
    path_in_repo='data/bookmarks.json',
    repo_id='OthmanAdi/openmark-bookmarks',
    repo_type='dataset',
)
print('Dataset created: https://huggingface.co/datasets/OthmanAdi/openmark-bookmarks')
"
```
---

## Making Public

When you're ready to go public, flip visibility:

```bash
python -c "
from huggingface_hub import HfApi
import os
from dotenv import load_dotenv
load_dotenv()
api = HfApi(token=os.getenv('HF_TOKEN'))
# Make the Space public
api.update_repo_visibility('OthmanAdi/OpenMark', private=False, repo_type='space')
# Make the dataset public
api.update_repo_visibility('OthmanAdi/openmark-bookmarks', private=False, repo_type='dataset')
print('Both are now public.')
"
```