# HuggingFace Publishing Guide
OpenMark publishes two things on HuggingFace:
1. **Space** — live Gradio demo at `OthmanAdi/OpenMark`
2. **Dataset** — the categorized bookmarks at `OthmanAdi/openmark-bookmarks`
---
## Prerequisites
You need a HuggingFace account and a **write-access token**:
1. Go to [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
2. Create a new token → **Write** access
3. Add to your `.env`:
```
HF_TOKEN=hf_your_token_here
```
---
## 1. HuggingFace Space (Gradio Demo)
The Space hosts the Gradio UI publicly (or privately until you're ready).
**Create the Space:**
```bash
pip install huggingface_hub python-dotenv
python -c "
from huggingface_hub import HfApi
import os
from dotenv import load_dotenv
load_dotenv()
api = HfApi(token=os.getenv('HF_TOKEN'))
api.create_repo(
    repo_id='OthmanAdi/OpenMark',
    repo_type='space',
    space_sdk='gradio',
    private=True,
)
print('Space created: https://huggingface.co/spaces/OthmanAdi/OpenMark')
"
```
**Push the code to the Space:**
```bash
python -c "
from huggingface_hub import HfApi
import os
from dotenv import load_dotenv
load_dotenv()
api = HfApi(token=os.getenv('HF_TOKEN'))
api.upload_folder(
    folder_path='.',
    repo_id='OthmanAdi/OpenMark',
    repo_type='space',
    ignore_patterns=['.env', 'data/chroma_db/*', '__pycache__/*', '.git/*'],
)
"
```
> **Note:** The Space version requires your ChromaDB and Neo4j data to be pre-loaded. For a public demo, host a small sample dataset instead of the full corpus; for private use, the full local setup is the better fit.
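One way to produce such a sample is to cut a slice of the categorized export before uploading. A sketch, assuming `CATEGORIZED.json` is a flat JSON list of bookmark records (the function name and file paths are illustrative):

```python
import json
import random


def sample_bookmarks(src_path: str, dst_path: str, n: int = 100, seed: int = 0) -> int:
    """Write a random sample of up to n bookmarks to dst_path; return the count."""
    with open(src_path, encoding="utf-8") as f:
        items = json.load(f)  # assumed layout: a JSON list of record dicts
    random.Random(seed).shuffle(items)  # seeded shuffle for a reproducible sample
    sample = items[:n]
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(sample, f, ensure_ascii=False, indent=2)
    return len(sample)
```

Point the Space at the sampled file instead of the full export and the demo stays representative without publishing the whole collection.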
---
## 2. HuggingFace Dataset
The dataset repo publishes your 8,000+ categorized bookmarks as a reusable dataset for RAG experiments, with a dataset card as its README.
**What's in the dataset:**
- URL, title, category (one of 19), tags, score (1-10), source
- Sources: Raindrop, Edge browser, LinkedIn, YouTube, daily.dev
- ~8,007 unique items after deduplication
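The deduplication behind that count can be sketched as a first-wins pass keyed on a lightly normalized URL (a sketch only; the actual pipeline's normalization rules may differ):

```python
def dedupe_by_url(items: list[dict]) -> list[dict]:
    """Keep the first record seen for each URL, preserving input order."""
    seen: set[str] = set()
    unique: list[dict] = []
    for item in items:
        # Light normalization: lowercase and drop a trailing slash, so
        # "https://example.com/" and "https://example.com" collapse together.
        url = item.get("url", "").rstrip("/").lower()
        if url and url not in seen:
            seen.add(url)
            unique.append(item)
    return unique
```

Keeping the first occurrence means the highest-priority source (e.g. Raindrop, if it is merged in first) wins when the same link appears in several exports.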
**Create the dataset repo:**
```bash
python -c "
from huggingface_hub import HfApi
import os, json
from dotenv import load_dotenv
load_dotenv()
api = HfApi(token=os.getenv('HF_TOKEN'))
# Create private dataset repo
# Create private dataset repo
api.create_repo(
    repo_id='OthmanAdi/openmark-bookmarks',
    repo_type='dataset',
    private=True,
)
# Upload dataset card
api.upload_file(
    path_or_fileobj='docs/dataset_card.md',
    path_in_repo='README.md',
    repo_id='OthmanAdi/openmark-bookmarks',
    repo_type='dataset',
)
# Upload the data (RAINDROP_MISSION_DIR/CATEGORIZED.json)
api.upload_file(
    path_or_fileobj=os.path.join(os.getenv('RAINDROP_MISSION_DIR'), 'CATEGORIZED.json'),
    path_in_repo='data/bookmarks.json',
    repo_id='OthmanAdi/openmark-bookmarks',
    repo_type='dataset',
)
print('Dataset created: https://huggingface.co/datasets/OthmanAdi/openmark-bookmarks')
"
```
---
## Making Public
When you're ready to go public, flip visibility:
```bash
python -c "
from huggingface_hub import HfApi
import os
from dotenv import load_dotenv
load_dotenv()
api = HfApi(token=os.getenv('HF_TOKEN'))
# Make Space public
api.update_repo_visibility('OthmanAdi/OpenMark', private=False, repo_type='space')
# Make Dataset public
api.update_repo_visibility('OthmanAdi/openmark-bookmarks', private=False, repo_type='dataset')
print('Both are now public.')
"
```