# HuggingFace Publishing Guide
OpenMark publishes two things on HuggingFace:
1. **Space** — live Gradio demo at `OthmanAdi/OpenMark`
2. **Dataset** — the categorized bookmarks at `OthmanAdi/openmark-bookmarks`
---
## Prerequisites
You need a HuggingFace account and a **write-access token**:
1. Go to [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
2. Create a new token → **Write** access
3. Add to your `.env`:
```
HF_TOKEN=hf_your_token_here
```
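Before making any API calls, it can help to sanity-check that the token actually made it into your environment. A minimal sketch (the `looks_like_hf_token` helper and its regex are assumptions based on the `hf_...` prefix shown above, not an official validation routine):

```python
import os
import re

def looks_like_hf_token(token: str) -> bool:
    """Loose format check: HF user access tokens start with 'hf_'."""
    return bool(token) and bool(re.fullmatch(r"hf_[A-Za-z0-9]+", token))

token = os.getenv("HF_TOKEN", "")
print("token looks set" if looks_like_hf_token(token) else "check your HF_TOKEN / .env")
```

This only checks the shape of the string; the real test is whether the Hub accepts it on your first API call.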
---
## 1. HuggingFace Space (Gradio Demo)
The Space hosts the Gradio UI publicly (or privately until you're ready).
**Create the Space:**
```bash
pip install huggingface_hub
python -c "
from huggingface_hub import HfApi
import os
from dotenv import load_dotenv
load_dotenv()
api = HfApi(token=os.getenv('HF_TOKEN'))
api.create_repo(
    repo_id='OthmanAdi/OpenMark',
    repo_type='space',
    space_sdk='gradio',
    private=True,
)
print('Space created: https://huggingface.co/spaces/OthmanAdi/OpenMark')
"
```
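Spaces read their configuration from YAML front matter at the top of the Space's `README.md`. A minimal sketch of what that header could look like for this Gradio Space (the `title`, `emoji`, `sdk_version`, and `app_file` values are illustrative assumptions, not taken from the repo):

```yaml
---
title: OpenMark
emoji: 🔖
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---
```

If this header is missing or the `sdk` field is wrong, the Space will fail to build regardless of what you upload.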
**Push the code to the Space:**
```bash
python -c "
from huggingface_hub import HfApi
import os
from dotenv import load_dotenv
load_dotenv()
api = HfApi(token=os.getenv('HF_TOKEN'))
api.upload_folder(
    folder_path='.',
    repo_id='OthmanAdi/OpenMark',
    repo_type='space',
    ignore_patterns=['.env', 'data/chroma_db/*', '__pycache__/*', '.git/*'],
)
"
```
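The `ignore_patterns` above are glob-style patterns matched against paths relative to the folder being uploaded. A minimal sketch that approximates this filtering with the standard library's `fnmatch` (an approximation for illustration, not `huggingface_hub`'s internal matcher):

```python
from fnmatch import fnmatch

IGNORE_PATTERNS = ['.env', 'data/chroma_db/*', '__pycache__/*', '.git/*']

def is_ignored(path: str, patterns=IGNORE_PATTERNS) -> bool:
    # A path is skipped if any glob pattern matches it.
    return any(fnmatch(path, p) for p in patterns)

for p in ['.env', 'data/chroma_db/index.bin', 'app.py']:
    print(p, '->', 'skipped' if is_ignored(p) else 'uploaded')
```

Note that a pattern like `__pycache__/*` only matches from the repo root; nested cache directories would need a broader pattern such as `**/__pycache__/*`.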
> **Note:** The Space version requires your ChromaDB and Neo4j data to be pre-loaded. For a public demo, you would host a sample dataset. For private use, the full local setup is better.
---
## 2. HuggingFace Dataset
The dataset repo publishes your 8,000+ categorized bookmarks as a reusable dataset for RAG experiments; the dataset card (`README.md`) describes it on the Hub.
**What's in the dataset:**
- URL, title, category (19 categories), tags, score (1-10), source
- Sources: Raindrop, Edge browser, LinkedIn, YouTube, daily.dev
- ~8,007 unique items after deduplication
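The "~8,007 unique items after deduplication" figure implies a dedup pass across the five sources. A minimal sketch of URL-based deduplication (the `url`/`source` field names and the normalization rules are assumptions for illustration, not the pipeline's actual logic):

```python
def dedupe_by_url(bookmarks):
    """Keep the first occurrence of each URL; later duplicates are dropped."""
    seen = set()
    unique = []
    for b in bookmarks:
        # Light normalization so trivial variants collapse to one key.
        key = b["url"].strip().rstrip("/").lower()
        if key not in seen:
            seen.add(key)
            unique.append(b)
    return unique

sample = [
    {"url": "https://example.com/a", "source": "Raindrop"},
    {"url": "https://example.com/a/", "source": "daily.dev"},  # same page, trailing slash
    {"url": "https://example.com/b", "source": "YouTube"},
]
print(len(dedupe_by_url(sample)))  # → 2
```

Keeping the first occurrence means source priority is just insertion order; sort the list by preferred source first if that matters.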
**Create the dataset repo:**
```bash
python -c "
from huggingface_hub import HfApi
import os, json
from dotenv import load_dotenv
load_dotenv()
api = HfApi(token=os.getenv('HF_TOKEN'))
# Create private dataset repo
# Create private dataset repo
api.create_repo(
    repo_id='OthmanAdi/openmark-bookmarks',
    repo_type='dataset',
    private=True,
)
# Upload dataset card
api.upload_file(
    path_or_fileobj='docs/dataset_card.md',
    path_in_repo='README.md',
    repo_id='OthmanAdi/openmark-bookmarks',
    repo_type='dataset',
)
# Upload the data (RAINDROP_MISSION_DIR/CATEGORIZED.json)
api.upload_file(
    path_or_fileobj=os.path.join(os.getenv('RAINDROP_MISSION_DIR'), 'CATEGORIZED.json'),
    path_in_repo='data/bookmarks.json',
    repo_id='OthmanAdi/openmark-bookmarks',
    repo_type='dataset',
)
print('Dataset created: https://huggingface.co/datasets/OthmanAdi/openmark-bookmarks')
"
```
---
## Making Public
When you're ready to go public, flip visibility:
```bash
python -c "
from huggingface_hub import HfApi
import os
from dotenv import load_dotenv
load_dotenv()
api = HfApi(token=os.getenv('HF_TOKEN'))
# Make Space public
api.update_repo_visibility('OthmanAdi/OpenMark', private=False, repo_type='space')
# Make Dataset public
api.update_repo_visibility('OthmanAdi/openmark-bookmarks', private=False, repo_type='dataset')
print('Both are now public.')
"
```