Commit 35d80ab · verified · 1 parent: 32b2b2b
Committed by kszucs (HF Staff)

Sync catalog metadata

Files changed (3)
  1. Dockerfile +9 -0
  2. README.md +81 -6
  3. faceberg.yml +14 -0
Dockerfile ADDED
@@ -0,0 +1,9 @@
+ FROM python:3.13-slim
+
+ COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
+
+ RUN uv pip install --system --no-cache faceberg
+
+ EXPOSE 7860
+
+ CMD sh -c 'faceberg hf://spaces/${SPACE_ID} serve --host 0.0.0.0 --port 7860'
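Note on the `CMD` line: the image does not hard-code a Space name, so the catalog URI is built at container start from the `SPACE_ID` environment variable that Hugging Face Spaces provides at runtime. The sketch below only mimics that expansion in Python to show the argument list the server ends up with; `expand_cmd` and the Space id `your-username/your-space` are illustrative and not part of Faceberg.

```python
import shlex
from string import Template

# The command string from the Dockerfile's CMD, before shell expansion.
CMD_TEMPLATE = "faceberg hf://spaces/${SPACE_ID} serve --host 0.0.0.0 --port 7860"


def expand_cmd(space_id: str) -> list[str]:
    """Mimic the shell's ${SPACE_ID} expansion and word splitting (illustrative helper)."""
    expanded = Template(CMD_TEMPLATE).substitute(SPACE_ID=space_id)
    return shlex.split(expanded)


# Hypothetical Space id, matching the placeholder used in the README.
argv = expand_cmd("your-username/your-space")
print(argv)
```

Because the shell form of `CMD` already runs its command through `/bin/sh -c`, the explicit `sh -c '...'` wrapper is probably redundant here, though it should be harmless.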
README.md CHANGED
@@ -1,10 +1,85 @@
  ---
- title: Catalog
- emoji: 📚
- colorFrom: gray
- colorTo: blue
+ title: Faceberg REST Catalog
+ emoji: 🗃️
+ colorFrom: blue
+ colorTo: purple
  sdk: docker
- pinned: false
+ app_port: 7860
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Faceberg REST Catalog
+
+ An Apache Iceberg REST catalog with interactive web interface for browsing and querying Hugging Face datasets using [Faceberg](https://github.com/kszucs/faceberg).
+
+ ## Features
+
+ - **📚 Interactive Browser**: Explore namespaces, tables, and schemas through an intuitive web interface
+ - **🔍 SQL Query Interface**: Run queries directly in your browser using DuckDB-WASM with full Iceberg support
+ - **🌐 REST API**: Full Iceberg REST catalog specification at `/v1/*` endpoints
+ - **🚀 Zero Setup**: No installation required - just visit the Space URL
+
+ ## Usage
+
+ ### Web Interface
+
+ Visit the Space URL (e.g., `https://your-username-your-space.hf.space`) to:
+
+ 1. **Browse Catalog**: View all namespaces and tables with detailed metadata
+    - Expand namespaces to see tables
+    - View table schemas with column names, types, and constraints
+    - See row counts, file counts, and HuggingFace dataset links
+
+ 2. **Query with DuckDB**: Run interactive SQL queries in your browser
+    - Click "Query with DuckDB" tab
+    - Initialize DuckDB-WASM (loads ~10MB with Iceberg extension)
+    - Write SQL queries using `iceberg_scan('metadata_location')`
+    - View results in a formatted table
+
+ **Example Queries:**
+ ```sql
+ -- Scan full table (limited)
+ SELECT * FROM iceberg_scan('metadata_location') LIMIT 100;
+
+ -- Filter by partition
+ SELECT * FROM iceberg_scan('metadata_location')
+ WHERE split = 'train' LIMIT 10;
+
+ -- Aggregate statistics
+ SELECT split, COUNT(*) as count
+ FROM iceberg_scan('metadata_location')
+ GROUP BY split;
+ ```
+
+ ### REST API
+
+ Connect with any Iceberg client:
+
+ ```python
+ from pyiceberg.catalog import load_catalog
+
+ catalog = load_catalog(
+     "rest",
+     uri="https://your-username-your-space.hf.space",
+ )
+
+ # List namespaces
+ namespaces = catalog.list_namespaces()
+
+ # Load table
+ table = catalog.load_table("namespace.table_name")
+
+ # Query with DuckDB
+ import duckdb
+ duckdb.sql("SELECT * FROM iceberg_scan('table') LIMIT 10").show()
+ ```
+
+ ## About
+
+ Faceberg enables storing Apache Iceberg table metadata directly on Hugging Face Hub as datasets, making your data lake tables easily shareable and version-controlled.
+
+ **DuckDB-WASM Integration**: Powered by DuckDB-WASM with native Iceberg and httpfs extensions, enabling full metadata-aware querying directly in your browser without server load.
+
+ Learn more:
+ - [Faceberg on GitHub](https://github.com/kszucs/faceberg)
+ - [Apache Iceberg](https://iceberg.apache.org/)
+ - [DuckDB](https://duckdb.org/)
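Note on the README's last Python snippet: it passes the literal string `'table'` to `iceberg_scan`, which does not refer to the table object loaded just above it. One way to connect the two, sketched here under assumptions, is to pass the table's `metadata_location` (an attribute of a PyIceberg `Table`) into the query. The helper names `iceberg_scan_sql` and `preview_table` are hypothetical, and whether DuckDB can read the returned location depends on the storage scheme and on the `iceberg` and `httpfs` extensions being available.

```python
def iceberg_scan_sql(metadata_location: str, limit: int = 10) -> str:
    """Build a SELECT over iceberg_scan for a given metadata file location."""
    # Double any single quotes so the location stays a valid SQL string literal.
    escaped = metadata_location.replace("'", "''")
    return f"SELECT * FROM iceberg_scan('{escaped}') LIMIT {int(limit)}"


def preview_table(catalog_uri: str, identifier: str, limit: int = 10):
    """Load a table from the REST catalog and preview rows with DuckDB.

    Needs network access plus the pyiceberg and duckdb packages, so the
    imports are kept local and this function is not called in the sketch.
    """
    import duckdb
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog("rest", uri=catalog_uri)
    table = catalog.load_table(identifier)

    con = duckdb.connect()
    con.execute("INSTALL iceberg")
    con.execute("LOAD iceberg")
    return con.sql(iceberg_scan_sql(table.metadata_location, limit))


# Offline demonstration of the query builder only.
print(iceberg_scan_sql("metadata_location"))
```

A call such as `preview_table("https://your-username-your-space.hf.space", "namespace.table_name").show()` would then replace the final two lines of the README example, using the same placeholder URL and identifier.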
faceberg.yml ADDED
@@ -0,0 +1,14 @@
+ deepmind:
+   narrativeqa:
+     type: dataset
+     repo: deepmind/narrativeqa
+     config: default
+ nvidia:
+   personas_brazil:
+     type: dataset
+     repo: nvidia/Nemotron-Personas-Brazil
+     config: default
+   open_math_reasoning:
+     type: dataset
+     repo: nvidia/OpenMathReasoning
+     config: default
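Note on `faceberg.yml`: the config appears to nest table entries under namespace keys, which suggests identifiers of the form `namespace.table_name`, as used by `catalog.load_table` in the README. A minimal sketch, assuming that mapping holds: the dict mirrors the file by hand so no YAML parser is needed, and `table_identifiers` is a hypothetical helper rather than a Faceberg API.

```python
# Hand-written mirror of faceberg.yml: namespace -> table -> dataset settings.
CONFIG = {
    "deepmind": {
        "narrativeqa": {
            "type": "dataset",
            "repo": "deepmind/narrativeqa",
            "config": "default",
        },
    },
    "nvidia": {
        "personas_brazil": {
            "type": "dataset",
            "repo": "nvidia/Nemotron-Personas-Brazil",
            "config": "default",
        },
        "open_math_reasoning": {
            "type": "dataset",
            "repo": "nvidia/OpenMathReasoning",
            "config": "default",
        },
    },
}


def table_identifiers(config: dict) -> list[str]:
    """Return 'namespace.table' identifiers in file order (assumed naming scheme)."""
    return [
        f"{namespace}.{table}"
        for namespace, tables in config.items()
        for table in tables
    ]


for identifier in table_identifiers(CONFIG):
    namespace, table = identifier.split(".")
    print(identifier, "->", CONFIG[namespace][table]["repo"])
```

If the assumption is right, each printed identifier could be passed to `catalog.load_table` against the running Space.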