| title: WayyDB API | |
| emoji: ⚡ | |
| colorFrom: blue | |
| colorTo: purple | |
| sdk: docker | |
| app_port: 7860 | |
| <p align="center"> | |
| <h1 align="center">WayyDB</h1> | |
| <p align="center"> | |
| <strong>High-performance columnar time-series database for quantitative finance</strong> | |
| </p> | |
| <p align="center"> | |
| kdb+ functionality • Pythonic API • Zero-copy NumPy • SIMD-accelerated | |
| </p> | |
| </p> | |
| --- | |
| WayyDB is a C++ time-series database with Python bindings, designed for quantitative research and trading systems. It provides **kdb+-like temporal join operations** with a modern, accessible API—no q language required. | |
| ## Why WayyDB? | |
| | Challenge | WayyDB Solution | | |
| |-----------|-----------------| | |
| | kdb+ costs $100K+/year | Open source, free forever | | |
| | q language learning curve | Pythonic API you already know | | |
| | Pandas/Polars offer as-of joins (`merge_asof`, `join_asof`) but no dedicated window-join primitive | Native `aj()` and `wj()` primitives | | |
| | Memory copies kill performance | Zero-copy NumPy via mmap | | |
| | Slow aggregations | AVX2/AVX-512 SIMD acceleration | | |
| ## Features | |
| - **As-of Join (aj)** — For each trade, find the most recent quote. O(n log m) via binary search on sorted indices | |
| - **Window Join (wj)** — Get all quotes within a time window around each trade | |
| - **Zero-copy NumPy** — Columns are memory-mapped; `to_numpy()` returns views, not copies | |
| - **SIMD Aggregations** — Sum, avg, min, max accelerated with AVX2 intrinsics | |
| - **Window Functions** — Moving average, EMA, rolling std with O(n) complexity | |
| - **Persistent Storage** — Tables saved as memory-mapped files for instant loading | |
| ## Installation | |
| ```bash | |
| pip install wayy-db | |
| ``` | |
| Or build from source: | |
| ```bash | |
| git clone https://github.com/wayy-research/wayydb.git | |
| cd wayydb | |
| pip install -e . | |
| ``` | |
| ## Quick Start | |
| ### Create Tables from NumPy Arrays | |
| ```python | |
| import wayy_db as wdb | |
| import numpy as np | |
| # Create trades table | |
| trades = wdb.from_dict({ | |
|     "timestamp": np.array([1000, 2000, 3000, 4000, 5000], dtype=np.int64), | |
|     "symbol": np.array([0, 1, 0, 1, 0], dtype=np.uint32),  # AAPL=0, MSFT=1 | |
|     "price": np.array([150.25, 380.50, 151.00, 381.25, 152.00]), | |
|     "size": np.array([100, 200, 150, 250, 100], dtype=np.int64), | |
| }, name="trades", sorted_by="timestamp") | |
| # Create quotes table | |
| quotes = wdb.from_dict({ | |
|     "timestamp": np.array([500, 900, 1500, 2500, 3500], dtype=np.int64), | |
|     "symbol": np.array([0, 1, 0, 1, 0], dtype=np.uint32), | |
|     "bid": np.array([149.50, 379.50, 150.50, 380.50, 151.50]), | |
|     "ask": np.array([150.00, 380.00, 151.00, 381.00, 152.00]), | |
| }, name="quotes", sorted_by="timestamp") | |
| ``` | |
| ### As-of Join: Match Trades to Quotes | |
| ```python | |
| # For each trade, get the most recent quote for that symbol | |
| result = wdb.ops.aj(trades, quotes, on=["symbol"], as_of="timestamp") | |
| # Result contains trade columns + quote columns (bid, ask) | |
| print(result["bid"].to_numpy()) # [149.5, 379.5, 150.5, 380.5, 151.5] | |
| ``` | |
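To make the as-of semantics concrete, here is a minimal pure-Python sketch of the same lookup. It is not WayyDB's implementation: `asof_join_reference` is a hypothetical helper, and treating a quote with an equal timestamp as a match (`<=`) is an assumption. It groups the quotes by symbol and binary-searches each group, and on the sample data above it reproduces the `bid` column shown.

```python
from bisect import bisect_right
from collections import defaultdict


def asof_join_reference(left_ts, left_key, right_ts, right_key, right_val):
    """For each left row, return the latest right value with the same key
    and a timestamp <= the left timestamp (None when nothing matches)."""
    # Group right-side rows by key; right_ts is sorted, so each group stays sorted.
    ts_by_key = defaultdict(list)
    val_by_key = defaultdict(list)
    for ts, key, val in zip(right_ts, right_key, right_val):
        ts_by_key[key].append(ts)
        val_by_key[key].append(val)

    out = []
    for ts, key in zip(left_ts, left_key):
        # bisect_right counts the right rows of this key with timestamp <= ts.
        pos = bisect_right(ts_by_key[key], ts)
        out.append(val_by_key[key][pos - 1] if pos else None)
    return out


trade_ts = [1000, 2000, 3000, 4000, 5000]
trade_sym = [0, 1, 0, 1, 0]
quote_ts = [500, 900, 1500, 2500, 3500]
quote_sym = [0, 1, 0, 1, 0]
quote_bid = [149.50, 379.50, 150.50, 380.50, 151.50]

bids = asof_join_reference(trade_ts, trade_sym, quote_ts, quote_sym, quote_bid)
print(bids)  # [149.5, 379.5, 150.5, 380.5, 151.5]
```

A trade that precedes every quote for its symbol yields `None` in this sketch; how WayyDB represents such unmatched rows is not stated here.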
| ### Aggregations and Window Functions | |
| ```python | |
| # SIMD-accelerated aggregations | |
| total_volume = wdb.ops.sum(trades["size"]) | |
| avg_price = wdb.ops.avg(trades["price"]) | |
| price_std = wdb.ops.std(trades["price"]) | |
| # Window functions | |
| mavg_20 = wdb.ops.mavg(trades["price"], window=20) | |
| ema = wdb.ops.ema(trades["price"], alpha=0.1) | |
| rolling_std = wdb.ops.mstd(trades["price"], window=10) | |
| # Returns and changes | |
| returns = wdb.ops.pct_change(trades["price"]) | |
| price_diff = wdb.ops.diff(trades["price"]) | |
| ``` | |
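The window functions above can each be computed in a single pass with a constant-time update per row. The sketch below illustrates that idea in plain Python. `mavg_reference` and `ema_reference` are hypothetical names, and two details are assumptions rather than documented WayyDB behavior: incomplete leading windows produce `None`, and the EMA is seeded with the first value.

```python
def mavg_reference(values, window):
    """Trailing moving average using a running sum (one pass, O(1) per row)."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v                       # value entering the window
        if i >= window:
            running -= values[i - window]  # value leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out


def ema_reference(values, alpha):
    """Exponential moving average: ema[i] = alpha * x[i] + (1 - alpha) * ema[i-1]."""
    out = []
    prev = None
    for v in values:
        prev = v if prev is None else alpha * v + (1 - alpha) * prev
        out.append(prev)
    return out


prices = [150.25, 380.50, 151.00, 381.25, 152.00]
print(mavg_reference(prices, window=2))
print(ema_reference(prices, alpha=0.5))
```

With only five rows, a window larger than the table (such as `window=20`) never fills, so a small window is used here.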
| ### Persistent Database | |
| ```python | |
| # Create persistent database | |
| db = wdb.Database("/data/markets") | |
| # Add table (automatically saved) | |
| db.add_table(trades) | |
| # Later: reload with zero-copy mmap | |
| db2 = wdb.Database("/data/markets") | |
| trades = db2["trades"] # Instant load via memory mapping | |
| # Access data without copying | |
| prices = trades["price"].to_numpy() # Zero-copy view into mmap'd file | |
| ``` | |
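The zero-copy idea behind memory-mapped loading can be shown with the standard library alone. The sketch below is not WayyDB's file format. It assumes a hypothetical raw file `price.bin` holding native-endian float64 values, maps it, and reinterprets the mapped bytes as doubles without copying them.

```python
import mmap
import os
import tempfile
from array import array

# Write five float64 prices to a raw binary file (hypothetical column file).
prices = [150.25, 380.50, 151.00, 381.25, 152.00]
path = os.path.join(tempfile.mkdtemp(), "price.bin")
with open(path, "wb") as f:
    array("d", prices).tofile(f)

# Map the file read-only; no bytes are read until pages are touched.
with open(path, "rb") as f:
    mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

raw = memoryview(mapped)   # view over the mapped bytes
view = raw.cast("d")       # same memory, reinterpreted as float64

first_price = view[0]      # reads directly from the mapping
loaded = view.tolist()     # an explicit copy, made only for printing
print(first_price, loaded)

# Views must be released before the mapping can be closed.
view.release()
raw.release()
mapped.close()
```

Because the view borrows the mapping's memory, the mapping has to outlive every view onto it. The same lifetime rule would apply to any NumPy array backed by a mapped file.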
| ### Pandas/Polars Interop | |
| ```python | |
| import pandas as pd | |
| import polars as pl | |
| # From pandas | |
| df = pd.DataFrame({"timestamp": [...], "price": [...]}) | |
| table = wdb.from_pandas(df, name="from_pandas", sorted_by="timestamp") | |
| # From polars | |
| df = pl.DataFrame({"timestamp": [...], "price": [...]}) | |
| table = wdb.from_polars(df, name="from_polars", sorted_by="timestamp") | |
| # To dict (for conversion back) | |
| data = table.to_dict() # {"timestamp": np.array, "price": np.array, ...} | |
| ``` | |
| ## API Reference | |
| ### Core Classes | |
| | Class | Description | | |
| |-------|-------------| | |
| | `Database(path="")` | Container for tables. Empty path = in-memory | | |
| | `Table(name="")` | Columnar table with optional sorted index | | |
| | `Column` | Typed column with zero-copy NumPy access | | |
| ### Table Methods | |
| ```python | |
| table.num_rows # Number of rows | |
| table.num_columns # Number of columns | |
| table.column_names() # List of column names | |
| table.sorted_by # Column used for temporal ordering (or None) | |
| table["col"] # Get column by name | |
| table.to_dict() # Export as {name: np.array} dict | |
| table.save(path) # Save to directory | |
| Table.load(path) # Load from directory (copies data) | |
| Table.mmap(path) # Memory-map from directory (zero-copy) | |
| ``` | |
| ### Operations (wayy_db.ops) | |
| #### Aggregations | |
| | Function | Description | | |
| |----------|-------------| | |
| | `sum(col)` | Sum of values (SIMD) | | |
| | `avg(col)` | Mean of values | | |
| | `min(col)` | Minimum value | | |
| | `max(col)` | Maximum value | | |
| | `std(col)` | Standard deviation | | |
| #### Temporal Joins | |
| | Function | Description | | |
| |----------|-------------| | |
| | `aj(left, right, on, as_of)` | As-of join: most recent right row for each left row | | |
| | `wj(left, right, on, as_of, before, after)` | Window join: all right rows within time window | | |
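As an illustration of the window-join semantics, the following sketch collects, for each left timestamp `t`, every right row whose timestamp falls in `[t - before, t + after]`. It is a simplified reference, not the library's code: `window_join_reference` is a hypothetical helper, the `on` key is omitted for brevity, and inclusive bounds on both sides are an assumption. Two binary searches locate the window, so the cost per left row is logarithmic plus the number of matches.

```python
from bisect import bisect_left, bisect_right


def window_join_reference(left_ts, right_ts, right_val, before, after):
    """For each left timestamp t, collect right values with
    t - before <= timestamp <= t + after. right_ts must be sorted ascending."""
    out = []
    for t in left_ts:
        lo = bisect_left(right_ts, t - before)   # first right row inside the window
        hi = bisect_right(right_ts, t + after)   # one past the last row inside
        out.append(right_val[lo:hi])
    return out


quote_ts = [500, 900, 1500, 2500, 3500]
quote_bid = [149.50, 379.50, 150.50, 380.50, 151.50]

windows = window_join_reference([1000, 3000], quote_ts, quote_bid,
                                before=600, after=600)
print(windows)  # [[149.5, 379.5, 150.5], [380.5, 151.5]]
```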
| #### Window Functions | |
| | Function | Description | | |
| |----------|-------------| | |
| | `mavg(col, window)` | Moving average | | |
| | `msum(col, window)` | Moving sum | | |
| | `mstd(col, window)` | Moving standard deviation | | |
| | `mmin(col, window)` | Moving minimum (O(n) via monotonic deque) | | |
| | `mmax(col, window)` | Moving maximum (O(n) via monotonic deque) | | |
| | `ema(col, alpha)` | Exponential moving average | | |
| | `diff(col, periods=1)` | Difference from n periods ago | | |
| | `pct_change(col, periods=1)` | Percent change from n periods ago | | |
| | `shift(col, n)` | Shift values by n positions | | |
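The monotonic-deque technique mentioned for `mmin` and `mmax` can be sketched in a few lines of Python. `moving_max_reference` is a hypothetical name, and emitting a value for partial leading windows is an assumption about edge handling. Each index is pushed and popped at most once, which gives O(n) total work regardless of the window size.

```python
from collections import deque


def moving_max_reference(values, window):
    """Trailing-window maximum in O(n) total time."""
    out = []
    candidates = deque()  # indices whose values are strictly decreasing
    for i, v in enumerate(values):
        # Drop candidates that can never be the maximum again.
        while candidates and values[candidates[-1]] <= v:
            candidates.pop()
        candidates.append(i)
        # Drop the front index once it falls out of the window.
        if candidates[0] <= i - window:
            candidates.popleft()
        out.append(values[candidates[0]])
    return out


sizes = [100, 200, 150, 250, 100]
print(moving_max_reference(sizes, window=2))  # [100, 200, 200, 250, 250]
```

A moving minimum follows from the same structure by flipping the comparison in the `while` loop to `>=`.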
| ## Type System | |
| | Type | Python | C++ | Size | Use Case | | |
| |------|--------|-----|------|----------| | |
| | Int64 | `np.int64` | `int64_t` | 8B | Quantities, IDs | | |
| | Float64 | `np.float64` | `double` | 8B | Prices, returns | | |
| | Timestamp | `np.int64` | `int64_t` | 8B | Nanoseconds since epoch | | |
| | Symbol | `np.uint32` | `uint32_t` | 4B | Interned strings (tickers) | | |
| | Bool | `np.uint8` | `uint8_t` | 1B | Flags | | |
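The Symbol type stores tickers as integer codes rather than strings. A minimal sketch of such interning is shown below. `SymbolTable` is a hypothetical helper, not part of the WayyDB API. It assigns dense codes in first-seen order, which matches the `AAPL=0, MSFT=1` convention used in the Quick Start.

```python
class SymbolTable:
    """Maps ticker strings to dense integer codes and back."""

    def __init__(self):
        self._code_of = {}   # ticker -> code
        self._name_of = []   # code -> ticker

    def intern(self, name):
        """Return the existing code for name, or assign the next free one."""
        code = self._code_of.get(name)
        if code is None:
            code = len(self._name_of)
            self._code_of[name] = code
            self._name_of.append(name)
        return code

    def name(self, code):
        """Look up the ticker string for a code."""
        return self._name_of[code]


symbols = SymbolTable()
codes = [symbols.intern(t) for t in ["AAPL", "MSFT", "AAPL", "MSFT", "AAPL"]]
print(codes)            # [0, 1, 0, 1, 0]
print(symbols.name(1))  # MSFT
```

The resulting codes could then be passed as an `np.uint32` array for the `symbol` column.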
| ## Architecture | |
| ``` | |
| ┌─────────────────────────────────────────────────────────────┐ | |
| │                      Python Interface                       │ | |
| │           wayy_db.Database | Table | Column | ops           │ | |
| ├─────────────────────────────────────────────────────────────┤ | |
| │                      pybind11 Bindings                      │ | |
| │         Zero-copy NumPy arrays via buffer protocol         │ | |
| ├─────────────────────────────────────────────────────────────┤ | |
| │                       C++ Core Engine                       │ | |
| │  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │ | |
| │  │   Storage   │  │   Compute   │  │        Joins        │  │ | |
| │  │ • mmap I/O  │  │ • AVX2 agg  │  │ • O(n log m) aj     │  │ | |
| │  │ • columnar  │  │ • windows   │  │ • O(n) wj           │  │ | |
| │  └─────────────┘  └─────────────┘  └─────────────────────┘  │ | |
| ├─────────────────────────────────────────────────────────────┤ | |
| │                 Memory-Mapped File Storage                  │ | |
| │              Zero-copy | Lazy loading | Shared              │ | |
| └─────────────────────────────────────────────────────────────┘ | |
| ``` | |
| ## Performance | |
| ### Complexity | |
| | Operation | Complexity | Notes | | |
| |-----------|------------|-------| | |
| | As-of join | O(n log(m/k)) | n=left rows, m=right rows, k=unique keys | | |
| | Window join | O(n log m + matches) | Plus output size | | |
| | Aggregations | O(n) | SIMD 4x speedup for sum | | |
| | Window functions | O(n) | Single pass with O(1) update | | |
| | Point lookup | O(log n) | Binary search on sorted index | | |
| | Load from disk | O(1) | Memory mapping, no deserialization | | |
| ### Design Targets | |
| | Metric | Target | | |
| |--------|--------| | |
| | As-of join (1M x 1M rows) | < 150ms | | |
| | Simple aggregation (1B rows) | < 80ms | | |
| | Binary size | < 5 MB | | |
| | Memory overhead | < 1% beyond data | | |
| ## Building from Source | |
| ### Requirements | |
| - CMake >= 3.20 | |
| - C++20 compiler (GCC 11+, Clang 14+, MSVC 2022+) | |
| - Python >= 3.9 | |
| ### Build | |
| ```bash | |
| git clone https://github.com/wayy-research/wayydb.git | |
| cd wayydb | |
| # Option 1: pip install (recommended) | |
| pip install -e . | |
| # Option 2: CMake directly | |
| mkdir build && cd build | |
| cmake .. -DWAYY_BUILD_PYTHON=ON -DWAYY_BUILD_TESTS=ON | |
| make -j$(nproc) | |
| ``` | |
| ### Run Tests | |
| ```bash | |
| # C++ tests (31 tests) | |
| cd build && ctest --output-on-failure | |
| # Python tests (17 tests) | |
| PYTHONPATH=python pytest tests/python -v | |
| ``` | |
| ## Comparison with Alternatives | |
| | Feature | WayyDB | kdb+ | DuckDB | Polars | | |
| |---------|--------|------|--------|--------| | |
| | As-of join | Native | Native | Native (`ASOF JOIN`) | `join_asof` | | |
| | Window join | Native | Native | None | None | | |
| | Zero-copy Python | Yes | No | No | Limited | | |
| | Sorted index optimization | Yes | Yes | No | No | | |
| | License | MIT | Commercial | MIT | MIT | | |
| | Learning curve | Low | High (q) | Low | Low | | |
| | Persistence | mmap | Native | Native | None | | |
| ## Roadmap | |
| - [ ] String column type with dictionary encoding | |
| - [ ] LZ4 compression for columns | |
| - [ ] Parallel aggregations | |
| - [ ] More join types (inner, left, full) | |
| - [ ] Query optimizer | |
| - [ ] Streaming ingestion API | |
| ## License | |
| MIT License - see [LICENSE](LICENSE) for details. | |
| ## Contributing | |
| Contributions welcome! Please read our contributing guidelines and submit PRs to the `develop` branch. | |
| --- | |
| <p align="center"> | |
| Built with C++20 and Python by <a href="https://wayy.io">Wayy Research</a> | |
| </p> | |