metadata
title: WayyDB API
emoji: ⚡
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
WayyDB
High-performance columnar time-series database for quantitative finance
kdb+ functionality • Pythonic API • Zero-copy NumPy • SIMD-accelerated
WayyDB is a C++ time-series database with Python bindings, designed for quantitative research and trading systems. It provides kdb+-like temporal join operations with a modern, accessible API—no q language required.
Why WayyDB?
| Challenge | WayyDB Solution |
|---|---|
| kdb+ costs $100K+/year | Open source, free forever |
| q language learning curve | Pythonic API you already know |
| Pandas/Polars offer as-of joins (`merge_asof`, `join_asof`) but no window join primitive | Native aj() and wj() primitives |
| Memory copies kill performance | Zero-copy NumPy via mmap |
| Slow aggregations | AVX2/AVX-512 SIMD acceleration |
Features
- As-of Join (aj) — For each trade, find the most recent quote. O(n log m) via binary search on sorted indices
- Window Join (wj) — Get all quotes within a time window around each trade
- Zero-copy NumPy — Columns are memory-mapped; `to_numpy()` returns views, not copies
- SIMD Aggregations — Sum, avg, min, max accelerated with AVX2 intrinsics
- Window Functions — Moving average, EMA, rolling std with O(n) complexity
- Persistent Storage — Tables saved as memory-mapped files for instant loading
Installation
pip install wayy-db
Or build from source:
git clone https://github.com/wayy-research/wayydb.git
cd wayydb
pip install -e .
Quick Start
Create Tables from NumPy Arrays
import wayy_db as wdb
import numpy as np
# Create trades table
trades = wdb.from_dict({
"timestamp": np.array([1000, 2000, 3000, 4000, 5000], dtype=np.int64),
"symbol": np.array([0, 1, 0, 1, 0], dtype=np.uint32), # AAPL=0, MSFT=1
"price": np.array([150.25, 380.50, 151.00, 381.25, 152.00]),
"size": np.array([100, 200, 150, 250, 100], dtype=np.int64),
}, name="trades", sorted_by="timestamp")
# Create quotes table
quotes = wdb.from_dict({
"timestamp": np.array([500, 900, 1500, 2500, 3500], dtype=np.int64),
"symbol": np.array([0, 1, 0, 1, 0], dtype=np.uint32),
"bid": np.array([149.50, 379.50, 150.50, 380.50, 151.50]),
"ask": np.array([150.00, 380.00, 151.00, 381.00, 152.00]),
}, name="quotes", sorted_by="timestamp")
As-of Join: Match Trades to Quotes
# For each trade, get the most recent quote for that symbol
result = wdb.ops.aj(trades, quotes, on=["symbol"], as_of="timestamp")
# Result contains trade columns + quote columns (bid, ask)
print(result["bid"].to_numpy()) # [149.5, 379.5, 150.5, 380.5, 151.5]
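To make the join semantics concrete, here is a minimal pure-Python sketch of an as-of join using binary search over per-key sorted timestamps. It is not WayyDB's implementation: the helper name `asof_join_reference` is hypothetical, and it assumes that a match is the latest quote whose timestamp is less than or equal to the trade's timestamp.

```python
from bisect import bisect_right
from collections import defaultdict


def asof_join_reference(left_ts, left_keys, right_ts, right_keys, right_vals):
    """Illustrative as-of join (hypothetical helper, not part of wayy_db).

    For each left row, return the value of the latest right row with the
    same key and timestamp <= the left timestamp, or None if there is none.
    Assumes the right side is already sorted by timestamp.
    """
    # Group right-side rows by key; each group stays sorted by timestamp.
    ts_by_key = defaultdict(list)
    vals_by_key = defaultdict(list)
    for ts, key, val in zip(right_ts, right_keys, right_vals):
        ts_by_key[key].append(ts)
        vals_by_key[key].append(val)

    out = []
    for ts, key in zip(left_ts, left_keys):
        group_ts = ts_by_key.get(key, [])
        # bisect_right returns how many right timestamps are <= ts.
        pos = bisect_right(group_ts, ts)
        out.append(vals_by_key[key][pos - 1] if pos else None)
    return out


trade_ts = [1000, 2000, 3000, 4000, 5000]
trade_sym = [0, 1, 0, 1, 0]
quote_ts = [500, 900, 1500, 2500, 3500]
quote_sym = [0, 1, 0, 1, 0]
quote_bid = [149.50, 379.50, 150.50, 380.50, 151.50]

bids = asof_join_reference(trade_ts, trade_sym, quote_ts, quote_sym, quote_bid)
print(bids)  # [149.5, 379.5, 150.5, 380.5, 151.5]
```

Each lookup is one binary search over the quotes for a single symbol, which is where a bound of the form O(n log(m/k)) comes from.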
Aggregations and Window Functions
# SIMD-accelerated aggregations
total_volume = wdb.ops.sum(trades["size"])
avg_price = wdb.ops.avg(trades["price"])
price_std = wdb.ops.std(trades["price"])
# Window functions
mavg_20 = wdb.ops.mavg(trades["price"], window=20)
ema = wdb.ops.ema(trades["price"], alpha=0.1)
rolling_std = wdb.ops.mstd(trades["price"], window=10)
# Returns and changes
returns = wdb.ops.pct_change(trades["price"])
price_diff = wdb.ops.diff(trades["price"])
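The pure-Python sketch below shows what `ema`, `diff`, and `pct_change` are conventionally taken to compute. The `*_reference` functions are hypothetical helpers, not WayyDB code, and the edge-case handling is an assumption: the EMA is seeded with the first observation, and leading positions without enough history are NaN. WayyDB's actual behavior at the boundaries may differ.

```python
import math


def ema_reference(values, alpha):
    """EMA with y[0] = x[0] and y[i] = alpha*x[i] + (1-alpha)*y[i-1] (assumed seeding)."""
    out = []
    for x in values:
        if not out:
            out.append(float(x))  # seed with the first observation
        else:
            out.append(alpha * x + (1.0 - alpha) * out[-1])
    return out


def diff_reference(values, periods=1):
    """x[i] - x[i - periods]; the first `periods` entries are NaN (assumed)."""
    return [
        values[i] - values[i - periods] if i >= periods else math.nan
        for i in range(len(values))
    ]


def pct_change_reference(values, periods=1):
    """x[i] / x[i - periods] - 1; the first `periods` entries are NaN (assumed)."""
    return [
        values[i] / values[i - periods] - 1.0 if i >= periods else math.nan
        for i in range(len(values))
    ]


prices = [150.25, 380.50, 151.00, 381.25, 152.00]
print(ema_reference(prices, alpha=0.5))
print(diff_reference(prices))
print(pct_change_reference(prices))
```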
Persistent Database
# Create persistent database
db = wdb.Database("/data/markets")
# Add table (automatically saved)
db.add_table(trades)
# Later: reload with zero-copy mmap
db2 = wdb.Database("/data/markets")
trades = db2["trades"] # Instant load via memory mapping
# Access data without copying
prices = trades["price"].to_numpy() # Zero-copy view into mmap'd file
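The following standard-library sketch illustrates the general idea behind zero-copy access to a memory-mapped column. The on-disk layout used here (a bare array of float64 values with no header) and the file name `price.col` are assumptions for illustration only; WayyDB's real file format is not described in this README.

```python
import mmap
import os
import tempfile
from array import array

# Write five float64 prices to a raw binary file (hypothetical layout).
prices = array("d", [150.25, 380.50, 151.00, 381.25, 152.00])
path = os.path.join(tempfile.mkdtemp(), "price.col")
with open(path, "wb") as f:
    prices.tofile(f)

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)   # map the whole file
    base = memoryview(mm)           # byte view over the mapping, no copy
    view = base.cast("d")           # reinterpret the bytes as float64, no copy
    first = view[0]
    view[0] = 999.0                 # writes straight into the mapping
    changed = mm[:8] == array("d", [999.0]).tobytes()
    # Views must be released before the mapping can be closed.
    view.release()
    base.release()
    mm.close()

print(first, changed)  # 150.25 True
```

Because the view and the mapping share the same memory, the write through `view` is immediately visible in `mm`; no intermediate buffer is involved.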
Pandas/Polars Interop
import pandas as pd
import polars as pl
# From pandas
df = pd.DataFrame({"timestamp": [...], "price": [...]})
table = wdb.from_pandas(df, name="from_pandas", sorted_by="timestamp")
# From polars
df = pl.DataFrame({"timestamp": [...], "price": [...]})
table = wdb.from_polars(df, name="from_polars", sorted_by="timestamp")
# To dict (for conversion back)
data = table.to_dict() # {"timestamp": np.array, "price": np.array, ...}
API Reference
Core Classes
| Class | Description |
|---|---|
| `Database(path="")` | Container for tables. Empty path = in-memory |
| `Table(name="")` | Columnar table with optional sorted index |
| `Column` | Typed column with zero-copy NumPy access |
Table Methods
table.num_rows # Number of rows
table.num_columns # Number of columns
table.column_names() # List of column names
table.sorted_by # Column used for temporal ordering (or None)
table["col"] # Get column by name
table.to_dict() # Export as {name: np.array} dict
table.save(path) # Save to directory
Table.load(path) # Load from directory (copies data)
Table.mmap(path) # Memory-map from directory (zero-copy)
Operations (wayy_db.ops)
Aggregations
| Function | Description |
|---|---|
| `sum(col)` | Sum of values (SIMD) |
| `avg(col)` | Mean of values |
| `min(col)` | Minimum value |
| `max(col)` | Maximum value |
| `std(col)` | Standard deviation |
Temporal Joins
| Function | Description |
|---|---|
| `aj(left, right, on, as_of)` | As-of join: most recent right row for each left row |
| `wj(left, right, on, as_of, before, after)` | Window join: all right rows within time window |
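As a rough illustration of window-join semantics, the sketch below collects, for each left timestamp `t`, every right row with a timestamp in `[t - before, t + after]`. The function `window_join_reference` is a hypothetical helper, not the `wj` implementation; it assumes inclusive bounds and omits the `on` key matching for brevity.

```python
from bisect import bisect_left, bisect_right


def window_join_reference(left_ts, right_ts, right_vals, before, after):
    """For each left timestamp t, return the right values whose timestamp
    lies in [t - before, t + after]. right_ts must be sorted ascending."""
    out = []
    for t in left_ts:
        lo = bisect_left(right_ts, t - before)   # first index with ts >= t - before
        hi = bisect_right(right_ts, t + after)   # first index with ts > t + after
        out.append(right_vals[lo:hi])
    return out


quote_ts = [500, 900, 1500, 2500, 3500]
quote_bid = [149.50, 379.50, 150.50, 380.50, 151.50]

windows = window_join_reference(
    [1000, 3000], quote_ts, quote_bid, before=600, after=600
)
print(windows)  # [[149.5, 379.5, 150.5], [380.5, 151.5]]
```

Two binary searches per left row plus the cost of copying the matches gives a bound of the form O(n log m + matches).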
Window Functions
| Function | Description |
|---|---|
| `mavg(col, window)` | Moving average |
| `msum(col, window)` | Moving sum |
| `mstd(col, window)` | Moving standard deviation |
| `mmin(col, window)` | Moving minimum (O(n) via monotonic deque) |
| `mmax(col, window)` | Moving maximum (O(n) via monotonic deque) |
| `ema(col, alpha)` | Exponential moving average |
| `diff(col, periods=1)` | Difference from n periods ago |
| `pct_change(col, periods=1)` | Percent change from n periods ago |
| `shift(col, n)` | Shift values by n positions |
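The monotonic-deque technique mentioned for `mmin` and `mmax` can be sketched in a few lines of Python. This is an illustrative reference, not WayyDB's C++ code; `moving_min_reference` is a hypothetical name, and the choice to emit a shorter window at the start of the series is an assumption.

```python
from collections import deque


def moving_min_reference(values, window):
    """Minimum over the trailing `window` values (shorter window at the start)."""
    candidates = deque()  # indices; their values increase from front to back
    out = []
    for i, x in enumerate(values):
        # Older values that are >= x can never be the minimum again.
        while candidates and values[candidates[-1]] >= x:
            candidates.pop()
        candidates.append(i)
        # Drop the front index once it has slid out of the window.
        if candidates[0] <= i - window:
            candidates.popleft()
        out.append(values[candidates[0]])
    return out


print(moving_min_reference([5, 3, 4, 1, 6, 7, 2], window=3))
# [5, 3, 3, 1, 1, 1, 2]
```

Every index is pushed and popped at most once, so the whole pass is O(n) regardless of the window size. A moving maximum follows by flipping the comparison to `<=`.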
Type System
| Type | Python | C++ | Size | Use Case |
|---|---|---|---|---|
| Int64 | `np.int64` | `int64_t` | 8B | Quantities, IDs |
| Float64 | `np.float64` | `double` | 8B | Prices, returns |
| Timestamp | `np.int64` | `int64_t` | 8B | Nanoseconds since epoch |
| Symbol | `np.uint32` | `uint32_t` | 4B | Interned strings (tickers) |
| Bool | `np.uint8` | `uint8_t` | 1B | Flags |
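Since Symbol columns hold integer codes rather than strings, callers need some mapping from tickers to codes. The sketch below shows one simple way to intern strings; the `SymbolTable` class is hypothetical and is not part of the `wayy_db` API shown in this README.

```python
class SymbolTable:
    """Hypothetical helper: maps ticker strings to small integer codes."""

    def __init__(self):
        self._code_by_name = {}
        self._name_by_code = []

    def intern(self, name):
        """Return the code for `name`, assigning the next free code if new."""
        code = self._code_by_name.get(name)
        if code is None:
            code = len(self._name_by_code)
            self._code_by_name[name] = code
            self._name_by_code.append(name)
        return code

    def lookup(self, code):
        """Return the ticker string for a previously assigned code."""
        return self._name_by_code[code]


symbols = SymbolTable()
codes = [symbols.intern(t) for t in ["AAPL", "MSFT", "AAPL", "MSFT", "AAPL"]]
print(codes)              # [0, 1, 0, 1, 0]
print(symbols.lookup(1))  # MSFT
```

The resulting list can be passed to `np.array(codes, dtype=np.uint32)` to build a column like the `symbol` arrays in the Quick Start.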
Architecture
┌─────────────────────────────────────────────────────────────┐
│ Python Interface │
│ wayy_db.Database | Table | Column | ops │
├─────────────────────────────────────────────────────────────┤
│ pybind11 Bindings │
│ Zero-copy NumPy arrays via buffer protocol │
├─────────────────────────────────────────────────────────────┤
│ C++ Core Engine │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ Storage │ │ Compute │ │ Joins │ │
│ │ • mmap I/O │ │ • AVX2 agg │ │ • O(n log m) aj │ │
│ │ • columnar │ │ • windows │ │ • O(n) wj │ │
│ └─────────────┘ └─────────────┘ └─────────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│ Memory-Mapped File Storage │
│ Zero-copy | Lazy loading | Shared │
└─────────────────────────────────────────────────────────────┘
Performance
Complexity
| Operation | Complexity | Notes |
|---|---|---|
| As-of join | O(n log(m/k)) | n=left rows, m=right rows, k=unique keys |
| Window join | O(n log m + matches) | Plus output size |
| Aggregations | O(n) | SIMD 4x speedup for sum |
| Window functions | O(n) | Single pass with O(1) update |
| Point lookup | O(log n) | Binary search on sorted index |
| Load from disk | O(1) | Memory mapping, no deserialization |
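The "single pass with O(1) update" entry for window functions can be illustrated with a running sum: each step adds the incoming value and subtracts the one leaving the window. The functions below are hypothetical reference helpers, not WayyDB code, and they assume a shorter window at the start of the series.

```python
def moving_sum_reference(values, window):
    """Trailing sum over `window` values, updated in O(1) per element."""
    out = []
    running = 0.0
    for i, x in enumerate(values):
        running += x                        # value entering the window
        if i >= window:
            running -= values[i - window]   # value leaving the window
        out.append(running)
    return out


def moving_avg_reference(values, window):
    """Trailing mean; divides by the number of values actually in the window."""
    sums = moving_sum_reference(values, window)
    return [s / min(i + 1, window) for i, s in enumerate(sums)]


print(moving_sum_reference([1, 2, 3, 4, 5], window=2))  # [1.0, 3.0, 5.0, 7.0, 9.0]
print(moving_avg_reference([1, 2, 3, 4, 5], window=2))  # [1.0, 1.5, 2.5, 3.5, 4.5]
```

For long floating-point series a running sum can accumulate rounding error, so a production implementation may use compensated summation or periodic recomputation.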
Design Targets
| Metric | Target |
|---|---|
| As-of join (1M x 1M rows) | < 150ms |
| Simple aggregation (1B rows) | < 80ms |
| Binary size | < 5 MB |
| Memory overhead | < 1% beyond data |
Building from Source
Requirements
- CMake >= 3.20
- C++20 compiler (GCC 11+, Clang 14+, MSVC 2022+)
- Python >= 3.9
Build
git clone https://github.com/wayy-research/wayydb.git
cd wayydb
# Option 1: pip install (recommended)
pip install -e .
# Option 2: CMake directly
mkdir build && cd build
cmake .. -DWAYY_BUILD_PYTHON=ON -DWAYY_BUILD_TESTS=ON
make -j$(nproc)
Run Tests
# C++ tests (31 tests)
cd build && ctest --output-on-failure
# Python tests (17 tests)
PYTHONPATH=python pytest tests/python -v
Comparison with Alternatives
| Feature | WayyDB | kdb+ | DuckDB | Polars |
|---|---|---|---|---|
| As-of join | Native | Native | Native (`ASOF JOIN`) | Native (`join_asof`) |
| Window join | Native | Native | None | None |
| Zero-copy Python | Yes | No | No | Limited |
| Sorted index optimization | Yes | Yes | No | No |
| License | MIT | Commercial | MIT | MIT |
| Learning curve | Low | High (q) | Low | Low |
| Persistence | mmap | Native | Native | None |
Roadmap
- String column type with dictionary encoding
- LZ4 compression for columns
- Parallel aggregations
- More join types (inner, left, full)
- Query optimizer
- Streaming ingestion API
License
MIT License - see LICENSE for details.
Contributing
Contributions welcome! Please read our contributing guidelines and submit PRs to the develop branch.
Built with C++20 and Python by Wayy Research