Time-Series-Crypto-Transformer
Overview
Time-Series-Crypto-Transformer is a dedicated encoder-decoder Transformer model optimized for multivariate time series forecasting, specifically predicting the 7-day future closing price trend for major cryptocurrencies. The model incorporates both temporal features (price, volume, volatility) and static categorical features (Asset ID) to generate robust predictions.
This model is a custom implementation based on the architecture popularized by models like Informer and Autoformer, adapted for the financial domain's low-latency requirements.
Model Architecture
The model uses a Transformer Encoder-Decoder structure.
- Architecture: Custom Transformer for time series (built on Hugging Face time-series modeling principles).
- Input Features (input_size=12): [Close Price (target), Open, High, Low, Volume, Log Returns, Moving Averages (2), Volatility (2), Time-of-Day, Day-of-Week].
- Context Length (context_length): 90 time steps (90 days of look-back).
- Prediction Length (prediction_length): 7 time steps (7-day forecast).
- Dimensionality: $d_{model}=256$, $n_{layers}=4$ (encoder and decoder), $n_{heads}=8$.
- Static Features: Asset ID (5 major cryptocurrencies: BTC, ETH, BNB, SOL, ADA).
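The dimensions above can be sketched with a plain PyTorch stand-in (an illustrative assumption, not the actual custom implementation; the projection and embedding layers here are hypothetical):

```python
import torch
import torch.nn as nn

# Hyperparameters taken from the model card
d_model, n_heads, n_layers = 256, 8, 4
input_size, context_length, prediction_length = 12, 90, 7
num_assets = 5

input_proj = nn.Linear(input_size, d_model)      # project the 12 dynamic features
asset_embed = nn.Embedding(num_assets, d_model)  # static Asset ID embedding
core = nn.Transformer(
    d_model=d_model, nhead=n_heads,
    num_encoder_layers=n_layers, num_decoder_layers=n_layers,
    batch_first=True,
)
head = nn.Linear(d_model, 1)                     # point estimate of the close price

past = torch.randn(num_assets, context_length, input_size)  # dummy 90-day history
asset_ids = torch.arange(num_assets)
src = input_proj(past) + asset_embed(asset_ids).unsqueeze(1)
tgt = torch.zeros(num_assets, prediction_length, d_model)   # decoder queries for the horizon
forecast = head(core(src, tgt))
print(forecast.shape)  # torch.Size([5, 7, 1])
```

This mirrors the stated shapes: a batch of 5 assets, 90 steps of 12 features in, a 7-step univariate forecast out.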
Intended Use
- 7-Day Price Trend Forecasting: Predicting the direction and magnitude of the close price for the next week.
- Algorithmic Trading Signals: Integrating forecasts into a larger trading system for entry/exit points.
- Risk Management: Quantifying future price volatility to manage portfolio risk exposures.
- Benchmarking: Serving as a strong baseline model for comparison with other time-series models (e.g., ARIMA, LSTMs) in the crypto domain.
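The trading-signal use case can be sketched as a simple thresholding rule on the forecast; the ±2% band and the use of the final-day value are assumptions for illustration, not part of the model:

```python
import numpy as np

def trend_signal(forecast, last_close, threshold=0.02):
    """Map a 7-day close-price forecast to a long/flat/short signal.

    `threshold` is a hypothetical +/-2% band, not specified by the model card.
    """
    expected_return = forecast[-1] / last_close - 1.0  # return implied by day-7 forecast
    if expected_return > threshold:
        return "long"
    if expected_return < -threshold:
        return "short"
    return "flat"

# 7% implied return over the horizon -> long
print(trend_signal(np.array([101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0]), 100.0))  # long
```

In practice such a rule would feed into a larger system with position sizing and risk limits rather than being traded directly.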
Limitations
- Black Swan Events: The model relies on historical patterns and may fail to accurately predict sudden, high-impact events (e.g., major regulatory changes, exchange failures) that are not represented in the training data.
- Data Stationarity: The model can tolerate a degree of non-stationarity in the time series, but extreme shifts in market structure may degrade accuracy.
- Feature Dependence: Accuracy is highly dependent on the quality and preprocessing of the 12 input features. Missing or noisy data will significantly impact results.
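Because the model consumes an observed-value mask alongside the features, missing data can be flagged explicitly rather than silently interpolated. A minimal pandas sketch (forward-fill plus mask is one common convention, assumed here for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical close-price series with two missing observations
prices = pd.Series([100.0, np.nan, 102.0, np.nan, 104.0])

observed_mask = prices.notna().astype(float)  # 1.0 where a real observation exists
filled = prices.ffill()                       # fill gaps so the tensor has no NaNs

print(filled.tolist())         # [100.0, 100.0, 102.0, 102.0, 104.0]
print(observed_mask.tolist())  # [1.0, 0.0, 1.0, 0.0, 1.0]
```

The filled values feed the feature tensor while the mask tells the model which entries to trust, matching the past_observed_mask argument in the example code below.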
Example Code (Python - Conceptual)
import torch
from transformers import AutoConfig, AutoModel

model_name = "Quant/Time-Series-Crypto-Transformer"
config = AutoConfig.from_pretrained(model_name)
# Load the pretrained weights; instantiating from the config alone would yield random weights
model = AutoModel.from_pretrained(model_name)
# --- Conceptual Data Preparation ---
# Your input data should be a 3D tensor: [Batch Size, Context Length, Input Size (Features)]
# Example:
# Historical data for 90 days (context_length) of 5 assets (batch size 5) with 12 features (input_size)
historical_data = torch.randn(5, config.context_length, config.input_size)
static_features = torch.tensor([0, 1, 2, 3, 4]) # Asset IDs
# --- Inference ---
# The forward pass returns the distribution or point estimate for the prediction_length (7 days)
outputs = model(
    past_target=historical_data[..., 0].unsqueeze(-1),  # close price is the target
    past_observed_mask=torch.ones_like(historical_data[..., 0].unsqueeze(-1)),
    past_feat_dynamic_real=historical_data[..., 1:],    # remaining 11 dynamic features
    static_categorical_features=static_features,
)
# Output is a distribution object (e.g., Gaussian) or a tensor of shape [Batch, Prediction Length, Output Size]
# For a point estimate:
forecast = outputs.prediction
print(f"Shape of 7-day forecast for 5 assets: {forecast.shape}")
# Expected output: torch.Size([5, 7, 1])
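The derived dynamic features listed under Model Architecture (log returns, moving averages, volatility, calendar features) can be computed from raw close prices along these lines. The 7- and 30-day windows and the column names are assumptions, since the card does not specify them:

```python
import numpy as np
import pandas as pd

# Hypothetical daily close series; in practice this would come from exchange OHLCV data
df = pd.DataFrame(
    {"close": np.linspace(100.0, 140.0, 40)},
    index=pd.date_range("2024-01-01", periods=40, freq="D"),
)

df["log_ret"] = np.log(df["close"]).diff()      # log returns
df["ma_7"] = df["close"].rolling(7).mean()      # first moving-average feature (assumed window)
df["ma_30"] = df["close"].rolling(30).mean()    # second moving-average feature (assumed window)
df["vol_7"] = df["log_ret"].rolling(7).std()    # short-horizon volatility (assumed window)
df["vol_30"] = df["log_ret"].rolling(30).std()  # long-horizon volatility (assumed window)
df["day_of_week"] = df.index.dayofweek          # calendar feature

# Rows before the longest window has filled are dropped before building the tensor
print(df.dropna().shape)
```

The resulting frame, stacked per asset over the 90-day context window, yields the [Batch, Context Length, Features] tensor used above.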