Time-Series-Crypto-Transformer
Overview
Time-Series-Crypto-Transformer is a dedicated encoder-decoder Transformer model optimized for multivariate time series forecasting, specifically predicting the 7-day future closing price trend for major cryptocurrencies. The model incorporates both temporal features (price, volume, volatility) and static categorical features (Asset ID) to generate robust predictions.
This model is a custom implementation based on the architecture popularized by models like Informer and Autoformer, adapted for the financial domain's low-latency requirements.
Model Architecture
The model uses a Transformer Encoder-Decoder structure.
- Architecture: Custom Transformer for time series (built on Hugging Face time-series modeling principles).
- Input Features (input_size=12): [Close Price (target), Open, High, Low, Volume, Log Returns, Moving Averages (2), Volatility (2), Time-of-Day, Day-of-Week].
- Context Length (context_length): 90 time steps (90 days of look-back).
- Prediction Length (prediction_length): 7 time steps (7-day forecast).
- Dimensionality: $d_{model}=256$, $n_{layers}=4$ (encoder and decoder), $n_{heads}=8$.
- Static Features: Asset ID (5 major cryptocurrencies: BTC, ETH, BNB, SOL, ADA).
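The dimensions above can be sketched with a plain PyTorch stand-in (an illustrative assumption, not the actual custom implementation; the projection and embedding layers here are hypothetical):

```python
import torch
import torch.nn as nn

# Hyperparameters taken from the model card
d_model, n_heads, n_layers = 256, 8, 4
input_size, context_length, prediction_length = 12, 90, 7
num_assets = 5

input_proj = nn.Linear(input_size, d_model)      # project the 12 dynamic features
asset_embed = nn.Embedding(num_assets, d_model)  # static Asset ID embedding
core = nn.Transformer(
    d_model=d_model, nhead=n_heads,
    num_encoder_layers=n_layers, num_decoder_layers=n_layers,
    batch_first=True,
)
head = nn.Linear(d_model, 1)                     # point estimate of the close price

past = torch.randn(num_assets, context_length, input_size)  # dummy 90-day history
asset_ids = torch.arange(num_assets)
src = input_proj(past) + asset_embed(asset_ids).unsqueeze(1)
tgt = torch.zeros(num_assets, prediction_length, d_model)   # decoder queries for the horizon
forecast = head(core(src, tgt))
print(forecast.shape)  # torch.Size([5, 7, 1])
```

This mirrors the stated shapes: a batch of 5 assets, 90 steps of 12 features in, a 7-step univariate forecast out.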
Intended Use
- 7-Day Price Trend Forecasting: Predicting the direction and magnitude of the close price for the next week.
- Algorithmic Trading Signals: Integrating forecasts into a larger trading system for entry/exit points.
- Risk Management: Quantifying future price volatility to manage portfolio risk exposures.
- Benchmarking: Serving as a strong baseline model for comparison with other time-series models (e.g., ARIMA, LSTMs) in the crypto domain.
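The trading-signal use case can be sketched as a simple thresholding rule on the forecast; the ±2% band and the use of the final-day value are assumptions for illustration, not part of the model:

```python
import numpy as np

def trend_signal(forecast, last_close, threshold=0.02):
    """Map a 7-day close-price forecast to a long/flat/short signal.

    `threshold` is a hypothetical +/-2% band, not specified by the model card.
    """
    expected_return = forecast[-1] / last_close - 1.0  # return implied by day-7 forecast
    if expected_return > threshold:
        return "long"
    if expected_return < -threshold:
        return "short"
    return "flat"

# 7% implied return over the horizon -> long
print(trend_signal(np.array([101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0]), 100.0))  # long
```

In practice such a rule would feed into a larger system with position sizing and risk limits rather than being traded directly.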
Limitations
- Black Swan Events: The model relies on historical patterns and may fail to accurately predict sudden, high-impact events (e.g., major regulatory changes, exchange failures) that are not represented in the training data.
- Data Stationarity: The model can tolerate a degree of non-stationarity in the time series, but extreme shifts in market structure may degrade accuracy.
- Feature Dependence: Accuracy is highly dependent on the quality and preprocessing of the 12 input features. Missing or noisy data will significantly impact results.
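Because the model consumes an observed-value mask alongside the features, missing data can be flagged explicitly rather than silently interpolated. A minimal pandas sketch (forward-fill plus mask is one common convention, assumed here for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical close-price series with two missing observations
prices = pd.Series([100.0, np.nan, 102.0, np.nan, 104.0])

observed_mask = prices.notna().astype(float)  # 1.0 where a real observation exists
filled = prices.ffill()                       # fill gaps so the tensor has no NaNs

print(filled.tolist())         # [100.0, 100.0, 102.0, 102.0, 104.0]
print(observed_mask.tolist())  # [1.0, 0.0, 1.0, 0.0, 1.0]
```

The filled values feed the feature tensor while the mask tells the model which entries to trust, matching the past_observed_mask argument in the example code below.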
Example Code (Python - Conceptual)
import torch
from transformers import AutoConfig, AutoModel

model_name = "Quant/Time-Series-Crypto-Transformer"
config = AutoConfig.from_pretrained(model_name)
# Load the pretrained weights; instantiating from the config alone would yield random weights
model = AutoModel.from_pretrained(model_name)
# --- Conceptual Data Preparation ---
# Your input data should be a 3D tensor: [Batch Size, Context Length, Input Size (Features)]
# Example:
# Historical data for 90 days (context_length) of 5 assets (batch size 5) with 12 features (input_size)
historical_data = torch.randn(5, config.context_length, config.input_size)
static_features = torch.tensor([0, 1, 2, 3, 4]) # Asset IDs
# --- Inference ---
# The forward pass returns the distribution or point estimate for the prediction_length (7 days)
outputs = model(
    past_target=historical_data[..., 0].unsqueeze(-1),  # close price is the target
    past_observed_mask=torch.ones_like(historical_data[..., 0].unsqueeze(-1)),
    past_feat_dynamic_real=historical_data[..., 1:],    # remaining 11 dynamic features
    static_categorical_features=static_features,
)
# Output is a distribution object (e.g., Gaussian) or a tensor of shape [Batch, Prediction Length, Output Size]
# For a point estimate:
forecast = outputs.prediction
print(f"Shape of 7-day forecast for 5 assets: {forecast.shape}")
# Expected output: torch.Size([5, 7, 1])
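The derived dynamic features listed under Model Architecture (log returns, moving averages, volatility, calendar features) can be computed from raw close prices along these lines. The 7- and 30-day windows and the column names are assumptions, since the card does not specify them:

```python
import numpy as np
import pandas as pd

# Hypothetical daily close series; in practice this would come from exchange OHLCV data
df = pd.DataFrame(
    {"close": np.linspace(100.0, 140.0, 40)},
    index=pd.date_range("2024-01-01", periods=40, freq="D"),
)

df["log_ret"] = np.log(df["close"]).diff()      # log returns
df["ma_7"] = df["close"].rolling(7).mean()      # first moving-average feature (assumed window)
df["ma_30"] = df["close"].rolling(30).mean()    # second moving-average feature (assumed window)
df["vol_7"] = df["log_ret"].rolling(7).std()    # short-horizon volatility (assumed window)
df["vol_30"] = df["log_ret"].rolling(30).std()  # long-horizon volatility (assumed window)
df["day_of_week"] = df.index.dayofweek          # calendar feature

# Rows before the longest window has filled are dropped before building the tensor
print(df.dropna().shape)
```

The resulting frame, stacked per asset over the 90-day context window, yields the [Batch, Context Length, Features] tensor used above.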