# Bayesian LSTM for Cryptocurrency Return & Uncertainty Prediction

This repository contains the implementation of a Bayesian Long Short-Term Memory (BLSTM) model that predicts hourly log returns and estimates both epistemic and aleatoric uncertainty for major cryptocurrency assets (SOL, BTC, and DOGE).

This project is part of a Master's Thesis in Mathematics (Statistics Concentration) at Andalas University.

## 🧠 Model Features

- **Bayesian inference:** built with `blitz-bayesian-pytorch` for variational inference.
- **Uncertainty estimation:** quantifies market noise (aleatoric) and model confidence (epistemic) via Monte Carlo sampling.
- **Feature engineering:** log returns, volatility metrics, and cyclical time encoding (sin/cos).
- **Multi-asset:** pre-trained weights and artifacts for Solana (SOL), Bitcoin (BTC), and Dogecoin (DOGE).
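As a point of reference for the cyclical time encoding mentioned above, here is a minimal sketch of the usual sin/cos transform of the hour of day (the exact feature set used in training lives in the project's feature-engineering code and artifacts; the function name here is illustrative):

```python
import numpy as np
import pandas as pd

def add_cyclical_hour(df: pd.DataFrame) -> pd.DataFrame:
    """Encode hour-of-day as sin/cos so 23:00 and 00:00 are adjacent."""
    out = df.copy()
    hours = out.index.hour
    out['hour_sin'] = np.sin(2 * np.pi * hours / 24)
    out['hour_cos'] = np.cos(2 * np.pi * hours / 24)
    return out
```

Unlike a raw 0–23 integer, this representation has no artificial discontinuity at midnight, which matters for hourly return models.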

## 📂 Repository Structure

- `model_arch.py`: the BLSTM architecture definition.
- `load_model.py`: script to load the model and artifacts.
- `requirements.txt`: required libraries.
- `/[COIN]/`: contains `BLSTM_model.pth` (weights) and `training_artifacts.pkl` (scalers & config) for each asset.

## 📈 Performance Summary

Results from backtesting a trading strategy driven by the Bayesian LSTM's predictions on the test data (Jan 2025 - Nov 2025):

| Asset | RMSE | Sharpe Ratio | PICP (95% CI) |
|---|---|---|---|
| Solana (SOL) | 0.0094 | 0.7631 | 0.7416 |
| Bitcoin (BTC) | 0.0048 | -0.0112 | 0.4864 |
| Dogecoin (DOGE) | 0.0103 | 0.8999 | 0.7698 |
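For readers unfamiliar with PICP (Prediction Interval Coverage Probability): it is the fraction of actual values that fall inside the model's predicted interval, so a well-calibrated 95% interval should yield a PICP near 0.95. A minimal sketch of the metric (this is the standard definition; the thesis's exact evaluation code may differ in details):

```python
import numpy as np

def picp(y_true, mu, sigma, z=1.96):
    """Share of actual values inside the symmetric interval mu ± z*sigma.

    z=1.96 corresponds to a 95% Gaussian prediction interval.
    """
    y_true, mu, sigma = map(np.asarray, (y_true, mu, sigma))
    lower, upper = mu - z * sigma, mu + z * sigma
    return float(np.mean((y_true >= lower) & (y_true <= upper)))
```

Under this reading, the BTC value of 0.4864 indicates the 95% intervals are substantially too narrow on the test period.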

## 📈 Detailed Backtesting Results

<details>
<summary>Click to expand: cumulative returns of the Bitcoin strategy</summary>

Backtesting plot

</details>

<details>
<summary>Click to expand: cumulative returns of the Solana strategy</summary>

Backtesting plot

</details>

<details>
<summary>Click to expand: cumulative returns of the Dogecoin strategy</summary>

Backtesting plot

</details>

## 🚀 Quick Start

### Installation

```shell
pip install -r requirements.txt
```

### How to Predict with Your Own Dataset (OHLCV)

To get predictions from this model, you need to provide a sequence of the last 168 hours (1 week) of OHLCV data. The process involves three main steps: Feature Engineering, Scaling, and Monte Carlo Inference.

#### 1. Prepare Your Data

Ensure your data is a pandas `DataFrame` with a `DatetimeIndex` and the columns `Open`, `High`, `Low`, `Close`, and `Volume`.
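A quick sketch of what a valid input frame looks like, using synthetic data as a stand-in for your own OHLCV export (column names and the hourly frequency match the requirements above; everything else here is illustrative):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for your exported OHLCV data: 168 hourly rows (one week).
idx = pd.date_range('2025-01-01', periods=168, freq='h')
close = 100 + np.cumsum(np.random.randn(168))
raw_df = pd.DataFrame({
    'Open': close,
    'High': close + 1.0,
    'Low': close - 1.0,
    'Close': close,
    'Volume': np.random.rand(168) * 1e6,
}, index=idx)

# Sanity checks the prediction script relies on.
assert isinstance(raw_df.index, pd.DatetimeIndex)
assert len(raw_df) >= 168  # at least one full week of hourly bars
```

When loading a real CSV, `pd.read_csv(..., parse_dates=[...], index_col=...)` followed by `sort_index()` gets you to the same shape.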

#### 2. Preprocessing & Prediction Script

Copy and use this script to process your raw OHLCV data and get the predicted return along with its uncertainty:

**Environment setup:** keep your prediction script (e.g. `load_model.py`) in the same directory as `model_arch.py`; model initialization imports the `BayesianLSTMModel` class from that file.

```python
import pickle

import numpy as np
import torch

from model_arch import BayesianLSTMModel
# Assumes feature_engineering.py exposes a feature_engineering(df) function
# that builds the training features (log returns, volatility, sin/cos time).
from feature_engineering import feature_engineering

# Configuration
COIN = "SOL"  # or "BTC", "DOGE"

# Load artifacts (scalers & config) from the asset's folder
with open(f'./{COIN}/training_artifacts.pkl', 'rb') as f:
    artifacts = pickle.load(f)

# Initialize and load the model
model = BayesianLSTMModel(
    input_dim=artifacts['input_dim'],
    hidden_1=artifacts['config']['hidden_size_1'],
    hidden_2=artifacts['config']['hidden_size_2']
)
model.load_state_dict(torch.load(f'./{COIN}/BLSTM_model.pth', map_location='cpu'))
model.eval()

def predict_return(raw_df):
    # Step A: feature engineering (log returns, cyclical time, etc.)
    df_feat = feature_engineering(raw_df)

    # Step B: scaling with the scaler fitted during training
    feature_cols = [c for c in df_feat.columns if c != 'log_return']
    X_scaled = artifacts['scaler_X'].transform(df_feat[feature_cols].values)

    # Step C: prepare the input sequence (last 168 hours)
    input_seq = X_scaled[-168:]
    input_tensor = torch.tensor(input_seq).float().unsqueeze(0)

    # Step D: Bayesian inference via Monte Carlo sampling; each forward
    # pass draws fresh weights, so predictions differ across samples
    mc_samples = 50
    with torch.no_grad():
        preds = [model(input_tensor) for _ in range(mc_samples)]

    # Mean prediction and total uncertainty (epistemic + aleatoric)
    mu_list = [p[0].item() for p in preds]
    sigma_list = [p[1].item() for p in preds]

    final_pred_mu = np.mean(mu_list)
    total_unc = np.sqrt(np.var(mu_list) + np.mean(np.array(sigma_list) ** 2))

    # Step E: inverse-transform back to an unscaled log return
    actual_return = artifacts['scaler_y'].inverse_transform([[final_pred_mu]])[0][0]
    return actual_return, total_unc
```
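The uncertainty combination used in Step D can be sanity-checked in isolation: the variance of the MC means captures epistemic uncertainty, the average predicted noise variance captures aleatoric uncertainty, and the total is the square root of their sum. A small standalone sketch (function name is illustrative, not part of the repository):

```python
import numpy as np

def decompose_uncertainty(mu_samples, sigma_samples):
    """Split total predictive std into epistemic and aleatoric parts."""
    mu = np.asarray(mu_samples, dtype=float)
    sigma = np.asarray(sigma_samples, dtype=float)
    epistemic_var = np.var(mu)            # disagreement across weight samples
    aleatoric_var = np.mean(sigma ** 2)   # average predicted data-noise variance
    total = np.sqrt(epistemic_var + aleatoric_var)
    return float(np.sqrt(epistemic_var)), float(np.sqrt(aleatoric_var)), float(total)
```

If all MC draws agree exactly, the epistemic term vanishes and the total reduces to the average aleatoric noise, which matches the `total_unc` formula in the script above.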