Bayesian LSTM for Cryptocurrency Return & Uncertainty Prediction
This repository contains the implementation of a Bayesian Long Short-Term Memory (BLSTM) model designed to predict hourly log returns and estimate both epistemic and aleatoric uncertainty for major cryptocurrency assets (SOL, BTC, and DOGE).
This project is part of a Master's Thesis in Mathematics (Statistics Concentration) at Andalas University.
Model Features
- Bayesian Inference: Built with `blitz-bayesian-pytorch` for variational inference.
- Uncertainty Estimation: Quantifies market noise (aleatoric) and model confidence (epistemic) using Monte Carlo sampling.
- Feature Engineering: Includes log returns, volatility metrics, and cyclical time encoding (sin/cos).
- Multi-Asset: Pre-trained weights and artifacts for Solana (SOL), Bitcoin (BTC), and Dogecoin (DOGE).
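As an illustration of the cyclical time encoding mentioned above, hours of the day can be mapped to sin/cos pairs so that 23:00 and 00:00 end up adjacent in feature space. This is a minimal sketch; the exact feature set and function names are defined in the repository's feature-engineering code:

```python
import numpy as np
import pandas as pd

def add_cyclical_hour(df: pd.DataFrame) -> pd.DataFrame:
    """Encode the hour of a DatetimeIndex as sin/cos features."""
    hours = df.index.hour
    out = df.copy()
    out["hour_sin"] = np.sin(2 * np.pi * hours / 24)
    out["hour_cos"] = np.cos(2 * np.pi * hours / 24)
    return out
```

Unlike a raw 0-23 hour column, the sin/cos pair has no artificial discontinuity at midnight.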
Repository Structure
- `model_arch.py`: The BLSTM architecture definition.
- `load_model.py`: Script to load the model and artifacts.
- `requirements.txt`: Required libraries.
- `/[COIN]/`: Contains `BLSTM_model.pth` (weights) and `training_artifacts.pkl` (scalers & config) for each asset.
Performance Summary
Results from backtesting the Bayesian LSTM trading strategy on the test data (January 2025 to November 2025):
| Asset | RMSE | Sharpe Ratio | PICP (95% CI) |
|---|---|---|---|
| Solana (SOL) | 0.0094 | 0.7631 | 0.7416 |
| Bitcoin (BTC) | 0.0048 | -0.0112 | 0.4864 |
| Dogecoin (DOGE) | 0.0103 | 0.8999 | 0.7698 |
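PICP (Prediction Interval Coverage Probability) in the table above is the fraction of actual returns that fall inside the model's 95% prediction intervals; a well-calibrated model would score close to 0.95. A minimal sketch of how such a metric can be computed (illustrative only, not the thesis's exact evaluation code):

```python
import numpy as np

def picp(y_true, y_lower, y_upper):
    """Fraction of observations falling inside [y_lower, y_upper]."""
    y_true = np.asarray(y_true)
    inside = (y_true >= np.asarray(y_lower)) & (y_true <= np.asarray(y_upper))
    return float(inside.mean())
```

Values well below the nominal level (as for BTC above) indicate the intervals are too narrow for that asset.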
Detailed Backtesting Results
- Cumulative returns of the Bitcoin strategy (chart in repository)
- Cumulative returns of the Solana strategy (chart in repository)
- Cumulative returns of the Dogecoin strategy (chart in repository)
Quick Start
Installation
```bash
pip install -r requirements.txt
```
How to Predict with Your Own Dataset (OHLCV)
To get predictions from this model, you need to provide a sequence of the last 168 hours (1 week) of OHLCV data. The process involves three main steps: Feature Engineering, Scaling, and Monte Carlo Inference.
1. Prepare Your Data
Ensure your data is a Pandas DataFrame with a DatetimeIndex and contains the following columns: Open, High, Low, Close, and Volume.
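For example, a minimal DataFrame with the expected shape (the column names follow the description above; the values here are synthetic placeholders, not real market data):

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2025-01-01", periods=168, freq="h")  # 168 hourly bars = 1 week
rng = np.random.default_rng(0)
close = 100 + rng.normal(0, 1, 168).cumsum()  # synthetic price path

raw_df = pd.DataFrame({
    "Open": close,           # placeholder values for illustration
    "High": close + 0.5,
    "Low": close - 0.5,
    "Close": close,
    "Volume": rng.uniform(1e4, 1e5, 168),
}, index=idx)
```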
2. Preprocessing & Prediction Script
Copy and use this script to process your raw OHLCV data and get the predicted return along with its uncertainty:
Environment Setup: Ensure that your prediction script (e.g., `load_model.py`) is located in the same directory as `model_arch.py`. The model initialization relies on importing the `BayesianLSTMModel` class from this file.
```python
import pickle

import numpy as np
import torch

from model_arch import BayesianLSTMModel
# Assumes the repository's feature-engineering code exposes a
# feature_engineering(df) function that adds log returns, cyclical
# time features, etc.
from feature_engineering import feature_engineering

# Configuration
COIN = "SOL"  # or "BTC", "DOGE"

# Load artifacts (scalers & config) from the respective folder
with open(f'./{COIN}/training_artifacts.pkl', 'rb') as f:
    artifacts = pickle.load(f)

# Initialize and load the model
model = BayesianLSTMModel(
    input_dim=artifacts['input_dim'],
    hidden_1=artifacts['config']['hidden_size_1'],
    hidden_2=artifacts['config']['hidden_size_2'],
)
model.load_state_dict(torch.load(f'./{COIN}/BLSTM_model.pth', map_location='cpu'))
model.eval()

def predict_return(raw_df):
    # Step A: Feature engineering (log returns, cyclical time, etc.)
    df_feat = feature_engineering(raw_df)

    # Step B: Scaling with the training-time scaler
    feature_cols = [c for c in df_feat.columns if c != 'log_return']
    X_scaled = artifacts['scaler_X'].transform(df_feat[feature_cols].values)

    # Step C: Prepare the input sequence (last 168 hours)
    input_seq = X_scaled[-168:]
    input_tensor = torch.tensor(input_seq).float().unsqueeze(0)

    # Step D: Bayesian inference (Monte Carlo sampling; each forward pass
    # draws new weights, so predictions differ between samples)
    mc_samples = 50
    with torch.no_grad():
        preds = [model(input_tensor) for _ in range(mc_samples)]

    # Mean prediction, plus total uncertainty = epistemic (variance of the
    # predicted means) + aleatoric (mean of the predicted variances)
    mu_list = [p[0].item() for p in preds]
    sigma_list = [p[1].item() for p in preds]
    final_pred_mu = np.mean(mu_list)
    total_unc = np.sqrt(np.var(mu_list) + np.mean(np.array(sigma_list) ** 2))

    # Step E: Inverse-transform to get the log return on the original scale
    actual_return = artifacts['scaler_y'].inverse_transform([[final_pred_mu]])[0][0]
    return actual_return, total_unc
```
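The mean and total uncertainty returned by `predict_return` can be turned into an approximate prediction interval, assuming roughly Gaussian predictive errors (1.96 is the standard normal 97.5% quantile). Note that `total_unc` is computed on the scaled target, so for an interval on the raw log return it would need the same inverse scaling as the mean:

```python
def prediction_interval(mu, total_unc, z=1.96):
    """Approximate 95% interval around a predicted value."""
    return mu - z * total_unc, mu + z * total_unc

lo, hi = prediction_interval(0.001, 0.002)  # hypothetical example values
```

Wider intervals signal lower model confidence, which can be used to size or skip trades.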