import streamlit as st
import numpy as np
import pandas as pd
import yfinance as yf
import warnings
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import datetime

warnings.filterwarnings('ignore')

# Set wide page layout
st.set_page_config(page_title="Spread Estimation", layout="wide")
st.title("Spread Estimation")
st.write("This application estimates bid-ask spreads from OHLC prices with a rolling-window estimator, using daily, hourly, 5-minute, and 1-minute data. Each chart below shows volume, close price, rolling volatility, and the rolling spread, with a band of plus or minus half the estimated spread drawn around the close. Use the expanders to see additional analysis for each interval. For further details on the methodology, please see [this article](https://entreprenerdly.com/estimating-bid-ask-spreads-using-ohlc-prices/).")

with st.expander("Theory and Methodology", expanded=False):
    st.markdown(r"""
#### Methodology for Estimating Rolling Bid-Ask Spreads
The app estimates bid-ask spreads over a rolling window, which makes it suitable for analyzing how transaction costs change over time.
##### Bid-Ask Spread Definition
The **effective bid-ask spread** measures the deviation of observed transaction prices from the unobserved fundamental price. Formally, for a given trade:
$$
S = \frac{2D (P - P^*)}{P^*}
$$
where:
- $P$ is the observed transaction price.
- $P^*$ is the unobserved fundamental price.
- $D$ is a trade direction indicator (+1 for buy, -1 for sell).

Since $P^*$ is unobserved, various methods exist to estimate the spread using **low-frequency data (OHLC prices)**.
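
As a quick numerical illustration of this definition, the sketch below evaluates $S$ for one buy and one sell. Because $P^*$ is unobservable in practice, the fundamental price of 100.00 is a made-up assumption, and `effective_spread` is a hypothetical helper that is not part of this app.

```python
def effective_spread(price: float, fundamental: float, direction: int) -> float:
    # S = 2 * D * (P - P*) / P*, with D = +1 for a buy and -1 for a sell
    return 2.0 * direction * (price - fundamental) / fundamental


# Hypothetical trades around an assumed fundamental price of 100.00
s_buy = effective_spread(100.05, 100.00, +1)   # buyer pays 0.05 above P*
s_sell = effective_spread(99.95, 100.00, -1)   # seller receives 0.05 below P*
print(f"buy: {s_buy:.4%}, sell: {s_sell:.4%}")
```

Both trades imply the same effective spread of 0.1% (10 basis points), which is what a quote of 99.95 / 100.05 around a fundamental price of 100.00 should give.
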
##### Improving Estimation with OHLC Prices
The `edge_rolling` function implements an estimator that:
1. Uses **open, high, low, and close** prices.
2. Corrects for **discretely observed prices**, which would otherwise bias the estimate.
3. Uses the **Generalized Method of Moments (GMM)** to minimize estimation variance.
##### Moment Conditions
The estimator is built from the following log-price quantities:
- **Midpoint log-prices**:
$$
\eta_t = \frac{H_t + L_t}{2}
$$
- **Returns used for estimation**:
$$
r_1 = \eta_t - o_t,\quad r_2 = o_t - \eta_{t-1},\quad r_3 = \eta_t - c_{t-1},\quad r_4 = c_{t-1} - \eta_{t-1},\quad r_5 = o_t - c_{t-1}
$$

where $o_t$, $c_t$, $H_t$, and $L_t$ are the log-transformed **open, close, high, and low** prices.
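
The sketch below computes $\eta_t$ and $r_1, \dots, r_5$ with pandas on a made-up three-bar sample. The lowercase column names are an assumption that matches what `edge_rolling` works with after lowercasing its input. By construction $r_3 = r_1 + r_5$, which makes a cheap sanity check.

```python
import numpy as np
import pandas as pd

# Made-up OHLC bars, for illustration only
bars = pd.DataFrame({
    "open":  [100.0, 101.0, 100.5],
    "high":  [101.5, 102.0, 101.0],
    "low":   [99.5, 100.5, 99.0],
    "close": [101.0, 100.8, 100.2],
})

log_o, log_h, log_l, log_c = (np.log(bars[k]) for k in ("open", "high", "low", "close"))
eta = (log_h + log_l) / 2.0              # midpoint of the log high and log low
eta_prev, c_prev = eta.shift(1), log_c.shift(1)

returns = pd.DataFrame({
    "r1": eta - log_o,                   # open to same-bar midpoint
    "r2": log_o - eta_prev,              # previous midpoint to open
    "r3": eta - c_prev,                  # previous close to midpoint
    "r4": c_prev - eta_prev,             # previous midpoint to previous close
    "r5": log_o - c_prev,                # previous close to open
})
print(returns.round(5))
```

Every return except $r_1$ needs the previous bar, so the first row is missing for $r_2$ through $r_5$.
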
A key step is defining **indicators for price variation**:
$$
\tau_t =
\begin{cases}
0, & \text{if } H_t = L_t = c_{t-1} \\\\
1, & \text{otherwise}
\end{cases}
$$
This handles cases where prices remain unchanged and prevents overestimation.
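
The indicator can be computed in vectorized form, mirroring the expression used inside `edge_rolling`; the bar values below are invented for illustration. A bar with $\tau_t = 0$ shows no price variation at all.

```python
import numpy as np
import pandas as pd

# Made-up bars: the second bar shows no price variation at all
high = pd.Series([101.5, 100.0, 102.0])
low = pd.Series([99.5, 100.0, 100.5])
close = pd.Series([100.0, 100.0, 101.0])

log_h, log_l = np.log(high), np.log(low)
log_c_prev = np.log(close).shift(1)

# tau = 0 only if high == low == previous close; NaN when an input is missing
tau = np.where(
    log_h.isna() | log_l.isna() | log_c_prev.isna(),
    np.nan,
    (log_h != log_l) | (log_l != log_c_prev),
)
print(tau)  # the first bar has no previous close, so it is NaN
```
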
##### Estimator Calculation
One estimator built from these moment conditions is:
$$
S^2 = -8\,\frac{E[(\eta_t - o_t)(o_t - c_{t-1})]}{P[o_t \neq H_t,\, \tau_t = 1] + P[o_t \neq L_t,\, \tau_t = 1]}
$$
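
To see that this moment condition recovers a known spread, the sketch below simulates trades and applies the formula with plain sample averages. The simulation design (a Gaussian random walk for the fundamental log-price, buys and sells equally likely, and every parameter value) is an assumption for illustration, and it skips the small-sample corrections and GMM weighting that `edge_rolling` applies.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_bars, trades_per_bar = 20_000, 100
true_spread = 0.005   # assumed 50 bp spread, in log terms
step_vol = 0.0005     # assumed volatility of each fundamental step

# Fundamental log-price: one continuous random walk, cut into bars
fundamental = np.cumsum(rng.normal(0.0, step_vol, n_bars * trades_per_bar))
direction = rng.choice([-1.0, 1.0], size=fundamental.size)  # D = +1 buy, -1 sell
trades = (fundamental + direction * true_spread / 2).reshape(n_bars, trades_per_bar)

# Log OHLC of the observed trade prices
bars = pd.DataFrame({
    "o": trades[:, 0],
    "h": trades.max(axis=1),
    "l": trades.min(axis=1),
    "c": trades[:, -1],
})
c_prev = bars["c"].shift(1)
eta = (bars["h"] + bars["l"]) / 2.0
tau = ((bars["h"] != bars["l"]) | (bars["l"] != c_prev)).astype(float)

# S^2 = -8 E[(eta_t - o_t)(o_t - c_{t-1})] / (P[o != H, tau = 1] + P[o != L, tau = 1])
moment = ((eta - bars["o"]) * (bars["o"] - c_prev)).mean()  # NaN in row 0 is skipped
p_open = ((bars["o"] != bars["h"]) * tau).mean() + ((bars["o"] != bars["l"]) * tau).mean()
s2 = -8.0 * moment / p_open
estimate = np.sqrt(max(s2, 0.0))
print(f"true spread: {true_spread:.3%}, estimate: {estimate:.3%}")
```

With these assumed parameters the estimate should land close to the true 0.5% spread. Replacing the denominator with 2, i.e. ignoring that the open can coincide with the high or low, would push the estimate downward.
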
Two such estimators are computed, each the sample mean of a variable ($x_1$ or $x_2$) whose expectation equals $S^2$, and they are combined using GMM weighting:
$$
S^2_{\text{EDGE}} = w_1\,E[x_1] + w_2\,E[x_2]
$$
where $w_1$ and $w_2$ are chosen to minimize variance:
$$
w_1 = \frac{\operatorname{Var}[x_2]}{\operatorname{Var}[x_1] + \operatorname{Var}[x_2]},\quad
w_2 = \frac{\operatorname{Var}[x_1]}{\operatorname{Var}[x_1] + \operatorname{Var}[x_2]}
$$
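
In code, the weighting reduces to a few lines. These weights minimize the variance of the combined estimate when the two estimates are uncorrelated, in which case the combined variance is $\operatorname{Var}[x_1]\operatorname{Var}[x_2] / (\operatorname{Var}[x_1] + \operatorname{Var}[x_2])$. The helper names below are hypothetical, and the equal-weight fallback mirrors what `edge_rolling` does when the estimated total variance is not positive.

```python
def combine_estimates(e1: float, e2: float, v1: float, v2: float) -> float:
    # w1 = V2 / (V1 + V2) and w2 = V1 / (V1 + V2): the noisier estimate gets less weight
    total = v1 + v2
    if total > 0:
        return (v2 * e1 + v1 * e2) / total
    # Fallback when the variance estimates are not positive: equal weights
    return (e1 + e2) / 2.0


def combined_variance(v1: float, v2: float) -> float:
    # Variance of w1*x1 + w2*x2 for uncorrelated x1, x2 with the weights above
    return v1 * v2 / (v1 + v2)


# Made-up inputs: the second estimate is four times less noisy than the first
print(combine_estimates(1.0e-4, 1.6e-4, 4.0, 1.0))
print(combined_variance(4.0, 1.0))  # smaller than both 4.0 and 1.0
```
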
Finally, the **spread estimate** is given by:
$$
S = \sqrt{\max(0, S^2)}
$$
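
Squared-spread estimates can come out negative in finite samples. The sketch below contrasts three ways of turning made-up values of $S^2$ into spreads: flooring negatives at zero, taking the square root of the absolute value (the default in `edge_rolling`), and keeping the sign (`edge_rolling` with `sign=True`).

```python
import numpy as np

# Made-up squared-spread estimates, including a negative and a missing one
s2 = np.array([2.5e-5, -4.0e-6, np.nan])

floored = np.sqrt(np.clip(s2, 0.0, None))   # negative S^2 becomes a zero spread
magnitude = np.sqrt(np.abs(s2))             # edge_rolling default (sign=False)
signed = magnitude * np.sign(s2)            # edge_rolling with sign=True

print(floored, magnitude, signed, sep="\n")
```

Missing values stay missing under all three choices, so windows without enough data remain visible as gaps.
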
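##### Rolling Estimation
The estimator uses a **rolling window approach**:
- The window size is a parameter of `edge_rolling`; this app uses 15 bars and requires at least 10 of them.
- The estimates update with each new bar, so the spread can vary over time.
- Negative values of $S^2$ are not reset to zero: by default `edge_rolling` returns $\sqrt{\lvert S^2 \rvert}$, and with `sign=True` it returns a negative spread instead.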
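
A rolling version only needs rolling means of the per-bar moment terms. The sketch below applies pandas `rolling(...).mean()` to the single moment condition for $S^2$ from the estimator calculation; `simple_rolling_spread` is a hypothetical helper, far simpler than `edge_rolling` (no GMM weighting and no small-sample corrections), and the test data come from an assumed random walk with trades at a known spread.

```python
import numpy as np
import pandas as pd


def simple_rolling_spread(bars: pd.DataFrame, window: int) -> pd.Series:
    # Hypothetical helper: rolling version of the single moment condition for S^2.
    # Expects log prices in columns "o", "h", "l", "c".
    c_prev = bars["c"].shift(1)
    eta = (bars["h"] + bars["l"]) / 2.0
    tau = ((bars["h"] != bars["l"]) | (bars["l"] != c_prev)).astype(float)
    terms = pd.DataFrame({
        "moment": (eta - bars["o"]) * (bars["o"] - c_prev),
        "p_oh": (bars["o"] != bars["h"]) * tau,
        "p_ol": (bars["o"] != bars["l"]) * tau,
    })
    terms.iloc[0] = np.nan  # the first bar has no previous close
    # N bars contain N - 1 bar-to-bar returns, so average over window - 1 rows
    means = terms.rolling(window=window - 1, min_periods=window - 1).mean()
    s2 = -8.0 * means["moment"] / (means["p_oh"] + means["p_ol"])
    return np.sqrt(s2.clip(lower=0.0))


# Assumed simulation: random-walk fundamental log-price, trades at +/- half a 50 bp spread
rng = np.random.default_rng(7)
n_bars, trades_per_bar, true_spread = 3_000, 100, 0.005
fundamental = np.cumsum(rng.normal(0.0, 0.0005, n_bars * trades_per_bar))
direction = rng.choice([-1.0, 1.0], size=fundamental.size)
trades = (fundamental + direction * true_spread / 2).reshape(n_bars, trades_per_bar)
bars = pd.DataFrame({"o": trades[:, 0], "h": trades.max(axis=1),
                     "l": trades.min(axis=1), "c": trades[:, -1]})

window = 1_000
spread = simple_rolling_spread(bars, window)
print(spread.dropna().describe())
```

The first `window - 1` values are missing because the first bar contributes no return, which mirrors the one-bar reduction of `window` and `min_periods` inside `edge_rolling`.
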
##### Implementation
The function `edge_rolling` computes **rolling bid-ask spread estimates** from OHLC prices. It accepts:
- `data`: A DataFrame with `open`, `high`, `low`, and `close` columns (matched case-insensitively).
- `window`: The rolling window size.
- `sign`: Boolean to preserve the sign of the estimate.
- `kwargs`: Additional arguments for `pandas.DataFrame.rolling`, such as `min_periods`.

This estimator improves accuracy in markets with varying trading frequencies.
For further details, see:
- [Ardia, D., Guidotti, E., & Kroencke, T. (2024). Efficient Estimation of Bid-Ask Spreads from OHLC Prices. Journal of Financial Economics.](https://doi.org/10.1016/j.jfineco.2024.103916)
    """, unsafe_allow_html=True)

with st.sidebar:
    with st.expander("User Inputs", expanded=True):
        ticker = st.text_input(
            "Ticker", value="CVNA",
            help="Enter the ticker symbol or crypto pair (e.g., 'AAPL', 'BTC-USD')."
        )
        start_date = st.date_input(
            "Start Date", value=pd.to_datetime("2022-01-01"),
            help="Select the start date for the daily analysis (intraday intervals always use the most recent data available)."
        )
        default_end_date = datetime.date.today() + datetime.timedelta(days=1)
        end_date = st.date_input(
            "End Date", value=default_end_date,
            help="Select the end date for the daily analysis (default is tomorrow, so today's bar is included)."
        )
    run_analysis = st.button("Run Analysis")


# Function to compute the rolling spread estimate
def edge_rolling(data: pd.DataFrame, window: int, sign: bool = False, **kwargs) -> pd.Series:
    # Match column names case-insensitively (yfinance returns 'Open', 'High', ...)
    df = data.rename(columns=str.lower, inplace=False)

    # Log prices, the high-low midpoint, and their one-period lags
    log_open = np.log(df['open'])
    log_high = np.log(df['high'])
    log_low = np.log(df['low'])
    log_close = np.log(df['close'])
    log_mid = (log_high + log_low) / 2.0
    log_high_prev = log_high.shift(1)
    log_low_prev = log_low.shift(1)
    log_close_prev = log_close.shift(1)
    log_mid_prev = log_mid.shift(1)

    # Returns r1..r5 as defined in the methodology section
    r1 = log_mid - log_open
    r2 = log_open - log_mid_prev
    r3 = log_mid - log_close_prev
    r4 = log_close_prev - log_mid_prev
    r5 = log_open - log_close_prev

    # tau = 0 if high == low == previous close (no price variation); NaN if an input is missing
    tau = np.where(
        np.isnan(log_high) | np.isnan(log_low) | np.isnan(log_close_prev),
        np.nan,
        (log_high != log_low) | (log_low != log_close_prev)
    )
    # Indicators that the open (or previous close) differs from the high/low, counted only when tau = 1
    ind_o_h = tau * np.where(np.isnan(log_open) | np.isnan(log_high), np.nan, log_open != log_high)
    ind_o_l = tau * np.where(np.isnan(log_open) | np.isnan(log_low), np.nan, log_open != log_low)
    ind_c_h = tau * np.where(np.isnan(log_close_prev) | np.isnan(log_high_prev), np.nan, log_close_prev != log_high_prev)
    ind_c_l = tau * np.where(np.isnan(log_close_prev) | np.isnan(log_low_prev), np.nan, log_close_prev != log_low_prev)

    # Per-bar products whose rolling means give the moments and their variances
    prod_12 = r1 * r2
    prod_34 = r3 * r4
    prod_15 = r1 * r5
    prod_45 = r4 * r5
    tau_r1 = tau * r1
    tau_r2 = tau * r2
    tau_r4 = tau * r4
    tau_r5 = tau * r5
    vals = pd.DataFrame({
        'prod_12': prod_12,
        'prod_34': prod_34,
        'prod_15': prod_15,
        'prod_45': prod_45,
        'tau': tau,
        'r1': r1,
        'tau_r2': tau_r2,
        'r3': r3,
        'tau_r4': tau_r4,
        'r5': r5,
        'prod_12_sq': prod_12 ** 2,
        'prod_34_sq': prod_34 ** 2,
        'prod_15_sq': prod_15 ** 2,
        'prod_45_sq': prod_45 ** 2,
        'prod_12_34': prod_12 * prod_34,
        'prod_15_45': prod_15 * prod_45,
        'tau_r2_r2': tau_r2 * r2,
        'tau_r4_r4': tau_r4 * r4,
        'tau_r5_r5': tau_r5 * r5,
        'tau_r2_prod12': tau_r2 * prod_12,
        'tau_r4_prod34': tau_r4 * prod_34,
        'tau_r5_prod15': tau_r5 * prod_15,
        'tau_r4_prod45': tau_r4 * prod_45,
        'tau_r4_prod12': tau_r4 * prod_12,
        'tau_r2_prod34': tau_r2 * prod_34,
        'tau_r2_r4': tau_r2 * r4,
        'tau_r1_prod45': tau_r1 * prod_45,
        'tau_r5_prod45': tau_r5 * prod_45,
        'tau_r4_r5': tau_r4 * r5,
        'tau_r5_only': tau_r5,
        'ind_o_h': ind_o_h,
        'ind_o_l': ind_o_l,
        'ind_c_h': ind_c_h,
        'ind_c_l': ind_c_l
    }, index=df.index)

    # The first bar has no previous bar. A window of N bars contains only N - 1
    # bar-to-bar returns, so window and min_periods are reduced by one.
    vals.iloc[0] = np.nan
    window_adj = window - 1 if isinstance(window, (int, np.integer)) else window
    if 'min_periods' in kwargs and isinstance(kwargs['min_periods'], (int, np.integer)):
        kwargs['min_periods'] = max(0, kwargs['min_periods'] - 1)
    roll_vals = vals.rolling(window=window_adj, **kwargs).mean()

    # Estimated probabilities P[tau = 1], P[o != h, tau = 1] + P[o != l, tau = 1],
    # and the same pair for the previous close
    p_tau = roll_vals['tau']
    p_open = roll_vals['ind_o_h'] + roll_vals['ind_o_l']
    p_close = roll_vals['ind_c_h'] + roll_vals['ind_c_l']

    # Discard windows with fewer than two tau = 1 bars or a zero probability
    count_tau = vals['tau'].rolling(window=window_adj, **kwargs).sum()
    roll_vals[(count_tau < 2) | (p_open == 0) | (p_close == 0)] = np.nan

    # Coefficients of the moment conditions
    a1 = -4.0 / p_open
    a2 = -4.0 / p_close
    a3 = roll_vals['r1'] / p_tau
    a4 = roll_vals['tau_r4'] / p_tau
    a5 = roll_vals['r3'] / p_tau
    a6 = roll_vals['r5'] / p_tau
    a12 = 2 * a1 * a2
    a11 = a1 ** 2
    a22 = a2 ** 2
    a33 = a3 ** 2
    a55 = a5 ** 2
    a66 = a6 ** 2

    # Two estimates of S^2: E1 uses (r1, r2) and (r3, r4); E2 uses (r1, r5) and (r4, r5)
    E1 = a1 * (roll_vals['prod_12'] - a3 * roll_vals['tau_r2']) + \
        a2 * (roll_vals['prod_34'] - a4 * roll_vals['r3'])
    E2 = a1 * (roll_vals['prod_15'] - a3 * roll_vals['tau_r5_only']) + \
        a2 * (roll_vals['prod_45'] - a4 * roll_vals['r5'])

    # Variances of the two estimates, used for the weights w1 and w2
    V1 = - E1**2 + (
        a11 * (roll_vals['prod_12_sq'] - 2 * a3 * roll_vals['tau_r2_prod12'] + a33 * roll_vals['tau_r2_r2']) +
        a22 * (roll_vals['prod_34_sq'] - 2 * a5 * roll_vals['tau_r4_prod34'] + a55 * roll_vals['tau_r4_r4']) +
        a12 * (roll_vals['prod_12_34'] - a3 * roll_vals['tau_r2_prod34'] - a5 * roll_vals['tau_r4_prod12'] + a3 * a5 * roll_vals['tau_r2_r4'])
    )
    V2 = - E2**2 + (
        a11 * (roll_vals['prod_15_sq'] - 2 * a3 * roll_vals['tau_r5_prod15'] + a33 * roll_vals['tau_r5_r5']) +
        a22 * (roll_vals['prod_45_sq'] - 2 * a6 * roll_vals['tau_r4_prod45'] + a66 * roll_vals['tau_r4_r4']) +
        # Cross term: the a6 coefficient multiplies tau*r1*(r4*r5); tau*r4*r5 belongs only to the a3*a6 term
        a12 * (roll_vals['prod_15_45'] - a3 * roll_vals['tau_r5_prod45'] - a6 * roll_vals['tau_r1_prod45'] + a3 * a6 * roll_vals['tau_r4_r5'])
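    )

    # Weighted average w1*E1 + w2*E2; equal weights if the total variance is not positive
    tot_var = V1 + V2
    s2 = np.where(tot_var > 0, (V2 * E1 + V1 * E2) / tot_var, (E1 + E2) / 2.0)

    # Square root of |S^2|; with sign=True a negative S^2 gives a negative spread
    spread = np.sqrt(np.abs(s2))
    if sign:
        spread *= np.sign(s2)
    return pd.Series(spread, index=df.index)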


# Download data, switching to a fixed look-back period for intraday intervals
def download_data(ticker, start, end, interval="1d"):
    if interval in ["1d", "1wk", "1mo"]:
        data = yf.download(ticker, start=start, end=end, interval=interval, auto_adjust=True)
        return data
    else:
        # Yahoo Finance only serves a limited history for intraday bars, so these
        # intervals request the most recent period and ignore start/end
        period_mapping = {"1m": "8d", "5m": "60d", "60m": "300d"}
        if interval in period_mapping:
            period = period_mapping[interval]
            data = yf.download(ticker, period=period, interval=interval, auto_adjust=True)
            return data
        else:
            data = yf.download(ticker, start=start, end=end, interval=interval, auto_adjust=True)
            return data


# Run the analysis for each interval when the button is clicked
if run_analysis:
    start_date_str = pd.to_datetime(start_date).strftime("%Y-%m-%d")
    end_date_str = pd.to_datetime(end_date).strftime("%Y-%m-%d")
    intervals = ["1d", "60m", "5m", "1m"]
    for interval in intervals:
        st.markdown(f"## Spread Estimation ({interval} data)")
        with st.spinner(f"Downloading {interval} data..."):
            data = download_data(ticker, start_date_str, end_date_str, interval=interval)
        if data.empty:
            st.error(f"No data available for the {interval} interval.")
            continue
        # Newer yfinance versions may return (field, ticker) MultiIndex columns; keep the field level
        if isinstance(data.columns, pd.MultiIndex):
            data.columns = data.columns.get_level_values(0)
        try:
            ohlc = data[['Open', 'High', 'Low', 'Close']]
        except KeyError as e:
            st.error(f"Error processing data columns: {e}")
            continue
        try:
            rolling_spreads = edge_rolling(ohlc, window=15, min_periods=10, sign=False)
        except Exception as e:
            st.error(f"Error computing rolling spread: {e}")
            continue
        data_with_spread = data.copy()
        data_with_spread["Spread"] = rolling_spreads
        volume = data['Volume']
        returns = ohlc['Close'].pct_change()
        rolling_vol = returns.rolling(window=15).std()
        # Band of plus or minus half the estimated relative spread around the close
        upper_band = ohlc['Close'] * (1 + rolling_spreads / 2)
        lower_band = ohlc['Close'] * (1 - rolling_spreads / 2)
        # Create the main Plotly chart
        fig_ts = go.Figure()
        fig_ts.add_trace(go.Bar(
            x=ohlc.index, y=volume,
            name="Volume",
            marker_color="gray",
            opacity=1,
            yaxis="y"
        ))
        fig_ts.add_trace(go.Scatter(
            x=ohlc.index, y=ohlc['Close'],
            mode="lines",
            name="Close Price",
            line=dict(color="lime"),
            yaxis="y2"
        ))
        fig_ts.add_trace(go.Scatter(
            x=ohlc.index, y=lower_band,
            mode="lines",
            name="Lower Band",
            line=dict(color="rgba(0,0,0,0)"),
            showlegend=False,
            yaxis="y2"
        ))
        fig_ts.add_trace(go.Scatter(
            x=ohlc.index, y=upper_band,
            mode="lines",
            name="Spread Band",
            line=dict(color="gray"),
            fill="tonexty",
            fillcolor="rgba(128,128,128,0.3)",
            yaxis="y2"
        ))
        fig_ts.add_trace(go.Scatter(
            x=ohlc.index, y=rolling_vol,
            mode="lines",
            name="Rolling Volatility",
            line=dict(color="orange", dash="dash"),
            yaxis="y3"
        ))
        fig_ts.add_trace(go.Scatter(
            x=ohlc.index, y=rolling_spreads,
            mode="lines",
            name="Rolling Spread",
            line=dict(color="blue"),
            yaxis="y4"
        ))
        fig_ts.update_layout(
            template="plotly_dark",
            paper_bgcolor='#0e1117',
            plot_bgcolor='#0e1117',
            title=dict(text=f"Rolling Spread, Volume, Close Price, and Rolling Volatility ({interval} data)", font=dict(color="white")),
            xaxis=dict(
                title="Date",
                tickformat="%Y-%m-%d",
                nticks=20,
                showgrid=True,
                gridcolor="rgba(255,255,255,0.2)",
                color="white",
                tickfont=dict(color="white")
            ),
            yaxis=dict(
                title="Volume",
                side="left",
                showgrid=True,
                gridcolor="rgba(255,255,255,0.2)",
                color="white",
                tickfont=dict(color="white")
            ),
            yaxis2=dict(
                title="Close Price",
                overlaying="y",
                side="left",
                position=0.05,
                showgrid=True,
                gridcolor="rgba(255,255,255,0.2)",
                color="white",
                tickfont=dict(color="white")
            ),
            yaxis3=dict(
                title="Rolling Volatility",
                overlaying="y",
                side="right",
                position=0.95,
                showgrid=True,
                gridcolor="rgba(255,255,255,0.2)",
                color="white",
                tickfont=dict(color="white")
            ),
            yaxis4=dict(
                title="Rolling Spread",
                overlaying="y",
                side="right",
                position=1,
                showgrid=True,
                gridcolor="rgba(255,255,255,0.2)",
                color="white",
                tickfont=dict(color="white")
            ),
            legend=dict(
                orientation="h",
                yanchor="bottom",
                y=1.02,
                xanchor="right",
                x=1,
                font=dict(color="white")
            ),
            margin=dict(l=50, r=50, t=80, b=50)
        )
        st.plotly_chart(fig_ts, use_container_width=True)
        with st.expander(f"Additional Analysis for {interval}", expanded=False):
            st.write("This section shows a preview of the raw data, scatter plots of lagged relationships, and a rolling correlation chart.")
            st.subheader("Raw Data Preview")
            st.dataframe(data_with_spread, use_container_width=True)
            lag_period = 1
            lagged_spreads = rolling_spreads.shift(lag_period)
            lagged_volume = volume.shift(lag_period)
            lagged_vol = rolling_vol.shift(lag_period)
            titles = [
                f"Returns vs Lagged Spreads ({lag_period})",
                f"Spreads vs Lagged Volume ({lag_period})",
                f"Returns vs Lagged Volume ({lag_period})",
                f"Volatility vs Lagged Spreads ({lag_period})",
                f"Spreads vs Lagged Volatility ({lag_period})"
            ]
            # Each pair is (x = lagged series, y = current series)
            data_pairs = [
                (lagged_spreads, returns),
                (lagged_volume, rolling_spreads),
                (lagged_volume, returns),
                (lagged_spreads, rolling_vol),
                (lagged_vol, rolling_spreads)
            ]
            colors = ["blue", "red", "green", "orange", "magenta"]
            fig_scatter = make_subplots(
                rows=1, cols=5,
                shared_xaxes=False, horizontal_spacing=0.08,
                subplot_titles=titles
            )
            for i, (x, y) in enumerate(data_pairs, start=1):
                mask = (~x.isna()) & (~y.isna())
                fig_scatter.add_trace(go.Scatter(
                    x=x[mask], y=y[mask],
                    mode="markers",
                    marker=dict(color=colors[i-1], size=5),
                    name=titles[i-1]
                ), row=1, col=i)
            fig_scatter.update_layout(
                template="plotly_dark",
                paper_bgcolor='#0e1117',
                plot_bgcolor='#0e1117',
                title=dict(text=f"Lagged Value Analysis ({lag_period})", font=dict(color="white")),
                xaxis1=dict(gridcolor="rgba(255,255,255,0.2)", color="white", tickfont=dict(color="white")),
                xaxis2=dict(gridcolor="rgba(255,255,255,0.2)", color="white", tickfont=dict(color="white")),
                xaxis3=dict(gridcolor="rgba(255,255,255,0.2)", color="white", tickfont=dict(color="white")),
                xaxis4=dict(gridcolor="rgba(255,255,255,0.2)", color="white", tickfont=dict(color="white")),
                xaxis5=dict(gridcolor="rgba(255,255,255,0.2)", color="white", tickfont=dict(color="white")),
                yaxis=dict(gridcolor="rgba(255,255,255,0.2)", color="white", tickfont=dict(color="white")),
                legend=dict(font=dict(color="white"))
            )
            st.plotly_chart(fig_scatter, use_container_width=True)
            window_corr = 60
            lag_period_corr = 1
            lagged_volume_corr = volume.shift(lag_period_corr)
            lagged_vol_corr = rolling_vol.shift(lag_period_corr)
            rolling_corr_volume_spread = lagged_volume_corr.rolling(window=window_corr).corr(rolling_spreads)
            rolling_corr_volatility_spread = lagged_vol_corr.rolling(window=window_corr).corr(rolling_spreads)
            fig_corr = go.Figure()
            fig_corr.add_trace(go.Scatter(
                x=rolling_corr_volume_spread.index, y=rolling_corr_volume_spread,
                mode="lines",
                name=f"Rolling Corr (Lagged Volume vs Spread, lag={lag_period_corr}, window={window_corr})",
                line=dict(color="red")
            ))
            fig_corr.add_trace(go.Scatter(
                x=rolling_corr_volatility_spread.index, y=rolling_corr_volatility_spread,
                mode="lines",
                name=f"Rolling Corr (Lagged Volatility vs Spread, lag={lag_period_corr}, window={window_corr})",
                line=dict(color="orange")
            ))
            fig_corr.update_layout(
                template="plotly_dark",
                paper_bgcolor='#0e1117',
                plot_bgcolor='#0e1117',
                title=dict(text=f"Rolling Correlations (Window={window_corr}, Lag={lag_period_corr})", font=dict(color="white")),
                xaxis=dict(
                    title="Date",
                    tickformat="%Y-%m-%d",
                    nticks=20,
                    showgrid=True,
                    gridcolor="rgba(255,255,255,0.2)",
                    color="white",
                    tickfont=dict(color="white")
                ),
                yaxis=dict(
                    title="Correlation",
                    showgrid=True,
                    gridcolor="rgba(255,255,255,0.2)",
                    color="white",
                    tickfont=dict(color="white")
                ),
                legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1, font=dict(color="white")),
                margin=dict(l=50, r=50, t=80, b=50)
            )
            st.plotly_chart(fig_corr, use_container_width=True)

st.markdown(
    """
    <style>
    #MainMenu {visibility: hidden;}
    footer {visibility: hidden;}
    </style>
    """,
    unsafe_allow_html=True
)