AJAY KASU committed on
Commit 27279c5 · 1 Parent(s): 76700b5

Add 4-part Medium article series on ArbIntel quantitative strategies

docs/articles/part1_architecture.md ADDED
@@ -0,0 +1,30 @@
+ # ArbIntel Part 1: Architecture of a Production-Grade Prediction Market Engine
+
+ Prediction markets like Polymarket and Kalshi have emerged as effective forecasting mechanisms. However, their microstructure presents distinct challenges: high latency, sparse order books, and fragmented liquidity. To exploit the pricing inefficiencies these create, we built **ArbIntel**, a production-grade algorithmic trading system.
+
+ In this first part of a four-part series, we'll dive into the architecture of ArbIntel.
+
+ ## The Tri-Layer Architecture
+
+ A robust trading system requires strict separation of concerns. ArbIntel employs a tri-layer design:
+
+ 1. **Data Ingestion Layer**:
+ The foundation of any quantitative model is clean data. ArbIntel establishes persistent async WebSocket connections to both the Polymarket Gamma API and the Kalshi v2 trade API. Real-time tick updates (Bids, Asks, Sizes) are normalized into implied probabilities (0.0 to 1.0) and persisted to a local TimescaleDB instance for tick-level backtesting.
+
+ 2. **Strategy Execution Layer**:
+ This is where the alpha is generated. Raw ticks are fed through multiple independent engines:
+ * *Cross-Platform Arbitrage Scanner*: Continuously monitors the spread between Polymarket and Kalshi.
+ * *Bayesian Fair Value Model*: Estimates the "true" probability of an event using continuous Beta distribution updates.
+ * *NLP Momentum Tracker*: Analyzes breaking news and social sentiment to front-run retail flows.
+
+ 3. **Execution & Risk Layer**:
+ Alpha is useless without execution. ArbIntel utilizes a Paper Trading framework enforcing rigorous risk constraints, including maximum position sizing and daily drawdown limits.
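The normalization step in the Data Ingestion Layer can be sketched roughly as follows. This is a minimal illustration, not ArbIntel's actual code: `NormalizedTick`, `normalize_tick`, and the `price_scale` parameter are hypothetical names, and the assumption that one venue quotes in cents while the other quotes in dollars is ours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizedTick:
    """One top-of-book update with prices expressed as implied probabilities."""
    venue: str
    market_id: str
    bid: float       # implied probability in [0.0, 1.0]
    ask: float       # implied probability in [0.0, 1.0]
    bid_size: float  # contracts resting at the bid
    ask_size: float  # contracts resting at the ask

    @property
    def mid(self) -> float:
        return (self.bid + self.ask) / 2.0


def normalize_tick(venue, market_id, bid, ask, bid_size, ask_size,
                   price_scale=1.0):
    """Convert a raw quote to implied probabilities.

    price_scale is the divisor that maps the venue's raw price to [0, 1]
    (assumption: 100.0 for a venue quoting in cents, 1.0 for dollars).
    """
    bid_p = bid / price_scale
    ask_p = ask / price_scale
    # Reject crossed or out-of-range quotes before they reach the database.
    if not (0.0 <= bid_p <= ask_p <= 1.0):
        raise ValueError(f"bad quote from {venue}: bid={bid}, ask={ask}")
    return NormalizedTick(venue, market_id, bid_p, ask_p,
                          float(bid_size), float(ask_size))


# Hypothetical quotes for the same event on two venues.
poly = normalize_tick("polymarket", "fed-sep", 0.50, 0.52, 1200, 800)
kalshi = normalize_tick("kalshi", "fed-sep", 55, 57, 500, 300, price_scale=100.0)
```

Once both venues share one probability scale, every downstream engine can compare quotes directly without knowing where they came from.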
+
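As a rough sketch of what the risk constraints in the Execution & Risk Layer might look like, the gate below rejects any paper order that would breach a per-market position cap or that arrives after the daily loss limit is hit. `RiskLimits`, `PaperRiskGate`, and the default limits are hypothetical, and "daily drawdown" is simplified here to cumulative realized loss for the day.

```python
from dataclasses import dataclass


@dataclass
class RiskLimits:
    max_position: float = 1000.0       # contracts per market (hypothetical value)
    max_daily_drawdown: float = 250.0  # dollars of realized loss per day (hypothetical)


class PaperRiskGate:
    """Pre-trade checks for a paper-trading loop."""

    def __init__(self, limits: RiskLimits):
        self.limits = limits
        self.positions = {}   # market_id -> signed contract count
        self.daily_pnl = 0.0  # realized PnL since the start of the day

    def approve(self, market_id: str, signed_qty: float) -> bool:
        # Stop trading for the day once the loss limit is reached.
        if self.daily_pnl <= -self.limits.max_daily_drawdown:
            return False
        # Reject orders that would push the position past the cap.
        new_position = self.positions.get(market_id, 0.0) + signed_qty
        return abs(new_position) <= self.limits.max_position

    def record_fill(self, market_id: str, signed_qty: float) -> None:
        self.positions[market_id] = self.positions.get(market_id, 0.0) + signed_qty

    def record_pnl(self, realized_pnl: float) -> None:
        self.daily_pnl += realized_pnl


gate = PaperRiskGate(RiskLimits(max_position=100, max_daily_drawdown=50))
if gate.approve("fed-sep", 60):
    gate.record_fill("fed-sep", 60)
```

Keeping the checks in one object that every strategy must pass through is one way to enforce the separation of concerns described above.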
+ ## Technical Stack Overview
+
+ * **Language**: Python 3.13 (asyncio-heavy)
+ * **Database**: PostgreSQL + TimescaleDB extension
+ * **Machine Learning**: `hmmlearn` (Regime Detection), Hugging Face `transformers` (Sentiment)
+ * **Dashboard**: Streamlit + Plotly
+
+ In **Part 2**, we will explore the mathematics behind our Bayesian Fair-Value Model and how we use 1D Kalman Filters to smooth out the noise of illiquid order books.
docs/articles/part2_bayesian_fv.md ADDED
@@ -0,0 +1,27 @@
+ # ArbIntel Part 2: Smoothing the Noise with Bayesian Statistics and Kalman Filters
+
+ In Part 1, we established ArbIntel's unified data pipeline. However, prediction markets are notoriously noisy. A single retail trader's market order on Polymarket can sweep an illiquid order book, creating a temporary price spike that doesn't reflect a true change in the underlying event's probability.
+
+ How do we differentiate between random bid-ask bounce and genuine new information? The answer lies in state-space smoothing and Bayesian updating.
+
+ ## The 1D Kalman Filter
+
+ A Kalman filter is a recursive estimator: it infers the true state of a system (here, the "Fair Value" of a contract) from a series of incomplete and noisy measurements (the traded prices).
+
+ In ArbIntel, we use a 1D implementation:
+ * **State Estimate ($x_t$)**: Our current belief of the true probability.
+ * **Process Variance ($Q$)**: How fast we think the true probability fundamentally changes over time (low for stable markets, high for volatile ones).
+ * **Measurement Variance ($R$)**: The estimated noise in the exchange's order book.
+
+ When a new trade occurs, the Kalman Filter calculates the Kalman Gain ($K$), which determines how much weight to give the new market print relative to our existing belief. When $R$ is large relative to the uncertainty in our state estimate, $K$ is small, so ArbIntel heavily discounts short-term liquidity sweeps instead of chasing them.
+
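A minimal sketch of a 1D Kalman filter with a random-walk state model, using the $x_t$, $Q$, and $R$ defined above. The class name `Kalman1D`, the error-variance attribute `p`, and the parameter values are illustrative assumptions rather than ArbIntel's actual implementation.

```python
class Kalman1D:
    """Scalar Kalman filter assuming the true probability follows a random walk."""

    def __init__(self, x0: float, p0: float = 1.0, q: float = 1e-5, r: float = 1e-2):
        self.x = x0  # state estimate x_t (fair-value probability)
        self.p = p0  # variance of the state estimate
        self.q = q   # process variance Q
        self.r = r   # measurement variance R

    def update(self, z: float) -> float:
        # Predict: the state is unchanged, but uncertainty grows by Q.
        p_pred = self.p + self.q
        # Kalman gain: share of the surprise (z - x) that we accept.
        k = p_pred / (p_pred + self.r)
        # Correct the estimate and shrink its variance.
        self.x = self.x + k * (z - self.x)
        self.p = (1.0 - k) * p_pred
        return self.x


# A confident filter on a noisy book barely reacts to a 20-cent spike.
kf = Kalman1D(x0=0.50, p0=1e-4, q=1e-6, r=1e-2)
smoothed = kf.update(0.70)
```

With these illustrative settings the gain is about 0.01, so a print at 0.70 moves the estimate only slightly above 0.50. Raising `q` or lowering `r` makes the filter follow prints more closely.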
+ ## Bayesian Fair Value
+
+ While the Kalman filter is great for smoothing, it doesn't establish an intrinsic "Fair Value." For this, we model our belief about the probability of an event resolving "YES" as a Beta distribution, $Beta(\alpha, \beta)$.
+
+ 1. **The Prior**: We establish base rates (e.g., incumbents win elections 65% of the time $\rightarrow Beta(65, 35)$).
+ 2. **The Update**: When trades occur on Polymarket, we adjust our $\alpha$ and $\beta$ parameters by an amount scaled by the trade volume. High-volume trades shift our belief significantly; low-volume trades are treated as noise.
+
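The article does not spell out the exact update rule, so the sketch below shows one plausible volume-weighted scheme: a trade at price $p$ with volume $v$ counts as $v / \text{volume\_scale}$ pseudo-observations, split between $\alpha$ and $\beta$ in proportion to $p$ and $1 - p$. The class `BetaFairValue` and the `volume_scale` parameter are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class BetaFairValue:
    alpha: float
    beta: float
    volume_scale: float = 1000.0  # contracts per pseudo-observation (hypothetical)

    @property
    def mean(self) -> float:
        # Posterior mean of Beta(alpha, beta).
        return self.alpha / (self.alpha + self.beta)

    def update(self, price: float, volume: float) -> float:
        # Weight the trade by its size, then split that weight between
        # "YES" evidence (alpha) and "NO" evidence (beta).
        weight = volume / self.volume_scale
        self.alpha += weight * price
        self.beta += weight * (1.0 - price)
        return self.mean


fv = BetaFairValue(alpha=65.0, beta=35.0)      # prior from the 65% base rate
after_small = fv.update(price=0.80, volume=100)      # barely moves the mean
after_large = fv.update(price=0.80, volume=50_000)   # moves it noticeably
```

Under these assumed numbers the small trade nudges the mean from 0.65 to about 0.6502, while the large trade pulls it to roughly 0.70, which matches the intended behavior of treating thin prints as noise.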
+ By combining the structural rigidity of a Kalman Filter with the probabilistic updates of a Bayesian model, ArbIntel refuses to be faked out by market microstructure dynamics.
+
+ In **Part 3**, we'll discuss the easiest money in prediction markets: Cross-Platform Arbitrage.
docs/articles/part3_arbitrage.md ADDED
@@ -0,0 +1,30 @@
+ # ArbIntel Part 3: Exploiting Cross-Platform Arbitrage
+
+ The holy grail of quantitative trading is arbitrage: the simultaneous purchase and sale of an asset to profit from an imbalance in the price.
+
+ Because prediction markets are currently fragmented, with heavy US regulatory restrictions forcing Kalshi to operate differently than Polymarket, liquidity is isolated. Where liquidity is isolated, prices decouple.
+
+ ## The Anatomy of an Arb Trade
+
+ Consider the market: *"Will the Federal Reserve cut rates by 25bps in September?"*
+
+ Let's assume the current order books look like this:
+ * **Polymarket**: Bid \$0.50 | Ask \$0.52
+ * **Kalshi**: Bid \$0.55 | Ask \$0.57
+
+ The inefficiency is screaming at us. ArbIntel detects this instantly via its WebSocket listeners.
+ 1. **The Buy**: We buy the "YES" contract on Polymarket at the Ask price of \$0.52.
+ 2. **The Sell**: We simultaneously sell the "YES" contract (or, equivalently, buy "NO" at \$0.45) on Kalshi at the Bid price of \$0.55.
+
+ Our gross margin is $0.55 - 0.52 = \$0.03$ per contract. Holding YES on one venue and NO on the other costs \$0.52 + \$0.45 = \$0.97 and pays \$1.00 however the event resolves, a gross return of roughly 3.1% on capital. That return is only locked in if both legs fill and both venues resolve the market identically.
+
+ ## Overcoming Edge Friction
+
+ Gross margin is vanity; net margin is sanity. ArbIntel's `CrossPlatformArbitrage` module calculates true edge by accounting for two major frictions:
+
+ 1. **Fees**: Polymarket historically has 0% taker fees, but interacting with the Polygon blockchain incurs gas costs. Kalshi utilizes a sliding fee schedule capped at a certain dollar amount. The scanner dynamically deducts the estimated fees from the gross margin before executing.
+ 2. **Slippage & Depth**: We cannot arb 10,000 contracts if the Kalshi bid size is only 500 contracts. The module computes the maximum executable size as `min(Polymarket_Ask_Size, Kalshi_Bid_Size)`.
+
+ If the net margin after fees remains above our predefined minimum-edge threshold (e.g., 0.5%), ArbIntel fires off simultaneous Execution API requests.
+
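The net-edge check described above might look something like this. It is a sketch under stated assumptions: `evaluate_arb` and `ArbQuote` are hypothetical names (not the `CrossPlatformArbitrage` API), fees are collapsed into a flat per-contract estimate plus a fixed gas cost, and the threshold is applied to net return on committed capital.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArbQuote:
    size: float          # maximum executable contracts
    gross_margin: float  # dollars per contract before costs
    net_margin: float    # dollars per contract after costs
    net_return: float    # net margin divided by capital per contract
    tradable: bool


def evaluate_arb(poly_ask, kalshi_bid, poly_ask_size, kalshi_bid_size,
                 fee_per_contract=0.0, gas_cost=0.0,
                 min_edge=0.005) -> Optional[ArbQuote]:
    # Depth: we can only trade what both books can absorb at these prices.
    size = min(poly_ask_size, kalshi_bid_size)
    # Capital per pair: YES on Polymarket plus NO on Kalshi (1 - YES bid).
    capital = poly_ask + (1.0 - kalshi_bid)
    if size <= 0 or capital <= 0:
        return None
    gross = kalshi_bid - poly_ask
    # Spread the fixed gas cost across every contract in the trade.
    net = gross - fee_per_contract - gas_cost / size
    net_return = net / capital
    return ArbQuote(size, gross, net, net_return, net_return > min_edge)


# The Fed example, with illustrative (not actual) fee and gas numbers.
quote = evaluate_arb(0.52, 0.55, poly_ask_size=10_000, kalshi_bid_size=500,
                     fee_per_contract=0.01, gas_cost=0.50)
```

With these assumed costs, the \$0.03 gross margin shrinks to about \$0.019 per contract on 500 contracts, which still clears a 0.5% threshold. A fee estimate of \$0.03 per contract would erase the edge entirely.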
+ In the final post, **Part 4**, we'll explore how ArbIntel uses Hugging Face NLP to trade breaking news momentum before human traders can react.
docs/articles/part4_nlp_momentum.md ADDED
@@ -0,0 +1,31 @@
+ # ArbIntel Part 4: NLP and News Momentum Trading
+
+ When breaking news hits, prediction markets move violently. For example, if a presidential candidate drops out of a race, their "YES" contracts crash toward \$0.00 almost instantly, while their opponents' contracts surge. The first machine to parse this information wins.
+
+ In the final part of our series, we discuss how ArbIntel uses the Hugging Face ecosystem to build a low-latency sentiment momentum trader.
+
+ ## Transformers for Alpha
+
+ Financial text is tricky. A phrase like "The Fed lowered rates" is lexically neutral, but the market implication is strongly positive for equities and prediction markets tracking economic relief. General-purpose, lexicon-based sentiment models (like NLTK's VADER) often fail here.
+
+ ArbIntel leverages `ProsusAI/finbert`, a domain-specific BERT model further pre-trained on a financial news corpus and fine-tuned for sentiment classification on the Financial PhraseBank dataset.
+
+ 1. **Ingestion**: ArbIntel monitors social streams (like X/Twitter) filtered by mapped market keywords (e.g., "FOMC", "Powell").
+ 2. **Processing**: The text goes through a regex preprocessor to strip URLs and mentions.
+ 3. **Inference**: The clean string hits the FinBERT classifier, which returns probabilities for three labels (positive, negative, neutral). We collapse these into a continuous score from -1.0 (Bearish) to +1.0 (Bullish), for example as $P(positive) - P(negative)$.
+
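The processing and inference steps above could be sketched as follows. The regexes, the function names, and the choice to score text as P(positive) minus P(negative) are illustrative assumptions. The `transformers` import is deferred so the pure helpers can be used and tested without downloading the model.

```python
import re

_URL = re.compile(r"https?://\S+")
_MENTION = re.compile(r"@\w+")


def clean_text(text: str) -> str:
    """Strip URLs and @mentions, then collapse runs of whitespace."""
    text = _URL.sub("", text)
    text = _MENTION.sub("", text)
    return " ".join(text.split())


def signed_score(label_probs: dict) -> float:
    """Collapse three-class probabilities into one score in [-1.0, +1.0]."""
    return label_probs.get("positive", 0.0) - label_probs.get("negative", 0.0)


def finbert_score(text: str) -> float:
    """Score one post with ProsusAI/finbert (downloads the model on first use)."""
    from transformers import pipeline  # deferred: heavy optional dependency

    # In a real service, build the pipeline once and reuse it across calls.
    classifier = pipeline("text-classification", model="ProsusAI/finbert",
                          top_k=None)  # top_k=None returns all label scores
    scores = classifier([clean_text(text)])[0]
    return signed_score({s["label"]: s["score"] for s in scores})


cleaned = clean_text("BREAKING: Powell signals cuts https://t.co/abc @federalreserve")
example = signed_score({"positive": 0.80, "negative": 0.05, "neutral": 0.15})
```

Subtracting the negative probability from the positive one keeps neutral headlines near zero, so they fall below any reasonable trading threshold.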
+ ## The Volume Confirmation Trigger
+
+ Trading blindly off sentiment is a recipe for disaster. We need confirmation that the market is actually reacting to the news.
+
+ ArbIntel utilizes a dual-signal trigger inside `NewsReactionMomentum`:
+ 1. **Sentiment Signal**: The aggregated FinBERT score surpasses our predefined confidence threshold (e.g., $|score| > 0.5$).
+ 2. **Volume Confirmation**: The current contract volume reaches at least $3\times$ its moving average over the last 60 minutes.
+
+ If and only if both conditions are met, an execution signal fires. The system automatically calculates dynamic Stop-Loss and Take-Profit limits and initiates a position in the direction of the news.
+
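A minimal sketch of the dual-signal logic, assuming volume arrives as one total per minute. `DualSignalTrigger` is a hypothetical stand-in, not the actual `NewsReactionMomentum` class, and it omits the Stop-Loss and Take-Profit calculation.

```python
from collections import deque


class DualSignalTrigger:
    """Fire only when sentiment and a volume spike agree."""

    def __init__(self, score_threshold: float = 0.5,
                 volume_multiple: float = 3.0, window: int = 60):
        self.score_threshold = score_threshold
        self.volume_multiple = volume_multiple
        self.volumes = deque(maxlen=window)  # trailing per-minute volumes

    def on_minute(self, volume: float, sentiment_score: float) -> int:
        """Return +1 (buy), -1 (sell), or 0 (no trade) for this minute."""
        signal = 0
        # Require a full window so the moving average is meaningful.
        if len(self.volumes) == self.volumes.maxlen:
            average = sum(self.volumes) / len(self.volumes)
            sentiment_ok = abs(sentiment_score) > self.score_threshold
            volume_ok = average > 0 and volume >= self.volume_multiple * average
            if sentiment_ok and volume_ok:
                signal = 1 if sentiment_score > 0 else -1
        # Add the current minute only after comparing it to prior history.
        self.volumes.append(volume)
        return signal


trigger = DualSignalTrigger(window=3)          # short window for illustration
warmup = [trigger.on_minute(100, 0.9) for _ in range(3)]
fired = trigger.on_minute(350, 0.8)            # 3.5x average, bullish score
```

Comparing the current minute against the average of prior minutes only, before appending it, keeps the spike itself from inflating its own baseline.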
+ ## Conclusion
+
+ ArbIntel suggests that prediction market microstructure is ripe for exploitation. By combining high-speed WebSocket listeners with robust statistical models (Kalman Filters, HMM Regime Tracking) and deep learning (FinBERT), we built a system designed for competitive quantitative environments.
+
+ *Thank you for reading the ArbIntel series! Code is deployed locally and heavily guarded. For insights, questions, or recruiting, connect below.*