Update README.md
README.md
CHANGED
|
@@ -5,46 +5,67 @@ datasets:
|
|
| 5 |
tags:
|
| 6 |
- finance
|
| 7 |
- fintech
|
| 8 |
---
|
| 9 |
|
| 10 |
-
|
| 11 |
|
| 12 |
-
Thales is a physics-informed neural network architecture designed for option pricing across complex volatility surfaces, including SABR and SVI models. Addressing the interpretability bottleneck of deep learning in quantitative finance, Thales integrates a Sparse Autoencoder (SAE) into its latent space. This structural design maps dense neural representations into disentangled, human-interpretable features, facilitating compliance with institutional risk management and model validation protocols.
|
| 13 |
|
| 14 |
-
#
|
| 15 |
|
| 16 |
-
|
| 17 |
|
| 18 |
-
|
| 19 |
-
* Latent Space Disentanglement: To address the opacity of deep neural networks, the embedded SAE translates dense activations into sparse, structurally meaningful representations, enabling explicit feature attribution.
|
| 20 |
-
* Computational Efficiency: By substituting computationally expensive PDE solvers and Monte Carlo simulations with a highly parallelizable forward pass, the model significantly reduces inference latency and the associated computational overhead of derivative pricing tasks.
|
| 21 |
|
| 22 |
-
##
|
| 23 |
|
| 24 |
-
Thales
|
| 25 |
|
| 26 |
-
*
|
| 27 |
-
*
|
| 28 |
-
*
|
| 29 |
-
* Gradient Stabilization: Global gradient clipping is implemented to mitigate explosive gradients, a common instability when regressing deep out-of-the-money (OTM) implied volatilities.
|
| 30 |
|
| 31 |
-
##
|
| 32 |
|
| 33 |
-
|
| 34 |
|
| 35 |
-
|
| 36 |
|
| 37 |
-
##
|
| 38 |
|
| 39 |
-
|
| 40 |
|
| 41 |
-
|
| 42 |
|
| 43 |

|
| 44 |
|
| 45 |
-
## 5. Usage
|
| 46 |
|
| 47 |
-
Dependencies
|
| 48 |
|
| 49 |
```python
|
| 50 |
import torch
|
|
@@ -66,15 +87,14 @@ with torch.no_grad():
|
|
| 66 |
|
| 67 |
print(f"Predicted Option Price: {price.item():.4f}")
|
| 68 |
print(f"Primary Active SAE Node: {torch.argmax(sae_acts).item()}")
|
| 69 |
-
|
| 70 |
```
|
| 71 |
|
| 72 |
-
## 6.
|
| 73 |
|
| 74 |
-
|
| 75 |
|
| 76 |
| Batch Size | Batch Latency (ms) | Per-Option Latency (μs) | Throughput (opts/sec) | SAE Neurons Sparsity |
|
| 77 |
-
| --- | --- | --- | --- | --- |
|
| 78 |
| 1 | 0.766 | 765.68 | 1,306 | 48.83% |
|
| 79 |
| 16 | 0.548 | 34.25 | 29,201 | 50.71% |
|
| 80 |
| 64 | 0.542 | 8.47 | 117,998 | 50.90% |
|
|
@@ -82,8 +102,20 @@ Evaluation conducted on an out-of-sample dataset of synthetic SABR/SVI surfaces.
|
|
| 82 |
| 1024 | 0.618 | 0.60 | 1,658,165 | 50.72% |
|
| 83 |
| 4096 | 1.577 | 0.39 | 2,596,696 | 50.68% |
|
| 84 |
|
| 85 |
-
|
| 86 |
|
| 87 |
-
|
| 88 |
|
| 89 |
-
|
| 5 |
tags:
|
| 6 |
- finance
|
| 7 |
- fintech
|
| 8 |
+
- sparse-autoencoders
|
| 9 |
+
- xai
|
| 10 |
---
|
| 11 |
|
| 12 |
+
<div class="container">
|
| 13 |
+
<div class="content"><img src="banner.png" /></div>
|
| 14 |
+
</div>
|
| 15 |
|
| 16 |
|
| 17 |
+
# Model Card for Thales
|
| 18 |
|
| 19 |
+
## Model Overview
|
| 20 |
|
| 21 |
+
Thales is a deep learning architecture developed for quantitative option pricing. It constrains deep neural representations using the fundamental partial differential equations (PDEs) of finance. To facilitate compliance with institutional model validation protocols, Thales projects dense latent activations into disentangled representations via a Sparse Autoencoder (SAE); these representations are then decoded into deterministic natural-language risk narratives by a fine-tuned Large Language Model (LLM).
|
| 22 |
|
| 23 |
+
## 1. Architecture Details
|
| 24 |
|
| 25 |
+
The Thales architecture addresses the trade-off between computational throughput and model interpretability through three structural paradigms:
|
| 26 |
|
| 27 |
+
* Physics-Informed Optimization: The training objective departs from standard empirical regression. The loss function is built around the governing PDEs of financial derivatives, which implicitly enforces consistency with market dynamics and Greek sensitivities.
|
| 28 |
+
* Latent Space Disentanglement: An embedded Sparse Autoencoder (SAE) acts as a bottleneck layer, regularizing dense network activations into discrete, attribution-friendly feature clusters.
|
| 29 |
+
* Forward-Pass Surrogate: The model replaces iterative numerical methods (e.g., finite-difference PDE solvers, Monte Carlo simulations) with a vectorized forward pass, reducing inference to a fixed sequence of matrix multiplications.
|
| 30 |
|
| 31 |
+
## 2. Training Methodology
|
| 32 |
|
| 33 |
+
Thales is trained using specific optimization strategies to maintain numerical stability across disparate moneyness and maturity regimes:
|
| 34 |
|
| 35 |
+
* Orthogonalized Momentum Optimization (Muon): Parameters are partitioned by tensor dimensionality. 2D parameters (convolutional and linear weights) are updated with Muon, which orthogonalizes each momentum update via Newton-Schulz iteration and helps constrain Lipschitz constants. 1D parameters (biases, normalization parameters) use standard AdamW.
|
| 36 |
+
* Sobolev Regularization: The objective function integrates a Sobolev loss penalty. Using exact automatic differentiation, the analytical gradient of the price with respect to the underlying asset ($\frac{\partial V}{\partial S}$, Delta) is computed dynamically. The loss incorporates Mean Squared Error (MSE) for pricing and Mean Absolute Error (MAE) for Delta, constraining the derivative manifold.
|
| 37 |
+
* Arbitrage-Free Constraints: Dynamic normalization is applied to the input parameters ($S, K, T, r$) in a way that preserves the computational graph, so gradients with respect to the raw inputs remain available. A Softplus terminal activation enforces strictly positive prices, ruling out the trivial arbitrage of a non-positive option price.
|
| 38 |
+
* Gradient Stabilization: Global gradient norm clipping is applied to prevent numerical divergence when regressing deep out-of-the-money (OTM) implied volatility gradients.
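The Sobolev penalty, Softplus output, and gradient clipping described above can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not the Thales training code: `TinyPricer`, `sobolev_loss`, the `delta_weight` parameter, the placeholder targets, and the convention that column 0 of the input holds the spot price $S$ are all hypothetical. AdamW stands in for the Muon/AdamW split to keep the example short.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)


class TinyPricer(nn.Module):
    """Hypothetical stand-in network; not the Thales architecture."""

    def __init__(self, n_inputs: int = 4, hidden: int = 32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(n_inputs, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Softplus terminal activation keeps predicted prices positive.
        return F.softplus(self.body(x)).squeeze(-1)


def sobolev_loss(model, x, target_price, target_delta, delta_weight=1.0):
    """MSE on the price plus MAE on Delta = dV/dS (assumed column 0 of x)."""
    x = x.clone().requires_grad_(True)
    price = model(x)
    # create_graph=True keeps the Delta term differentiable w.r.t. the weights.
    (grad_x,) = torch.autograd.grad(price.sum(), x, create_graph=True)
    delta = grad_x[:, 0]
    loss = F.mse_loss(price, target_price) + delta_weight * F.l1_loss(delta, target_delta)
    return loss, price, delta


model = TinyPricer()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Placeholder batch: random inputs (S, K, T, r) and random targets.
x = torch.rand(64, 4) + 0.5
target_price = torch.rand(64)
target_delta = torch.rand(64)

loss, price, delta = sobolev_loss(model, x, target_price, target_delta)
optimizer.zero_grad()
loss.backward()
# Global gradient-norm clipping across all parameters before the update.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```

A smooth hidden activation (SiLU here) matters for this kind of loss: the Delta term is itself differentiated during backpropagation, so the network needs useful second derivatives.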
|
| 39 |
|
| 40 |
+
## 3. Interpretability and Semantic Decoding
|
| 41 |
|
| 42 |
+
A documented limitation of deep parametric models in finance is the opacity of feature attribution. Thales applies a specific latent constraint methodology to generate auditable risk outputs.
|
| 43 |
|
| 44 |
+
### SNR-Bounded Sparsity Paradigm
|
| 45 |
+
Standard language model SAEs typically target >95% sparsity. Thales targets a "Dense-Sparse Hybrid" equilibrium, empirically maintained at ~50% structural sparsity. This hyperparameter is derived from the fundamentally low Signal-to-Noise Ratio (SNR) in financial micro-structures.
|
| 46 |
+
|
| 47 |
+
Macro-level absolute variance accounts for the majority of spatial energy, whereas local geometry (skewness, curvature) exhibits low-amplitude signals. Enforcing extreme sparsity induces manifold collapse, degrading Greek precision. The ~50% sparsity threshold acts as a low-pass structural filter, preserving baseline volatility representations while discretizing Moneyness deformations.
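A minimal sketch of how a sparsity figure like the ~50% above could be measured, assuming sparsity means the fraction of SAE latent units that are inactive (zero after a ReLU) for a given input. The source does not state the exact definition, so `relu`, `sparsity`, and the toy pre-activations below are illustrative assumptions.

```python
import random


def relu(values):
    """Element-wise ReLU over a plain list of floats."""
    return [max(0.0, v) for v in values]


def sparsity(activations, threshold=0.0):
    """Fraction of units whose activation is at or below the threshold."""
    inactive = sum(1 for a in activations if a <= threshold)
    return inactive / len(activations)


random.seed(0)
# Toy pre-activations: zero-mean noise, so roughly half the units end up inactive.
pre_acts = [random.gauss(0.0, 1.0) for _ in range(4096)]
acts = relu(pre_acts)
print(f"inactive fraction: {sparsity(acts):.2%}")
```

Under this definition, a >95% target would leave fewer than 1 in 20 units active per input, while a ~50% target leaves about half of them active.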
|
| 48 |
+
|
| 49 |
+
### Semantic LLM Decoding
|
| 50 |
+
To transition from numerical latent vectors to auditable risk reports, Thales employs a post-trained Large Language Model (fine-tuned on the `Thales_Instruction_Dataset`) as a deterministic decoder. High-dimensional SAE activation states are projected into the LLM context window. The system maps specific mathematical activation vectors to structural risk diagnostics.
|
| 51 |
+
|
| 52 |
+
*Example Output:*
|
| 53 |
+
> `SAE cluster [42, 118, 503] active. Indicator: Short-term OTM put skewness expansion corresponding to a SABR Rho parameter contraction. ATM baseline volatility remains static.`
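One way to picture the first step of this decoding is turning an SAE activation vector into the list of active node indices that opens the report. The sketch below is an assumption for illustration only: the helper names, the top-k selection rule, and the prompt wording are hypothetical and are not taken from the Thales code.

```python
def active_cluster(sae_acts, k=3, min_activation=0.0):
    """Indices of the k strongest active SAE nodes, returned in ascending order."""
    ranked = sorted(
        (i for i, a in enumerate(sae_acts) if a > min_activation),
        key=lambda i: sae_acts[i],
        reverse=True,
    )
    return sorted(ranked[:k])


def build_prompt(cluster):
    """Hypothetical prompt header handed to the decoding LLM."""
    return f"SAE cluster {cluster} active. Describe the associated risk indicator."


# Toy activation vector with three active nodes.
acts = [0.0] * 512
acts[42], acts[118], acts[503] = 0.9, 0.7, 0.5

cluster = active_cluster(acts)
print(build_prompt(cluster))
```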
|
| 54 |
+
|
| 55 |
+
## 4. Environmental, Social, and Governance (ESG) & Carbon Footprint
|
| 56 |
+
|
| 57 |
+
The deployment of quantitative models presents significant computational sustainability challenges. Traditional risk management requires re-evaluating massive portfolios using grid-based PDE solvers or large-scale Monte Carlo methods, yielding substantial Scope 3 emissions via data center power consumption.
|
| 58 |
+
|
| 59 |
+
Thales systematically front-loads computational cost to the training phase (which is a one-time carbon expenditure), serving as a sustainable surrogate model during inference.
|
| 60 |
+
|
| 61 |
+
* Inference Carbon Intensity: Monte Carlo pricing costs $O(N \times \text{paths})$ for $N$ options, whereas Thales prices a batch with a fixed number of tensor multiplications, independent of any path count. This reduces the Joules-per-inference metric by orders of magnitude compared to traditional CPU-bound risk engines.
|
| 62 |
+
* Hardware Efficiency: The architecture enables high GPU utilization (exceeding 2.5 million options per second at high batch regimes). This parallelization density allows financial institutions to downscale the physical server footprint required for overnight risk scenario generation (e.g., CCAR, FRTB compliance), directly contributing to institutional Net Zero and carbon neutrality mandates.
|
| 63 |
|
| 64 |

|
| 65 |
|
| 66 |
+
## 5. Implementation & Usage
|
| 67 |
|
| 68 |
+
Dependencies: `torch`, `safetensors`, `huggingface_hub`.
|
| 69 |
|
| 70 |
```python
|
| 71 |
import torch
|
|
|
|
| 87 |
|
| 88 |
print(f"Predicted Option Price: {price.item():.4f}")
|
| 89 |
print(f"Primary Active SAE Node: {torch.argmax(sae_acts).item()}")
|
|
|
|
| 90 |
```
|
| 91 |
|
| 92 |
+
## 6. Evaluation and Benchmarks
|
| 93 |
|
| 94 |
+
Benchmarking was conducted on an out-of-sample synthetically generated dataset comprising SABR and SVI surfaces. Measurements were taken in an isolated NVIDIA CUDA environment (fp32 precision).
|
| 95 |
|
| 96 |
| Batch Size | Batch Latency (ms) | Per-Option Latency (μs) | Throughput (opts/sec) | SAE Neurons Sparsity |
|
| 97 |
+
| :--- | :--- | :--- | :--- | :--- |
|
| 98 |
| 1 | 0.766 | 765.68 | 1,306 | 48.83% |
|
| 99 |
| 16 | 0.548 | 34.25 | 29,201 | 50.71% |
|
| 100 |
| 64 | 0.542 | 8.47 | 117,998 | 50.90% |
|
|
|
|
| 102 |
| 1024 | 0.618 | 0.60 | 1,658,165 | 50.72% |
|
| 103 |
| 4096 | 1.577 | 0.39 | 2,596,696 | 50.68% |
|
| 104 |
|
| 105 |
+
The benchmark shows sub-linear latency scaling with batch size. At batch size 1 ($N=1$), latency (~0.77 ms) is dominated by fixed per-call overhead such as CUDA kernel launches; batch latency is about the same at batch sizes 16, 64, and 1024 (0.54 to 0.62 ms). At the largest tested batch size of 4096, the system reaches a throughput of ~2.60M options/second, with per-option latency falling to 0.39 μs. Across batch sizes, SAE sparsity stays close to 50% (48.83% at batch size 1, 50.68% at 4096), indicating that the latent representation is stable irrespective of computational load.
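The per-option latency and throughput columns follow directly from batch size and batch latency. The short script below recomputes them from the rounded latencies shown in the table; `derive` is a helper name introduced here for illustration. Because the inputs are rounded to three decimals, the recomputed values can differ slightly from the published figures.

```python
# (batch size, batch latency in ms) as listed in the benchmark table.
rows = [(1, 0.766), (16, 0.548), (64, 0.542), (1024, 0.618), (4096, 1.577)]


def derive(batch_size, batch_latency_ms):
    """Return (per-option latency in microseconds, throughput in options/second)."""
    per_option_us = batch_latency_ms * 1000.0 / batch_size
    throughput = batch_size / (batch_latency_ms / 1000.0)
    return per_option_us, throughput


for batch_size, latency_ms in rows:
    per_option_us, throughput = derive(batch_size, latency_ms)
    print(f"{batch_size:>5d}  {latency_ms:.3f} ms  "
          f"{per_option_us:8.2f} us/option  {throughput:12,.0f} options/s")
```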
|
| 106 |
+
|
| 107 |
+
## 7. Limitations and Citation
|
| 108 |
+
|
| 109 |
+
Thales is constrained by the distribution of its training data. Its surrogate accuracy degrades gracefully when extrapolating beyond the volatility surface parameter boundaries observed during training. The current iteration does not explicitly model jump-diffusion processes or discrete dividend schedules.
|
| 110 |
|
| 111 |
+
If utilizing the Thales architecture in published research or enterprise environments, please cite:
|
| 112 |
|
| 113 |
+
```bibtex
|
| 114 |
+
@misc{thales2026,
|
| 115 |
+
author = {Chunjiang Intelligence},
|
| 116 |
+
title = {Thales: Interpretable and Physics-Informed Deep Learning for Quantitative Option Pricing},
|
| 117 |
+
year = {2026},
|
| 118 |
+
publisher = {Hugging Face},
|
| 119 |
+
howpublished = {\url{https://huggingface.co/Chunjiang-Intelligence/Thales}}
|
| 120 |
+
}
|
| 121 |
+
```
|