Safetensors
thales_quant
finance
fintech
sparse-autoencoders
xai
imbue2025 committed on
Commit
62358f2
·
verified ·
1 Parent(s): 7c5e9d2

Update README.md

  1. README.md +60 -28
README.md CHANGED
@@ -5,46 +5,67 @@ datasets:
5
  tags:
6
  - finance
7
  - fintech
 
 
8
  ---
9
 
10
- # Thales: Interpretable and Physics-Informed Deep Learning for Quantitative Option Pricing
 
 
11
 
12
- Thales is a physics-informed neural network architecture designed for option pricing across complex volatility surfaces, including SABR and SVI models. Addressing the interpretability bottleneck of deep learning in quantitative finance, Thales integrates a Sparse Autoencoder (SAE) into its latent space. This structural design maps dense neural representations into disentangled, human-interpretable features, facilitating compliance with institutional risk management and model validation protocols.
13
 
14
- ## 1. Architecture and Objectives
15
 
16
- Quantitative modeling in finance requires rigorous adherence to stability, interpretability, and computational efficiency. The Thales architecture is structured around three core principles:
17
 
18
- * Physics-Informed Objective: The model does not merely regress empirical prices; it optimizes for the fundamental partial differential equations (PDEs) governing financial derivatives. By fitting the Greeks, the model inherently adheres to market dynamics.
19
- * Latent Space Disentanglement: To address the opacity of deep neural networks, the embedded SAE translates dense activations into sparse, structurally meaningful representations, enabling explicit feature attribution.
20
- * Computational Efficiency: By substituting computationally expensive PDE solvers and Monte Carlo simulations with a highly parallelizable forward pass, the model significantly reduces inference latency and the associated computational overhead of derivative pricing tasks.
21
 
22
- ## 2. Methodology and Optimization
23
 
24
- Thales utilizes specific optimization strategies and architectural constraints to ensure numerical stability and arbitrage-free pricing:
25
 
26
- * Orthogonalized Momentum Optimization (Muon): The parameter space is partitioned to optimize training dynamics. Convolutional and linear weight matrices (2D parameters) are optimized via Newton-Schulz iteration (Muon) to maintain orthogonal weight updates, enhancing generalization. Biases and normalization layers (1D parameters) are updated using standard AdamW.
27
- * Sobolev Regularization: Utilizing automatic differentiation (`torch.autograd.grad`), the analytical derivative of the price with respect to the underlying asset ($S$) is computed dynamically during the forward pass. The objective function incorporates both mean squared error for pricing and mean absolute error for Delta (Sobolev Loss), constraining the model to learn the underlying pricing dynamics rather than merely interpolating data points.
28
- * Arbitrage-Free Constraints: Input variables ($S, K, T, r$) are dynamically normalized while preserving the computational graph. A Softplus activation function applied at the terminal layer enforces strictly positive option prices, structurally eliminating trivial arbitrage violations.
29
- * Gradient Stabilization: Global gradient clipping is implemented to mitigate explosive gradients, a common instability when regressing deep out-of-the-money (OTM) implied volatilities.
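The Newton-Schulz step named in the Muon bullet can be illustrated with a minimal sketch. It uses the classic cubic iteration $X \leftarrow 1.5X - 0.5XX^\top X$, which pushes the singular values of a normalized matrix toward 1. Muon implementations typically use a tuned polynomial variant with only a few steps, so this illustrates the idea and is not the optimizer used to train Thales. The function name `newton_schulz_orthogonalize`, the step count, and the matrix shape are assumptions made for the example.

```python
import torch


def newton_schulz_orthogonalize(M: torch.Tensor, steps: int = 30) -> torch.Tensor:
    """Approximate the nearest (semi-)orthogonal matrix to M (hypothetical helper)."""
    # Dividing by the Frobenius norm puts every singular value in (0, 1],
    # which lies inside the convergence region of the cubic iteration.
    X = M / torch.linalg.matrix_norm(M)
    for _ in range(steps):
        # Each singular value s is mapped to 1.5*s - 0.5*s**3, which converges to 1.
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X


torch.manual_seed(0)
G = torch.randn(4, 16, dtype=torch.float64)  # stand-in for a momentum/gradient matrix
Q = newton_schulz_orthogonalize(G)

# For a wide matrix the rows become orthonormal, so Q @ Q.T should be close to I.
err = (Q @ Q.T - torch.eye(4, dtype=torch.float64)).abs().max().item()
print(f"max deviation from identity: {err:.2e}")
```

In a Muon-style update, the orthogonalized matrix would stand in for the raw momentum as the update direction of each 2D weight, while 1D parameters would be handed to AdamW.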
30
 
31
- ## 3. Latent Space Interpretability via Sparse Autoencoders
32
 
33
- A primary challenge in financial deep learning is the opacity of feature extraction. Thales addresses this by applying a sparsity constraint on the latent representations extracted from the convolutional surface layers.
34
 
35
- By bottlenecking these features through an SAE, the network is constrained to activate a minimal set of nodes for any given market regime. This sparsity enables a direct mapping between mathematical market states and discrete logic. For instance, specific node activations can be structurally correlated with distinct surface phenomena, such as short-term skewness under highly negative Rho conditions in a SABR model. This explicit attribution allows the outputs to be mapped to natural language models, facilitating automated, deterministic risk reporting and scenario analysis.
 
 
 
36
 
37
- ## 4. Computational Sustainability
38
 
39
- Traditional derivative pricing methods, such as finite-difference grids or large-scale Monte Carlo simulations, are highly computationally intensive. As a surrogate model, Thales offloads this computational burden entirely to the training phase.
40
 
41
- At inference, pricing a portfolio of 10,000 options requires computationally trivial matrix multiplications, executing in microsecond-scale latency per option on standard hardware. This architecture offers a reduction in latency while systematically lowering the energy consumption associated with high-frequency pricing and large-scale risk scenario generation.
42
 
43
  ![Showcase](throughput_scaling.png)
44
 
45
- ## 5. Usage
46
 
47
- Dependencies include `safetensors` and `huggingface_hub`.
48
 
49
  ```python
50
  import torch
@@ -66,15 +87,14 @@ with torch.no_grad():
66
 
67
  print(f"Predicted Option Price: {price.item():.4f}")
68
  print(f"Primary Active SAE Node: {torch.argmax(sae_acts).item()}")
69
-
70
  ```
71
 
72
- ## 6. Empirical Benchmarks
73
 
74
- Evaluation conducted on an out-of-sample dataset of synthetic SABR/SVI surfaces. Hardware profile: NVIDIA CUDA environment.
75
 
76
  | Batch Size | Batch Latency (ms) | Per-Option Latency (μs) | Throughput (opts/sec) | SAE Neurons Sparsity |
77
- | --- | --- | --- | --- | --- |
78
  | 1 | 0.766 | 765.68 | 1,306 | 48.83% |
79
  | 16 | 0.548 | 34.25 | 29,201 | 50.71% |
80
  | 64 | 0.542 | 8.47 | 117,998 | 50.90% |
@@ -82,8 +102,20 @@ Evaluation conducted on an out-of-sample dataset of synthetic SABR/SVI surfaces.
82
  | 1024 | 0.618 | 0.60 | 1,658,165 | 50.72% |
83
  | 4096 | 1.577 | 0.39 | 2,596,696 | 50.68% |
84
 
85
- *Analysis:* The architecture demonstrates robust parallelization scaling. While single-batch inference is bounded by kernel launch overhead (~0.76 ms), throughput scales sub-linearly with batch size, achieving over 2.5 million options per second at batch size 4096. Concurrently, the SAE maintains a stable ~50% structural sparsity across all input regimes, validating the consistency of the disentangled latent space.
 
 
 
 
86
 
87
- ## 7. Citation and License
88
 
89
- If you utilize the Thales architecture in your research or quantitative infrastructure, please consider citing this repository.
5
  tags:
6
  - finance
7
  - fintech
8
+ - sparse-autoencoders
9
+ - xai
10
  ---
11
 
12
+ <div class="container">
13
+ <div class="content"><img src="banner.png" /></div>
14
+ </div>
15
 
 
16
 
17
+ # Model Card for Thales
18
 
19
+ ## Model Overview
20
 
21
+ Thales is a deep learning architecture developed for quantitative option pricing. It constrains deep neural representations using fundamental financial partial differential equations (PDEs). To ensure compliance with institutional model validation protocols, Thales projects dense latent activations into disentangled representations via a Sparse Autoencoder, which are subsequently decoded into deterministic natural language risk narratives using a fine-tuned Large Language Model.
 
 
22
 
23
+ ## 1. Architecture Details
24
 
25
+ The Thales architecture addresses the trade-off between computational throughput and model interpretability through three structural paradigms:
26
 
27
+ * Physics-Informed Optimization: The training objective diverges from standard empirical regression. The loss function is parameterized to optimize the governing PDEs of financial derivatives, implicitly enforcing structural adherence to market dynamics and Greek sensitivities.
28
+ * Latent Space Disentanglement: An embedded Sparse Autoencoder (SAE) acts as a bottleneck layer, regularizing dense network activations into discrete, attribution-friendly feature clusters.
29
+ * Forward-Pass Surrogate: The model replaces recursive numerical methods (e.g., finite-difference PDE solvers, Monte Carlo simulations) with a vectorized forward pass, reducing inference latency to bounded matrix multiplications.
 
30
 
31
+ ## 2. Training Methodology
32
 
33
+ Thales is trained using specific optimization strategies to maintain numerical stability across disparate moneyness and maturity regimes:
34
 
35
+ * Orthogonalized Momentum Optimization (Muon): Parameter spaces are partitioned based on tensor dimensionality. 2D parameters (convolutional and linear weights) are updated using Newton-Schulz iteration (Muon) to preserve orthogonal weight updates and constrain Lipschitz constants. 1D parameters (biases, normalizations) utilize standard AdamW.
36
+ * Sobolev Regularization: The objective function integrates a Sobolev loss penalty. Using automatic differentiation, the exact gradient of the network's price output with respect to the underlying asset ($\frac{\partial V}{\partial S}$, Delta) is computed during the forward pass. The loss combines Mean Squared Error (MSE) on price with Mean Absolute Error (MAE) on Delta, so the network must match first-order sensitivities as well as price levels.
37
+ * Arbitrage-Free Constraints: Input parameters ($S, K, T, r$) are normalized dynamically with differentiable operations, so the computational graph needed for the Greeks is preserved. A Softplus activation at the terminal layer enforces strictly positive prices, which rules out the trivial arbitrage of a negative option price.
38
+ * Gradient Stabilization: Global gradient norm clipping is applied to prevent numerical divergence when regressing deep out-of-the-money (OTM) implied volatility gradients.
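The Sobolev loss, Softplus output, and gradient clipping described in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Thales training code: `TinyPricer`, `sobolev_loss`, `delta_weight`, the log-moneyness normalization, and the placeholder targets are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyPricer(nn.Module):
    """Hypothetical stand-in network; NOT the Thales architecture."""

    def __init__(self, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, hidden), nn.SiLU(), nn.Linear(hidden, 1))

    def forward(self, S, K, T, r):
        # Normalize inputs with differentiable ops so the graph back to S survives.
        x = torch.stack([torch.log(S / K), T, r], dim=-1)
        # Softplus keeps the predicted price strictly positive.
        return F.softplus(self.net(x)).squeeze(-1)


def sobolev_loss(model, S, K, T, r, price_true, delta_true, delta_weight=1.0):
    S = S.detach().clone().requires_grad_(True)
    price = model(S, K, T, r)
    # Samples are independent, so d(sum of prices)/dS yields each sample's Delta.
    # create_graph=True lets the Delta term backpropagate into the weights.
    (delta,) = torch.autograd.grad(price.sum(), S, create_graph=True)
    loss = F.mse_loss(price, price_true) + delta_weight * F.l1_loss(delta, delta_true)
    return loss, price, delta


torch.manual_seed(0)
n = 64
S = 80.0 + 40.0 * torch.rand(n)
K = torch.full((n,), 100.0)
T = torch.rand(n) + 0.1
r = torch.full((n,), 0.02)
# Placeholder targets (intrinsic value and its slope), used only to make the sketch run.
price_true = torch.relu(S - K)
delta_true = (S > K).float()

model = TinyPricer()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

loss, price, delta = sobolev_loss(model, S, K, T, r, price_true, delta_true)
optimizer.zero_grad()
loss.backward()
# Global gradient-norm clipping before the optimizer step.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
grad_norm_after = torch.sqrt(sum((p.grad ** 2).sum() for p in model.parameters()))
optimizer.step()
print(f"loss={loss.item():.4f}, clipped grad norm={grad_norm_after.item():.4f}")
```

A smooth activation (SiLU here) matters for this kind of loss: the Delta term is trained through second derivatives of the network, which vanish almost everywhere for piecewise-linear activations such as ReLU.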
39
 
40
+ ## 3. Interpretability and Semantic Decoding
41
 
42
+ A documented limitation of deep parametric models in finance is the opacity of feature attribution. Thales applies a specific latent constraint methodology to generate auditable risk outputs.
43
 
44
+ ### SNR-Bounded Sparsity Paradigm
45
+ Standard language model SAEs typically target >95% sparsity. Thales targets a "Dense-Sparse Hybrid" equilibrium, empirically maintained at ~50% structural sparsity. This hyperparameter is derived from the fundamentally low Signal-to-Noise Ratio (SNR) in financial micro-structures.
46
+
47
+ Macro-level absolute variance accounts for the majority of spatial energy, whereas local geometry (skewness, curvature) exhibits low-amplitude signals. Enforcing extreme sparsity induces manifold collapse, degrading Greek precision. The ~50% sparsity threshold acts as a low-pass structural filter, preserving baseline volatility representations while discretizing Moneyness deformations.
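To make the sparsity figure concrete, the sketch below shows one common way to build a sparse autoencoder and to measure the fraction of inactive units. It assumes that "structural sparsity" means the share of SAE activations that are exactly zero after a ReLU; the class `TinySAE`, the layer sizes, and the L1 coefficient are hypothetical and are not taken from the Thales code.

```python
import torch
import torch.nn as nn


class TinySAE(nn.Module):
    """Hypothetical minimal sparse autoencoder (not the Thales SAE)."""

    def __init__(self, d_in: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_in, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_in)

    def forward(self, x):
        acts = torch.relu(self.encoder(x))  # non-negative, possibly zero, activations
        recon = self.decoder(acts)
        return recon, acts


def sae_loss(x, recon, acts, l1_coeff: float = 1e-3):
    # Reconstruction error plus an L1 penalty; a larger l1_coeff pushes more units to zero.
    return torch.mean((recon - x) ** 2) + l1_coeff * acts.abs().mean()


def sparsity_fraction(acts: torch.Tensor) -> float:
    # Share of activations that are exactly zero across the batch.
    return (acts == 0).float().mean().item()


torch.manual_seed(0)
sae = TinySAE(d_in=32, d_hidden=128)
latents = torch.randn(256, 32)  # stand-in for dense pricing-network activations
recon, acts = sae(latents)
loss = sae_loss(latents, recon, acts)
frac = sparsity_fraction(acts)
print(f"loss={loss.item():.4f}, sparsity={frac:.2%}")
```

Under this reading, the L1 coefficient is the knob that trades reconstruction fidelity against sparsity, which is the trade-off the paragraph above describes.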
48
+
49
+ ### Semantic LLM Decoding
50
+ To transition from numerical latent vectors to auditable risk reports, Thales employs a post-trained Large Language Model (fine-tuned on the `Thales_Instruction_Dataset`) as a deterministic decoder. High-dimensional SAE activation states are projected into the LLM context window. The system maps specific mathematical activation vectors to structural risk diagnostics.
51
+
52
+ *Example Output:*
53
+ > `SAE cluster [42, 118, 503] active. Indicator: Short-term OTM put skewness expansion corresponding to a SABR Rho parameter contraction. ATM baseline volatility remains static.`
54
+
55
+ ## 4. Environmental, Social, and Governance (ESG) & Carbon Footprint
56
+
57
+ The deployment of quantitative models presents significant computational sustainability challenges. Traditional risk management requires re-evaluating massive portfolios using grid-based PDE solvers or large-scale Monte Carlo methods, yielding substantial Scope 3 emissions via data center power consumption.
58
+
59
+ Thales systematically front-loads computational cost to the training phase (which is a one-time carbon expenditure), serving as a sustainable surrogate model during inference.
60
+
61
+ * Inference Carbon Intensity: Monte Carlo pricing costs $O(N \times \text{paths})$ for $N$ options, whereas a Thales forward pass uses a fixed number of tensor multiplications per batch, with no dependence on path count. This is expected to reduce the Joules-per-inference metric by orders of magnitude compared to traditional CPU-bound risk engines.
62
+ * Hardware Efficiency: The architecture enables high GPU utilization (exceeding 2.5 million options per second at high batch regimes). This parallelization density allows financial institutions to downscale the physical server footprint required for overnight risk scenario generation (e.g., CCAR, FRTB compliance), directly contributing to institutional Net Zero and carbon neutrality mandates.
63
 
64
  ![Showcase](throughput_scaling.png)
65
 
66
+ ## 5. Implementation & Usage
67
 
68
+ Dependencies: `torch`, `safetensors`, `huggingface_hub`.
69
 
70
  ```python
71
  import torch
 
87
 
88
  print(f"Predicted Option Price: {price.item():.4f}")
89
  print(f"Primary Active SAE Node: {torch.argmax(sae_acts).item()}")
 
90
  ```
91
 
92
+ ## 6. Evaluation and Benchmarks
93
 
94
+ Benchmarking was conducted on an out-of-sample synthetically generated dataset comprising SABR and SVI surfaces. Measurements were taken in an isolated NVIDIA CUDA environment (fp32 precision).
95
 
96
  | Batch Size | Batch Latency (ms) | Per-Option Latency (μs) | Throughput (opts/sec) | SAE Neurons Sparsity |
97
+ | :--- | :--- | :--- | :--- | :--- |
98
  | 1 | 0.766 | 765.68 | 1,306 | 48.83% |
99
  | 16 | 0.548 | 34.25 | 29,201 | 50.71% |
100
  | 64 | 0.542 | 8.47 | 117,998 | 50.90% |
 
102
  | 1024 | 0.618 | 0.60 | 1,658,165 | 50.72% |
103
  | 4096 | 1.577 | 0.39 | 2,596,696 | 50.68% |
104
 
105
+ The benchmark demonstrates sub-linear latency scaling relative to batch dimension. Single-batch ($N=1$) inference is dominated by fixed overhead such as CUDA kernel launches (~0.77 ms). At the largest batch size tested, 4096, the system achieves a throughput of ~2.59M options/second, with per-option latency falling to 0.39 μs. Across the batch sizes shown, SAE structural sparsity stays close to 50% (48.83% at batch size 1, and 50.68% to 50.90% at larger batch sizes), indicating that the sparsity level does not depend on batch size.
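The derived columns of the table follow from batch size and batch latency alone, which the short check below reproduces. The helper names are hypothetical, and only the rows visible in the table are used; small differences from the published figures are expected because the tabulated latencies are rounded to three decimals.

```python
# (batch size, batch latency in ms) copied from the rows shown in the table.
rows = [(1, 0.766), (16, 0.548), (64, 0.542), (1024, 0.618), (4096, 1.577)]


def per_option_latency_us(batch_size: int, batch_latency_ms: float) -> float:
    # Milliseconds per batch -> microseconds per option.
    return batch_latency_ms * 1000.0 / batch_size


def throughput_per_sec(batch_size: int, batch_latency_ms: float) -> float:
    # Options priced per second at this batch size.
    return batch_size / (batch_latency_ms / 1000.0)


derived = {
    b: (per_option_latency_us(b, ms), throughput_per_sec(b, ms)) for b, ms in rows
}
for b, (us, tput) in derived.items():
    print(f"batch={b:>5}  per-option={us:9.2f} us  throughput={tput:12,.0f} opts/sec")
```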
106
+
107
+ ## 7. Limitations and Citation
108
+
109
+ Thales is constrained by the distribution of its training data. Its surrogate accuracy is not guaranteed when extrapolating beyond the volatility surface parameter boundaries observed during training. The current iteration does not explicitly model jump-diffusion processes or discrete dividend schedules.
110
 
111
+ If utilizing the Thales architecture in published research or enterprise environments, please cite:
112
 
113
+ ```bibtex
114
+ @misc{thales2026,
115
+ author = {Chunjiang Intelligence},
116
+ title = {Thales: Interpretable and Physics-Informed Deep Learning for Quantitative Option Pricing},
117
+ year = {2026},
118
+ publisher = {Hugging Face},
119
+ howpublished = {\url{https://huggingface.co/Chunjiang-Intelligence/Thales}}
120
+ }
121
+ ```