Research Module
Abstract
The research/ directory houses experimental and pre-production components that extend the Portfolio Engine beyond classical mean-variance optimisation. These modules investigate cybernetic control theory and model-based reinforcement learning as complementary paradigms for adaptive portfolio management. None of these components are currently integrated into the production pipeline; they constitute a forward-looking research agenda grounded in control-theoretic and decision-theoretic foundations.
1. Cybernetic Control Systems
1.1 PID Volatility Controller — cybernetic.py
Theoretical Basis. The Proportional-Integral-Derivative (PID) controller, first formalised by Minorsky (1922) and refined by Ziegler & Nichols (1942), is the workhorse of industrial process control. We adapt this framework to the problem of volatility targeting, an approach widely used in institutional risk management (Moreira & Muir, 2017; Harvey et al., 2018).
Mechanism. The controller measures a portfolio's realised volatility over a rolling window (default: 21 trading days, annualised via the √252 scaling convention). It computes the error signal e(t) = σ_target − σ_realised and derives a leverage multiplier:
leverage(t) = 1 + K_p · e(t) + K_i · ∫e(τ)dτ + K_d · de(t)/dt
- Proportional gain (K_p = 2.0): Immediate response to current deviation.
- Integral gain (K_i = 0.5): Corrects persistent bias; subject to anti-windup clamping at ±0.5 to prevent integrator saturation.
- Derivative gain (K_d = 0.3): Anticipates the direction of error change, providing damping.
Leverage is hard-clamped to [0.3, 1.5] to enforce position limits consistent with institutional mandates.
Key Insight. Volatility is substantially more predictable than returns (Andersen et al., 2003). A PID controller that exploits this property aims to deliver stable risk exposure without requiring accurate return forecasts.
1.2 Adaptive Risk Controller — cybernetic.py
Concept. This module implements a homeostatic setpoint adjustment for the PID controller. Rather than using a fixed volatility target, the outer loop adjusts σ_target according to the prevailing market regime:
| Regime | Multiplier | Effective Target |
|---|---|---|
| Bull / Low Volatility | 1.2× | 18% annualised |
| Normal / Chop | 1.0× | 15% annualised |
| Crash / High Volatility | 0.5× | 7.5% annualised |
This nested-loop architecture mirrors Ashby's Law of Requisite Variety (1956): the controller must possess at least as much regulatory diversity as the environment it governs.
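As a rough illustration of the outer loop, the regime table reduces to a lookup that rescales a base target. The regime keys and the function name below are hypothetical, and the 15% base target is inferred from the Normal row of the table; how the regime itself is detected is not covered here.

```python
# Multipliers taken from the regime table; the dictionary keys are illustrative labels.
REGIME_MULTIPLIER = {"bull": 1.2, "normal": 1.0, "crash": 0.5}


def effective_target(regime, base_target=0.15):
    """Return the annualised volatility target for the given regime label."""
    return base_target * REGIME_MULTIPLIER[regime]
```

The inner PID loop would then use `effective_target(regime)` as its setpoint in place of a fixed target.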
1.3 Three-Layer Cybernetic Ensemble — cybernetic_ensemble.py
Architecture. The CyberneticPortfolioController implements a hierarchical control system with three timescales, inspired by Wiener's cybernetic feedback principles (1948):
| Layer | Component | Timescale | Function |
|---|---|---|---|
| 1 | PID Controller | Intraday–Daily | Instant volatility regulation |
| 2 | Differentiable Optimiser | Daily | Mean-variance weight computation |
| 3 | Dreamer RL Agent | Weekly–Monthly | Meta-parameter adaptation |
Each layer operates on the output of the layer below it. Faster layers handle high-frequency perturbations; slower layers learn structural adaptations from accumulated performance data.
MetaController. A fourth, supervisory layer (MetaController) sits above the three layers listed in the table. It monitors tracking error against a benchmark and increases or decreases control complexity, by adjusting PID gains and exploration parameters, based on rolling performance diagnostics. This applies Ashby's principle at the architectural level.
2. Dreamer World-Model Agent — research/dreamer/
2.1 Overview
The dreamer/ package implements a variant of the DreamerV2 world-model agent (Hafner et al., 2021) adapted for financial time series. The architecture learns a latent dynamics model from historical observation-action-reward trajectories and then trains an actor-critic pair entirely in imagination, avoiding the sample-inefficiency of model-free reinforcement learning.
2.2 Components
| Module | Class / Function | Purpose |
|---|---|---|
| rssm.py | RSSM | Recurrent State-Space Model with GRU dynamics, stochastic latent state, and prior/posterior networks |
| rssm.py | RSSMState | Lightweight container for the concatenated deterministic–stochastic state vector |
| networks.py | Encoder, Decoder | Observation embedding and reconstruction networks (2-layer MLP with ELU activations) |
| networks.py | RewardModel | Predicts scalar reward (Sharpe ratio proxy) from latent features |
| networks.py | Actor | Policy network outputting portfolio weights via softmax (ensures simplex constraint) |
| networks.py | Critic, HomeostaticCritic | Value estimation; the homeostatic variant maintains a slowly-adapting setpoint |
| buffer.py | ReplayBuffer | Sequence replay buffer storing variable-length episodes with edge-padding |
| agent.py | AgenticForecaster | Main agent class: world-model training, latent-space actor-critic training, and inference |
| agent.py | HomeostaticAgenticForecaster | Extension with homeostatic critic and periodic setpoint updates |
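To make the edge-padding mentioned for ReplayBuffer concrete, here is one plausible reading, sketched with NumPy: episodes shorter than the sampled sequence length are extended by repeating their final step. The helper name `pad_to_length` is hypothetical, and the actual buffer may pad differently (for example at the start of the episode).

```python
import numpy as np


def pad_to_length(episode, length):
    """Edge-pad (repeat the final step) or truncate a (T, D) episode to `length` steps."""
    episode = np.asarray(episode, dtype=float)
    steps = episode.shape[0]
    if steps >= length:
        return episode[:length]
    # Pad only along the time axis; mode="edge" repeats the last row.
    return np.pad(episode, ((0, length - steps), (0, 0)), mode="edge")
```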
2.3 Training Procedure
World Model. Given batches of (observations, actions, rewards) sequences of shape (B, T, D):
- The encoder maps each observation to an embedding.
- The RSSM rolls forward through time, producing posterior states conditioned on real observations and prior states from the dynamics model alone.
- The decoder reconstructs observations from latent features; the reward model predicts scalar rewards.
- Loss = Reconstruction MSE + Reward MSE + KL(posterior ‖ prior), with free-nats clamping (default: 3.0 nats) to prevent posterior collapse.
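The loss in the last step can be illustrated with NumPy. This sketch makes several assumptions that the text does not state: the stochastic latent is treated as a diagonal Gaussian, the free-nats clamp is applied per sequence element as max(KL, free_nats) before averaging, and all three terms carry unit weight. The function names are hypothetical.

```python
import numpy as np


def gaussian_kl(mu_q, std_q, mu_p, std_p):
    """KL(q || p) between diagonal Gaussians, summed over the latent dimension."""
    var_q, var_p = std_q ** 2, std_p ** 2
    kl = np.log(std_p / std_q) + (var_q + (mu_q - mu_p) ** 2) / (2.0 * var_p) - 0.5
    return kl.sum(axis=-1)


def world_model_loss(obs, obs_hat, rew, rew_hat, kl, free_nats=3.0):
    """Reconstruction MSE + reward MSE + KL term with a free-nats floor."""
    recon = np.mean((obs - obs_hat) ** 2)
    reward = np.mean((rew - rew_hat) ** 2)
    # Free nats: KL values below the threshold contribute a constant,
    # so they produce no gradient pressure on the posterior.
    kl_term = np.mean(np.maximum(kl, free_nats))
    return recon + reward + kl_term
```

Here `kl` would be computed as `gaussian_kl(posterior_mean, posterior_std, prior_mean, prior_std)` over the (B, T) grid of latent states.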
Actor-Critic. Training occurs entirely in the latent imagination space:
- From a batch of start states sampled from the posterior, the actor rolls out an imagined trajectory of H steps (default: 15).
- The target critic estimates values along the trajectory; TD(λ) returns are computed with γ = 0.99, λ = 0.95.
- The actor maximises expected λ-returns; the critic minimises MSE against the λ-return targets.
- A Polyak-averaged target critic (τ = 0.05) stabilises training.
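The return targets and the target-network update can be sketched as follows. The recursion is the standard TD(λ) form, R_t = r_t + γ((1 − λ)V(s_{t+1}) + λR_{t+1}), bootstrapped from the final value estimate. The function names are hypothetical, and the sketch assumes τ weights the online parameters, which the text does not specify.

```python
import numpy as np


def lambda_returns(rewards, values, gamma=0.99, lam=0.95):
    """Compute TD(lambda) returns along one imagined trajectory.

    rewards: length-H sequence of predicted rewards.
    values:  length-(H + 1) sequence of target-critic values, including the
             bootstrap value for the state after the final step.
    """
    horizon = len(rewards)
    returns = np.empty(horizon)
    next_return = values[-1]  # bootstrap from the last value estimate
    for t in reversed(range(horizon)):
        next_return = rewards[t] + gamma * ((1.0 - lam) * values[t + 1] + lam * next_return)
        returns[t] = next_return
    return returns


def polyak_update(target_params, online_params, tau=0.05):
    """Soft update: target <- (1 - tau) * target + tau * online (assumed convention)."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]
```

With λ = 0 the recursion collapses to one-step TD targets, and with λ = 1 it becomes the discounted sum of rewards plus the bootstrapped final value.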
2.4 Homeostatic Critic
The HomeostaticCritic decomposes value prediction as:
V(s) = setpoint + deviation(s)
The setpoint adapts slowly via exponential moving average (rate = 0.01), implementing a biological homeostasis analogy: the critic maintains a baseline expectation and learns only deviations from it. This stabilises learning in non-stationary financial environments where the absolute scale of returns drifts over time.
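A stripped-down sketch of the decomposition, leaving out the neural network that would produce the deviation term. The class name is hypothetical, and the choice of signal fed to the moving average (for instance the mean λ-return of a batch) is an assumption, since the text only gives the adaptation rate.

```python
class HomeostaticValue:
    """V(s) = setpoint + deviation(s), with the setpoint tracked by a slow EMA."""

    def __init__(self, rate=0.01, setpoint=0.0):
        self.rate = rate
        self.setpoint = setpoint

    def value(self, deviation):
        # `deviation` stands in for the learned network output for a state.
        return self.setpoint + deviation

    def update_setpoint(self, observed_return):
        # Exponential moving average: the setpoint moves 1% of the way
        # towards the observed signal on each update at the default rate.
        self.setpoint = (1.0 - self.rate) * self.setpoint + self.rate * observed_return
        return self.setpoint
```

Because the setpoint absorbs slow drift in the scale of returns, the deviation term only has to model differences between states.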
3. Integration Test — test_dreamer.py
A standalone integration test validates the full Dreamer pipeline:
- Generates 20 synthetic episodes of random observations, normalised portfolio-weight actions, and scalar rewards.
- Populates a ReplayBuffer and samples a batch.
- Trains the world model for one gradient step, verifying loss convergence.
- Trains the actor-critic in imagination from the RSSM initial state, verifying gradient flow.
This test is intended for rapid smoke-testing during development and does not constitute a performance benchmark.
4. Status and Roadmap
| Item | Status |
|---|---|
| PID volatility controller | ✅ Implemented |
| Adaptive regime-based target | ✅ Implemented |
| Cybernetic ensemble (3-layer) | ✅ Implemented |
| Dreamer world-model training | ✅ Implemented |
| Homeostatic critic variant | ✅ Implemented |
| Integration with production pipeline | ⬜ Not started |
| Hyperparameter tuning on real data | ⬜ Not started |
| Out-of-sample performance evaluation | ⬜ Not started |
References
- Andersen, T. G., Bollerslev, T., Diebold, F. X., & Labys, P. (2003). Modeling and forecasting realized volatility. Econometrica, 71(2), 579–625.
- Ashby, W. R. (1956). An Introduction to Cybernetics. Chapman & Hall.
- Hafner, D., Lillicrap, T., Norouzi, M., & Ba, J. (2021). Mastering Atari with discrete world models. ICLR 2021.
- Harvey, C. R., Hoyle, E., Korgaonkar, R., Rattray, S., Sargaison, M., & Van Hemert, O. (2018). The impact of volatility targeting. Journal of Portfolio Management, 45(1), 14–33.
- Minorsky, N. (1922). Directional stability of automatically steered bodies. Journal of the American Society of Naval Engineers, 34(2), 280–309.
- Moreira, A., & Muir, T. (2017). Volatility-managed portfolios. Journal of Finance, 72(4), 1611–1644.
- Wiener, N. (1948). Cybernetics: Or Control and Communication in the Animal and the Machine. MIT Press.
- Ziegler, J. G., & Nichols, N. B. (1942). Optimum settings for automatic controllers. Transactions of the ASME, 64(11), 759–768.