| # Research Module | |
| ## Abstract | |
| The `research/` directory houses experimental and pre-production components that extend the Portfolio Engine beyond classical mean-variance optimisation. These modules investigate cybernetic control theory and model-based reinforcement learning as complementary paradigms for adaptive portfolio management. None of these components are currently integrated into the production pipeline; they constitute a forward-looking research agenda grounded in control-theoretic and decision-theoretic foundations. | |
| --- | |
| ## 1. Cybernetic Control Systems | |
| ### 1.1 PID Volatility Controller — `cybernetic.py` | |
| **Theoretical Basis.** The Proportional-Integral-Derivative (PID) controller, first formalised by Minorsky (1922) and given its classic tuning rules by Ziegler & Nichols (1942), is the workhorse of industrial process control. We adapt this framework to *volatility targeting*, an approach widely used in institutional risk management (Moreira & Muir, 2017; Harvey et al., 2018). | |
| **Mechanism.** The controller measures a portfolio's realised volatility over a rolling window (default: 21 trading days, annualised via the √252 scaling convention). It computes the error signal *e(t) = σ_target − σ_realised* and derives a leverage multiplier: | |
| ``` | |
| leverage(t) = 1 + K_p · e(t) + K_i · ∫e(τ)dτ + K_d · de(t)/dt | |
| ``` | |
| - **Proportional gain (K_p = 2.0):** Immediate response to current deviation. | |
| - **Integral gain (K_i = 0.5):** Corrects persistent bias; subject to anti-windup clamping at ±0.5 to prevent integrator saturation. | |
| - **Derivative gain (K_d = 0.3):** Anticipates the direction of error change, providing damping. | |
| Leverage is hard-clamped to [0.3, 1.5] to enforce position limits consistent with institutional mandates. | |
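The mechanism above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the code in `cybernetic.py`: the class and function names are hypothetical, one controller step is taken to be one trading day, the base target of 15% is read off the 1.0× row of the regime table in Section 1.2, and the ±0.5 anti-windup clamp is assumed to apply to the accumulated integral rather than to the `K_i` term.

```python
import math


def realised_vol(returns, window=21, periods_per_year=252):
    """Annualised sample volatility of the last `window` daily returns."""
    recent = returns[-window:]
    mean = sum(recent) / len(recent)
    var = sum((r - mean) ** 2 for r in recent) / (len(recent) - 1)
    return math.sqrt(var * periods_per_year)  # sqrt(252) scaling convention


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


class PIDVolController:
    """Hypothetical discrete-time PID leverage controller (one step = one day)."""

    def __init__(self, target_vol=0.15, kp=2.0, ki=0.5, kd=0.3,
                 integral_limit=0.5, min_leverage=0.3, max_leverage=1.5):
        self.target_vol = target_vol
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral_limit = integral_limit
        self.min_leverage, self.max_leverage = min_leverage, max_leverage
        self.integral = 0.0
        self.prev_error = None

    def update(self, vol):
        error = self.target_vol - vol  # e(t) = sigma_target - sigma_realised
        # Anti-windup: clamp the accumulated error (assumed interpretation).
        self.integral = clamp(self.integral + error,
                              -self.integral_limit, self.integral_limit)
        # No derivative contribution on the first step.
        derivative = 0.0 if self.prev_error is None else error - self.prev_error
        self.prev_error = error
        raw = 1.0 + self.kp * error + self.ki * self.integral + self.kd * derivative
        return clamp(raw, self.min_leverage, self.max_leverage)
```

A caller would feed `realised_vol(daily_returns)` into `update` once per day and scale portfolio weights by the returned leverage. When realised volatility sits exactly on target, all three terms vanish and the multiplier is 1.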
| **Key Insight.** Volatility is substantially more predictable than returns (Andersen et al., 2003). A PID controller that exploits this property can hold risk exposure close to its target without requiring accurate return forecasts. | |
| ### 1.2 Adaptive Risk Controller — `cybernetic.py` | |
| **Concept.** This module implements a *homeostatic setpoint adjustment* for the PID controller. Rather than using a fixed volatility target, the outer loop scales a base σ_target (15% annualised, the 1.0× row in the table below) by a multiplier that depends on the prevailing market regime: | |
| | Regime | Multiplier | Effective Target | | |
| |---------------------------|-----------|------------------| | |
| | Bull / Low Volatility | 1.2× | 18% annualised | | |
| | Normal / Chop | 1.0× | 15% annualised | | |
| | Crash / High Volatility | 0.5× | 7.5% annualised | | |
| This nested-loop architecture mirrors Ashby's Law of Requisite Variety (1956): the controller must possess at least as much regulatory diversity as the environment it governs. | |
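As a minimal sketch of the outer loop, the table can be expressed as a lookup that rescales the base target. The regime keys and the function name are hypothetical, and how the regime itself is detected is outside this sketch.

```python
# Multipliers taken from the regime table; the dictionary keys are assumed labels.
REGIME_MULTIPLIERS = {
    "bull": 1.2,    # Bull / Low Volatility
    "normal": 1.0,  # Normal / Chop
    "crash": 0.5,   # Crash / High Volatility
}


def effective_target(regime, base_target=0.15):
    """Return the volatility setpoint the inner PID loop should track."""
    return base_target * REGIME_MULTIPLIERS[regime]
```

The inner controller would then read its setpoint from `effective_target(regime)` at each step instead of holding a constant.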
| ### 1.3 Three-Layer Cybernetic Ensemble — `cybernetic_ensemble.py` | |
| **Architecture.** The `CyberneticPortfolioController` implements a hierarchical control system with three timescales, inspired by Wiener's cybernetic feedback principles (1948): | |
| | Layer | Component | Timescale | Function | | |
| |-------|----------------------------|---------------|-----------------------------------| | |
| | 1 | PID Controller | Intraday–Daily | Instant volatility regulation | | |
| | 2 | Differentiable Optimiser | Daily | Mean-variance weight computation | | |
| | 3 | Dreamer RL Agent | Weekly–Monthly | Meta-parameter adaptation | | |
| Each layer operates on the output of the layer below it. Faster layers handle high-frequency perturbations; slower layers learn structural adaptations from accumulated performance data. | |
| **MetaController.** Above the three layers, a supervisory component (`MetaController`) monitors tracking error against a benchmark and raises or lowers control complexity, by adjusting PID gains and exploration parameters, based on rolling performance diagnostics. This applies Ashby's principle at the architectural level. | |
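One way such a supervisory rule could look is sketched below. Tracking error is computed in the standard way, as the annualised standard deviation of active returns. The adjustment rule itself (a bounded multiplicative step on the PID gains, triggered by a tracking-error budget) is purely an assumption for illustration, and every name and threshold here is hypothetical rather than taken from `cybernetic_ensemble.py`.

```python
import math


def tracking_error(portfolio_returns, benchmark_returns, periods_per_year=252):
    """Annualised standard deviation of portfolio-minus-benchmark returns."""
    active = [p - b for p, b in zip(portfolio_returns, benchmark_returns)]
    mean = sum(active) / len(active)
    var = sum((a - mean) ** 2 for a in active) / (len(active) - 1)
    return math.sqrt(var * periods_per_year)


def rescale_gains(base_gains, scale, te, te_budget=0.04, step=0.1, lo=0.5, hi=2.0):
    """Hypothetical rule: tighten control when tracking error exceeds the budget,
    relax it otherwise, keeping the overall gain scale within [lo, hi]."""
    factor = 1.0 + step if te > te_budget else 1.0 - step
    new_scale = max(lo, min(hi, scale * factor))
    return new_scale, {name: gain * new_scale for name, gain in base_gains.items()}
```

Keeping a single bounded scale factor, rather than adjusting each gain freely, is a design choice made here so that the supervisor cannot push the inner loop arbitrarily far from its baseline tuning.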
| --- | |
| ## 2. Dreamer World-Model Agent — `research/dreamer/` | |
| ### 2.1 Overview | |
| The `dreamer/` package implements a variant of the DreamerV2 world-model agent (Hafner et al., 2021) adapted for financial time series. The architecture learns a latent dynamics model from historical observation-action-reward trajectories and then trains an actor-critic pair *entirely in imagination*, avoiding the sample-inefficiency of model-free reinforcement learning. | |
| ### 2.2 Components | |
| | Module | Class / Function | Purpose | | |
| |----------------|----------------------------|-----------------------------------------------------------------| | |
| | `rssm.py` | `RSSM` | Recurrent State-Space Model with GRU dynamics, stochastic latent state, and prior/posterior networks | | |
| | `rssm.py` | `RSSMState` | Lightweight container for the concatenated deterministic–stochastic state vector | | |
| | `networks.py` | `Encoder`, `Decoder` | Observation embedding and reconstruction networks (2-layer MLP with ELU activations) | | |
| | `networks.py` | `RewardModel` | Predicts scalar reward (Sharpe ratio proxy) from latent features | | |
| | `networks.py` | `Actor` | Policy network outputting portfolio weights via softmax (ensures simplex constraint) | | |
| | `networks.py` | `Critic`, `HomeostaticCritic` | Value estimation; the homeostatic variant maintains a slowly-adapting setpoint | | |
| | `buffer.py` | `ReplayBuffer` | Sequence replay buffer storing variable-length episodes with edge-padding | | |
| | `agent.py` | `AgenticForecaster` | Main agent class: world-model training, latent-space actor-critic training, and inference | | |
| | `agent.py` | `HomeostaticAgenticForecaster` | Extension with homeostatic critic and periodic setpoint updates | | |
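Two details from the table lend themselves to a short illustration: the softmax output that keeps the actor's weights on the simplex, and edge-padding of short episodes. The sketch below uses plain Python lists and hypothetical function names. It also assumes that "edge-padding" means repeating the final step of an episode until it reaches the requested length, which may differ from what `buffer.py` does.

```python
import math


def softmax_weights(logits):
    """Map raw actor outputs to long-only weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pad_episode(steps, length):
    """Crop or edge-pad a list of per-step records to exactly `length` items.

    Returns the padded sequence and a mask that is True for real steps.
    """
    if not steps:
        raise ValueError("cannot pad an empty episode")
    real = steps[:length]
    n_pad = length - len(real)
    return real + [real[-1]] * n_pad, [True] * len(real) + [False] * n_pad
```

Returning a mask alongside the padded sequence lets a loss function ignore the repeated steps if that is desired.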
| ### 2.3 Training Procedure | |
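| **World Model.** Given batches of `(observations, actions, rewards)` sequences, each of shape `(B, T, D)` where `B` is the batch size, `T` the sequence length, and `D` the per-step feature dimension of that tensor: | |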
| 1. The encoder maps each observation to an embedding. | |
| 2. The RSSM rolls forward through time, producing posterior states conditioned on real observations and prior states from the dynamics model alone. | |
| 3. The decoder reconstructs observations from latent features; the reward model predicts scalar rewards. | |
| 4. Loss = Reconstruction MSE + Reward MSE + KL(posterior ‖ prior), with free-nats clamping (default: 3.0 nats) to prevent posterior collapse. | |
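The loss in step 4 can be illustrated for a single time step with scalar arithmetic. This sketch assumes diagonal-Gaussian prior and posterior distributions (the text above only specifies a "stochastic latent state") and equal weighting of the three terms. The function names are hypothetical.

```python
import math


def gaussian_kl(mu_q, std_q, mu_p, std_p):
    """KL(q || p) between diagonal Gaussians, summed over latent dimensions."""
    return sum(
        math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2) - 0.5
        for mq, sq, mp, sp in zip(mu_q, std_q, mu_p, std_p)
    )


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def world_model_loss(obs, obs_hat, reward, reward_hat, kl, free_nats=3.0):
    """Reconstruction MSE + reward MSE + KL, with the KL clamped from below.

    While the KL is under `free_nats`, the clamped term is constant, so it
    contributes no gradient and does not push the posterior toward the prior.
    """
    return mse(obs_hat, obs) + (reward_hat - reward) ** 2 + max(kl, free_nats)
```

With a perfect reconstruction and a KL below the threshold, the loss equals the free-nats floor; only once the KL exceeds 3.0 nats does it start to be penalised.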
| **Actor-Critic.** Training occurs entirely in the latent imagination space: | |
| 1. From a batch of start states sampled from the posterior, the actor rolls out an imagined trajectory of *H* steps (default: 15). | |
| 2. The target critic estimates values along the trajectory; TD(λ) returns are computed with γ = 0.99, λ = 0.95. | |
| 3. The actor maximises expected λ-returns; the critic minimises MSE against the λ-return targets. | |
| 4. A Polyak-averaged target critic (τ = 0.05) stabilises training. | |
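Steps 2 and 4 can be made concrete with a short sketch. The λ-return below uses the backward recursion common in Dreamer-style agents, `R_t = r_t + γ[(1 − λ)·v(s_{t+1}) + λ·R_{t+1}]`, bootstrapped from the value of the final imagined state. Whether `agent.py` uses exactly this indexing is an assumption, and the function names are hypothetical.

```python
def lambda_returns(rewards, values, bootstrap, gamma=0.99, lam=0.95):
    """TD(lambda) targets for an imagined trajectory of H steps.

    rewards[t] and values[t] refer to step t (t = 0..H-1); `bootstrap` is the
    target critic's value for the state reached after the last step.
    """
    next_values = values[1:] + [bootstrap]
    returns = [0.0] * len(rewards)
    running = bootstrap
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * running)
        returns[t] = running
    return returns


def polyak_update(target_params, online_params, tau=0.05):
    """Move each target parameter a fraction `tau` toward its online value."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]
```

Setting `lam=0` reduces the targets to one-step TD targets, while `lam=1` gives the discounted sum of imagined rewards plus a single bootstrap at the horizon. The default of 0.95 interpolates between the two.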
| ### 2.4 Homeostatic Critic | |
| The `HomeostaticCritic` decomposes value prediction as: | |
| ``` | |
| V(s) = setpoint + deviation(s) | |
| ``` | |
| The setpoint adapts slowly via exponential moving average (rate = 0.01), implementing a biological homeostasis analogy: the critic maintains a baseline expectation and learns only *deviations* from it. This stabilises learning in non-stationary financial environments where the absolute scale of returns drifts over time. | |
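A minimal sketch of the decomposition follows. It assumes that the setpoint tracks an exponential moving average of observed return targets. The quantity actually averaged in `HomeostaticCritic` is not stated above, so treat the update input, the class name, and the method names as hypothetical.

```python
class HomeostaticValue:
    """V(s) = setpoint + deviation(s), with a slowly adapting setpoint."""

    def __init__(self, setpoint=0.0, rate=0.01):
        self.setpoint = setpoint
        self.rate = rate

    def value(self, deviation):
        # `deviation` stands in for the learned network output for a state.
        return self.setpoint + deviation

    def update_setpoint(self, observed_target):
        # Exponential moving average: move 1% of the way toward the new target.
        self.setpoint += self.rate * (observed_target - self.setpoint)
        return self.setpoint
```

With `rate = 0.01`, a persistent shift in the level of returns is absorbed into the setpoint over a few hundred updates, so the deviation network does not need to relearn the overall scale.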
| --- | |
| ## 3. Integration Test — `test_dreamer.py` | |
| A standalone integration test validates the full Dreamer pipeline: | |
| 1. Generates 20 synthetic episodes of random observations, normalised portfolio-weight actions, and scalar rewards. | |
| 2. Populates a `ReplayBuffer` and samples a batch. | |
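| 3. Trains the world model for one gradient step and checks the resulting loss; a single step can show that the loss is computed and backpropagated, but not that training converges. | |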
| 4. Trains the actor-critic in imagination from the RSSM initial state, verifying gradient flow. | |
| This test is intended for rapid smoke-testing during development and does not constitute a performance benchmark. | |
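The data-generation step (step 1) might look like the sketch below, written with the standard library only. The dimensions, episode-length range, and reward scale are arbitrary placeholders, and `make_episode` is a hypothetical helper rather than a function from `test_dreamer.py`.

```python
import random


def make_episode(rng, length, obs_dim, n_assets):
    """One synthetic episode: random observations, simplex actions, scalar rewards."""
    observations = [[rng.gauss(0.0, 1.0) for _ in range(obs_dim)]
                    for _ in range(length)]
    actions = []
    for _ in range(length):
        raw = [rng.random() + 1e-9 for _ in range(n_assets)]  # strictly positive
        total = sum(raw)
        actions.append([x / total for x in raw])  # normalise to sum to 1
    rewards = [rng.gauss(0.0, 0.01) for _ in range(length)]
    return observations, actions, rewards


rng = random.Random(0)  # fixed seed keeps the smoke test reproducible
episodes = [make_episode(rng, rng.randint(10, 30), obs_dim=8, n_assets=4)
            for _ in range(20)]
```

Varying the episode length exercises the buffer's handling of variable-length sequences, which a fixed-length generator would not.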
| --- | |
| ## 4. Status and Roadmap | |
| | Item | Status | | |
| |----------------------------------------|----------------| | |
| | PID volatility controller | ✅ Implemented | | |
| | Adaptive regime-based target | ✅ Implemented | | |
| | Cybernetic ensemble (3-layer) | ✅ Implemented | | |
| | Dreamer world-model training | ✅ Implemented | | |
| | Homeostatic critic variant | ✅ Implemented | | |
| | Integration with production pipeline | ⬜ Not started | | |
| | Hyperparameter tuning on real data | ⬜ Not started | | |
| | Out-of-sample performance evaluation | ⬜ Not started | | |
| --- | |
| ## References | |
| - Andersen, T. G., Bollerslev, T., Diebold, F. X., & Labys, P. (2003). Modeling and forecasting realized volatility. *Econometrica*, 71(2), 579–625. | |
| - Ashby, W. R. (1956). *An Introduction to Cybernetics*. Chapman & Hall. | |
| - Hafner, D., Lillicrap, T., Norouzi, M., & Ba, J. (2021). Mastering Atari with discrete world models. *ICLR 2021*. | |
| - Harvey, C. R., Hoyle, E., Korgaonkar, R., Rattray, S., Sargaison, M., & Van Hemert, O. (2018). The impact of volatility targeting. *Journal of Portfolio Management*, 45(1), 14–33. | |
| - Minorsky, N. (1922). Directional stability of automatically steered bodies. *Journal of the American Society of Naval Engineers*, 34(2), 280–309. | |
| - Moreira, A., & Muir, T. (2017). Volatility-managed portfolios. *Journal of Finance*, 72(4), 1611–1644. | |
| - Wiener, N. (1948). *Cybernetics: Or Control and Communication in the Animal and the Machine*. MIT Press. | |
| - Ziegler, J. G., & Nichols, N. B. (1942). Optimum settings for automatic controllers. *Transactions of the ASME*, 64(11), 759–768. | |