---
license: mit
language:
- en
tags:
- reinforcement-learning
- pytorch
- ppo
- aerospace
- flight-control
- boeing-747
- continuous-control
- gymnasium
library_name: tensoraerospace
pipeline_tag: reinforcement-learning
model-index:
- name: PPO-B747-PitchControl
  results:
  - task:
      type: reinforcement-learning
      name: Pitch Angle Tracking Control
    dataset:
      type: custom
      name: Boeing 747 Longitudinal Dynamics Simulation
    metrics:
    - type: eval_reward
      value: 0.9137
      name: Best Evaluation Reward
    - type: overshoot
      value: 0.49
      name: Overshoot (%)
    - type: settling_time
      value: 0.60
      name: Settling Time (s)
    - type: rise_time
      value: 0.30
      name: Rise Time (s)
    - type: static_error
      value: 0.0046
      name: Static Error
---

# PPO Agent for Boeing 747 Pitch Angle Control
![TensorAeroSpace](https://raw.githubusercontent.com/TensorAeroSpace/TensorAeroSpace/main/img/logo-no-background.png)

**Proximal Policy Optimization (PPO) for Longitudinal Aircraft Control**

[![TensorAeroSpace](https://img.shields.io/badge/%F0%9F%9A%80-TensorAeroSpace-blue)](https://github.com/TensorAeroSpace/TensorAeroSpace)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![PyTorch](https://img.shields.io/badge/PyTorch-2.0+-red.svg)](https://pytorch.org/)
## Model Description

This model is a **Proximal Policy Optimization (PPO)** agent trained to control the pitch angle (θ) of a **Boeing 747** aircraft in a longitudinal flight dynamics simulation. The agent receives normalized state observations and outputs continuous elevator deflection commands to track reference pitch angle signals.

![image](https://cdn-uploads.huggingface.co/production/uploads/602bf7c9c4f8038e9a1e0a65/g79y7SGa8VyXCDqDjd_GO.png)
![image](https://cdn-uploads.huggingface.co/production/uploads/602bf7c9c4f8038e9a1e0a65/OZcb5JP_txYA9WEqjHGa5.png)

### Intended Uses

- **Primary Use**: Automatic pitch angle tracking and stabilization for Boeing 747 aircraft simulation
- **Research Applications**: Benchmarking RL algorithms for aerospace control systems
- **Educational**: Learning reinforcement learning concepts in aerospace applications
- **Hybrid Control**: Can be combined with PID/MPC controllers for robust flight control

### Model Architecture

The PPO agent consists of separate **Actor** and **Critic** neural networks:

#### Actor Network (Policy)

| Layer | Configuration |
|-------|---------------|
| Input | 4 (observation dim) |
| Hidden 1 | Linear(4, 256) + ReLU |
| Hidden 2 | Linear(256, 256) + ReLU |
| Output (μ) | Linear(256, 1) + Tanh |
| Output (log σ) | Linear(256, 1), clamped to [-5.0, -1.5] |

#### Critic Network (Value Function)

| Layer | Configuration |
|-------|---------------|
| Input | 4 (observation dim) |
| Hidden 1 | Linear(4, 256) + ReLU |
| Hidden 2 | Linear(256, 256) + ReLU |
| Output | Linear(256, 1) |

### State Space

The observation vector consists of 4 normalized states representing the longitudinal dynamics:

| Index | State | Description | Units |
|-------|-------|-------------|-------|
| 0 | u | Forward velocity perturbation | normalized |
| 1 | w | Vertical velocity perturbation | normalized |
| 2 | q | Pitch rate | normalized |
| 3 | θ | Pitch angle (tracking target) | normalized |

### Action Space

| Dimension | Description | Range |
|-----------|-------------|-------|
| 1 | Elevator deflection | [-1.0, 1.0] (normalized) |
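The actor and critic described in the architecture tables above can be sketched in plain PyTorch. This is an illustrative reconstruction, not the actual TensorAeroSpace implementation; class and attribute names here are assumptions:

```python
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Gaussian policy: tanh-squashed action mean plus a clamped log-std head.

    Illustrative sketch of the architecture table; the real TensorAeroSpace
    classes may be organized differently.
    """

    def __init__(self, obs_dim: int = 4, act_dim: int = 1, hidden: int = 256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu_head = nn.Linear(hidden, act_dim)
        self.log_std_head = nn.Linear(hidden, act_dim)

    def forward(self, obs: torch.Tensor):
        h = self.body(obs)
        mu = torch.tanh(self.mu_head(h))                   # action mean in [-1, 1]
        log_std = self.log_std_head(h).clamp(-5.0, -1.5)   # bounded exploration noise
        return mu, log_std


class Critic(nn.Module):
    """State-value function V(s)."""

    def __init__(self, obs_dim: int = 4, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


# Shapes for a batch of 8 observations
obs = torch.zeros(8, 4)
mu, log_std = Actor()(obs)
value = Critic()(obs)
```

The clamped log-std keeps the policy's exploration noise within a fixed band, which is a common stabilization trick for continuous-control PPO.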
The normalized action is scaled to physical elevator deflection in degrees by the environment.

## Training Details

### Training Configuration

| Hyperparameter | Value |
|----------------|-------|
| Algorithm | PPO (Clip) |
| Max Episodes | 90,000 |
| Rollout Length | 256 steps |
| Batch Size | 16,384 |
| Epochs per Update | 2 |
| Clip Parameter (ε) | 0.15 |
| Discount Factor (γ) | 0.995 |
| GAE Lambda (λ) | 0.95 |
| Actor Learning Rate | 1e-4 |
| Critic Learning Rate | 2e-4 |
| Entropy Coefficient | 0.01 |
| Max Gradient Norm | 0.5 |
| Target KL | 0.01 |
| Normalize Observations | False |
| Normalize Rewards | True |

### Environment Configuration

| Parameter | Value |
|-----------|-------|
| Environment | `ImprovedB747VecEnvTorch` |
| Number of Parallel Envs | 64 |
| Time Step (dt) | 0.1 s |
| Episode Duration | 20 s |
| Initial State | [0, 0, 0, 0] |
| Reference Signal | Step function |
| Step Amplitude | 1.0° |
| Step Time | 5.0 s |

### Training Infrastructure

- **Hardware**: NVIDIA GPU with CUDA support
- **Framework**: PyTorch 2.0+
- **Training Time**: ~7,510 episodes to best checkpoint
- **Best Episode**: 7,510

## Evaluation Results

### Performance Metrics

| Metric | Value |
|--------|-------|
| **Best Evaluation Reward** | 0.9137 |
| **Overshoot** | 0.49% |
| **Settling Time** | 0.60 s |
| **Rise Time** | 0.30 s |
| **Peak Time** | 0.80 s |
| **Static Error** | -0.0046 |
| **Oscillation Count** | 1 |
| **Performance Index** | 3.06 |

### Integral Criteria

| Criterion | Value |
|-----------|-------|
| IAE (Integral Absolute Error) | 4.08 |
| ISE (Integral Squared Error) | 2.64 |
| ITAE (Integral Time-weighted Absolute Error) | 4.77 |

### Step Response Characteristics

The agent demonstrates strong step-tracking performance:

- ✅ Minimal overshoot (<1%)
- ✅ Fast settling time (0.6 s)
- ✅ Quick rise time (0.3 s)
- ✅ Near-zero static error
- ✅ Minimal oscillations (1 cycle)
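Metrics like the ones above can be recovered from a recorded pitch-angle trace. The following is a simplified NumPy sketch using textbook definitions (10–90% rise time, ±2% settling band), not the TensorAeroSpace evaluation code; the function name and signature are illustrative:

```python
import numpy as np


def step_metrics(t, y, target, tol=0.02):
    """Simplified step-response metrics for a trace y(t) tracking a step to `target`.

    Illustrative only: uses grid samples without interpolation, and assumes the
    step occurs at t[0].
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    overshoot = max(0.0, (y.max() - target) / target * 100.0)  # percent above target
    static_error = target - y[-1]                               # residual tracking error
    # Rise time: first samples crossing 10% and 90% of the target.
    t10 = t[np.argmax(y >= 0.1 * target)]
    t90 = t[np.argmax(y >= 0.9 * target)]
    rise_time = t90 - t10
    # Settling time: first instant after which y stays inside the +/-tol band.
    outside = np.abs(y - target) > tol * target
    settling_time = t[np.nonzero(outside)[0][-1] + 1] if outside.any() else t[0]
    # Integral criteria via the rectangle rule.
    dt = t[1] - t[0]
    err = np.abs(y - target)
    iae = float(np.sum(err) * dt)
    ise = float(np.sum(err ** 2) * dt)
    itae = float(np.sum(t * err) * dt)
    return dict(overshoot=overshoot, rise_time=rise_time,
                settling_time=settling_time, static_error=static_error,
                iae=iae, ise=ise, itae=itae)


# Example on a synthetic first-order response to a 1-degree step at t=0
t = np.arange(0.0, 20.0, 0.1)
y = 1.0 - np.exp(-5.0 * t)  # approaches 1 degree with no overshoot
m = step_metrics(t, y, target=1.0)
```

On this synthetic trace the sketch reports zero overshoot, a 0.4 s rise time, and a 0.8 s settling time, which matches the shape of the exponential by inspection.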
## Usage

### Installation

```bash
pip install tensoraerospace
```

### Quick Start

```python
import numpy as np
import torch

from tensoraerospace.agent.ppo.model import PPO
from tensoraerospace.envs.b747 import ImprovedB747Env
from tensoraerospace.signals.standart import unit_step
from tensoraerospace.utils import generate_time_period, convert_tp_to_sec_tp

# Load pretrained agent
agent = PPO.from_pretrained("TensorAeroSpace/ppo-b747-pitch-control")

# Set up the simulation time grid
dt = 0.1
tp = generate_time_period(tn=20, dt=dt)
tps = convert_tp_to_sec_tp(tp, dt=dt)

# Create step reference signal (1 degree step at t=5s)
reference = unit_step(tp=tps, degree=1.0, time_step=5.0, output_rad=True).reshape(1, -1)

env = ImprovedB747Env(
    initial_state=np.array([0.0, 0.0, 0.0, 0.0], dtype=np.float32),
    reference_signal=reference,
    number_time_steps=len(tp),
    dt=dt,
)

# Run evaluation using the deterministic (mean) action
obs, _ = env.reset()
done = False
while not done:
    action, mean_action, _ = agent.act(obs, deterministic=True)
    action_scalar = float(np.asarray(mean_action).flatten()[0])
    obs, reward, terminated, truncated, info = env.step(action_scalar)
    done = terminated or truncated
```

### Load from Local Checkpoint

```python
from tensoraerospace.agent.ppo.model import PPO

# Load from local directory
agent = PPO.from_pretrained("./path/to/checkpoint")
```

## Limitations

- **Fixed Aircraft Model**: Trained specifically on Boeing 747 longitudinal dynamics; may not generalize to other aircraft
- **Step Reference Only**: Optimized for step reference tracking; performance on other signal types (sine, ramp) may vary
- **Simulation Gap**: Trained in simulation; real-world deployment would require additional validation
- **State Observability**: Assumes all 4 longitudinal states are observable
- **Linear Dynamics**: Based on a linearized aircraft model around trim conditions

## Ethical Considerations

- **Not for Real Flight Control**: This model is for research and educational purposes only.
It should NOT be used for actual aircraft control systems without extensive testing, certification, and regulatory approval.
- **Simulation Only**: All training and evaluation were performed in simulation environments.

## Citation

If you use this model in your research, please cite:

```bibtex
@software{tensoraerospace2024,
  title   = {TensorAeroSpace: Advanced Aerospace Control Systems \& Reinforcement Learning Framework},
  author  = {TensorAeroSpace Team},
  year    = {2024},
  url     = {https://github.com/TensorAeroSpace/TensorAeroSpace},
  license = {MIT}
}
```

## Model Card Authors

TensorAeroSpace Team

## Model Card Contact

- **GitHub**: [TensorAeroSpace/TensorAeroSpace](https://github.com/TensorAeroSpace/TensorAeroSpace)
- **Documentation**: [tensoraerospace.readthedocs.io](https://tensoraerospace.readthedocs.io/)
- **Hugging Face**: [TensorAeroSpace](https://huggingface.co/TensorAeroSpace)