license: mit
language:
  - en
tags:
  - reinforcement-learning
  - pytorch
  - ppo
  - aerospace
  - flight-control
  - boeing-747
  - continuous-control
  - gymnasium
library_name: tensoraerospace
pipeline_tag: reinforcement-learning
model-index:
  - name: PPO-B747-PitchControl
    results:
      - task:
          type: reinforcement-learning
          name: Pitch Angle Tracking Control
        dataset:
          type: custom
          name: Boeing 747 Longitudinal Dynamics Simulation
        metrics:
          - type: eval_reward
            value: 0.9137
            name: Best Evaluation Reward
          - type: overshoot
            value: 0.49
            name: Overshoot (%)
          - type: settling_time
            value: 0.6
            name: Settling Time (s)
          - type: rise_time
            value: 0.3
            name: Rise Time (s)
          - type: static_error
            value: 0.0046
            name: Static Error

PPO Agent for Boeing 747 Pitch Angle Control

Proximal Policy Optimization (PPO) for Longitudinal Aircraft Control

Model Description

This model is a Proximal Policy Optimization (PPO) agent trained to control the pitch angle (θ) of a Boeing 747 aircraft in a longitudinal flight dynamics simulation. The agent receives normalized state observations and outputs continuous elevator deflection commands to track reference pitch angle signals.

Intended Uses

Primary Use: Automatic pitch angle tracking and stabilization for Boeing 747 aircraft simulation
Research Applications: Benchmarking RL algorithms for aerospace control systems
Educational: Learning reinforcement learning concepts in aerospace applications
Hybrid Control: Can be combined with PID/MPC controllers for robust flight control

Model Architecture

The PPO agent consists of separate Actor and Critic neural networks:

Actor Network (Policy)

Layer	Configuration
Input	4 (observation dim)
Hidden 1	Linear(4, 256) + ReLU
Hidden 2	Linear(256, 256) + ReLU
Output (μ)	Linear(256, 1) + Tanh
Output (log σ)	Linear(256, 1), clamped to [-5.0, -1.5]

Critic Network (Value Function)

Layer	Configuration
Input	4 (observation dim)
Hidden 1	Linear(4, 256) + ReLU
Hidden 2	Linear(256, 256) + ReLU
Output	Linear(256, 1)
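
The two tables above can be read as the following PyTorch modules. This is a minimal sketch built only from the layer sizes listed here, not the library's actual implementation: the class names `Actor` and `Critic`, the attribute names, and the choice to let both actor heads share the same hidden layers are assumptions.

```python
import torch
import torch.nn as nn

# Clamp range for log sigma, taken from the Actor table above.
LOG_STD_MIN, LOG_STD_MAX = -5.0, -1.5


class Actor(nn.Module):
    """Gaussian policy: 4 observations -> mean and log-std of 1 action (assumed layout)."""

    def __init__(self, obs_dim: int = 4, act_dim: int = 1, hidden: int = 256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu_head = nn.Linear(hidden, act_dim)
        self.log_std_head = nn.Linear(hidden, act_dim)

    def forward(self, obs: torch.Tensor):
        h = self.body(obs)
        mu = torch.tanh(self.mu_head(h))  # mean bounded to [-1, 1]
        log_std = torch.clamp(self.log_std_head(h), LOG_STD_MIN, LOG_STD_MAX)
        return mu, log_std


class Critic(nn.Module):
    """State-value estimator: 4 observations -> scalar value."""

    def __init__(self, obs_dim: int = 4, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs).squeeze(-1)


# Forward pass on a dummy batch of 8 observations.
obs = torch.zeros(8, 4)
actor, critic = Actor(), Critic()
mu, log_std = actor(obs)
value = critic(obs)
```

With the clamp at [-5.0, -1.5], the policy standard deviation stays between roughly 0.007 and 0.22 in normalized action units, so exploration noise is small relative to the [-1, 1] action range.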

State Space

The observation vector consists of 4 normalized states representing the longitudinal dynamics:

Index	State	Description	Units
0	u	Forward velocity perturbation	normalized
1	w	Vertical velocity perturbation	normalized
2	q	Pitch rate	normalized
3	θ	Pitch angle (tracking target)	normalized

Action Space

Dimension	Description	Range
1	Elevator deflection	[-1.0, 1.0] (normalized)

The normalized action is scaled to physical elevator deflection in degrees by the environment.

Training Details

Training Configuration

Hyperparameter	Value
Algorithm	PPO (Clip)
Max Episodes	90,000
Rollout Length	256 steps
Batch Size	16,384
Epochs per Update	2
Clip Parameter (ε)	0.15
Discount Factor (γ)	0.995
GAE Lambda (λ)	0.95
Actor Learning Rate	1e-4
Critic Learning Rate	2e-4
Entropy Coefficient	0.01
Max Gradient Norm	0.5
Target KL	0.01
Normalize Observations	False
Normalize Rewards	True
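
To make the role of γ, λ, and ε concrete, here is a small NumPy sketch of Generalized Advantage Estimation and the PPO clipped surrogate using the values from the table. The function names `gae` and `clipped_surrogate` are illustrative and are not part of the tensoraerospace API; the actual training loop may differ in details such as advantage normalization.

```python
import numpy as np

GAMMA, LAM, CLIP_EPS = 0.995, 0.95, 0.15  # values from the table above


def gae(rewards, values, last_value, dones):
    """Generalized Advantage Estimation over one rollout of 1-D arrays."""
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    advantages = np.zeros_like(rewards)
    next_value, running = float(last_value), 0.0
    # Walk the rollout backwards, accumulating discounted TD errors.
    for t in reversed(range(len(rewards))):
        not_done = 1.0 - dones[t]
        delta = rewards[t] + GAMMA * next_value * not_done - values[t]
        running = delta + GAMMA * LAM * not_done * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def clipped_surrogate(ratio, advantages, eps=CLIP_EPS):
    """PPO-Clip objective (to be maximized), averaged over the batch."""
    ratio = np.asarray(ratio, dtype=np.float64)
    advantages = np.asarray(advantages, dtype=np.float64)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return float(np.mean(np.minimum(ratio * advantages, clipped * advantages)))


# Toy rollout: three steps, episode ends on the last one.
adv = gae(rewards=[1.0, 1.0, 1.0], values=[0.0, 0.0, 0.0],
          last_value=0.0, dones=[0.0, 0.0, 1.0])
objective = clipped_surrogate(ratio=[1.5], advantages=[1.0])
```

With ε = 0.15, a probability ratio of 1.5 on a positive advantage is capped at 1.15, which limits how far a single update can move the policy.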

Environment Configuration

Parameter	Value
Environment	`ImprovedB747VecEnvTorch`
Number of Parallel Envs	64
Time Step (dt)	0.1 s
Episode Duration	20 s
Initial State	[0, 0, 0, 0]
Reference Signal	Step function
Step Amplitude Range	1.0°
Step Time Range	5.0 s
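
For orientation, the time grid and step reference implied by this table can be reproduced with plain NumPy. This is a sketch under assumptions: it treats the 20 s episode as 200 samples at dt = 0.1 s (the library's own helpers may include the end point) and expresses the 1.0° step in radians. The helper name `step_reference` is hypothetical.

```python
import numpy as np

DT = 0.1          # time step in seconds, from the table above
DURATION = 20.0   # episode duration in seconds
STEP_DEG = 1.0    # step amplitude in degrees
STEP_TIME = 5.0   # step onset in seconds


def step_reference(dt=DT, duration=DURATION, step_deg=STEP_DEG, step_time=STEP_TIME):
    """Return (t, ref): a time grid and a pitch step reference in radians."""
    n_steps = int(round(duration / dt))       # assumed: end point excluded
    t = np.arange(n_steps) * dt
    ref = np.zeros(n_steps)
    step_index = int(round(step_time / dt))   # index avoids float comparisons
    ref[step_index:] = np.deg2rad(step_deg)
    return t, ref


t, ref = step_reference()
```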

Training Infrastructure

Hardware: NVIDIA GPU with CUDA support
Framework: PyTorch 2.0+
Best Checkpoint: reached at episode 7,510 (of a 90,000-episode maximum)

Evaluation Results

Performance Metrics

Metric	Value
Best Evaluation Reward	0.9137
Overshoot	0.49%
Settling Time	0.60 s
Rise Time	0.30 s
Peak Time	0.80 s
Static Error	-0.0046
Oscillation Count	1
Performance Index	3.06
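
The card does not state how these metrics are defined, so the sketch below uses common textbook conventions as assumptions: 10% to 90% rise time, a ±2% settling band, overshoot as a percentage of the target, and static error as target minus final value. The function name `step_metrics` is hypothetical, the target is assumed positive, and time is assumed to be measured from the step onset.

```python
import numpy as np


def step_metrics(t, y, target, band=0.02):
    """Step-response metrics for a positive target (assumed conventions)."""
    t = np.asarray(t, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)

    # Overshoot as a percentage of the target, never negative.
    overshoot_pct = max(0.0, (y.max() - target) / abs(target) * 100.0)
    peak_time = t[np.argmax(y)]

    # Rise time: first crossing of 10% to first crossing of 90%.
    # Assumes the response actually reaches 90% of the target.
    t10 = t[np.argmax(y >= 0.1 * target)]
    t90 = t[np.argmax(y >= 0.9 * target)]

    # Settling time: first sample after the last excursion outside the band.
    outside = np.abs(y - target) > band * abs(target)
    if not outside.any():
        settling_time = t[0]
    else:
        last = np.nonzero(outside)[0][-1]
        settling_time = t[last + 1] if last + 1 < len(t) else float("nan")

    return {
        "overshoot_pct": overshoot_pct,
        "peak_time": peak_time,
        "rise_time": t90 - t10,
        "settling_time": settling_time,
        "static_error": target - y[-1],
    }


# Example on a synthetic first-order response y = 1 - exp(-t).
t = np.arange(1000) * 0.01
y = 1.0 - np.exp(-t)
metrics = step_metrics(t, y, target=1.0)
```

For this synthetic response the rise time comes out close to ln 9 ≈ 2.2 s and the settling time close to 3.9 s, which matches the analytic values for a unit time constant.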

Integral Criteria

Criterion	Value
IAE (Integral Absolute Error)	4.08
ISE (Integral Squared Error)	2.64
ITAE (Integral Time-weighted Absolute Error)	4.77
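
The three criteria integrate the tracking error e(t) = reference − response over the episode. A minimal NumPy sketch using a rectangle-rule sum is shown below; the integration rule, the error units, and the function name `integral_criteria` are assumptions, so the numbers it produces are not guaranteed to match the table.

```python
import numpy as np


def integral_criteria(error, dt):
    """Return (IAE, ISE, ITAE) for a sampled error signal, rectangle rule."""
    error = np.asarray(error, dtype=np.float64)
    t = np.arange(len(error)) * dt
    iae = np.sum(np.abs(error)) * dt        # integral of |e|
    ise = np.sum(error ** 2) * dt           # integral of e^2
    itae = np.sum(t * np.abs(error)) * dt   # integral of t * |e|
    return iae, ise, itae


# Example: a constant unit error over 200 samples at dt = 0.1 s.
iae, ise, itae = integral_criteria(np.ones(200), dt=0.1)
```

ITAE weights late errors more heavily than IAE, so it penalizes slow convergence and residual static error more than a brief initial transient.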

Step Response Characteristics

On the evaluation step reference, the agent tracks the commanded pitch angle with:

✅ Minimal overshoot (<1%)
✅ Fast settling time (0.6s)
✅ Quick rise time (0.3s)
✅ Near-zero static error
✅ Minimal oscillations (1 cycle)

Usage

Installation

pip install tensoraerospace

Quick Start

import numpy as np
import torch
from tensoraerospace.agent.ppo.model import PPO
from tensoraerospace.envs.b747 import ImprovedB747Env
from tensoraerospace.signals.standart import unit_step
from tensoraerospace.utils import generate_time_period, convert_tp_to_sec_tp

# Load pretrained agent
agent = PPO.from_pretrained("TensorAeroSpace/ppo-b747-pitch-control")

# Setup environment
dt = 0.1
tp = generate_time_period(tn=20, dt=dt)
tps = convert_tp_to_sec_tp(tp, dt=dt)

# Create step reference signal (1 degree step at t=5s)
reference = unit_step(tp=tps, degree=1.0, time_step=5.0, output_rad=True).reshape(1, -1)

env = ImprovedB747Env(
    initial_state=np.array([0.0, 0.0, 0.0, 0.0], dtype=np.float32),
    reference_signal=reference,
    number_time_steps=len(tp),
    dt=dt,
)

# Run evaluation
obs, _ = env.reset()
done = False

while not done:
    action, mean_action, _ = agent.act(obs, deterministic=True)
    action_scalar = float(np.asarray(mean_action).flatten()[0])
    obs, reward, terminated, truncated, info = env.step(action_scalar)
    done = terminated or truncated

Load from Local Checkpoint

from tensoraerospace.agent.ppo.model import PPO

# Load from local directory
agent = PPO.from_pretrained("./path/to/checkpoint")

Limitations

Fixed Aircraft Model: Trained specifically on Boeing 747 longitudinal dynamics; may not generalize to other aircraft
Step Reference Only: Optimized for step reference tracking; performance on other signal types (sine, ramp) may vary
Simulation Gap: Trained in simulation; real-world deployment would require additional validation
State Observability: Assumes all 4 longitudinal states are observable
Linear Dynamics: Based on a linearized aircraft model around trim conditions

Ethical Considerations

Not for Real Flight Control: This model is for research and educational purposes only. It should NOT be used for actual aircraft control systems without extensive testing, certification, and regulatory approval.
Simulation Only: All training and evaluation performed in simulation environments.

Citation

If you use this model in your research, please cite:

@software{tensoraerospace2024,
  title = {TensorAeroSpace: Advanced Aerospace Control Systems \& Reinforcement Learning Framework},
  author = {TensorAeroSpace Team},
  year = {2024},
  url = {https://github.com/TensorAeroSpace/TensorAeroSpace},
  license = {MIT}
}

Model Card Authors

TensorAeroSpace Team

Model Card Contact

GitHub: TensorAeroSpace/TensorAeroSpace
Documentation: tensoraerospace.readthedocs.io
Hugging Face: TensorAeroSpace