---
license: mit
language:
- en
tags:
- reinforcement-learning
- pytorch
- ppo
- aerospace
- flight-control
- boeing-747
- continuous-control
- gymnasium
library_name: tensoraerospace
pipeline_tag: reinforcement-learning
model-index:
- name: PPO-B747-PitchControl
  results:
  - task:
      type: reinforcement-learning
      name: Pitch Angle Tracking Control
    dataset:
      type: custom
      name: Boeing 747 Longitudinal Dynamics Simulation
    metrics:
    - type: eval_reward
      value: 0.9137
      name: Best Evaluation Reward
    - type: overshoot
      value: 0.49
      name: Overshoot (%)
    - type: settling_time
      value: 0.60
      name: Settling Time (s)
    - type: rise_time
      value: 0.30
      name: Rise Time (s)
    - type: static_error
      value: 0.0046
      name: Static Error
---

# PPO Agent for Boeing 747 Pitch Angle Control

<div align="center">



**Proximal Policy Optimization (PPO) for Longitudinal Aircraft Control**

[](https://github.com/TensorAeroSpace/TensorAeroSpace)
[](https://opensource.org/licenses/MIT)
[](https://pytorch.org/)

</div>

## Model Description

This model is a **Proximal Policy Optimization (PPO)** agent trained to control the pitch angle (θ) of a **Boeing 747** aircraft in a longitudinal flight dynamics simulation. The agent receives normalized state observations and outputs continuous elevator deflection commands to track reference pitch angle signals.





### Intended Uses

- **Primary Use**: Automatic pitch angle tracking and stabilization in Boeing 747 aircraft simulation
- **Research Applications**: Benchmarking RL algorithms for aerospace control systems
- **Educational**: Learning reinforcement learning concepts in aerospace applications
- **Hybrid Control**: Can be combined with PID/MPC controllers for robust flight control

### Model Architecture

The PPO agent consists of separate **Actor** and **Critic** neural networks:

#### Actor Network (Policy)

| Layer | Configuration |
|-------|--------------|
| Input | 4 (observation dim) |
| Hidden 1 | Linear(4, 256) + ReLU |
| Hidden 2 | Linear(256, 256) + ReLU |
| Output (μ) | Linear(256, 1) + Tanh |
| Output (log σ) | Linear(256, 1), clamped to [-5.0, -1.5] |

#### Critic Network (Value Function)

| Layer | Configuration |
|-------|--------------|
| Input | 4 (observation dim) |
| Hidden 1 | Linear(4, 256) + ReLU |
| Hidden 2 | Linear(256, 256) + ReLU |
| Output | Linear(256, 1) |

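The two tables can be read as the following PyTorch modules. This is a minimal sketch, not the library's implementation: the class and attribute names (`Actor`, `Critic`, `trunk`, `mu_head`, `log_std_head`) are hypothetical, and it assumes both actor heads branch off the second hidden layer, which the table suggests but does not state. The parameter layout may not match the published checkpoint.

```python
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Gaussian policy: two hidden layers, then a mean head and a log-std head."""

    def __init__(self, obs_dim: int = 4, act_dim: int = 1, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu_head = nn.Linear(hidden, act_dim)       # hypothetical name
        self.log_std_head = nn.Linear(hidden, act_dim)  # hypothetical name

    def forward(self, obs: torch.Tensor):
        h = self.trunk(obs)
        mu = torch.tanh(self.mu_head(h))  # mean action in [-1, 1]
        # Clamp range taken from the table: log sigma in [-5.0, -1.5]
        log_std = torch.clamp(self.log_std_head(h), -5.0, -1.5)
        return mu, log_std


class Critic(nn.Module):
    """State-value network with the same hidden sizes as the actor."""

    def __init__(self, obs_dim: int = 4, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


actor, critic = Actor(), Critic()
obs = torch.zeros(8, 4)  # batch of 8 dummy observations
mu, log_std = actor(obs)
value = critic(obs)
```

With this clamp, the policy standard deviation stays between exp(-5.0) ≈ 0.0067 and exp(-1.5) ≈ 0.22 in normalized action units, so exploration noise is bounded on both sides.
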
### State Space

The observation vector consists of 4 normalized states representing the longitudinal dynamics:

| Index | State | Description | Units |
|-------|-------|-------------|-------|
| 0 | u | Forward velocity perturbation | normalized |
| 1 | w | Vertical velocity perturbation | normalized |
| 2 | q | Pitch rate | normalized |
| 3 | θ | Pitch angle (tracking target) | normalized |

### Action Space

| Dimension | Description | Range |
|-----------|-------------|-------|
| 1 | Elevator deflection | [-1.0, 1.0] (normalized) |

The normalized action is scaled to physical elevator deflection in degrees by the environment.

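The scaling step can be pictured as a clip followed by a linear map. The sketch below is illustrative only: the function `scale_action` and the limit `MAX_ELEVATOR_DEG` are hypothetical, since this card does not state the elevator limit the environment actually uses.

```python
import numpy as np

# Hypothetical limit for illustration; the environment's real elevator
# limit is not given in this card.
MAX_ELEVATOR_DEG = 25.0


def scale_action(action, max_deflection_deg=MAX_ELEVATOR_DEG):
    """Map a normalized action in [-1, 1] to an elevator deflection in degrees."""
    clipped = np.clip(action, -1.0, 1.0)  # guard against out-of-range actions
    return clipped * max_deflection_deg
```

Because the environment performs this conversion internally, user code should pass the normalized action to `env.step` unchanged.
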
## Training Details

### Training Configuration

| Hyperparameter | Value |
|----------------|-------|
| Algorithm | PPO (Clip) |
| Max Episodes | 90,000 |
| Rollout Length | 256 steps |
| Batch Size | 16,384 |
| Epochs per Update | 2 |
| Clip Parameter (ε) | 0.15 |
| Discount Factor (γ) | 0.995 |
| GAE Lambda (λ) | 0.95 |
| Actor Learning Rate | 1e-4 |
| Critic Learning Rate | 2e-4 |
| Entropy Coefficient | 0.01 |
| Max Gradient Norm | 0.5 |
| Target KL | 0.01 |
| Normalize Observations | False |
| Normalize Rewards | True |

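To show where γ, λ, and ε enter the update, here is a minimal NumPy sketch of generalized advantage estimation and the clipped surrogate loss. It is a textbook formulation, not the library's training loop: the function names are hypothetical, and the GAE helper assumes no episode boundary falls inside the rollout. The batch size of 16,384 equals 64 parallel environments times 256 rollout steps, which suggests each update consumes one full rollout.

```python
import numpy as np

GAMMA, LAMBDA, CLIP_EPS = 0.995, 0.95, 0.15  # values from the table above


def compute_gae(rewards, values, last_value, gamma=GAMMA, lam=LAMBDA):
    """Generalized advantage estimates for one rollout (no resets inside it)."""
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.append(np.asarray(values, dtype=np.float64), last_value)
    advantages = np.zeros_like(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error, then the exponentially weighted backward sum
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages


def clipped_surrogate_loss(ratio, advantages, eps=CLIP_EPS):
    """PPO-Clip policy loss (to be minimized) for probability ratios pi_new/pi_old."""
    ratio = np.asarray(ratio, dtype=np.float64)
    advantages = np.asarray(advantages, dtype=np.float64)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return float(-np.mean(np.minimum(unclipped, clipped)))
```

With ε = 0.15, a sample whose ratio has moved beyond [0.85, 1.15] in the direction favored by its advantage contributes no further gradient, which limits how far one update can shift the policy.
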
### Environment Configuration

| Parameter | Value |
|-----------|-------|
| Environment | `ImprovedB747VecEnvTorch` |
| Number of Parallel Envs | 64 |
| Time Step (dt) | 0.1 s |
| Episode Duration | 20 s |
| Initial State | [0, 0, 0, 0] |
| Reference Signal | Step function |
| Step Amplitude Range | 1.0° |
| Step Time Range | 5.0 s |

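For intuition, the reference described by this table is a 1° step applied 5 s into a 20 s episode, which at dt = 0.1 s is about 200 samples. The sketch below builds such a signal with plain NumPy. It is not the library's `unit_step`; the function name `step_reference` is hypothetical, and the library's time grid may include the end point and so differ by one sample.

```python
import numpy as np


def step_reference(duration_s=20.0, dt=0.1, amplitude_deg=1.0, step_time_s=5.0):
    """Return (time grid, step reference in radians) for one episode."""
    n_steps = int(round(duration_s / dt))          # 20 s / 0.1 s = 200 samples
    t = np.arange(n_steps) * dt
    ref = np.zeros(n_steps)
    step_index = int(round(step_time_s / dt))      # index where the step begins
    ref[step_index:] = np.deg2rad(amplitude_deg)   # pitch command in radians
    return t, ref


t, ref = step_reference()
```
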
### Training Infrastructure

- **Hardware**: NVIDIA GPU with CUDA support
- **Framework**: PyTorch 2.0+
- **Training Duration**: ~7,510 episodes to reach the best checkpoint
- **Best Episode**: 7,510

## Evaluation Results

### Performance Metrics

| Metric | Value |
|--------|-------|
| **Best Evaluation Reward** | 0.9137 |
| **Overshoot** | 0.49% |
| **Settling Time** | 0.60 s |
| **Rise Time** | 0.30 s |
| **Peak Time** | 0.80 s |
| **Static Error** | -0.0046 |
| **Oscillation Count** | 1 |
| **Performance Index** | 3.06 |

### Integral Criteria

| Criterion | Value |
|-----------|-------|
| IAE (Integral Absolute Error) | 4.08 |
| ISE (Integral Squared Error) | 2.64 |
| ITAE (Integral Time-weighted Absolute Error) | 4.77 |

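These criteria are integrals of the tracking error over time. A minimal sketch of how they can be approximated from sampled signals follows; it uses a rectangle-rule sum, and the function name `integral_criteria` is hypothetical. The units, time origin, and integration rule behind the reported values are not stated in this card, so this code is not expected to reproduce the numbers above.

```python
import numpy as np


def integral_criteria(reference, response, dt):
    """Rectangle-rule approximations of IAE, ISE and ITAE."""
    error = np.asarray(reference, dtype=np.float64) - np.asarray(response, dtype=np.float64)
    t = np.arange(len(error)) * dt  # time measured from the first sample
    abs_error = np.abs(error)
    return {
        "IAE": float(np.sum(abs_error) * dt),       # integral of |e(t)|
        "ISE": float(np.sum(error ** 2) * dt),      # integral of e(t)^2
        "ITAE": float(np.sum(t * abs_error) * dt),  # integral of t * |e(t)|
    }
```

ITAE weights late errors more heavily than early ones, so it penalizes slow settling and residual steady-state error more than IAE does.
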
### Step Response Characteristics

On the step reference, the agent tracks the commanded pitch angle with:

- ✅ Minimal overshoot (<1%)
- ✅ Fast settling time (0.6 s)
- ✅ Quick rise time (0.3 s)
- ✅ Near-zero static error
- ✅ Minimal oscillations (1 cycle)

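The quantities in this list follow from the sampled response after the step. The sketch below uses common textbook definitions: 10-90% rise time, a ±5% settling band, and static error taken as reference minus final value. These definitions and the name `step_metrics` are assumptions; the card does not say which conventions the reported figures use. It also assumes a positive step from zero and time measured from the step onset.

```python
import numpy as np


def _first_time_at_or_above(t, y, level):
    """Time of the first sample with y >= level, or NaN if never reached."""
    idx = np.flatnonzero(y >= level)
    return float(t[idx[0]]) if idx.size else float("nan")


def step_metrics(t, y, target, band=0.05):
    """Overshoot (%), rise time, settling time and static error of a step response."""
    t = np.asarray(t, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)

    overshoot = max(0.0, (float(y.max()) - target) / target * 100.0)
    rise_time = (_first_time_at_or_above(t, y, 0.9 * target)
                 - _first_time_at_or_above(t, y, 0.1 * target))

    # Settling time: first sample after the response last leaves the band
    outside = np.flatnonzero(np.abs(y - target) > band * target)
    if outside.size == 0:
        settling_time = float(t[0])
    elif outside[-1] + 1 < len(t):
        settling_time = float(t[outside[-1] + 1])
    else:
        settling_time = float("nan")  # still outside the band at the end

    return {
        "overshoot_pct": overshoot,
        "rise_time": rise_time,
        "settling_time": settling_time,
        "static_error": float(target - y[-1]),
    }
```

With a small overshoot that stays inside the settling band, the settling time can be shorter than the peak time, which is consistent with the 0.60 s and 0.80 s values reported above.
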
## Usage

### Installation

```bash
pip install tensoraerospace
```

### Quick Start

```python
import numpy as np
import torch

from tensoraerospace.agent.ppo.model import PPO
from tensoraerospace.envs.b747 import ImprovedB747Env
from tensoraerospace.signals.standart import unit_step
from tensoraerospace.utils import generate_time_period, convert_tp_to_sec_tp

# Load pretrained agent
agent = PPO.from_pretrained("TensorAeroSpace/ppo-b747-pitch-control")

# Setup environment: 20 s episode sampled at dt = 0.1 s
dt = 0.1
tp = generate_time_period(tn=20, dt=dt)
tps = convert_tp_to_sec_tp(tp, dt=dt)

# Create step reference signal (1 degree step at t=5s), in radians
reference = unit_step(tp=tps, degree=1.0, time_step=5.0, output_rad=True).reshape(1, -1)

env = ImprovedB747Env(
    initial_state=np.array([0.0, 0.0, 0.0, 0.0], dtype=np.float32),
    reference_signal=reference,
    number_time_steps=len(tp),
    dt=dt,
)

# Run evaluation
obs, _ = env.reset()
done = False

while not done:
    # Deterministic evaluation: use the policy mean rather than a sampled action
    action, mean_action, _ = agent.act(obs, deterministic=True)
    action_scalar = float(np.asarray(mean_action).flatten()[0])
    obs, reward, terminated, truncated, info = env.step(action_scalar)
    done = terminated or truncated
```

### Load from Local Checkpoint

```python
from tensoraerospace.agent.ppo.model import PPO

# Load from local directory
agent = PPO.from_pretrained("./path/to/checkpoint")
```

## Limitations

- **Fixed Aircraft Model**: Trained specifically on Boeing 747 longitudinal dynamics; may not generalize to other aircraft
- **Step Reference Only**: Optimized for step reference tracking; performance on other signal types (sine, ramp) may vary
- **Simulation Gap**: Trained in simulation; real-world deployment would require additional validation
- **State Observability**: Assumes all 4 longitudinal states are observable
- **Linear Dynamics**: Based on an aircraft model linearized around trim conditions

## Ethical Considerations

- **Not for Real Flight Control**: This model is for research and educational purposes only. It should NOT be used for actual aircraft control systems without extensive testing, certification, and regulatory approval.
- **Simulation Only**: All training and evaluation were performed in simulation environments.

## Citation

If you use this model in your research, please cite:

```bibtex
@software{tensoraerospace2024,
  title = {TensorAeroSpace: Advanced Aerospace Control Systems \& Reinforcement Learning Framework},
  author = {TensorAeroSpace Team},
  year = {2024},
  url = {https://github.com/TensorAeroSpace/TensorAeroSpace},
  license = {MIT}
}
```

## Model Card Authors

TensorAeroSpace Team

## Model Card Contact

- **GitHub**: [TensorAeroSpace/TensorAeroSpace](https://github.com/TensorAeroSpace/TensorAeroSpace)
- **Documentation**: [tensoraerospace.readthedocs.io](https://tensoraerospace.readthedocs.io/)
- **Hugging Face**: [TensorAeroSpace](https://huggingface.co/TensorAeroSpace)