---
library_name: tensoraerospace
tags:
- reinforcement-learning
- control
- aerospace
- boeing-747
- gymnasium
- sac
license: mit
datasets: []
language: []
model-index:
- name: SAC Boeing 747 Pitch Control (ImprovedB747Env)
  results: []
---

# SAC Boeing 747 Pitch Control (ImprovedB747Env)

This model is a Soft Actor-Critic (SAC) agent trained to control the pitch channel of a Boeing 747 in the `tensoraerospace.envs.b747.ImprovedB747Env` environment. The agent tracks a reference pitch profile while minimizing control effort and promoting smoothness.

## Model Details

- **Developed by:** TensorAeroSpace
- **Shared by:** TensorAeroSpace
- **Model type:** Reinforcement Learning, Soft Actor-Critic (continuous control)
- **Environment:** `tensoraerospace.envs.b747.ImprovedB747Env`
- **Action space:** normalized `[-1, 1]`, mapped to a stabilizer angle of ±25 deg (see the sketch after this list)
- **Observation:** `[norm_pitch_error, norm_q, norm_theta, norm_prev_action]`
- **License:** MIT
- **Finetuned from:** trained from scratch (no base model)
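
For intuition, the normalized action can be read as a fraction of the maximum stabilizer deflection. The sketch below is illustrative only: `action_to_stabilizer_deg` and the linear scaling are assumptions, since the actual mapping is implemented inside `ImprovedB747Env`.

```python
import numpy as np

# Hypothetical mapping: the environment takes actions in [-1, 1] and scales
# them to a physical stabilizer deflection. A linear map is assumed here.
STABILIZER_LIMIT_DEG = 25.0

def action_to_stabilizer_deg(action: float) -> float:
    """Map a normalized action in [-1, 1] to a stabilizer angle in degrees."""
    clipped = float(np.clip(action, -1.0, 1.0))
    return clipped * STABILIZER_LIMIT_DEG

assert action_to_stabilizer_deg(1.0) == 25.0
assert action_to_stabilizer_deg(-0.5) == -12.5
```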

### Sources

- **Repository:** https://github.com/tensoraerospace/tensoraerospace
- **Docs:** https://tensoraerospace.readthedocs.io/

## Uses

### Direct Use

Use the pretrained policy to simulate pitch-tracking tasks in the provided environment. It is suitable for research and for demonstrations of RL-based flight control.

### Out-of-Scope Use

- Real aircraft control or any safety-critical deployment without rigorous certification.
- Environments or state/action definitions that differ from `ImprovedB747Env`.

## How to Get Started

### Install

```bash
pip install tensoraerospace
```

### Load the Agent Locally

```python
from tensoraerospace.agent.sac import SAC

agent = SAC.from_pretrained(
    "./example/reinforcement_learning/best_episode_200k_episodes_0008_mae/Oct02_11-52-57_SAC/",
    load_gradients=False,  # set True to resume training with optimizer states
)

# Run one evaluation episode in the environment restored from the checkpoint.
obs, info = agent.env.reset()
done = False
while not done:
    action = agent.select_action(obs, evaluate=True)
    obs, reward, terminated, truncated, info = agent.env.step(action)
    done = terminated or truncated
```
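
With `evaluate=True`, `select_action` is assumed to follow the usual SAC convention of returning the deterministic policy mean rather than a sample from the Gaussian policy, which gives reproducible evaluation rollouts.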

### Continue Training from Checkpoint

```python
from tensoraerospace.agent.sac import SAC

agent = SAC.from_pretrained(
    "./example/reinforcement_learning/best_episode_200k_episodes_0008_mae/Oct02_11-52-57_SAC/",
    load_gradients=True,  # also restore optimizer states
)

agent.train(num_episodes=10)
agent.save("./runs", save_gradients=True)
```
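
Saving with `save_gradients=True` stores optimizer state alongside the network weights, so a later `from_pretrained(..., load_gradients=True)` can resume training without resetting the optimizers' internal statistics.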

## Training Details

The saved `config.json` contains the exact environment and policy parameters used for training. Key entries:

- `env.name`: `tensoraerospace.envs.b747.ImprovedB747Env`
- `env.params`:
  - `initial_state`: `[0, 0, 0, 0]`
  - `reference_signal`: shape `(1, 201)`, a sinusoid-like pitch target
  - `number_time_steps`: `201`
- `policy.params`:
  - `gamma`: `0.99`
  - `tau`: `0.02`
  - `alpha`: set automatically via entropy tuning
  - `batch_size`: `256`
  - `updates_per_step`: `2`
  - `target_update_interval`: `1`
  - `lr`: `3e-4`
  - `policy_type`: `Gaussian`
  - `device`: `cpu`

Note: with `automatic_entropy_tuning=True`, the `log_alpha` tensor and the `alpha_optim` optimizer state are saved and can be restored.
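
For reference, automatic entropy tuning in SAC (Haarnoja et al., 2018) adjusts the temperature so the policy's entropy tracks a fixed target. The sketch below shows the canonical update; the optimizer choice and the `target_entropy` heuristic are assumptions, meant only to indicate what the saved `log_alpha` / `alpha_optim` state corresponds to.

```python
import torch

action_dim = 1                                 # one control input (stabilizer)
target_entropy = -float(action_dim)            # common heuristic: -|A|
log_alpha = torch.zeros(1, requires_grad=True)
alpha_optim = torch.optim.Adam([log_alpha], lr=3e-4)

def update_alpha(log_pi: torch.Tensor) -> float:
    """One temperature step, given log-probs of actions sampled from the policy."""
    alpha_loss = -(log_alpha * (log_pi + target_entropy).detach()).mean()
    alpha_optim.zero_grad()
    alpha_loss.backward()
    alpha_optim.step()
    return log_alpha.exp().item()              # current alpha
```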

## Evaluation

The agent was validated in simulation on the same environment by tracking the provided reference pitch signal over `201` steps. The reward corresponds to negative quadratic costs on tracking error, pitch rate, control magnitude, control smoothness, and jerk.
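
As an illustration only, a reward with that structure can be written as below. The weights and the exact form of each term are hypothetical; the true coefficients live inside `ImprovedB747Env`.

```python
# Hypothetical weights; the real values are defined inside ImprovedB747Env.
W_ERR, W_Q, W_U, W_DU, W_DDU = 1.0, 0.1, 0.01, 0.01, 0.001

def reward(pitch_error, q, u, prev_u, prev_du):
    """Negative quadratic cost on tracking error, pitch rate, control
    magnitude, smoothness (control increment), and jerk."""
    du = u - prev_u        # control increment (smoothness term)
    ddu = du - prev_du     # change of the increment (jerk term)
    return -(W_ERR * pitch_error**2
             + W_Q * q**2
             + W_U * u**2
             + W_DU * du**2
             + W_DDU * ddu**2)
```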

## Bias, Risks, and Limitations

- Simulation fidelity limits real-world applicability.
- The agent was trained on one specific reference signal and time horizon; generalizing to others requires retraining.
- Safety constraints are enforced only implicitly, through reward shaping and action bounds; the policy is not certified for real flight.

## Environmental Impact

Training for this checkpoint was performed on CPU. For large-scale training, estimate CO2eq with the [ML CO2 Impact](https://mlco2.github.io/impact#compute) calculator.

## Technical Specs

- **Algorithm:** Soft Actor-Critic
- **Networks:** MLP policy and twin Q-networks (hidden size: 256 by default); see the sketch after this list
- **Frameworks:** PyTorch, Gymnasium
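
A minimal sketch of that architecture follows. The layer count and module layout are assumptions (the card only specifies MLPs with a hidden size of 256); the actual classes live in `tensoraerospace.agent.sac`.

```python
import torch
import torch.nn as nn

HIDDEN = 256  # default hidden size per this card; the depth is an assumption

class QNetwork(nn.Module):
    """Twin Q-networks: two independent critics over (state, action) pairs."""
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        def critic():
            return nn.Sequential(
                nn.Linear(state_dim + action_dim, HIDDEN), nn.ReLU(),
                nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
                nn.Linear(HIDDEN, 1),
            )
        self.q1, self.q2 = critic(), critic()

    def forward(self, state, action):
        sa = torch.cat([state, action], dim=-1)
        return self.q1(sa), self.q2(sa)

class GaussianPolicy(nn.Module):
    """MLP policy emitting the mean and log-std of a Gaussian over actions."""
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
        )
        self.mean = nn.Linear(HIDDEN, action_dim)
        self.log_std = nn.Linear(HIDDEN, action_dim)

    def forward(self, state):
        h = self.body(state)
        return self.mean(h), self.log_std(h).clamp(-20, 2)

# For this card: 4-dimensional observation, 1-dimensional action.
critic, actor = QNetwork(4, 1), GaussianPolicy(4, 1)
```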

## Citation

If you use this model, please cite the TensorAeroSpace repository.

```bibtex
@misc{tensoraerospace,
  title        = {TensorAeroSpace: Aerospace Simulation and RL Framework},
  author       = {TensorAeroSpace contributors},
  year         = {2023},
  howpublished = {\url{https://github.com/tensoraerospace/tensoraerospace}},
}
```

## Model Card Authors

TensorAeroSpace Team

## Contact

For questions, please open an issue in the repository or email support@tensoraerospace.org.