Commit a2d051f by Mr8bit · verified · 1 Parent(s): aa32376

Create README.md

Files changed (1): README.md +153 -0
README.md ADDED
@@ -0,0 +1,153 @@
+ ---
+ library_name: tensoraerospace
+ tags:
+ - reinforcement-learning
+ - control
+ - aerospace
+ - boeing-747
+ - gymnasium
+ - sac
+ license: mit
+ datasets: []
+ language: []
+ model-index:
+ - name: SAC Boeing 747 Pitch Control (ImprovedB747Env)
+   results: []
+ ---
+
+ # SAC Boeing 747 Pitch Control (ImprovedB747Env)
+
+ This model is a Soft Actor-Critic (SAC) agent trained to control the pitch channel of a Boeing 747 in the `tensoraerospace.envs.b747.ImprovedB747Env` environment. The agent tracks a reference pitch profile while minimizing control effort and promoting smoothness.
+
+ ## Model Details
+
+ - **Developed by:** TensorAeroSpace
+ - **Shared by:** TensorAeroSpace
+ - **Model type:** Reinforcement Learning, Soft Actor-Critic (continuous control)
+ - **Environment:** `tensoraerospace.envs.b747.ImprovedB747Env`
+ - **Action space:** normalized to [-1, 1] (mapped to a stabilizer angle of ±25 deg)
+ - **Observation:** `[norm_pitch_error, norm_q, norm_theta, norm_prev_action]`
+ - **License:** MIT
+ - **Finetuned from:** None (trained from scratch)
+
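The action-space entry above says normalized actions in [-1, 1] are mapped to a stabilizer angle of ±25 deg. The following is a minimal sketch of such a mapping, assuming a plain linear scaling with clipping; the environment's actual implementation may differ, and `action_to_stabilizer_deg` and `MAX_STABILIZER_DEG` are hypothetical names used only for illustration.

```python
MAX_STABILIZER_DEG = 25.0  # from the model card: actions map to +/-25 deg


def action_to_stabilizer_deg(action: float) -> float:
    """Clip a normalized action to [-1, 1] and scale it linearly to degrees.

    Assumption: linear scaling; not taken from the ImprovedB747Env source.
    """
    clipped = max(-1.0, min(1.0, action))
    return clipped * MAX_STABILIZER_DEG
```

Under this assumption, an action of `0.5` corresponds to a 12.5 deg stabilizer deflection, and any action outside [-1, 1] saturates at the ±25 deg limit.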
+ ### Sources
+
+ - **Repository:** https://github.com/tensoraerospace/tensoraerospace
+ - **Docs:** https://tensoraerospace.readthedocs.io/
+
+ ## Uses
+
+ ### Direct Use
+
+ Use the pretrained policy to simulate pitch-tracking tasks in the provided environment. It is suitable for research and for demonstrating RL-based flight control.
+
+ ### Out-of-Scope Use
+
+ - Real aircraft control or safety-critical deployment without rigorous certification.
+ - Environments whose state or action definitions differ from those of `ImprovedB747Env`.
+
+ ## How to Get Started
+
+ ### Install
+
+ ```bash
+ pip install tensoraerospace
+ ```
+
+ ### Load the Agent Locally
+
+ ```python
+ from tensoraerospace.agent.sac import SAC
+
+ # Load the pretrained agent from a local checkpoint directory
+ agent = SAC.from_pretrained(
+     "./example/reinforcement_learning/best_episode_200k_episodes_0008_mae/Oct02_11-52-57_SAC/",
+     load_gradients=False,  # set True to resume training with optimizer states
+ )
+
+ # Evaluate: roll out one episode in evaluation mode
+ obs, info = agent.env.reset()
+ done = False
+ while not done:
+     action = agent.select_action(obs, evaluate=True)
+     obs, reward, terminated, truncated, info = agent.env.step(action)
+     done = terminated or truncated
+ ```
+
+ ### Continue Training from Checkpoint
+
+ ```python
+ from tensoraerospace.agent.sac import SAC
+
+ # Load the checkpoint together with optimizer states so training can resume
+ agent = SAC.from_pretrained(
+     "./example/reinforcement_learning/best_episode_200k_episodes_0008_mae/Oct02_11-52-57_SAC/",
+     load_gradients=True,
+ )
+
+ # Train for a few more episodes, then save the agent with optimizer states
+ agent.train(num_episodes=10)
+ agent.save("./runs", save_gradients=True)
+ ```
+
+ ## Training Details
+
+ The saved `config.json` contains the exact environment and policy parameters used for training. Key entries:
+
+ - `env.name`: `tensoraerospace.envs.b747.ImprovedB747Env`
+ - `env.params`:
+   - `initial_state`: `[0, 0, 0, 0]`
+   - `reference_signal`: array of shape `(1, 201)`, a sinusoidal-like pitch target
+   - `number_time_steps`: `201`
+ - `policy.params`:
+   - `gamma`: `0.99`
+   - `tau`: `0.02`
+   - `alpha`: `auto` via automatic entropy tuning
+   - `batch_size`: `256`
+   - `updates_per_step`: `2`
+   - `target_update_interval`: `1`
+   - `lr`: `3e-4`
+   - `policy_type`: `Gaussian`
+   - `device`: `cpu`
+
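The `tau` and `target_update_interval` entries describe how the target Q-network weights track the online weights. Below is a minimal sketch of the standard Polyak (soft) update used in SAC, assuming the library follows the usual formulation. `soft_update` is a hypothetical helper that works on plain Python lists rather than PyTorch tensors, purely for illustration.

```python
TAU = 0.02  # from policy.params in the model card


def soft_update(target, source, tau=TAU):
    """Return target weights moved a fraction `tau` toward the source weights.

    Implements target <- (1 - tau) * target + tau * source, element-wise.
    """
    return [(1.0 - tau) * t + tau * s for t, s in zip(target, source)]


target_weights = [0.0, 1.0]
online_weights = [1.0, 1.0]
target_weights = soft_update(target_weights, online_weights)
```

With `tau = 0.02` and an update every step, each target weight closes 2% of its gap to the online weight per update, which trades responsiveness for more stable Q-value targets.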
+ Note: With `automatic_entropy_tuning=True`, `log_alpha` and `alpha_optim` state are saved and can be restored.
+
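To illustrate what `log_alpha` represents in the note above, here is a sketch of one temperature update in the usual SAC formulation. It assumes a target entropy of `-1` (the common heuristic of minus the action dimension, here one action) and uses plain gradient descent instead of a stateful optimizer such as the saved `alpha_optim`. `alpha_step` and its parameters are hypothetical and not part of the `tensoraerospace` API.

```python
import math


def alpha_step(log_alpha, log_probs, target_entropy=-1.0, lr=3e-4):
    """One gradient-descent step on the SAC temperature loss.

    Loss: -log_alpha * mean(log_pi + target_entropy), so its gradient with
    respect to log_alpha is -(mean(log_pi) + target_entropy).
    Returns the updated (log_alpha, alpha).
    """
    mean_log_prob = sum(log_probs) / len(log_probs)
    grad = -(mean_log_prob + target_entropy)
    new_log_alpha = log_alpha - lr * grad
    return new_log_alpha, math.exp(new_log_alpha)
```

When the policy's entropy estimate (`-mean(log_pi)`) is above the target, the step lowers `alpha`; when it is below, the step raises `alpha`, which pushes the policy toward more exploration.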
+ ## Evaluation
+
+ The agent was validated in simulation on the same environment by tracking the provided reference pitch signal over `201` steps. The reward penalizes quadratic costs on tracking error, pitch rate, control magnitude, smoothness, and jerk.
+
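The reward structure described above can be sketched as a negative weighted sum of quadratic terms. This is an illustration only: the weights are hypothetical placeholders, and treating smoothness and jerk as the first and second differences of the action is an assumption, not the environment's documented definition. `quadratic_reward` is not a function of `ImprovedB747Env`.

```python
def quadratic_reward(
    error,
    pitch_rate,
    action,
    prev_action,
    prev_prev_action,
    weights=(1.0, 0.1, 0.01, 0.01, 0.01),  # hypothetical weights
):
    """Negative quadratic cost on error, rate, control, smoothness, and jerk."""
    w_err, w_rate, w_ctrl, w_smooth, w_jerk = weights
    delta = action - prev_action  # first difference (smoothness term)
    jerk = action - 2.0 * prev_action + prev_prev_action  # second difference
    cost = (
        w_err * error**2
        + w_rate * pitch_rate**2
        + w_ctrl * action**2
        + w_smooth * delta**2
        + w_jerk * jerk**2
    )
    return -cost
```

A reward of this form is never positive and reaches zero only when the tracking error, pitch rate, and control activity are all zero.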
+ ## Bias, Risks, and Limitations
+
+ - Simulation fidelity limits real-world applicability.
+ - Trained on a specific reference and time horizon; generalization requires retraining.
+ - Safety constraints are implicit via reward shaping and bounds; not certified for real flight.
+
+ ## Environmental Impact
+
+ Training for this checkpoint was performed on CPU. For large-scale training, estimate CO2eq with the [ML CO2 Impact](https://mlco2.github.io/impact#compute) calculator.
+
+ ## Technical Specs
+
+ - **Algorithm:** Soft Actor-Critic
+ - **Networks:** MLP policy and twin Q-networks (hidden size: 256 by default)
+ - **Frameworks:** PyTorch, Gymnasium
+
+ ## Citation
+
+ If you use this model, please cite the TensorAeroSpace repository.
+
+ ```bibtex
+ @misc{tensoraerospace,
+   title = {TensorAeroSpace: Aerospace Simulation and RL Framework},
+   author = {TensorAeroSpace contributors},
+   year = {2023},
+   howpublished = {\url{https://github.com/tensoraerospace/tensoraerospace}},
+ }
+ ```
+
+ ## Model Card Authors
+
+ TensorAeroSpace Team
+
+ ## Contact
+
+ For questions, please open an issue at the repository or email support@tensoraerospace.org.
+
+