# Classical Robotics {#sec:classical}
::: epigraph
*Know your enemy* [...]
Sun Tzu
:::
::: tldr
Learning-based approaches to robotics are motivated by the need to (1)
generalize across tasks and embodiments, (2) reduce dependency on human
expertise, and (3) leverage historical trends in the production of
data---all traditionally overlooked by dynamics-based techniques.
:::
## Explicit and Implicit Models
[Figure: an atlas of approaches to generating robot motion, spanning explicit (dynamics-based) to implicit (learning-based) models.]{#fig:generating-motion-atlas width="50%"}
Robotics is concerned with producing artificial motion in the physical
world in a useful, reliable, and safe fashion. Robotics is thus an
inherently multidisciplinary domain: producing autonomous motion in the
physical world requires, at the very least, interfacing different
software (motion planners) and hardware (motion executors) components.
Knowledge of mechanical, electrical, and software engineering, as well
as rigid-body mechanics and control theory, has therefore proven
quintessential in robotics since the field first developed in the
1950s. More recently, Machine Learning (ML) has also proved effective
in robotics, complementing these more traditional
disciplines [@connellRobotLearning1993]. As a direct consequence of its
multidisciplinary nature, robotics has developed a rather wide array
of methods, all concerned with the main purpose of generating motion.
Methods to produce robotic motion range from traditional *explicit*
models---dynamics-based[^1] methods, leveraging precise descriptions of
the mechanics of robots' rigid bodies and their interactions with
possible obstacles in the environment---to *implicit*
models---learning-based methods, treating artificial motion as a
statistical pattern to learn from multiple sensorimotor
readings [@agrawalComputationalSensorimotorLearning; @bekrisStateRobotMotion2024].
A variety of methods have been developed between these two extrema. For
instance, @hansenTemporalDifferenceLearning2022 show how learning-based
systems can benefit from information on the physics of a problem,
complementing a traditional learning method such as Temporal Difference
(TD) learning [@suttonReinforcementLearningIntroduction2018] with
Model-Predictive Control (MPC). Conversely, as explicit models may rely
on assumptions that prove overly simplistic---or even
unrealistic---in practice, learning can prove effective in improving
the modeling of complex phenomena or complementing
perception [@mccormacSemanticFusionDense3D2016]. These examples
demonstrate the richness of approaches to robotics, and
Figure [1](#fig:generating-motion-atlas){reference-type="ref"
reference="fig:generating-motion-atlas"} graphically illustrates some of
the most relevant techniques. This list is far from exhaustive, and we
refer to @bekrisStateRobotMotion2024 for a more comprehensive overview
of both general and application-specific methods for motion generation.
In this section, we wish to introduce the inherent benefits of
learning-based approaches---the core focus of this tutorial.
## Different Types of Motion
[Figure: robotic platforms for manipulation, locomotion, and mobile manipulation.]{#fig:robotics-platforms-atlas width="70%"}
In the vast majority of instances, robotics deals with producing motion
by actuating joints connecting (nearly) rigid links. A key
distinction between focus areas in robotics is based on whether the
generated motion modifies (1) the absolute state of the environment (via
dexterity), (2) the relative state of the robot with respect to its
environment (exercising mobility skills), or (3) a combination of the
two (Figure [2](#fig:robotics-platforms-atlas){reference-type="ref"
reference="fig:robotics-platforms-atlas"}).
Effects such as (1) are typically achieved *through* the robot, i.e.,
generating motion to perform an action inducing a desirable
modification, effectively *manipulating* the environment (manipulation).
Motions like (2) may result in changes in the robot's physical location
within its environment. Generally, modifications to a robot's location
within its environment may be considered instances of the general
*locomotion* problem, further specified as *wheeled* or *legged*
locomotion based on whether a robot uses wheels or leg(s) to move in
the environment. Lastly, an increased level of dynamism in
robot-environment interactions can be obtained by combining (1) and (2),
thus designing systems capable of interacting with *and* moving within
their environment. This category of problems is typically termed *mobile
manipulation*, and is characterized by a typically much larger set of
control variables compared to either locomotion or manipulation alone.
The traditional body of work developed since the very inception of
robotics is increasingly complemented by learning-based approaches. ML
has indeed proven particularly transformative across the entire robotics
stack, first empowering planning-based techniques with improved state
estimation used for traditional
planning [@tangPerceptionNavigationAutonomous2023] and then end-to-end
replacing controllers, effectively yielding perception-to-action
methods [@koberReinforcementLearningRobotics]. Work in producing robots
capable of navigating a diverse set of terrains demonstrated the promise
of both dynamics- and learning-based approaches for
locomotion [@griffinWalkingStabilizationUsing2017; @jiDribbleBotDynamicLegged2023; @leeLearningQuadrupedalLocomotion2020; @margolisRapidLocomotionReinforcement2022],
and recent works on whole-body control indicated the promise of
learning-based approaches to generate rich motion on complex robots,
including
humanoids [@zhangWoCoCoLearningWholeBody2024; @bjorckGR00TN1Open2025].
Manipulation has also been widely studied, particularly considering its
relevance for many impactful use-cases ranging from high-risk
applications for
humans [@fujitaDevelopmentRobotsNuclear2020; @alizadehComprehensiveSurveySpace2024]
to manufacturing [@sannemanStateIndustrialRobotics2020]. While explicit
models have proven fundamental in achieving important milestones towards
the development of modern robotics, recent works leveraging implicit
models have proven particularly promising in overcoming scalability and
applicability challenges via
learning [@koberReinforcementLearningRobotics].
## Example: Planar Manipulation
Robot manipulators typically consist of a series of links and joints,
articulated in a chain terminating in an *end-effector*. Actuated
joints are responsible for generating motion of the links, while the
end-effector is used to perform specific actions at the target location
(e.g., grasping/releasing objects by closing/opening a gripper
end-effector, using a specialized tool like a screwdriver, etc.).
Recently, the development of low-cost manipulators like the
ALOHA [@zhaoLearningFineGrainedBimanual2023],
ALOHA-2 [@aldacoALOHA2Enhanced], and
SO-100/SO-101 [@knightStandardOpenSO100] platforms has significantly
lowered the barrier to entry into robotics, considering the increased
accessibility of these robots compared to more traditional platforms
like the Franka Emika Panda arm
(Figure [3](#fig:robotic-platforms-costs){reference-type="ref"
reference="fig:robotic-platforms-costs"}).
[Figure: indicative costs of popular robotic platforms, from low-cost manipulators to traditional research arms.]{#fig:robotic-platforms-costs width="40%"}
Deriving an intuition as to why learning-based approaches are gaining
popularity in the robotics community requires briefly analyzing
traditional approaches to manipulation, leveraging tools like forward
and inverse kinematics (FK, IK) and control theory. Providing a detailed
overview of these methods falls (well) outside the scope of this
tutorial, and we refer the reader to works
including @sicilianoSpringerHandbookRobotics2016
[@lynchModernRoboticsMechanics2017; @tedrakeRoboticManipulationPerception; @tedrakeUnderactuatedRoboticsAlgorithms]
for a much more comprehensive description of these techniques. Here, we
mostly wish to highlight the benefits of ML over these traditional
techniques.
[Figure: reducing the SO-100 to a planar, 2-dof manipulator by locking the shoulder pan and wrist motors.]{#fig:make-so100-planar-manipulator width="70%"}
Consider the (simple) case where a SO-100 is restrained from actuating
(1) the shoulder pan and (2) the wrist flex and roll motors. This
effectively reduces the degrees of freedom of the SO-100 from the
original 5+1 (5 joints + 1 gripper) to 2+1 (shoulder lift, elbow flex +
gripper). As the end-effector does not impact motion in this model, the
SO-100 is effectively reduced to the planar manipulator robot presented
in Figure [4](#fig:make-so100-planar-manipulator){reference-type="ref"
reference="fig:make-so100-planar-manipulator"}, where spheres represent
actuators, and solid lines indicate length-$l$ links from the base of
the SO-100 to the end-effector (*ee*).
Further, let us make the simplifying assumption that actuators can
produce rotations up to $2 \pi$ radians. In practice, this is seldom the
case due to movement obstructions caused by the robot body itself (for
instance, the shoulder lift cannot produce counter-clockwise movement
due to the presence of the robot's base used to secure the SO-100 to its
support and host the robot bus), but we will introduce movement
obstruction at a later stage.
All these simplifying assumptions leave us with the planar manipulator
of Figure [5](#fig:planar-manipulation-simple){reference-type="ref"
reference="fig:planar-manipulation-simple"}, free to move its
end-effector by controlling the angles $\theta_1$ and $\theta_2$,
jointly referred to as the robot's *configuration* and indicated with
$q = [\theta_1, \theta_2 ] \in [-\pi, +\pi]^2$. The axes attached to the
joints indicate the associated reference frames, whereas circular arrows
indicate the maximal feasible rotation allowed at each joint. In this
tutorial, we do not cover topics related to spatial algebra, and we
instead refer the reader to @lynchModernRoboticsMechanics2017
[Chapter 2] and @tedrakeRoboticManipulationPerception [Chapter 3] for
excellent explanations of the mechanics and theoretical foundations of
producing motion on rigid bodies.
<figure id="fig:planar-manipulator-floor-shelf">
<figure id="fig:planar-manipulation-simple">
<img src="figures/ch2/ch2-planar-manipulator-free.png"
style="height:3.2cm" />
<figcaption>Free to move</figcaption>
</figure>
<figure id="fig:planar-manipulator-floor">
<img src="figures/ch2/ch2-planar-manipulator-floor.png"
style="height:3.2cm" />
<figcaption>Constrained by the surface</figcaption>
</figure>
<figure id="fig:planar-manipulator-floor-shelf">
<img src="figures/ch2/ch2-planar-manipulator-floor-shelf.png"
style="height:3.2cm" />
<figcaption>Constrained by surface and (fixed) obstacle</figcaption>
</figure>
<figcaption>Planar, 2-dof schematic representation of the SO-100
manipulator under diverse deployment settings. From left to right:
completely free to move; constrained by the presence of the surface;
constrained by the surface and the presence of obstacles. Circular
arrows around each joint indicate the maximal rotation feasible at that
joint.</figcaption>
</figure>
Considering the (toy) example presented in
Figure [5](#fig:planar-manipulation-simple){reference-type="ref"
reference="fig:planar-manipulation-simple"}, we can analytically
write the end-effector's position $p \in \mathbb R^2$ as a function of
the robot's configuration, $p = p(q)$, with
$p: \mathcal Q \mapsto \mathbb R^2$. In particular, we have:
$$\begin{equation*}
p(q) =
\begin{pmatrix}
p_x(\theta_1, \theta_2) \\
p_y(\theta_1, \theta_2)
\end{pmatrix}
=
\begin{pmatrix}
l \cos(\theta_1) + l \cos(\theta_1 + \theta_2) \\
l \sin(\theta_1) + l \sin(\theta_1 + \theta_2)
\end{pmatrix}
\in S^{n=2}_{2l} = \{ p(q) \in \mathbb R^2: \Vert p(q) \Vert_2^2 \leq (2l)^2, \ \forall q \in \mathcal Q \}
\end{equation*}$$
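To make the FK map concrete, the following is a minimal NumPy sketch of
$p(q)$ for this 2-link arm. The link length `l = 0.1` (meters) is an
arbitrary placeholder rather than a measured SO-100 dimension.

```python
import numpy as np

def forward_kinematics(q: np.ndarray, l: float = 0.1) -> np.ndarray:
    """End-effector position p(q) of a 2-link planar arm with equal link lengths l."""
    theta1, theta2 = q
    return np.array([
        l * np.cos(theta1) + l * np.cos(theta1 + theta2),  # p_x
        l * np.sin(theta1) + l * np.sin(theta1 + theta2),  # p_y
    ])

# Fully stretched along the x-axis: the end-effector lies on the boundary of
# the reachable workspace, at distance 2*l from the base.
print(forward_kinematics(np.array([0.0, 0.0])))  # [0.2 0. ]
```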
Deriving the end-effector's *pose*---position *and* orientation---in
some $m$-dimensional space
$\vec{p} \in \mathcal{P} \subset \mathbb{R}^{m}$ starting from the
configuration $q \in \mathcal Q \subset \mathbb R^n$ of an $n$-joint
robot is referred to as *forward kinematics* (FK), whereas identifying
the configuration corresponding to any given target pose is termed
*inverse kinematics* (IK). That is, FK maps a robot configuration to
the corresponding end-effector pose, whereas IK reconstructs the
configuration(s) corresponding to a given end-effector pose.
In the simplified case considered here (for which $\vec{p} \equiv p$, as
the orientation of the end-effector is disregarded for simplicity), one
can solve the problem of controlling the end-effector's location to
reach a goal position $p^*$ by solving analytically for
$q: p(q) = f_{\FK}(q) = p^*$. However, in the general case, one might
not be able to solve this problem analytically, and typically resorts
to iterative optimization methods comparing candidate solutions using a
loss function (in the simplest case, $\Vert p(q) - p^* \Vert_2^2$ is a
natural candidate), yielding:
$$\begin{align}
\min_{q \in \mathcal Q} \Vert p(q) - p^* \Vert_2^2 \, .
\label{eq:ik_problem}
\end{align}$$
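As a minimal illustration of such an iterative approach (by no means the
only or best solver for
eq. [\[eq:ik_problem\]](#eq:ik_problem){reference-type="ref"
reference="eq:ik_problem"}), the sketch below minimizes the loss above
with plain gradient descent, using central finite differences in place
of hand-derived gradients; the step size and iteration budget are
arbitrary choices.

```python
def ik_loss(q: np.ndarray, p_star: np.ndarray) -> float:
    """Squared tracking error ||p(q) - p*||^2."""
    return float(np.sum((forward_kinematics(q) - p_star) ** 2))

def solve_ik(p_star: np.ndarray, q0: np.ndarray,
             lr: float = 2.0, iters: int = 5000, h: float = 1e-6) -> np.ndarray:
    """Naive gradient descent on the IK loss with finite-difference gradients."""
    q = q0.astype(float).copy()
    for _ in range(iters):
        grad = np.array([
            (ik_loss(q + h * e, p_star) - ik_loss(q - h * e, p_star)) / (2 * h)
            for e in np.eye(len(q))
        ])
        q = q - lr * grad
    return q

q_sol = solve_ik(p_star=np.array([0.1, 0.1]), q0=np.array([0.3, 0.3]))
print(forward_kinematics(q_sol))  # ~[0.1, 0.1], provided the target is reachable
```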
Exact analytical solutions to IK are even less appealing when one
considers the presence of obstacles in the robot's workspace, resulting
in constraints on the possible values of
$q \in \mathcal Q \subseteq [-\pi, +\pi]^n \subset \mathbb R^n$ in the
general case of $n$-links robots.
For instance, the robot in
Figure [6](#fig:planar-manipulator-floor){reference-type="ref"
reference="fig:planar-manipulator-floor"} is (very naturally) obstructed
by the presence of the surface upon which it rests: $\theta_1$ can now
exclusively vary within $[0, \pi]$, while possible variations in
$\theta_2$ depend on $\theta_1$ (when $\theta_1 \to 0$ or
$\theta_1 \to \pi$, further downward movements are restricted). Even
for a simplified kinematic model, developing techniques to
solve eq. [\[eq:ik_problem\]](#eq:ik_problem){reference-type="ref"
reference="eq:ik_problem"} is in general non-trivial in the presence of
constraints, particularly considering that the feasible set of solutions
$\mathcal Q$ may change across problems.
Figure [8](#fig:planar-manipulator-floor-shelf){reference-type="ref"
reference="fig:planar-manipulator-floor-shelf"} provides an example of
how the environment influences the feasible set considered, with a new
set of constraints deriving from the position of a new obstacle.
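One simple (if crude) way to account for such constraints in the
iterative scheme above is to project each gradient step back onto a
feasible box, as sketched below. The limits are illustrative: a box only
captures *independent* per-joint bounds such as $\theta_1 \in [0, \pi]$,
while the configuration-dependent limits on $\theta_2$ discussed above
would require a state-dependent projection or an explicitly constrained
solver.

```python
# Box limits mimicking the surface constraint: theta_1 in [0, pi]; theta_2 is
# (unrealistically) left unconstrained here for simplicity.
Q_LO = np.array([0.0, -np.pi])
Q_HI = np.array([np.pi, np.pi])

def solve_ik_box(p_star: np.ndarray, q0: np.ndarray,
                 lr: float = 2.0, iters: int = 5000, h: float = 1e-6) -> np.ndarray:
    """Projected gradient descent: clip q back into [Q_LO, Q_HI] after each step."""
    q = np.clip(q0.astype(float), Q_LO, Q_HI)
    for _ in range(iters):
        grad = np.array([
            (ik_loss(q + h * e, p_star) - ik_loss(q - h * e, p_star)) / (2 * h)
            for e in np.eye(len(q))
        ])
        q = np.clip(q - lr * grad, Q_LO, Q_HI)
    return q
```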
However, IK---solving
eq. [\[eq:ik_problem\]](#eq:ik_problem){reference-type="ref"
reference="eq:ik_problem"} for a feasible $q$---only proves useful in
determining information regarding the robot's configuration in the goal
pose, and crucially does not provide information on the *trajectory* to
follow over time to reach a target pose. Expert-defined trajectories
obviate this problem by providing a length-$K$ succession of goal poses
$\tau_K = [p^*_0, p^*_1, \dots, p^*_K]$ for tracking. In practice,
trajectories can also be obtained automatically through *motion
planning* algorithms, thus avoiding expensive trajectory definition from
human experts. However, tracking $\tau_K$ via IK can prove prohibitively
expensive, as it would require solving
eq. [\[eq:ik_problem\]](#eq:ik_problem){reference-type="ref"
reference="eq:ik_problem"} $K$ times (once for each target pose).
*Differential* inverse kinematics (diff-IK) complements IK via a
closed-form solution of a variant of
eq. [\[eq:ik_problem\]](#eq:ik_problem){reference-type="ref"
reference="eq:ik_problem"}. Let $J(q)$ denote the Jacobian matrix of
(partial) derivatives of the FK-function
$f_\FK: \mathcal Q \mapsto \mathcal P$, such that
$J(q) = \frac{\partial f_{\FK}(q)}{\partial q}$. Then, one can apply the
chain rule to any $p(q) = f_{\FK}(q)$, deriving $\dot p = J(q) \dot q$,
and thus finally relating variations in the robot configurations to
variations in pose, thereby providing a platform for control.
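For the 2-link planar arm above, $J(q)$ is available in closed form by
differentiating the FK map; the sketch below is consistent with the
`forward_kinematics` function introduced earlier.

```python
def jacobian(q: np.ndarray, l: float = 0.1) -> np.ndarray:
    """Analytic Jacobian J(q) = d f_FK(q) / d q of the 2-link planar arm."""
    t1, t12 = q[0], q[0] + q[1]
    return np.array([
        [-l * np.sin(t1) - l * np.sin(t12), -l * np.sin(t12)],  # d p_x / d q
        [ l * np.cos(t1) + l * np.cos(t12),  l * np.cos(t12)],  # d p_y / d q
    ])

# Sanity check of the first-order relation dp ≈ J(q) dq.
q, dq = np.array([0.7, -0.3]), np.array([1e-4, -2e-4])
dp = forward_kinematics(q + dq) - forward_kinematics(q)
print(np.allclose(dp, jacobian(q) @ dq, atol=1e-8))  # True
```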
Given a desired end-effector trajectory $\targetvel(t)$ (1) indicating
anchor regions in space and (2) how much time to spend in each region,
diff-IK finds $\dot q(t)$ by solving for joint *velocities* instead of
*configurations*, $$\begin{align}
\dot q(t) = \arg\min_\nu \; \lVert J(q(t)) \nu - \targetvel (t) \rVert_2^2
\label{eq:reg_ik_velocity}
\end{align}$$
Unlike eq. [\[eq:ik_problem\]](#eq:ik_problem){reference-type="ref"
reference="eq:ik_problem"}, solving for $\dot q$ is much less dependent
on the environment (typically, variations in velocity are constrained by
physical limits on the actuators). Conveniently,
eq. [\[eq:reg_ik_velocity\]](#eq:reg_ik_velocity){reference-type="ref"
reference="eq:reg_ik_velocity"} also often admits the closed-form
solution $\dot q = J(q)^+ \targetvel$, where $J(q)^+$ denotes the
Moore-Penrose pseudo-inverse of $J(q)$. Finally, discrete-time joint
configurations $q$ can be reconstructed from joint velocities $\dot q$
using forward integration of the continuous-time joint velocity,
$q_{t+1} = q_t + \Delta t\,\dot q_t$ for a given $\Delta t$, resulting
in tracking via diff-IK.
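Putting these pieces together, a hedged sketch of open-loop diff-IK
tracking follows: at each step, the closed-form solution of
eq. [\[eq:reg_ik_velocity\]](#eq:reg_ik_velocity){reference-type="ref"
reference="eq:reg_ik_velocity"} is integrated forward in time. The
initial configuration, velocity profile, and step size are illustrative
placeholders.

```python
def track_diff_ik(q0: np.ndarray, v_target, dt: float = 0.01, steps: int = 200):
    """Open-loop diff-IK tracking: q_{t+1} = q_t + dt * pinv(J(q_t)) @ v*(t)."""
    q = q0.astype(float).copy()
    trajectory = [q.copy()]
    for t in range(steps):
        v_star = v_target(t * dt)                     # desired end-effector velocity
        q_dot = np.linalg.pinv(jacobian(q)) @ v_star  # closed-form diff-IK solution
        q = q + dt * q_dot                            # forward (Euler) integration
        trajectory.append(q.copy())
    return np.array(trajectory)

# Sweep the end-effector horizontally at 1 cm/s from an elbow-bent configuration.
qs = track_diff_ik(np.array([np.pi / 2, -np.pi / 2]),
                   v_target=lambda t: np.array([0.01, 0.0]))
```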
Following trajectories with diff-IK is a valid option in well-controlled
and static environments (e.g., industrial manipulators in controlled
manufacturing settings), and relies on the ability to define a set of
target velocities to track
$[\targetvel_0, \targetvel_1, \dots, \targetvel_K ]$---an error-prone
task largely requiring human expertise. Furthermore, diff-IK relies on
the ability to (1) access $J(q) \, \forall q \in \mathcal Q$ and (2)
compute its pseudo-inverse at every iteration of a given control
cycle---a challenging assumption in highly dynamical settings, or for
complex kinematic chains.
### Adding Feedback Loops
While very effective when a goal trajectory has been well specified, the
performance of diff-IK can degrade significantly in the presence of
modeling/tracking errors or non-modeled dynamics in the environment.
[Figure: a rigid body moving with velocity $\dot x_B$ along the horizontal axis of the manipulator's workspace.]{#fig:planar-manipulator-box-velocity width="30%"}
One such case is presented in
Figure [\[fig:planar-manipulator-box-velocity\]](#fig:planar-manipulator-box-velocity){reference-type="ref"
reference="fig:planar-manipulator-box-velocity"}, where another rigid
body other than the manipulator is moving in the environment along the
horizontal axis, with velocity $\dot x_B$. Accounting analytically for
the presence of this disturbance---for instance, to prevent the midpoint
of the link from ever colliding with the object---requires access to
$\dot x_B$ at least, to derive the equation characterizing the motion of
the environment.
Less predictable disturbances, however (e.g.,
$\dot x_B \leftarrow \dot x_B + \eps, \ \eps \sim N(0,1)$), may prove
challenging to model analytically. In such cases, one could attain the
same result of preventing link-object collisions by adding a condition
on the distance between the midpoint of $l$ and $x_B$, enforced through
a feedback loop on the positions of the robot and object at each control
cycle.
To mitigate the effects of modeling errors, sensing noise, and other
disturbances, classical pipelines indeed augment diff-IK with feedback
control, looping back quantities of interest. In practice, following a
trajectory with a closed feedback loop might consist in feeding back
the error between the target and measured pose,
$\Delta p = \targetpos - p(q)$, thereby modifying the control applied to
$\dot q = J(q)^+ (\targetvel + k_p \Delta p )$, with $k_p$ defined as
the (proportional) gain.
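In code, this closed-loop variant only requires augmenting the velocity
command of the open-loop tracker above with the proportional
correction; the gain `k_p = 5.0` is an arbitrary placeholder, and in
practice gains must be tuned per system.

```python
def track_diff_ik_fb(q0: np.ndarray, p_target, v_target,
                     k_p: float = 5.0, dt: float = 0.01, steps: int = 200):
    """Diff-IK with a proportional feedback term on the measured pose error."""
    q = q0.astype(float).copy()
    for t in range(steps):
        dp = p_target(t * dt) - forward_kinematics(q)  # error: target - measured pose
        v_cmd = v_target(t * dt) + k_p * dp            # feedforward + P correction
        q = q + dt * np.linalg.pinv(jacobian(q)) @ v_cmd
    return q
```

With `k_p = 0` this reduces to the open-loop tracker above; larger gains
reject disturbances more aggressively, at the risk of instability when
$k_p \Delta t$ grows too large.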
More advanced control techniques such as feedback linearization, PID
control, the Linear Quadratic Regulator (LQR), or Model-Predictive
Control (MPC) can be employed to stabilize tracking and reject moderate
perturbations, and we refer to
@sicilianoSpringerHandbookRobotics2016 [Chapter 8] for an in-depth
explanation of these concepts, or to
[@tedrakeRoboticManipulationPerception Chapter 8] for a simple,
intuitive example in the case of a point-mass system. Nonetheless,
feedback control presents its own challenges: tuning gains remains
laborious and system-specific. Further, manipulation tasks present
intermittent contacts inducing hybrid dynamics (mode switches) and
discontinuities in the Jacobian, challenging the stability guarantees
of the controller and thus often necessitating rather conservative
gains and substantial hand-tuning.
We point the interested reader to @sicilianoSpringerHandbookRobotics2016
[Chapters 2, 7, 8], @lynchModernRoboticsMechanics2017 [Chapters 6, 11],
and @tedrakeRoboticManipulationPerception [Chapters 3, 8] for extended
coverage of FK, IK, diff-IK, and control for (diff-)IK.
## Limitations of Dynamics-based Robotics
Despite 60+ years of robotics research, autonomous robots are still
largely incapable of performing tasks at human-level performance in the
physical world while generalizing across (1) robot embodiments
(different manipulators, different locomotion platforms, etc.) and (2)
tasks (tying shoelaces, manipulating a diverse set of objects). While
essential in the early development of robotics, the aforementioned
methods require significant human expertise to be used in practice, and
are typically specific to a particular application.
[Figure: limitations of dynamics-based approaches to robotics.]{#fig:classical-limitations width="90%"}
Dynamics-based robotics pipelines have historically been organized as
collections of special-purpose modules: sensing, state estimation,
mapping, planning, (diff-)IK, and low-level control have traditionally
been developed as distinct components with fixed interfaces. Pipelining
these specialized modules has proven error-prone, and brittleness
emerges---alongside compounding errors---whenever changes occur (e.g.,
changes in lighting for sensing, occlusion/failure of sensors, control
failures). Adapting such a stack to new tasks or robotic platforms often
entails re-specifying objectives, constraints, and heuristics at
multiple stages, incurring significant engineering overhead.
Moreover, classical planners operate on compact, assumed-sufficient
state representations; extending them to reason directly over raw,
heterogeneous, and noisy data streams is non-trivial. This results in a
*perception bottleneck*, as incorporating high-dimensional perceptual
inputs (RGB, depth, tactile, audio) has traditionally required extensive
engineering efforts to extract meaningful features for control. Also,
the large number of tasks, coupled with the adoption of *per-task*
planners, goal parameterizations, and safety constraints, results in an
explosion of design and validation options, with little opportunity to
reuse solutions across tasks.
Setting aside integration and scalability challenges, accurately
modeling contact, friction, and compliance for complicated systems
remains difficult. Rigid-body approximations are often insufficient in
the presence of deformable objects, and so are many of the methods built
upon them. In the case of complex, time-dependent, and/or non-linear
dynamics, even moderate mismatches in parameters, unmodeled effects, or
grasp-induced couplings can qualitatively affect the observed dynamics.
Lastly, dynamics-based methods (naturally) overlook the rather recent
surge in the availability of robotics data. The curation of academic
datasets by large, centralized groups of human experts in
robotics [@collaborationOpenXEmbodimentRobotic2025; @khazatskyDROIDLargeScaleInTheWild2025]
is now increasingly complemented by data collected by individuals with
varied expertise. Dynamics-based approaches are poised to benefit from
this trend at most tangentially---a trend which holds the promise of
enabling generalization across tasks and embodiments, just as data
proved the cornerstone of advances in
vision [@alayracFlamingoVisualLanguage2022] and natural-language
understanding [@brownLanguageModelsAre2020].
Taken together, these limitations
(Figure [9](#fig:classical-limitations){reference-type="ref"
reference="fig:classical-limitations"}) motivate the exploration of
learning-based approaches that can (1) integrate perception and control
more tightly, (2) adapt across tasks and embodiments with reduced expert
modeling interventions, and (3) scale gracefully in performance as more
robotics data becomes available.
[^1]: Here, we refer to both *kinematics*- and *dynamics*-based control.