Analytical mechanics, manipulator dynamics, and the control-theoretic tools every roboticist reaches for — Lagrangian and Hamiltonian formulations, Pontryagin's maximum principle, the HJB equation, LQR, iLQR, MPC, and Lyapunov stability — distilled and paired with worked exercises.
From configuration spaces to manipulator dynamics — the equations every controller operates on.
A robot's configuration is a point $\mathbf{q}\in\mathcal{Q}$ in the configuration space, a smooth manifold of dimension $n$ (the number of degrees of freedom). Each coordinate $q_i$ is a generalized coordinate.
Fixed-base arm: $\mathbf{q} = (\theta_1,\ldots,\theta_n)\in\mathbb{T}^n$ or $\mathbb{R}^n$ for revolute/prismatic joints; the configuration space is flat.
Floating-base robot (humanoid, legged): $\mathbf{q}=(\mathbf{p},\mathbf{R},\mathbf{q}_{\text{joints}})\in\mathrm{SE}(3)\times\mathbb{R}^{n_j}$; the rotation part is a Lie group, not a vector space.
Quadrotor: $\mathbf{q}\in\mathrm{SE}(3)$, underactuated (4 inputs, 6 DoF).
Car-like mobile robot: $\mathbf{q}=(x,y,\theta)\in\mathrm{SE}(2)$; nonholonomic, since it cannot slide sideways.
The tangent space $T_\mathbf{q}\mathcal{Q}$ holds generalized velocities $\dot{\mathbf{q}}$; the cotangent space $T_\mathbf{q}^*\mathcal{Q}$ holds generalized forces and momenta. The total state is $(\mathbf{q},\dot{\mathbf{q}})\in T\mathcal{Q}$ or $(\mathbf{q},\mathbf{p})\in T^*\mathcal{Q}$.
Define the Lagrangian $\mathcal{L}(\mathbf{q},\dot{\mathbf{q}},t) = T(\mathbf{q},\dot{\mathbf{q}}) - V(\mathbf{q})$, the difference between kinetic and potential energy. Hamilton's principle of stationary action says that physical trajectories are critical points of $$S[\mathbf{q}] = \int_{t_0}^{t_1} \mathcal{L}(\mathbf{q}(t),\dot{\mathbf{q}}(t),t)\,dt,$$ among curves with fixed endpoints.
Kinetic energy of a mechanical system with mass matrix $M(\mathbf{q})\succ 0$: $$T = \tfrac{1}{2}\dot{\mathbf{q}}^\top M(\mathbf{q})\dot{\mathbf{q}}.$$ The Euler–Lagrange equations then take the manipulator form $$M(\mathbf{q})\ddot{\mathbf{q}} + C(\mathbf{q},\dot{\mathbf{q}})\dot{\mathbf{q}} + G(\mathbf{q}) = \boldsymbol{\tau},$$ with $G = \partial V/\partial\mathbf{q}$ and the Coriolis matrix $C$ satisfying $[C]_{ij} = \sum_k \Gamma_{ijk}\dot q_k$ for the Christoffel symbols $\Gamma_{ijk} = \tfrac{1}{2}(\partial_k M_{ij} + \partial_j M_{ik} - \partial_i M_{jk})$ of the kinetic-energy metric $M$. A key structural property is that $\dot M - 2C$ is skew-symmetric — used heavily in adaptive and passivity-based control.
The rotation group $\mathrm{SO}(3) = \{R\in\mathbb{R}^{3\times3}: R^\top R = I, \det R = +1\}$ is a three-dimensional Lie group. Its Lie algebra $\mathfrak{so}(3)$ is the set of $3\times 3$ skew-symmetric matrices. The hat map lifts a vector $\boldsymbol{\omega}\in\mathbb{R}^3$ to a skew matrix: $$\widehat{\boldsymbol{\omega}} = \begin{pmatrix}0 & -\omega_3 & \omega_2\\ \omega_3 & 0 & -\omega_1\\ -\omega_2 & \omega_1 & 0\end{pmatrix},\qquad \widehat{\boldsymbol{\omega}}\,\mathbf{v}=\boldsymbol{\omega}\times\mathbf{v}.$$ The exponential map (Rodrigues' formula) takes $\mathfrak{so}(3)\to\mathrm{SO}(3)$; for a unit axis $\hat{\mathbf n}=\boldsymbol{\omega}/\|\boldsymbol{\omega}\|$ and angle $\theta$, $$\exp(\widehat{\hat{\mathbf n}}\,\theta) = I + \sin\theta\,\widehat{\hat{\mathbf n}} + (1-\cos\theta)\,\widehat{\hat{\mathbf n}}^{\,2}.$$
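A minimal numerical check of the hat map and Rodrigues' formula in plain NumPy (the axis, angle, and test vector below are arbitrary choices):

```python
import numpy as np

def hat(w):
    """Map w in R^3 to the skew-symmetric matrix w^ in so(3)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_so3(axis, theta):
    """Rodrigues' formula: rotation by angle theta about the (normalized) axis."""
    n = np.asarray(axis, dtype=float)
    n = n / np.linalg.norm(n)
    N = hat(n)
    return np.eye(3) + np.sin(theta) * N + (1.0 - np.cos(theta)) * (N @ N)

# A 90-degree rotation about z should send e_x to e_y, and land in SO(3).
R = exp_so3([0.0, 0.0, 1.0], np.pi / 2)
e_y = R @ np.array([1.0, 0.0, 0.0])
```

The two defining properties of $\mathrm{SO}(3)$ ($R^\top R = I$, $\det R = +1$) hold by construction.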
Rigid transforms form $\mathrm{SE}(3)=\mathrm{SO}(3)\ltimes\mathbb{R}^3$, represented as $4\times 4$ homogeneous matrices $$T = \begin{pmatrix}R & \mathbf{p}\\ \mathbf 0 & 1\end{pmatrix}.$$ The Lie algebra $\mathfrak{se}(3)$ is parametrized by a twist $\boldsymbol{\xi}=(\boldsymbol{v},\boldsymbol{\omega})\in\mathbb{R}^6$ (linear + angular velocity).
If $R(t)\in\mathrm{SO}(3)$, then $\dot R = R\,\widehat{\boldsymbol{\omega}}_B = \widehat{\boldsymbol{\omega}}_S\,R$ where $\boldsymbol{\omega}_B$ is angular velocity in the body frame and $\boldsymbol{\omega}_S=R\boldsymbol{\omega}_B$ in the spatial frame. This is why small-angle rotation increments are applied on the left (spatial) or right (body) depending on convention.
Forward kinematics: map joint angles to end-effector pose, $\mathbf{x}_{\text{ee}} = f(\mathbf{q})\in\mathrm{SE}(3)$. For a serial chain with Denavit–Hartenberg parameters, $f$ is the product of per-link transforms $$T_{0n}(\mathbf{q}) = \prod_{i=1}^n T_{i-1,i}(q_i).$$ More modern product-of-exponentials form: $T_{0n}(\mathbf{q}) = e^{\widehat{\boldsymbol{\xi}}_1 q_1}\cdots e^{\widehat{\boldsymbol{\xi}}_n q_n} T_{0n}(0)$, with $\boldsymbol{\xi}_i$ the screw axis of joint $i$ in the base frame.
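The product-of-exponentials form can be spot-checked against the planar closed form on a hypothetical 2R arm (unit-length links assumed; `scipy.linalg.expm` computes the twist exponentials, and twists follow the $(\boldsymbol v,\boldsymbol\omega)$ ordering above):

```python
import numpy as np
from scipy.linalg import expm

def hat6(xi):
    """Twist xi = (v, w) in R^6 -> 4x4 element of se(3)."""
    v, w = xi[:3], xi[3:]
    X = np.zeros((4, 4))
    X[:3, :3] = [[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]]
    X[:3, 3] = v
    return X

# Planar 2R arm in the xy-plane, both joints about z (hypothetical geometry).
l1 = l2 = 1.0
xi1 = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.0])   # axis through the base origin
xi2 = np.array([0.0, -l1, 0.0, 0.0, 0.0, 1.0])   # axis through (l1,0,0): v = -w x p
M = np.eye(4); M[0, 3] = l1 + l2                 # home pose T_0n(0)

def fk(q):
    """Product of exponentials: T = exp(xi1^ q1) exp(xi2^ q2) T_0n(0)."""
    return expm(hat6(xi1) * q[0]) @ expm(hat6(xi2) * q[1]) @ M

# Cross-check against the planar closed form for an arbitrary configuration.
q = np.array([0.3, 0.4])
p = fk(q)[:2, 3]
p_ref = np.array([l1*np.cos(q[0]) + l2*np.cos(q[0]+q[1]),
                  l1*np.sin(q[0]) + l2*np.sin(q[0]+q[1])])
```

At $\mathbf{q}=0$ the product of exponentials collapses to the home pose, as it must.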
Inverse kinematics: given a desired pose $\mathbf{x}^\star$, find $\mathbf{q}^\star$ with $f(\mathbf{q}^\star)=\mathbf{x}^\star$. Approaches: closed-form analytic solutions when the geometry permits (e.g. 6-DoF arms with a spherical wrist), Jacobian-based iterative methods (Newton–Raphson, damped least squares), and optimization-based IK that handles joint limits and redundancy resolution.
For an $n$-link open-chain robot, the Lagrangian is $\mathcal{L} = \tfrac{1}{2}\dot{\mathbf{q}}^\top M(\mathbf{q})\dot{\mathbf{q}} - V(\mathbf{q})$ with mass matrix $$M(\mathbf{q}) = \sum_{i=1}^n \big(J_{v,i}^\top m_i J_{v,i} + J_{\omega,i}^\top R_i\,I_i\,R_i^\top J_{\omega,i}\big),$$ where $m_i$ is the $i$-th link mass, $I_i$ its inertia tensor in body frame, and $J_{v,i},J_{\omega,i}$ are the link-center Jacobians. The potential is $V = \sum_i m_i\,g\,z_i(\mathbf{q})$.
For chains of more than a few DoF, writing $M,C$ in closed form is impractical. Recursive Newton–Euler computes $\boldsymbol{\tau}(\mathbf{q},\dot{\mathbf{q}},\ddot{\mathbf{q}})$ in $O(n)$: an outward pass propagates velocities and accelerations link-by-link, and an inward pass accumulates forces. Featherstone's articulated-body algorithm solves the forward dynamics $\ddot{\mathbf{q}} = f(\mathbf{q},\dot{\mathbf{q}},\boldsymbol{\tau})$ in $O(n)$ without forming $M^{-1}$ explicitly.
Define the generalized momentum $\mathbf{p} = \partial\mathcal{L}/\partial\dot{\mathbf{q}} = M(\mathbf{q})\dot{\mathbf{q}}$, and the Hamiltonian via the Legendre transform $$H(\mathbf{q},\mathbf{p}) = \mathbf{p}^\top\dot{\mathbf{q}} - \mathcal{L} = \tfrac{1}{2}\mathbf{p}^\top M(\mathbf{q})^{-1}\mathbf{p} + V(\mathbf{q}).$$ The equations of motion become $$\dot{\mathbf{q}} = \frac{\partial H}{\partial \mathbf{p}},\qquad \dot{\mathbf{p}} = -\frac{\partial H}{\partial\mathbf{q}} + \boldsymbol{\tau},$$ a first-order system in $(\mathbf{q},\mathbf{p})$.
Holonomic constraints act on configuration: $\boldsymbol{\phi}(\mathbf{q})=0$, $\boldsymbol{\phi}:\mathbb{R}^n\to\mathbb{R}^m$. Differentiating: $A(\mathbf{q})\dot{\mathbf{q}}=0$ with $A = \partial\boldsymbol{\phi}/\partial\mathbf{q}$. Augment the Lagrangian with multipliers $\boldsymbol{\lambda}\in\mathbb{R}^m$: $$M\ddot{\mathbf{q}}+C\dot{\mathbf{q}}+G = \boldsymbol{\tau} + A^\top\boldsymbol{\lambda},\qquad A\ddot{\mathbf{q}}+\dot A\dot{\mathbf{q}}=0.$$ Solve jointly — the multipliers are constraint forces.
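A sketch of the constrained-dynamics solve on the simplest case: a point mass on a rigid rod, i.e. the pendulum written as a holonomically constrained particle with $\boldsymbol\phi(\mathbf q) = x^2+y^2-\ell^2$. The test state and parameters are arbitrary; the KKT system is assembled and solved directly, and the tangential component of $\ddot{\mathbf q}$ should recover the pendulum equation:

```python
import numpy as np

m, l, g = 1.0, 1.0, 9.81
theta, thetadot = 0.7, 0.3   # arbitrary test state (angle from downward vertical)

# Point-mass coordinates on the constraint phi(q) = x^2 + y^2 - l^2 = 0
q  = np.array([l*np.sin(theta), -l*np.cos(theta)])
qd = np.array([l*np.cos(theta)*thetadot, l*np.sin(theta)*thetadot])

M = m * np.eye(2)                 # mass matrix of the free point mass
F = np.array([0.0, -m*g])         # gravity; no applied tau
A = 2.0 * q.reshape(1, 2)         # constraint Jacobian d(phi)/dq
Adot_qd = 2.0 * (qd @ qd)         # (d/dt A) qdot for this phi

# KKT system: [M  -A^T; A  0] [qdd; lambda] = [F; -Adot qdot]
KKT = np.block([[M, -A.T], [A, np.zeros((1, 1))]])
rhs = np.concatenate([F, [-Adot_qd]])
sol = np.linalg.solve(KKT, rhs)
qdd, lam = sol[:2], sol[2]        # lam is the rod tension (constraint force)

# Tangential component recovers l*thetaddot = -g*sin(theta)
thetaddot = (np.cos(theta)*qdd[0] + np.sin(theta)*qdd[1]) / l
```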
Nonholonomic constraints act on velocity and are not integrable to a configuration constraint, e.g. rolling without slipping on a wheel or car. The car's no-sideways-slip constraint $\sin\theta\,\dot x - \cos\theta\,\dot y = 0$ restricts velocities but leaves $(x,y,\theta)\in\mathrm{SE}(2)$ fully accessible — the configuration manifold keeps its full dimension.
Contact with friction is modeled as a linear complementarity problem: $$0\le\lambda_n\perp g(\mathbf{q})\ge 0,\qquad \|\boldsymbol{\lambda}_t\|\le\mu\lambda_n,$$ with $g$ a signed distance. Modern rigid-body simulators (MuJoCo, Bullet, Drake) solve this via regularized/smoothed contact models, or true LCP/NCP solvers. Differentiable contact (DiffSim, Warp) enables gradient-based policy learning through contacts.
Point-mass pendulum: mass $m$ at the end of a massless rod of length $\ell$, angle $q$ measured from the downward vertical. Position: $(x,y) = (\ell\sin q, -\ell\cos q)$. Velocity-squared: $\dot x^2+\dot y^2 = \ell^2\dot q^2$. Kinetic energy $T = \tfrac{1}{2}m\ell^2\dot q^2$. Potential $V = -mg\ell\cos q$ (zero at the pivot). Lagrangian $\mathcal{L}=T-V = \tfrac{1}{2}m\ell^2\dot q^2 + mg\ell\cos q$.
Euler–Lagrange: $\partial\mathcal{L}/\partial\dot q = m\ell^2\dot q$, so $(d/dt)(\partial\mathcal{L}/\partial\dot q) = m\ell^2\ddot q$. And $\partial\mathcal{L}/\partial q = -mg\ell\sin q$. Therefore $m\ell^2\ddot q + mg\ell\sin q = 0$, i.e. $$\ddot q + (g/\ell)\sin q = 0.$$ Linearizing about $q=0$ gives $\ddot q + (g/\ell)q = 0$ with period $T = 2\pi\sqrt{\ell/g}$.
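A quick RK4 sanity check of the derived equation (parameter values are arbitrary): integrate one linearized period at small amplitude and confirm energy conservation and near-return to the release angle:

```python
import numpy as np

g, l = 9.81, 1.0

def f(s):
    """Pendulum state derivative, s = (q, qdot)."""
    q, qd = s
    return np.array([qd, -(g/l)*np.sin(q)])

def rk4_step(s, dt):
    k1 = f(s); k2 = f(s + dt/2*k1); k3 = f(s + dt/2*k2); k4 = f(s + dt*k3)
    return s + dt/6*(k1 + 2*k2 + 2*k3 + k4)

def energy(s):
    """Total energy per unit mass: T + V with V zero at the pivot."""
    q, qd = s
    return 0.5*l**2*qd**2 - g*l*np.cos(q)

s0 = np.array([0.05, 0.0])          # small release angle (rad)
s, dt = s0.copy(), 1e-3
T = 2*np.pi*np.sqrt(l/g)            # linearized period
for _ in range(int(round(T/dt))):
    s = rk4_step(s, dt)
drift = abs(energy(s) - energy(s0))
```

At 0.05 rad the nonlinear period correction ($\approx\theta_0^2/16$) is negligible, so the state nearly returns to the release point after time $T$.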
Planar two-link arm, point masses $m_1,m_2$ at the link ends, link lengths $\ell_1,\ell_2$. End of link 1: $(\ell_1 c_1, \ell_1 s_1)$; end of link 2: $(\ell_1 c_1 + \ell_2 c_{12}, \ell_1 s_1 + \ell_2 s_{12})$, where $c_1=\cos q_1$, $c_{12}=\cos(q_1+q_2)$, etc.
Kinetic energies: $$T_1 = \tfrac{1}{2}m_1\ell_1^2\dot q_1^2,\qquad T_2 = \tfrac{1}{2}m_2\big(\ell_1^2\dot q_1^2 + \ell_2^2(\dot q_1+\dot q_2)^2 + 2\ell_1\ell_2\,c_2\,\dot q_1(\dot q_1+\dot q_2)\big).$$ Collecting $T=\tfrac{1}{2}\dot{\mathbf{q}}^\top M\dot{\mathbf{q}}$: $$M(\mathbf{q}) = \begin{pmatrix} m_1\ell_1^2 + m_2(\ell_1^2+\ell_2^2+2\ell_1\ell_2 c_2) & m_2(\ell_2^2 + \ell_1\ell_2 c_2)\\ m_2(\ell_2^2+\ell_1\ell_2 c_2) & m_2\ell_2^2\end{pmatrix}.$$ The off-diagonal coupling $m_2 \ell_1\ell_2 c_2$ is why robot arms have configuration-dependent inertia: swinging link 1 accelerates link 2 and vice versa.
Take $C_{ij} = \sum_k \Gamma_{ijk}\dot q_k$ with Christoffel symbols $\Gamma_{ijk} = \tfrac{1}{2}(\partial_k M_{ij} + \partial_j M_{ik} - \partial_i M_{jk})$. Then $\dot M_{ij} = \sum_k \partial_k M_{ij}\dot q_k$ and $$(\dot M - 2C)_{ij} = \sum_k\big(\partial_k M_{ij} - \partial_k M_{ij} - \partial_j M_{ik} + \partial_i M_{jk}\big)\dot q_k = \sum_k(\partial_i M_{jk} - \partial_j M_{ik})\dot q_k,$$ which is antisymmetric in $(i,j)$.
Consequence: $\dot{\mathbf{q}}^\top(\dot M - 2C)\dot{\mathbf{q}} = 0$. Multiplying $M\ddot{\mathbf{q}}+C\dot{\mathbf{q}}+G=\boldsymbol{\tau}$ by $\dot{\mathbf{q}}^\top$ and using this identity gives the power balance $\dot E = \dot{\mathbf{q}}^\top\boldsymbol{\tau}$ where $E = \tfrac{1}{2}\dot{\mathbf{q}}^\top M\dot{\mathbf{q}}+V$ — total energy changes only by actuator work. This passivity identity is the foundation of passivity-based manipulator control (Slotine–Li adaptive control, PD+gravity compensation stability).
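The skew-symmetry identity can be verified numerically on the two-link mass matrix derived above (mass and length values are arbitrary; the Christoffel symbols are formed from central-difference partials of $M$):

```python
import numpy as np

m1, m2, l1, l2 = 1.0, 2.0, 1.0, 0.8   # arbitrary parameters

def M(q):
    c2 = np.cos(q[1])
    return np.array([
        [m1*l1**2 + m2*(l1**2 + l2**2 + 2*l1*l2*c2), m2*(l2**2 + l1*l2*c2)],
        [m2*(l2**2 + l1*l2*c2),                      m2*l2**2]])

def dM(q, eps=1e-6):
    """dM[k][i,j] = partial of M_ij w.r.t. q_k, by central differences."""
    out = []
    for k in range(2):
        e = np.zeros(2); e[k] = eps
        out.append((M(q + e) - M(q - e)) / (2*eps))
    return out

def C(q, qd):
    """Coriolis matrix C_ij = sum_k Gamma_ijk qd_k from the Christoffel symbols."""
    dMk = dM(q)
    Cmat = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            for k in range(2):
                Gamma = 0.5*(dMk[k][i, j] + dMk[j][i, k] - dMk[i][j, k])
                Cmat[i, j] += Gamma * qd[k]
    return Cmat

q, qd = np.array([0.4, 1.1]), np.array([-0.5, 0.9])   # arbitrary state
Mdot = sum(dM(q)[k] * qd[k] for k in range(2))        # dM/dt = sum_k (dM/dq_k) qd_k
S = Mdot - 2*C(q, qd)
```

`S` should be antisymmetric, so its quadratic form in $\dot{\mathbf q}$ vanishes.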
Since $\hat{\mathbf n}$ is a unit vector, $\widehat{\hat{\mathbf n}}^3 = -\widehat{\hat{\mathbf n}}$ (verify from $\widehat{\hat{\mathbf n}}^2 = \hat{\mathbf n}\hat{\mathbf n}^\top - I$ and $\widehat{\hat{\mathbf n}}\hat{\mathbf n}=0$). Expand the exponential series: $$e^{\widehat{\hat{\mathbf n}}\theta} = \sum_{k=0}^\infty \frac{(\widehat{\hat{\mathbf n}}\theta)^k}{k!} = I + \theta\widehat{\hat{\mathbf n}} + \tfrac{\theta^2}{2}\widehat{\hat{\mathbf n}}^{\,2} - \tfrac{\theta^3}{6}\widehat{\hat{\mathbf n}} - \tfrac{\theta^4}{24}\widehat{\hat{\mathbf n}}^{\,2} + \cdots$$ Grouping odd powers gives $\sin\theta\,\widehat{\hat{\mathbf n}}$; even powers (from $k\ge 2$) give $(1-\cos\theta)\widehat{\hat{\mathbf n}}^{\,2}$. Hence $e^{\widehat{\hat{\mathbf n}}\theta} = I + \sin\theta\,\widehat{\hat{\mathbf n}}+(1-\cos\theta)\,\widehat{\hat{\mathbf n}}^{\,2}$. This is the canonical closed form used in IK solvers, quaternion conversions, and Lie-group integration.
Cart of mass $M$ on a horizontal track, pole of mass $m$ concentrated at distance $\ell$ from the pivot, angle $\theta$ measured from the upright. Generalized coordinates $\mathbf{q}=(x,\theta)$. Cart position $(x,0)$, pole-tip position $(x+\ell\sin\theta, \ell\cos\theta)$. Velocities: tip $(\dot x+\ell\cos\theta\,\dot\theta, -\ell\sin\theta\,\dot\theta)$.
$T = \tfrac{1}{2}M\dot x^2 + \tfrac{1}{2}m\big((\dot x+\ell c\dot\theta)^2 + (\ell s\dot\theta)^2\big) = \tfrac{1}{2}(M+m)\dot x^2 + m\ell c\dot x\dot\theta + \tfrac{1}{2}m\ell^2\dot\theta^2$, using $c=\cos\theta,s=\sin\theta$. Potential $V = mg\ell\cos\theta$. Euler–Lagrange applied to $x$ (non-conservative force $u$) and $\theta$: $$(M+m)\ddot x + m\ell(c\ddot\theta - s\dot\theta^2) = u,$$ $$m\ell c\ddot x + m\ell^2\ddot\theta - m g\ell s = 0.$$ Equivalently, in manipulator form $$\begin{pmatrix}M+m & m\ell c\\ m\ell c & m\ell^2\end{pmatrix}\!\!\begin{pmatrix}\ddot x\\ \ddot\theta\end{pmatrix} + \begin{pmatrix}-m\ell s\dot\theta^2\\ 0\end{pmatrix} + \begin{pmatrix}0\\ -mg\ell s\end{pmatrix} = \begin{pmatrix}u\\ 0\end{pmatrix}.$$ This is the canonical underactuated system for demonstrating LQR stabilization, energy shaping, iLQR trajectory optimization, and model-based RL.
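The manipulator form above translates directly into a forward-dynamics routine (parameter values are hypothetical); solving the $2\times2$ system gives $(\ddot x,\ddot\theta)$:

```python
import numpy as np

Mc, m, l, g = 1.0, 0.1, 0.5, 9.81   # cart mass, pole mass, length (hypothetical)

def cartpole_dynamics(state, u):
    """State derivative from H(q) qdd + C-term + G-term = (u, 0)."""
    x, th, xd, thd = state
    c, s = np.cos(th), np.sin(th)
    H = np.array([[Mc + m, m*l*c],
                  [m*l*c,  m*l**2]])
    bias = np.array([-m*l*s*thd**2, -m*g*l*s])   # Coriolis + gravity terms
    qdd = np.linalg.solve(H, np.array([u, 0.0]) - bias)
    return np.array([xd, thd, qdd[0], qdd[1]])

# Sanity check: hanging straight down (theta = pi) with no input is an equilibrium.
rate = cartpole_dynamics(np.array([0.0, np.pi, 0.0, 0.0]), 0.0)
```

Near the upright, the pole acceleration should match the linearization $\ddot\theta \approx (M+m)g\theta/(M\ell)$ used later for LQR.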
Calculus of variations, Pontryagin, HJB, LQR, iLQR, MPC, and Lyapunov — the toolkit for designing and certifying robot controllers.
The foundational problem: minimize a functional $$J[\mathbf{x}] = \int_{t_0}^{t_f} L(\mathbf{x}(t),\dot{\mathbf{x}}(t),t)\,dt$$ over curves $\mathbf{x}:[t_0,t_f]\to\mathbb{R}^n$ with fixed endpoints. Stationary curves satisfy the Euler–Lagrange equation $\frac{d}{dt}\partial_{\dot{\mathbf{x}}}L = \partial_{\mathbf{x}}L$ — the same operator that governs conservative mechanics (§2). Lagrangian mechanics is variational, and so is optimal control.
Transversality: if the terminal state is free, $\partial_{\dot{\mathbf{x}}}L(t_f)=0$; if the terminal time is free, $L(t_f) - \dot{\mathbf{x}}(t_f)^\top\partial_{\dot{\mathbf{x}}}L(t_f) = 0$. These boundary conditions emerge from integration-by-parts of the first variation.
Optimal control problem: minimize $$J(\mathbf{u}) = \phi(\mathbf{x}(t_f)) + \int_{t_0}^{t_f} L(\mathbf{x}(t),\mathbf{u}(t),t)\,dt\quad\text{subject to}\quad \dot{\mathbf{x}} = \mathbf{f}(\mathbf{x},\mathbf{u},t),\ \mathbf{u}(t)\in\mathcal{U}.$$
Pontryagin's principle: define the control Hamiltonian $H(\mathbf{x},\mathbf{u},\boldsymbol{\lambda},t) = L(\mathbf{x},\mathbf{u},t) + \boldsymbol{\lambda}^\top\mathbf{f}(\mathbf{x},\mathbf{u},t)$. Along an optimal trajectory there exists a costate $\boldsymbol{\lambda}(t)$ with $$\dot{\mathbf{x}} = \frac{\partial H}{\partial\boldsymbol{\lambda}},\qquad \dot{\boldsymbol{\lambda}} = -\frac{\partial H}{\partial\mathbf{x}},\qquad \mathbf{u}^\star(t) = \arg\min_{\mathbf{u}\in\mathcal{U}} H(\mathbf{x}^\star(t),\mathbf{u},\boldsymbol{\lambda}(t),t),$$ with terminal condition $\boldsymbol{\lambda}(t_f) = \partial\phi/\partial\mathbf{x}(t_f)$. The principle turns an infinite-dimensional optimization into a two-point boundary-value problem (TPBVP) in $(\mathbf{x},\boldsymbol{\lambda})$. For control-affine problems with bounded inputs, the minimum condition often yields bang-bang control: $\mathbf{u}^\star$ saturates at the bounds whenever the switching function $\partial H/\partial\mathbf{u}$ is non-zero. Singular arcs occur where it vanishes on an interval.
Define the value function as the optimal cost-to-go from state $\mathbf{x}$ at time $t$: $$V(\mathbf{x},t) = \min_{\mathbf{u}(\cdot)}\!\left[\phi(\mathbf{x}(t_f)) + \int_t^{t_f} L(\mathbf{x}(s),\mathbf{u}(s),s)\,ds\right],\quad \mathbf{x}(t)=\mathbf{x}.$$
Bellman's principle of optimality turns this into the Hamilton–Jacobi–Bellman (HJB) equation $$-\frac{\partial V}{\partial t} = \min_{\mathbf{u}\in\mathcal{U}}\Big[L(\mathbf{x},\mathbf{u},t) + \nabla_\mathbf{x}V^\top\mathbf{f}(\mathbf{x},\mathbf{u},t)\Big],\qquad V(\mathbf{x},t_f)=\phi(\mathbf{x}).$$ The HJB equation defines the optimal feedback policy $\mathbf{u}^\star(\mathbf{x},t)$ (the minimizer above), unlike the PMP, which gives an open-loop optimal trajectory. For LQR it admits a closed form; for general nonlinear systems it suffers from the curse of dimensionality: numerical solution requires gridding state space, which scales as $N^n$.
Discrete-time / finite-horizon analog: $$V_k(\mathbf{x}) = \min_{\mathbf{u}}\big[\ell_k(\mathbf{x},\mathbf{u}) + V_{k+1}(\mathbf{f}_k(\mathbf{x},\mathbf{u}))\big],\quad V_N(\mathbf{x})=\phi(\mathbf{x}).$$ This is the Bellman equation solved (exactly or approximately) by DP, value iteration, Q-learning, and all of reinforcement learning.
A linear system $\dot{\mathbf{x}} = A\mathbf{x} + B\mathbf{u}$ with quadratic cost $$J = \tfrac{1}{2}\mathbf{x}(t_f)^\top Q_f\,\mathbf{x}(t_f) + \tfrac{1}{2}\int_{t_0}^{t_f}\!\big(\mathbf{x}^\top Q\mathbf{x} + \mathbf{u}^\top R\mathbf{u}\big)\,dt,\quad Q\succeq 0,\ R\succ 0,\ Q_f\succeq 0.$$ Guess $V(\mathbf{x},t) = \tfrac{1}{2}\mathbf{x}^\top P(t)\mathbf{x}$ and substitute into HJB. The inner minimization gives $\mathbf{u}^\star = -R^{-1}B^\top P\mathbf{x} =: -K(t)\mathbf{x}$, and $P(t)$ satisfies the continuous-time Riccati differential equation $$-\dot P = A^\top P + P A - P B R^{-1} B^\top P + Q,\quad P(t_f)=Q_f.$$
For $\mathbf{x}_{k+1}=A\mathbf{x}_k+B\mathbf{u}_k$ with stage cost $\mathbf{x}^\top Q\mathbf{x} + \mathbf{u}^\top R\mathbf{u}$, the DP recursion is
$$P_k = Q + A^\top P_{k+1}A - A^\top P_{k+1}B(R+B^\top P_{k+1}B)^{-1}B^\top P_{k+1}A,$$
$$K_k = (R+B^\top P_{k+1}B)^{-1}B^\top P_{k+1}A,\quad \mathbf{u}_k^\star = -K_k\mathbf{x}_k.$$
The infinite-horizon fixed point gives the discrete algebraic Riccati equation (DARE), solved in practice by scipy.linalg.solve_discrete_are.
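A sketch of the backward recursion converging to the DARE fixed point, cross-checked against scipy.linalg.solve_discrete_are (the double-integrator discretization and cost weights below are arbitrary choices):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Discretized double integrator, dt = 0.1 (hypothetical system)
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5*dt**2], [dt]])
Q, R = np.eye(2), np.array([[0.1]])

# Backward Riccati recursion; for a long horizon P approaches the DARE fixed point.
P = Q.copy()
for _ in range(500):
    S = R + B.T @ P @ B
    K = np.linalg.solve(S, B.T @ P @ A)        # K_k = S^{-1} B^T P A
    P = Q + A.T @ P @ A - A.T @ P @ B @ K      # Riccati update

P_inf = solve_discrete_are(A, B, Q, R)
err = np.max(np.abs(P - P_inf))
```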
For a nonlinear system $\dot{\mathbf{x}}=f(\mathbf{x},\mathbf{u})$ with equilibrium $(\mathbf{x}^\star,\mathbf{u}^\star)$, linearize: $A = \partial f/\partial\mathbf{x}$, $B = \partial f/\partial\mathbf{u}$ at the equilibrium. LQR designed on $(A,B)$ locally stabilizes the true nonlinear system — the backbone of hover controllers for quadrotors, walking-balance controllers for bipeds, and the terminal cost in most MPC formulations.
Iterative LQR (iLQR) extends LQR to nonlinear systems by repeated local approximation: linearize the dynamics and quadraticize the cost about the current nominal trajectory $(\bar{\mathbf{x}}_k,\bar{\mathbf{u}}_k)$, solve the resulting time-varying LQR problem by a backward pass for feedforward terms and feedback gains, roll the updated controls forward through the true nonlinear dynamics with a line search, and repeat until convergence.
DDP (differential dynamic programming) keeps the second-order expansion of dynamics as well, giving quadratic convergence near the optimum. iLQR drops those terms for a $\sim10\times$ speedup with nearly the same behavior.
Convert the continuous optimal-control problem into a finite NLP: $$\min_{\mathbf{x}_{0:N},\mathbf{u}_{0:N-1}} \phi(\mathbf{x}_N) + \sum_{k=0}^{N-1}\ell_k(\mathbf{x}_k,\mathbf{u}_k)\quad\text{s.t.}\quad \mathbf{x}_{k+1} = \mathbf{f}_k(\mathbf{x}_k,\mathbf{u}_k),\ \mathbf{x}_k\in\mathcal{X},\ \mathbf{u}_k\in\mathcal{U}.$$ Direct shooting optimizes over $\mathbf{u}$ only (states eliminated by forward simulation); multiple shooting breaks the horizon into segments; collocation treats $\mathbf{x}_k$ and $\mathbf{u}_k$ as independent variables with dynamics as equality constraints. Solved by SQP or interior-point (Ipopt, SNOPT, FORCES Pro).
Repeat at every control tick $t_k$: measure or estimate the current state $\mathbf{x}(t_k)$, solve the finite-horizon problem above over the window $[t_k, t_k+T]$ warm-started from the previous solution, apply only the first control of the optimal sequence, and shift the horizon forward one step.
MPC fuses long-horizon planning with feedback. For linear systems with quadratic cost and linear constraints, each problem is a convex QP — closed-loop stability follows from a terminal cost and terminal constraint that render the infinite-horizon tail cost-decreasing. For nonlinear robots, NMPC solves an SQP online, now tractable at kilohertz rates for quadrupeds and humanoids with real-time iteration schemes and warm-started solvers (acados, HPIPM).
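A minimal receding-horizon loop under simplifying assumptions (no state or input constraints, so each subproblem reduces to a finite-horizon LQR backward pass; the system and weights are hypothetical):

```python
import numpy as np

# Discretized double integrator and quadratic weights (hypothetical choices)
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]]); B = np.array([[0.5*dt**2], [dt]])
Q, R, H = np.eye(2), np.array([[0.1]]), 20   # H = prediction horizon

def mpc_input(x):
    """Solve the H-step unconstrained subproblem by backward DP; return first input."""
    P, K = Q.copy(), None
    for _ in range(H):                        # backward pass; final K is K_0
        S = R + B.T @ P @ B
        K = np.linalg.solve(S, B.T @ P @ A)
        P = Q + A.T @ P @ A - A.T @ P @ B @ K
    return -K @ x                             # apply only the first optimal input

x = np.array([1.0, 0.0])
for _ in range(100):                          # closed loop: re-solve at every tick
    x = A @ x + B @ mpc_input(x)
final_norm = np.linalg.norm(x)
```

With constraints, the per-tick solve becomes a QP and the structure of the loop is unchanged.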
Example — PD + gravity compensation for a manipulator at setpoint $\mathbf{q}_d$: the controller $\boldsymbol{\tau} = -K_p(\mathbf{q}-\mathbf{q}_d) - K_d\dot{\mathbf{q}} + G(\mathbf{q})$ is globally asymptotically stable (for $K_p,K_d\succ 0$) with Lyapunov function $V = \tfrac{1}{2}\dot{\mathbf{q}}^\top M\dot{\mathbf{q}} + \tfrac{1}{2}(\mathbf{q}-\mathbf{q}_d)^\top K_p(\mathbf{q}-\mathbf{q}_d)$; $\dot V = -\dot{\mathbf{q}}^\top K_d\dot{\mathbf{q}}\le 0$ and LaSalle gives convergence.
For the manipulator, the control $\boldsymbol{\tau} = M(\mathbf{q})\mathbf{v} + C(\mathbf{q},\dot{\mathbf{q}})\dot{\mathbf{q}} + G(\mathbf{q})$ yields $\ddot{\mathbf{q}} = \mathbf{v}$ — a chain of integrators. Choosing $\mathbf{v} = \ddot{\mathbf{q}}_d - K_d(\dot{\mathbf{q}}-\dot{\mathbf{q}}_d) - K_p(\mathbf{q}-\mathbf{q}_d)$ gives exponentially stable tracking with eigenvalues set by $(K_p,K_d)$. The assumption is full actuation and accurate model; for underactuated robots, only partial feedback linearization is possible.
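A sketch of computed-torque tracking on a single pendulum (the plant parameters, gains, and reference trajectory are all hypothetical choices; gains give a critically damped error dynamics with a double pole at $s=-5$):

```python
import numpy as np

m, l, g = 1.0, 1.0, 9.81          # hypothetical pendulum parameters
Kp, Kd = 25.0, 10.0               # outer-loop PD gains

q, qdot, dt = 0.5, 0.0, 1e-3      # start off the reference
for i in range(10000):            # track q_des(t) = sin t for 10 s
    t = i * dt
    q_des, qd_des, qdd_des = np.sin(t), np.cos(t), -np.sin(t)
    v = qdd_des - Kd*(qdot - qd_des) - Kp*(q - q_des)   # outer loop on integrators
    tau = m*l**2 * v + m*g*l*np.sin(q)                  # inverse dynamics (computed torque)
    qddot = (tau - m*g*l*np.sin(q)) / (m*l**2)          # plant: m l^2 qdd + m g l sin q = tau
    qdot += dt * qddot
    q += dt * qdot
tracking_err = abs(q - np.sin(10.0))
```

The cancellation is exact here by construction; with model error the residual dynamics reappear, which is what robust and adaptive variants address.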
Brachistochrone: a bead slides frictionlessly under gravity from the origin, released at rest, along a curve $y(x)$; minimize the descent time. Arc length $ds = \sqrt{1+y'^2}\,dx$. Energy conservation: speed $v = \sqrt{-2gy}$. Time $T = \int ds/v = \int_0^{x_f}\sqrt{(1+y'^2)/(-2gy)}\,dx$.
The Lagrangian $L(y,y') = \sqrt{(1+y'^2)/(-2gy)}$ is independent of $x$, so the Beltrami identity $L - y' \partial_{y'}L = \text{const}$ applies. After simplification this yields $y(1+y'^2) = C$, whose parametric solution is a cycloid — the brachistochrone. Classic lesson: variational principles give you the differential equation; solving it is its own craft.
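The invariant can be checked numerically on a cycloid parametrized as $x=a(\phi-\sin\phi)$, $y=-a(1-\cos\phi)$, with $y$ negative below the start as in the setup; the constant works out to $-2a$ (the radius $a$ is an arbitrary choice):

```python
import numpy as np

a = 0.7                                  # hypothetical cycloid radius
phi = np.linspace(0.5, 2.5, 9)           # sample points, away from the cusp at phi = 0
y  = -a * (1 - np.cos(phi))
yp = -np.sin(phi) / (1 - np.cos(phi))    # dy/dx = (dy/dphi)/(dx/dphi) along the curve
invariant = y * (1 + yp**2)              # Beltrami invariant y(1 + y'^2)
```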
State $(x_1,x_2)=(x,\dot x)$, dynamics $\dot x_1 = x_2,\ \dot x_2 = u$ with admissible controls $|u|\le 1$, cost $L=1$ (minimum time), terminal constraint $(x_1,x_2)(t_f)=(0,0)$. Control Hamiltonian: $H = 1 + \lambda_1 x_2 + \lambda_2 u$.
Costate: $\dot\lambda_1 = -\partial H/\partial x_1 = 0$, $\dot\lambda_2 = -\partial H/\partial x_2 = -\lambda_1$. So $\lambda_1 = c_1$ and $\lambda_2 = c_2 - c_1 t$ — a linear switching function.
Minimum principle: $u^\star = -\mathrm{sgn}(\lambda_2)$. Since $\lambda_2$ is linear in $t$, it switches sign at most once. The optimal policy is bang-bang with at most one switch: apply $u=\pm 1$ to accelerate toward the origin along a parabola, then switch to $u=\mp 1$ to brake. The switching curve in the phase plane is $x_1 = -\tfrac{1}{2}x_2|x_2|$. This is the archetypal time-optimal solution — easy to compute, hard to implement smoothly.
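Simulating the switching-curve feedback shows the bang-bang behavior (explicit Euler with a small step, arbitrary initial state; the chattering near the origin is exactly the implementation pain the text mentions):

```python
import numpy as np

def u_star(x1, x2):
    """Time-optimal feedback for the double integrator, |u| <= 1."""
    sigma = x1 + 0.5 * x2 * abs(x2)   # signed offset from the switching curve
    if abs(sigma) > 1e-9:
        return -np.sign(sigma)
    return -np.sign(x2)               # on the curve: brake toward the origin

x1, x2, dt = 1.0, 0.0, 1e-3
for _ in range(4000):                 # simulate 4 s; time-optimal t_f = 2 s here
    u = u_star(x1, x2)
    x1, x2 = x1 + dt * x2, x2 + dt * u
final_dist = np.hypot(x1, x2)
```

From $(1,0)$ the exact solution applies $u=-1$ for 1 s, switches on the curve at $(0.5,-1)$, and brakes with $u=+1$ for 1 s; the discretized rollout ends in a small chattering band around the origin.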
HJB: $-\partial_t V = \min_\mathbf{u}\big[\tfrac{1}{2}\mathbf{x}^\top Q\mathbf{x} + \tfrac{1}{2}\mathbf{u}^\top R\mathbf{u} + \nabla_\mathbf{x} V^\top(A\mathbf{x}+B\mathbf{u})\big]$. With the ansatz, $\nabla V = P\mathbf{x}$ and $\partial_t V = \tfrac{1}{2}\mathbf{x}^\top \dot P\mathbf{x}$.
Inner minimization over $\mathbf{u}$: set $\partial/\partial\mathbf{u} = R\mathbf{u} + B^\top P\mathbf{x} = 0 \Rightarrow \mathbf{u}^\star = -R^{-1}B^\top P\mathbf{x}$. Plug back: $$-\tfrac{1}{2}\mathbf{x}^\top\dot P\mathbf{x} = \tfrac{1}{2}\mathbf{x}^\top Q\mathbf{x} + \tfrac{1}{2}\mathbf{x}^\top P B R^{-1}B^\top P\mathbf{x} - \mathbf{x}^\top P B R^{-1}B^\top P\mathbf{x} + \mathbf{x}^\top P A\mathbf{x}.$$ Symmetrize the last term ($\mathbf{x}^\top PA\mathbf{x} = \tfrac{1}{2}\mathbf{x}^\top(PA+A^\top P)\mathbf{x}$) and collect: $$-\dot P = A^\top P + PA - PBR^{-1}B^\top P + Q,$$ which must hold for all $\mathbf{x}$. Boundary: $V(\mathbf{x},t_f)=\tfrac{1}{2}\mathbf{x}^\top Q_f\mathbf{x}\Rightarrow P(t_f)=Q_f$. The guess is self-consistent, and the value function for LQR is exactly quadratic.
For the scalar system $\dot x = ax + bu$ with cost weights $q\ge 0$, $r>0$, the ARE reads $2aP - b^2 P^2/r + q = 0$. Solve the quadratic $b^2 P^2/r - 2aP - q = 0$: $$P = \frac{r}{b^2}\!\left(a + \sqrt{a^2 + b^2 q/r}\right)$$ (the positive root — the other branch fails positive-semidefiniteness).
Gain: $K = b P/r = (a+\sqrt{a^2+b^2q/r})/b$. Closed-loop eigenvalue: $a - bK = -\sqrt{a^2+b^2 q/r}$, always negative — asymptotic stability by construction. Increasing $q/r$ (penalizing state more) pushes the closed-loop pole further left, for a faster but more aggressive response.
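A numeric spot-check of the scalar formulas against scipy (the values of $a,b,q,r$ are arbitrary, with $a>0$ so the open-loop plant is unstable):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

a, b, q, r = 1.0, 2.0, 3.0, 0.5
P_formula = (r / b**2) * (a + np.sqrt(a**2 + b**2 * q / r))
P_scipy = solve_continuous_are(np.array([[a]]), np.array([[b]]),
                               np.array([[q]]), np.array([[r]]))[0, 0]
K = b * P_formula / r
closed_loop = a - b * K           # should equal -sqrt(a^2 + b^2 q/r)
```

Here $a^2 + b^2q/r = 25$, so $P = 0.75$, $K = 3$, and the closed-loop pole sits at $-5$.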
Pendulum with viscous damping, $\ddot q = -c\dot q - (g/\ell)\sin q$, $c>0$. Total mechanical energy (normalized by $m\ell^2$) $V(q,\dot q) = \tfrac{1}{2}\dot q^2 + (g/\ell)(1-\cos q)$ is positive-definite on $|q|<\pi$. Its time derivative along trajectories: $$\dot V = \dot q\ddot q + (g/\ell)\sin q\,\dot q = \dot q\big(-c\dot q - (g/\ell)\sin q\big) + (g/\ell)\sin q\,\dot q = -c\dot q^2 \le 0.$$ So the origin is stable. $\dot V = 0$ only on the set $S = \{\dot q=0\}$; the only invariant subset of $S$ under the dynamics is $\{q=0,\dot q=0\}$ (any $q\ne 0$ has $\ddot q = -(g/\ell)\sin q\ne 0$). By LaSalle's invariance principle, the origin is locally asymptotically stable — basin at least $\{V<2g/\ell\}$, which excludes the inverted equilibrium.
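Simulating the damped pendulum (hypothetical damping value, semi-implicit Euler) shows $V$ decreasing monotonically along the trajectory and the state settling at the origin:

```python
import numpy as np

g, l, c = 9.81, 1.0, 0.4    # gravity, length, damping coefficient (hypothetical)

def V(q, qd):
    """Candidate Lyapunov function: normalized mechanical energy."""
    return 0.5*qd**2 + (g/l)*(1.0 - np.cos(q))

q, qd, dt = 2.0, 0.0, 1e-3  # release well inside the basin {V < 2g/l}
vals = []
for _ in range(20000):      # 20 s of semi-implicit Euler
    vals.append(V(q, qd))
    qd += dt * (-c*qd - (g/l)*np.sin(q))
    q  += dt * qd
monotone = all(b <= a + 1e-9 for a, b in zip(vals, vals[1:]))
final = np.hypot(q, qd)
```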
At $\theta=0$, $\sin\theta\approx\theta$, $\cos\theta\approx 1$, $\dot\theta^2$ negligible. The manipulator equation reduces to
$$(M+m)\ddot x + m\ell\ddot\theta = u,\qquad m\ell\ddot x + m\ell^2\ddot\theta - mg\ell\theta = 0.$$
Solve for $(\ddot x,\ddot\theta)$:
$$\ddot x = \frac{u - m g\theta}{M},\qquad \ddot\theta = \frac{(M+m)g\,\theta - u}{M\ell}.$$
With state $\mathbf{x}=(x,\theta,\dot x,\dot\theta)^\top$ and input $u$:
$$A = \begin{pmatrix}0&0&1&0\\0&0&0&1\\0&-mg/M&0&0\\0&(M+m)g/(M\ell)&0&0\end{pmatrix},\quad B = \begin{pmatrix}0\\0\\1/M\\-1/(M\ell)\end{pmatrix}.$$
Check $(A,B)$ is controllable (e.g. Kalman rank condition). Choose diagonal $Q\succ 0$ penalizing $x,\theta$ strongly and $\dot x,\dot\theta$ weakly, and scalar $R>0$. Solve the continuous-time ARE with scipy.linalg.solve_continuous_are(A,B,Q,R), form $K=R^{-1}B^\top P$, apply $u=-K\mathbf{x}$. The resulting feedback locally stabilizes the full nonlinear cartpole — a classic demonstration.
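The recipe above, as a sketch with hypothetical cart-pole parameters and weights:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

Mc, m, l, g = 1.0, 0.1, 0.5, 9.81   # hypothetical cart/pole parameters

A = np.array([[0, 0, 1, 0],
              [0, 0, 0, 1],
              [0, -m*g/Mc, 0, 0],
              [0, (Mc+m)*g/(Mc*l), 0, 0]], dtype=float)
B = np.array([[0.0], [0.0], [1/Mc], [-1/(Mc*l)]])

# Kalman rank condition: [B, AB, A^2 B, A^3 B] must have full rank.
ctrb = np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(4)])
rank = np.linalg.matrix_rank(ctrb)

Q = np.diag([10.0, 10.0, 1.0, 1.0])  # penalize positions more than rates
R = np.array([[1.0]])
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)      # u = -K x
eigs = np.linalg.eigvals(A - B @ K)
stable = bool(np.all(eigs.real < 0))
```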
Let $V_{k+1}(\delta\mathbf{x}) = \tfrac{1}{2}\delta\mathbf{x}^\top V_{xx}^{k+1}\delta\mathbf{x} + v_x^{k+1\top}\delta\mathbf{x}$ be the quadratic cost-to-go. Define the state-action "Q-function" $$Q_k(\delta\mathbf{x},\delta\mathbf{u}) = \ell_k(\delta\mathbf{x},\delta\mathbf{u}) + V_{k+1}(A_k\delta\mathbf{x}+B_k\delta\mathbf{u}).$$ Its Hessians and gradients: $$Q_{xx}^{k} = \ell_{xx} + A_k^\top V_{xx}^{k+1}A_k,\ Q_{uu}^{k} = \ell_{uu} + B_k^\top V_{xx}^{k+1}B_k,\ Q_{ux}^{k} = \ell_{ux} + B_k^\top V_{xx}^{k+1}A_k,$$ $$Q_x^{k} = \ell_x + A_k^\top v_x^{k+1},\ Q_u^{k} = \ell_u + B_k^\top v_x^{k+1}.$$ Minimize over $\delta\mathbf{u}$: $\delta\mathbf{u}^\star = -{Q_{uu}^k}^{-1}(Q_u^k + Q_{ux}^k\delta\mathbf{x}) = k_k + K_k\delta\mathbf{x}$ with $$k_k = -{Q_{uu}^k}^{-1}Q_u^k,\quad K_k = -{Q_{uu}^k}^{-1}Q_{ux}^k.$$ Substitute back to get $V_k$: $V_{xx}^k = Q_{xx}^k + K_k^\top Q_{uu}^k K_k + Q_{ux}^{k\top}K_k + K_k^\top Q_{ux}^k$; $v_x^k = Q_x^k + K_k^\top Q_u^k$. Start from $V_{xx}^N = \phi_{xx},v_x^N = \phi_x$ and iterate backward. Forward pass rolls out $\mathbf{u}_k = \bar{\mathbf{u}}_k + \alpha k_k + K_k(\mathbf{x}_k-\bar{\mathbf{x}}_k)$ with line-search on $\alpha$; regularize $Q_{uu}$ if it becomes non-PSD.
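On a linear-quadratic problem (where $\ell_x=\ell_u=0$ about the origin and the dynamics expansions are exact), one iLQR backward pass reproduces the finite-horizon LQR gains exactly, which makes a compact correctness check; the system and weights below are arbitrary:

```python
import numpy as np

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]]); B = np.array([[0.5*dt**2], [dt]])
Q, R, Qf, N = np.eye(2), np.array([[0.1]]), np.eye(2), 50

# iLQR backward pass with l_x = l_u = 0 (nominal at the origin of an LQ problem)
Vxx, vx = Qf.copy(), np.zeros(2)
Ks, ks = [], []
for _ in range(N):
    Qxx = Q + A.T @ Vxx @ A
    Quu = R + B.T @ Vxx @ B
    Qux = B.T @ Vxx @ A
    Qx, Qu = A.T @ vx, B.T @ vx
    K = -np.linalg.solve(Quu, Qux)       # feedback gain
    k = -np.linalg.solve(Quu, Qu)        # feedforward (zero here)
    Vxx = Qxx + K.T @ Quu @ K + Qux.T @ K + K.T @ Qux
    vx = Qx + K.T @ Qu
    Ks.append(K); ks.append(k)

# Cross-check the first-step gain against the LQR Riccati recursion
P = Qf.copy()
for _ in range(N):
    S = R + B.T @ P @ B
    KL = np.linalg.solve(S, B.T @ P @ A)
    P = Q + A.T @ P @ A - A.T @ P @ B @ KL
gain_err = np.max(np.abs(Ks[-1] + KL))   # iLQR uses u = Kx, LQR uses u = -K_LQR x
```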