Analytical mechanics, manipulator dynamics, and the control-theoretic tools every roboticist reaches for — Lagrangian and Hamiltonian formulations, Pontryagin's maximum principle, the HJB equation, LQR, iLQR, MPC, and Lyapunov stability — distilled and paired with worked exercises.
From configuration spaces to manipulator dynamics — the equations every controller operates on.
A robot's configuration is a point $\mathbf{q}\in\mathcal{Q}$ in the configuration space, a smooth manifold of dimension $n$ (the number of degrees of freedom). Each coordinate $q_i$ is a generalized coordinate.
Fixed-base arm: $\mathbf{q} = (\theta_1,\ldots,\theta_n)\in\mathbb{T}^n$ or $\mathbb{R}^n$ for revolute/prismatic joints; the configuration space is flat.
Floating-base robot (humanoid, legged): $\mathbf{q}=(\mathbf{p},\mathbf{R},\mathbf{q}_{\text{joints}})\in\mathrm{SE}(3)\times\mathbb{R}^{n_j}$; the rotation part is a Lie group, not a vector space.
Quadrotor: $\mathbf{q}\in\mathrm{SE}(3)$, underactuated (4 inputs, 6 DoF).
Car-like mobile robot: $\mathbf{q}=(x,y,\theta)\in\mathrm{SE}(2)$; nonholonomic, since it cannot slide sideways.
The tangent space $T_\mathbf{q}\mathcal{Q}$ holds generalized velocities $\dot{\mathbf{q}}$; the cotangent space $T_\mathbf{q}^*\mathcal{Q}$ holds generalized forces and momenta. The total state is $(\mathbf{q},\dot{\mathbf{q}})\in T\mathcal{Q}$ or $(\mathbf{q},\mathbf{p})\in T^*\mathcal{Q}$.
Define the Lagrangian $\mathcal{L}(\mathbf{q},\dot{\mathbf{q}},t) = T(\mathbf{q},\dot{\mathbf{q}}) - V(\mathbf{q})$, the difference between kinetic and potential energy. Hamilton's principle of stationary action says that physical trajectories are critical points of $$S[\mathbf{q}] = \int_{t_0}^{t_1} \mathcal{L}(\mathbf{q}(t),\dot{\mathbf{q}}(t),t)\,dt,$$ among curves with fixed endpoints.
Kinetic energy of a mechanical system with mass matrix $M(\mathbf{q})\succ 0$: $$T = \tfrac{1}{2}\dot{\mathbf{q}}^\top M(\mathbf{q})\dot{\mathbf{q}}.$$ The Euler–Lagrange equations then take the manipulator form $$M(\mathbf{q})\ddot{\mathbf{q}} + C(\mathbf{q},\dot{\mathbf{q}})\dot{\mathbf{q}} + G(\mathbf{q}) = \boldsymbol{\tau},$$ with $G = \partial V/\partial\mathbf{q}$ and the Coriolis matrix $C$ satisfying $[C]_{ij} = \sum_k \Gamma_{ijk}\dot q_k$ for the Christoffel symbols $\Gamma_{ijk} = \tfrac{1}{2}(\partial_k M_{ij} + \partial_j M_{ik} - \partial_i M_{jk})$ of the kinetic-energy metric $M$. A key structural property is that $\dot M - 2C$ is skew-symmetric — used heavily in adaptive and passivity-based control.
The rotation group $\mathrm{SO}(3) = \{R\in\mathbb{R}^{3\times3}: R^\top R = I, \det R = +1\}$ is a three-dimensional Lie group. Its Lie algebra $\mathfrak{so}(3)$ is the set of $3\times 3$ skew-symmetric matrices. The hat map lifts a vector $\boldsymbol{\omega}\in\mathbb{R}^3$ to a skew matrix: $$\widehat{\boldsymbol{\omega}} = \begin{pmatrix}0 & -\omega_3 & \omega_2\\ \omega_3 & 0 & -\omega_1\\ -\omega_2 & \omega_1 & 0\end{pmatrix},\qquad \widehat{\boldsymbol{\omega}}\,\mathbf{v}=\boldsymbol{\omega}\times\mathbf{v}.$$ The exponential map (Rodrigues' formula) takes $\mathfrak{so}(3)\to\mathrm{SO}(3)$; for a unit axis $\hat{\mathbf n}=\boldsymbol{\omega}/\|\boldsymbol{\omega}\|$ and angle $\theta$, $$\exp(\widehat{\hat{\mathbf n}}\,\theta) = I + \sin\theta\,\widehat{\hat{\mathbf n}} + (1-\cos\theta)\,\widehat{\hat{\mathbf n}}^{\,2}.$$
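A minimal numerical check of the hat map and Rodrigues' formula in plain NumPy (the axis, angle, and test vector below are arbitrary choices):

```python
import numpy as np

def hat(w):
    """Map w in R^3 to the skew-symmetric matrix w^ in so(3)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_so3(axis, theta):
    """Rodrigues' formula: rotation by angle theta about the (normalized) axis."""
    n = np.asarray(axis, dtype=float)
    n = n / np.linalg.norm(n)
    N = hat(n)
    return np.eye(3) + np.sin(theta) * N + (1.0 - np.cos(theta)) * (N @ N)

# A 90-degree rotation about z should send e_x to e_y, and land in SO(3).
R = exp_so3([0.0, 0.0, 1.0], np.pi / 2)
e_y = R @ np.array([1.0, 0.0, 0.0])
```

The two defining properties of $\mathrm{SO}(3)$ ($R^\top R = I$, $\det R = +1$) hold by construction.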
Rigid transforms form $\mathrm{SE}(3)=\mathrm{SO}(3)\ltimes\mathbb{R}^3$, represented as $4\times 4$ homogeneous matrices $$T = \begin{pmatrix}R & \mathbf{p}\\ \mathbf 0 & 1\end{pmatrix}.$$ The Lie algebra $\mathfrak{se}(3)$ is parametrized by a twist $\boldsymbol{\xi}=(\boldsymbol{v},\boldsymbol{\omega})\in\mathbb{R}^6$ (linear + angular velocity).
If $R(t)\in\mathrm{SO}(3)$, then $\dot R = R\,\widehat{\boldsymbol{\omega}}_B = \widehat{\boldsymbol{\omega}}_S\,R$ where $\boldsymbol{\omega}_B$ is angular velocity in the body frame and $\boldsymbol{\omega}_S=R\boldsymbol{\omega}_B$ in the spatial frame. This is why small-angle rotation increments are applied on the left (spatial) or right (body) depending on convention.
Forward kinematics: map joint angles to end-effector pose, $\mathbf{x}_{\text{ee}} = f(\mathbf{q})\in\mathrm{SE}(3)$. For a serial chain with Denavit–Hartenberg parameters, $f$ is the product of per-link transforms $$T_{0n}(\mathbf{q}) = \prod_{i=1}^n T_{i-1,i}(q_i).$$ More modern product-of-exponentials form: $T_{0n}(\mathbf{q}) = e^{\widehat{\boldsymbol{\xi}}_1 q_1}\cdots e^{\widehat{\boldsymbol{\xi}}_n q_n} T_{0n}(0)$, with $\boldsymbol{\xi}_i$ the screw axis of joint $i$ in the base frame.
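The product-of-exponentials form can be spot-checked against the planar closed form on a hypothetical 2R arm (unit-length links assumed; `scipy.linalg.expm` computes the twist exponentials, and twists follow the $(\boldsymbol v,\boldsymbol\omega)$ ordering above):

```python
import numpy as np
from scipy.linalg import expm

def hat6(xi):
    """Twist xi = (v, w) in R^6 -> 4x4 element of se(3)."""
    v, w = xi[:3], xi[3:]
    X = np.zeros((4, 4))
    X[:3, :3] = [[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]]
    X[:3, 3] = v
    return X

# Planar 2R arm in the xy-plane, both joints about z (hypothetical geometry).
l1 = l2 = 1.0
xi1 = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.0])   # axis through the base origin
xi2 = np.array([0.0, -l1, 0.0, 0.0, 0.0, 1.0])   # axis through (l1,0,0): v = -w x p
M = np.eye(4); M[0, 3] = l1 + l2                 # home pose T_0n(0)

def fk(q):
    """Product of exponentials: T = exp(xi1^ q1) exp(xi2^ q2) T_0n(0)."""
    return expm(hat6(xi1) * q[0]) @ expm(hat6(xi2) * q[1]) @ M

# Cross-check against the planar closed form for an arbitrary configuration.
q = np.array([0.3, 0.4])
p = fk(q)[:2, 3]
p_ref = np.array([l1*np.cos(q[0]) + l2*np.cos(q[0]+q[1]),
                  l1*np.sin(q[0]) + l2*np.sin(q[0]+q[1])])
```

At $\mathbf{q}=0$ the product of exponentials collapses to the home pose, as it must.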
Inverse kinematics: given a desired pose $\mathbf{x}^\star$, find $\mathbf{q}^\star$ with $f(\mathbf{q}^\star)=\mathbf{x}^\star$. Approaches: closed-form analytic solutions when the geometry permits (e.g. 6-DoF arms with a spherical wrist), Jacobian-based iterative methods (Newton–Raphson, damped least squares), and optimization-based IK that handles joint limits and redundancy resolution.
For an $n$-link open-chain robot, the Lagrangian is $\mathcal{L} = \tfrac{1}{2}\dot{\mathbf{q}}^\top M(\mathbf{q})\dot{\mathbf{q}} - V(\mathbf{q})$ with mass matrix $$M(\mathbf{q}) = \sum_{i=1}^n \big(J_{v,i}^\top m_i J_{v,i} + J_{\omega,i}^\top R_i\,I_i\,R_i^\top J_{\omega,i}\big),$$ where $m_i$ is the $i$-th link mass, $I_i$ its inertia tensor in body frame, and $J_{v,i},J_{\omega,i}$ are the link-center Jacobians. The potential is $V = \sum_i m_i\,g\,z_i(\mathbf{q})$.
For chains of more than a few DoF, writing $M,C$ in closed form is impractical. Recursive Newton–Euler computes $\boldsymbol{\tau}(\mathbf{q},\dot{\mathbf{q}},\ddot{\mathbf{q}})$ in $O(n)$: an outward pass propagates velocities and accelerations link-by-link, and an inward pass accumulates forces. Featherstone's articulated-body algorithm solves the forward dynamics $\ddot{\mathbf{q}} = f(\mathbf{q},\dot{\mathbf{q}},\boldsymbol{\tau})$ in $O(n)$ without forming $M^{-1}$ explicitly.
Define the generalized momentum $\mathbf{p} = \partial\mathcal{L}/\partial\dot{\mathbf{q}} = M(\mathbf{q})\dot{\mathbf{q}}$, and the Hamiltonian via the Legendre transform $$H(\mathbf{q},\mathbf{p}) = \mathbf{p}^\top\dot{\mathbf{q}} - \mathcal{L} = \tfrac{1}{2}\mathbf{p}^\top M(\mathbf{q})^{-1}\mathbf{p} + V(\mathbf{q}).$$ The equations of motion become $$\dot{\mathbf{q}} = \frac{\partial H}{\partial \mathbf{p}},\qquad \dot{\mathbf{p}} = -\frac{\partial H}{\partial\mathbf{q}} + \boldsymbol{\tau},$$ a first-order system in $(\mathbf{q},\mathbf{p})$.
Holonomic constraints act on configuration: $\boldsymbol{\phi}(\mathbf{q})=0$, $\boldsymbol{\phi}:\mathbb{R}^n\to\mathbb{R}^m$. Differentiating: $A(\mathbf{q})\dot{\mathbf{q}}=0$ with $A = \partial\boldsymbol{\phi}/\partial\mathbf{q}$. Augment the Lagrangian with multipliers $\boldsymbol{\lambda}\in\mathbb{R}^m$: $$M\ddot{\mathbf{q}}+C\dot{\mathbf{q}}+G = \boldsymbol{\tau} + A^\top\boldsymbol{\lambda},\qquad A\ddot{\mathbf{q}}+\dot A\dot{\mathbf{q}}=0.$$ Solve jointly — the multipliers are constraint forces.
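A sketch of the constrained-dynamics solve on the simplest case: a point mass on a rigid rod, i.e. the pendulum written as a holonomically constrained particle with $\boldsymbol\phi(\mathbf q) = x^2+y^2-\ell^2$. The test state and parameters are arbitrary; the KKT system is assembled and solved directly, and the tangential component of $\ddot{\mathbf q}$ should recover the pendulum equation:

```python
import numpy as np

m, l, g = 1.0, 1.0, 9.81
theta, thetadot = 0.7, 0.3   # arbitrary test state (angle from downward vertical)

# Point-mass coordinates on the constraint phi(q) = x^2 + y^2 - l^2 = 0
q  = np.array([l*np.sin(theta), -l*np.cos(theta)])
qd = np.array([l*np.cos(theta)*thetadot, l*np.sin(theta)*thetadot])

M = m * np.eye(2)                 # mass matrix of the free point mass
F = np.array([0.0, -m*g])         # gravity; no applied tau
A = 2.0 * q.reshape(1, 2)         # constraint Jacobian d(phi)/dq
Adot_qd = 2.0 * (qd @ qd)         # (d/dt A) qdot for this phi

# KKT system: [M  -A^T; A  0] [qdd; lambda] = [F; -Adot qdot]
KKT = np.block([[M, -A.T], [A, np.zeros((1, 1))]])
rhs = np.concatenate([F, [-Adot_qd]])
sol = np.linalg.solve(KKT, rhs)
qdd, lam = sol[:2], sol[2]        # lam is the rod tension (constraint force)

# Tangential component recovers l*thetaddot = -g*sin(theta)
thetaddot = (np.cos(theta)*qdd[0] + np.sin(theta)*qdd[1]) / l
```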
Nonholonomic constraints act on velocity and are not integrable to a configuration constraint, e.g. rolling without slipping on a wheel or car. The car's no-sideways-slip constraint $\sin\theta\,\dot x - \cos\theta\,\dot y = 0$ restricts velocities but leaves $(x,y,\theta)\in\mathrm{SE}(2)$ fully accessible — the configuration manifold keeps its full dimension.
Contact with friction is modeled as a linear complementarity problem: $$0\le\lambda_n\perp g(\mathbf{q})\ge 0,\qquad \|\boldsymbol{\lambda}_t\|\le\mu\lambda_n,$$ with $g$ a signed distance. Modern rigid-body simulators (MuJoCo, Bullet, Drake) solve this via regularized/smoothed contact models, or true LCP/NCP solvers. Differentiable contact (DiffSim, Warp) enables gradient-based policy learning through contacts.
Point-mass pendulum: mass $m$ at the end of a massless rod of length $\ell$, angle $q$ measured from the downward vertical. Position: $(x,y) = (\ell\sin q, -\ell\cos q)$. Velocity-squared: $\dot x^2+\dot y^2 = \ell^2\dot q^2$. Kinetic energy $T = \tfrac{1}{2}m\ell^2\dot q^2$. Potential $V = -mg\ell\cos q$ (zero at the pivot). Lagrangian $\mathcal{L}=T-V = \tfrac{1}{2}m\ell^2\dot q^2 + mg\ell\cos q$.
Euler–Lagrange: $\partial\mathcal{L}/\partial\dot q = m\ell^2\dot q$, so $(d/dt)(\partial\mathcal{L}/\partial\dot q) = m\ell^2\ddot q$. And $\partial\mathcal{L}/\partial q = -mg\ell\sin q$. Therefore $m\ell^2\ddot q + mg\ell\sin q = 0$, i.e. $$\ddot q + (g/\ell)\sin q = 0.$$ Linearizing about $q=0$ gives $\ddot q + (g/\ell)q = 0$ with period $T = 2\pi\sqrt{\ell/g}$.
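A quick RK4 sanity check of the derived equation (parameter values are arbitrary): integrate one linearized period at small amplitude and confirm energy conservation and near-return to the release angle:

```python
import numpy as np

g, l = 9.81, 1.0

def f(s):
    """Pendulum state derivative, s = (q, qdot)."""
    q, qd = s
    return np.array([qd, -(g/l)*np.sin(q)])

def rk4_step(s, dt):
    k1 = f(s); k2 = f(s + dt/2*k1); k3 = f(s + dt/2*k2); k4 = f(s + dt*k3)
    return s + dt/6*(k1 + 2*k2 + 2*k3 + k4)

def energy(s):
    """Total energy per unit mass: T + V with V zero at the pivot."""
    q, qd = s
    return 0.5*l**2*qd**2 - g*l*np.cos(q)

s0 = np.array([0.05, 0.0])          # small release angle (rad)
s, dt = s0.copy(), 1e-3
T = 2*np.pi*np.sqrt(l/g)            # linearized period
for _ in range(int(round(T/dt))):
    s = rk4_step(s, dt)
drift = abs(energy(s) - energy(s0))
```

At 0.05 rad the nonlinear period correction ($\approx\theta_0^2/16$) is negligible, so the state nearly returns to the release point after time $T$.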
Planar two-link arm, point masses $m_1,m_2$ at the link ends, link lengths $\ell_1,\ell_2$. End of link 1: $(\ell_1 c_1, \ell_1 s_1)$; end of link 2: $(\ell_1 c_1 + \ell_2 c_{12}, \ell_1 s_1 + \ell_2 s_{12})$, where $c_1=\cos q_1$, $c_{12}=\cos(q_1+q_2)$, etc.
Kinetic energies: $$T_1 = \tfrac{1}{2}m_1\ell_1^2\dot q_1^2,\qquad T_2 = \tfrac{1}{2}m_2\big(\ell_1^2\dot q_1^2 + \ell_2^2(\dot q_1+\dot q_2)^2 + 2\ell_1\ell_2\,c_2\,\dot q_1(\dot q_1+\dot q_2)\big).$$ Collecting $T=\tfrac{1}{2}\dot{\mathbf{q}}^\top M\dot{\mathbf{q}}$: $$M(\mathbf{q}) = \begin{pmatrix} m_1\ell_1^2 + m_2(\ell_1^2+\ell_2^2+2\ell_1\ell_2 c_2) & m_2(\ell_2^2 + \ell_1\ell_2 c_2)\\ m_2(\ell_2^2+\ell_1\ell_2 c_2) & m_2\ell_2^2\end{pmatrix}.$$ The off-diagonal coupling $m_2 \ell_1\ell_2 c_2$ is why robot arms have configuration-dependent inertia: swinging link 1 accelerates link 2 and vice versa.
Take $C_{ij} = \sum_k \Gamma_{ijk}\dot q_k$ with Christoffel symbols $\Gamma_{ijk} = \tfrac{1}{2}(\partial_k M_{ij} + \partial_j M_{ik} - \partial_i M_{jk})$. Then $\dot M_{ij} = \sum_k \partial_k M_{ij}\dot q_k$ and $$(\dot M - 2C)_{ij} = \sum_k\big(\partial_k M_{ij} - \partial_k M_{ij} - \partial_j M_{ik} + \partial_i M_{jk}\big)\dot q_k = \sum_k(\partial_i M_{jk} - \partial_j M_{ik})\dot q_k,$$ which is antisymmetric in $(i,j)$.
Consequence: $\dot{\mathbf{q}}^\top(\dot M - 2C)\dot{\mathbf{q}} = 0$. Multiplying $M\ddot{\mathbf{q}}+C\dot{\mathbf{q}}+G=\boldsymbol{\tau}$ by $\dot{\mathbf{q}}^\top$ and using this identity gives the power balance $\dot E = \dot{\mathbf{q}}^\top\boldsymbol{\tau}$ where $E = \tfrac{1}{2}\dot{\mathbf{q}}^\top M\dot{\mathbf{q}}+V$ — total energy changes only by actuator work. This passivity identity is the foundation of passivity-based manipulator control (Slotine–Li adaptive control, PD+gravity compensation stability).
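The skew-symmetry identity can be verified numerically on the two-link mass matrix derived above (mass and length values are arbitrary; the Christoffel symbols are formed from central-difference partials of $M$):

```python
import numpy as np

m1, m2, l1, l2 = 1.0, 2.0, 1.0, 0.8   # arbitrary parameters

def M(q):
    c2 = np.cos(q[1])
    return np.array([
        [m1*l1**2 + m2*(l1**2 + l2**2 + 2*l1*l2*c2), m2*(l2**2 + l1*l2*c2)],
        [m2*(l2**2 + l1*l2*c2),                      m2*l2**2]])

def dM(q, eps=1e-6):
    """dM[k][i,j] = partial of M_ij w.r.t. q_k, by central differences."""
    out = []
    for k in range(2):
        e = np.zeros(2); e[k] = eps
        out.append((M(q + e) - M(q - e)) / (2*eps))
    return out

def C(q, qd):
    """Coriolis matrix C_ij = sum_k Gamma_ijk qd_k from the Christoffel symbols."""
    dMk = dM(q)
    Cmat = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            for k in range(2):
                Gamma = 0.5*(dMk[k][i, j] + dMk[j][i, k] - dMk[i][j, k])
                Cmat[i, j] += Gamma * qd[k]
    return Cmat

q, qd = np.array([0.4, 1.1]), np.array([-0.5, 0.9])   # arbitrary state
Mdot = sum(dM(q)[k] * qd[k] for k in range(2))        # dM/dt = sum_k (dM/dq_k) qd_k
S = Mdot - 2*C(q, qd)
```

`S` should be antisymmetric, so its quadratic form in $\dot{\mathbf q}$ vanishes.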
Since $\hat{\mathbf n}$ is a unit vector, $\widehat{\hat{\mathbf n}}^3 = -\widehat{\hat{\mathbf n}}$ (verify from $\widehat{\hat{\mathbf n}}^2 = \hat{\mathbf n}\hat{\mathbf n}^\top - I$ and $\widehat{\hat{\mathbf n}}\hat{\mathbf n}=0$). Expand the exponential series: $$e^{\widehat{\hat{\mathbf n}}\theta} = \sum_{k=0}^\infty \frac{(\widehat{\hat{\mathbf n}}\theta)^k}{k!} = I + \theta\widehat{\hat{\mathbf n}} + \tfrac{\theta^2}{2}\widehat{\hat{\mathbf n}}^{\,2} - \tfrac{\theta^3}{6}\widehat{\hat{\mathbf n}} - \tfrac{\theta^4}{24}\widehat{\hat{\mathbf n}}^{\,2} + \cdots$$ Grouping odd powers gives $\sin\theta\,\widehat{\hat{\mathbf n}}$; even powers (from $k\ge 2$) give $(1-\cos\theta)\widehat{\hat{\mathbf n}}^{\,2}$. Hence $e^{\widehat{\hat{\mathbf n}}\theta} = I + \sin\theta\,\widehat{\hat{\mathbf n}}+(1-\cos\theta)\,\widehat{\hat{\mathbf n}}^{\,2}$. This is the canonical closed form used in IK solvers, quaternion conversions, and Lie-group integration.
Cart of mass $M$ on a horizontal track, pole of mass $m$ concentrated at distance $\ell$ from the pivot, angle $\theta$ measured from the upright. Generalized coordinates $\mathbf{q}=(x,\theta)$. Cart position $(x,0)$, pole-tip position $(x+\ell\sin\theta, \ell\cos\theta)$. Velocities: tip $(\dot x+\ell\cos\theta\,\dot\theta, -\ell\sin\theta\,\dot\theta)$.
$T = \tfrac{1}{2}M\dot x^2 + \tfrac{1}{2}m\big((\dot x+\ell c\dot\theta)^2 + (\ell s\dot\theta)^2\big) = \tfrac{1}{2}(M+m)\dot x^2 + m\ell c\dot x\dot\theta + \tfrac{1}{2}m\ell^2\dot\theta^2$, using $c=\cos\theta,s=\sin\theta$. Potential $V = mg\ell\cos\theta$. Euler–Lagrange applied to $x$ (non-conservative force $u$) and $\theta$: $$(M+m)\ddot x + m\ell(c\ddot\theta - s\dot\theta^2) = u,$$ $$m\ell c\ddot x + m\ell^2\ddot\theta - m g\ell s = 0.$$ Equivalently, in manipulator form $$\begin{pmatrix}M+m & m\ell c\\ m\ell c & m\ell^2\end{pmatrix}\!\!\begin{pmatrix}\ddot x\\ \ddot\theta\end{pmatrix} + \begin{pmatrix}-m\ell s\dot\theta^2\\ 0\end{pmatrix} + \begin{pmatrix}0\\ -mg\ell s\end{pmatrix} = \begin{pmatrix}u\\ 0\end{pmatrix}.$$ This is the canonical underactuated system for demonstrating LQR stabilization, energy shaping, iLQR trajectory optimization, and model-based RL.
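The manipulator form above translates directly into a forward-dynamics routine (parameter values are hypothetical); solving the $2\times2$ system gives $(\ddot x,\ddot\theta)$:

```python
import numpy as np

Mc, m, l, g = 1.0, 0.1, 0.5, 9.81   # cart mass, pole mass, length (hypothetical)

def cartpole_dynamics(state, u):
    """State derivative from H(q) qdd + C-term + G-term = (u, 0)."""
    x, th, xd, thd = state
    c, s = np.cos(th), np.sin(th)
    H = np.array([[Mc + m, m*l*c],
                  [m*l*c,  m*l**2]])
    bias = np.array([-m*l*s*thd**2, -m*g*l*s])   # Coriolis + gravity terms
    qdd = np.linalg.solve(H, np.array([u, 0.0]) - bias)
    return np.array([xd, thd, qdd[0], qdd[1]])

# Sanity check: hanging straight down (theta = pi) with no input is an equilibrium.
rate = cartpole_dynamics(np.array([0.0, np.pi, 0.0, 0.0]), 0.0)
```

Near the upright, the pole acceleration should match the linearization $\ddot\theta \approx (M+m)g\theta/(M\ell)$ used later for LQR.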
Calculus of variations, Pontryagin, HJB, LQR, iLQR, MPC, and Lyapunov — the toolkit for designing and certifying robot controllers.
The foundational problem: minimize a functional $$J[\mathbf{x}] = \int_{t_0}^{t_f} L(\mathbf{x}(t),\dot{\mathbf{x}}(t),t)\,dt$$ over curves $\mathbf{x}:[t_0,t_f]\to\mathbb{R}^n$ with fixed endpoints. Stationary curves satisfy the Euler–Lagrange equation $\frac{d}{dt}\partial_{\dot{\mathbf{x}}}L = \partial_{\mathbf{x}}L$ — the same operator that governs conservative mechanics (§2). Lagrangian mechanics is variational, and so is optimal control.
Transversality: if the terminal state is free, $\partial_{\dot{\mathbf{x}}}L(t_f)=0$; if the terminal time is free, $L(t_f) - \dot{\mathbf{x}}(t_f)^\top\partial_{\dot{\mathbf{x}}}L(t_f) = 0$. These boundary conditions emerge from integration-by-parts of the first variation.
Optimal control problem: minimize $$J(\mathbf{u}) = \phi(\mathbf{x}(t_f)) + \int_{t_0}^{t_f} L(\mathbf{x}(t),\mathbf{u}(t),t)\,dt\quad\text{subject to}\quad \dot{\mathbf{x}} = \mathbf{f}(\mathbf{x},\mathbf{u},t),\ \mathbf{u}(t)\in\mathcal{U}.$$
Pontryagin's principle: define the control Hamiltonian $H(\mathbf{x},\mathbf{u},\boldsymbol{\lambda},t) = L(\mathbf{x},\mathbf{u},t) + \boldsymbol{\lambda}^\top\mathbf{f}(\mathbf{x},\mathbf{u},t)$. Along an optimal trajectory there exists a costate $\boldsymbol{\lambda}(t)$ with $$\dot{\mathbf{x}} = \frac{\partial H}{\partial\boldsymbol{\lambda}},\qquad \dot{\boldsymbol{\lambda}} = -\frac{\partial H}{\partial\mathbf{x}},\qquad \mathbf{u}^\star(t) = \arg\min_{\mathbf{u}\in\mathcal{U}} H(\mathbf{x}^\star(t),\mathbf{u},\boldsymbol{\lambda}(t),t),$$ with terminal condition $\boldsymbol{\lambda}(t_f) = \partial\phi/\partial\mathbf{x}(t_f)$. The principle turns an infinite-dimensional optimization into a two-point boundary-value problem (TPBVP) in $(\mathbf{x},\boldsymbol{\lambda})$. For control-affine problems with bounded inputs, the minimum condition often yields bang-bang control: $\mathbf{u}^\star$ saturates at the bounds whenever the switching function $\partial H/\partial\mathbf{u}$ is non-zero. Singular arcs occur where it vanishes on an interval.
Define the value function as the optimal cost-to-go from state $\mathbf{x}$ at time $t$: $$V(\mathbf{x},t) = \min_{\mathbf{u}(\cdot)}\!\left[\phi(\mathbf{x}(t_f)) + \int_t^{t_f} L(\mathbf{x}(s),\mathbf{u}(s),s)\,ds\right],\quad \mathbf{x}(t)=\mathbf{x}.$$
Bellman's principle of optimality turns this into the Hamilton–Jacobi–Bellman (HJB) equation $$-\frac{\partial V}{\partial t} = \min_{\mathbf{u}\in\mathcal{U}}\Big[L(\mathbf{x},\mathbf{u},t) + \nabla_\mathbf{x}V^\top\mathbf{f}(\mathbf{x},\mathbf{u},t)\Big],\qquad V(\mathbf{x},t_f)=\phi(\mathbf{x}).$$ The HJB equation defines the optimal feedback policy $\mathbf{u}^\star(\mathbf{x},t)$ (the minimizer above), unlike the PMP, which gives an open-loop optimal trajectory. For LQR it admits a closed form; for general nonlinear systems it suffers from the curse of dimensionality: numerical solution requires gridding state space, which scales as $N^n$.
Discrete-time / finite-horizon analog: $$V_k(\mathbf{x}) = \min_{\mathbf{u}}\big[\ell_k(\mathbf{x},\mathbf{u}) + V_{k+1}(\mathbf{f}_k(\mathbf{x},\mathbf{u}))\big],\quad V_N(\mathbf{x})=\phi(\mathbf{x}).$$ This is the Bellman equation solved (exactly or approximately) by DP, value iteration, Q-learning, and all of reinforcement learning.
A linear system $\dot{\mathbf{x}} = A\mathbf{x} + B\mathbf{u}$ with quadratic cost $$J = \tfrac{1}{2}\mathbf{x}(t_f)^\top Q_f\,\mathbf{x}(t_f) + \tfrac{1}{2}\int_{t_0}^{t_f}\!\big(\mathbf{x}^\top Q\mathbf{x} + \mathbf{u}^\top R\mathbf{u}\big)\,dt,\quad Q\succeq 0,\ R\succ 0,\ Q_f\succeq 0.$$ Guess $V(\mathbf{x},t) = \tfrac{1}{2}\mathbf{x}^\top P(t)\mathbf{x}$ and substitute into HJB. The inner minimization gives $\mathbf{u}^\star = -R^{-1}B^\top P\mathbf{x} =: -K(t)\mathbf{x}$, and $P(t)$ satisfies the continuous-time Riccati differential equation $$-\dot P = A^\top P + P A - P B R^{-1} B^\top P + Q,\quad P(t_f)=Q_f.$$
For $\mathbf{x}_{k+1}=A\mathbf{x}_k+B\mathbf{u}_k$ with stage cost $\mathbf{x}^\top Q\mathbf{x} + \mathbf{u}^\top R\mathbf{u}$, the DP recursion is
$$P_k = Q + A^\top P_{k+1}A - A^\top P_{k+1}B(R+B^\top P_{k+1}B)^{-1}B^\top P_{k+1}A,$$
$$K_k = (R+B^\top P_{k+1}B)^{-1}B^\top P_{k+1}A,\quad \mathbf{u}_k^\star = -K_k\mathbf{x}_k.$$
The infinite-horizon fixed point gives the discrete algebraic Riccati equation (DARE), solved in practice by scipy.linalg.solve_discrete_are.
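A sketch of the backward recursion converging to the DARE fixed point, cross-checked against scipy.linalg.solve_discrete_are (the double-integrator discretization and cost weights below are arbitrary choices):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Discretized double integrator, dt = 0.1 (hypothetical system)
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5*dt**2], [dt]])
Q, R = np.eye(2), np.array([[0.1]])

# Backward Riccati recursion; for a long horizon P approaches the DARE fixed point.
P = Q.copy()
for _ in range(500):
    S = R + B.T @ P @ B
    K = np.linalg.solve(S, B.T @ P @ A)        # K_k = S^{-1} B^T P A
    P = Q + A.T @ P @ A - A.T @ P @ B @ K      # Riccati update

P_inf = solve_discrete_are(A, B, Q, R)
err = np.max(np.abs(P - P_inf))
```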
For a nonlinear system $\dot{\mathbf{x}}=f(\mathbf{x},\mathbf{u})$ with equilibrium $(\mathbf{x}^\star,\mathbf{u}^\star)$, linearize: $A = \partial f/\partial\mathbf{x}$, $B = \partial f/\partial\mathbf{u}$ at the equilibrium. LQR designed on $(A,B)$ locally stabilizes the true nonlinear system — the backbone of hover controllers for quadrotors, walking-balance controllers for bipeds, and the terminal cost in most MPC formulations.
Iterative LQR (iLQR) extends LQR to nonlinear systems by repeated local approximation: linearize the dynamics and quadraticize the cost about the current nominal trajectory $(\bar{\mathbf{x}}_k,\bar{\mathbf{u}}_k)$, solve the resulting time-varying LQR problem by a backward pass for feedforward terms and feedback gains, roll the updated controls forward through the true nonlinear dynamics with a line search, and repeat until convergence.
DDP (differential dynamic programming) keeps the second-order expansion of dynamics as well, giving quadratic convergence near the optimum. iLQR drops those terms for a $\sim10\times$ speedup with nearly the same behavior.
Convert the continuous optimal-control problem into a finite NLP: $$\min_{\mathbf{x}_{0:N},\mathbf{u}_{0:N-1}} \phi(\mathbf{x}_N) + \sum_{k=0}^{N-1}\ell_k(\mathbf{x}_k,\mathbf{u}_k)\quad\text{s.t.}\quad \mathbf{x}_{k+1} = \mathbf{f}_k(\mathbf{x}_k,\mathbf{u}_k),\ \mathbf{x}_k\in\mathcal{X},\ \mathbf{u}_k\in\mathcal{U}.$$ Direct shooting optimizes over $\mathbf{u}$ only (states eliminated by forward simulation); multiple shooting breaks the horizon into segments; collocation treats $\mathbf{x}_k$ and $\mathbf{u}_k$ as independent variables with dynamics as equality constraints. Solved by SQP or interior-point (Ipopt, SNOPT, FORCES Pro).
Repeat at every control tick $t_k$: measure or estimate the current state $\mathbf{x}(t_k)$, solve the finite-horizon problem above over the window $[t_k, t_k+T]$ warm-started from the previous solution, apply only the first control of the optimal sequence, and shift the horizon forward one step.
MPC fuses long-horizon planning with feedback. For linear systems with quadratic cost and linear constraints, each problem is a convex QP — closed-loop stability follows from a terminal cost and terminal constraint that render the infinite-horizon tail cost-decreasing. For nonlinear robots, NMPC solves an SQP online, now tractable at kilohertz rates for quadrupeds and humanoids with real-time iteration schemes and warm-started solvers (acados, HPIPM).
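A minimal receding-horizon loop under simplifying assumptions (no state or input constraints, so each subproblem reduces to a finite-horizon LQR backward pass; the system and weights are hypothetical):

```python
import numpy as np

# Discretized double integrator and quadratic weights (hypothetical choices)
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]]); B = np.array([[0.5*dt**2], [dt]])
Q, R, H = np.eye(2), np.array([[0.1]]), 20   # H = prediction horizon

def mpc_input(x):
    """Solve the H-step unconstrained subproblem by backward DP; return first input."""
    P, K = Q.copy(), None
    for _ in range(H):                        # backward pass; final K is K_0
        S = R + B.T @ P @ B
        K = np.linalg.solve(S, B.T @ P @ A)
        P = Q + A.T @ P @ A - A.T @ P @ B @ K
    return -K @ x                             # apply only the first optimal input

x = np.array([1.0, 0.0])
for _ in range(100):                          # closed loop: re-solve at every tick
    x = A @ x + B @ mpc_input(x)
final_norm = np.linalg.norm(x)
```

With constraints, the per-tick solve becomes a QP and the structure of the loop is unchanged.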
Example — PD + gravity compensation for a manipulator at setpoint $\mathbf{q}_d$: the controller $\boldsymbol{\tau} = -K_p(\mathbf{q}-\mathbf{q}_d) - K_d\dot{\mathbf{q}} + G(\mathbf{q})$ is globally asymptotically stable (for $K_p,K_d\succ 0$) with Lyapunov function $V = \tfrac{1}{2}\dot{\mathbf{q}}^\top M\dot{\mathbf{q}} + \tfrac{1}{2}(\mathbf{q}-\mathbf{q}_d)^\top K_p(\mathbf{q}-\mathbf{q}_d)$; $\dot V = -\dot{\mathbf{q}}^\top K_d\dot{\mathbf{q}}\le 0$ and LaSalle gives convergence.
For the manipulator, the control $\boldsymbol{\tau} = M(\mathbf{q})\mathbf{v} + C(\mathbf{q},\dot{\mathbf{q}})\dot{\mathbf{q}} + G(\mathbf{q})$ yields $\ddot{\mathbf{q}} = \mathbf{v}$ — a chain of integrators. Choosing $\mathbf{v} = \ddot{\mathbf{q}}_d - K_d(\dot{\mathbf{q}}-\dot{\mathbf{q}}_d) - K_p(\mathbf{q}-\mathbf{q}_d)$ gives exponentially stable tracking with eigenvalues set by $(K_p,K_d)$. The assumption is full actuation and accurate model; for underactuated robots, only partial feedback linearization is possible.
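A sketch of computed-torque tracking on a single pendulum (the plant parameters, gains, and reference trajectory are all hypothetical choices; gains give a critically damped error dynamics with a double pole at $s=-5$):

```python
import numpy as np

m, l, g = 1.0, 1.0, 9.81          # hypothetical pendulum parameters
Kp, Kd = 25.0, 10.0               # outer-loop PD gains

q, qdot, dt = 0.5, 0.0, 1e-3      # start off the reference
for i in range(10000):            # track q_des(t) = sin t for 10 s
    t = i * dt
    q_des, qd_des, qdd_des = np.sin(t), np.cos(t), -np.sin(t)
    v = qdd_des - Kd*(qdot - qd_des) - Kp*(q - q_des)   # outer loop on integrators
    tau = m*l**2 * v + m*g*l*np.sin(q)                  # inverse dynamics (computed torque)
    qddot = (tau - m*g*l*np.sin(q)) / (m*l**2)          # plant: m l^2 qdd + m g l sin q = tau
    qdot += dt * qddot
    q += dt * qdot
tracking_err = abs(q - np.sin(10.0))
```

The cancellation is exact here by construction; with model error the residual dynamics reappear, which is what robust and adaptive variants address.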
Brachistochrone: a bead slides frictionlessly under gravity from the origin, released at rest, along a curve $y(x)$; minimize the descent time. Arc length $ds = \sqrt{1+y'^2}\,dx$. Energy conservation: speed $v = \sqrt{-2gy}$. Time $T = \int ds/v = \int_0^{x_f}\sqrt{(1+y'^2)/(-2gy)}\,dx$.
The Lagrangian $L(y,y') = \sqrt{(1+y'^2)/(-2gy)}$ is independent of $x$, so the Beltrami identity $L - y' \partial_{y'}L = \text{const}$ applies. After simplification this yields $y(1+y'^2) = C$, whose parametric solution is a cycloid — the brachistochrone. Classic lesson: variational principles give you the differential equation; solving it is its own craft.
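The invariant can be checked numerically on a cycloid parametrized as $x=a(\phi-\sin\phi)$, $y=-a(1-\cos\phi)$, with $y$ negative below the start as in the setup; the constant works out to $-2a$ (the radius $a$ is an arbitrary choice):

```python
import numpy as np

a = 0.7                                  # hypothetical cycloid radius
phi = np.linspace(0.5, 2.5, 9)           # sample points, away from the cusp at phi = 0
y  = -a * (1 - np.cos(phi))
yp = -np.sin(phi) / (1 - np.cos(phi))    # dy/dx = (dy/dphi)/(dx/dphi) along the curve
invariant = y * (1 + yp**2)              # Beltrami invariant y(1 + y'^2)
```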
State $(x_1,x_2)=(x,\dot x)$, dynamics $\dot x_1 = x_2,\ \dot x_2 = u$ with admissible controls $|u|\le 1$, cost $L=1$ (minimum time), terminal constraint $(x_1,x_2)(t_f)=(0,0)$. Control Hamiltonian: $H = 1 + \lambda_1 x_2 + \lambda_2 u$.
Costate: $\dot\lambda_1 = -\partial H/\partial x_1 = 0$, $\dot\lambda_2 = -\partial H/\partial x_2 = -\lambda_1$. So $\lambda_1 = c_1$ and $\lambda_2 = c_2 - c_1 t$ — a linear switching function.
Minimum principle: $u^\star = -\mathrm{sgn}(\lambda_2)$. Since $\lambda_2$ is linear in $t$, it switches sign at most once. The optimal policy is bang-bang with at most one switch: apply $u=\pm 1$ to accelerate toward the origin along a parabola, then switch to $u=\mp 1$ to brake. The switching curve in the phase plane is $x_1 = -\tfrac{1}{2}x_2|x_2|$. This is the archetypal time-optimal solution — easy to compute, hard to implement smoothly.
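Simulating the switching-curve feedback shows the bang-bang behavior (explicit Euler with a small step, arbitrary initial state; the chattering near the origin is exactly the implementation pain the text mentions):

```python
import numpy as np

def u_star(x1, x2):
    """Time-optimal feedback for the double integrator, |u| <= 1."""
    sigma = x1 + 0.5 * x2 * abs(x2)   # signed offset from the switching curve
    if abs(sigma) > 1e-9:
        return -np.sign(sigma)
    return -np.sign(x2)               # on the curve: brake toward the origin

x1, x2, dt = 1.0, 0.0, 1e-3
for _ in range(4000):                 # simulate 4 s; time-optimal t_f = 2 s here
    u = u_star(x1, x2)
    x1, x2 = x1 + dt * x2, x2 + dt * u
final_dist = np.hypot(x1, x2)
```

From $(1,0)$ the exact solution applies $u=-1$ for 1 s, switches on the curve at $(0.5,-1)$, and brakes with $u=+1$ for 1 s; the discretized rollout ends in a small chattering band around the origin.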
HJB: $-\partial_t V = \min_\mathbf{u}\big[\tfrac{1}{2}\mathbf{x}^\top Q\mathbf{x} + \tfrac{1}{2}\mathbf{u}^\top R\mathbf{u} + \nabla_\mathbf{x} V^\top(A\mathbf{x}+B\mathbf{u})\big]$. With the ansatz, $\nabla V = P\mathbf{x}$ and $\partial_t V = \tfrac{1}{2}\mathbf{x}^\top \dot P\mathbf{x}$.
Inner minimization over $\mathbf{u}$: set $\partial/\partial\mathbf{u} = R\mathbf{u} + B^\top P\mathbf{x} = 0 \Rightarrow \mathbf{u}^\star = -R^{-1}B^\top P\mathbf{x}$. Plug back: $$-\tfrac{1}{2}\mathbf{x}^\top\dot P\mathbf{x} = \tfrac{1}{2}\mathbf{x}^\top Q\mathbf{x} + \tfrac{1}{2}\mathbf{x}^\top P B R^{-1}B^\top P\mathbf{x} - \mathbf{x}^\top P B R^{-1}B^\top P\mathbf{x} + \mathbf{x}^\top P A\mathbf{x}.$$ Symmetrize the last term ($\mathbf{x}^\top PA\mathbf{x} = \tfrac{1}{2}\mathbf{x}^\top(PA+A^\top P)\mathbf{x}$) and collect: $$-\dot P = A^\top P + PA - PBR^{-1}B^\top P + Q,$$ which must hold for all $\mathbf{x}$. Boundary: $V(\mathbf{x},t_f)=\tfrac{1}{2}\mathbf{x}^\top Q_f\mathbf{x}\Rightarrow P(t_f)=Q_f$. The guess is self-consistent, and the value function for LQR is exactly quadratic.
For the scalar system $\dot x = ax + bu$ with cost weights $q\ge 0$, $r>0$, the ARE reads $2aP - b^2 P^2/r + q = 0$. Solve the quadratic $b^2 P^2/r - 2aP - q = 0$: $$P = \frac{r}{b^2}\!\left(a + \sqrt{a^2 + b^2 q/r}\right)$$ (the positive root — the other branch fails positive-semidefiniteness).
Gain: $K = b P/r = (a+\sqrt{a^2+b^2q/r})/b$. Closed-loop eigenvalue: $a - bK = -\sqrt{a^2+b^2 q/r}$, always negative — asymptotic stability by construction. Increasing $q/r$ (penalizing state more) pushes the closed-loop pole further left, for a faster but more aggressive response.
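A numeric spot-check of the scalar formulas against scipy (the values of $a,b,q,r$ are arbitrary, with $a>0$ so the open-loop plant is unstable):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

a, b, q, r = 1.0, 2.0, 3.0, 0.5
P_formula = (r / b**2) * (a + np.sqrt(a**2 + b**2 * q / r))
P_scipy = solve_continuous_are(np.array([[a]]), np.array([[b]]),
                               np.array([[q]]), np.array([[r]]))[0, 0]
K = b * P_formula / r
closed_loop = a - b * K           # should equal -sqrt(a^2 + b^2 q/r)
```

Here $a^2 + b^2q/r = 25$, so $P = 0.75$, $K = 3$, and the closed-loop pole sits at $-5$.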
Pendulum with viscous damping, $\ddot q = -c\dot q - (g/\ell)\sin q$, $c>0$. Total mechanical energy (normalized by $m\ell^2$) $V(q,\dot q) = \tfrac{1}{2}\dot q^2 + (g/\ell)(1-\cos q)$ is positive-definite on $|q|<\pi$. Its time derivative along trajectories: $$\dot V = \dot q\ddot q + (g/\ell)\sin q\,\dot q = \dot q\big(-c\dot q - (g/\ell)\sin q\big) + (g/\ell)\sin q\,\dot q = -c\dot q^2 \le 0.$$ So the origin is stable. $\dot V = 0$ only on the set $S = \{\dot q=0\}$; the only invariant subset of $S$ under the dynamics is $\{q=0,\dot q=0\}$ (any $q\ne 0$ has $\ddot q = -(g/\ell)\sin q\ne 0$). By LaSalle's invariance principle, the origin is locally asymptotically stable — basin at least $\{V<2g/\ell\}$, which excludes the inverted equilibrium.
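Simulating the damped pendulum (hypothetical damping value, semi-implicit Euler) shows $V$ decreasing monotonically along the trajectory and the state settling at the origin:

```python
import numpy as np

g, l, c = 9.81, 1.0, 0.4    # gravity, length, damping coefficient (hypothetical)

def V(q, qd):
    """Candidate Lyapunov function: normalized mechanical energy."""
    return 0.5*qd**2 + (g/l)*(1.0 - np.cos(q))

q, qd, dt = 2.0, 0.0, 1e-3  # release well inside the basin {V < 2g/l}
vals = []
for _ in range(20000):      # 20 s of semi-implicit Euler
    vals.append(V(q, qd))
    qd += dt * (-c*qd - (g/l)*np.sin(q))
    q  += dt * qd
monotone = all(b <= a + 1e-9 for a, b in zip(vals, vals[1:]))
final = np.hypot(q, qd)
```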
At $\theta=0$, $\sin\theta\approx\theta$, $\cos\theta\approx 1$, $\dot\theta^2$ negligible. The manipulator equation reduces to
$$(M+m)\ddot x + m\ell\ddot\theta = u,\qquad m\ell\ddot x + m\ell^2\ddot\theta - mg\ell\theta = 0.$$
Solve for $(\ddot x,\ddot\theta)$:
$$\ddot x = \frac{u - m g\theta}{M},\qquad \ddot\theta = \frac{(M+m)g\,\theta - u}{M\ell}.$$
With state $\mathbf{x}=(x,\theta,\dot x,\dot\theta)^\top$ and input $u$:
$$A = \begin{pmatrix}0&0&1&0\\0&0&0&1\\0&-mg/M&0&0\\0&(M+m)g/(M\ell)&0&0\end{pmatrix},\quad B = \begin{pmatrix}0\\0\\1/M\\-1/(M\ell)\end{pmatrix}.$$
Check $(A,B)$ is controllable (e.g. Kalman rank condition). Choose diagonal $Q\succ 0$ penalizing $x,\theta$ strongly and $\dot x,\dot\theta$ weakly, and scalar $R>0$. Solve the continuous-time ARE with scipy.linalg.solve_continuous_are(A,B,Q,R), form $K=R^{-1}B^\top P$, apply $u=-K\mathbf{x}$. The resulting feedback locally stabilizes the full nonlinear cartpole — a classic demonstration.
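The recipe above, as a sketch with hypothetical cart-pole parameters and weights:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

Mc, m, l, g = 1.0, 0.1, 0.5, 9.81   # hypothetical cart/pole parameters

A = np.array([[0, 0, 1, 0],
              [0, 0, 0, 1],
              [0, -m*g/Mc, 0, 0],
              [0, (Mc+m)*g/(Mc*l), 0, 0]], dtype=float)
B = np.array([[0.0], [0.0], [1/Mc], [-1/(Mc*l)]])

# Kalman rank condition: [B, AB, A^2 B, A^3 B] must have full rank.
ctrb = np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(4)])
rank = np.linalg.matrix_rank(ctrb)

Q = np.diag([10.0, 10.0, 1.0, 1.0])  # penalize positions more than rates
R = np.array([[1.0]])
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)      # u = -K x
eigs = np.linalg.eigvals(A - B @ K)
stable = bool(np.all(eigs.real < 0))
```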
Let $V_{k+1}(\delta\mathbf{x}) = \tfrac{1}{2}\delta\mathbf{x}^\top V_{xx}^{k+1}\delta\mathbf{x} + v_x^{k+1\top}\delta\mathbf{x}$ be the quadratic cost-to-go. Define the state-action "Q-function" $$Q_k(\delta\mathbf{x},\delta\mathbf{u}) = \ell_k(\delta\mathbf{x},\delta\mathbf{u}) + V_{k+1}(A_k\delta\mathbf{x}+B_k\delta\mathbf{u}).$$ Its Hessians and gradients: $$Q_{xx}^{k} = \ell_{xx} + A_k^\top V_{xx}^{k+1}A_k,\ Q_{uu}^{k} = \ell_{uu} + B_k^\top V_{xx}^{k+1}B_k,\ Q_{ux}^{k} = \ell_{ux} + B_k^\top V_{xx}^{k+1}A_k,$$ $$Q_x^{k} = \ell_x + A_k^\top v_x^{k+1},\ Q_u^{k} = \ell_u + B_k^\top v_x^{k+1}.$$ Minimize over $\delta\mathbf{u}$: $\delta\mathbf{u}^\star = -{Q_{uu}^k}^{-1}(Q_u^k + Q_{ux}^k\delta\mathbf{x}) = k_k + K_k\delta\mathbf{x}$ with $$k_k = -{Q_{uu}^k}^{-1}Q_u^k,\quad K_k = -{Q_{uu}^k}^{-1}Q_{ux}^k.$$ Substitute back to get $V_k$: $V_{xx}^k = Q_{xx}^k + K_k^\top Q_{uu}^k K_k + Q_{ux}^{k\top}K_k + K_k^\top Q_{ux}^k$; $v_x^k = Q_x^k + K_k^\top Q_u^k$. Start from $V_{xx}^N = \phi_{xx},v_x^N = \phi_x$ and iterate backward. Forward pass rolls out $\mathbf{u}_k = \bar{\mathbf{u}}_k + \alpha k_k + K_k(\mathbf{x}_k-\bar{\mathbf{x}}_k)$ with line-search on $\alpha$; regularize $Q_{uu}$ if it becomes non-PSD.
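On a linear-quadratic problem (where $\ell_x=\ell_u=0$ about the origin and the dynamics expansions are exact), one iLQR backward pass reproduces the finite-horizon LQR gains exactly, which makes a compact correctness check; the system and weights below are arbitrary:

```python
import numpy as np

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]]); B = np.array([[0.5*dt**2], [dt]])
Q, R, Qf, N = np.eye(2), np.array([[0.1]]), np.eye(2), 50

# iLQR backward pass with l_x = l_u = 0 (nominal at the origin of an LQ problem)
Vxx, vx = Qf.copy(), np.zeros(2)
Ks, ks = [], []
for _ in range(N):
    Qxx = Q + A.T @ Vxx @ A
    Quu = R + B.T @ Vxx @ B
    Qux = B.T @ Vxx @ A
    Qx, Qu = A.T @ vx, B.T @ vx
    K = -np.linalg.solve(Quu, Qux)       # feedback gain
    k = -np.linalg.solve(Quu, Qu)        # feedforward (zero here)
    Vxx = Qxx + K.T @ Quu @ K + Qux.T @ K + K.T @ Qux
    vx = Qx + K.T @ Qu
    Ks.append(K); ks.append(k)

# Cross-check the first-step gain against the LQR Riccati recursion
P = Qf.copy()
for _ in range(N):
    S = R + B.T @ P @ B
    KL = np.linalg.solve(S, B.T @ P @ A)
    P = Q + A.T @ P @ A - A.T @ P @ B @ KL
gain_err = np.max(np.abs(Ks[-1] + KL))   # iLQR uses u = Kx, LQR uses u = -K_LQR x
```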