Optimal control

Optimal control theory is a branch of control theory that deals with finding a control for a dynamical system over a period of time such that an objective function is optimized.^[1] It has numerous applications in science, engineering and operations research. For example, the dynamical system might be a spacecraft with controls corresponding to rocket thrusters, and the objective might be to reach the Moon with minimum fuel expenditure.^[2] Or the dynamical system could be a nation's economy, with the objective to minimize unemployment; the controls in this case could be fiscal and monetary policy.^[3] A dynamical system may also be introduced to embed operations research problems within the framework of optimal control theory.^[4]^[5]

Optimal control is an extension of the calculus of variations and can be seen as a control strategy in control theory.^[1]

General method

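Optimal control deals with the problem of finding a control law for a given system such that a certain optimality criterion is achieved. The optimal control can be derived using Pontryagin's maximum principle (a necessary condition) or by solving the Hamilton–Jacobi–Bellman equation (a sufficient condition).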

We begin with a simple example. Consider a car traveling in a straight line on a hilly road. The question is, how should the driver press the accelerator pedal in order to minimize the total traveling time? In this example, the term control law refers specifically to the way in which the driver presses the accelerator and shifts the gears. The system consists of both the car and the road, and the optimality criterion is the minimization of the total traveling time. Control problems usually include ancillary constraints. For example, the amount of available fuel might be limited, the accelerator pedal cannot be pushed through the floor of the car, and speed limits may apply.

A proper cost function will be a mathematical expression giving the traveling time as a function of the speed, geometrical considerations, and initial conditions of the system. Constraints are often interchangeable with the cost function.

Another related optimal control problem may be to find the way to drive the car so as to minimize its fuel consumption, given that it must complete a given course in a time not exceeding some amount. Yet another related control problem may be to minimize the total monetary cost of completing the trip, given assumed monetary prices for time and fuel.

A more abstract framework goes as follows.^[1] Minimize the continuous-time cost functional

J[{\textbf {x}}(\cdot ),{\textbf {u}}(\cdot ),t_{0},t_{f}]:=E\,[{\textbf {x}}(t_{0}),t_{0},{\textbf {x}}(t_{f}),t_{f}]+\int _{t_{0}}^{t_{f}}F\,[{\textbf {x}}(t),{\textbf {u}}(t),t]\,\mathrm {d} t

subject to the first-order dynamic constraints (the state equation)

{\dot {\textbf {x}}}(t)={\textbf {f}}\,[\,{\textbf {x}}(t),{\textbf {u}}(t),t],

the algebraic path constraints

{\textbf {h}}\,[{\textbf {x}}(t),{\textbf {u}}(t),t]\leq {\textbf {0}},

and the endpoint conditions

{\textbf {e}}[{\textbf {x}}(t_{0}),t_{0},{\textbf {x}}(t_{f}),t_{f}]=0

where ${\textbf {x}}(t)$ is the state, ${\textbf {u}}(t)$ is the control, $t$ is the independent variable (generally speaking, time), $t_{0}$ is the initial time, and $t_{f}$ is the terminal time. The terms $E$ and $F$ are called the endpoint cost and the running cost, respectively. In the calculus of variations, $E$ and $F$ are referred to as the Mayer term and the Lagrangian, respectively. Furthermore, it is noted that the path constraints are in general inequality constraints and thus may not be active (i.e., equal to zero) at the optimal solution. It is also noted that the optimal control problem as stated above may have multiple solutions (i.e., the solution may not be unique). Thus, it is most often the case that any solution $[{\textbf {x}}^{*}(t),{\textbf {u}}^{*}(t),t_{0}^{*},t_{f}^{*}]$ to the optimal control problem is locally minimizing.
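To connect the abstract formulation with the car example, here is one way the minimum-time problem could be written in this notation. It is a sketch under simplifying assumptions that are not in the text above: a flat road, a unit-mass car whose acceleration is the control $u$, a normalized acceleration bound of 1, a start and finish at rest, and a hypothetical travel distance $d$. The state is ${\textbf {x}}=(x_{1},x_{2})$, with $x_{1}$ the position and $x_{2}$ the speed.

```latex
\begin{aligned}
&\text{minimize}   && J = t_{f} \qquad (E = t_{f},\; F = 0) \\
&\text{subject to} && \dot{x}_{1}(t) = x_{2}(t), \qquad \dot{x}_{2}(t) = u(t) \\
&                  && \mathbf{h} = \begin{bmatrix} u(t) - 1 \\ -u(t) - 1 \end{bmatrix} \leq \mathbf{0} \\
&                  && \mathbf{e} = \begin{bmatrix} t_{0} \\ x_{1}(t_{0}) \\ x_{2}(t_{0}) \\ x_{1}(t_{f}) - d \\ x_{2}(t_{f}) \end{bmatrix} = \mathbf{0}
\end{aligned}
```

In this instance the cost is carried entirely by the endpoint term, and the terminal time $t_{f}$ is itself one of the unknowns. Under these assumptions one of the two path constraints is active at almost every instant of the optimal trajectory, which accelerates fully and then brakes fully.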

Linear quadratic control

A special case of the general nonlinear optimal control problem given in the previous section is the linear quadratic (LQ) optimal control problem. The LQ problem is stated as follows. Minimize the quadratic continuous-time cost functional

J={\tfrac {1}{2}}\mathbf {x} ^{\mathsf {T}}(t_{f})\mathbf {S} _{f}\mathbf {x} (t_{f})+{\tfrac {1}{2}}\int _{t_{0}}^{t_{f}}[\,\mathbf {x} ^{\mathsf {T}}(t)\mathbf {Q} (t)\mathbf {x} (t)+\mathbf {u} ^{\mathsf {T}}(t)\mathbf {R} (t)\mathbf {u} (t)]\,\mathrm {d} t

subject to the linear first-order dynamic constraints

{\dot {\mathbf {x} }}(t)=\mathbf {A} (t)\mathbf {x} (t)+\mathbf {B} (t)\mathbf {u} (t),

and the initial condition

\mathbf {x} (t_{0})=\mathbf {x} _{0}

A particular form of the LQ problem that arises in many control system problems is that of the linear quadratic regulator (LQR) where all of the matrices (i.e., $\mathbf {A}$ , $\mathbf {B}$ , $\mathbf {Q}$ , and $\mathbf {R}$ ) are constant, the initial time is arbitrarily set to zero, and the terminal time is taken in the limit $t_{f}\rightarrow \infty$ (this last assumption is what is known as infinite horizon). The LQR problem is stated as follows. Minimize the infinite horizon quadratic continuous-time cost functional

J={\tfrac {1}{2}}\int _{0}^{\infty }[\mathbf {x} ^{\mathsf {T}}(t)\mathbf {Q} \mathbf {x} (t)+\mathbf {u} ^{\mathsf {T}}(t)\mathbf {R} \mathbf {u} (t)]\,\mathrm {d} t

subject to the linear time-invariant first-order dynamic constraints

{\dot {\mathbf {x} }}(t)=\mathbf {A} \mathbf {x} (t)+\mathbf {B} \mathbf {u} (t),

and the initial condition

\mathbf {x} (t_{0})=\mathbf {x} _{0}

In the finite-horizon case the matrices are restricted in that $\mathbf {Q}$ and $\mathbf {R}$ are positive semi-definite and positive definite, respectively. In the infinite-horizon case, however, the matrices $\mathbf {Q}$ and $\mathbf {R}$ are not only positive semi-definite and positive definite, respectively, but are also constant. These additional restrictions on $\mathbf {Q}$ and $\mathbf {R}$ in the infinite-horizon case are enforced to ensure that the cost functional remains positive. Furthermore, in order to ensure that the cost functional is bounded, the additional restriction is imposed that the pair $(\mathbf {A} ,\mathbf {B} )$ is controllable. Note that the LQ or LQR cost functional can be thought of physically as attempting to minimize the control energy (measured as a quadratic form).

The infinite horizon problem (i.e., LQR) may seem overly restrictive and essentially useless because it assumes that the operator is driving the system to zero-state and hence driving the output of the system to zero. This is indeed correct. However, the problem of driving the output to a desired nonzero level can be solved once the zero-output problem has been solved. In fact, it can be proved that this secondary LQR problem can be solved in a very straightforward manner. It has been shown in classical optimal control theory that the LQ (or LQR) optimal control has the feedback form

\mathbf {u} (t)=-\mathbf {K} (t)\mathbf {x} (t)

where $\mathbf {K} (t)$ is a properly dimensioned matrix, given as

\mathbf {K} (t)=\mathbf {R} ^{-1}\mathbf {B} ^{\mathsf {T}}\mathbf {S} (t),

and $\mathbf {S} (t)$ is the solution of the differential Riccati equation. The differential Riccati equation is given as

{\dot {\mathbf {S} }}(t)=-\mathbf {S} (t)\mathbf {A} -\mathbf {A} ^{\mathsf {T}}\mathbf {S} (t)+\mathbf {S} (t)\mathbf {B} \mathbf {R} ^{-1}\mathbf {B} ^{\mathsf {T}}\mathbf {S} (t)-\mathbf {Q}

For the finite horizon LQ problem, the Riccati equation is integrated backward in time using the terminal boundary condition

\mathbf {S} (t_{f})=\mathbf {S} _{f}

For the infinite horizon LQR problem, the differential Riccati equation is replaced with the algebraic Riccati equation (ARE) given as

\mathbf {0} =-\mathbf {S} \mathbf {A} -\mathbf {A} ^{\mathsf {T}}\mathbf {S} +\mathbf {S} \mathbf {B} \mathbf {R} ^{-1}\mathbf {B} ^{\mathsf {T}}\mathbf {S} -\mathbf {Q}

Because the ARE arises from the infinite-horizon problem, the matrices $\mathbf {A}$ , $\mathbf {B}$ , $\mathbf {Q}$ , and $\mathbf {R}$ are all constant. There are in general multiple solutions to the algebraic Riccati equation, and the positive definite (or positive semi-definite) solution is the one that is used to compute the feedback gain. The LQ (LQR) problem was elegantly solved by Rudolf E. Kálmán.^[9]
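As a small numerical illustration of how the differential and algebraic Riccati equations relate, the sketch below treats a scalar system, so that $\mathbf {A}$ , $\mathbf {B}$ , $\mathbf {Q}$ , $\mathbf {R}$ and $\mathbf {S}$ reduce to the numbers `a`, `b`, `q`, `r` and `s`. The parameter values, the function names and the choice of a fixed-step Runge–Kutta integrator are illustrative assumptions, not part of the theory above. The code integrates the differential Riccati equation backward from $S(t_{f})=0$ over a long horizon and compares the result with the positive root of the algebraic equation.

```python
import math


def riccati_rhs(s, a, b, q, r):
    """Scalar differential Riccati equation: ds/dt = -2*a*s + (b^2/r)*s^2 - q."""
    return -2.0 * a * s + (b * b / r) * s * s - q


def integrate_riccati_backward(a, b, q, r, s_f, t_f, steps=10000):
    """Integrate s(t) from t = t_f back to t = 0 with classical RK4; return s(0)."""
    h = -t_f / steps  # negative step size: we march backward in time
    s = s_f
    for _ in range(steps):
        k1 = riccati_rhs(s, a, b, q, r)
        k2 = riccati_rhs(s + 0.5 * h * k1, a, b, q, r)
        k3 = riccati_rhs(s + 0.5 * h * k2, a, b, q, r)
        k4 = riccati_rhs(s + h * k3, a, b, q, r)
        s += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return s


def are_positive_root(a, b, q, r):
    """Positive root of the scalar ARE 0 = -2*a*s + (b^2/r)*s^2 - q."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


# Hypothetical example values: an unstable plant x' = x + u with unit weights.
a, b, q, r = 1.0, 1.0, 1.0, 1.0
s_inf = are_positive_root(a, b, q, r)
s_0 = integrate_riccati_backward(a, b, q, r, s_f=0.0, t_f=10.0)
k_gain = b * s_inf / r              # scalar version of K = R^{-1} B^T S
closed_loop_pole = a - b * k_gain   # dynamics under the feedback u = -K x
```

For these values the backward-integrated `s_0` settles onto `s_inf`, and the closed-loop pole is negative even though the open-loop plant is unstable. The scalar ARE also has a negative root, which is discarded, in line with the remark about multiple solutions.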

Numerical methods for optimal control

Optimal control problems are generally nonlinear and therefore, unlike the linear-quadratic optimal control problem, generally do not have analytic solutions. As a result, it is necessary to employ numerical methods to solve optimal control problems. In the early years of optimal control (c. 1950s to 1980s) the favored approach for solving optimal control problems was that of indirect methods. In an indirect method, the calculus of variations is employed to obtain the first-order optimality conditions. These conditions result in a two-point (or, in the case of a complex problem, a multi-point) boundary-value problem. This boundary-value problem actually has a special structure because it arises from taking the derivative of a Hamiltonian. Thus, the resulting dynamical system is a Hamiltonian system of the form^[1]

{\begin{aligned}{\dot {\textbf {x}}}&={\frac {\partial H}{\partial {\boldsymbol {\lambda }}}}\\[1.2ex]{\dot {\boldsymbol {\lambda }}}&=-{\frac {\partial H}{\partial {\textbf {x}}}}\end{aligned}}

where

H=F+{\boldsymbol {\lambda }}^{\mathsf {T}}{\textbf {f}}-{\boldsymbol {\mu }}^{\mathsf {T}}{\textbf {h}}

is the augmented Hamiltonian. In an indirect method, the boundary-value problem is solved (using the appropriate boundary or transversality conditions). The beauty of using an indirect method is that the state and adjoint (i.e., ${\boldsymbol {\lambda }}$ ) are solved for and the resulting solution is readily verified to be an extremal trajectory. The disadvantage of indirect methods is that the boundary-value problem is often extremely difficult to solve (particularly for problems that span large time intervals or problems with interior point constraints). A well-known software program that implements indirect methods is BNDSCO.^[10]
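A minimal sketch of an indirect method, on a toy problem chosen here for illustration (it is not taken from the text): minimize $\tfrac{1}{2}\int_{0}^{1}(x^{2}+u^{2})\,\mathrm{d}t$ subject to $\dot{x}=u$, $x(0)=1$, with a free final state and no path constraints. The Hamiltonian is $H=\tfrac{1}{2}(x^{2}+u^{2})+\lambda u$, stationarity gives $u=-\lambda$, and the transversality condition is $\lambda(1)=0$. The code below solves the resulting two-point boundary-value problem by simple shooting: it guesses $\lambda(0)$, integrates forward, and adjusts the guess by bisection. The function names, the bracket `[0, 2]` and the integrator are assumptions of this sketch.

```python
import math


def rhs(x, lam):
    """Hamiltonian system for H = 0.5*(x^2 + u^2) + lam*u with u = -lam."""
    dx = -lam    # x'   =  dH/dlam = u = -lam
    dlam = -x    # lam' = -dH/dx   = -x
    return dx, dlam


def terminal_costate(lam0, x0=1.0, t_f=1.0, steps=1000):
    """Integrate state and costate forward with RK4; return lam(t_f)."""
    h = t_f / steps
    x, lam = x0, lam0
    for _ in range(steps):
        k1x, k1l = rhs(x, lam)
        k2x, k2l = rhs(x + 0.5 * h * k1x, lam + 0.5 * h * k1l)
        k3x, k3l = rhs(x + 0.5 * h * k2x, lam + 0.5 * h * k2l)
        k4x, k4l = rhs(x + h * k3x, lam + h * k3l)
        x += h * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        lam += h * (k1l + 2.0 * k2l + 2.0 * k3l + k4l) / 6.0
    return lam


def shoot(lo=0.0, hi=2.0, tol=1e-10):
    """Bisect on the unknown lam(0) until lam(t_f) = 0 (transversality)."""
    f_lo = terminal_costate(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = terminal_costate(mid)
        if (f_mid < 0.0) == (f_lo < 0.0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return 0.5 * (lo + hi)


lam0 = shoot()
u0 = -lam0  # optimal control at t = 0, from dH/du = u + lam = 0
```

For this linear problem the boundary-value problem can also be solved by hand, giving $\lambda(0)=\tanh(1)$, which the shooting result reproduces. On harder problems the terminal residual can be extremely sensitive to the initial costate guess, which is the difficulty described above.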

The approach that has risen to prominence in numerical optimal control since the 1980s is that of so-called direct methods. In a direct method, the state or the control, or both, are approximated using an appropriate function approximation (e.g., polynomial approximation or piecewise constant parameterization). Simultaneously, the cost functional is approximated as a cost function. Then, the coefficients of the function approximations are treated as optimization variables and the problem is "transcribed" to a nonlinear optimization problem of the form:

Minimize

F(\mathbf {z} )

subject to the algebraic constraints

{\begin{aligned}\mathbf {g} (\mathbf {z} )&=\mathbf {0} \\\mathbf {h} (\mathbf {z} )&\leq \mathbf {0} \end{aligned}}

Depending upon the type of direct method employed, the size of the nonlinear optimization problem can be quite small (e.g., as in a direct shooting or quasilinearization method), moderate (e.g., pseudospectral optimal control), or quite large (e.g., a direct collocation method).
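The transcription idea can be illustrated on a deliberately small problem. The sketch below is a toy under stated assumptions, not a production solver: it takes the hypothetical problem of minimizing $\tfrac{1}{2}\int_{0}^{1}(x^{2}+u^{2})\,\mathrm{d}t$ subject to $\dot{x}=u$, $x(0)=1$, parameterizes the control as piecewise constant on `n` intervals, eliminates the state by forward-Euler simulation (a single-shooting style transcription), and minimizes the resulting function $F(\mathbf{z})$ with plain gradient descent. Because the state is eliminated and there are no path constraints, the constraint sets $\mathbf{g}$ and $\mathbf{h}$ are empty in this instance. The step size `1/h` is tuned to this particular problem.

```python
import math


def simulate(u, x0, h):
    """Forward-Euler rollout of x' = u; returns the states x_0 .. x_N."""
    x = [x0]
    for uk in u:
        x.append(x[-1] + h * uk)
    return x


def cost(u, x0, h):
    """Rectangle-rule approximation of 0.5 * integral of (x^2 + u^2)."""
    x = simulate(u, x0, h)
    return 0.5 * h * sum(xk * xk + uk * uk for xk, uk in zip(x, u))


def gradient(u, x0, h):
    """Exact gradient of the discretized cost via a backward (adjoint) sweep."""
    x = simulate(u, x0, h)
    grad = [0.0] * len(u)
    p = 0.0  # discrete costate: h * (sum of x_k for k > j)
    for j in range(len(u) - 1, -1, -1):
        grad[j] = h * (u[j] + p)
        p += h * x[j]
    return grad


def solve_direct(x0=1.0, t_f=1.0, n=100, iters=200):
    """Minimize the transcribed cost over the n control values z = (u_0..u_{n-1})."""
    h = t_f / n
    u = [0.0] * n
    step = 1.0 / h  # assumption: step size chosen for this well-conditioned problem
    for _ in range(iters):
        g = gradient(u, x0, h)
        u = [uk - step * gk for uk, gk in zip(u, g)]
    return u, cost(u, x0, h)


u_opt, j_opt = solve_direct()
reference = 0.5 * math.tanh(1.0)  # continuous-time optimum from the Riccati solution
```

With `n = 100` the discretized optimum `j_opt` lies close to the continuous-time value `reference`, and the gap shrinks as `n` grows. Practical direct methods hand the transcribed problem to a dedicated nonlinear programming solver rather than to gradient descent.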

Discrete-time optimal control

The examples thus far have shown continuous-time systems and control solutions. In practice, optimal control solutions are now often implemented digitally, so contemporary control theory is primarily concerned with discrete-time systems and solutions. The Theory of Consistent Approximations^[27]^[28] provides conditions under which solutions to a series of increasingly accurate discretized optimal control problems converge to the solution of the original, continuous-time problem. Not all discretization methods have this property, even seemingly obvious ones.^[29] For instance, using a variable step-size routine to integrate the problem's dynamic equations may generate a gradient which does not converge to zero (or point in the right direction) as the solution is approached. The direct method RIOTS is based on the Theory of Consistent Approximation.
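The notion of a sequence of discretized problems converging to the continuous-time one can be made concrete on a benign example. The sketch below does not implement the Theory of Consistent Approximations; it only shows convergence for one well-behaved case chosen here as an assumption: minimize $\tfrac{1}{2}\int_{0}^{1}(x^{2}+u^{2})\,\mathrm{d}t$ subject to $\dot{x}=u$, whose continuous-time Riccati solution is $S(t)=\tanh(1-t)$. Discretizing with forward Euler and a left-endpoint rule on `n` steps of size `h` gives a discrete-time LQ problem with the backward recursion $P_{k}=h+P_{k+1}/(1+hP_{k+1})$, $P_{n}=0$.

```python
import math


def discrete_riccati_p0(n, t_f=1.0):
    """P_0 from the backward Riccati recursion of the Euler-discretized problem."""
    h = t_f / n
    p = 0.0  # terminal condition P_n = 0 (no endpoint cost)
    for _ in range(n):
        p = h + p / (1.0 + h * p)
    return p


exact = math.tanh(1.0)  # continuous-time Riccati solution S(0)
errors = {n: abs(discrete_riccati_p0(n) - exact) for n in (10, 100, 1000)}
```

The entries of `errors` shrink roughly in proportion to the step size, so the optimal cost of the discretized problem approaches that of the continuous-time problem as `n` increases. The point made above is that this behaviour is not automatic for every discretization scheme.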

Examples

A common solution strategy in many optimal control problems is to solve for the costate (sometimes called the shadow price) $\lambda (t)$ . The costate summarizes in one number the marginal value of expanding or contracting the state variable next turn. The marginal value reflects not only the gains accruing next turn but also those accruing over the remaining duration of the program. It is nice when $\lambda (t)$ can be solved analytically, but usually, the most one can do is describe it sufficiently well that the intuition can grasp the character of the solution and an equation solver can solve numerically for the values.

Having obtained $\lambda (t)$ , the turn-t optimal value for the control can usually be solved as a differential equation conditional on knowledge of $\lambda (t)$ . Again it is infrequent, especially in continuous-time problems, that one obtains the value of the control or the state explicitly. Usually, the strategy is to solve for thresholds and regions that characterize the optimal control and use a numerical solver to isolate the actual choice values in time.

Finite time

Consider the problem of a mine owner who must decide at what rate to extract ore from their mine. They own rights to the ore from date $0$ to date $T$ . At date $0$ there is $x_{0}$ ore in the ground, and the time-dependent amount of ore $x(t)$ left in the ground declines at the rate of $u(t)$ that the mine owner extracts it. The mine owner extracts ore at cost $u(t)^{2}/x(t)$ (the cost of extraction increasing with the square of the extraction speed and the inverse of the amount of ore left) and sells ore at a constant price $p$ . Any ore left in the ground at time $T$ cannot be sold and has no value (there is no "scrap value"). The owner chooses the rate of extraction varying with time $u(t)$ to maximize profits over the period of ownership with no time discounting.

Discrete-time version

The manager maximizes profit $\Pi$ :
$\Pi =\sum _{t=0}^{T-1}\left[pu_{t}-{\frac {u_{t}^{2}}{x_{t}}}\right]$
subject to the law of motion for the state variable $x_{t}$ :
$x_{t+1}-x_{t}=-u_{t}$
Form the Hamiltonian and differentiate:
${\begin{aligned}H&=pu_{t}-{\frac {u_{t}^{2}}{x_{t}}}-\lambda _{t+1}u_{t}\\{\frac {\partial H}{\partial u_{t}}}&=p-\lambda _{t+1}-2{\frac {u_{t}}{x_{t}}}=0\\\lambda _{t+1}-\lambda _{t}&=-{\frac {\partial H}{\partial x_{t}}}=-\left({\frac {u_{t}}{x_{t}}}\right)^{2}\end{aligned}}$
As the mine owner does not value the ore remaining at time $T$ ,
$\lambda _{T}=0$
Using the above equations, it is easy to solve for the $x_{t}$ and $\lambda _{t}$ series
${\begin{aligned}\lambda _{t}&=\lambda _{t+1}+{\frac {\left(p-\lambda _{t+1}\right)^{2}}{4}}\\x_{t+1}&=x_{t}{\frac {2-p+\lambda _{t+1}}{2}}\end{aligned}}$
and using the initial and turn-T conditions, the $x_{t}$ series can be solved explicitly, giving $u_{t}$ .
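The two recursions can be evaluated directly: the costate runs backward from $\lambda _{T}=0$, after which the state runs forward from $x_{0}$. The following sketch does exactly that; the function names and the example values $p=1$, $x_{0}=1$, $T=3$ are illustrative assumptions.

```python
def solve_mine(p, x0, T):
    """Return (lam, x, u) for the discrete-time mine problem over turns 0..T."""
    # Backward pass: costate recursion with terminal condition lam_T = 0.
    lam = [0.0] * (T + 1)
    for t in range(T - 1, -1, -1):
        lam[t] = lam[t + 1] + (p - lam[t + 1]) ** 2 / 4.0
    # Forward pass: extraction and remaining ore, starting from the stock x0.
    x = [x0]
    u = []
    for t in range(T):
        u.append(x[t] * (p - lam[t + 1]) / 2.0)          # from dH/du = 0
        x.append(x[t] * (2.0 - p + lam[t + 1]) / 2.0)    # x_{t+1} = x_t - u_t
    return lam, x, u


def profit(p, x, u):
    """Total profit: sum over turns of p*u_t - u_t^2 / x_t."""
    return sum(p * ut - ut * ut / xt for xt, ut in zip(x, u))


# Hypothetical example values.
lam, x, u = solve_mine(p=1.0, x0=1.0, T=3)
total = profit(1.0, x, u)
```

In this example the total profit equals $\lambda _{0}x_{0}$, which matches the reading of the costate as a shadow price: $\lambda _{0}$ is the value of one more unit of ore in the ground at date $0$.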
Continuous-time version

The manager maximizes profit $\Pi$ :
$\Pi =\int _{0}^{T}\left[pu(t)-{\frac {u(t)^{2}}{x(t)}}\right]dt$
where the state variable $x(t)$ evolves as follows:
${\dot {x}}(t)=-u(t)$
Form the Hamiltonian and differentiate:
${\begin{aligned}H&=pu(t)-{\frac {u(t)^{2}}{x(t)}}-\lambda (t)u(t)\\{\frac {\partial H}{\partial u}}&=p-\lambda (t)-2{\frac {u(t)}{x(t)}}=0\\{\dot {\lambda }}(t)&=-{\frac {\partial H}{\partial x}}=-\left({\frac {u(t)}{x(t)}}\right)^{2}\end{aligned}}$
As the mine owner does not value the ore remaining at time $T$ ,
$\lambda (T)=0$
Using the above equations, it is easy to solve for the differential equations governing $u(t)$ and $\lambda (t)$
${\begin{aligned}{\dot {\lambda }}(t)&=-{\frac {(p-\lambda (t))^{2}}{4}}\\u(t)&=x(t){\frac {p-\lambda (t)}{2}}\end{aligned}}$
and using the initial and turn-T conditions, the functions can be solved to yield
$x(t)={\frac {\left(4-pt+pT\right)^{2}}{\left(4+pT\right)^{2}}}x_{0}$
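The step from the two differential equations to the closed-form path is short enough to spell out. The following is a sketch of one way to carry it out; the auxiliary variable $w(t)$ is introduced here for convenience and is not part of the original derivation.

```latex
\begin{aligned}
w(t) &:= p - \lambda(t), \qquad \dot{w}(t) = -\dot{\lambda}(t) = \frac{w(t)^{2}}{4}, \qquad w(T) = p \\
\frac{1}{w(t)} &= \frac{1}{p} + \frac{T - t}{4}
  \;\;\Longrightarrow\;\; w(t) = \frac{4p}{4 + pT - pt} \\
\frac{\dot{x}(t)}{x(t)} &= -\frac{u(t)}{x(t)} = -\frac{w(t)}{2} = -\frac{2p}{4 + pT - pt} \\
\ln\frac{x(t)}{x_{0}} &= 2\ln\frac{4 + pT - pt}{4 + pT}
  \;\;\Longrightarrow\;\; x(t) = \frac{\left(4 - pt + pT\right)^{2}}{\left(4 + pT\right)^{2}}\,x_{0}
\end{aligned}
```

The second line uses $\tfrac{\mathrm{d}}{\mathrm{d}t}(1/w)=-\dot{w}/w^{2}=-\tfrac{1}{4}$ together with the terminal condition $\lambda (T)=0$, and the last line integrates from $0$ to $t$ using the initial condition $x(0)=x_{0}$.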

References

[1] OCLC 625106088.

[2] ISBN 0-471-02594-1.

[3] OCLC 869522905.

[4] arXiv:2005.03186 [math.OC].

[5] ISSN 2405-8963.

[6] doi:10.1016/S0377-0427(00)00418-0.

[7] doi:10.1109/37.506395.

[8] ISBN 978-0-9843571-0-9.

[9] Kalman, Rudolf. A new approach to linear filtering and prediction problems. Transactions of the ASME, Journal of Basic Engineering, 82:34–45, 1960.

[10] Oberle, H. J. and Grimm, W., "BNDSCO-A Program for the Numerical Solution of Optimal Control Problems," Institute for Flight Systems Dynamics, DLR, Oberpfaffenhofen, 1989.

[11] doi:10.1016/j.arcontrol.2012.09.002.

[12] ISBN 978-0-89871-688-7.

[13] Gill, P. E., Murray, W. M., and Saunders, M. A., User's Manual for SNOPT Version 7: Software for Large-Scale Nonlinear Programming, University of California, San Diego Report, 24 April 2007.

[14] von Stryk, O., User's Guide for DIRCOL (version 2.1): A Direct Collocation Method for the Numerical Solution of Optimal Control Problems, Fachgebiet Simulation und Systemoptimierung (SIM), Technische Universität Darmstadt (2000, Version of November 1999).

[15] Betts, J.T. and Huffman, W. P., Sparse Optimal Control Software, SOCS, Boeing Information and Support Services, Seattle, Washington, July 1997.

[16] doi:10.2514/3.20223.

[17] Gath, P.F., Well, K.H., "Trajectory Optimization Using a Combination of Direct Multiple Shooting and Collocation", AIAA 2001–4047, AIAA Guidance, Navigation, and Control Conference, Montréal, Québec, Canada, 6–9 August 2001.

[18] Vasile M., Bernelli-Zazzera F., Fornasari N., Masarati P., "Design of Interplanetary and Lunar Missions Combining Low-Thrust and Gravity Assists", Final Report of the ESA/ESOC Study Contract No. 14126/00/D/CS, September 2002.

[19] Izzo, Dario. "PyGMO and PyKEP: open source tools for massively parallel optimization in astrodynamics (the case of interplanetary trajectory optimization)." Proceed. Fifth International Conf. Astrodynam. Tools and Techniques, ICATT. 2012.

[20] OCLC 35140322.

[21] Ross, I. M., Enhancements to the DIDO Optimal Control Toolbox, arXiv 2020. https://arxiv.org/abs/2004.13112

[22] Williams, P., User's Guide to DIRECT, Version 2.00, Melbourne, Australia, 2008.

[23] FALCON.m, described in Rieck, M., Bittner, M., Grüter, B., Diepolder, J., and Piprek, P., FALCON.m - User Guide, Institute of Flight System Dynamics, Technical University of Munich, October 2019.

[24] GPOPS, described in Rao, A. V., Benson, D. A., Huntington, G. T., Francolin, C., Darby, C. L., and Patterson, M. A., User's Manual for GPOPS: A MATLAB Package for Dynamic Optimization Using the Gauss Pseudospectral Method, University of Florida Report, August 2008.

[25] Rutquist, P. and Edvall, M. M, PROPT – MATLAB Optimal Control Software, 1260 S.E. Bishop Blvd Ste E, Pullman, WA 99163, USA: Tomlab Optimization, Inc.

[26] I.M. Ross, Computational Optimal Control, 3rd Workshop in Computational Issues in Nonlinear Control, October 8th, 2019, Monterey, CA.

[27] E. Polak, On the use of consistent approximations in the solution of semi-infinite optimization and optimal control problems, Math. Prog. 62, pp. 385–415 (1993).

[28] S2CID 7625851.

[29] S2CID 756939.

Further reading

ISBN 1-886529-11-6.

ISBN 0-470-11481-9.

ISBN 0-387-90155-8.

ISBN 0-444-01609-0.

ISBN 0-13-638098-0.

External links

Victor M. Becerra, ed. (2008). "Optimal control". Scholarpedia. Retrieved 31 December 2022.

Computational Optimal Control

Dr. Benoît CHACHUAT: Automatic Control Laboratory – Nonlinear Programming, Calculus of Variations and Optimal Control.

DIDO - MATLAB tool for optimal control

GEKKO - Python package for optimal control

GESOP – Graphical Environment for Simulation and OPtimization

GPOPS-II – General-Purpose MATLAB Optimal Control Software

CasADi – Free and open source symbolic framework for optimal control

PROPT – MATLAB Optimal Control Software

OpenOCL – Open Optimal Control Library

Elmer G. Wiens: Optimal Control – Applications of Optimal Control Theory Using the Pontryagin Maximum Principle with interactive models.

On Optimal Control by Yu-Chi Ho

Pseudospectral optimal control: Part 1

Pseudospectral optimal control: Part 2

Lecture Recordings and Script by Prof. Moritz Diehl, University of Freiburg on Numerical Optimal Control
