Optimal Control and Adaptive Learning for Stabilization of a Quadrotor-type Unmanned Aerial Vehicle via Approximate Dynamic Programming


  • Joelson Miller Bezerra de Souza, Universidade Estadual do Maranhão
  • Patrícia Helena Moraes Rêgo, Universidade Estadual do Maranhão
  • Guilherme Bonfim Sousa, Universidade Estadual do Maranhão
  • Janes Valdo Rodrigues Lima, Universidade Estadual do Maranhão




Optimal Control, Quadrotor, Policy Iteration, Adaptive Critic Design, Approximate Dynamic Programming


This paper proposes an optimal controller for stabilization of a quadrotor system using an adaptive critic structure based on policy iteration schemes. The approach belongs to the framework of Approximate Dynamic Programming and solves optimal decision problems on-line, without requiring complete knowledge of the dynamics model of the system to be controlled. The key feature of the adaptive critic design method that enables on-line implementation is that it solves the Bellman optimality equation in a forward-in-time fashion, whereas traditional dynamic programming requires a backward-in-time procedure. This feedback control design technique tunes the controller parameters on-line, in the presence of variations in plant dynamics and external disturbances, using data measured along the system trajectories. Computational simulation results based on a quadrotor model demonstrate the effectiveness of the proposed control scheme.
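The policy-iteration principle underlying the adaptive critic can be illustrated on a discrete-time LQR problem: evaluate the cost of the current policy, then improve the policy greedily against that cost, repeating until the gain converges to the DLQR optimum. The sketch below is model-based (it uses A and B explicitly) purely for exposition, whereas the adaptive critic estimates the same quantities from measured trajectory data; the double-integrator model, weights, and initial gain are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
from scipy.linalg import solve_discrete_are, solve_discrete_lyapunov

# Discrete-time double integrator (illustrative stand-in for one
# linearized quadrotor axis; A, B, Q, R and K below are assumptions).
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.eye(2)               # state weighting
R = np.array([[1.0]])       # control weighting
K = np.array([[1.0, 1.0]])  # initial stabilizing gain for u = -K x

for _ in range(50):
    # Policy evaluation: the cost matrix P of the policy u = -K x solves
    # the Lyapunov equation P = (A - BK)' P (A - BK) + Q + K'RK.
    Acl = A - B @ K
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
    # Policy improvement: greedy gain with respect to the evaluated cost.
    K_new = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    if np.linalg.norm(K_new - K) < 1e-10:
        K = K_new
        break
    K = K_new

# At convergence, P matches the solution of the discrete-time
# algebraic Riccati equation, i.e. the DLQR optimum.
P_dare = solve_discrete_are(A, B, Q, R)
print(np.allclose(P, P_dare, atol=1e-8))
```

Each iteration only requires evaluating the current policy, which is what makes the forward-in-time, data-driven variant possible: the adaptive critic replaces the model-based Lyapunov solve with a value-function approximation updated from on-line measurements.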








How to Cite

de Souza, J. M. B., Rêgo, P. H. M., Sousa, G. B., & Lima, J. V. R. (2022). Optimal Control and Adaptive Learning for Stabilization of a Quadrotor-type Unmanned Aerial Vehicle via Approximate Dynamic Programming. Revista De Informática Teórica E Aplicada, 29(3), 21–35. https://doi.org/10.22456/2175-2745.121388



Regular Papers