Detection and Pose Adjustment in Physical Exercises using Computer Vision Techniques: Approaches, Challenges and Opportunities
DOI: https://doi.org/10.22456/2175-2745.135436
Keywords: Body Detection, Pose Estimation, Convolutional Neural Network, Algorithms, Survey
Abstract
In the fast-paced rhythm of modern life, regular physical exercise is a source of vitality and well-being. However, it is not always feasible to train in a gym or to pay a personal trainer to ensure correct form in calisthenic exercises. This article investigates technological and scientific advances in computer vision for detecting exercises and improving their execution, with the goal of creating a system that users can operate autonomously. To this end, it introduces the fundamental concepts of the subject and then analyzes different technological approaches, identifying their strengths and weaknesses. Finally, it discusses the challenges and limitations of these technologies that remain open for future research.
License
Copyright (c) 2024 Paulo Alexandre Neves, Prof., João Palhares, João Gonçalves

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
I authorize the editors to publish my article, if accepted, in electronic form in accordance with the rules of the Public Knowledge Project.