A Multimodal Decision-Fusion Network Approach for Activity Recognition in Firefighter Self-Contained Breathing Apparatus Endurance Training
IEEE Sensors Letters, September 2025
Xiaoqing Chai, Junhang Yu, Boon Giin Lee, Matthew Pike, Lionel Nkenyereye, Wan-Young Chung. 2025. A Multimodal Decision-Fusion Network Approach for Activity Recognition in Firefighter Self-Contained Breathing Apparatus Endurance Training. In IEEE Sensors Letters. DOI: https://doi.org/10.1109/LSENS.2025.3614427
@article{ieee-sensors-2025,
title={A Multimodal Decision-Fusion Network Approach for Activity Recognition in Firefighter Self-Contained Breathing Apparatus Endurance Training},
author={Xiaoqing Chai and Junhang Yu and Boon Giin Lee and Matthew Pike and Lionel Nkenyereye and Wan-Young Chung},
journal={IEEE Sensors Letters},
year={2025},
doi={10.1109/LSENS.2025.3614427}
}
human activity recognition, multimodal fusion, decision fusion, firefighting training, wearable sensors, video classification
Abstract
Insufficient training in firefighting techniques increases the risk of injuries and fatalities among firefighters. Human activity recognition methods show promising potential for performance monitoring and evaluation. However, existing studies focus mainly on individual modalities, a limited scope that makes it difficult to distinguish intricate tasks such as those encountered in firefighting operations. This study introduces a multimodal decision-fusion network designed to overcome this limitation by integrating vision data from three distinct cameras with sensor data collected from four wearable devices. The proposed network combines a vision-focused Video Swin network with a sensor-driven Sensor Transformer network. The results show that vision-only methods are insufficient to accurately classify firefighting training activities, whereas the proposed decision-fusion network improves classification to a mean F1-score of 95.73%, outperforming an existing hybrid machine learning network.
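To make the decision-fusion idea concrete, the sketch below shows one common form of decision-level (late) fusion: each branch network produces a per-class probability vector, and the two vectors are combined by a weighted average before taking the argmax. This is a minimal illustration only; the paper's actual fusion network, branch architectures, and weighting scheme are not specified here, and the function names and weights are illustrative assumptions.

```python
import numpy as np

def decision_fusion(vision_probs, sensor_probs, w_vision=0.5, w_sensor=0.5):
    """Weighted decision-level fusion of two branches' class probabilities.

    vision_probs, sensor_probs: length-n_classes arrays, each summing to 1
    (e.g. softmax outputs of a vision branch and a sensor branch).
    Returns the fused probability vector and the predicted class index.
    Weights are illustrative, not the paper's learned fusion parameters.
    """
    fused = w_vision * np.asarray(vision_probs, dtype=float) \
          + w_sensor * np.asarray(sensor_probs, dtype=float)
    fused /= fused.sum()  # renormalize in case the weights do not sum to 1
    return fused, int(np.argmax(fused))

# Example: the vision branch is torn between classes 0 and 1,
# while the sensor branch is confident in class 1, so the fused
# decision resolves to class 1.
vision = [0.48, 0.46, 0.06]
sensor = [0.10, 0.80, 0.10]
fused, pred = decision_fusion(vision, sensor)
```

In this toy example the vision branch alone would pick class 0 by a narrow margin, but the confident sensor branch tips the fused decision to class 1, which mirrors the abstract's finding that vision-only classification is insufficient on its own.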