arxiv

Paper 2503.05629

Exploring FMCW Radars and Feature Maps for Activity Recognition: A Benchmark Study

Authors: Ali Samimi Fard, Mohammadreza Mashhadigholamali, Samaneh Zolfaghari, Hajar Abedi, Mainak Chakraborty, Luigi Borzì, Masoud Daneshtalab, George Shaker

Published: 2025-03-07

Abstract:

Human Activity Recognition has gained significant attention due to its diverse applications, including ambient assisted living and remote sensing. Wearable sensor-based solutions often suffer from user discomfort and reliability issues, while video-based methods raise privacy concerns and perform poorly in low-light conditions or long ranges. This study introduces a Frequency-Modulated Continuous Wave radar-based framework for human activity recognition, leveraging a 60 GHz radar and multi-dimensional feature maps. Unlike conventional approaches that process feature maps as images, this study feeds multi-dimensional feature maps -- Range-Doppler, Range-Azimuth, and Range-Elevation -- as data vectors directly into the machine learning (SVM, MLP) and deep learning (CNN, LSTM, ConvLSTM) models, preserving the spatial and temporal structures of the data. These features were extracted from a novel dataset with seven activity classes and validated using two different validation approaches. The ConvLSTM model outperformed conventional machine learning and deep learning models, achieving an accuracy of 90.51% and an F1-score of 87.31% on cross-scene validation and an accuracy of 89.56% and an F1-score of 87.15% on leave-one-person-out cross-validation. The results highlight the approach's potential for scalable, non-intrusive, and privacy-preserving activity monitoring in real-world scenarios.

Paper Content:
arXiv:2503.05629v1 [cs.ET] 7 Mar 2025

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Exploring FMCW Radars and Feature Maps for Activity Recognition: A Benchmark Study

Ali Samimi Fard, Graduate Student Member, IEEE, Mohammadreza Mashhadigholamali, Graduate Student Member, IEEE, Samaneh Zolfaghari*, Hajar Abedi, Member, IEEE, Mainak Chakraborty, Senior Member, IEEE, Luigi Borzì, Member, IEEE, Masoud Daneshtalab, Senior Member, IEEE, George Shaker, Senior Member, IEEE

Abstract: Human Activity Recognition has gained significant attention due to its diverse applications, including ambient assisted living and remote sensing. Wearable sensor-based solutions often suffer from user discomfort and reliability issues, while video-based methods raise privacy concerns and perform poorly in low-light conditions or long ranges. This study introduces a Frequency-Modulated Continuous Wave radar-based framework for human activity recognition, leveraging a 60 GHz radar and multi-dimensional feature maps. Unlike conventional approaches that process feature maps as images, this study feeds multi-dimensional feature maps (Range-Doppler, Range-Azimuth, and Range-Elevation) as data vectors directly into the machine learning (SVM, MLP) and deep learning (CNN, LSTM, ConvLSTM) models, preserving the spatial and temporal structures of the data. These features were extracted from a novel dataset with seven activity classes and validated using two different validation approaches. The ConvLSTM model outperformed conventional machine learning and deep learning models, achieving an accuracy of 90.51% and an F1-score of 87.31% on cross-scene validation, and an accuracy of 89.56% and an F1-score of 87.15% on leave-one-person-out cross-validation.
The results highlight the approach's potential for scalable, non-intrusive, and privacy-preserving activity monitoring in real-world scenarios.

Index Terms: Human Activity Recognition, Frequency-Modulated Continuous Wave Radar, Deep Learning, Ambient Assisted Living, Remote Sensing.

Ali Samimi Fard, Mohammadreza Mashhadigholamali and Luigi Borzì are with the Department of Control and Computer Engineering, Polytechnic University of Turin, Turin, Italy (Email: {ali.samimifard, mohammadreza.mashhadigholami}@studenti.polito.it, luigi.borzi@polito.it).
Samaneh Zolfaghari (Corresponding author), Mainak Chakraborty and Masoud Daneshtalab are with School of Innovation, Design and Engineering, Division of Intelligent Future Technologies, Mälardalen University, Västerås, Sweden (Email: {samaneh.zolfaghari, mainak.chakraborty, masoud.daneshtalab}@mdu.se).
Hajar Abedi and George Shaker are with Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario, Canada (Email: {habedifi, gshaker}@uwaterloo.ca).

I. INTRODUCTION

Human Activity Recognition (HAR) has become essential for applications in smart homes, healthcare monitoring, gait analysis, and fall detection [1]–[3]. The rising elderly population has heightened the need for reliable activity monitoring to improve quality of life and provide timely emergency responses [2]. Recent advances in Machine Learning (ML), Deep Learning (DL), and low-cost sensors have made HAR technologies more accessible, enabling continuous, non-intrusive monitoring and quick interventions during emergencies [4], [5]. When it comes to sensors, they can be divided into two main categories: wearable and non-wearable [6]. Despite their widespread adoption, wearable sensors come with several challenges. They can cause discomfort for users, rely heavily on battery life, and often become less reliable during activities like bathing or sleeping [7].
Their accuracy also depends on where they are placed on the body [8]. These issues highlight the need for alternative solutions. In this context, non-wearable sensors offer a different approach by gathering data from the surrounding environment without needing users to carry them [3], [9]–[11]. However, they come with their own set of challenges, such as privacy concerns with video cameras [12], and limitations from environmental factors (e.g., sunlight, obstructions) for infrared sensors [3], [6], [13].

In recent years, radar-based sensing, particularly Frequency-Modulated Continuous Wave (FMCW) radar, has become a promising solution for non-intrusive monitoring without wearable devices or cameras [3], [14]. Known for high accuracy and robustness, even in low-light and obstructed environments [12], [14], [15], FMCW radars transmit chirp signals and use Doppler shifts to capture detailed motion information. This includes larger movements like walking and sitting, as well as precise actions like picking up objects or falling [3], [16].

Notably, while HAR has made great strides with diverse sensor technologies, there is still a need for accurate, non-intrusive, and privacy-preserving solutions. This work contributes a new benchmark dataset for HAR, collected in realistic environments with a variety of activities, supporting robust model development and evaluation. Additionally, a novel data processing approach is presented that treats multi-dimensional radar feature maps as data vectors, preserving the spatiotemporal characteristics. Together, these contributions aim to enhance the performance and robustness of the proposed approach. This study introduces an FMCW radar-based framework for HAR, with the following key contributions:

• We collected a dataset in a realistic setting using a low-resolution 60 GHz mmWave FMCW radar, covering intricate and less-studied activities for real-life representation.
It will be publicly available upon publication to support further HAR research with mmWave FMCW radar.
• We have structured Range-Doppler (RD), Range-Azimuth (RA), and Range-Elevation (RE) feature maps to comprehensively capture motion patterns and spatial relationships, treating them as 3D data vectors instead of 2D images.
• We have shown that training an appropriate DL model with data from just three subjects is sufficient to achieve reliable and generalizable results across a range of human activities, provided that there are fixed objects in the environment.
• We have validated the performance of the proposed approach using two distinct cross-validation methods on various ML and DL models.

The paper is structured as follows: Section II reviews related work. Section III details the methodology, radar setup, and data processing. Section IV presents experiments, results, and performance comparisons. Section V highlights key contributions and applications. Finally, Section VI concludes and suggests future research.

II. RELATED WORK

Recent advancements in HAR systems have leveraged sophisticated DL models and innovative feature extraction techniques to enhance accuracy and efficiency [17].

For instance, Abdu et al. [12] used micro-Doppler spectrogram images from the University of Glasgow dataset [18] and employed a transfer learning approach to compare three different pre-trained models and their combinations. They also applied a channel attention module to improve performance in classifying six activities. Their experiments achieved 96.86% accuracy with the VGG-19 network and 99.71% accuracy using a combination of AlexNet and VGG-19, enhanced by the Canonical Correlation Analysis (CCA) feature fusion method. Kim et al. [19] combined Range-Time-Doppler (RTD) maps with a Range-Distributed Convolutional Neural Network (RD-CNN), achieving 96.49% accuracy in familiar environments using the University of Glasgow dataset [18]. Hsu et al.
[20] developed a fall detection system using range-Doppler images and a CNN model to detect sudden speed changes, followed by a bidirectional long short-term memory network (Bi-LSTM) to confirm falls, achieving nearly 96% accuracy. Ding et al. [21] used dynamic range-Doppler frames (DRDF) and ST-ConvLSTM networks to capture and integrate spatial-temporal features, achieving 96.5% accuracy in classifying six motions performed by 16 subjects. Bhavanasi et al. [22] evaluated various ML algorithms on Micro-Doppler and Range-Doppler maps from radar data in hospital environments. CNN models outperformed others, achieving high accuracy in classifying ten activities performed by 29 subjects.

Despite significant progress, most radar-based HAR systems use image-based feature maps [7], [12], [19], [20]. These have limitations such as poor interpretability due to high similarity between activity images and susceptibility to noise and artifacts [23]. Furthermore, training DL models on image-based feature maps often requires large labeled datasets, which can be challenging to obtain [19]. To address this, we propose an alternative approach that directly uses multi-dimensional radar feature maps, namely Range-Doppler (RD), Range-Azimuth (RA), and Range-Elevation (RE), as 3D data vectors. Validation across various conventional and modern models demonstrated robust performance with data from just three participants, minimizing the need for large datasets. Additionally, real-world data collection poses challenges, as many studies use controlled lab environments that do not reflect real-life conditions [21]. To advance HAR with new radar systems and benchmark datasets, this study introduces a novel dataset from a low-resolution 60 GHz mmWave FMCW radar and a framework for its pre- and post-processing in HAR applications.

III. METHODOLOGY

This section provides an overview of the functional prototype of our proposed framework, which is depicted in Fig. 1.
Each component is described in detail in the following subsections.

A. Radar Setup and Data Collection

This study uses the BGT60TR13C, an FMCW radar system from Infineon Technologies AG [24]. It has a single transmitter and three receivers, operating in the 58–63.5 GHz band with a configurable chirp duration. The antennas are in an L-shaped configuration, with RX1 and RX3 for azimuth and RX2 and RX3 for elevation angle measurements [25]. Table I lists the radar configuration parameters used in this study.

TABLE I: Radar configuration and specification.

| Parameter | Value |
| --- | --- |
| Radar Model | BGT60TR13C |
| Start Frequency | 61 GHz |
| End Frequency | 62 GHz |
| Transmit Output Power | 5 dBm |
| ADC Sampling Rate | 2 Msps |
| Frame Rate | 10 |
| Chirps Per Frame | 128 |
| Number of Tx Antennas | 1 |
| Number of Rx Antennas | 3 |
| Range Resolution | 15 cm |
| Max Unambiguous Range | 4.8 m |

To collect data, the radar-based sensor is arranged to ensure optimal bedroom coverage, minimizing side-lobe interference and maximizing performance. It is mounted at a height of 210 cm and angled downward at 30 degrees. Three healthy subjects participated in the study, performing activities over 16 trials. Two subjects completed six trials each, and the third completed four. Each trial began with a 1-minute data collection in an empty room. Subjects then performed: walking for 2 minutes, sitting on a bed for 2 minutes, lying on the bed for 5 minutes, and lying on the floor for 5 minutes. In some sessions, they also sat on a chair for 2 minutes.

Data was collected continuously, capturing both steady-state activities and transitions. Transitions were grouped into a single "Transition" class to address classification challenges and balance the dataset. The data was categorized into seven activity classes: Empty Room, Walking, Sitting on the Bed, Sitting on a Chair, Lying on the Bed, Lying on the Floor, and Transition. The data collection protocol was approved by the University of Waterloo's Research Ethics Committee.
All activities complied with applicable safety and ethical standards. The dataset will be made available on GitHub (https://github.com/) upon acceptance of this manuscript.

Fig. 1: Overview of the proposed framework for FMCW radar-based HAR.

B. Data Preprocessing and Feature Extraction

In FMCW radar systems, chirp sequences are transmitted via the TX antenna, with reflections captured by RX antennas. The received data forms a three-dimensional array of dimensions $C \times N \times M$, where $C$ represents the number of channels, $N$ denotes the number of chirps per frame, and $M$ indicates the number of samples per chirp. The data structure comprises fast time rows (single chirp/range bin data) and slow time columns (same-sample data across chirps) [26].

1) Blackman-Harris Window: The preprocessing pipeline involves DC bias removal to eliminate low-frequency noise and artifacts [27]. Subsequently, a Blackman-Harris window function is applied to mitigate spectral leakage in the frequency domain. This windowing technique gradually attenuates signal amplitudes at the boundaries, reducing abrupt transitions that could introduce spurious frequency components during Fourier transformation. This approach enhances the fidelity of spectral analysis, particularly when employing the Fast Fourier Transform (FFT).

2) Range-FFT Map: Range detection is performed by computing the FFT along the fast-time axis, where spectral peaks correspond to target distances [28]. The resulting Range-FFT map provides a frequency-domain representation of target reflections. The radar transmits consecutive chirps separated by a fixed time interval to estimate target velocity. Each reflected chirp undergoes a Range-FFT to determine the target position. While the peaks in the Range-FFT spectrum align for both chirps, their phases differ due to target movement. This phase shift provides information about the velocity of the target [29], [30].
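As a rough illustration of the preprocessing and Range-FFT steps just described, the following NumPy sketch removes the per-chirp DC bias, applies a four-term Blackman-Harris window, and takes an FFT along fast time. It is a sketch under stated assumptions rather than the authors' implementation: the function names, the choice of 64 samples per chirp, and the synthetic beat tone are hypothetical, since the paper does not state the value of M or publish its code.

```python
import numpy as np


def blackman_harris(n: int) -> np.ndarray:
    """Symmetric four-term Blackman-Harris window of length n."""
    a0, a1, a2, a3 = 0.35875, 0.48829, 0.14128, 0.01168
    k = 2.0 * np.pi * np.arange(n) / (n - 1)
    return a0 - a1 * np.cos(k) + a2 * np.cos(2 * k) - a3 * np.cos(3 * k)


def range_fft(frame: np.ndarray) -> np.ndarray:
    """Range profiles for one radar frame.

    frame: real ADC samples with shape (channels, chirps, samples).
    Returns complex range profiles with shape (channels, chirps, samples // 2).
    """
    n_samples = frame.shape[-1]
    # DC bias removal: subtract each chirp's mean along fast time.
    x = frame - frame.mean(axis=-1, keepdims=True)
    # Taper every chirp so its edges fade out, limiting spectral leakage.
    x = x * blackman_harris(n_samples)
    # FFT along fast time; real input, so keep the positive-frequency half.
    spectrum = np.fft.fft(x, axis=-1)
    return spectrum[..., : n_samples // 2]


# Synthetic frame (assumed sizes): 3 channels, 128 chirps, 64 samples per chirp.
# Each chirp is a DC offset plus a beat tone that falls exactly on range bin 10.
n_samples = 64
t = np.arange(n_samples)
chirp = 0.5 + np.cos(2 * np.pi * 10 * t / n_samples)
frame = np.tile(chirp, (3, 128, 1))

profile = np.abs(range_fft(frame))
peak_bin = int(profile[0, 0].argmax())
print(profile.shape, peak_bin)
```

In this toy case the spectral peak lands on the bin of the injected tone; with real data the peak bin maps to distance through the range resolution listed in Table I.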
3) Moving Target Indicator (MTI): The signal consists of two main types of reflections. The first is clutter, which refers to echoes from stationary objects in the environment. The second type originates from moving objects, particularly individuals engaged in daily activities. A clutter removal algorithm is employed to reduce the impact of clutter. The MTI implements linear filtering to suppress clutter while preserving dynamic target signatures. In FMCW radar systems, the Finite Impulse Response (FIR) implementation offers a good balance of simplicity and effectiveness [31]. At each time step $i$, the maximum absolute value across the slow time dimension for each range bin is denoted as $r_{i,\max}$. The MTI filter output $t_i$ is then calculated as a weighted average of this peak value and the previous filter output $t_{i-1}$, using a weighting factor $\alpha$:

$$t_i = \alpha \cdot r_{i,\max} + (1 - \alpha) \cdot t_{i-1} \tag{1}$$

At the initial time step, the filter output $t_0$ is initialized to zero. For each range bin, the MTI filter removes the influence of stationary objects by subtracting $t_{i-1}$ from $r_{i,\max}$, resulting in the filtered FFT value $r_{i,\mathrm{filt}}$ [32]:

$$r_{i,\mathrm{filt}} = \left| r_{i,\max} - t_{i-1} \right| \tag{2}$$

This method of subtracting an estimate of stationary background clutter effectively eliminates static targets while having minimal effect on slow-moving objects. FIR MTI filters are preferred for their simple design, adjustable parameters, and linear phase response [31], [32].

4) Range-Doppler Map: The processing sequence continues with a second FFT applied along the slow-time (vertical) axis to extract Doppler information for each channel. The output is the Range-Doppler map (RDM).

5) Capon Algorithm: Three-dimensional object localization requires precise determination of range, as well as vertical and horizontal angles. The Capon algorithm, also known as the MVDR beamformer, is applied to the RD map to enable accurate position estimation and support the generation of RA and RE maps.
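The MTI recursion of Eqs. (1) and (2) and a Capon (MVDR) angle spectrum can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' code: the value α = 0.1, the two-element receive pair with half-wavelength spacing, the diagonal loading term, and all function names are hypothetical choices that the paper does not specify. The Capon part uses the standard textbook spectrum $P(\theta) = 1 / (a(\theta)^H R^{-1} a(\theta))$, where $R$ is the sample covariance of the antenna snapshots and $a(\theta)$ is the steering vector.

```python
import numpy as np


def mti_filter(r_max: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    """Apply Eqs. (1)-(2) along time.

    r_max: shape (time_steps, range_bins); per-step peak magnitude of each range bin.
    alpha: weighting factor (hypothetical default; not given in the paper).
    """
    t_prev = np.zeros(r_max.shape[1])  # t_0 = 0
    r_filt = np.empty_like(r_max, dtype=float)
    for i, r in enumerate(r_max):
        r_filt[i] = np.abs(r - t_prev)               # Eq. (2): subtract clutter estimate
        t_prev = alpha * r + (1.0 - alpha) * t_prev  # Eq. (1): update clutter estimate
    return r_filt


def steering(angles_deg, n_ant: int, spacing: float = 0.5) -> np.ndarray:
    """Steering vectors of a uniform linear array; spacing is in wavelengths (assumed)."""
    idx = np.arange(n_ant)[:, None]
    sines = np.sin(np.deg2rad(np.atleast_1d(angles_deg)))[None, :]
    return np.exp(2j * np.pi * spacing * idx * sines)


def capon_spectrum(snapshots: np.ndarray, angles_deg, loading: float = 1e-3) -> np.ndarray:
    """Capon power spectrum P(theta) = 1 / (a^H R^-1 a).

    snapshots: complex array with shape (antennas, n_snapshots).
    """
    n_ant, n_snap = snapshots.shape
    cov = snapshots @ snapshots.conj().T / n_snap
    cov = cov + loading * np.eye(n_ant)  # diagonal loading keeps the inverse stable
    cov_inv = np.linalg.inv(cov)
    a = steering(angles_deg, n_ant)
    denom = np.einsum("it,ij,jt->t", a.conj(), cov_inv, a).real
    return 1.0 / denom


# Static scene: a constant reflection is progressively suppressed by the MTI.
filtered = mti_filter(np.ones((4, 3)), alpha=0.1)
print(filtered[:, 0])

# Single source at 20 degrees seen by two antennas (noise-free, deterministic).
angles = np.arange(-60, 61)
source = np.exp(2j * np.pi * 3 * np.arange(16) / 16)  # unit-modulus snapshots
snaps = steering(20.0, 2)[:, 0][:, None] * source[None, :]
estimate = int(angles[capon_spectrum(snaps, angles).argmax()])
print(estimate)
```

For a constant input the filtered value decays geometrically by a factor of (1 − α) per step, which is the clutter-suppression behaviour the text describes; the Capon spectrum peaks at the simulated arrival angle.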
This beamformer is designed to estimate the Angle of Arrival (AoA) in noisy and interference-prone environments [33], [34]. The algorithm minimizes the output power of an array while maintaining a distortionless response at the desired angle, effectively enhancing the signal-to-interference-plus-noise ratio by suppressing unwanted signals and focusing on the target direction. This process is achieved through constrained optimization. For more details on the Capon beamforming algorithm, refer to our previous work [35].

Fig. 2: Examples of a participant performing the activities with corresponding feature maps. (A1) walking, (A2) sitting on the bed, (A3) sitting on the chair, (A4) lying down on the bed, (A5) lying down on the floor, (A6) empty room, (A7) transition.

The final 3D data structure, called the data cube, is formed by combining the RD, RA, and RE feature maps. Before feeding data to DL models, segmentation is done on sets of five consecutive data cubes to balance granularity and efficiency. If all five data cubes have the same activity label, they are merged into a single segment. If activities differ, such as three matching and two different, the matching cubes are excluded, and segmentation pauses. It resumes with the next group of data cubes for the subsequent activity, ensuring consistency and preventing conflicting labels from merging.

Figure 2 illustrates examples of a participant performing various activities, along with the corresponding feature maps. As shown in the figure, when the subject is moving, there is significantly less clutter and noise, allowing us to clearly observe the moving subject (Fig. 2, A1 to A5). This clarity is achieved through our implementation of the MTI. In contrast, the feature map for the "Empty room" (Fig. 2, A6) shows visible clutter and noise, which are echoes from stationary objects.

C. Activity Recognition Models

In this study, we evaluated the performance of two conventional ML classifiers, Support Vector Machines (SVM) and Multi-Layer Perceptrons (MLP), for recognizing various activities. To address the limitations of ML models, we further implemented LSTM, CNN and ConvLSTM models. The configurations of these models are outlined below, with unspecified parameters set to their default values:

• SVM [36]: The kernel was set to a radial basis function (RBF) with regularization term C = 10 and probability=True.
• MLP [37]: The network consisted of two hidden layers with 128 and 64 neurons, and Rectified Linear Unit (ReLU) activation function. Training was performed using the Adaptive moment estimation (Adam) optimizer with a learning rate of $1 \times 10^{-3}$, a maximum of 300 iterations, and random_state=42.
• CNN [22]: The model consists of four 3D convolutional blocks with 8, 16, 32, and 64 filters respectively, each using a kernel size of (3×3×3) and Exponential Linear Unit (ELU) activation function. Each block is followed by a 1×2×2 max pooling layer. The network includes two fully connected layers: one with 128 neurons and another with 7, 6, 5, or 4 neurons (corresponding to the number of activities being classified). Dropout is applied after each layer at progressively increasing rates from 0.2 to 0.5, excluding the output layer. The network is trained using the Adam optimizer with a cross-entropy loss function, a learning rate of $1 \times 10^{-3}$, and a maximum of 100 epochs with early stopping (patience = 10 epochs).
• LSTM [1]: The architecture comprised two Bi-LSTM layers with 256 units and a dropout of 0.5. The network includes two fully connected layers: one with 128 neurons and another with 7, 6, 5, or 4 neurons (corresponding to the number of activities being classified). The model was trained using the Adam optimizer with a decay factor of 0.9 and an initial learning rate of $1 \times 10^{-3}$.
This learning rate was reduced to 10% of its initial value at the 200th epoch, with training continuing for a maximum of 400 epochs with early stopping (patience = 40 epochs).
• ConvLSTM [38]: The architecture consists of one ConvLSTM block with 32 filters, kernel size of (3×3), and ReLU activation function. This is followed by batch normalization, 3D MaxPooling (1×2×2), and dropout (0.3). The output is connected to a fully connected layer with 64 neurons and ReLU activation, followed by dropout (0.5). The final fully connected layer contains 7, 6, 5, or 4 neurons (corresponding to the number of activities being classified) with softmax activation. The model was trained using Stochastic Gradient Descent (SGD) with momentum 0.9 and weight decay $1 \times 10^{-4}$, optimizing the categorical cross-entropy loss function. Training utilized a learning rate of $1 \times 10^{-4}$, batch size of 64, and a maximum of 100 epochs with early stopping (patience = 10 epochs).

These hyperparameters were determined via a systematic grid search to balance computational efficiency and model performance. The models employed preprocessed RD, RA, and RE data from the FMCW radar as input features. Also, to accelerate training and reduce overfitting, we applied principal component analysis (PCA) with 100 components before feeding the data to the ML models. Our framework processes the spatiotemporal characteristics of FMCW radar outputs through three-dimensional (3D) input tensors.

IV. EXPERIMENTAL EVALUATION

This section presents experimental results obtained from an FMCW radar dataset, including analyses of feature maps, activities, and ML and DL models. We detail the setup, results, and performance analysis.

A. Experimental Setup

To validate our approach, we employed two distinct strategies. In the first strategy, Cross-Scene Validation (CSV), we used 80% of the data from 14 of the 16 distinct scenes for training and reserved the remaining 20% for validation.
The two remaining scenes were held out for testing. In the second strategy, Leave-One-Person-Out Cross-Validation (LOPO-CV), we repeated the procedure three times, each time leaving one subject out, and computed the accuracy and F1-score. Specifically, in each fold, we trained the model using 80% of the data from the two remaining subjects, reserved the other 20% for validation, and used the unseen subject's data for testing. For example, in the first iteration, the data of the first subject was held out for testing, while data from the second and third subjects was split for training and validation; in the subsequent iterations, the roles were rotated accordingly. Tables II and III summarize the data distribution for each validation strategy.

TABLE II: Activity sample counts under CSV approach.

| Label | Training | Validation | Testing | Total | Symbol |
| --- | --- | --- | --- | --- | --- |
| Walking | 6,495 | 1,625 | 1,250 | 9,370 | A1 |
| Sitting on the Bed | 5,130 | 1,285 | 705 | 7,120 | A2 |
| Sitting on the Chair | 1,255 | 310 | 1,205 | 2,770 | A3 |
| Lying Down on the Bed | 12,870 | 3,220 | 1,540 | 17,630 | A4 |
| Lying Down on the Floor | 9,810 | 2,455 | 3,040 | 15,305 | A5 |
| Empty Room | 3,405 | 850 | 640 | 4,895 | A6 |
| Transition | 4,305 | 1,075 | 790 | 6,170 | A7 |
| Total | 43,270 | 10,820 | 9,170 | 63,260 | |

TABLE III: Activity sample counts per subject in LOPO-CV approach. (A1) walking, (A2) sitting on the bed, (A3) sitting on the chair, (A4) lying down on the bed, (A5) lying down on the floor, (A6) empty room, (A7) transition. (S1) subject one, (S2) subject two and (S3) subject three.

| Label | S1 | S2 | S3 |
| --- | --- | --- | --- |
| A1 | 3020 | 3800 | 2583 |
| A2 | 1800 | 2661 | 2684 |
| A3 | 1205 | 1568 | - |
| A4 | 5327 | 6142 | 6181 |
| A5 | 5993 | 6252 | 3082 |
| A6 | 1514 | 2030 | 1382 |
| A7 | 1749 | 2170 | 2298 |

To evaluate the models, we developed a normalization technique. Mean and standard deviation were computed from each feature channel using the combined training and validation data. These parameters normalized both the training-validation subset and the test set, ensuring uniform scaling and improving the learning process.
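The LOPO-CV split and the normalization step can be sketched as below. This is an illustrative sketch, not the authors' pipeline: it assumes samples are stored as (samples, channels, height, width) with the three channels standing in for the RD, RA, and RE maps, that the 80/20 training-validation split is a random shuffle, and that a small epsilon guards the division. The function names and the toy array sizes are hypothetical.

```python
import numpy as np


def lopo_folds(subject_ids, val_fraction=0.2, seed=0):
    """Yield (train_idx, val_idx, test_idx), holding out one subject per fold."""
    rng = np.random.default_rng(seed)
    subject_ids = np.asarray(subject_ids)
    for held_out in np.unique(subject_ids):
        test_idx = np.flatnonzero(subject_ids == held_out)
        # Shuffle the remaining subjects' samples, then split them 80/20.
        rest = rng.permutation(np.flatnonzero(subject_ids != held_out))
        n_val = int(round(val_fraction * rest.size))
        yield rest[n_val:], rest[:n_val], test_idx


def channel_stats(x):
    """Mean and standard deviation per feature channel (axis 1)."""
    axes = (0,) + tuple(range(2, x.ndim))
    return x.mean(axis=axes, keepdims=True), x.std(axis=axes, keepdims=True)


def normalize(x, mean, std, eps=1e-8):
    """Z-score x with externally supplied statistics."""
    return (x - mean) / (std + eps)


# Toy data: 30 samples, 3 channels, 8x8 maps, 10 samples per subject.
rng = np.random.default_rng(42)
x = rng.normal(5.0, 2.0, size=(30, 3, 8, 8))
subjects = np.repeat([1, 2, 3], 10)

folds = []
for train_idx, val_idx, test_idx in lopo_folds(subjects):
    fit_idx = np.concatenate([train_idx, val_idx])
    mean, std = channel_stats(x[fit_idx])       # statistics from train + validation only
    x_fit = normalize(x[fit_idx], mean, std)
    x_test = normalize(x[test_idx], mean, std)  # test set reuses the same statistics
    folds.append((train_idx, val_idx, test_idx, x_fit, x_test))

print([(len(tr), len(va), len(te)) for tr, va, te, _, _ in folds])
```

Computing the statistics without the held-out subject keeps the test data unseen during scaling, which matches the description that the same parameters are applied to both the training-validation subset and the test set.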
All computations were performed on a high-performance system featuring an AMD Ryzen 7 6800H processor, 32 GB of system memory, and an NVIDIA GeForce RTX 3060 GPU running on Windows 11. For model development, we used Python 3.8.19, Scikit-learn 1.3.2, and TensorFlow 2.10.

For a comprehensive model evaluation, we employed different metrics for each validation approach. In the CSV strategy, we assessed each model's performance through accuracy, precision, recall, and F1-score. For the LOPO-CV approach, we calculated accuracy and F1-score for each subject and each number of activities, and then averaged the results across all three subjects.

B. Experimental Results

1) CSV: Table IV compares the models across multiple activity sets, using the combination of RD, RA, and RE (RD+RA+RE) feature maps as input.

Table IV illustrates a consistent performance pattern across all activity sets: DL models (CNN, LSTM, and ConvLSTM) significantly outperform traditional ML approaches (SVM and MLP). In particular, ConvLSTM achieves the highest accuracy and F1-score in every activity set, with peak accuracies of 90.51% for 7-activity classification and 97.87% for 4-activity classification. This trend indicates that ConvLSTM's ability to capture both spatial and temporal features provides a substantial advantage for HAR tasks using radar data.

The classification results of the ConvLSTM model with different feature inputs over various numbers of activities are summarized in Tables V–VIII. Additionally, Fig. 3 shows the confusion matrices for different tasks, while Fig. 4 illustrates the training and validation loss curves of the ConvLSTM model across different activity sets using the RD+RA+RE feature maps.

As shown in Tables V–VIII, classification performance improves as the number of classes decreases.
Specifically, the F1-score increases from 87.31% for seven classes to 98.05% for four classes. The RE map consistently achieves the best results across all tasks among single-feature inputs. For pairwise combinations, except for four classes, RD+RE maps outperform other combinations. Integrating all three features yields the highest accuracy for seven, five, and four classes; for six classes, RD+RE is marginally higher (95.58% versus 95.29%). This highlights the significance of combining motion (Doppler) and spatial dimensions (azimuth and elevation) to comprehensively represent activities, thereby enhancing classification accuracy.

The training process, shown in Fig. 4, demonstrates a steady decrease in both training and validation loss, reaching suitably low levels. This reflects effective model learning, the absence of overfitting, and strong generalization capability.

Fig. 3: Confusion matrices for the ConvLSTM model on various activity sets under CSV (RD+RA+RE inputs). Panels: (a) 7 activities, (b) 6 activities, (c) 5 activities, (d) 4 activities.

Fig. 4: Training and validation loss curves for the ConvLSTM model under CSV (RD+RA+RE inputs). Panels: (a) 7 activities, (b) 6 activities, (c) 5 activities, (d) 4 activities.

TABLE IV: Performance metrics (%) for various models under CSV with RD+RA+RE inputs.

| # Activities | Model | Accuracy | Precision | Recall | F1-score |
| --- | --- | --- | --- | --- | --- |
| 7 | SVM | 70.97 | 72.20 | 74.17 | 70.41 |
| 7 | MLP | 68.71 | 66.35 | 68.76 | 65.85 |
| 7 | CNN | 89.48 | 86.66 | 88.81 | 87.17 |
| 7 | LSTM | 88.50 | 85.09 | 87.35 | 85.64 |
| 7 | ConvLSTM | 90.51 | 86.75 | 88.51 | 87.31 |
| 6 | SVM | 76.34 | 80.11 | 82.02 | 77.75 |
| 6 | MLP | 75.40 | 76.30 | 78.77 | 75.58 |
| 6 | CNN | 92.60 | 91.59 | 93.67 | 91.83 |
| 6 | LSTM | 94.03 | 92.87 | 95.19 | 93.55 |
| 6 | ConvLSTM | 95.29 | 94.08 | 96.02 | 94.73 |
| 5 | SVM | 76.63 | 83.76 | 81.19 | 79.86 |
| 5 | MLP | 78.25 | 82.23 | 83.22 | 80.99 |
| 5 | CNN | 92.57 | 94.09 | 94.40 | 93.79 |
| 5 | LSTM | 93.99 | 95.08 | 95.08 | 94.79 |
| 5 | ConvLSTM | 96.06 | 96.86 | 96.15 | 96.39 |
| 4 | SVM | 88.80 | 92.16 | 88.53 | 89.08 |
| 4 | MLP | 90.28 | 92.26 | 90.33 | 90.43 |
| 4 | CNN | 96.28 | 97.12 | 96.17 | 96.49 |
| 4 | LSTM | 96.91 | 97.61 | 96.79 | 97.11 |
| 4 | ConvLSTM | 97.87 | 98.39 | 97.80 | 98.05 |

TABLE V: Comparison of ConvLSTM model performance (%) for 7 activity classification under the CSV approach, by model input.

| Metric | RD | RA | RE | RD+RA | RD+RE | RA+RE | RD+RA+RE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Accuracy | 76.94 | 71.76 | 89.53 | 86.26 | 90.40 | 87.51 | 90.51 |
| Precision | 80.12 | 74.32 | 86.02 | 84.96 | 87.05 | 84.47 | 86.75 |
| Recall | 79.98 | 77.18 | 87.33 | 87.16 | 88.95 | 86.42 | 88.51 |
| F1-score | 77.66 | 72.95 | 86.12 | 85.51 | 87.70 | 84.59 | 87.31 |

TABLE VI: Comparison of ConvLSTM model performance (%) for 6 activity classification under the CSV approach, by model input.

| Metric | RD | RA | RE | RD+RA | RD+RE | RA+RE | RD+RA+RE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Accuracy | 82.70 | 78.22 | 94.09 | 92.60 | 95.58 | 91.23 | 95.29 |
| Precision | 85.53 | 83.60 | 92.79 | 92.69 | 94.43 | 90.66 | 94.08 |
| Recall | 87.80 | 86.96 | 95.14 | 94.74 | 96.43 | 93.67 | 96.02 |
| F1-score | 85.00 | 82.41 | 93.50 | 93.28 | 95.20 | 91.35 | 94.73 |

TABLE VII: Comparison of ConvLSTM model performance (%) for 5 activity classification under the CSV approach, by model input.

| Metric | RD | RA | RE | RD+RA | RD+RE | RA+RE | RD+RA+RE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Accuracy | 78.42 | 79.59 | 95.93 | 88.37 | 95.48 | 93.28 | 96.06 |
| Precision | 83.79 | 86.31 | 96.53 | 91.26 | 95.98 | 94.30 | 96.86 |
| Recall | 82.43 | 87.10 | 96.11 | 92.28 | 95.73 | 95.11 | 96.15 |
| F1-score | 80.79 | 84.26 | 96.24 | 90.71 | 95.67 | 94.43 | 96.39 |

2) LOPO-CV: We report LOPO-CV results for ConvLSTM, which outperformed other models in the CSV approach. RD+RA+RE feature maps were used, as they generally yielded the best performance as well.
Table IX shows ConvLSTM's performance in classifying activities using RD+RA+RE feature maps. Average accuracy (F1-score) rises from 89.56% (87.15%) for 7 activities to 94.94% (94.15%) for 5 activities, then drops slightly to 93.00% (91.12%) for 4 activities. These results confirm the model's subject-independent robustness, generalizability, and the effectiveness of RD+RA+RE maps. Compared to the CSV approach, this method evaluates the model with a larger number of test samples.

TABLE VIII: Comparison of ConvLSTM model performance (%) for 4 activity classification under the CSV approach, by model input.

| Metric | RD | RA | RE | RD+RA | RD+RE | RA+RE | RD+RA+RE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Accuracy | 86.49 | 95.43 | 96.81 | 96.49 | 97.45 | 97.66 | 97.87 |
| Precision | 87.28 | 95.94 | 97.43 | 97.33 | 97.92 | 98.22 | 98.39 |
| Recall | 86.89 | 95.53 | 96.67 | 96.30 | 97.37 | 97.49 | 97.80 |
| F1-score | 85.76 | 95.56 | 96.96 | 96.68 | 97.58 | 97.80 | 98.05 |

TABLE IX: Comparison of ConvLSTM model performance presented as percentages using the LOPO-CV approach.

| Test Subject | 7 Activity Accuracy | 7 Activity F1-score | 6 Activity Accuracy | 6 Activity F1-score | 5 Activity Accuracy | 5 Activity F1-score | 4 Activity Accuracy | 4 Activity F1-score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Subject 1 | 89.08 | 85.49 | 92.19 | 90.79 | 92.35 | 91.15 | 92.55 | 91.03 |
| Subject 2 | 93.14 | 91.23 | 96.90 | 96.25 | 97.96 | 97.89 | 97.95 | 97.49 |
| Subject 3 | 86.45 | 84.72 | 94.35 | 93.88 | 94.51 | 93.41 | 88.52 | 84.84 |
| Average | 89.56 | 87.15 | 94.48 | 93.64 | 94.94 | 94.15 | 93.00 | 91.12 |

V. DISCUSSION

The results of this study demonstrate the effectiveness of using FMCW radar for HAR through our proposed framework. In this framework, we utilized RD, RA, and RE feature maps across ML and DL models. The performance of the system is validated through two distinct strategies: CSV and LOPO-CV.

The ConvLSTM model outperformed other models, achieving an accuracy of 90.51% and an F1-score of 87.31% on CSV, and an accuracy of 89.56% and an F1-score of 87.15% on LOPO-CV. ConvLSTM combines CNN and LSTM strengths by integrating convolution into LSTM gates, capturing spatiotemporal dependencies from RD, RA, and RE data.
The model excels in distinguishing similar activities where both spatial and motion cues are crucial, such as “sitting on a bed” versus “sitting on a chair.” The steady decrease in training and validation loss, as illustrated in Fig. 4, further confirms the model's generalization ability and resistance to overfitting. These findings highlight the potential of the proposed approach for scalable, non-intrusive, and privacy-preserving activity monitoring in real-world scenarios.

One of the main contributions of this study is the introduction of a novel approach that directly feeds multi-dimensional radar feature maps into the model. Unlike conventional approaches that process feature maps as images, our method preserves the spatial and temporal structures of the data by treating RD, RA, and RE feature maps as data vectors. This approach successfully develops a robust HAR model using data from only three subjects, making it particularly suitable for real-world applications where large-scale data collection is often infeasible.

Our findings advance the current state of knowledge in the field by demonstrating that the proposed solution can achieve strong performance even when trained on a limited dataset. This is especially valuable for applications in healthcare monitoring and smart environments, where non-intrusive activity monitoring is essential. The practical implications of our work include the potential for real-time activity recognition and monitoring in various domains, such as elderly care, rehabilitation, and smart home automation.

VI. CONCLUSION

In this paper, we proposed an FMCW radar-based framework for HAR in home-like environments. The study demonstrated the feasibility and advantages of integrating multi-dimensional feature maps as data vectors to effectively capture the temporal and spatial characteristics of human activities.
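One way to picture feature maps fed "as data vectors" is to keep each map as raw numeric values and stack the three maps of a frame along a channel axis, giving a sequence of shape (time, channels, rows, cols). The sketch below assumes that layout and assumes the three maps share one shape. Neither assumption is stated in this section, and all names and values are hypothetical placeholders.

```python
from typing import List, Sequence, Tuple

Map2D = List[List[float]]
Frame = List[Map2D]  # channels x rows x cols


def stack_frame(rd_map: Map2D, ra_map: Map2D, re_map: Map2D) -> Frame:
    """Stack the three maps of one radar frame as channels, keeping raw float values."""
    shape = (len(rd_map), len(rd_map[0]))
    for name, feature_map in (("RA", ra_map), ("RE", re_map)):
        if (len(feature_map), len(feature_map[0])) != shape:
            raise ValueError(f"{name} map shape differs from RD map shape {shape}")
    return [rd_map, ra_map, re_map]


def build_sequence(frames: Sequence[Tuple[Map2D, Map2D, Map2D]]) -> List[Frame]:
    """Turn per-frame (RD, RA, RE) triples into a time-ordered sequence."""
    return [stack_frame(rd, ra, re_) for rd, ra, re_ in frames]


def shape_of(sequence: List[Frame]) -> Tuple[int, int, int, int]:
    """Return (time, channels, rows, cols)."""
    first = sequence[0]
    return len(sequence), len(first), len(first[0]), len(first[0][0])


def constant_map(value: float, rows: int = 4, cols: int = 5) -> Map2D:
    """Hypothetical placeholder standing in for a real feature map."""
    return [[value] * cols for _ in range(rows)]


# Two hypothetical frames; real maps would come from the radar processing chain.
sequence = build_sequence([
    (constant_map(0.1), constant_map(0.2), constant_map(0.3)),
    (constant_map(0.4), constant_map(0.5), constant_map(0.6)),
])
print(shape_of(sequence))
```

Dropping one of the three entries in `stack_frame` would give the two-map inputs (for example RD+RE) compared in Tables V to VIII.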
By leveraging the advanced capabilities of FMCW radar systems in conjunction with DL models, particularly the ConvLSTM architecture, we successfully extracted spatiotemporal patterns from radar feature maps, thereby surpassing traditional ML and DL approaches in performance.

Our results emphasize the benefits of using multiple feature maps for improved recognition accuracy. Among the individual maps, RE proved the most effective, highlighting the role of vertical spatial information in precise activity recognition. Furthermore, the ConvLSTM model effectively captured both motion and spatial dependencies, demonstrating robust performance across diverse and challenging scenarios and when tested on unseen data.

The proposed solution demonstrated strong performance even when trained on a limited dataset, making it suitable for real-world applications in domains such as healthcare and smart homes that require non-intrusive activity monitoring. This advantage is particularly valuable when comprehensive data collection may be impractical due to domain-specific constraints.

Despite promising results, the proposed approach has certain limitations, notably limited dataset diversity and a restricted range of activities, which future research should address. Additionally, the computational cost of these models has not been calculated, though it is a crucial metric for large-scale implementation, real-time applications, and deployment on edge devices. Addressing these limitations is essential to enhance the robustness and applicability of the proposed framework in real-world scenarios.

In conclusion, this study presents a novel FMCW radar-based framework for HAR that leverages multi-dimensional feature maps and advanced DL models. The proposed approach demonstrates strong performance and practical applicability, paving the way for future research and development in the field of non-intrusive activity monitoring.
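The computational cost noted above as not yet calculated can be approximated to first order by counting weights and multiply-accumulate operations. The sketch below does this for a single ConvLSTM layer with four gates, biases, and no peephole terms. The layer sizes are hypothetical and do not describe the architecture used in this study.

```python
def convlstm_param_count(in_channels: int, filters: int, kernel_h: int, kernel_w: int) -> int:
    """Trainable weights of one ConvLSTM layer: 4 gates, with biases, no peephole terms."""
    per_gate_filter = (in_channels + filters) * kernel_h * kernel_w + 1  # +1 for the bias
    return 4 * filters * per_gate_filter


def convlstm_macs_per_step(
    in_channels: int, filters: int, kernel_h: int, kernel_w: int, rows: int, cols: int
) -> int:
    """Multiply-accumulates of the gate convolutions for one time step.

    Assumes stride 1 and 'same' padding, and counts padded positions as well,
    so this is a slight overestimate near the borders.
    """
    return rows * cols * 4 * filters * (in_channels + filters) * kernel_h * kernel_w


# Hypothetical layer: 3 input channels, 16 filters, 3x3 kernels, 32x32 maps.
params = convlstm_param_count(in_channels=3, filters=16, kernel_h=3, kernel_w=3)
macs = convlstm_macs_per_step(3, 16, 3, 3, rows=32, cols=32)
print(params, macs)
```

Multiplying the per-step figure by the number of frames in a sequence gives a rough per-sample cost, which could be compared against the budget of a target edge device.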
ACKNOWLEDGMENTS

The authors would like to thank all participants in this study. The authors also thank Mälardalen University, the Schlegel-UW Research Institute for Aging, Gold Sentinel Inc, NSERC, and MITACS for supporting this study.

---