Authors: Ali Samimi Fard, Mohammadreza Mashhadigholamali, Samaneh Zolfaghari, Hajar Abedi, Mainak Chakraborty, Luigi Borzì, Masoud Daneshtalab, George Shaker
arXiv:2503.05629v1 [cs.ET] 7 Mar 2025
This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
Exploring FMCW Radars and Feature Maps for
Activity Recognition: A Benchmark Study
Ali Samimi Fard, Graduate Student Member, IEEE, Mohammadreza Mashhadigholamali, Graduate Student Member, IEEE, Samaneh Zolfaghari*, Hajar Abedi, Member, IEEE, Mainak Chakraborty, Senior Member, IEEE, Luigi Borzì, Member, IEEE, Masoud Daneshtalab, Senior Member, IEEE, George Shaker, Senior Member, IEEE
Abstract: Human Activity Recognition has gained significant attention due to its diverse applications, including ambient assisted living and remote sensing. Wearable sensor-based solutions often suffer from user discomfort and reliability issues, while video-based methods raise privacy concerns and perform poorly in low-light conditions or at long ranges. This study introduces a Frequency-Modulated Continuous Wave radar-based framework for human activity recognition, leveraging a 60 GHz radar and multi-dimensional feature maps. Unlike conventional approaches that process feature maps as images, this study feeds multi-dimensional feature maps (Range-Doppler, Range-Azimuth, and Range-Elevation) as data vectors directly into the machine learning (SVM, MLP) and deep learning (CNN, LSTM, ConvLSTM) models, preserving the spatial and temporal structures of the data. These features were extracted from a novel dataset with seven activity classes and validated using two different validation approaches. The ConvLSTM model outperformed conventional machine learning and deep learning models, achieving an accuracy of 90.51% and an F1-score of 87.31% on cross-scene validation, and an accuracy of 89.56% and an F1-score of 87.15% on leave-one-person-out cross-validation. The results highlight the approach's potential for scalable, non-intrusive, and privacy-preserving activity monitoring in real-world scenarios.
Index Terms: Human Activity Recognition, Frequency-Modulated Continuous Wave Radar, Deep Learning, Ambient Assisted Living, Remote Sensing.
Ali Samimi Fard, Mohammadreza Mashhadigholamali and Luigi Borzì are with the Department of Control and Computer Engineering, Polytechnic University of Turin, Turin, Italy (Email: {ali.samimifard, mohammadreza.mashhadigholami}@studenti.polito.it, luigi.borzi@polito.it).
Samaneh Zolfaghari (Corresponding author), Mainak Chakraborty and Masoud Daneshtalab are with the School of Innovation, Design and Engineering, Division of Intelligent Future Technologies, Mälardalen University, Västerås, Sweden (Email: {samaneh.zolfaghari, mainak.chakraborty, masoud.daneshtalab}@mdu.se).
Hajar Abedi and George Shaker are with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario, Canada (Email: {habedifi, gshaker}@uwaterloo.ca).
I. INTRODUCTION
Human Activity Recognition (HAR) has become essential for applications in smart homes, healthcare monitoring, gait analysis, and fall detection [1]–[3]. The rising elderly population has heightened the need for reliable activity monitoring to improve quality of life and provide timely emergency responses [2]. Recent advances in Machine Learning (ML), Deep Learning (DL), and low-cost sensors have made HAR technologies more accessible, enabling continuous, non-intrusive monitoring and quick interventions during emergencies [4], [5]. Sensors for HAR fall into two main categories: wearable and non-wearable [6].
Despite their widespread adoption, wearable sensors come with several challenges. They can cause discomfort for users, rely heavily on battery life, and often become less reliable during activities like bathing or sleeping [7]. Their accuracy also depends on where they are placed on the body [8]. These issues highlight the need for alternative solutions. In this context, non-wearable sensors offer a different approach by gathering data from the surrounding environment without needing users to carry them [3], [9]–[11]. However, they come with their own set of challenges, such as privacy concerns with video cameras [12], and limitations from environmental factors (e.g., sunlight, obstructions) for infrared sensors [3], [6], [13].
In recent years, radar-based sensing, particularly Frequency-Modulated Continuous Wave (FMCW) radar, has become a promising solution for non-intrusive monitoring without wearable devices or cameras [3], [14]. Known for high accuracy and robustness, even in low-light and obstructed environments [12], [14], [15], FMCW radars transmit chirp signals and use Doppler shifts to capture detailed motion information. This includes larger movements like walking and sitting, as well as precise actions like picking up objects or falling [3], [16].
Notably, while HAR has made great strides with diverse sensor technologies, there is still a need for accurate, non-intrusive, and privacy-preserving solutions. This work contributes a new benchmark dataset for HAR, collected in realistic environments with a variety of activities, supporting robust model development and evaluation. Additionally, a novel data processing approach is presented that treats multi-dimensional radar feature maps as data vectors, preserving the spatiotemporal characteristics. Together, these contributions aim to enhance the performance and robustness of the proposed approach. This study introduces an FMCW radar-based framework for HAR, with the following key contributions:
• We collected a dataset in a realistic setting using a low-resolution 60 GHz mmWave FMCW radar, covering intricate and less-studied activities for real-life representation. It will be publicly available upon publication to support further HAR research with mmWave FMCW radar.
• We have structured RD, RA, and RE feature maps to comprehensively capture motion patterns and spatial relationships, treating them as 3D data vectors instead of 2D images.
• We have shown that training an appropriate DL model with data from just three subjects is sufficient to achieve reliable and generalizable results across a range of human activities, provided that there are fixed objects in the environment.
• We have validated the performance of the proposed approach using two distinct cross-validation methods on various ML and DL models.
The paper is structured as follows: Section II reviews related work. Section III details the methodology, radar setup, and data processing. Section IV presents experiments, results, and performance comparisons. Section V highlights key contributions and applications. Finally, Section VI concludes and suggests future research.
II. RELATED WORK
Recent advancements in HAR systems have leveraged sophisticated DL models and innovative feature extraction techniques to enhance accuracy and efficiency [17].
For instance, Abdu et al. [12] used micro-Doppler spectrogram images from the University of Glasgow dataset [18] and employed a transfer learning approach to compare three different pre-trained models and their combinations. They also applied a channel attention module to improve performance in classifying six activities. Their experiments achieved 96.86% accuracy with the VGG-19 network and 99.71% accuracy using a combination of AlexNet and VGG-19, enhanced by the Canonical Correlation Analysis (CCA) feature fusion method. Kim et al. [19] combined Range-Time-Doppler (RTD) maps with a Range-Distributed Convolutional Neural Network (RD-CNN), achieving 96.49% accuracy in familiar environments using the University of Glasgow dataset [18]. Hsu et al. [20] developed a fall detection system using range-Doppler images and a CNN model to detect sudden speed changes, followed by a bidirectional long short-term memory network (Bi-LSTM) to confirm falls, achieving nearly 96% accuracy. Ding et al. [21] used dynamic range-Doppler frames (DRDF) and ST-ConvLSTM networks to capture and integrate spatial-temporal features, achieving 96.5% accuracy in classifying six motions performed by 16 subjects. Bhavanasi et al. [22] evaluated various ML algorithms on Micro-Doppler and Range-Doppler maps from radar data in hospital environments. CNN models outperformed others, achieving high accuracy in classifying ten activities performed by 29 subjects.
Despite significant progress, most radar-based HAR systems use image-based feature maps [7], [12], [19], [20]. These have limitations like poor interpretability due to high similarity between activity images and susceptibility to noise and artifacts [23]. Furthermore, training DL models on image-based feature maps often requires large labeled datasets, which can be challenging to obtain [19]. To address this, we propose an alternative approach that directly uses multi-dimensional radar feature maps, namely Range-Doppler (RD), Range-Azimuth (RA), and Range-Elevation (RE), as 3D data vectors. Validation across various conventional and modern models demonstrated robust performance with data from just three participants, minimizing the need for large datasets. Additionally, real-world data collection poses challenges, as many studies use controlled lab environments that do not reflect real-life conditions [21]. To advance HAR with new radar systems and benchmark datasets, this study introduces a novel dataset from a low-resolution 60 GHz mmWave FMCW radar and a framework for its pre- and post-processing in HAR applications.
III. METHODOLOGY
This section provides an overview of the functional prototype of our proposed framework, which is depicted in Fig. 1. Each component is described in detail in the following subsections.
A. Radar Setup and Data Collection
This study uses the BGT60TR13C, an FMCW radar system from Infineon Technologies AG [24]. It has a single transmitter and three receivers, operating in the 58–63.5 GHz band with a configurable chirp duration. The antennas are in an L-shaped configuration, with RX1 and RX3 for azimuth and RX2 and RX3 for elevation angle measurements [25]. Table I lists the radar configuration parameters used in this study.
TABLE I: Radar configuration and specification.
| Parameter | Value |
| --- | --- |
| Radar Model | BGT60TR13C |
| Start Frequency | 61 GHz |
| End Frequency | 62 GHz |
| Transmit Output Power | 5 dBm |
| ADC Sampling Rate | 2 Msps |
| Frame Rate | 10 |
| Chirps Per Frame | 128 |
| Number of Tx Antennas | 1 |
| Number of Rx Antennas | 3 |
| Range Resolution | 15 cm |
| Max Unambiguous Range | 4.8 m |
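As a quick consistency check on Table I, the range resolution of an FMCW radar follows from the swept bandwidth B as c / (2B). The short sketch below applies this standard relation to the 61–62 GHz sweep; the function name and the derived bin count are illustrative and are not taken from the authors' code.

```python
# Range resolution implied by the swept bandwidth listed in Table I.
C_LIGHT = 299_792_458.0  # speed of light in m/s


def range_resolution(start_hz: float, end_hz: float) -> float:
    """Return the FMCW range resolution c / (2B) in metres."""
    bandwidth = end_hz - start_hz
    return C_LIGHT / (2.0 * bandwidth)


delta_r = range_resolution(61e9, 62e9)   # close to the 15 cm listed in Table I
n_range_bins = round(4.8 / delta_r)      # bins needed to span the 4.8 m maximum range
print(f"range resolution: {delta_r * 100:.1f} cm, range bins: {n_range_bins}")
```

A 1 GHz sweep therefore gives roughly 15 cm per range bin, and the 4.8 m maximum range corresponds to about 32 such bins.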
To collect data, the radar-based sensor is arranged to ensure optimal bedroom coverage, minimizing side-lobe interference and maximizing performance. It is mounted at a height of 210 cm and angled downward at 30 degrees. Three healthy subjects participated in the study, performing activities over 16 trials. Two subjects completed six trials each, and the third completed four. Each trial began with a 1-minute data collection in an empty room. Subjects then performed: walking for 2 minutes, sitting on a bed for 2 minutes, lying on the bed for 5 minutes, and lying on the floor for 5 minutes. In some sessions, they also sat on a chair for 2 minutes. Data was collected continuously, capturing both steady-state activities and transitions. Transitions were grouped into a single "Transition" class to address classification challenges and balance the dataset. The data was categorized into seven activity classes: Empty Room, Walking, Sitting on the Bed, Sitting on a Chair, Lying on the Bed, Lying on the Floor, and Transition. The data collection protocol was approved by the University of Waterloo's Research Ethics Committee. All activities complied with applicable safety and ethical standards. The dataset will be made available on GitHub¹ upon acceptance of this manuscript.
¹ https://github.com/
Fig. 1: Overview of the proposed framework for FMCW radar-based HAR.
B. Data Preprocessing and Feature Extraction
In FMCW radar systems, chirp sequences are transmitted via the TX antenna, with reflections captured by RX antennas. The received data forms a three-dimensional array of dimensions $C \times N \times M$, where $C$ represents the number of channels, $N$ denotes the number of chirps per frame, and $M$ indicates the number of samples per chirp. The data structure comprises fast-time rows (single chirp/range bin data) and slow-time columns (same-sample data across chirps) [26].
1) Blackman-Harris Window: The preprocessing pipeline involves DC bias removal to eliminate low-frequency noise and artifacts [27]. Subsequently, a Blackman-Harris window function is applied to mitigate spectral leakage in the frequency domain. This windowing technique gradually attenuates signal amplitudes at the boundaries, reducing abrupt transitions that could introduce spurious frequency components during Fourier transformation. This approach enhances the fidelity of spectral analysis, particularly when employing the Fast Fourier Transform (FFT).
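The two preprocessing steps can be illustrated with a minimal NumPy sketch. It assumes that DC removal is a per-chirp mean subtraction and that the window is the standard 4-term Blackman-Harris window applied along fast time; the frame shape (3, 128, 64), in particular the 64 samples per chirp, is a hypothetical choice and not a value stated in the paper.

```python
import numpy as np


def blackman_harris(m: int) -> np.ndarray:
    """Symmetric 4-term Blackman-Harris window of length m."""
    a0, a1, a2, a3 = 0.35875, 0.48829, 0.14128, 0.01168
    x = 2.0 * np.pi * np.arange(m) / (m - 1)
    return a0 - a1 * np.cos(x) + a2 * np.cos(2 * x) - a3 * np.cos(3 * x)


def preprocess_frame(frame: np.ndarray) -> np.ndarray:
    """Remove the per-chirp DC bias, then window the fast-time axis.

    frame has shape (C, N, M): channels, chirps per frame, samples per chirp.
    """
    centered = frame - frame.mean(axis=-1, keepdims=True)
    return centered * blackman_harris(frame.shape[-1])


rng = np.random.default_rng(0)
frame = rng.normal(loc=0.5, size=(3, 128, 64))  # hypothetical raw ADC frame
windowed = preprocess_frame(frame)
```

The window tapers each chirp towards zero at both ends, which is what limits spectral leakage in the subsequent range FFT.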
2) Range-FFT Map: Range detection is performed by computing the FFT along the fast-time axis, where spectral peaks correspond to target distances [28]. The resulting Range-FFT map provides a frequency-domain representation of target reflections. The radar transmits consecutive chirps separated by a fixed time interval to estimate target velocity. Each reflected chirp undergoes a Range-FFT to determine the target position. While the peaks in the Range-FFT spectrum align for both chirps, their phases differ due to target movement. This phase shift provides information about the velocity of the target [29], [30].
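The relation between the inter-chirp phase shift and velocity can be sketched with an idealized point target. The example below uses complex beat signals for simplicity, and the chirp repetition interval, samples per chirp, and target parameters are assumed values chosen only for illustration; the standard relation v = λΔφ / (4πT) is used to recover the velocity, with the sign being a matter of convention.

```python
import numpy as np

M = 64                      # samples per chirp (assumed)
WAVELENGTH = 3e8 / 61.5e9   # metres, centre of the 61-62 GHz sweep
T_CHIRP = 0.5e-3            # chirp repetition interval in seconds (assumed)


def simulate_chirps(range_bin: int, velocity: float) -> np.ndarray:
    """Two consecutive complex beat signals from one ideal point target."""
    n = np.arange(M)
    beat = np.exp(2j * np.pi * range_bin * n / M)
    dphi = 4.0 * np.pi * velocity * T_CHIRP / WAVELENGTH
    return np.stack([beat, beat * np.exp(1j * dphi)])


chirps = simulate_chirps(range_bin=10, velocity=0.8)
range_fft = np.fft.fft(chirps, axis=-1)          # FFT along fast time
peak = int(np.argmax(np.abs(range_fft[0])))      # same peak bin in both chirps
# Phase difference between the two chirps at the peak range bin.
dphi = np.angle(range_fft[1, peak] * np.conj(range_fft[0, peak]))
velocity = WAVELENGTH * dphi / (4.0 * np.pi * T_CHIRP)
```

Both chirps peak in the same range bin, while the phase at that bin advances in proportion to the target's radial velocity.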
3) Moving Target Indicator (MTI): The signal consists of two main types of reflections. The first is clutter, which refers to echoes from stationary objects in the environment. The second type originates from moving objects, particularly individuals engaged in daily activities. A clutter removal algorithm is employed to reduce the impact of clutter.
The MTI implements linear filtering to suppress clutter while preserving dynamic target signatures. In FMCW radar systems, the Finite Impulse Response (FIR) implementation offers a good balance of simplicity and effectiveness [31]. At each time step, the maximum absolute value across the slow-time dimension for each range bin is denoted as $r_{i,\max}$. The MTI filter output $t_i$ is then calculated as a weighted average of this peak value and the previous filter output $t_{i-1}$, using a weighting factor $\alpha$:
$$t_i = \alpha \cdot r_{i,\max} + (1 - \alpha) \cdot t_{i-1} \tag{1}$$
At the initial time step, the filter output $t_0$ is initialized to zero. For each range bin, the MTI filter removes the influence of stationary objects by subtracting $t_{i-1}$ from $r_{i,\max}$, resulting in the filtered FFT value $r_{i,\text{filt}}$ [32]:
$$r_{i,\text{filt}} = \left| r_{i,\max} - t_{i-1} \right| \tag{2}$$
This method of subtracting an estimate of stationary background clutter effectively eliminates static targets while having minimal effect on slow-moving objects. FIR MTI filters are preferred for their simple design, adjustable parameters, and linear phase response [31], [32].
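Equations (1) and (2) translate into a short per-frame loop. The sketch below is one possible reading of them, with the background estimate kept per range bin and updated after each frame; the value of α, the array shapes, and the synthetic clutter scene are assumptions made for the example.

```python
import numpy as np


def mti_filter(range_fft_frames: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    """Apply Eqs. (1)-(2) frame by frame.

    range_fft_frames: complex array of shape (n_frames, n_chirps, n_range_bins).
    Returns the filtered magnitudes with shape (n_frames, n_range_bins).
    """
    n_frames, _, n_bins = range_fft_frames.shape
    background = np.zeros(n_bins)            # t_{i-1}, zero before the first frame
    filtered = np.empty((n_frames, n_bins))
    for i in range(n_frames):
        r_max = np.abs(range_fft_frames[i]).max(axis=0)        # r_{i,max}
        filtered[i] = np.abs(r_max - background)               # Eq. (2)
        background = alpha * r_max + (1 - alpha) * background  # Eq. (1)
    return filtered


# Synthetic scene: static clutter in bin 3, a target appearing in bin 8 at the end.
frames = np.zeros((200, 16, 32), dtype=complex)
frames[:, :, 3] = 5.0
frames[-1, :, 8] = 2.0
filtered = mti_filter(frames)
```

After enough frames the background estimate converges to the static return in bin 3, so that bin is suppressed, while the newly appearing return in bin 8 passes through at full amplitude.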
4) Range-Doppler Map: The processing sequence continues with a second FFT applied along the slow-time (vertical) axis to extract Doppler information for each channel. The output is the Range-Doppler map (RDM).
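A minimal sketch of the two-FFT sequence is given below. It omits windowing and MTI for brevity, uses an ideal complex point target, and centres zero Doppler with `fftshift`; the frame shape and target bins are assumed values for illustration.

```python
import numpy as np


def range_doppler_map(frame: np.ndarray) -> np.ndarray:
    """Compute a magnitude Range-Doppler map per channel.

    frame: (C, N, M) array of channels, chirps (slow time), samples (fast time).
    Returns (C, N, M): Doppler bins by range bins for each channel.
    """
    range_fft = np.fft.fft(frame, axis=-1)        # fast time -> range
    doppler_fft = np.fft.fft(range_fft, axis=-2)  # slow time -> Doppler
    return np.abs(np.fft.fftshift(doppler_fft, axes=-2))  # zero Doppler centred


C, N, M = 3, 128, 64                      # M is an assumed samples-per-chirp value
n = np.arange(N)[:, None]
m = np.arange(M)[None, :]
# Ideal target at range bin 10 with a Doppler shift of +5 bins.
target = np.exp(2j * np.pi * (10 * m / M + 5 * n / N))
frame = np.broadcast_to(target, (C, N, M))
rdm = range_doppler_map(frame)
peak_doppler, peak_range = (int(v) for v in np.unravel_index(np.argmax(rdm[0]), rdm[0].shape))
```

With the Doppler axis shifted, bin N/2 corresponds to zero velocity, so a target at +5 Doppler bins appears at row 69 of the 128-row map.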
5) Capon Algorithm: Three-dimensional object localization requires precise determination of range, as well as vertical and horizontal angles. The Capon algorithm, also known as the MVDR beamformer, is applied to the RD map to enable accurate position estimation and support the generation of RA and RE maps. It is designed to estimate the Angle of Arrival (AoA) in noisy and interference-prone environments [33], [34]. The algorithm minimizes the output power of an array while maintaining a distortionless response at the desired angle, effectively enhancing the signal-to-interference-plus-noise ratio by suppressing unwanted signals and focusing on the target direction. This process is achieved through constrained optimization. For more details on the Capon beamforming algorithm, refer to our previous work [35].
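For orientation, the textbook Capon spatial spectrum is P(θ) = 1 / (aᴴ(θ) R⁻¹ a(θ)), where R is the sample covariance of the antenna snapshots and a(θ) is the steering vector. The sketch below evaluates it for a two-element pair (such as RX1 and RX3 for azimuth) under assumed half-wavelength spacing, with simulated snapshots and a small diagonal loading term for numerical stability. It is a generic illustration, not the authors' implementation described in [35].

```python
import numpy as np


def steering_vector(theta_rad: float, n_ant: int) -> np.ndarray:
    """Uniform linear array with half-wavelength spacing (assumed geometry)."""
    return np.exp(1j * np.pi * np.arange(n_ant) * np.sin(theta_rad))


def capon_spectrum(snapshots: np.ndarray, angles_rad: np.ndarray,
                   loading: float = 1e-3) -> np.ndarray:
    """Capon (MVDR) power spectrum from snapshots of shape (n_ant, n_snap)."""
    n_ant, n_snap = snapshots.shape
    cov = snapshots @ snapshots.conj().T / n_snap
    # Diagonal loading keeps the covariance well conditioned.
    cov = cov + loading * np.trace(cov).real / n_ant * np.eye(n_ant)
    cov_inv = np.linalg.inv(cov)
    spectrum = np.empty(len(angles_rad))
    for k, theta in enumerate(angles_rad):
        a = steering_vector(theta, n_ant)
        spectrum[k] = 1.0 / np.real(a.conj() @ cov_inv @ a)
    return spectrum


rng = np.random.default_rng(0)
n_snap = 200
source = rng.normal(size=n_snap) + 1j * rng.normal(size=n_snap)
noise = 0.1 * (rng.normal(size=(2, n_snap)) + 1j * rng.normal(size=(2, n_snap)))
snapshots = np.outer(steering_vector(np.deg2rad(20.0), 2), source) + noise
angles = np.deg2rad(np.arange(-60, 61))
spectrum = capon_spectrum(snapshots, angles)
estimated_deg = float(np.rad2deg(angles[np.argmax(spectrum)]))
```

The spectrum peaks near the simulated 20° arrival angle; repeating the scan per range bin would give one row of an RA or RE map.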
Fig. 2: Examples of a participant performing the activities with corresponding feature maps. (A1) walking, (A2) sitting on the bed, (A3) sitting on the chair, (A4) lying down on the bed, (A5) lying down on the floor, (A6) empty room, (A7) transition.
The final 3D data structure, called the data cube, is formed by combining the RD, RA, and RE feature maps. Before feeding data to DL models, segmentation is done on sets of five consecutive data cubes to balance granularity and efficiency. If all five data cubes have the same activity label, they are merged into a single segment. If activities differ, such as three matching and two different, the matching cubes are excluded, and segmentation pauses. It resumes with the next group of data cubes for the subsequent activity, ensuring consistency and preventing conflicting labels from merging.
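The segmentation rule can be read in more than one way; the sketch below implements one plausible interpretation, in which a mixed group is discarded up to the first label change and segmentation restarts at the first cube of the new activity. The function name, the data cube shape, and the label strings are hypothetical.

```python
import numpy as np


def segment_cubes(cubes: np.ndarray, labels: list, size: int = 5):
    """Group consecutive data cubes into fixed-size, single-label segments."""
    segments, segment_labels = [], []
    i = 0
    while i + size <= len(labels):
        window = labels[i:i + size]
        if all(lab == window[0] for lab in window):
            # All cubes share one label: keep them as one segment.
            segments.append(cubes[i:i + size])
            segment_labels.append(window[0])
            i += size
        else:
            # Mixed group: drop the leading cubes and restart at the label change.
            i += next(j for j, lab in enumerate(window) if lab != window[0])
    return np.array(segments), segment_labels


labels = ["walking"] * 8 + ["sitting"] * 6
cubes = np.zeros((len(labels), 3, 4, 4))   # hypothetical (RD, RA, RE) cubes
segments, segment_labels = segment_cubes(cubes, labels)
```

In this example the three trailing "walking" cubes are dropped, so only one "walking" and one "sitting" segment of five cubes each are produced.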
Figure 2 illustrates examples of a participant performing various activities, along with the corresponding feature maps. As shown in the figure, when the subject is moving, there is significantly less clutter and noise, allowing us to clearly observe the moving subject (Fig. 2, A1 to A5). This clarity is achieved through our implementation of the MTI. In contrast, the feature map for the 'Empty room' (Fig. 2, A6) shows visible clutter and noise, which are echoes from stationary objects.
C. Activity Recognition Models
In this study, we evaluated the performance of two conventional ML classifiers, Support Vector Machines (SVM) and Multi-Layer Perceptrons (MLP), for recognizing various activities. To address the limitations of ML models, we further implemented LSTM, CNN and ConvLSTM models. The configurations of these models are outlined below, with unspecified parameters set to their default values:
• SVM [36]: The kernel was set to a radial basis function (RBF) with regularization parameter C = 10 and probability=True.
• MLP [37]: The network consisted of two hidden layers with 128 and 64 neurons, and Rectified Linear Unit (ReLU) activation function. Training was performed using the Adaptive moment estimation (Adam) optimizer with a learning rate of $1 \times 10^{-3}$, a maximum of 300 iterations, and random_state=42.
• CNN [22]: The model consists of four 3D convolutional blocks with 8, 16, 32, and 64 filters respectively, each using a kernel size of (3×3×3) and Exponential Linear Unit (ELU) activation function. Each block is followed by a 1×2×2 max pooling layer. The network includes two fully connected layers: one with 128 neurons and another with 7, 6, 5, or 4 neurons (corresponding to the number of activities being classified). Dropout is applied after each layer at progressively increasing rates from 0.2 to 0.5, excluding the output layer. The network is trained using the Adam optimizer with a cross-entropy loss function, a learning rate of $1 \times 10^{-3}$, and a maximum of 100 epochs with early stopping (patience = 10 epochs).
• LSTM [1]: The architecture comprised two Bi-LSTM layers with 256 units and a dropout of 0.5. The network includes two fully connected layers: one with 128 neurons and another with 7, 6, 5, or 4 neurons (corresponding to the number of activities being classified). The model was trained using the Adam optimizer with a decay factor of 0.9 and an initial learning rate of $1 \times 10^{-3}$. This learning rate was reduced to 10% of its initial value at the 200th epoch, with training continuing for a maximum of 400 epochs with early stopping (patience = 40 epochs).
• ConvLSTM [38]: The architecture consists of one ConvLSTM block with 32 filters, kernel size of (3×3), and ReLU activation function. This is followed by batch normalization, 3D MaxPooling (1×2×2), and dropout (0.3). The output is connected to a fully connected layer with 64 neurons and ReLU activation, followed by dropout (0.5). The final fully connected layer contains 7, 6, 5, or 4 neurons (corresponding to the number of activities being classified) with softmax activation. The model was trained using Stochastic Gradient Descent (SGD) with momentum 0.9 and weight decay $1 \times 10^{-4}$, optimizing the categorical cross-entropy loss function. Training utilized a learning rate of $1 \times 10^{-4}$, batch size of 64, and a maximum of 100 epochs with early stopping (patience = 10 epochs).
These hyperparameters were determined via a systematic grid search to balance computational efficiency and model performance. The models employed preprocessed RD, RA, and RE data from the FMCW radar as input features. Also, to accelerate training and reduce overfitting, we applied principal component analysis (PCA) with 100 components before feeding the data to the ML models. Our framework processes the spatiotemporal characteristics of FMCW radar outputs through three-dimensional (3D) input tensors.
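The two conventional ML baselines and the PCA step map directly onto scikit-learn estimators. The sketch below wires the stated hyperparameters into pipelines; the helper name `build_ml_models`, the random stand-in data, and the flattened feature length of 400 are assumptions made for the example, and how the authors actually flattened the data cubes is not specified.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def build_ml_models() -> dict:
    """PCA (100 components) followed by the SVM and MLP settings listed above."""
    svm = make_pipeline(
        PCA(n_components=100),
        SVC(kernel="rbf", C=10, probability=True),
    )
    mlp = make_pipeline(
        PCA(n_components=100),
        MLPClassifier(hidden_layer_sizes=(128, 64), activation="relu",
                      solver="adam", learning_rate_init=1e-3,
                      max_iter=300, random_state=42),
    )
    return {"SVM": svm, "MLP": mlp}


rng = np.random.default_rng(0)
X = rng.normal(size=(210, 400))   # hypothetical flattened data cubes
y = np.arange(210) % 7            # seven activity classes
models = build_ml_models()
predictions = {name: model.fit(X, y).predict(X) for name, model in models.items()}
```

Placing PCA inside the pipeline means the projection is fitted only on the data passed to `fit`, which keeps test data out of the dimensionality reduction.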
IV. EXPERIMENTAL EVALUATION
This section presents experimental results obtained from an FMCW radar dataset, including analyses of feature maps, activities, and ML and DL models. We detail the setup, results, and performance analysis.
A. Experimental Setup
To validate our approach, we employed two distinct strategies. In the first strategy, Cross-Scene Validation (CSV), we used 80% of the data from 14 of the 16 distinct scenes for training and reserved the remaining 20% for validation. The two remaining scenes were held out for testing. In the second strategy, Leave-One-Person-Out Cross-Validation (LOPO-CV), we repeated the procedure three times, each time leaving one subject out, and computed the accuracy and F1-score. Specifically, in each fold, we trained the model using 80% of the data from the two remaining subjects, reserved the other 20% for validation, and used the unseen subject's data for testing. For example, in the first iteration, the data of the first subject was held out for testing, while data from the second and third subjects was split for training and validation; in the subsequent iterations, the roles were rotated accordingly. Tables II and III summarize the data distribution for each validation strategy.
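The LOPO-CV procedure can be sketched as an index generator. The example assumes a random 80/20 shuffle of the two remaining subjects' samples, since the paper does not state whether the split was random, stratified, or chronological; the function name and the toy subject sizes are illustrative.

```python
import numpy as np


def lopo_splits(subject_ids: np.ndarray, val_fraction: float = 0.2, seed: int = 0):
    """Yield (held_out_subject, train_idx, val_idx, test_idx) for each subject."""
    rng = np.random.default_rng(seed)
    for held_out in np.unique(subject_ids):
        test_idx = np.flatnonzero(subject_ids == held_out)
        # Shuffle the other subjects' samples, then split them 80/20.
        rest = rng.permutation(np.flatnonzero(subject_ids != held_out))
        n_val = int(round(val_fraction * len(rest)))
        yield held_out, rest[n_val:], rest[:n_val], test_idx


subject_ids = np.repeat([1, 2, 3], [40, 50, 30])   # toy sample counts per subject
folds = list(lopo_splits(subject_ids))
```

Each fold tests on exactly one subject, and the per-fold scores are averaged afterwards, as in Table IX.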
TABLE II: Activity sample counts under CSV approach.
| Label | Training | Validation | Testing | Total | Symbol |
| --- | --- | --- | --- | --- | --- |
| Walking | 6,495 | 1,625 | 1,250 | 9,370 | A1 |
| Sitting on the Bed | 5,130 | 1,285 | 705 | 7,120 | A2 |
| Sitting on the Chair | 1,255 | 310 | 1,205 | 2,770 | A3 |
| Lying Down on the Bed | 12,870 | 3,220 | 1,540 | 17,630 | A4 |
| Lying Down on the Floor | 9,810 | 2,455 | 3,040 | 15,305 | A5 |
| Empty Room | 3,405 | 850 | 640 | 4,895 | A6 |
| Transition | 4,305 | 1,075 | 790 | 6,170 | A7 |
| Total | 43,270 | 10,820 | 9,170 | 63,260 | |
TABLE III: Activity sample counts per subject in LOPO-CV approach. (A1) walking, (A2) sitting on the bed, (A3) sitting on the chair, (A4) lying down on the bed, (A5) lying down on the floor, (A6) empty room, (A7) transition. (S1) subject one, (S2) subject two and (S3) subject three.
| Label | S1 | S2 | S3 |
| --- | --- | --- | --- |
| A1 | 3020 | 3800 | 2583 |
| A2 | 1800 | 2661 | 2684 |
| A3 | 1205 | 1568 | - |
| A4 | 5327 | 6142 | 6181 |
| A5 | 5993 | 6252 | 3082 |
| A6 | 1514 | 2030 | 1382 |
| A7 | 1749 | 2170 | 2298 |
To evaluate the models, we developed a normalization technique. Mean and standard deviation were computed from each feature channel using the combined training and validation data. These parameters normalized both the training-validation subset and the test set, ensuring uniform scaling and improving the learning process. All computations were performed on a high-performance system featuring an AMD Ryzen 7 6800H processor, 32 GB of system memory, and an NVIDIA GeForce RTX 3060 GPU running on Windows 11. For model development, we used Python 3.8.19, Scikit-learn 1.3.2, and TensorFlow 2.10.
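The normalization step amounts to fitting per-channel statistics on the training and validation data and reusing them unchanged on the test set. The sketch below assumes that a "feature channel" is one of the RD, RA, and RE maps stacked along axis 1; the array shapes, the synthetic offsets, and the small `eps` guard are illustrative choices.

```python
import numpy as np


def fit_channel_stats(train_val: np.ndarray):
    """Per-channel mean and std for data of shape (n_samples, channels, ...)."""
    axes = (0,) + tuple(range(2, train_val.ndim))
    mean = train_val.mean(axis=axes, keepdims=True)
    std = train_val.std(axis=axes, keepdims=True)
    return mean, std


def normalize(x: np.ndarray, mean: np.ndarray, std: np.ndarray,
              eps: float = 1e-8) -> np.ndarray:
    """Standardize x with statistics fitted on the training-validation data."""
    return (x - mean) / (std + eps)


rng = np.random.default_rng(0)
scale = np.array([1.0, 2.0, 3.0]).reshape(1, 3, 1, 1)
offset = np.array([0.0, 5.0, -2.0]).reshape(1, 3, 1, 1)
train_val = rng.normal(size=(100, 3, 8, 8)) * scale + offset
test = rng.normal(size=(20, 3, 8, 8)) * scale + offset

mean, std = fit_channel_stats(train_val)
train_val_norm = normalize(train_val, mean, std)
test_norm = normalize(test, mean, std)   # the test set reuses the fitted statistics
```

Because the statistics come only from the training and validation data, no information from the held-out scenes or subject leaks into the scaling.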
For comprehensive model evaluation, we employed different metrics for each validation approach. In the CSV strategy, we assessed each model's performance through accuracy, precision, recall, and F1-score. For the LOPO-CV approach, we calculated accuracy and F1-score for each held-out subject and each activity set, ultimately computing the average performance across all three subjects.
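The four metrics can be computed from a confusion matrix as sketched below. Macro averaging over classes is assumed here because the paper does not state the averaging scheme; the function name and the toy labels are illustrative.

```python
import numpy as np


def classification_metrics(y_true, y_pred, n_classes: int) -> dict:
    """Accuracy plus macro-averaged precision, recall, and F1-score."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)        # rows: true class, columns: predicted
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)
    actual = cm.sum(axis=1)
    precision = np.divide(tp, predicted, out=np.zeros(n_classes), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros(n_classes), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros(n_classes), where=denom > 0)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),
    }


metrics = classification_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3)
```

With imbalanced classes, such as the smaller "Sitting on the Chair" class in Table II, macro-averaged F1 can sit noticeably below accuracy, which is consistent with the gap seen in the reported results.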
B. Experimental Results
1) CSV: Table IV compares the performance of the different models over multiple activity sets, using the combination of RD, RA, and RE (RD+RA+RE) feature maps as model input.
Table IV illustrates a consistent performance pattern across all activity sets: DL models (CNN, LSTM, and ConvLSTM) substantially outperform traditional ML approaches (SVM and MLP). In particular, ConvLSTM achieves the highest accuracy in every activity set, ranging from 90.51% for 7-activity classification to 97.87% for 4-activity classification. This trend indicates that ConvLSTM's ability to capture both spatial and temporal features provides a substantial advantage for HAR tasks using radar data.
The classification results of the ConvLSTM model with different feature inputs over various numbers of activities are summarized in Tables V–VIII. Additionally, Fig. 3 shows the confusion matrices for different tasks, while Fig. 4 illustrates the training and validation loss curves of the ConvLSTM model across different activity sets using the RD+RA+RE feature maps.
As shown in Tables V–VIII, classification performance improves as the number of classes decreases. Specifically, with RD+RA+RE inputs the F1-score increases from 87.31% for seven classes to 98.05% for four classes. Among single-feature inputs, the RE map consistently achieves the best results across all tasks. Among pairwise combinations, RD+RE outperforms the others except in the four-class task, where RA+RE is slightly better. Integrating all three features yields the highest accuracy for seven, five, and four classes, although RD+RE is marginally better in the six-class task (95.58% vs. 95.29% accuracy) and in the seven-class F1-score (87.70% vs. 87.31%). This highlights the significance of combining motion (Doppler) and spatial dimensions (azimuth and elevation) to comprehensively represent activities, thereby enhancing classification accuracy.
The training process, shown in Fig. 4, demonstrates a steady decrease in both training and validation loss, reaching suitably low levels. This reflects effective model learning, the absence of overfitting, and strong generalization capability.
Fig. 3: Confusion matrices for the ConvLSTM model on various activity sets under CSV (RD+RA+RE inputs). Panels: (a) 7 activities, (b) 6 activities, (c) 5 activities, (d) 4 activities.
Fig. 4: Training and validation loss curves for the ConvLSTM model under CSV (RD+RA+RE inputs). Panels: (a) 7 activities, (b) 6 activities, (c) 5 activities, (d) 4 activities.
TABLE IV: Performance metrics (%) for various models under CSV with RD+RA+RE inputs.
| #Activities | Model | Accuracy | Precision | Recall | F1-score |
| --- | --- | --- | --- | --- | --- |
| 7 | SVM | 70.97 | 72.20 | 74.17 | 70.41 |
| 7 | MLP | 68.71 | 66.35 | 68.76 | 65.85 |
| 7 | CNN | 89.48 | 86.66 | 88.81 | 87.17 |
| 7 | LSTM | 88.50 | 85.09 | 87.35 | 85.64 |
| 7 | ConvLSTM | 90.51 | 86.75 | 88.51 | 87.31 |
| 6 | SVM | 76.34 | 80.11 | 82.02 | 77.75 |
| 6 | MLP | 75.40 | 76.30 | 78.77 | 75.58 |
| 6 | CNN | 92.60 | 91.59 | 93.67 | 91.83 |
| 6 | LSTM | 94.03 | 92.87 | 95.19 | 93.55 |
| 6 | ConvLSTM | 95.29 | 94.08 | 96.02 | 94.73 |
| 5 | SVM | 76.63 | 83.76 | 81.19 | 79.86 |
| 5 | MLP | 78.25 | 82.23 | 83.22 | 80.99 |
| 5 | CNN | 92.57 | 94.09 | 94.40 | 93.79 |
| 5 | LSTM | 93.99 | 95.08 | 95.08 | 94.79 |
| 5 | ConvLSTM | 96.06 | 96.86 | 96.15 | 96.39 |
| 4 | SVM | 88.80 | 92.16 | 88.53 | 89.08 |
| 4 | MLP | 90.28 | 92.26 | 90.33 | 90.43 |
| 4 | CNN | 96.28 | 97.12 | 96.17 | 96.49 |
| 4 | LSTM | 96.91 | 97.61 | 96.79 | 97.11 |
| 4 | ConvLSTM | 97.87 | 98.39 | 97.80 | 98.05 |
TABLE V: Comparison of ConvLSTM model performance for 7 activity classification under the CSV approach, by model input.
| Metrics (%) | RD | RA | RE | RD+RA | RD+RE | RA+RE | RD+RA+RE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Accuracy | 76.94 | 71.76 | 89.53 | 86.26 | 90.40 | 87.51 | 90.51 |
| Precision | 80.12 | 74.32 | 86.02 | 84.96 | 87.05 | 84.47 | 86.75 |
| Recall | 79.98 | 77.18 | 87.33 | 87.16 | 88.95 | 86.42 | 88.51 |
| F1-score | 77.66 | 72.95 | 86.12 | 85.51 | 87.70 | 84.59 | 87.31 |
TABLE VI: Comparison of ConvLSTM model performance for 6 activity classification under the CSV approach, by model input.
| Metrics (%) | RD | RA | RE | RD+RA | RD+RE | RA+RE | RD+RA+RE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Accuracy | 82.70 | 78.22 | 94.09 | 92.60 | 95.58 | 91.23 | 95.29 |
| Precision | 85.53 | 83.60 | 92.79 | 92.69 | 94.43 | 90.66 | 94.08 |
| Recall | 87.80 | 86.96 | 95.14 | 94.74 | 96.43 | 93.67 | 96.02 |
| F1-score | 85.00 | 82.41 | 93.50 | 93.28 | 95.20 | 91.35 | 94.73 |
TABLE VII: Comparison of ConvLSTM model performance for 5 activity classification under the CSV approach, by model input.
| Metrics (%) | RD | RA | RE | RD+RA | RD+RE | RA+RE | RD+RA+RE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Accuracy | 78.42 | 79.59 | 95.93 | 88.37 | 95.48 | 93.28 | 96.06 |
| Precision | 83.79 | 86.31 | 96.53 | 91.26 | 95.98 | 94.30 | 96.86 |
| Recall | 82.43 | 87.10 | 96.11 | 92.28 | 95.73 | 95.11 | 96.15 |
| F1-score | 80.79 | 84.26 | 96.24 | 90.71 | 95.67 | 94.43 | 96.39 |
TABLE VIII: Comparison of ConvLSTM model performance for 4 activity classification under the CSV approach, by model input.
| Metrics (%) | RD | RA | RE | RD+RA | RD+RE | RA+RE | RD+RA+RE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Accuracy | 86.49 | 95.43 | 96.81 | 96.49 | 97.45 | 97.66 | 97.87 |
| Precision | 87.28 | 95.94 | 97.43 | 97.33 | 97.92 | 98.22 | 98.39 |
| Recall | 86.89 | 95.53 | 96.67 | 96.30 | 97.37 | 97.49 | 97.80 |
| F1-score | 85.76 | 95.56 | 96.96 | 96.68 | 97.58 | 97.80 | 98.05 |
2) LOPO-CV: We report LOPO-CV results for ConvLSTM, which outperformed the other models in the CSV approach. RD+RA+RE feature maps were used, as they generally yielded the best performance as well. Table IX shows ConvLSTM's performance in classifying activities using RD+RA+RE feature maps. Average accuracy (F1-score) rises from 89.56% (87.15%) for 7 activities to 94.94% (94.15%) for 5 activities, then drops to 93.00% (91.12%) for 4 activities, a decrease driven by Subject 3 (88.52% accuracy in the 4-activity task). These results support the model's subject-independent robustness, generalizability, and the effectiveness of RD+RA+RE maps. Compared to the CSV approach, this method evaluates the model with a larger number of test samples.
TABLE IX: Comparison of ConvLSTM model performance presented as percentages using the LOPO-CV approach.
| Test Subject | 7 Activity Accuracy | 7 Activity F1-score | 6 Activity Accuracy | 6 Activity F1-score | 5 Activity Accuracy | 5 Activity F1-score | 4 Activity Accuracy | 4 Activity F1-score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Subject 1 | 89.08 | 85.49 | 92.19 | 90.79 | 92.35 | 91.15 | 92.55 | 91.03 |
| Subject 2 | 93.14 | 91.23 | 96.90 | 96.25 | 97.96 | 97.89 | 97.95 | 97.49 |
| Subject 3 | 86.45 | 84.72 | 94.35 | 93.88 | 94.51 | 93.41 | 88.52 | 84.84 |
| Average | 89.56 | 87.15 | 94.48 | 93.64 | 94.94 | 94.15 | 93.00 | 91.12 |
V. DISCUSSION
The results of this study demonstrate the effectiveness of using FMCW radar for HAR through our proposed framework. In this framework, we utilized RD, RA, and RE feature maps across ML and DL models. The performance of the system is validated through two distinct strategies: CSV and LOPO-CV.
The ConvLSTM model outperformed other models, achieving an accuracy of 90.51% and an F1-score of 87.31% on CSV, and an accuracy of 89.56% and an F1-score of 87.15% on LOPO-CV. ConvLSTM combines CNN and LSTM strengths by integrating convolution into LSTM gates, capturing spatiotemporal dependencies from RD, RA, and RE data. It excels in distinguishing similar activities where both spatial and motion cues are crucial, such as "sitting on a bed" versus "sitting on a chair." The steady decrease in training and validation loss, as illustrated in Fig. 4, further confirms the model's generalization ability and resistance to overfitting. These findings highlight the potential of the proposed approach for scalable, non-intrusive, and privacy-preserving activity monitoring in real-world scenarios.
One of the main contributions of this study is the introduction
of a novel approach that directly feeds multi-dimensional
radar feature maps into the model. Unlike conventional
approaches that process feature maps as images, our method
preserves the spatial and temporal structures of the data by
treating RD, RA, and RE feature maps as data vectors. With this
approach, we developed a robust HAR model using
data from only three subjects, making it particularly suitable
for real-world applications where large-scale data collection is
often infeasible.
Our findings advance the current state of knowledge in
the field by demonstrating that the proposed solution can
achieve strong performance even when trained on a limited
dataset. This is especially valuable for applications in healthcare
monitoring and smart environments, where non-intrusive
activity monitoring is essential. The practical implications of
our work include the potential for real-time activity recognition
and monitoring in various domains, such as elderly care,
rehabilitation, and smart home automation.
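The idea of treating RD, RA, and RE feature maps as data vectors rather than rendered images can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: the helper names, the per-frame standardization, the window length, and the assumption that all three maps share the same height and width are hypothetical choices made only for the example.

```python
import numpy as np


def normalize(feature_map):
    """Standardize one feature map to zero mean, unit variance (assumed step)."""
    m = np.asarray(feature_map, dtype=np.float32)
    return (m - m.mean()) / (m.std() + 1e-8)


def stack_frames(rd_seq, ra_seq, re_seq):
    """Stack per-frame RD, RA, and RE maps as channels.

    Each input has shape (T, H, W); the output has shape (T, H, W, 3).
    Equal H and W across the three map types is an assumption.
    """
    per_type = [
        np.stack([normalize(frame) for frame in seq])
        for seq in (rd_seq, ra_seq, re_seq)
    ]
    return np.stack(per_type, axis=-1)


def flatten_window(window):
    """Flatten a (T, H, W, C) window into one 1-D vector (e.g., for SVM or MLP)."""
    return window.reshape(-1)


def flatten_per_frame(window):
    """Keep the time axis and flatten each frame to (T, H*W*C) (e.g., for LSTM)."""
    return window.reshape(window.shape[0], -1)


# Synthetic example: 10 frames of 32x32 maps per feature type.
rng = np.random.default_rng(0)
rd, ra, re_maps = (rng.random((10, 32, 32)) for _ in range(3))
window = stack_frames(rd, ra, re_maps)
```

Under these assumptions, the (T, H, W, 3) tensor can be passed unchanged to a convolutional or ConvLSTM model, while `flatten_window` and `flatten_per_frame` produce the inputs that vector-based and sequence-based models would expect. No image rendering, color mapping, or resizing step is involved, so the raw numeric values are kept.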
VI. CONCLUSION
In this paper, we proposed an FMCW radar-based framework
for HAR in home-like environments. The study demonstrated
the feasibility and advantages of integrating multi-dimensional
feature maps as data vectors to effectively capture
the temporal and spatial characteristics of human activities. By
leveraging the advanced capabilities of FMCW radar systems
in conjunction with DL models, particularly the ConvLSTM
architecture, we successfully extracted spatiotemporal patterns
from radar feature maps, thereby surpassing traditional ML
and DL approaches in performance.
Our results emphasize the benefits of using multiple feature
maps for improved recognition accuracy, with RE proving
the most effective, highlighting the role of vertical spatial
information in precise activity recognition. Furthermore, the
ConvLSTM model effectively captured both motion and spatial
dependencies, demonstrating robust performance across
diverse and challenging scenarios, including when tested on unseen
data. The proposed solution demonstrated strong performance
even when trained on a limited dataset, making it suitable
for real-world applications in domains such as healthcare and
smart homes that require non-intrusive activity monitoring.
This advantage is particularly valuable when comprehensive
data collection is impractical due to domain-specific
constraints.
Despite promising results, the proposed approach has certain
limitations, including limited dataset diversity and a
restricted range of activities, which future research should
address. Additionally, the computational cost of the models
was not measured, although it is a crucial metric for
large-scale implementation, real-time applications, and
deployment on edge devices. Addressing these limitations is
essential to enhance the robustness and applicability of the
proposed framework in real-world scenarios.
In conclusion, this study presents a novel FMCW radar-based
framework for HAR that leverages multi-dimensional
feature maps and advanced DL models. The proposed approach
demonstrates strong performance and practical applicability,
paving the way for future research and development
in the field of non-intrusive activity monitoring.
ACKNOWLEDGMENTS
The authors would like to thank all participants in this
study. The authors also thank Mälardalen
University, the Schlegel-UW Research Institute for Aging,
Gold Sentinel Inc., NSERC, and MITACS for supporting this
study.
REFERENCES
[1] H. Li, A. Shrestha, H. Heidari, J. Le Kernec, and F. Fioran elli, “Bi-lstm
network for multimodal continuous human activity recognit ion and fall
detection,” IEEE Sensors Journal , vol. 20, no. 3, pp. 1191–1201, 2020.
[2] S. Hu, S. Cao, N. Toosizadeh, J. Barton, M. G. Hector, and M . J.
Fain, “Radar-based fall detection: A survey [survey],” IEEE Robotics
& Automation Magazine , vol. 31, no. 3, pp. 170–185, 2024.
[3] I. Ullmann, R. G. Guendel, N. C. Kruse, F. Fioranelli, and A. Yarovoy,
“A survey on radar-based continuous human activity recogni tion,” IEEE
Journal of Microwaves , vol. 3, pp. 938–950, 7 2023.
[4] S. Ahmed and S. H. Cho, “Machine learning for healthcare r adars:
Recent progresses in human vital sign measurement and activ ity recog-
nition,” IEEE Communications Surveys & Tutorials , vol. 26, no. 1, pp.
461–495, 2024.
[5] M. Walsh, J. Barton, B. O’Flynn, C. O’Mathuna, A. Hickey, and
J. Kellett, “On the relationship between cummulative movem ent, clinical
scores and clinical outcomes,” in SENSORS, 2012 IEEE . IEEE, 2012,
pp. 1–4.
[6] S. Zolfaghari, S. Suravee, D. Riboni, and K. Yordanova, “ Sensor-based
locomotion data mining for supporting the diagnosis of neur odegenera-
tive disorders: A survey,” ACM Computing Surveys , vol. 56, 8 2023.
[7] Y . Zhao, H. Zhou, S. Lu, Y . Liu, X. An, and Q. Liu, “Human act ivity
recognition based on non-contact radar data and improved pc a method,”
Applied Sciences (Switzerland) , vol. 12, 7 2022.
[8] M. Li, P. Li, S. Tian, K. Tang, and X. Chen, “Estimation of t emporal
gait parameters using a human body electrostatic sensing-b ased method,”
Sensors (Switzerland) , vol. 18, 6 2018.
[9] A. Rezaei, M. C. Stevens, A. Argha, A. Mascheroni, A. Puia tti, and
N. H. Lovell, “An unobtrusive human activity recognition sy stem using
low resolution thermal sensors, machine and deep learning, ”IEEE
Transactions on Biomedical Engineering , vol. 70, no. 1, pp. 115–124,
2023.
[10] S. Zhang, Z. Wei, J. Nie, L. Huang, S. Wang, and Z. Li, “A re view
on human activity recognition using vision-based method,” Journal of
Healthcare Engineering , vol. 2017, no. 1, p. 3090343, 2017.
[11] Y . Ma, G. Zhou, and S. Wang, “Wifi sensing with channel sta te
information: A survey,” ACM Comput. Surv. , vol. 52, no. 3, jun 2019.
[Online]. Available: https://doi.org/10.1145/3310194
[12] F. J. Abdu, Y . Zhang, and Z. Deng, “Activity classificati on based on
feature fusion of fmcw radar human motion micro-doppler sig natures,”
IEEE Sensors Journal , vol. 22, no. 9, pp. 8648–8662, 2022.
[13] A. K. Seifert, M. Grimmer, and A. M. Zoubir, “Doppler rad ar for the
extraction of biomechanical parameters in gait analysis,” IEEE Journal
of Biomedical and Health Informatics , vol. 25, pp. 547–558, 2 2021.
[14] P. Miazek, A. ˙Zmudzi´ nska, P. Karczmarek, and A. Kiersztyn, “Human
behavior analysis using radar data. a survey,” IEEE Access , 2024.
[15] K. Papadopoulos and M. Jelali, “A comparative study on r ecent progress
of machine learning-based human activity recognition with radar,”
Applied Sciences , vol. 13, no. 23, 2023.
[16] A. Pearce, J. A. Zhang, R. Xu, and K. Wu, “Multi-object tr acking with
mmwave radar: A review,” Electronics , vol. 12, no. 2, 2023. [Online].
Available: https://www.mdpi.com/2079-9292/12/2/308
[17] H. Abedi, A. Ansariyan, P. P. Morita, A. Wong, J. Boger, a nd G. Shaker,
“Ai-powered noncontact in-home gait monitoring and activi ty recogni-
tion system based on mm-wave fmcw radar and cloud computing, ”IEEE
Internet of Things Journal , vol. 10, no. 11, p. 9465–9481, Jun. 2023.
[18] F. Fioranelli, S. A. Shah, H. Li, A. Shrestha, S. Yang, an d J. Le Kernec.
(2019) Radar signatures of human activities.
[19] W.-Y . Kim and D.-H. Seo, “Radar-based human activity re cog-
nition combining range–time–doppler maps and range-distr ibuted-
convolutional neural networks,” IEEE Transactions on Geoscience and
Remote Sensing , vol. 60, pp. 1–11, 2022.
[20] W.-L. Hsu, J.-X. Liu, C.-C. Yang, and J.-S. Leu, “A fall d etection system
based on fmcw radar range-doppler image and bi-lstm deep lea rning,”
IEEE Sensors Journal , vol. 23, no. 18, pp. 22 031–22 039, 2023.
[21] C. Ding, L. Zhang, H. Chen, H. Hong, X. Zhu, and C. Li, “Hum an mo-
tion recognition with spatial-temporal-convlstm network using dynamic
range-doppler frames based on portable fmcw radar,” IEEE Transactions
on Microwave Theory and Techniques , vol. 70, no. 11, pp. 5029–5038,
2022.
[22] G. Bhavanasi, L. Werthen-Brabants, T. Dhaene, and I. Co uckuyt, “Pa-
tient activity recognition using radar sensors and machine learning,”
Neural Computing and Applications , vol. 34, no. 18, pp. 16 033–16 048,
2022.[23] C. Ding, L. Zhang, H. Chen, H. Hong, X. Zhu, and F. Fiorane lli,
“Sparsity-based human activity recognition with pointnet using a
portable fmcw radar,” IEEE Internet of Things Journal , vol. 10, no. 11,
pp. 10 024–10 037, 2023.
[24] I. T. AG, “Bgt60tr13c: Xensivtm 60ghz radar
sensor for advanced sensing,” Infineon Technologies,
accessed on 29th January 2024. [Online]. Available:
https://www.infineon.com/cms/en/product/sensor/radar -sensors/radar-sensors-for-iot/60ghz
[25] “Bgt60tr13c - xensiv™ 60ghz radar sensor for advanced s ensing,”
https://www.infineon.com/cms/en/product/sensor/radar -sensors/radar-sensors-for-iot/60ghz
2023.
[26] Z. Peng and C. Li, “Portable microwave radar systems for short-range
localization and life tracking: A review,” 3 2019.
[27] X. WANG, K. LEI, X. Yang, M. Li, and X. Wang, “Harmonic ana lysis
based on blackman-harris self-multiplication window,” in 2020 5th Asia
Conference on Power and Electrical Engineering (ACPEE) , 2020, pp.
2165–2169.
[28] X. Zeng, H. S. L. B´ aruson, and A. Sundvall, “Walking ste p monitoring
with a millimeter-wave radar in real-life environment for d isease and
fall prevention for the elderly,” Sensors , vol. 22, no. 24, p. 9901, 2022.
[29] C. Lovescu and S. Rao, “The fundamentals of millimeter w ave radar
sensors,” Texas Instruments, Julio , 2020.
[30] H. Abedi, C. Magnier, V . Mazumdar, and G. Shaker, “Impro ving pas-
senger safety in cars using novel radar signal processing,” Engineering
Reports , vol. 3, 12 2021.
[31] M. Ash, M. Ritchie, and K. Chetty, “On the application of digital
moving target indication techniques to short-range fmcw ra dar data,”
IEEE Sensors Journal , vol. 18, no. 10, pp. 4167–4175, 2018.
[32] C. Will, P. Vaishnav, A. Chakraborty, and A. Santra, “Hu man target
detection, tracking, and classification using 24-ghz fmcw r adar,” IEEE
Sensors Journal , vol. 19, no. 17, pp. 7283–7299, 2019.
[33] J. Capon, “High-resolution frequency-wavenumber spe ctrum analysis,”
Proceedings of the IEEE , vol. 57, no. 8, pp. 1408–1418, 1969.
[34] M. Alizadeh, H. Abedi, and G. Shaker, “Low-cost low-pow er in-
vehicle occupant detection with mm-wave fmcw radar,” in 2019 IEEE
SENSORS , IEEE. Montreal, QC, Canada: IEEE, Oct. 27–30 2019, pp.
1–4.
[35] H. Abedi, S. Luo, V . Mazumdar, M. M. Y . R. Riad, and G. Shak er,
“Ai-powered in-vehicle passenger monitoring using low-co st mm-wave
radar,” IEEE Access , vol. 10, pp. 18 998–19 012, 2021.
[36] C. Cortes and V . Vapnik, “Support-vector networks,” Machine learning ,
vol. 20, pp. 273–297, 1995.
[37] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learn ing repre-
sentations by back-propagating errors,” nature , vol. 323, no. 6088, pp.
533–536, 1986.
[38] X. Shi, Z. Chen, H. Wang, D.-Y . Yeung, W.-K. Wong, and W.- c.
Woo, “Convolutional lstm network: A machine learning appro ach for
precipitation nowcasting,” Advances in neural information processing
systems , vol. 28, 2015.
8