MPTSNet: Integrating Multiscale Periodic Local Patterns and Global
Dependencies for Multivariate Time Series Classification
Yang Mu¹, Muhammad Shahzad¹, Xiao Xiang Zhu¹,²*
¹Technical University of Munich, Germany
²Munich Center for Machine Learning (MCML), Germany
{yang.mu, muhammad.shahzad, xiaoxiang.zhu}@tum.de
Abstract
Multivariate Time Series Classification (MTSC) is crucial in a wide range of practical applications, such as environmental monitoring, medical EEG analysis, and action recognition. Real-world time series datasets typically exhibit complex dynamics. To capture this complexity, RNN-based, CNN-based, Transformer-based, and hybrid models have been proposed. Unfortunately, current deep learning-based methods often neglect the simultaneous construction of local features and global dependencies at different time scales, and therefore lack the feature extraction capability needed for satisfactory classification accuracy. To address these challenges, we propose a novel Multiscale Periodic Time Series Network (MPTSNet), which integrates multiscale local patterns and global correlations to fully exploit the inherent information in time series. Recognizing the multi-periodicity and complex variable correlations in time series, we use the Fourier transform to extract primary periods, enabling us to decompose data into multiscale periodic segments. Leveraging the inherent strengths of CNNs and attention mechanisms, we introduce the PeriodicBlock, which adaptively captures local patterns and global dependencies while offering enhanced interpretability through attention integration across different periodic scales. Experiments on the UEA benchmark datasets demonstrate that the proposed MPTSNet outperforms 21 existing advanced baselines in MTSC tasks.
Code — https://github.com/MUYang99/MPTSNet
Datasets — https://timeseriesclassification.com/dataset.php
Introduction
Multivariate time series classification (MTSC) has gar-
nered significant attention due to its widespread applications
across various domains. This classification technique plays a
crucial role in analyzing complex, multi-dimensional tempo-
ral data, offering insights that are invaluable in fields ranging
from healthcare (Wang et al. 2022; An et al. 2023), human
activity recognition (Yang et al. 2015; Li et al. 2023a), traf-
fic (Zhao et al. 2024) to environmental monitoring (Ienco
and Gaetano 2007; Rußwurm et al. 2023) and industrial pro-
cesses (Farahani et al. 2023).
*Corresponding author
Copyright © 2025, Association for the Advancement of Artificial
Intelligence (www.aaai.org). All rights reserved.
Figure 1: Varying correlations between the subsequences of
two variables across different periodic scales in MTS data.
The challenges in solving MTSC tasks stem from two pri-
mary factors. Firstly, time series data inherently possesses
periodic characteristics, presenting similar trends across dif-
ferent periods, which leads to data redundancy. Simulta-
neously, hidden information overlaps and interacts across
multiple periods (Wu et al. 2023), complicating the ex-
ploration of time series data. Secondly, time series subse-
quences among different variables exhibit varying degrees
of correlation or even opposite correlations at different peri-
odic scales (Cai et al. 2024) in MTS data, as shown in Fig. 1.
Thus, the extraction of local intra-period features and the
capture of global inter-period dependencies at multiple peri-
odic scales become paramount for effective MTSC.
Traditional approaches to time series classification, such
as bag-of-patterns (Baydogan, Runger, and Tuv 2013) or
shapelet-based methods (Lines et al. 2012), typically re-
quire a preprocessing step to transform time series into a
wide range of subsequences or patterns as candidate fea-
tures. Subsequently, discriminative subsequences are se-
lected from these candidates for classification purposes.
However, this approach often results in an expansive feature space, which not only complicates the feature selection process but may also lead to decreased accuracy (Schäfer and Leser 2017), particularly in multivariate settings where the complexity of data is inherently higher.
arXiv:2503.05582v1 [cs.LG] 7 Mar 2025
Figure 2: Multiscale periodic feature analysis of time series data using FFT. By decomposing time series into multiple frequency components, it reveals both local patterns and global dependencies across different periodic scales.
Recently, deep neural networks have demonstrated remarkable potential in various time series tasks, outperforming traditional methods in many respects. RNN-based methods (Karim et al. 2017; Yu, Kim, and Mechefske 2021) use recurrent loops to maintain and propagate information across time steps, but the vanishing gradient problem limits their ability to capture long-term dependencies. CNN-based methods (Cui, Chen, and Chen 2016; Zhao et al. 2017) excel at learning feature hierarchies through convolutional filters; their inductive bias lets them capture local patterns effectively, but they are less adept at modeling global features comprehensively. Transformer-based methods (Wen et al. 2022; Zuo et al. 2023) leverage self-attention to model dependencies between different positions in the input sequence and handle long-range dependencies well. However, they struggle to extract features from adjacent time points that exhibit localized patterns. While these models have shown promise, they often fall short in simultaneously constructing local features and global dependencies across different periodic scales, resulting in suboptimal feature extraction.
To address these limitations, we propose a general frame-
work called Multiscale Periodic Time Series Network
(MPTSNet). The core concept, illustrated in Fig. 2, involves
transforming time-domain data into frequency-domain rep-
resentations using the Fast Fourier Transform (FFT). This
approach decomposes complex time series into multiple fre-
quency components, revealing both local intra-period pat-
terns and global inter-period dependencies across various
periodic scales. The architecture of MPTSNet is supported
by the PeriodicBlock, which decomposes time series into
multiscale periodic segments. These segments are then processed sequentially by two key components: an Inception-based Local Extractor, which captures corresponding local
features, and an Attention-based Global Capturer, which ex-
tracts global dependencies. This modular architecture en-
ables MPTSNet to effectively analyze time series data at
multiple periodic scales. Our contributions can be summa-
rized in three key aspects:
• We propose an end-to-end general framework MPTSNet
that addresses the multi-periodicity of time series and
complex correlations among variable subsequences at
different time scales. This novel architecture is capable
of extracting multiscale periodic features effectively.
• Inspired by the strengths of CNNs in capturing local features and of attention mechanisms in modeling global dependencies, we design a PeriodicBlock comprising an
Inception-based Local Extractor and an Attention-based
Global Capturer to extract both local features and global
dependencies from time series data.
• Our extensive experiments on real-world datasets
demonstrate that the proposed MPTSNet outperforms
existing advanced baselines in MTSC tasks. Further-
more, MPTSNet offers enhanced interpretability through
its ability to localize temporal patterns with integrated
global attention across different periodic scales.
Related Work
Multivariate Time Series Classification
Recent years have witnessed the emergence of various
deep learning models for MTSC. These can be broadly
categorized into three main groups: 1) CNN-based mod-
els. The Multichannel Deep Convolutional Neural Network
(MCDCNN) (Zheng et al. 2014) pioneered the applica-
tion of CNNs to MTSC by capturing intra-variable fea-
tures through one-dimensional convolutions and combin-
ing them with fully connected layers. More recently, OS-
CNN (Tang et al. 2022) has been proposed, featuring an in-
novative Omni-Scale block that uses a set of prime num-
bers as convolution kernel sizes to effectively cover recep-
tive fields across all scales, optimizing performance across
different datasets. 2) RNN/CNN Hybrid models. Models
such as LSTM-FCN (Karim et al. 2017) and MLSTM-
FCN (Karim et al. 2019) have been introduced, combin-
ing LSTM layers to capture short- and long-term depen-
dencies with stacked CNN layers to extract features from
the time series. 3) Transformer-based models. These models
have gained prominence in General Time Series Analysis,
addressing multiple tasks including classification, imputa-
tion, short-term and long-term forecasting, and anomaly de-
tection (Chen et al. 2024). Notable examples include FED-
former (Zhou et al. 2022), Flowformer (Wu et al. 2022),
PatchTST (Nie et al. 2023), and Crossformer (Zhang and
Yan 2023), which have been developed and refined in re-
cent years, leveraging their excellent scaling behaviors. Ad-
ditionally, MLP-based (Zhang et al. 2022; Li et al. 2023b),
GNN-based (Liu et al. 2024) and MIL-based (Chen et al.
2024) models have also achieved impressive performance.
Since MIL-based models employ a binary paradigm for multi-class problems, unlike other approaches, they are excluded from quantitative comparisons.
However, a key assumption in many of these models is
that the correlations between different variables remain con-
stant across various time resolutions. This assumption often
leads to inadequate representation of pairwise relationships
between variables at different periodic scales.
Local and Global Modeling for Sequence
Modeling local patterns and global dependencies has proven
effective for correlation learning and feature extraction in
sequence modeling (Wang et al. 2023). In speech recogni-
tion, the SOTA model Interformer (Lai et al. 2023) employs
convolution to extract local information and self-attention to
capture long-term dependencies in parallel branches. How-
ever, this parallel processing of local and global features
might introduce redundancy, potentially reducing efficiency.
In general time series analysis, TimesNet (Wu et al. 2023) transforms 1D time series into a set of 2D tensors and applies CNN-based layers to capture both local and global variations. However, capturing local and global features directly with CNNs may lead to a loss in precision. Additionally, while TimesNet provides visualizations of the 2D transformations, interpreting these in the context of the original 1D time series can be challenging. In time series forecasting, MSGNet (Cai et al. 2024) employs a graph convolution module for inter-series correlation learning and a multi-head attention module for intra-series correlation learning. However, its adaptive graph convolution module increases complexity and computational cost. MICN (Wang et al. 2023) combines local feature extraction and global correlation modeling using a convolution-based structure; however, its performance can be highly sensitive to hyperparameters such as the scale sizes in the multi-scale convolution, requiring meticulous tuning for optimal results. Moreover, its fully convolutional design sacrifices interpretability.
To address these limitations, we propose MPTSNet, which sequentially applies convolutional and attention modules to capture local and global features. Only the number of extraction scales must be specified; all other parameters are configured adaptively from the characteristics of the time series, which reduces sensitivity to parameter tuning and avoids redundant computation. Moreover, integrating multi-scale attention maps enhances the interpretability of MTSC.
Methodology
Problem Formulation
MTSC involves the analysis of a set of time series, where each series is composed of multiple variables observed over time. An MTS sample is formally defined as $X = \{x_1, x_2, \dots, x_d\} \in \mathbb{R}^{d\times l}$, where $d$ is the number of variables and $l$ is the length of the series. In the context of MTSC, the goal is to classify these time series into predefined categories. Given a dataset $\mathcal{X} = \{X_1, X_2, \dots, X_m\} \in \mathbb{R}^{m\times d\times l}$ with corresponding labels $\eta = \{y_1, y_2, \dots, y_m\}$, where $m$ denotes the number of time series, the task is to train a classifier $f(\cdot)$ capable of predicting the label $y$ for new, unlabeled MTS data.
Model Architecture Overview
The overview of our proposed MPTSNet is shown in Fig. 3,
which is designed to extract local patterns and global de-
pendencies at various periodic scales. Initially, the model
identifies the main periods from the entire training dataset
using FFT. The core component, PeriodicBlock, processes
time series data in parallel. It begins by segmenting batch
data into multiple scales based on the identified main pe-
riods. Subsequently, the Local Extractor and Global Cap-
turer are employed within the PeriodicBlock to extract intra-
period local and inter-period global features sequentially, as
depicted in the right panel of Fig. 3. To account for potential discrepancies in frequency-amplitude relationships between batch data and the overall training set, FFT is also applied to each batch. This step queries the batch amplitudes at the main periods, and Softmax converts them into weights for aggregating the multiscale periodic features. The PeriodicBlock outputs a weighted sum
of features extracted from different periodic scales. Residual
connections between stacked PeriodicBlocks enhance infor-
mation flow. Finally, the extracted features are fed into the
Classification head to generate predicted labels.
Main Periods Identification
Grounded in the inherent characteristics of time series data,
we aim to enhance MTSC accuracy by exploiting two fun-
damental properties: multi-periodicity and complex corre-
lations among variables across different periodic scales, as
mentioned before. The identification of main periods is a
critical step in our methodology, as it serves as the founda-
tion for subsequent multiscale analysis.
Inspired by the work (Wu et al. 2023), we employ the FFT
to identify the principal periods. This technique effectively
converts time-domain data into the frequency domain, al-
lowing for a comprehensive analysis of the underlying peri-
odic structures. The process can be formalized as follows:
$$\mathrm{AMP} = \mathrm{Avg}\Big(\mathrm{Amplitudes}\big(\mathrm{FFT}(X)\big)\Big), \quad X \in \mathbb{R}^{d\times l}, \tag{1}$$
where $X \in \mathbb{R}^{d\times l}$ represents the input time series, $d$ denotes the number of variables, and $l$ is the sequence length. The function $\mathrm{FFT}(\cdot)$ computes the Fast Fourier Transform, while $\mathrm{Amplitudes}(\cdot)$ calculates the amplitude values. $\mathrm{Avg}(\cdot)$ averages the amplitudes across all variables, resulting in $\mathrm{AMP} \in \mathbb{R}^{l}$, which represents the mean amplitude for each frequency.
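$$f_1, \dots, f_k = \underset{f_* \in \{1, \dots, \lfloor l/2 \rfloor\}}{\arg\mathrm{Topk}}\,(\mathrm{AMP}), \tag{2}$$
$$p_i = l / f_i, \quad i \in \{1, \dots, k\}. \tag{3}$$
To focus on the most significant periodic components, we select the top $k$ amplitudes using the $\arg\mathrm{Topk}$ function. The search starts at $f_* = 1$, which excludes the zero-frequency (mean) component, and stops at $\lfloor l/2 \rfloor$, the highest frequency resolvable in a real-valued series of length $l$. This operation yields the frequencies $f_1, \dots, f_k$, which correspond to the $k$ most prominent periods $p_1, \dots, p_k$. The parameter $k$ is a hyperparameter that determines the number of periodic scales to consider in subsequent analyses.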
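Eqs. (1) to (3) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a real-valued FFT (`np.fft.rfft`), averages amplitudes over every axis except frequency (variables and, for a batch, samples), and rounds periods up with a ceiling so that $p_i f_i \ge l$ holds for the padding in Eq. (4). The function name `find_main_periods` is hypothetical.

```python
import numpy as np


def find_main_periods(X: np.ndarray, k: int):
    """Sketch of Eqs. (1)-(3) for an array of shape (..., d, l).

    Returns the top-k frequencies, their periods, and the averaged amplitude spectrum.
    """
    l = X.shape[-1]
    # Eq. (1): amplitude spectrum along time, averaged over variables (and samples).
    amp = np.abs(np.fft.rfft(X, axis=-1))            # shape (..., d, l // 2 + 1)
    amp = amp.reshape(-1, amp.shape[-1]).mean(axis=0)
    # Eq. (2): search f in {1, ..., floor(l/2)}; index 0 (the mean/DC term) is skipped.
    candidates = amp[1 : l // 2 + 1]
    top = np.argsort(candidates)[::-1][:k]
    freqs = top + 1                                   # shift back to frequency indices
    # Eq. (3): period per frequency, rounded up (assumption) so that p_i * f_i >= l.
    periods = np.ceil(l / freqs).astype(int)
    return freqs, periods, amp


# Toy series: two variables completing 4 and 12 cycles over 96 steps.
t = np.arange(96)
X = np.stack([np.sin(2 * np.pi * 4 * t / 96), 0.5 * np.sin(2 * np.pi * 12 * t / 96)])
freqs, periods, amp = find_main_periods(X, k=2)
print(freqs, periods)
```

For this toy series the sketch selects frequencies 4 and 12, that is, periods of 24 and 8 steps.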
Figure 3: Overview of the Multiscale Periodic Time Series Network (MPTSNet). The model processes multivariate time series
data through FFT to identify the main periods. The data is then segmented into multiscale periodic components, where Local
Extractor and Global Capturer in PeriodicBlock extract features sequentially at different scales. The features are aggregated by
corresponding amplitudes and passed through the Classification head for final prediction.
The identification of these main periods establishes the
groundwork for multiscale analysis. This approach allows
us to decompose the original time series into multiple peri-
odic components, and such decomposition is crucial for un-
derstanding the complex temporal dynamics present in MTS
data.
The extracted periods serve as the basis for reshaping the
input data into multiple representations, each correspond-
ing to a different time scale. This transformation can be ex-
pressed as:
$$X'_i = \mathrm{Reshape}_{p_i, f_i}\big(\mathrm{Pad}(X)\big), \quad i \in \{1, \dots, k\}, \tag{4}$$
where $\mathrm{Pad}(\cdot)$ extends the time series with zeros to ensure compatibility with the reshaping operation, and $X'_i \in \mathbb{R}^{d\times p_i\times f_i}$ represents the $i$-th reshaped time series based on the $i$-th periodic scale.
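A minimal NumPy sketch of Eq. (4), assuming zeros are appended at the end of the series and that column $j$ of $X'_i$ holds the $j$-th consecutive segment of length $p_i$ (the text fixes the output shape but not the memory layout). The helper `reshape_by_period` is a hypothetical name.

```python
import numpy as np


def reshape_by_period(X: np.ndarray, period: int, n_segments: int) -> np.ndarray:
    """Sketch of Eq. (4): zero-pad a (d, l) series and fold it into (d, p_i, f_i).

    Column j of the result holds the j-th consecutive segment of length `period`.
    """
    d, l = X.shape
    target = period * n_segments
    if target < l:
        raise ValueError("period * n_segments must be at least the series length")
    # Pad(.): append zeros at the end of the time axis.
    X_pad = np.pad(X, ((0, 0), (0, target - l)))
    # Rows of the (d, f_i, p_i) view are segments; swap the last two axes to get (d, p_i, f_i).
    return X_pad.reshape(d, n_segments, period).transpose(0, 2, 1)


X = np.arange(20, dtype=float).reshape(2, 10)              # d = 2, l = 10
X_prime = reshape_by_period(X, period=4, n_segments=3)     # p = ceil(10 / 3) = 4
print(X_prime.shape)
print(X_prime[0])
```

With $l = 10$ and $f_i = 3$, two zeros are appended, so the last segment of each variable ends in padding.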
PeriodicBlock
The PeriodicBlock serves as the core component of our MPTSNet architecture, designed to extract both local and global features from the reshaped time series data at different periodic scales. Prior to processing by the PeriodicBlock, each reshaped tensor $X'_i \in \mathbb{R}^{d\times p_i\times f_i}$ is embedded into a higher-dimensional space:
$$E'_i = \mathrm{Embedding}(X'_i), \quad i \in \{1, \dots, k\}, \tag{5}$$
where $E'_i \in \mathbb{R}^{d_{embed}\times p_i\times f_i}$ and $d_{embed}$ is the embedding dimension.
Local Extractor To capture specific and comprehensive
intra-period local patterns, we propose a multiscale adaptive
convolution module inspired by the Inception architecture.
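For each of the $f_i$ segment embeddings $E^i_j \in \mathbb{R}^{d_{embed}\times p_i}$, $j \in \{1, \dots, f_i\}$ (the slices of $E'_i$ along its last axis), this module comprises parallel 1D convolutional layers with varying kernel sizes:
$$L^{i,ker}_j = \mathrm{Conv1D}_{ker}(E^i_j), \quad ker \in \{1, 3, 5, 7, 9, 11\}, \tag{6}$$
where $\mathrm{Conv1D}_{ker}$ represents a 1D convolutional layer with kernel size $ker$. The outputs of these convolutions are then averaged over the kernel sizes:
$$L^i_j = \mathrm{Mean}_{ker}\big(L^{i,ker}_j\big), \quad j \in \{1, \dots, f_i\}. \tag{7}$$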
This multi-kernel size Inception-based approach allows
the model to capture local patterns at various granularities,
enhancing its ability to detect relevant features.
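The Inception-style branches of Eqs. (6) and (7) can be sketched as follows for one segment $E^i_j$. Assumptions not stated in the text: the embedding dimension acts as the convolution's channels, each branch uses zero "same" padding so every output keeps length $p_i$, and one set of weights is shared across all $f_i$ segments. Random weights stand in for learned ones, and `conv1d_same`, `local_extractor`, and `init_params` are hypothetical names.

```python
import numpy as np

KERNEL_SIZES = (1, 3, 5, 7, 9, 11)


def conv1d_same(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cross-correlation (as in deep learning Conv1D layers) with zero 'same' padding.

    x: (C_in, p); w: (C_out, C_in, ker) with odd ker; b: (C_out,). Returns (C_out, p).
    """
    _, _, ker = w.shape
    p = x.shape[1]
    pad = ker // 2
    x_pad = np.pad(x, ((0, 0), (pad, pad)))
    out = np.tile(b[:, None], (1, p)).astype(float)
    for tap in range(ker):
        # Contribution of one kernel tap: (C_out, C_in) @ (C_in, p).
        out += w[:, :, tap] @ x_pad[:, tap : tap + p]
    return out


def local_extractor(E: np.ndarray, params: dict) -> np.ndarray:
    """Eqs. (6)-(7) for one segment E of shape (d_embed, p_i): mean of parallel branches."""
    branches = [conv1d_same(E, *params[ker]) for ker in KERNEL_SIZES]
    return np.mean(np.stack(branches), axis=0)


def init_params(d_embed: int, seed: int = 0) -> dict:
    """Random stand-ins for learned weights: one (w, b) pair per kernel size."""
    rng = np.random.default_rng(seed)
    return {ker: (rng.normal(0.0, 0.1, (d_embed, d_embed, ker)), np.zeros(d_embed))
            for ker in KERNEL_SIZES}


# Apply the shared extractor to every segment j of an embedded tensor (d_embed, p_i, f_i).
E_i = np.random.default_rng(1).normal(size=(16, 24, 4))
params = init_params(16)
L_i = np.stack([local_extractor(E_i[:, :, j], params) for j in range(E_i.shape[2])], axis=2)
print(L_i.shape)
```

Averaging rather than concatenating keeps the channel count fixed at $d_{embed}$ regardless of how many kernel sizes are used.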
Global Capturer To model inter-period dependencies and
adapt to different time steps for multiscale periods, we de-
signed a Global Capturer based on the multi-head attention
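mechanism. The Global Capturer transforms each local feature $L^i$, which collects the segment features $L^i_1, \dots, L^i_{f_i}$, into query, key, and value representations:
$$Q^i = W_Q L^i, \quad K^i = W_K L^i, \quad V^i = W_V L^i, \tag{8}$$
where $W_Q$, $W_K$, and $W_V$ are learnable weight matrices. The multi-head attention is then computed as:
$$\mathrm{Attention}(Q^i, K^i, V^i) = \mathrm{Softmax}\left(\frac{Q^i (K^i)^{T}}{\sqrt{d_k}}\right) V^i, \tag{9}$$
$$G^i = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W_O, \tag{10}$$
where each $\mathrm{head}$ applies Eq. (9) within its own $d_k$-dimensional subspace, $h$ is the number of attention heads, $d_k$ is the dimension of each head, and $W_O$ is a learnable output projection matrix. This mechanism allows the model to capture complex global dependencies across all time series segments.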
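The following NumPy sketch illustrates Eqs. (8) to (10), assuming that each of the $f_i$ segment features has been flattened or pooled into one $D$-dimensional token, so attention runs across segments (inter-period). It uses the row-vector convention, so $W_Q L^i$ in Eq. (8) becomes `L @ W_Q`, and it forms heads by splitting $D$ into $h$ blocks of $d_k = D/h$, as in standard multi-head attention. The function names are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    z = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def global_capturer(L, W_Q, W_K, W_V, W_O, h):
    """Sketch of Eqs. (8)-(10). L: (f_i, D), one token per periodic segment.

    All weight matrices are (D, D) and D must be divisible by h.
    Returns G of shape (f_i, D) and the attention weights of shape (h, f_i, f_i).
    """
    n, D = L.shape
    d_k = D // h

    def split_heads(M):
        # (n, D) -> (h, n, d_k): head j uses columns j*d_k .. (j+1)*d_k - 1.
        return M.reshape(n, h, d_k).transpose(1, 0, 2)

    # Eq. (8) in row-vector form: tokens are multiplied on the right.
    Q, K, V = (split_heads(L @ W) for W in (W_Q, W_K, W_V))
    # Eq. (9) for all heads at once: softmax(Q K^T / sqrt(d_k)) V.
    attn = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(d_k), axis=-1)
    heads = attn @ V                                   # (h, n, d_k)
    # Eq. (10): concatenate the heads back to (n, D) and apply W_O.
    G = heads.transpose(1, 0, 2).reshape(n, D) @ W_O
    return G, attn


rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 32))                                   # f_i = 4 segments, D = 32
W_Q, W_K, W_V, W_O = rng.normal(size=(4, 32, 32)) / np.sqrt(32)     # four (32, 32) matrices
G, attn = global_capturer(tokens, W_Q, W_K, W_V, W_O, h=4)
print(G.shape, attn.shape)
```

The returned `attn` tensor is the kind of per-scale attention map that the interpretability analysis combines across scales.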
Adaptive Aggregation and Classification To fuse the k
multiscale periodic features, we employ an adaptive aggre-
gation using weights calculated from the queried ampli-
tudes:
Dataset / Model         LSTNet   LSSL     FEDf.    Flowf.   SCINet   DLinear  PatchTST MICN     TimesNet Crossf.  M.TCN    Ours
                        SIG.'18  ICLR'22  ICLR'22  ICLR'22  NIPS'22  AAAI'23  ICLR'23  ICLR'23  ICLR'23  ICLR'23  ICLR'24
EthanolConcentration    39.9     31.1     31.2     33.8     34.4     36.2     32.8     35.3     35.7     38.0     36.3     43.3
FaceDetection           65.7     66.7     66.0     67.6     68.9     68.0     68.3     65.2     68.6     68.7     70.8     69.8
Handwriting             25.8     24.6     28.0     33.8     23.6     27.0     29.6     25.5     32.1     28.8     30.6     34.4
Heartbeat               77.1     72.7     73.7     77.6     77.5     75.1     74.9     74.7     78.0     77.6     77.2     75.6
JapaneseVowels          98.1     98.4     98.4     98.9     96.0     96.2     97.5     94.6     98.4     99.1     98.8     98.6
PEMS-SF                 86.7     86.1     80.9     86.0     83.8     75.1     89.3     85.5     89.6     85.9     89.1     94.2
SelfRegulationSCP1      84.0     90.8     88.7     92.5     92.5     87.3     90.7     86.0     91.8     92.1     93.4     92.8
SelfRegulationSCP2      52.8     52.2     54.4     56.1     57.2     50.5     57.8     53.6     57.2     58.3     60.3     57.2
SpokenArabicDigits      100      100      100      98.8     98.1     81.4     98.3     97.1     99.0     97.9     98.7     99.5
UWaveGestureLibrary     87.8     85.9     85.3     86.6     85.1     82.1     85.8     82.8     85.3     85.3     86.7     88.1
Ours 1-to-1-Wins        3        4        9        3        3        8        1        10       5        7        5        -
Ours 1-to-1-Draws       1        2        0        1        0        0        0        0        1        0        0        -
Ours 1-to-1-Losses      6        4        1        6        7        2        9        0        4        3        5        -
Avg. accuracy (↑)       71.8     70.9     70.7     73.2     71.7     67.9     72.5     70.0     73.6     73.2     74.2     75.4
Avg. rank (↓)           6        7.5      7.6      4.6      6.7      8.6      6.1      9.2      4.1      4.5      3        2.4
Table 1: Setting 1 experimental results. Performance comparison with the recent advanced General Time Series Analysis frameworks on 10 UEA datasets. The best results are highlighted in bold, while the second best are underlined.
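$$\alpha_i = \mathrm{Softmax}(\mathrm{AMP}_i) = \frac{\exp(\mathrm{AMP}_i)}{\sum_{j=1}^{k} \exp(\mathrm{AMP}_j)}, \quad i \in \{1, \dots, k\}, \tag{11}$$
where $\mathrm{AMP}_i$ represents the amplitude corresponding to the $i$-th selected frequency. The aggregated feature representation is then computed as:
$$Z = \sum_{i=1}^{k} \alpha_i G^i. \tag{12}$$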
This adaptive weighting scheme ensures that the model
emphasizes the most relevant periodic components in the fi-
nal representation.
The aggregated features $Z$ are then passed through the Classification head, typically consisting of fully connected layers followed by a softmax activation:
$$y = \mathrm{Softmax}(W_c Z + b_c), \tag{13}$$
where $W_c$ and $b_c$ are the weight matrix and bias vector of the classification layer, respectively, and $y$ is the predicted class probability distribution.
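Eqs. (11) to (13) reduce to a softmax over the $k$ queried amplitudes, a weighted sum, and a linear layer followed by softmax. In this hedged sketch, the per-scale outputs $G^i$ are assumed to have already been mapped to one common shape (the text implies this but does not say how), $Z$ is flattened before the classifier, and a single linear layer stands in for the classification head. The name `aggregate_and_classify` is hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    z = np.asarray(x, dtype=float)
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def aggregate_and_classify(G_list, amps, W_c, b_c):
    """Sketch of Eqs. (11)-(13).

    G_list: k per-scale features, assumed to share one common shape;
    amps: the k batch amplitudes at the selected frequencies;
    W_c: (n_classes, n_features); b_c: (n_classes,).
    """
    alpha = softmax(amps)                                  # Eq. (11)
    Z = sum(a * G for a, G in zip(alpha, G_list))          # Eq. (12)
    y = softmax(W_c @ Z.reshape(-1) + b_c)                 # Eq. (13), with Z flattened
    return alpha, Z, y


G_list = [np.ones((2, 3)), 3 * np.ones((2, 3))]            # k = 2 scales, common shape (2, 3)
alpha, Z, y = aggregate_and_classify(G_list, [1.0, 1.0], np.zeros((4, 6)), np.zeros(4))
print(alpha, Z[0, 0], y)
```

Because the weights come from a softmax, scales with equal amplitudes contribute equally, while a scale whose amplitude is much larger than the others dominates $Z$.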
By combining the Local Extractor for intra-period local
pattern extraction, the Global Capturer for modeling inter-
period global dependencies, and the adaptive aggregation
mechanism, the PeriodicBlock effectively captures the com-
plex temporal dynamics present in multivariate time series
data across multiple periodic scales.
Experiments
Datasets
The public UEA benchmark datasets (Bagnall et al. 2018),
collected from various real-world applications, constitute a
comprehensive archive for multivariate time series classification across several domains, including human activity, speech, medical EEG, and audio data. We utilize the
UEA benchmark datasets to evaluate the performance of the
proposed MPTSNet. These datasets vary in length, dimensionality, and the size of their training/testing sets.
Baselines
Setting 1 Experiments Based on the experimental results
of papers (Luo and Wang 2024; Wu et al. 2023), this set of
experiments is conducted on 10 UEA datasets and compared
with several recent advanced General Time Series Analysis frameworks, which address tasks including classification,
imputation, forecasting, and anomaly detection. The com-
parison includes ModernTCN (Luo and Wang 2024), Cross-
former (Zhang and Yan 2023), TimesNet (Wu et al. 2023),
MICN (Wang et al. 2023), PatchTST (Nie et al. 2023), Dlin-
ear (Zeng et al. 2023), SCINet (Liu et al. 2022), Flow-
former (Wu et al. 2022), FEDformer (Zhou et al. 2022),
LSSL (Gu, Goel, and Ré 2021), and LSTNet (Lai et al.
2018).
Setting 2 Experiments Following the results of pa-
pers (Liu et al. 2024; Li et al. 2021), this set of ex-
periments is conducted on 25 UEA datasets and com-
pared with recent advanced MTSC-dedicated models, in-
cluding TodyNet (Liu et al. 2024), OS-CNN and MOS-
CNN (Tang et al. 2022), ShapeNet (Li et al. 2021), Tap-
Net (Zhang et al. 2020), MLSTM-FCN (Karim et al. 2019),
and WEASEL+MUSE (Schäfer and Leser 2017). The traditional nearest-neighbor methods EDI, DTWI, and DTWD, based on Euclidean distance and on independent and dependent dynamic time warping, respectively, are also included.
Implementation Details
The benchmark results for all baseline methods are sourced
from their respective publications, ensuring consistent train-
ing parameters across comparisons. Our proposed model
is implemented using a computational infrastructure con-
sisting of a server running Ubuntu 20.04.3 LTS, equipped
with 8 NVIDIA GeForce RTX 3090 GPUs. Performance is evaluated by computing the accuracy, 1-to-1 Wins/Draws/Losses, average accuracy, and average rank.
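The two comparison statistics can be computed as sketched below. The paper does not state how ties or missing ('N/A') entries are handled, so this sketch assumes complete data and assigns tied models the mean of the ranks they span; it is not guaranteed to reproduce the reported numbers exactly. The function names are hypothetical.

```python
import numpy as np


def wins_draws_losses(ours, other):
    """Count datasets where our accuracy is higher than, equal to, or lower than a baseline's."""
    ours, other = np.asarray(ours, dtype=float), np.asarray(other, dtype=float)
    return int((ours > other).sum()), int((ours == other).sum()), int((ours < other).sum())


def average_ranks(acc):
    """acc: (n_datasets, n_models) accuracies; rank 1 is best.

    Tied models share the mean of the ranks they span (two models tied for first get 1.5).
    """
    acc = np.asarray(acc, dtype=float)
    ranks = np.empty_like(acc)
    for r, row in enumerate(acc):
        for m, value in enumerate(row):
            n_better = (row > value).sum()
            n_tied = (row == value).sum()      # includes the model itself
            ranks[r, m] = n_better + (n_tied + 1) / 2
    return ranks.mean(axis=0)


print(wins_draws_losses([0.9, 0.5, 0.7], [0.8, 0.5, 0.9]))
print(average_ranks([[0.9, 0.8, 0.7], [0.6, 0.6, 0.5]]))
```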
Dataset / Model            EDI      DTWI     DTWD     W.+MUSE  M.-FCN   TapNet   ShapeNet OS-CNN   MOS-CNN  TodyNet  Ours
                                                      arXiv'17 Neur.'19 AAAI'20  AAAI'21  ICLR'22  ICLR'22  Info.'24
ArticularyWordRecognition  97.0     98.0     98.7     99.0     97.3     98.7     98.7     98.8     99.1     98.7     97.7
AtrialFibrillation         26.7     26.7     20.0     33.3     26.7     33.3     40.0     23.3     18.3     46.7     53.3
BasicMotions               67.5     100      97.5     100      95.0     100      100      100      100      100      100
Cricket                    94.4     98.6     100      100      91.7     95.8     98.6     99.3     99.0     100      94.4
DuckDuckGeese              27.5     55.0     60.0     57.5     67.5     57.5     72.5     54.0     61.5     58.0     68.0
Epilepsy                   66.7     97.8     96.4     100      76.1     97.1     98.7     98.0     99.6     97.1     97.1
EthanolConcentration       29.3     30.4     32.3     13.3     37.3     32.3     31.2     24.0     41.5     35.0     43.3
ERing                      13.3     13.3     13.3     43.0     13.3     13.3     13.3     88.1     91.5     91.5     94.4
FaceDetection              51.9     51.3     52.9     54.5     54.5     55.6     60.2     57.5     59.7     62.7     69.8
FingerMovements            55.0     52.0     53.0     49.0     58.0     53.0     58.9     56.8     56.8     67.6     64.0
HandMovementDirection      27.9     30.6     23.1     36.5     36.5     37.8     33.8     44.3     36.1     64.9     63.5
Handwriting                37.1     50.9     60.7     60.5     28.6     35.7     45.1     66.8     67.7     43.6     34.4
Heartbeat                  62.0     65.9     71.7     72.7     66.3     75.1     75.6     48.9     60.4     75.6     75.6
Libras                     83.3     89.4     87.2     87.8     85.6     85.0     85.6     95.0     96.5     85.0     87.2
LSST                       45.6     57.5     55.1     59.0     37.3     56.8     59.0     41.3     52.1     61.5     60.4
MotorImagery               51.0     39.0     50.0     50.0     51.0     59.0     61.0     53.5     51.5     64.0     65.0
NATOPS                     86.0     85.0     88.3     87.0     88.9     93.9     88.3     96.8     95.1     97.2     94.4
PenDigits                  97.3     93.9     97.7     94.8     97.8     98.0     97.7     98.5     98.3     98.7     98.9
PEMS-SF                    70.5     73.4     71.1     N/A      69.9     75.1     75.1     76.0     76.4     78.0     94.2
PhonemeSpectra             10.4     15.1     15.1     19.0     11.0     17.5     29.8     29.9     29.5     30.9     14.4
RacketSports               86.8     84.2     80.3     93.4     80.3     86.8     88.2     87.7     92.9     80.3     87.5
SelfRegulationSCP1         77.1     76.5     77.5     71.0     87.4     65.2     78.2     83.5     82.9     89.8     92.8
SelfRegulationSCP2         48.3     53.3     53.9     46.0     47.2     55.0     57.8     53.2     51.0     55.0     57.2
StandWalkJump              20.0     33.3     20.0     33.3     6.7      40.0     53.3     38.3     38.3     46.7     53.3
UWaveGestureLibrary        88.1     86.9     90.3     91.6     89.1     89.4     90.6     92.7     92.6     85.0     88.1
Ours 1-to-1-Wins           15       13       14       10       19       18       12       12       15       14       -
Ours 1-to-1-Draws          2        2        2        2        1        4        1        3        2        3        -
Ours 1-to-1-Losses         8        10       9        13       5        3        12       10       8        8        -
Avg. accuracy (↑)          56.8     62.3     62.6     62.1     60.0     64.3     67.6     68.2     69.9     72.6     74.0
Avg. rank (↓)              7.52     6.48     5.8      5.36     6.4      5.24     3.84     4.28     3.8      3.16     3.12
Table 2: Setting 2 experimental results. Performance comparison with the recent advanced MTSC-dedicated models on 25 UEA
datasets. In the table, ‘N/A’ indicates that the results for the corresponding method could not be obtained due to memory or
computational limitations (Liu et al. 2024).
Experimental Results
Setting 1 Results Table 1 presents the performance com-
parison of our proposed method with 11 advanced General
Time Series Analysis frameworks across 10 diverse MTSC
datasets. Our approach demonstrates superior performance,
achieving an average accuracy of 75.4%, and ranks first in
terms of average rank, outperforming all competitors. A de-
tailed analysis reveals that our method consistently excels
on challenging datasets. For instance, on the EthanolCon-
centration and PEMS-SF datasets, known for their complex
patterns and high dimensionality, our method significantly
outperforms other approaches. Compared with the recent state-of-the-art ModernTCN, our method improves average accuracy by 1.2 percentage points (75.4% vs. 74.2%) and average rank by 0.6 (2.4 vs. 3), underscoring its effectiveness in handling complex multivariate time series data.
Setting 2 Results Our method consistently demonstrates
superior performance across the 25 UEA datasets compared
to 10 advanced MTSC-dedicated models in Table 2. It secures a solid win-loss record, achieving the highest average accuracy of 74.0% and the lowest average rank of 3.12. Notably, our approach achieves the best accuracy (including ties) on eleven of the twenty-five datasets. Moreover, it demonstrates strong
generalization capabilities by yielding competitive results
across diverse domains, including human activity recogni-
tion, healthcare, and speech recognition.
Ablation Studies
Model Design Variants We conducted ablation studies on
MPTSNet with three variants across 10 datasets in Setting 1:
• w/o-LocalExt: Removes the Local Extractor, capturing only global dependencies at each time point.
• w/o-GlobalCap: Removes the Global Capturer, relying solely on local patterns for each segment.
• w/o-MP: Omits the multiscale FFT decomposition, applying the Local Extractor and Global Capturer to the entire time series instead.
The results of our ablation studies, shown in Table 3, provide significant insights into the contributions of the different components of MPTSNet. The full MPTSNet model achieved the highest average accuracy of 0.754, demonstrating the effectiveness of integrating local pattern extraction, global dependency modeling, and multiscale periodic decomposition. Removing the Local Extractor (w/o-LocalExt) and the Global Capturer (w/o-GlobalCap) reduced accuracy to 0.737 and 0.731, respectively, highlighting their critical roles in capturing local and global features. The largest performance drop was observed without multiscale periodic decomposition (w/o-MP), with accuracy falling to 0.709, underscoring the importance of multiscale periodic feature analysis.
Figure 4: Interpretability visualization of the MPTSNet model on the Epilepsy dataset. Left: attention maps at different periodic scales (single example); Right: composite attention maps obtained as a weighted sum of the individual scale maps. MPTSNet captures varying local patterns across periodic scales while achieving enhanced interpretability through the integration of multi-scale attention.
Model Design Variants   Accuracy   Setting of k   Accuracy
MPTSNet                 0.754      k=7            0.751
w/o-LocalExt            0.737      k=5            0.754
w/o-GlobalCap           0.731      k=3            0.742
w/o-MP                  0.709      k=1            0.734
Table 3: The ablation study was conducted on 10 UEA
datasets within Setting 1 experiments. Average accuracies
are reported, with the best performance indicated in bold.
Setting of k The hyperparameter k is the only parameter that must be set in MPTSNet; it determines the number of periodic scales. Experimental results in Table 3 show a consistent improvement in average accuracy as k increases from 1 (0.734) to 5 (0.754), followed by a slight decline at k=7 (0.751). This trend suggests that incorporating multiple periodic scales enhances the model's capacity to capture complex temporal patterns, with k=5 providing the best balance between information extraction and noise avoidance. The marginal decrease at k=7 indicates that extracting too many periodic scales may introduce noise, particularly for datasets with less pronounced periodicity. These findings underscore the importance of judicious k selection, balancing the trade-off between capturing sufficient periodic information and mitigating the risk of overfitting. While k=5 emerges as the optimal value for the diverse datasets examined, further research into adaptive k-selection methods could enhance the generalization capabilities of MPTSNet across varied time series data.
Enhanced Interpretability
MPTSNet offers insight into its decision-making process through interpretability visualizations. By integrating global attention across various periodic scales and weighting each scale accordingly, we can pinpoint the time intervals that contribute most to the classification of each time series. As an example, Fig. 4 visualizes the attention maps on the Epilepsy dataset. This dataset, sourced from the UEA archive, contains wrist motion recordings from multiple participants engaged in four distinct activities: Walking, Running, Sawing, and Seizure Mimicking. Our analysis is conducted on three randomly selected samples from each class.
The left portion of the figure presents the attention maps from different periodic scales, which clearly show that the model's attention is drawn to different temporal locations at different periodic scales. To obtain a holistic
view, we computed composite attention maps by combin-
ing individual scale attention maps, weighted by their cor-
responding amplitudes. These composite maps, presented in
the right panel, are juxtaposed with the original signals. The
resulting visualizations reveal distinct patterns for different
classes while maintaining similarities within the same class.
This observation underscores the ability of MPTSNet to dif-
ferentiate between various time series classes by identifying
and exploiting key temporal patterns and extracting mean-
ingful representations.
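One way to build such a composite map is sketched below. The text does not specify how an $f_i \times f_i$ segment-level attention map is projected onto time steps, so this sketch makes a hypothetical choice: each segment's score is the mean attention it receives (the column mean), that score is repeated over the segment's $p_i$ time steps, padded steps beyond $l$ are dropped, and the per-scale curves are summed with the amplitude weights $\alpha_i$. The function name `composite_attention` is hypothetical.

```python
import numpy as np


def composite_attention(attn_maps, periods, alphas, l):
    """Hypothetical composite attention curve over the l original time steps.

    attn_maps: k arrays of shape (f_i, f_i), e.g. attention averaged over heads;
    periods: the k period lengths p_i; alphas: the k amplitude weights.
    """
    composite = np.zeros(l)
    for A, p, a in zip(attn_maps, periods, alphas):
        received = A.mean(axis=0)                 # attention each segment receives, (f_i,)
        per_step = np.repeat(received, p)[:l]     # spread over p_i steps, drop padding
        composite += a * per_step
    return composite


A1 = np.full((2, 2), 0.5)                          # scale 1: f = 2 segments of p = 3 steps
A2 = np.array([[1.0, 0.0, 0.0]] * 3)               # scale 2: f = 3 segments of p = 2 steps
print(composite_attention([A1, A2], [3, 2], [0.5, 0.5], l=6))
```

In this toy case the second scale concentrates all attention on its first segment, so the first two time steps receive the highest composite score.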
Conclusion
To improve the performance of multivariate time series classification, we introduce MPTSNet, a novel deep learning framework. Our approach treats periodicity as a fundamental time scale and leverages the complementary strengths of CNNs and attention mechanisms to learn multiscale local and global features within time series data. Extensive experiments on the UEA benchmark datasets demonstrate the effectiveness of MPTSNet across diverse real-world applications. By visualizing attention maps, we provide insight into how MPTSNet identifies crucial temporal patterns, enabling more interpretable and reliable classification results.
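Using periodicity as a time scale starts from extracting dominant periods with the Fourier transform and splitting the series into period-length segments. The sketch below illustrates this for a single univariate channel under our own assumptions: it uses a naive DFT to stay self-contained (a practical implementation would use an FFT library), skips the zero-frequency term, maps each of the k strongest frequencies f to the period T // f, and zero-pads the tail when T is not a multiple of the period. The helper names are hypothetical, and how MPTSNet aggregates amplitudes across variables is not modeled here.

```python
import cmath
from typing import List, Sequence, Tuple


def dft_amplitudes(x: Sequence[float]) -> List[float]:
    """Magnitude of the DFT at frequencies 0..T//2 (naive O(T^2), for illustration)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n)))
        for f in range(n // 2 + 1)
    ]


def top_k_periods(x: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Return (period, amplitude) pairs for the k strongest non-zero frequencies."""
    n = len(x)
    amps = dft_amplitudes(x)
    # Skip f = 0 (the mean), which carries no periodicity.
    strongest = sorted(range(1, len(amps)), key=lambda f: amps[f], reverse=True)[:k]
    return [(n // f, amps[f]) for f in strongest]


def segment(x: Sequence[float], period: int) -> List[List[float]]:
    """Split x into consecutive length-`period` segments, zero-padding the tail."""
    padded = list(x) + [0.0] * (-len(x) % period)
    return [padded[i:i + period] for i in range(0, len(padded), period)]


# A length-16 signal with period 4: the strongest non-zero frequency is f = 4.
signal = [0.0, 1.0, 0.0, -1.0] * 4
for period, amp in top_k_periods(signal, k=1):
    print(period, round(amp, 6), segment(signal, period))
```

With k = 5, the value that performed best in the ablation above, `top_k_periods` would return five (period, amplitude) pairs; the amplitudes are the kind of quantity that could weight the per-scale attention maps in the interpretability analysis.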
Acknowledgments
The work is jointly supported by the German Federal Ministry of Education and Research (BMBF) in the framework of the international future AI lab “AI4EO – Artificial Intelligence for Earth Observation: Reasoning, Uncertainties, Ethics and Beyond” (grant number: 01DD20001), by the German Federal Ministry for Economic Affairs and Climate Action in the framework of the “national center of excellence ML4Earth” (grant number: 50EE2201C), by the European Union’s Horizon Europe research and innovation actions programme in the framework of the “Multi-source and Multi-scale Earth observation and Novel Machine Learning Methods for Mineral Exploration and Mine Site Monitoring (MultiMiner)” (grant number: 101091374), by the Excellence Strategy of the Federal Government and the Länder through the TUM Innovation Network EarthCare, and by the Munich Center for Machine Learning.
References
An, Q.; Rahman, S.; Zhou, J.; and Kang, J. J. 2023. A comprehensive
review on machine learning in healthcare industry:
classification, restrictions, opportunities and challenges.
Sensors, 23(9): 4178.
Bagnall, A.; Dau, H. A.; Lines, J.; Flynn, M.; Large, J.;
Bostrom, A.; Southam, P.; and Keogh, E. 2018. The UEA
multivariate time series classification archive, 2018. arXiv
preprint arXiv:1811.00075.
Baydogan, M. G.; Runger, G.; and Tuv, E. 2013. A bag-of-features
framework to classify time series. IEEE Transactions
on Pattern Analysis and Machine Intelligence, 35(11):
2796–2802.
Cai, W.; Liang, Y.; Liu, X.; Feng, J.; and Wu, Y. 2024. MSGNet:
Learning multi-scale inter-series correlations for multivariate
time series forecasting. In Proceedings of the AAAI
Conference on Artificial Intelligence, volume 38, 11141–11149.
Chen, X.; Qiu, P.; Zhu, W.; Li, H.; Wang, H.; Sotiras, A.;
Wang, Y.; and Razi, A. 2024. TimeMIL: Advancing Multivariate
Time Series Classification via a Time-aware Multiple
Instance Learning. arXiv preprint arXiv:2405.03140.
Cui, Z.; Chen, W.; and Chen, Y. 2016. Multi-scale convolutional
neural networks for time series classification. arXiv
preprint arXiv:1603.06995.
Farahani, M. A.; McCormick, M.; Gianinny, R.; Hudacheck,
F.; Harik, R.; Liu, Z.; and Wuest, T. 2023. Time-series pattern
recognition in Smart Manufacturing Systems: A literature
review and ontology. Journal of Manufacturing Systems,
69: 208–241.
Gu, A.; Goel, K.; and Ré, C. 2021. Efficiently modeling
long sequences with structured state spaces. arXiv preprint
arXiv:2111.00396.
Ienco, D.; and Gaetano, R. 2007. Tiselac: time series land
cover classification challenge. TiSeLaC: Time Series Land
Cover Classification Challenge, 2.
Karim, F.; Majumdar, S.; Darabi, H.; and Chen, S. 2017.
LSTM fully convolutional networks for time series classification.
IEEE Access, 6: 1662–1669.
Karim, F.; Majumdar, S.; Darabi, H.; and Harford, S.
2019. Multivariate LSTM-FCNs for time series classification.
Neural Networks, 116: 237–245.
Lai, G.; Chang, W.-C.; Yang, Y.; and Liu, H. 2018. Modeling
long- and short-term temporal patterns with deep neural networks.
In The 41st International ACM SIGIR Conference on
Research & Development in Information Retrieval, 95–104.
Lai, Z.-H.; Zhang, T.-H.; Liu, Q.; Qian, X.; Wei, L.-F.; Chen,
S.-L.; Chen, F.; and Yin, X.-C. 2023. InterFormer: Interactive
Local and Global Features Fusion for Automatic Speech
Recognition. arXiv preprint arXiv:2305.16342.
Li, G.; Choi, B.; Xu, J.; Bhowmick, S. S.; Chun, K.-P.; and
Wong, G. L.-H. 2021. ShapeNet: A shapelet-neural network
approach for multivariate time series classification. In Proceedings
of the AAAI Conference on Artificial Intelligence,
volume 35, 8375–8383.
Li, Y.; Yang, G.; Su, Z.; Li, S.; and Wang, Y. 2023a. Human
activity recognition based on multienvironment sensor data.
Information Fusion, 91: 47–63.
Li, Z.; Rao, Z.; Pan, L.; and Xu, Z. 2023b. MTS-Mixers:
Multivariate time series forecasting via factorized temporal
and channel mixing. arXiv preprint arXiv:2302.04501.
Lines, J.; Davis, L. M.; Hills, J.; and Bagnall, A. 2012. A
shapelet transform for time series classification. In Proceedings
of the 18th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, 289–297.
Liu, H.; Yang, D.; Liu, X.; Chen, X.; Liang, Z.; Wang, H.;
Cui, Y.; and Gu, J. 2024. TodyNet: temporal dynamic graph
neural network for multivariate time series classification.
Information Sciences, 120914.
Liu, M.; Zeng, A.; Chen, M.; Xu, Z.; Lai, Q.; Ma, L.; and
Xu, Q. 2022. SCINet: Time series modeling and forecasting
with sample convolution and interaction. Advances in Neural
Information Processing Systems, 35: 5816–5828.
Luo, D.; and Wang, X. 2024. ModernTCN: A modern pure
convolution structure for general time series analysis. In
The Twelfth International Conference on Learning Representations.
Nie, Y.; Nguyen, N. H.; Sinthong, P.; and Kalagnanam, J.
2023. A Time Series is Worth 64 Words: Long-term Forecasting
with Transformers. In The Eleventh International
Conference on Learning Representations.
Rußwurm, M.; Courty, N.; Emonet, R.; Lefèvre, S.; Tuia, D.;
and Tavenard, R. 2023. End-to-end learned early classification
of time series for in-season crop type mapping. ISPRS
Journal of Photogrammetry and Remote Sensing, 196: 445–456.
Schäfer, P.; and Leser, U. 2017. Multivariate time series
classification with WEASEL+MUSE. arXiv preprint
arXiv:1711.11343.
Tang, W.; Long, G.; Liu, L.; Zhou, T.; Blumenstein, M.; and
Jiang, J. 2022. Omni-Scale CNNs: a simple and effective
kernel size configuration for time series classification. In
International Conference on Learning Representations.
Wang, H.; Peng, J.; Huang, F.; Wang, J.; Chen, J.; and Xiao,
Y. 2023. MICN: Multi-scale local and global context modeling
for long-term series forecasting. In The Eleventh International
Conference on Learning Representations.
Wang, W. K.; Chen, I.; Hershkovich, L.; Yang, J.; Shetty, A.;
Singh, G.; Jiang, Y.; Kotla, A.; Shang, J. Z.; Yerrabelli, R.;
et al. 2022. A systematic review of time series classification
techniques used in biomedical applications. Sensors, 22(20):
8016.
Wen, Q.; Zhou, T.; Zhang, C.; Chen, W.; Ma, Z.; Yan, J.; and
Sun, L. 2022. Transformers in time series: A survey. arXiv
preprint arXiv:2202.07125.
Wu, H.; Hu, T.; Liu, Y.; Zhou, H.; Wang, J.; and Long, M.
2023. TimesNet: Temporal 2D-Variation Modeling for General
Time Series Analysis. In The Eleventh International
Conference on Learning Representations.
Wu, H.; Wu, J.; Xu, J.; Wang, J.; and Long, M. 2022. Flowformer:
Linearizing transformers with conservation flows.
arXiv preprint arXiv:2202.06258.
Yang, J.; Nguyen, M. N.; San, P. P.; Li, X.; and Krishnaswamy,
S. 2015. Deep convolutional neural networks on
multichannel time series for human activity recognition. In
IJCAI, volume 15, 3995–4001. Buenos Aires, Argentina.
Yu, W.; Kim, I. Y.; and Mechefske, C. 2021. Analysis of
different RNN autoencoder variants for time series classification
and machine prognostics. Mechanical Systems and
Signal Processing, 149: 107322.
Zeng, A.; Chen, M.; Zhang, L.; and Xu, Q. 2023. Are transformers
effective for time series forecasting? In Proceedings
of the AAAI Conference on Artificial Intelligence, volume 37,
11121–11128.
Zhang, T.; Zhang, Y.; Cao, W.; Bian, J.; Yi, X.; Zheng, S.;
and Li, J. 2022. Less is more: Fast multivariate time series
forecasting with light sampling-oriented MLP structures.
arXiv preprint arXiv:2207.01186.
Zhang, X.; Gao, Y.; Lin, J.; and Lu, C.-T. 2020. TapNet:
Multivariate time series classification with attentional prototypical
network. In Proceedings of the AAAI Conference on
Artificial Intelligence, volume 34, 6845–6852.
Zhang, Y.; and Yan, J. 2023. Crossformer: Transformer utilizing
cross-dimension dependency for multivariate time series
forecasting. In The Eleventh International Conference on
Learning Representations.
Zhao, B.; Lu, H.; Chen, S.; Liu, J.; and Wu, D. 2017. Convolutional
neural networks for time series classification. Journal
of Systems Engineering and Electronics, 28(1): 162–169.
Zhao, J.; Li, Q.; Hong, Y.; and Shen, M. 2024. MetaRockETC:
Adaptive Encrypted Traffic Classification in Complex
Network Environments via Time Series Analysis and Meta-Learning.
IEEE Transactions on Network and Service Management.
Zheng, Y.; Liu, Q.; Chen, E.; Ge, Y.; and Zhao, J. L. 2014.
Time series classification using multi-channels deep convolutional
neural networks. In International Conference on
Web-Age Information Management, 298–310. Springer.
Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; and Jin, R.
2022. FEDformer: Frequency enhanced decomposed transformer
for long-term series forecasting. In International
Conference on Machine Learning, 27268–27286. PMLR.
Zuo, R.; Li, G.; Choi, B.; Bhowmick, S. S.; Mah, D. N.-y.;
and Wong, G. L. 2023. SVP-T: a shape-level variable-position
transformer for multivariate time series classification.
In Proceedings of the AAAI Conference on Artificial
Intelligence, volume 37, 11497–11505.