arxiv

Paper 2503.05582

MPTSNet: Integrating Multiscale Periodic Local Patterns and Global Dependencies for Multivariate Time Series Classification

Authors: Yang Mu, Muhammad Shahzad, Xiao Xiang Zhu

Published: 2025-03-07

Abstract:

Multivariate Time Series Classification (MTSC) is crucial in extensive practical applications, such as environmental monitoring, medical EEG analysis, and action recognition. Real-world time series datasets typically exhibit complex dynamics. To capture this complexity, RNN-based, CNN-based, Transformer-based, and hybrid models have been proposed. Unfortunately, current deep learning-based methods often neglect the simultaneous construction of local features and global dependencies at different time scales, lacking sufficient feature extraction capabilities to achieve satisfactory classification accuracy. To address these challenges, we propose a novel Multiscale Periodic Time Series Network (MPTSNet), which integrates multiscale local patterns and global correlations to fully exploit the inherent information in time series. Recognizing the multi-periodicity and complex variable correlations in time series, we use the Fourier transform to extract primary periods, enabling us to decompose data into multiscale periodic segments. Leveraging the inherent strengths of CNN and attention mechanism, we introduce the PeriodicBlock, which adaptively captures local patterns and global dependencies while offering enhanced interpretability through attention integration across different periodic scales. The experiments on UEA benchmark datasets demonstrate that the proposed MPTSNet outperforms 21 existing advanced baselines in the MTSC tasks.

Paper Content:
Yang Mu1, Muhammad Shahzad1, Xiao Xiang Zhu1,2*
1 Technical University of Munich, Germany
2 Munich Center for Machine Learning (MCML), Germany
{yang.mu, muhammad.shahzad, xiaoxiang.zhu}@tum.de
Code: https://github.com/MUYang99/MPTSNet
Datasets: https://timeseriesclassification.com/dataset.php

*Corresponding author
Copyright © 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Introduction

Multivariate time series classification (MTSC) has garnered significant attention due to its widespread applications across various domains. This classification technique plays a crucial role in analyzing complex, multi-dimensional temporal data, offering insights that are invaluable in fields ranging from healthcare (Wang et al. 2022; An et al. 2023), human activity recognition (Yang et al. 2015; Li et al. 2023a), and traffic (Zhao et al. 2024) to environmental monitoring (Ienco and Gaetano 2007; Rußwurm et al. 2023) and industrial processes (Farahani et al. 2023).

Figure 1: Varying correlations between the subsequences of two variables across different periodic scales in MTS data.

The challenges in solving MTSC tasks stem from two primary factors. Firstly, time series data inherently possesses periodic characteristics, presenting similar trends across different periods, which leads to data redundancy. Simultaneously, hidden information overlaps and interacts across multiple periods (Wu et al. 2023), complicating the exploration of time series data. Secondly, time series subsequences among different variables exhibit varying degrees of correlation or even opposite correlations at different periodic scales (Cai et al. 2024) in MTS data, as shown in Fig. 1. Thus, the extraction of local intra-period features and the capture of global inter-period dependencies at multiple periodic scales become paramount for effective MTSC.

Traditional approaches to time series classification, such as bag-of-patterns (Baydogan, Runger, and Tuv 2013) or shapelet-based methods (Lines et al.
2012), typically require a preprocessing step to transform time series into a wide range of subsequences or patterns as candidate features. Subsequently, discriminative subsequences are selected from these candidates for classification purposes. However, this approach often results in an expansive feature space, which not only complicates the feature selection process but may also lead to decreased accuracy (Schäfer and Leser 2017), particularly in multivariate settings where the complexity of data is inherently higher.

arXiv:2503.05582v1 [cs.LG] 7 Mar 2025

Figure 2: Multiscale periodic feature analysis of time series data using FFT. By decomposing time series into multiple frequency components, it reveals both local patterns and global dependencies across different periodic scales.

Recently, deep neural networks have demonstrated remarkable potential in various time series tasks, outperforming traditional methods in many aspects. RNN-based methods (Karim et al. 2017; Yu, Kim, and Mechefske 2021) utilize loops within their architecture to maintain and propagate information across time steps, but their ability to capture long-term dependencies is limited by the vanishing gradient problem. CNN-based methods (Cui, Chen, and Chen 2016; Zhao et al. 2017) excel at learning spatial hierarchies of features through convolutional filters. Their inductive bias allows them to capture local patterns effectively, but they are less adept at modeling global features comprehensively. Transformer-based methods (Wen et al. 2022; Zuo et al. 2023) leverage self-attention mechanisms to model dependencies between different positions in the input sequence, adeptly handling long-range dependencies. However, they struggle to effectively extract features from adjacent time points that exhibit localized pattern characteristics.
While these models have shown promise, they often fall short in simultaneously constructing local features and global dependencies within time series data across different periodic scales, resulting in suboptimal feature extraction.

To address these limitations, we propose a general framework called Multiscale Periodic Time Series Network (MPTSNet). The core concept, illustrated in Fig. 2, involves transforming time-domain data into frequency-domain representations using the Fast Fourier Transform (FFT). This approach decomposes complex time series into multiple frequency components, revealing both local intra-period patterns and global inter-period dependencies across various periodic scales. The architecture of MPTSNet is supported by the PeriodicBlock, which decomposes time series into multiscale periodic segments. These segments are then processed by two key components sequentially: an Inception-based Local Extractor, which captures corresponding local features, and an Attention-based Global Capturer, which extracts global dependencies. This modular architecture enables MPTSNet to effectively analyze time series data at multiple periodic scales. Our contributions can be summarized in three key aspects:

• We propose an end-to-end general framework MPTSNet that addresses the multi-periodicity of time series and complex correlations among variable subsequences at different time scales. This novel architecture is capable of extracting multiscale periodic features effectively.
• Inspired by the strengths of CNN in capturing local features and attention mechanism in modeling global dependencies, we design a PeriodicBlock comprising an Inception-based Local Extractor and an Attention-based Global Capturer to extract both local features and global dependencies from time series data.
• Our extensive experiments on real-world datasets demonstrate that the proposed MPTSNet outperforms existing advanced baselines in MTSC tasks.
  Furthermore, MPTSNet offers enhanced interpretability through its ability to localize temporal patterns with integrated global attention across different periodic scales.

Related Work

Multivariate Time Series Classification

Recent years have witnessed the emergence of various deep learning models for MTSC. These can be broadly categorized into three main groups: 1) CNN-based models. The Multichannel Deep Convolutional Neural Network (MCDCNN) (Zheng et al. 2014) pioneered the application of CNNs to MTSC by capturing intra-variable features through one-dimensional convolutions and combining them with fully connected layers. More recently, OS-CNN (Tang et al. 2022) has been proposed, featuring an innovative Omni-Scale block that uses a set of prime numbers as convolution kernel sizes to effectively cover receptive fields across all scales, optimizing performance across different datasets. 2) RNN/CNN hybrid models. Models such as LSTM-FCN (Karim et al. 2017) and MLSTM-FCN (Karim et al. 2019) have been introduced, combining LSTM layers to capture short- and long-term dependencies with stacked CNN layers to extract features from the time series. 3) Transformer-based models. These models have gained prominence in General Time Series Analysis, addressing multiple tasks including classification, imputation, short-term and long-term forecasting, and anomaly detection (Chen et al. 2024). Notable examples include FEDformer (Zhou et al. 2022), Flowformer (Wu et al. 2022), PatchTST (Nie et al. 2023), and Crossformer (Zhang and Yan 2023), which have been developed and refined in recent years, leveraging their excellent scaling behaviors. Additionally, MLP-based (Zhang et al. 2022; Li et al. 2023b), GNN-based (Liu et al. 2024) and MIL-based (Chen et al. 2024) models have also achieved impressive performance.
Since MIL-based models employ a binary paradigm for multi-class problems, unlike other approaches, they are excluded from quantitative comparisons.

However, a key assumption in many of these models is that the correlations between different variables remain constant across various time resolutions. This assumption often leads to inadequate representation of pairwise relationships between variables at different periodic scales.

Local and Global Modeling for Sequence

Modeling local patterns and global dependencies has proven effective for correlation learning and feature extraction in sequence modeling (Wang et al. 2023). In speech recognition, the SOTA model Interformer (Lai et al. 2023) employs convolution to extract local information and self-attention to capture long-term dependencies in parallel branches. However, this parallel processing of local and global features might introduce redundancy, potentially reducing efficiency. In general time series analysis, TimesNet (Wu et al. 2023) transforms 1D time series into various 2D tensors using CNN-based layers to capture both local and global variations. Directly capturing local and global features with CNNs may lead to a loss in precision. Additionally, while TimesNet provides visualizations of the 2D transformations, interpreting these in the context of the original 1D time series can be challenging. In time series forecasting, MSGNet (Cai et al. 2024) employs a graph convolution module for inter-series correlation learning and a multi-head attention module for intra-series correlation learning. However, it introduces increased complexity and computational cost due to its adaptive graph convolution module; MICN (Wang et al.
2023) combines local feature extraction and global correlation modeling using a convolution-based structure; however, its performance can be highly sensitive to the choice of hyperparameters, such as scale sizes in the multi-scale convolution, requiring meticulous tuning for optimal results. Moreover, the use of full CNN layers in MICN sacrifices interpretability.

To address these existing limitations, we propose MPTSNet, which sequentially applies convolutional and attention modules to capture local and global features. Because only the number of extraction scales must be specified, the other parameters are adaptively configured based on time series characteristics, reducing sensitivity to parameter tuning and avoiding redundant computations. Moreover, this approach enhances the interpretability of MTSC by integrating multi-scale attention maps.

Methodology

Problem Formulation

MTSC involves the analysis of a set of time series data, where each series is composed of multiple variables observed over time. An MTS sample is formally defined as $X = \{x_1, x_2, \ldots, x_d\} \in \mathbb{R}^{d \times l}$, where $d$ is the number of variables and $l$ is the length of the series. In the context of MTSC, the goal is to classify these time series into predefined categories. Given a dataset $\mathcal{X} = \{X_1, X_2, \ldots, X_m\} \in \mathbb{R}^{m \times d \times l}$ with corresponding labels $\eta = \{y_1, y_2, \ldots, y_m\}$, where $m$ denotes the number of time series, the task is to train a classifier $f(\cdot)$ capable of predicting the label $y$ for new, unlabeled MTS data.

Model Architecture Overview

The overview of our proposed MPTSNet is shown in Fig. 3, which is designed to extract local patterns and global dependencies at various periodic scales. Initially, the model identifies the main periods from the entire training dataset using FFT. The core component, PeriodicBlock, processes time series data in parallel. It begins by segmenting batch data into multiple scales based on the identified main periods.
Subsequently, the Local Extractor and Global Capturer are employed within the PeriodicBlock to extract intra-period local and inter-period global features sequentially, as depicted in the right panel of Fig. 3. To account for potential discrepancies in frequency-amplitude relationships between batch data and the overall training set, FFT is also applied to each batch. This process queries the amplitudes corresponding to the main periods, which are then used as weights calculated by Softmax to aggregate multiscale periodic features. The PeriodicBlock outputs a weighted sum of features extracted from different periodic scales. Residual connections between stacked PeriodicBlocks enhance information flow. Finally, the extracted features are fed into the Classification head to generate predicted labels.

Main Periods Identification

Grounded in the inherent characteristics of time series data, we aim to enhance MTSC accuracy by exploiting two fundamental properties: multi-periodicity and complex correlations among variables across different periodic scales, as mentioned before. The identification of main periods is a critical step in our methodology, as it serves as the foundation for subsequent multiscale analysis.

Inspired by the work of Wu et al. (2023), we employ the FFT to identify the principal periods. This technique effectively converts time-domain data into the frequency domain, allowing for a comprehensive analysis of the underlying periodic structures. The process can be formalized as follows:

$$\mathrm{AMP} = \mathrm{Avg}\big(\mathrm{Amplitudes}(\mathrm{FFT}(X \in \mathbb{R}^{d \times l}))\big), \tag{1}$$

where $X \in \mathbb{R}^{d \times l}$ represents the input time series, $d$ denotes the number of variables, and $l$ is the sequence length. The function $\mathrm{FFT}(\cdot)$ computes the Fast Fourier Transform, while $\mathrm{Amplitudes}(\cdot)$ calculates the amplitude values. $\mathrm{Avg}(\cdot)$ averages the amplitudes across all variables, resulting in $\mathrm{AMP} \in \mathbb{R}^{l}$, which represents the mean amplitude for each frequency.

$$f_1, \ldots, f_k = \underset{f_* \in \{1, \ldots, \lfloor l/2 \rfloor\}}{\arg\mathrm{Topk}}(\mathrm{AMP}), \tag{2}$$

$$p_i = l / f_i, \quad i \in \{1, \ldots, k\}. \tag{3}$$

To focus on the most significant periodic components, we select the top $k$ amplitudes using the $\arg\mathrm{Topk}$ function. This operation yields the frequencies $f_1, \ldots, f_k$, which correspond to the $k$ most prominent periods $p_1, \ldots, p_k$. The parameter $k$ is a hyperparameter that determines the number of periodic scales to consider in subsequent analyses.

Figure 3: Overview of the Multiscale Periodic Time Series Network (MPTSNet). The model processes multivariate time series data through FFT to identify the main periods. The data is then segmented into multiscale periodic components, where Local Extractor and Global Capturer in PeriodicBlock extract features sequentially at different scales. The features are aggregated by corresponding amplitudes and passed through the Classification head for final prediction.

The identification of these main periods establishes the groundwork for multiscale analysis. This approach allows us to decompose the original time series into multiple periodic components, and such decomposition is crucial for understanding the complex temporal dynamics present in MTS data.

The extracted periods serve as the basis for reshaping the input data into multiple representations, each corresponding to a different time scale. This transformation can be expressed as:

$$X'^{\,i} = \mathrm{Reshape}_{p_i, f_i}(\mathrm{Pad}(X)), \quad i \in \{1, \ldots, k\}, \tag{4}$$

where $\mathrm{Pad}(\cdot)$ extends the time series with zeros to ensure compatibility with the reshaping operation, and $X'^{\,i} \in \mathbb{R}^{d \times p_i \times f_i}$ represents the $i$-th reshaped time series based on the $i$-th periodic scale.

PeriodicBlock

The PeriodicBlock serves as the core component of our MPTSNet architecture, designed to extract both local and global features from the reshaped time series data at different periodic scales. Prior to processing by the PeriodicBlock, each reshaped tensor $X'^{\,i} \in \mathbb{R}^{d \times p_i \times f_i}$ is embedded into a higher-dimensional space:

$$E'^{\,i} = \mathrm{Embedding}(X'^{\,i}), \quad i \in \{1, \ldots, k\}, \tag{5}$$

where $E'^{\,i} \in \mathbb{R}^{d_{embed} \times p_i \times f_i}$ and $d_{embed}$ is the embedding dimension.

Local Extractor. To capture specific and comprehensive intra-period local patterns, we propose a multiscale adaptive convolution module inspired by the Inception architecture. For each of the $f_i$ segment embeddings $E^{i}_{j} \in \mathbb{R}^{d_{embed} \times p_i}$, this module comprises parallel 1D convolutional layers with varying kernel sizes:

$$L^{i,ker}_{j} = \mathrm{Conv1D}_{ker}(E^{i}_{j}), \quad ker \in \{1, 3, 5, 7, 9, 11\}, \tag{6}$$

where $\mathrm{Conv1D}_{ker}$ represents a 1D convolutional layer with kernel size $ker$. The outputs of these convolutions are then merged across kernel sizes; Eq. (7) does so by averaging:

$$L^{i}_{j} = \mathrm{Mean}(L^{i,ker}_{j}), \quad j \in \{1, \ldots, f_i\}. \tag{7}$$

This multi-kernel size Inception-based approach allows the model to capture local patterns at various granularities, enhancing its ability to detect relevant features.

Global Capturer. To model inter-period dependencies and adapt to different time steps for multiscale periods, we designed a Global Capturer based on the multi-head attention mechanism. The Global Capturer transforms each local feature $L^{i}$ into query, key, and value representations:

$$Q^{i} = W_Q L^{i}, \quad K^{i} = W_K L^{i}, \quad V^{i} = W_V L^{i}, \tag{8}$$

where $W_Q$, $W_K$, and $W_V$ are learnable weight matrices. The multi-head attention is then computed as:

$$\mathrm{Attention}(Q^{i}, K^{i}, V^{i}) = \mathrm{Softmax}\left(\frac{Q^{i} (K^{i})^{T}}{\sqrt{d_k}}\right) V^{i}, \tag{9}$$

$$G^{i} = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W_O, \tag{10}$$

where $h$ is the number of attention heads, $d_k$ is the dimension of each head, and $W_O$ is a learnable output projection matrix. This mechanism allows the model to capture complex global dependencies across all time series segments.

| Data/Model | LSTNet (SIG.'18) | LSSL (ICLR'22) | FEDf. (ICLR'22) | Flowf. (ICLR'22) | SCINet (NIPS'22) | Dlinear (AAAI'23) | PatchTST (ICLR'23) | MICN (ICLR'23) | TimesNet (ICLR'23) | Crossf. (ICLR'23) | M.TCN (ICLR'24) | Ours |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EthanolConcentration | 39.9 | 31.1 | 31.2 | 33.8 | 34.4 | 36.2 | 32.8 | 35.3 | 35.7 | 38.0 | 36.3 | 43.3 |
| FaceDetection | 65.7 | 66.7 | 66.0 | 67.6 | 68.9 | 68.0 | 68.3 | 65.2 | 68.6 | 68.7 | 70.8 | 69.8 |
| Handwriting | 25.8 | 24.6 | 28.0 | 33.8 | 23.6 | 27.0 | 29.6 | 25.5 | 32.1 | 28.8 | 30.6 | 34.4 |
| Heartbeat | 77.1 | 72.7 | 73.7 | 77.6 | 77.5 | 75.1 | 74.9 | 74.7 | 78.0 | 77.6 | 77.2 | 75.6 |
| JapaneseVowels | 98.1 | 98.4 | 98.4 | 98.9 | 96.0 | 96.2 | 97.5 | 94.6 | 98.4 | 99.1 | 98.8 | 98.6 |
| PEMS-SF | 86.7 | 86.1 | 80.9 | 86.0 | 83.8 | 75.1 | 89.3 | 85.5 | 89.6 | 85.9 | 89.1 | 94.2 |
| SelfRegulationSCP1 | 84.0 | 90.8 | 88.7 | 92.5 | 92.5 | 87.3 | 90.7 | 86.0 | 91.8 | 92.1 | 93.4 | 92.8 |
| SelfRegulationSCP2 | 52.8 | 52.2 | 54.4 | 56.1 | 57.2 | 50.5 | 57.8 | 53.6 | 57.2 | 58.3 | 60.3 | 57.2 |
| SpokenArabicDigits | 100 | 100 | 100 | 98.8 | 98.1 | 81.4 | 98.3 | 97.1 | 99.0 | 97.9 | 98.7 | 99.5 |
| UWaveGestureLibrary | 87.8 | 85.9 | 85.3 | 86.6 | 85.1 | 82.1 | 85.8 | 82.8 | 85.3 | 85.3 | 86.7 | 88.1 |
| Ours 1-to-1-Wins | 3 | 4 | 9 | 3 | 3 | 8 | 1 | 10 | 5 | 7 | 5 | - |
| Ours 1-to-1-Draws | 1 | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | - |
| Ours 1-to-1-Losses | 6 | 4 | 1 | 6 | 7 | 2 | 9 | 0 | 4 | 3 | 5 | - |
| Avg. accuracy (↑) | 71.8 | 70.9 | 70.7 | 73.2 | 71.7 | 67.9 | 72.5 | 70.0 | 73.6 | 73.2 | 74.2 | 75.4 |
| Avg. rank (↓) | 6 | 7.5 | 7.6 | 4.6 | 6.7 | 8.6 | 6.1 | 9.2 | 4.1 | 4.5 | 3 | 2.4 |

Table 1: Setting 1 experimental results. Performance comparison with the recent advanced General Time Series Analysis frameworks on 10 UEA datasets. The best results are highlighted in bold, while the second best are underlined.

Adaptive Aggregation and Classification. To fuse the $k$ multiscale periodic features, we employ an adaptive aggregation using weights calculated from the queried amplitudes:

$$\alpha_i = \mathrm{Softmax}(\mathrm{AMP}_i), \quad i \in \{1, \ldots, k\}, \tag{11}$$

where $\mathrm{AMP}_i$ represents the amplitude corresponding to the $i$-th selected frequency. The aggregated feature representation is then computed as:

$$Z = \sum_{i=1}^{k} \alpha_i G^{i}. \tag{12}$$

This adaptive weighting scheme ensures that the model emphasizes the most relevant periodic components in the final representation.
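The period identification, segmentation, and amplitude-weighted aggregation steps of Eqs. (1)-(4) and (11)-(12) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `main_periods`, `segment`, and `aggregate` are hypothetical, rounding $l / f_i$ up to an integer period is an assumption made so that zero-padding to $p_i \cdot f_i$ always works, and the per-scale features passed to `aggregate` are assumed to have already been brought to a common shape.

```python
import numpy as np

def main_periods(X, k):
    """Eqs. (1)-(3): pick the k frequencies with the largest mean FFT amplitude.

    X has shape (..., l); amplitudes are averaged over every leading axis
    (samples and variables). Returns (freqs, periods, amp).
    """
    l = X.shape[-1]
    amp = np.abs(np.fft.rfft(X, axis=-1))          # frequency bins 0 .. l // 2
    amp = amp.reshape(-1, amp.shape[-1]).mean(axis=0)
    amp[0] = 0.0                                   # Eq. (2) only considers f >= 1
    freqs = np.argsort(amp)[::-1][:k]              # indices of the k largest amplitudes
    periods = -(-l // freqs)                       # ceil(l / f); rounding up is an assumption
    return freqs, periods, amp

def segment(x, f, p):
    """Eq. (4): zero-pad a (d, l) series to length p * f, fold into f segments of length p."""
    d, l = x.shape
    padded = np.pad(x, ((0, 0), (0, p * f - l)))
    return padded.reshape(d, f, p)

def softmax(z):
    e = np.exp(z - z.max())                        # shift for numerical stability
    return e / e.sum()

def aggregate(features, amplitudes):
    """Eqs. (11)-(12): Z = sum_i alpha_i * G_i with alpha = Softmax(AMP_i).

    `features` is a list of k arrays of identical shape (an assumption).
    """
    alpha = softmax(np.asarray(amplitudes, dtype=float))
    return sum(a * g for a, g in zip(alpha, features)), alpha

# Toy series: 2 variables, length 96, with a strong period of 24 and a weaker period of 8.
t = np.arange(96)
x = np.stack([
    np.sin(2 * np.pi * t / 24) + 0.5 * np.sin(2 * np.pi * t / 8),
    0.8 * np.cos(2 * np.pi * t / 24) + 0.3 * np.sin(2 * np.pi * t / 8),
])
freqs, periods, amp = main_periods(x, k=2)
segments = [segment(x, f, p) for f, p in zip(freqs, periods)]
print(freqs, periods, [s.shape for s in segments])
```

On this toy input the two selected frequencies are 4 and 12, giving periods 24 and 8 and segment tensors of shape (2, 4, 24) and (2, 12, 8). Because raw FFT amplitudes can differ by large factors, the softmax in `aggregate` may concentrate almost all weight on the dominant scale; whether the amplitudes are normalized before the softmax is not stated in the text.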
The aggregated features $Z$ are then passed through the Classification head, typically consisting of fully connected layers followed by a softmax activation:

$$y = \mathrm{Softmax}(W_c Z + b_c), \tag{13}$$

where $W_c$ and $b_c$ are the weight matrix and bias vector of the classification layer, respectively, and $y$ is the predicted class probability distribution.

By combining the Local Extractor for intra-period local pattern extraction, the Global Capturer for modeling inter-period global dependencies, and the adaptive aggregation mechanism, the PeriodicBlock effectively captures the complex temporal dynamics present in multivariate time series data across multiple periodic scales.

Experiments

Datasets

The public UEA benchmark datasets (Bagnall et al. 2018), collected from various real-world applications, constitute a comprehensive archive for multivariate time series classification across several domains, including human activity, speech, medical EEG, and audio data. We utilize the UEA benchmark datasets to evaluate the performance of the proposed MPTSNet. These datasets vary in length, dimensions, and the size of their training/testing sets.

Baselines

Setting 1 Experiments. Based on the experimental results of papers (Luo and Wang 2024; Wu et al. 2023), this set of experiments is conducted on 10 UEA datasets and compared with several recent advanced General Time Series Analysis frameworks, which address tasks including classification, imputation, forecasting, and anomaly detection. The comparison includes ModernTCN (Luo and Wang 2024), Crossformer (Zhang and Yan 2023), TimesNet (Wu et al. 2023), MICN (Wang et al. 2023), PatchTST (Nie et al. 2023), Dlinear (Zeng et al. 2023), SCINet (Liu et al. 2022), Flowformer (Wu et al. 2022), FEDformer (Zhou et al. 2022), LSSL (Gu, Goel, and Ré 2021), and LSTNet (Lai et al. 2018).

Setting 2 Experiments. Following the results of papers (Liu et al. 2024; Li et al.
2021), this set of experiments is conducted on 25 UEA datasets and compared with recent advanced MTSC-dedicated models, including TodyNet (Liu et al. 2024), OS-CNN and MOS-CNN (Tang et al. 2022), ShapeNet (Li et al. 2021), TapNet (Zhang et al. 2020), MLSTM-FCN (Karim et al. 2019), and WEASEL+MUSE (Schäfer and Leser 2017). The traditional methods EDI, DTWI, and DTWD based on Euclidean Distance, dynamic time warping, and the nearest neighbor classifier are also included.

Implementation Details

The benchmark results for all baseline methods are sourced from their respective publications, ensuring consistent training parameters across comparisons. Our proposed model is implemented using a computational infrastructure consisting of a server running Ubuntu 20.04.3 LTS, equipped with 8 NVIDIA GeForce RTX 3090 GPUs. The performance is evaluated by computing the accuracy, 1-to-1 Wins/Draws/Losses, average accuracy, and average rank.

| Data/Model | EDI | DTWI | DTWD | W.+MUSE (arxiv'17) | M.-FCN (Neur.'19) | TapNet (AAAI'20) | ShapeNet (AAAI'21) | OS-CNN (ICLR'22) | MOS-CNN (ICLR'22) | TodyNet (Info.'24) | Ours |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ArticularyWordRecognition | 97.0 | 98.0 | 98.7 | 99.0 | 97.3 | 98.7 | 98.7 | 98.8 | 99.1 | 98.7 | 97.7 |
| AtrialFibrillation | 26.7 | 26.7 | 20.0 | 33.3 | 26.7 | 33.3 | 40.0 | 23.3 | 18.3 | 46.7 | 53.3 |
| BasicMotions | 67.5 | 100 | 97.5 | 100 | 95.0 | 100 | 100 | 100 | 100 | 100 | 100 |
| Cricket | 94.4 | 98.6 | 100 | 100 | 91.7 | 95.8 | 98.6 | 99.3 | 99.0 | 100 | 94.4 |
| DuckDuckGeese | 27.5 | 55.0 | 60.0 | 57.5 | 67.5 | 57.5 | 72.5 | 54.0 | 61.5 | 58.0 | 68.0 |
| Epilepsy | 66.7 | 97.8 | 96.4 | 100 | 76.1 | 97.1 | 98.7 | 98.0 | 99.6 | 97.1 | 97.1 |
| EthanolConcentration | 29.3 | 30.4 | 32.3 | 13.3 | 37.3 | 32.3 | 31.2 | 24.0 | 41.5 | 35.0 | 43.3 |
| ERing | 13.3 | 13.3 | 13.3 | 43.0 | 13.3 | 13.3 | 13.3 | 88.1 | 91.5 | 91.5 | 94.4 |
| FaceDetection | 51.9 | 51.3 | 52.9 | 54.5 | 54.5 | 55.6 | 60.2 | 57.5 | 59.7 | 62.7 | 69.8 |
| FingerMovements | 55.0 | 52.0 | 53.0 | 49.0 | 58.0 | 53.0 | 58.9 | 56.8 | 56.8 | 67.6 | 64.0 |
| HandMovementDirection | 27.9 | 30.6 | 23.1 | 36.5 | 36.5 | 37.8 | 33.8 | 44.3 | 36.1 | 64.9 | 63.5 |
| Handwriting | 37.1 | 50.9 | 60.7 | 60.5 | 28.6 | 35.7 | 45.1 | 66.8 | 67.7 | 43.6 | 34.4 |
| Heartbeat | 62.0 | 65.9 | 71.7 | 72.7 | 66.3 | 75.1 | 75.6 | 48.9 | 60.4 | 75.6 | 75.6 |
| Libras | 83.3 | 89.4 | 87.2 | 87.8 | 85.6 | 85.0 | 85.6 | 95.0 | 96.5 | 85.0 | 87.2 |
| LSST | 45.6 | 57.5 | 55.1 | 59.0 | 37.3 | 56.8 | 59.0 | 41.3 | 52.1 | 61.5 | 60.4 |
| MotorImagery | 51.0 | 39.0 | 50.0 | 50.0 | 51.0 | 59.0 | 61.0 | 53.5 | 51.5 | 64.0 | 65.0 |
| NATOPS | 86.0 | 85.0 | 88.3 | 87.0 | 88.9 | 93.9 | 88.3 | 96.8 | 95.1 | 97.2 | 94.4 |
| PenDigits | 97.3 | 93.9 | 97.7 | 94.8 | 97.8 | 98.0 | 97.7 | 98.5 | 98.3 | 98.7 | 98.9 |
| PEMS-SF | 70.5 | 73.4 | 71.1 | N/A | 69.9 | 75.1 | 75.1 | 76.0 | 76.4 | 78.0 | 94.2 |
| PhonemeSpectra | 10.4 | 15.1 | 15.1 | 19.0 | 11.0 | 17.5 | 29.8 | 29.9 | 29.5 | 30.9 | 14.4 |
| RacketSports | 86.8 | 84.2 | 80.3 | 93.4 | 80.3 | 86.8 | 88.2 | 87.7 | 92.9 | 80.3 | 87.5 |
| SelfRegulationSCP1 | 77.1 | 76.5 | 77.5 | 71.0 | 87.4 | 65.2 | 78.2 | 83.5 | 82.9 | 89.8 | 92.8 |
| SelfRegulationSCP2 | 48.3 | 53.3 | 53.9 | 46.0 | 47.2 | 55.0 | 57.8 | 53.2 | 51.0 | 55.0 | 57.2 |
| StandWalkJump | 20.0 | 33.3 | 20.0 | 33.3 | 6.7 | 40.0 | 53.3 | 38.3 | 38.3 | 46.7 | 53.3 |
| UWaveGestureLibrary | 88.1 | 86.9 | 90.3 | 91.6 | 89.1 | 89.4 | 90.6 | 92.7 | 92.6 | 85.0 | 88.1 |
| Ours 1-to-1-Wins | 15 | 13 | 14 | 10 | 19 | 18 | 12 | 12 | 15 | 14 | - |
| Ours 1-to-1-Draws | 2 | 2 | 2 | 2 | 1 | 4 | 1 | 3 | 2 | 3 | - |
| Ours 1-to-1-Losses | 8 | 10 | 9 | 13 | 5 | 3 | 12 | 10 | 8 | 8 | - |
| Avg. accuracy (↑) | 56.8 | 62.3 | 62.6 | 62.1 | 60.0 | 64.3 | 67.6 | 68.2 | 69.9 | 72.6 | 74.0 |
| Avg. rank (↓) | 7.52 | 6.48 | 5.8 | 5.36 | 6.4 | 5.24 | 3.84 | 4.28 | 3.8 | 3.16 | 3.12 |

Table 2: Setting 2 experimental results. Performance comparison with the recent advanced MTSC-dedicated models on 25 UEA datasets. In the table, 'N/A' indicates that the results for the corresponding method could not be obtained due to memory or computational limitations (Liu et al. 2024).

Experimental Results

Setting 1 Results. Table 1 presents the performance comparison of our proposed method with 11 advanced General Time Series Analysis frameworks across 10 diverse MTSC datasets. Our approach achieves the highest average accuracy (75.4%) and the best average rank (2.4) among all compared methods. Its gains are largest on challenging datasets: on EthanolConcentration and PEMS-SF, known for their complex patterns and high dimensionality, our method reaches 43.3% and 94.2%, compared with 39.9% and 89.6% for the best baseline on each. In direct comparison to the recent state-of-the-art ModernTCN, our method improves average accuracy by 1.2 percentage points and average rank by 0.6, underscoring its effectiveness in handling complex multivariate time series data.

Setting 2 Results. Across the 25 UEA datasets in Table 2, our method compares favorably with 10 advanced MTSC-dedicated models. It secures a solid win-loss record, achieving the highest average accuracy of 74.0% and the lowest average rank of 3.12. Notably, our approach surpasses competitors in eleven of the twenty-five datasets. Moreover, it demonstrates strong generalization capabilities by yielding competitive results across diverse domains, including human activity recognition, healthcare, and speech recognition.
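The summary rows reported in the result tables (1-to-1 Wins/Draws/Losses, average accuracy, average rank) can be reproduced with a short NumPy sketch. The accuracies below are copied from three rows of Table 2 for three of the methods; the helper names `win_draw_loss` and `average_ranks` are hypothetical, and giving tied methods the mean of the ranks they span is an assumption, since the text does not state how ties are ranked.

```python
import numpy as np

# Accuracies (%) copied from Table 2; columns are MOS-CNN, TodyNet, Ours.
models = ["MOS-CNN", "TodyNet", "Ours"]
acc = np.array([
    [18.3, 46.7, 53.3],   # AtrialFibrillation
    [60.4, 75.6, 75.6],   # Heartbeat
    [67.7, 43.6, 34.4],   # Handwriting
])

def win_draw_loss(ours, other):
    """Count datasets where `ours` is above, equal to, or below `other`."""
    return (int((ours > other).sum()),
            int((ours == other).sum()),
            int((ours < other).sum()))

def average_ranks(acc):
    """Mean per-dataset rank (1 = best); tied methods share the average rank (assumption)."""
    # better[r, j]: number of methods strictly more accurate than method j on dataset r
    better = (acc[:, None, :] > acc[:, :, None]).sum(axis=2)
    # tied[r, j]: number of methods with the same accuracy as method j (including itself)
    tied = (acc[:, None, :] == acc[:, :, None]).sum(axis=2)
    ranks = 1 + better + (tied - 1) / 2
    return ranks.mean(axis=0)

ours = acc[:, models.index("Ours")]
for j, name in enumerate(models[:-1]):
    print(name, win_draw_loss(ours, acc[:, j]))
print("avg accuracy:", acc.mean(axis=0).round(1))
print("avg rank:", average_ranks(acc).round(2))
```

On this three-dataset subset, the proposed method records 2 wins and 1 loss against MOS-CNN and 1 win, 1 draw, and 1 loss against TodyNet. These counts describe only the subset shown here, not the full 25-dataset comparison.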
Ablation Studies

Model Design Variants. We conducted ablation studies on MPTSNet with three variants across the 10 datasets in Setting 1:

• w/o-LocalExt: Removes the Local Extractor, capturing only global dependencies at each time point.
• w/o-GlobalCap: Removes the Global Capturer, relying solely on local patterns for each segment.
• w/o-MP: Omits the multi-scale FFT decomposition, applying the Local Extractor and Global Capturer to the entire time series instead.

The results of our ablation studies in Table 3 show how much each component of MPTSNet contributes. The full MPTSNet model achieved the highest average accuracy of 0.754, demonstrating the effectiveness of integrating local pattern extraction, global dependency modeling, and multiscale periodic decomposition. Removing the Local Extractor (w/o-LocalExt) and the Global Capturer (w/o-GlobalCap) resulted in reduced accuracies of 0.737 and 0.731, respectively, highlighting their critical roles in capturing local and global features. The most significant performance drop was observed without Multiscale Periodic decomposition (w/o-MP), with accuracy falling to 0.709, underscoring the importance of multiscale periodic feature analysis.

Figure 4: Interpretability visualization of the MPTSNet model on the Epilepsy dataset. Left: attention maps at different periodic scales (single example); Right: composite attention maps by weighted summing the individual scale maps. MPTSNet captures varying local patterns across periodic scales while achieving enhanced interpretability by the integration of multi-scale attention.

| Model Design Variants | Accuracy | Setting of k | Accuracy |
|---|---|---|---|
| MPTSNet | 0.754 | k=7 | 0.751 |
| w/o-LocalExt | 0.737 | k=5 | 0.754 |
| w/o-GlobalCap | 0.731 | k=3 | 0.742 |
| w/o-MP | 0.709 | k=1 | 0.734 |

Table 3: The ablation study was conducted on 10 UEA datasets within Setting 1 experiments. Average accuracies are reported, with the best performance indicated in bold.
Setting of k. The hyperparameter $k$ is the sole parameter that needs to be set in MPTSNet to determine the number of periodic scales. Experimental results in Table 3 show a consistent improvement in average accuracy as $k$ increases from 1 (0.734) to 5 (0.754), followed by a slight decline at $k=7$ (0.751). This trend suggests that incorporating multiple periodic scales enhances the model's capacity to capture complex temporal patterns, with $k=5$ providing the best balance between information extraction and noise avoidance. The marginal performance decrease at $k=7$ suggests that excessive periodic scale extraction may introduce noise, particularly for datasets with less pronounced periodicity. These findings underscore the importance of judicious $k$ selection, balancing the trade-off between capturing sufficient periodic information and mitigating the risk of overfitting. While $k=5$ emerges as the best value for the diverse datasets examined, further research into adaptive $k$-selection methods could potentially enhance the generalization capabilities of MPTSNet across varied time series data.

Enhanced Interpretability

Our proposed MPTSNet model offers insight into its decision-making process through interpretability visualization. By integrating global attention across various periodic scales and weighting them accordingly, we can pinpoint the specific time intervals that contribute most significantly to the classification of each time series. As an example, we illustrate the visualization of the attention maps using the Epilepsy dataset in Fig. 4. This dataset, sourced from the UEA archive, encompasses wrist motion recordings from multiple participants engaged in four distinct activities: Walking, Running, Sawing, and Seizure Mimicking. Our analysis is conducted on three randomly selected samples from each class.
The left portion of the figure presents the attention maps from different periodic scales, which show that the model's attention is drawn to varying temporal locations across periodic scales. To obtain a holistic view, we computed composite attention maps by combining the individual scale attention maps, weighted by their corresponding amplitudes. These composite maps, presented in the right panel, are juxtaposed with the original signals. The resulting visualizations reveal distinct patterns for different classes while maintaining similarities within the same class. This observation underscores the ability of MPTSNet to differentiate between time series classes by identifying and exploiting key temporal patterns and extracting meaningful representations.

Conclusion

To improve the performance of multivariate time series classification, we introduce MPTSNet, a novel deep learning framework. Our approach incorporates periodicity as a fundamental time scale and leverages the complementary strengths of CNN and attention mechanisms to thoroughly learn multiscale local and global features within time series data. Extensive experiments demonstrate the effectiveness of MPTSNet in various real-world applications. By visualizing attention maps, we provide insights into how MPTSNet identifies crucial temporal patterns, enabling more interpretable and reliable classification results.
Acknowledgments

The work is jointly supported by the German Federal Ministry of Education and Research (BMBF) in the framework of the international future AI lab “AI4EO – Artificial Intelligence for Earth Observation: Reasoning, Uncertainties, Ethics and Beyond” (grant number: 01DD20001), by the German Federal Ministry for Economic Affairs and Climate Action in the framework of the “national center of excellence ML4Earth” (grant number: 50EE2201C), by the European Union’s Horizon Europe research and innovations actions programme in the framework of the “Multi-source and Multi-scale Earth observation and Novel Machine Learning Methods for Mineral Exploration and Mine Site Monitoring (MultiMiner)” (grant number: 101091374), by the Excellence Strategy of the Federal Government and the Länder through the TUM Innovation Network EarthCare, and by the Munich Center for Machine Learning.

---