TriD-MAE: Pre-Trained Model for Multivariate Time Series
 


The 2023 ZIBS Forum, themed "Innovating for the Future: The Frontier in Business Excellence", was successfully held on January 13. The forum focused on topics such as data intelligence, environmental science, financial technology, digital innovation, and corporate restructuring, sharing the latest research findings, innovative discoveries, and practical experience on cutting-edge business developments, and offering insightful perspectives and strategies for the sustainable development of the global business, technology, and education sectors.

 


 

In the Young Scholars Forum, ZIBS Haina Research Scientist LI Chao presented on "TriD-MAE: A Generic Pre-Trained Model for Multivariate Time Series with Missing Values", providing an effective tool for handling complex time series data with missing information. Here's the review:

 


 

 

01

 

 

Background and Motivation

 

In this work, we focus on multivariate time series (MTS), the most common kind of data in industrial settings. We use this data for prediction, classification, and anomaly detection, in order to judge whether a system is healthy and what state it is in.

Such analysis requires a complete data set. In an actual system, however, the sensors collect data from the environment and transmit it through a PLC to the RTU (Remote Terminal Unit), which then transmits the data onward to the companies. When something goes wrong, data goes missing: when a sensor fails, individual data points are missing; when the RTU fails, a whole line of the data matrix is lost at once, which is called line missing; and when all the data in a time slot is lost, it is called block missing. A large enough failure leaves the data entirely missing. With this data missing, we cannot perform the original analysis, because we no longer have a complete data set.
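To make these patterns concrete, here is a minimal NumPy sketch of the binary observation mask. The array size, the failure positions, and the reading of a "line" as a full row across channels are illustrative assumptions, not taken from the talk:

```python
import numpy as np

T, D = 8, 4                      # time steps x sensor channels (illustrative)
M = np.ones((T, D), dtype=int)   # 1 = observed, 0 = missing

M[2, 1] = 0      # point missing: one sensor drops a single reading
M[4, :] = 0      # line missing: all channels lost at one time step
M[5:7, :] = 0    # block missing: a contiguous time slot lost across channels

print(M)
```

A point failure zeroes a single entry, a line failure zeroes a whole row, and a block failure zeroes every channel over a time interval; a total failure would zero the entire matrix.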

 


 

So what can we do? There are two phases of methods:

1. Missing data imputation;

2. Downstream task training.

In the imputation phase, we impute the incomplete data to obtain a full data set; we then train on this data for the different downstream tasks.
However, here we find some issues:

1. Biased imputation;

2. Training redundancy (prediction, classification, anomaly detection, etc.);

3. Tricky missing patterns in MTS data (line missing / block missing).

First, no matter what imputation method you use, it introduces bias, so there is a biased-imputation problem. Second, there is a training-redundancy problem: imputation already performs feature extraction, and the prediction or classification model then performs it again. Third, MTS data has tricky missing patterns, such as line missing and block missing, which many methods cannot handle.

So here we propose a new pipeline, which we call the generic pre-trained pipeline.

 

02

 

 

Main Contribution of TriD-MAE

 

The first generic MTS missing-data representation learning framework serving as a precursor to other specialized downstream models;

The biased-imputation problem mentioned above is thoroughly avoided by our proposed DPE mechanism, which theoretically manages all missing patterns;

The proposed TriD-TCN structure, serving as the fundamental unit, is capable of generating solid, transferable encoded embeddings that facilitate fast downstream convergence.

 

03

 

 

Problem Statement

 

We have an input X and a binary masking matrix M that marks which entries are observed. We use an auto-encoder for this work: the key step is to project X into a hidden representation H through an encoder, reconstruct it with a decoder, and make H cover as much information about the complete data as possible.
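A minimal formalization of this setup, written out as I understand it from the description above (the masking split \(\widetilde{M}\) and the weighting \(\lambda\) are assumptions for illustration, not taken from the paper):

```latex
% X in R^{T x D}: MTS input; M in {0,1}^{T x D}: observation mask (1 = observed)
\begin{aligned}
H       &= f_{\theta}\bigl(X \odot M,\, M\bigr)  && \text{(encoder)} \\
\hat{X} &= g_{\phi}(H)                           && \text{(decoder)} \\
\mathcal{L}(\theta,\phi) &=
   \bigl\lVert \widetilde{M} \odot (\hat{X} - X) \bigr\rVert_F^{2}
   \;+\; \lambda\, \bigl\lVert M \odot (\hat{X} - X) \bigr\rVert_F^{2}
\end{aligned}
```

Here \(\odot\) is the element-wise product and \(\widetilde{M}\) marks the observed entries that are artificially masked during training; the two terms correspond to the reconstruction error on the masked part and on the observed part, which the loss function in Section 04 combines.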

 


 

04

 

 

Model Architecture

 

We introduce several specially designed architectural components:

1. The Dilated Causal Convolution Network (DCCN), which extracts and integrates both local and global sequential information, since we are dealing with multiple time series (see the sketch after this list);

2. The TriD-TCN Unit, which can handle high-dimensional, long multivariate time series;

3. The TriD-TCN Block, built from TriD-TCN Units, which works as an independent computing module in the AE and the DPE;

4. The Dynamic Positional Embedding (DPE), which combines time-varying and variable-varying characteristics;

5. Mask learning and the loss function. We use a mask to deal with the missing patterns, and we have designed a special loss function that reconstructs the error on the masked part as well as on all the observed parts (also shown in the sketch below).
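To make items 1 and 5 above concrete, here is a minimal PyTorch sketch of a dilated causal convolution stack plus a masked reconstruction loss. It shows the standard building blocks the talk names, not the actual TriD-TCN unit; the layer count, the residual wiring, and the `alpha` weight between the two loss terms are illustrative assumptions:

```python
import torch
import torch.nn as nn

class DilatedCausalConv(nn.Module):
    """One dilated causal convolution: left-pad the sequence so the
    output at time t depends only on inputs at times <= t."""
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation       # causal left padding
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              dilation=dilation)

    def forward(self, x):                             # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))       # pad the past side only
        return torch.relu(self.conv(x))

class DilatedStack(nn.Module):
    """Stack with exponentially growing dilations (1, 2, 4, ...), so the
    receptive field covers both local and global context."""
    def __init__(self, channels, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            DilatedCausalConv(channels, dilation=2 ** i)
            for i in range(n_layers))

    def forward(self, x):
        for layer in self.layers:
            x = x + layer(x)                          # residual connection
        return x

def masked_reconstruction_loss(x_hat, x, train_mask, observed_mask, alpha=1.0):
    """Mean squared reconstruction error on the artificially masked entries
    plus the originally observed entries (both masks are 0/1 float tensors);
    `alpha` weighs the two terms and is an assumed hyperparameter."""
    err = (x_hat - x) ** 2
    masked_term = (err * train_mask).sum() / train_mask.sum().clamp(min=1)
    observed_term = (err * observed_mask).sum() / observed_mask.sum().clamp(min=1)
    return masked_term + alpha * observed_term

x = torch.randn(2, 8, 64)                             # (batch, channels, time)
print(DilatedStack(channels=8)(x).shape)              # torch.Size([2, 8, 64])
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth, which is how a compact convolutional stack can integrate local detail and long-range context at the same time.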

 


 

05

 

 

Experiments

 

We tested the method on public data sets from the UCI Machine Learning Repository, including the Appliances Energy Prediction data, the Beijing Multi-Site Air-Quality data, and SML2010.

We compared it with several downstream models, including DARNN64, TPA-RNN64, Transformer128, Informer128, and Autoformer128, all of which perform very well on time series.

We also compared it with traditional missing-data handling methods, including multiple imputation, BRITS (Bidirectional Recurrent Imputation for Time Series), and E2GAN.

We validated our method, focusing on the system's missing patterns under different missing rates, ranging from 10% to 80%. When the missing rate is very low, our method performs on par with the traditional methods; but as the missing rate climbs to 80%, our method outperforms them.

We also ran further comparisons against the traditional pipeline. We compared our pipeline, in which sophisticated downstream models are involved in the fine-tuning stage, with the traditional pipeline to verify the "Extreme Performance" of the proposed framework. Our TriD-MAE serves as the precursor of these downstream models, and each downstream model shares the same hyperparameters as its counterpart in the traditional pipeline.

 

06

 

 

Conclusion

 

TriD-MAE contributions:

1. The first generic MTS missing-data representation learning framework serving as a precursor to other specialized downstream models;

2. The biased-imputation problem mentioned above is thoroughly avoided by our proposed DPE mechanism, which theoretically manages all missing patterns;

3. The proposed TriD-TCN structure, serving as the fundamental unit, is capable of generating solid, transferable encoded embeddings that facilitate fast downstream convergence.

Our future work is to extend the model to spatial-temporal data and to strengthen its capability to handle long sequences.

 

*This article is based on the speech made by ZIBS Haina Research Scientist LI Chao at the 2023 ZIBS Forum. The views and opinions expressed in this article are those of the speaker and do not necessarily reflect the views or positions of ZIBS.

 

 

LI Chao

Haina Research Scientist


Dr. LI Chao is currently a Haina Research Scientist at ZIBS. He obtained his BSc (2012) and PhD (2019) degrees from Zhejiang University and spent four years as a research assistant at the College of Control Science and Engineering, Zhejiang University. His primary research interests include data mining and its applications in smart cities and industrial intelligence. His work has been published in leading academic journals such as IEEE TITS, TMC, TCSS, National Science Open, and Communications Physics, among others.

 
