loader
Generating audio...

arxiv

Paper 2412.14738

Training Graph Neural Networks Using Non-Robust Samples

Authors: Yongyu Wang

Published: 2024-12-19

Abstract:

Graph Neural Networks (GNNs) are a highly effective neural network architecture for processing graph -- structured data. Unlike traditional neural networks that rely solely on the features of the data as input, GNNs leverage both the graph structure, which represents the relationships between data points, and the feature matrix of the data to optimize their feature representation. This unique capability enables GNNs to achieve superior performance across various tasks. However, it also makes GNNs more susceptible to noise from both the graph structure and data features, which can significantly increase the training difficulty and degrade their performance. To address this issue, this paper proposes a novel method for selecting noise-sensitive training samples from the original training set to construct a smaller yet more effective training set for model training. These samples are used to help improve the model's ability to correctly process data in noisy environments. We have evaluated our approach on three of the most classical GNN models -- GCN, GAT, and GraphSAGE -- as well as three widely used benchmark datasets: Cora, Citeseer, and PubMed. Our experiments demonstrate that the proposed method can substantially boost the training of Graph Neural Networks compared to using randomly sampled training sets of the same size from the original training set and the larger original full training set.

Paper Content:
Page 1: Training Graph Neural Networks Using Non-Robust Samples Yongyu Wang JD Logistics wangyongyu1@jd.com Abstract Graph Neural Networks (GNNs) are a highly ef- fective neural network architecture for processing graph-structured data. Unlike traditional neural networks that rely solely on the features of the data as input, GNNs leverage both the graph struc- ture, which represents the relationships between data points, and the feature matrix of the data to optimize their feature representation. This unique capability enables GNNs to achieve superior per- formance across various tasks. However, it also makes GNNs more susceptible to noise from both the graph structure and data features, which can significantly increase the training difficulty and degrade their performance. To address this issue, this paper proposes a novel method for selecting noise-sensitive training samples from the origi- nal training set to construct a smaller yet more effective training set for model training. These samples are used to help improve the model’s ability to correctly process data in noisy environ- ments. We have evaluated our approach on three of the most classical GNN models—GCN, GAT, and GraphSAGE—as well as three widely used benchmark datasets: Cora, Citeseer, and PubMed. Our experiments demonstrate that the proposed method can substantially boost the training of Graph Neural Networks compared to using ran- domly sampled training sets of the same size from the original training set and the larger original full training set. 1. Introduction The rise of Graph Neural Networks (GNNs) has brought about a transformative shift in the field of machine learning, particularly for tasks involving graph-structured data (Kipf & Welling, 2016; Veli ˇckovi ´c et al., 2017; Hu et al., 2020; Zhou et al., 2020). These networks have demonstrated suc- cess across a wide array of real-world scenarios, including recommendation systems (Fan et al., 2019), traffic forecast- ing (Yu et al., 2017), chip design (Mirhoseini et al., 2021),language processing (Song et al., 2019) and social network analytics(Ying et al., 2018). The unique advantage of GNNs lies in their ability to effectively leverage the graph that represents the relationships between data points to optimize their feature representations. This enables GNNs to fully uti- lize the crucial graph topology inherent in graph-structured data, unlike traditional neural networks, which optimize data representations based solely on feature vectors, thus overlooking the valuable graph structure. However, like other neural networks, GNNs are susceptible to noise in the feature space(Goodfellow et al., 2018; Fawzi et al., 2018). Additionally, the introduction of graph struc- ture introduces another layer of interference, as GNNs are also affected by noise within the graph itself. Research has shown that even slight modifications to the graph—such as adding, removing, or rearranging a small number of edges—can significantly impact the performance of GNNs (Z¨ugner et al., 2018; Xu et al., 2019). The feature vectors contaminated by noise, along with er- roneous and redundant edges in the graph topology, can be viewed as adversarial attacks that mislead the GNNs. To ad- dress this issue, researchers have proposed various methods to reduce the impact of noise on the model’s performance. These approaches can be divided into two categories: re- active and proactive defenses. Reactive defenses focus on identifying adversarial examples within the model’s inputs, as explored in previous studies(Feinman et al., 2017; Metzen et al., 2017; Yang et al., 2020). On the other hand, proactive defenses aim to strengthen the model’s robustness, making it less susceptible to adversarial attacks, as demonstrated by methods proposed in works such as (Agarwal et al., 2019; Liu et al., 2018). However, due to the diversity of noise types and model structures, developing effective methods to defend against noise across various scenarios remains a significant chal- lenge. Unlike existing approaches that focus on mitigating the effects of noise, we argue that, given the inevitability of noise, the focus should shift toward how to better leverage it. From this new perspective, this paper proposes a framework that first identifies the most noise-sensitive training samples and then uses these samples to construct a smaller but more 1arXiv:2412.14738v6 [cs.LG] 21 Jan 2025 Page 2: Training Graph Neural Networks Using Non-Robust Samples effective training set for training the GNN model. This approach enhances the model’s ability to correctly process noisy data. We evaluated our proposed training method using three classical GNN models—GCN (Kipf & Welling, 2016), GAT (Veli ˇckovi ´c et al., 2017), and GraphSAGE (Hamil- ton et al., 2017)—on three widely used benchmark datasets for GNN performance evaluation: Cora, Citeseer, and PubMed. The experimental results demonstrate that our method achieves the best training performance compared to two other approaches: a training set of the same size con- structed by random sampling from the original full training set, and the larger original full training set. 2. Preliminaries 2.1. The Vulnerability of GNNs to Noise The vulnerability of Graph Neural Networks (GNNs) to noise refers to the difficulty GNNs face in maintaining con- sistent outputs when exposed to perturbations in the graph structure, such as the presence of incorrect or redundant edges in the input graph, or errors in the feature representa- tions within the input feature matrix. (Sharma et al., 2023) investigated the task- and model-agnostic vulnerabilities of GNNs, showing that these networks are very vulnerable to adversarial perturbations, regardless of the downstream task. 2.2. Spectral Graph Theory Spectral Graph Theory is a mathematical method that uses the eigenvalues and eigenvectors of the Laplacian matrix of a graph to study its properties. Given a graph G= (V, E, w ), where Vrepresents the set of vertices, Eis the set of edges, andwis a function assigning positive weights to each edge. The Laplacian matrix of graph G, which is a symmetric diagonally dominant (SDD) matrix, is defined as follows: L(p, q) =  −w(p, q) if(p, q)∈E,P (p,t)∈Ew(p, t)ifp=q, 0 otherwise .(1) For better numerical stability and to make the Laplacian ma- trix scale-invariant, the normalized Laplacian matrix Lnorm can be computed as: Lnorm=D−1/2LD−1/2 According to spectral graph theory, analyzing a graph often requires only one or a few specific eigenvalues and their corresponding eigenvectors (Chung, 1997). For example, when analyzing the clustering structure of a graph usingspectral graph theory, only the smallest eigenvalues and their corresponding eigenvectors are useful(V on Luxburg, 2007). Therefore, for a Laplacian matrix, it is unnecessary to compute all the eigenpairs through direct eigen decompo- sition, and for a large-scale graph, the computational cost would be too high. The Courant-Fischer Minimax Theo- rem(Golub & Van Loan, 2013) allows the eigenvalues to be computed iteratively without explicitly calculating all the eigenvalues. Specifically: The k-th largest eigenvalue of the Laplacian matrix L∈ R|V|×|V|can be computed as follows: λk(L) = min dim(U)=kmax x∈U, x̸=0xTLx xTx In many practical applications, the problem-solving process involves not just a single matrix, but rather the relationship between two matrices. Therefore, the Courant-Fischer Min- imax Theorem is extended to the generalized case, resulting in the following Generalized Courant-Fischer Minimax The- orem: Consider two Laplacian matrices LX∈R|V|×|V|and LY∈R|V|×|V|, where it holds that null(LY)⊆null(LX). Under the condition 1≤k≤rank(LY), thek-th largest eigenvalue of the matrix L+ YLXcan be computed as follows: λk(L+ YLX) = min dim(U)=k, U⊥null(LY)max x∈UxTLXx xTLYx 3. Method Figure 1 shows the proposed method for training GNNs using top non-robust training samples in the training set, which consists of four key phases: Phase (1) is GNN embed- ding, which uses the original input to train a GNN and takes the output of the hidden layer as the embedding result of the data’s feature matrix; In Phase (2), we use k-NN graphs to capture the underlying manifolds of the input feature matrix and the embedded feature matrix; In Phase (3), we calculate each node’s level of non-robustness using the input mani- fold and the output manifold; In Phase (4), we use the top non-robust samples selected from the training set to train the GNN. In the rest of this section, we provide a detailed description of these four phases. 3.1. Phase 1: GNN Embedding (Cheng et al., 2021) proposed leveraging a bijective distance mapping between the manifolds of a model’s input and output data to analyze the adversarial robustness of each data point for a given model. To obtain the output data of the GNN model, we first train 2 Page 3: Training Graph Neural Networks Using Non-Robust Samples Figure 1. Overview of the proposed method. the GNN using the original feature matrix and input graph. Once trained, the GNN is then used to transform the input feature matrix. Since the final layer of a GNN is typically task-specific—such as a softmax layer for multi-class classi- fication tasks—we take the output of the hidden layer before the final output layer as the embedding result. 3.2. Phase 2: Manifold Capturing The manifold of the data is typically captured using graphs, and k-nearest neighbor graphs are the most widely used for this purpose. In this phase, we use k-nearest neighbor (k-NN) graphs to capture the underlying manifolds of the input feature matrix and the embedded feature matrix. The conventional k-NN graph construction algorithm exhibits a time complexity of O(|N|2), where |N|is the total number of nodes in the graph. Such quadratic complexity makes it impractical for large-scale datasets. To overcome this limita- tion, an approximate kNN graph construction algorithm canbe employed (Malkov & Yashunin, 2018). This approach leverages an extension of the probabilistic skip list structure to achieve a reduced time complexity of O(|N|log|N|), which is more suitable for large-scale data processing. 3.3. Phase 3: Adversarial Robustness Evaluation LetLXandLYdenote the Laplacian matrices correspond- ing to the k-NN graphs of the input feature matrix and the embedded feature matrix constructed in phase 2, re- spectively. Building on the theoretical framework intro- duced in (Cheng et al., 2021), we leverage the largest gen- eralized eigenvalues and their associated eigenvectors of L+ YLXto assess the robustness of each individual sample to noise during the processing by a given model. Specif- ically, we calculate the weighted eigensubspace matrix Vs∈R|V|×sto perform spectral embedding of the input manifold GX= (V, EX), where |V|represents the total 3 Page 4: Training Graph Neural Networks Using Non-Robust Samples number of nodes. The matrix Vsis expressed as: Vs=v1√ζ1, v 2√ζ2, . . . , v s√ζs , where ζ1≥ζ2≥ ··· ≥ ζsare the top seigenvalues of L+ YLX, and v1, v2, . . . , v sare the corresponding eigenvec- tors. Using Vs, we project the nodes of GXinto an s-dimensional space, representing each node pas the p-th row of Vs. Ac- cording to (Cheng et al., 2021), the stability of an edge (p, q)∈EXis quantified by computing the spectral embed- ding distance between nodes pandq: ∥V⊤ sep,q∥2 2. To assess the robustness of a node p, a metric referred to as the Spade score is defined as the average embedding distance to the neighbors within the input manifold: Spade (p) =1 |NX(p)|X q∈NX(p)∥V⊤ sep,q∥2 2, where NX(p)represents the set of neighbors of node pin GX. A higher Spade (p)value suggests that node pis more prone to adversarial attacks. 3.4. Phase 4: GNN Training For the samples in the training set, we rank them in descend- ing order based on their Spade scores and select the top portion to form a new, smaller training set. The points in this new training set are those that are highly susceptible to noise and likely to mislead the GNN. By training the GNN using only these noise-sensitive samples, we are effectively enhancing the GNN’s ability to resist being misled by such samples. 4. Experiments We have evaluated our framework on three of the most popular GNN models and three classic benchmark datasets. 4.1. Experimental Setup 4.1.1. D ATA SETS We evaluate the proposed method on three standard citation network benchmarks, where nodes represent documents, edges correspond to citations, and each node is associated with a sparse bag-of-words feature vector and a class la- bel. The key statistics of these datasets are summarized in Table 1. •Cora : A dataset of scientific publications categorized into seven classes.•Citeseer : A dataset of scientific publications that is sparser than Cora and exhibits a more uneven class distribution, making it more challenging for graph- based classification methods. •PubMed : A large biomedical dataset where nodes are represented by TF-IDF word vectors from a 500-word vocabulary. Table 1. Statistics of datasets used in our experiments. Dataset Nodes Edges Classes Features Cora 2,708 5,429 7 1,433 Citeseer 3,327 4,732 6 3,703 PubMed 19,717 44,338 3 500 4.1.2. GNN M ODELS We use the following three commonly used GNN models for our experiments: •Graph Convolutional Network (GCN): A widely used model that applies convolution-like operations to graph-structured data by aggregating information from neighboring nodes using a layer-wise propagation rule. •Graph Attention Network (GAT): Introduces an attention mechanism to assign different importance weights to neighboring nodes during the message- passing process. •GraphSAGE: An inductive model that learns aggregation functions to generate node embeddings by sampling and aggregating features from a fixed number of neighboring nodes. 4.1.3. D ETAILED MODEL IMPLEMENTATION AND PARAMETER SETTINGS •GCN: We use the same GCN architecture and parame- ter settings for all three datasets. The model consists of two graph convolutional layers: the first layer trans- forms node features to a 64-dimensional space, and the second layer maps them to the number of output classes, with ReLU activation applied after the first convolution. For the Cora dataset, the learning rate is set to 0.01. For the Citeseer and PubMed datasets, it is set to 0.1. •GAT: For the Cora dataset, the GAT model consists of two layers: the first layer has 4 attention heads with 8 hidden units each, followed by an ELU activation, and the second layer has 1 attention head that outputs 4 Page 5: Training Graph Neural Networks Using Non-Robust Samples the number of class logits. For the Citeseer dataset, the GAT model also has two layers: the first layer uses 8 attention heads with 8 hidden units each, followed by an ELU activation, and the second layer has 1 attention head that outputs the class logits. For the PubMed dataset, the GAT model has the same structure as the Citeseer model but includes layer normalization after each layer and uses a Leaky ReLU activation after the first layer. For the Cora dataset, the learning rate is set to 0.01. For the Citeseer dataset, it is set to 0.1, and for the PubMed dataset, it is set to 0.05. •GraphSAGE: The GraphSAGE architecture for the Cora and Citeseer datasets consists of two layers: the first layer uses SAGEConv with 64 hidden units, fol- lowed by batch normalization, ReLU activation, and a dropout layer with a probability of 0.5. The second layer uses SAGEConv with 64 input features and out- puts the class labels. The architecture for the PubMed dataset also consists of two layers, but the first layer usesSAGEConv with 32 hidden units, followed by batch normalization, ReLU activation, and a dropout layer. The second layer uses SAGEConv with 32 in- put features and outputs the class labels. For all three datasets, the learning rate is set to 0.05. For all the experiments, the Adam optimizer is used with a weight decay of 5×10−4, and the loss function for multi- class classification is CrossEntropyLoss. The number of eigenvectors is set to 10, and the value of kin k-NN is set to 20. 4.1.4. R ESULTS We first perform an 80-20 split of the entire dataset by randomly sampling 80% of the samples to form the original full training set and using the remaining 20% as the test set. Then, we construct two different types of training sets: (1) forming the training set by selecting the top 80% of non- robust samples from the original full training set using our method; (2) forming the training set by randomly sampling 80% of the original full training set. We train the GNN models using the original full training set and the two smaller training sets mentioned above, and the classification accuracy across different epochs is shown in Table 2 and Figure 2. After training for five epochs, the comparison of the performance of the models trained with the three training sets is presented in Figure 3. In these charts, Random represents the training set constructed by randomly sampling 80% of the original full training set,Ours represents the training set constructed using our method, and Full represents the original full training set. From the experimental results, it can be observed that in all 9 combinations of the three GNN models and the threedatasets, the training set constructed using our method achieved the highest classification accuracy after 5 epochs of training. Moreover, the advantage of our method is partic- ularly evident on the largest dataset, PubMed. On PubMed, our method outperforms the second-best method by 14.1% , 7.4% , and 3.4% in accuracy for the GCN, GAT, and Graph- SAGE models, respectively. Our method not only significantly outperforms the randomly sampled training sets of the same size but also demonstrates substantial advantages compared to the full training sets, which are 20% larger, highlighting the effectiveness of our approach. The experimental results further show that ran- domly selecting 80% of the training samples from the full training set yields almost no difference in performance com- pared to using the full set. However, when 80% of the samples are selected using our method, which prioritizes the most non-robust nodes, the training performance is sig- nificantly improved. This demonstrates that focusing on non-robust samples during training, rather than uniformly sampling all nodes, enables the model to better adapt to unavoidable noise, resulting in superior performance on the test set. 5. Conclusion This work introduces a novel method for training Graph Neural Networks (GNNs) by utilizing the top non-robust training samples from the training set. First, we embed the data using a GNN to obtain feature representations, followed by the construction of k-nearest neighbor (k-NN) graphs to capture the underlying data manifolds of both the input and embedded feature matrices. In the third phase, we assess each node’s robustness using spectral adversarial robustness evaluation, which quantifies their susceptibility to noise. Finally, we enhance the GNN’s robustness by training it exclusively on the small training set composed of the non- robust samples identified from the training set. Experimental results demonstrate that our method significantly improves the training of the three most popular GNN models on three widely-used benchmark datasets. 5 Page 6: Training Graph Neural Networks Using Non-Robust Samples Table 2. Classification Accuracy Across Epochs for GCN, GAT, and GraphSAGE on Cora, Citeseer, and PubMed Model Dataset Training Set Epoch 1 Epoch 2 Epoch 3 Epoch 4 Epoch 5 GCNCora Random 0.2939 0.3290 0.5360 0.7209 0.7671 Ours 0.5416 0.6580 0.6525 0.7375 0.8041 Full 0.2939 0.3290 0.5527 0.7227 0.8041 Citeseer Random 0.2331 0.6030 0.6030 0.6256 0.6256 Ours 0.4797 0.5278 0.5609 0.6316 0.6947 Full 0.2406 0.5925 0.5925 0.6541 0.6541 PubMed Random 0.3918 0.4420 0.5836 0.6191 0.6191 Ours 0.3974 0.5181 0.7515 0.8052 0.8080 Full 0.3921 0.4484 0.6082 0.6082 0.6670 GATCora Random 0.3124 0.3623 0.4603 0.6488 0.7634 Ours 0.4954 0.6007 0.6506 0.7726 0.8299 Full 0.3216 0.3604 0.4713 0.6654 0.7782 Citeseer Random 0.6451 0.6451 0.6451 0.6917 0.6917 Ours 0.5594 0.5504 0.6105 0.7053 0.7504 Full 0.6331 0.6331 0.6617 0.7158 0.7158 PubMed Random 0.4027 0.4603 0.5787 0.5787 0.5787 Ours 0.4174 0.3977 0.4154 0.5580 0.6528 Full 0.3987 0.4415 0.5455 0.5455 0.5455 GraphSAGECora Random 0.4824 0.7689 0.7745 0.7893 0.8041 Ours 0.4991 0.7006 0.7486 0.8115 0.8244 Full 0.4880 0.7745 0.7837 0.8059 0.8244 Citeseer Random 0.4797 0.6737 0.6902 0.7038 0.7113 Ours 0.4857 0.6451 0.6692 0.7188 0.7624 Full 0.4812 0.6872 0.6872 0.6992 0.7218 PubMed Random 0.3974 0.6429 0.7406 0.7697 0.7735 Ours 0.3977 0.5945 0.6523 0.7210 0.8078 Full 0.3974 0.6378 0.7484 0.7652 0.7662 6 Page 7: Training Graph Neural Networks Using Non-Robust Samples (a) GCN (Cora) (b) GCN (Citeseer) (c) GCN (PubMed) (d) GAT (Cora) (e) GAT (Citeseer) (f) GAT (PubMed) (g) GraphSAGE (Cora) (h) GraphSAGE (Citeseer) (i) GraphSAGE (PubMed) Figure 2. Comparison of Classification Accuracy Across Epochs for GCN, GAT, and GraphSAGE Models on Cora, Citeseer, and PubMed Datasets 7 Page 8: Training Graph Neural Networks Using Non-Robust Samples (a) GCN (Cora) (b) GCN (Citeseer) (c) GCN (PubMed) (d) GAT (Cora) (e) GAT (Citeseer) (f) GAT (PubMed) (g) GraphSAGE (Cora) (h) GraphSAGE (Citeseer) (i) GraphSAGE (PubMed) Figure 3. Comparison of Classification Accuracy After 5 Epochs for Three Training Sets 8 Page 9: Training Graph Neural Networks Using Non-Robust Samples References Agarwal, C., Nguyen, A., and Schonfeld, D. Improving robustness to adversarial examples by encouraging dis- criminative features. In 2019 IEEE International Confer- ence on Image Processing (ICIP) , pp. 3801–3505. IEEE, 2019. Cheng, W., Deng, C., Zhao, Z., Cai, Y ., Zhang, Z., and Feng, Z. Spade: A spectral method for black-box adversarial robustness evaluation. In International Conference on Machine Learning , pp. 1814–1824. PMLR, 2021. Chung, F. R. Spectral graph theory , volume 92. American Mathematical Soc., 1997. Fan, W., Ma, Y ., Li, Q., He, Y ., Zhao, E., Tang, J., and Yin, D. Graph neural networks for social recommendation. In The world wide web conference , pp. 417–426, 2019. Fawzi, A., Fawzi, O., and Frossard, P. Analysis of classifiers’ robustness to adversarial perturbations. Machine learning , 107(3):481–508, 2018. Feinman, R., Curtin, R. R., Shintre, S., and Gardner, A. B. Detecting adversarial samples from artifacts. arXiv preprint arXiv:1703.00410 , 2017. Golub, G. H. and Van Loan, C. F. Matrix computations . JHU press, 2013. Goodfellow, I., McDaniel, P., and Papernot, N. Making machine learning robust against adversarial inputs. Com- munications of the ACM , 61(7):56–66, 2018. Hamilton, W., Ying, Z., and Leskovec, J. Inductive repre- sentation learning on large graphs. Advances in neural information processing systems , 30, 2017. Hu, Z., Dong, Y ., Wang, K., Chang, K.-W., and Sun, Y . Gpt- gnn: Generative pre-training of graph neural networks. InProceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining , pp. 1857–1867, 2020. Kipf, T. N. and Welling, M. Semi-supervised classifica- tion with graph convolutional networks. arXiv preprint arXiv:1609.02907 , 2016. Liu, X., Li, Y ., Wu, C., and Hsieh, C.-J. Adv-bnn: Improved adversarial defense through robust bayesian neural net- work. arXiv preprint arXiv:1810.01279 , 2018. Malkov, Y . A. and Yashunin, D. A. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE transactions on pattern analysis and machine intelligence , 42(4):824– 836, 2018.Metzen, J. H., Genewein, T., Fischer, V ., and Bischoff, B. On detecting adversarial perturbations. arXiv preprint arXiv:1702.04267 , 2017. Mirhoseini, A., Goldie, A., Yazgan, M., Jiang, J. W., Songhori, E., Wang, S., Lee, Y .-J., Johnson, E., Pathak, O., Nazi, A., et al. A graph placement methodology for fast chip design. Nature , 594(7862):207–212, 2021. Sharma, K., Verma, S., Medya, S., Bhattacharya, A., and Ranu, S. Task and model agnostic adversarial attack on graph neural networks. In Proceedings of the AAAI Con- ference on Artificial Intelligence , volume 37, pp. 15091– 15099, 2023. Song, M., Zhan, Z., and Haihong, E. Hierarchical schema representation for text-to-sql parsing with decomposing decoding. IEEE Access , 7:103706–103715, 2019. Veliˇckovi ´c, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., and Bengio, Y . Graph attention networks. arXiv preprint arXiv:1710.10903 , 2017. V on Luxburg, U. A tutorial on spectral clustering. Statistics and computing , 17:395–416, 2007. Xu, K., Chen, H., Liu, S., Chen, P.-Y ., Weng, T.-W., Hong, M., and Lin, X. Topology attack and defense for graph neural networks: An optimization perspective. arXiv preprint arXiv:1906.04214 , 2019. Yang, P., Chen, J., Hsieh, C.-J., Wang, J.-L., and Jordan, M. Ml-loo: Detecting adversarial examples with feature attribution. In Proceedings of the AAAI Conference on Artificial Intelligence , volume 34, pp. 6639–6647, 2020. Ying, R., He, R., Chen, K., Eksombatchai, P., Hamilton, W. L., and Leskovec, J. Graph convolutional neural net- works for web-scale recommender systems. In Proceed- ings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining , pp. 974–983, 2018. Yu, B., Yin, H., and Zhu, Z. Spatio-temporal graph convo- lutional networks: A deep learning framework for traffic forecasting. arXiv preprint arXiv:1709.04875 , 2017. Zhou, J., Cui, G., Hu, S., Zhang, Z., Yang, C., Liu, Z., Wang, L., Li, C., and Sun, M. Graph neural networks: A review of methods and applications. AI open , 1:57–81, 2020. Z¨ugner, D., Akbarnejad, A., and G ¨unnemann, S. Adversarial attacks on neural networks for graph data. In Proceed- ings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining , pp. 2847–2856, 2018. 9

---