Paper Content:
Page 1:
Training Graph Neural Networks Using Non-Robust Samples
Yongyu Wang
JD Logistics
wangyongyu1@jd.com
Abstract
Graph Neural Networks (GNNs) are a highly ef-
fective neural network architecture for processing
graph-structured data. Unlike traditional neural
networks that rely solely on the features of the
data as input, GNNs leverage both the graph struc-
ture, which represents the relationships between
data points, and the feature matrix of the data to
optimize their feature representation. This unique
capability enables GNNs to achieve superior per-
formance across various tasks. However, it also
makes GNNs more susceptible to noise from both
the graph structure and data features, which can
significantly increase the training difficulty and
degrade their performance. To address this issue,
this paper proposes a novel method for selecting
noise-sensitive training samples from the origi-
nal training set to construct a smaller yet more
effective training set for model training. These
samples are used to help improve the model’s
ability to correctly process data in noisy environ-
ments. We have evaluated our approach on three
of the most classical GNN models—GCN, GAT,
and GraphSAGE—as well as three widely used
benchmark datasets: Cora, Citeseer, and PubMed.
Our experiments demonstrate that the proposed
method can substantially boost the training of
Graph Neural Networks compared to using ran-
domly sampled training sets of the same size from
the original training set and the larger original full
training set.
1. Introduction
The rise of Graph Neural Networks (GNNs) has brought
about a transformative shift in the field of machine learning,
particularly for tasks involving graph-structured data (Kipf
& Welling, 2016; Veli ˇckovi ´c et al., 2017; Hu et al., 2020;
Zhou et al., 2020). These networks have demonstrated suc-
cess across a wide array of real-world scenarios, including
recommendation systems (Fan et al., 2019), traffic forecast-
ing (Yu et al., 2017), chip design (Mirhoseini et al., 2021),language processing (Song et al., 2019) and social network
analytics(Ying et al., 2018). The unique advantage of GNNs
lies in their ability to effectively leverage the graph that
represents the relationships between data points to optimize
their feature representations. This enables GNNs to fully uti-
lize the crucial graph topology inherent in graph-structured
data, unlike traditional neural networks, which optimize
data representations based solely on feature vectors, thus
overlooking the valuable graph structure.
However, like other neural networks, GNNs are susceptible
to noise in the feature space(Goodfellow et al., 2018; Fawzi
et al., 2018). Additionally, the introduction of graph struc-
ture introduces another layer of interference, as GNNs are
also affected by noise within the graph itself. Research has
shown that even slight modifications to the graph—such
as adding, removing, or rearranging a small number of
edges—can significantly impact the performance of GNNs
(Z¨ugner et al., 2018; Xu et al., 2019).
The feature vectors contaminated by noise, along with er-
roneous and redundant edges in the graph topology, can be
viewed as adversarial attacks that mislead the GNNs. To ad-
dress this issue, researchers have proposed various methods
to reduce the impact of noise on the model’s performance.
These approaches can be divided into two categories: re-
active and proactive defenses. Reactive defenses focus on
identifying adversarial examples within the model’s inputs,
as explored in previous studies(Feinman et al., 2017; Metzen
et al., 2017; Yang et al., 2020). On the other hand, proactive
defenses aim to strengthen the model’s robustness, making
it less susceptible to adversarial attacks, as demonstrated by
methods proposed in works such as (Agarwal et al., 2019;
Liu et al., 2018).
However, due to the diversity of noise types and model
structures, developing effective methods to defend against
noise across various scenarios remains a significant chal-
lenge. Unlike existing approaches that focus on mitigating
the effects of noise, we argue that, given the inevitability of
noise, the focus should shift toward how to better leverage it.
From this new perspective, this paper proposes a framework
that first identifies the most noise-sensitive training samples
and then uses these samples to construct a smaller but more
1arXiv:2412.14738v6 [cs.LG] 21 Jan 2025
Page 2:
Training Graph Neural Networks Using Non-Robust Samples
effective training set for training the GNN model. This
approach enhances the model’s ability to correctly process
noisy data.
We evaluated our proposed training method using three
classical GNN models—GCN (Kipf & Welling, 2016),
GAT (Veli ˇckovi ´c et al., 2017), and GraphSAGE (Hamil-
ton et al., 2017)—on three widely used benchmark datasets
for GNN performance evaluation: Cora, Citeseer, and
PubMed. The experimental results demonstrate that our
method achieves the best training performance compared to
two other approaches: a training set of the same size con-
structed by random sampling from the original full training
set, and the larger original full training set.
2. Preliminaries
2.1. The Vulnerability of GNNs to Noise
The vulnerability of Graph Neural Networks (GNNs) to
noise refers to the difficulty GNNs face in maintaining con-
sistent outputs when exposed to perturbations in the graph
structure, such as the presence of incorrect or redundant
edges in the input graph, or errors in the feature representa-
tions within the input feature matrix. (Sharma et al., 2023)
investigated the task- and model-agnostic vulnerabilities of
GNNs, showing that these networks are very vulnerable
to adversarial perturbations, regardless of the downstream
task.
2.2. Spectral Graph Theory
Spectral Graph Theory is a mathematical method that uses
the eigenvalues and eigenvectors of the Laplacian matrix of a
graph to study its properties. Given a graph G= (V, E, w ),
where Vrepresents the set of vertices, Eis the set of edges,
andwis a function assigning positive weights to each edge.
The Laplacian matrix of graph G, which is a symmetric
diagonally dominant (SDD) matrix, is defined as follows:
L(p, q) =
−w(p, q) if(p, q)∈E,P
(p,t)∈Ew(p, t)ifp=q,
0 otherwise .(1)
For better numerical stability and to make the Laplacian ma-
trix scale-invariant, the normalized Laplacian matrix Lnorm
can be computed as:
Lnorm=D−1/2LD−1/2
According to spectral graph theory, analyzing a graph often
requires only one or a few specific eigenvalues and their
corresponding eigenvectors (Chung, 1997). For example,
when analyzing the clustering structure of a graph usingspectral graph theory, only the smallest eigenvalues and
their corresponding eigenvectors are useful(V on Luxburg,
2007). Therefore, for a Laplacian matrix, it is unnecessary
to compute all the eigenpairs through direct eigen decompo-
sition, and for a large-scale graph, the computational cost
would be too high. The Courant-Fischer Minimax Theo-
rem(Golub & Van Loan, 2013) allows the eigenvalues to be
computed iteratively without explicitly calculating all the
eigenvalues. Specifically:
The k-th largest eigenvalue of the Laplacian matrix L∈
R|V|×|V|can be computed as follows:
λk(L) = min
dim(U)=kmax
x∈U, x̸=0xTLx
xTx
In many practical applications, the problem-solving process
involves not just a single matrix, but rather the relationship
between two matrices. Therefore, the Courant-Fischer Min-
imax Theorem is extended to the generalized case, resulting
in the following Generalized Courant-Fischer Minimax The-
orem:
Consider two Laplacian matrices LX∈R|V|×|V|and
LY∈R|V|×|V|, where it holds that null(LY)⊆null(LX).
Under the condition 1≤k≤rank(LY), thek-th largest
eigenvalue of the matrix L+
YLXcan be computed as follows:
λk(L+
YLX) = min
dim(U)=k, U⊥null(LY)max
x∈UxTLXx
xTLYx
3. Method
Figure 1 shows the proposed method for training GNNs
using top non-robust training samples in the training set,
which consists of four key phases: Phase (1) is GNN embed-
ding, which uses the original input to train a GNN and takes
the output of the hidden layer as the embedding result of the
data’s feature matrix; In Phase (2), we use k-NN graphs to
capture the underlying manifolds of the input feature matrix
and the embedded feature matrix; In Phase (3), we calculate
each node’s level of non-robustness using the input mani-
fold and the output manifold; In Phase (4), we use the top
non-robust samples selected from the training set to train
the GNN. In the rest of this section, we provide a detailed
description of these four phases.
3.1. Phase 1: GNN Embedding
(Cheng et al., 2021) proposed leveraging a bijective distance
mapping between the manifolds of a model’s input and
output data to analyze the adversarial robustness of each
data point for a given model.
To obtain the output data of the GNN model, we first train
2
Page 3:
Training Graph Neural Networks Using Non-Robust Samples
Figure 1. Overview of the proposed method.
the GNN using the original feature matrix and input graph.
Once trained, the GNN is then used to transform the input
feature matrix. Since the final layer of a GNN is typically
task-specific—such as a softmax layer for multi-class classi-
fication tasks—we take the output of the hidden layer before
the final output layer as the embedding result.
3.2. Phase 2: Manifold Capturing
The manifold of the data is typically captured using graphs,
and k-nearest neighbor graphs are the most widely used
for this purpose. In this phase, we use k-nearest neighbor
(k-NN) graphs to capture the underlying manifolds of the
input feature matrix and the embedded feature matrix. The
conventional k-NN graph construction algorithm exhibits a
time complexity of O(|N|2), where |N|is the total number
of nodes in the graph. Such quadratic complexity makes it
impractical for large-scale datasets. To overcome this limita-
tion, an approximate kNN graph construction algorithm canbe employed (Malkov & Yashunin, 2018). This approach
leverages an extension of the probabilistic skip list structure
to achieve a reduced time complexity of O(|N|log|N|),
which is more suitable for large-scale data processing.
3.3. Phase 3: Adversarial Robustness Evaluation
LetLXandLYdenote the Laplacian matrices correspond-
ing to the k-NN graphs of the input feature matrix and
the embedded feature matrix constructed in phase 2, re-
spectively. Building on the theoretical framework intro-
duced in (Cheng et al., 2021), we leverage the largest gen-
eralized eigenvalues and their associated eigenvectors of
L+
YLXto assess the robustness of each individual sample
to noise during the processing by a given model. Specif-
ically, we calculate the weighted eigensubspace matrix
Vs∈R|V|×sto perform spectral embedding of the input
manifold GX= (V, EX), where |V|represents the total
3
Page 4:
Training Graph Neural Networks Using Non-Robust Samples
number of nodes. The matrix Vsis expressed as:
Vs=v1√ζ1, v 2√ζ2, . . . , v s√ζs
,
where ζ1≥ζ2≥ ··· ≥ ζsare the top seigenvalues of
L+
YLX, and v1, v2, . . . , v sare the corresponding eigenvec-
tors.
Using Vs, we project the nodes of GXinto an s-dimensional
space, representing each node pas the p-th row of Vs. Ac-
cording to (Cheng et al., 2021), the stability of an edge
(p, q)∈EXis quantified by computing the spectral embed-
ding distance between nodes pandq:
∥V⊤
sep,q∥2
2.
To assess the robustness of a node p, a metric referred to
as the Spade score is defined as the average embedding
distance to the neighbors within the input manifold:
Spade (p) =1
|NX(p)|X
q∈NX(p)∥V⊤
sep,q∥2
2,
where NX(p)represents the set of neighbors of node pin
GX. A higher Spade (p)value suggests that node pis more
prone to adversarial attacks.
3.4. Phase 4: GNN Training
For the samples in the training set, we rank them in descend-
ing order based on their Spade scores and select the top
portion to form a new, smaller training set. The points in
this new training set are those that are highly susceptible to
noise and likely to mislead the GNN. By training the GNN
using only these noise-sensitive samples, we are effectively
enhancing the GNN’s ability to resist being misled by such
samples.
4. Experiments
We have evaluated our framework on three of the most
popular GNN models and three classic benchmark datasets.
4.1. Experimental Setup
4.1.1. D ATA SETS
We evaluate the proposed method on three standard citation
network benchmarks, where nodes represent documents,
edges correspond to citations, and each node is associated
with a sparse bag-of-words feature vector and a class la-
bel. The key statistics of these datasets are summarized in
Table 1.
•Cora : A dataset of scientific publications categorized
into seven classes.•Citeseer : A dataset of scientific publications that is
sparser than Cora and exhibits a more uneven class
distribution, making it more challenging for graph-
based classification methods.
•PubMed : A large biomedical dataset where nodes are
represented by TF-IDF word vectors from a 500-word
vocabulary.
Table 1. Statistics of datasets used in our experiments.
Dataset Nodes Edges Classes Features
Cora 2,708 5,429 7 1,433
Citeseer 3,327 4,732 6 3,703
PubMed 19,717 44,338 3 500
4.1.2. GNN M ODELS
We use the following three commonly used GNN models
for our experiments:
•Graph Convolutional Network (GCN): A widely
used model that applies convolution-like operations to
graph-structured data by aggregating information from
neighboring nodes using a layer-wise propagation rule.
•Graph Attention Network (GAT): Introduces an
attention mechanism to assign different importance
weights to neighboring nodes during the message-
passing process.
•GraphSAGE: An inductive model that learns
aggregation functions to generate node embeddings
by sampling and aggregating features from a fixed
number of neighboring nodes.
4.1.3. D ETAILED MODEL IMPLEMENTATION AND
PARAMETER SETTINGS
•GCN: We use the same GCN architecture and parame-
ter settings for all three datasets. The model consists
of two graph convolutional layers: the first layer trans-
forms node features to a 64-dimensional space, and
the second layer maps them to the number of output
classes, with ReLU activation applied after the first
convolution. For the Cora dataset, the learning rate is
set to 0.01. For the Citeseer and PubMed datasets, it is
set to 0.1.
•GAT: For the Cora dataset, the GAT model consists of
two layers: the first layer has 4 attention heads with
8 hidden units each, followed by an ELU activation,
and the second layer has 1 attention head that outputs
4
Page 5:
Training Graph Neural Networks Using Non-Robust Samples
the number of class logits. For the Citeseer dataset, the
GAT model also has two layers: the first layer uses 8
attention heads with 8 hidden units each, followed by
an ELU activation, and the second layer has 1 attention
head that outputs the class logits. For the PubMed
dataset, the GAT model has the same structure as the
Citeseer model but includes layer normalization after
each layer and uses a Leaky ReLU activation after the
first layer. For the Cora dataset, the learning rate is set
to 0.01. For the Citeseer dataset, it is set to 0.1, and for
the PubMed dataset, it is set to 0.05.
•GraphSAGE: The GraphSAGE architecture for the
Cora and Citeseer datasets consists of two layers: the
first layer uses SAGEConv with 64 hidden units, fol-
lowed by batch normalization, ReLU activation, and
a dropout layer with a probability of 0.5. The second
layer uses SAGEConv with 64 input features and out-
puts the class labels. The architecture for the PubMed
dataset also consists of two layers, but the first layer
usesSAGEConv with 32 hidden units, followed by
batch normalization, ReLU activation, and a dropout
layer. The second layer uses SAGEConv with 32 in-
put features and outputs the class labels. For all three
datasets, the learning rate is set to 0.05.
For all the experiments, the Adam optimizer is used with a
weight decay of 5×10−4, and the loss function for multi-
class classification is CrossEntropyLoss. The number of
eigenvectors is set to 10, and the value of kin k-NN is set
to 20.
4.1.4. R ESULTS
We first perform an 80-20 split of the entire dataset by
randomly sampling 80% of the samples to form the original
full training set and using the remaining 20% as the test set.
Then, we construct two different types of training sets: (1)
forming the training set by selecting the top 80% of non-
robust samples from the original full training set using our
method; (2) forming the training set by randomly sampling
80% of the original full training set.
We train the GNN models using the original full training
set and the two smaller training sets mentioned above, and
the classification accuracy across different epochs is shown
in Table 2 and Figure 2. After training for five epochs,
the comparison of the performance of the models trained
with the three training sets is presented in Figure 3. In
these charts, Random represents the training set constructed
by randomly sampling 80% of the original full training
set,Ours represents the training set constructed using our
method, and Full represents the original full training set.
From the experimental results, it can be observed that in all
9 combinations of the three GNN models and the threedatasets, the training set constructed using our method
achieved the highest classification accuracy after 5 epochs
of training. Moreover, the advantage of our method is partic-
ularly evident on the largest dataset, PubMed. On PubMed,
our method outperforms the second-best method by 14.1% ,
7.4% , and 3.4% in accuracy for the GCN, GAT, and Graph-
SAGE models, respectively.
Our method not only significantly outperforms the randomly
sampled training sets of the same size but also demonstrates
substantial advantages compared to the full training sets,
which are 20% larger, highlighting the effectiveness of our
approach. The experimental results further show that ran-
domly selecting 80% of the training samples from the full
training set yields almost no difference in performance com-
pared to using the full set. However, when 80% of the
samples are selected using our method, which prioritizes
the most non-robust nodes, the training performance is sig-
nificantly improved. This demonstrates that focusing on
non-robust samples during training, rather than uniformly
sampling all nodes, enables the model to better adapt to
unavoidable noise, resulting in superior performance on the
test set.
5. Conclusion
This work introduces a novel method for training Graph
Neural Networks (GNNs) by utilizing the top non-robust
training samples from the training set. First, we embed the
data using a GNN to obtain feature representations, followed
by the construction of k-nearest neighbor (k-NN) graphs to
capture the underlying data manifolds of both the input and
embedded feature matrices. In the third phase, we assess
each node’s robustness using spectral adversarial robustness
evaluation, which quantifies their susceptibility to noise.
Finally, we enhance the GNN’s robustness by training it
exclusively on the small training set composed of the non-
robust samples identified from the training set. Experimental
results demonstrate that our method significantly improves
the training of the three most popular GNN models on three
widely-used benchmark datasets.
5
Page 6:
Training Graph Neural Networks Using Non-Robust Samples
Table 2. Classification Accuracy Across Epochs for GCN, GAT, and GraphSAGE on Cora, Citeseer, and PubMed
Model Dataset Training Set Epoch 1 Epoch 2 Epoch 3 Epoch 4 Epoch 5
GCNCora Random 0.2939 0.3290 0.5360 0.7209 0.7671
Ours 0.5416 0.6580 0.6525 0.7375 0.8041
Full 0.2939 0.3290 0.5527 0.7227 0.8041
Citeseer Random 0.2331 0.6030 0.6030 0.6256 0.6256
Ours 0.4797 0.5278 0.5609 0.6316 0.6947
Full 0.2406 0.5925 0.5925 0.6541 0.6541
PubMed Random 0.3918 0.4420 0.5836 0.6191 0.6191
Ours 0.3974 0.5181 0.7515 0.8052 0.8080
Full 0.3921 0.4484 0.6082 0.6082 0.6670
GATCora Random 0.3124 0.3623 0.4603 0.6488 0.7634
Ours 0.4954 0.6007 0.6506 0.7726 0.8299
Full 0.3216 0.3604 0.4713 0.6654 0.7782
Citeseer Random 0.6451 0.6451 0.6451 0.6917 0.6917
Ours 0.5594 0.5504 0.6105 0.7053 0.7504
Full 0.6331 0.6331 0.6617 0.7158 0.7158
PubMed Random 0.4027 0.4603 0.5787 0.5787 0.5787
Ours 0.4174 0.3977 0.4154 0.5580 0.6528
Full 0.3987 0.4415 0.5455 0.5455 0.5455
GraphSAGECora Random 0.4824 0.7689 0.7745 0.7893 0.8041
Ours 0.4991 0.7006 0.7486 0.8115 0.8244
Full 0.4880 0.7745 0.7837 0.8059 0.8244
Citeseer Random 0.4797 0.6737 0.6902 0.7038 0.7113
Ours 0.4857 0.6451 0.6692 0.7188 0.7624
Full 0.4812 0.6872 0.6872 0.6992 0.7218
PubMed Random 0.3974 0.6429 0.7406 0.7697 0.7735
Ours 0.3977 0.5945 0.6523 0.7210 0.8078
Full 0.3974 0.6378 0.7484 0.7652 0.7662
6
Page 7:
Training Graph Neural Networks Using Non-Robust Samples
(a) GCN (Cora)
(b) GCN (Citeseer)
(c) GCN (PubMed)
(d) GAT (Cora)
(e) GAT (Citeseer)
(f) GAT (PubMed)
(g) GraphSAGE (Cora)
(h) GraphSAGE (Citeseer)
(i) GraphSAGE (PubMed)
Figure 2. Comparison of Classification Accuracy Across Epochs for GCN, GAT, and GraphSAGE Models on Cora, Citeseer, and PubMed
Datasets
7
Page 8:
Training Graph Neural Networks Using Non-Robust Samples
(a) GCN (Cora)
(b) GCN (Citeseer)
(c) GCN (PubMed)
(d) GAT (Cora)
(e) GAT (Citeseer)
(f) GAT (PubMed)
(g) GraphSAGE (Cora)
(h) GraphSAGE (Citeseer)
(i) GraphSAGE (PubMed)
Figure 3. Comparison of Classification Accuracy After 5 Epochs for Three Training Sets
8
Page 9:
Training Graph Neural Networks Using Non-Robust Samples
References
Agarwal, C., Nguyen, A., and Schonfeld, D. Improving
robustness to adversarial examples by encouraging dis-
criminative features. In 2019 IEEE International Confer-
ence on Image Processing (ICIP) , pp. 3801–3505. IEEE,
2019.
Cheng, W., Deng, C., Zhao, Z., Cai, Y ., Zhang, Z., and Feng,
Z. Spade: A spectral method for black-box adversarial
robustness evaluation. In International Conference on
Machine Learning , pp. 1814–1824. PMLR, 2021.
Chung, F. R. Spectral graph theory , volume 92. American
Mathematical Soc., 1997.
Fan, W., Ma, Y ., Li, Q., He, Y ., Zhao, E., Tang, J., and Yin,
D. Graph neural networks for social recommendation. In
The world wide web conference , pp. 417–426, 2019.
Fawzi, A., Fawzi, O., and Frossard, P. Analysis of classifiers’
robustness to adversarial perturbations. Machine learning ,
107(3):481–508, 2018.
Feinman, R., Curtin, R. R., Shintre, S., and Gardner, A. B.
Detecting adversarial samples from artifacts. arXiv
preprint arXiv:1703.00410 , 2017.
Golub, G. H. and Van Loan, C. F. Matrix computations .
JHU press, 2013.
Goodfellow, I., McDaniel, P., and Papernot, N. Making
machine learning robust against adversarial inputs. Com-
munications of the ACM , 61(7):56–66, 2018.
Hamilton, W., Ying, Z., and Leskovec, J. Inductive repre-
sentation learning on large graphs. Advances in neural
information processing systems , 30, 2017.
Hu, Z., Dong, Y ., Wang, K., Chang, K.-W., and Sun, Y . Gpt-
gnn: Generative pre-training of graph neural networks.
InProceedings of the 26th ACM SIGKDD international
conference on knowledge discovery & data mining , pp.
1857–1867, 2020.
Kipf, T. N. and Welling, M. Semi-supervised classifica-
tion with graph convolutional networks. arXiv preprint
arXiv:1609.02907 , 2016.
Liu, X., Li, Y ., Wu, C., and Hsieh, C.-J. Adv-bnn: Improved
adversarial defense through robust bayesian neural net-
work. arXiv preprint arXiv:1810.01279 , 2018.
Malkov, Y . A. and Yashunin, D. A. Efficient and robust
approximate nearest neighbor search using hierarchical
navigable small world graphs. IEEE transactions on
pattern analysis and machine intelligence , 42(4):824–
836, 2018.Metzen, J. H., Genewein, T., Fischer, V ., and Bischoff, B.
On detecting adversarial perturbations. arXiv preprint
arXiv:1702.04267 , 2017.
Mirhoseini, A., Goldie, A., Yazgan, M., Jiang, J. W.,
Songhori, E., Wang, S., Lee, Y .-J., Johnson, E., Pathak,
O., Nazi, A., et al. A graph placement methodology for
fast chip design. Nature , 594(7862):207–212, 2021.
Sharma, K., Verma, S., Medya, S., Bhattacharya, A., and
Ranu, S. Task and model agnostic adversarial attack on
graph neural networks. In Proceedings of the AAAI Con-
ference on Artificial Intelligence , volume 37, pp. 15091–
15099, 2023.
Song, M., Zhan, Z., and Haihong, E. Hierarchical schema
representation for text-to-sql parsing with decomposing
decoding. IEEE Access , 7:103706–103715, 2019.
Veliˇckovi ´c, P., Cucurull, G., Casanova, A., Romero, A.,
Lio, P., and Bengio, Y . Graph attention networks. arXiv
preprint arXiv:1710.10903 , 2017.
V on Luxburg, U. A tutorial on spectral clustering. Statistics
and computing , 17:395–416, 2007.
Xu, K., Chen, H., Liu, S., Chen, P.-Y ., Weng, T.-W., Hong,
M., and Lin, X. Topology attack and defense for graph
neural networks: An optimization perspective. arXiv
preprint arXiv:1906.04214 , 2019.
Yang, P., Chen, J., Hsieh, C.-J., Wang, J.-L., and Jordan,
M. Ml-loo: Detecting adversarial examples with feature
attribution. In Proceedings of the AAAI Conference on
Artificial Intelligence , volume 34, pp. 6639–6647, 2020.
Ying, R., He, R., Chen, K., Eksombatchai, P., Hamilton,
W. L., and Leskovec, J. Graph convolutional neural net-
works for web-scale recommender systems. In Proceed-
ings of the 24th ACM SIGKDD international conference
on knowledge discovery & data mining , pp. 974–983,
2018.
Yu, B., Yin, H., and Zhu, Z. Spatio-temporal graph convo-
lutional networks: A deep learning framework for traffic
forecasting. arXiv preprint arXiv:1709.04875 , 2017.
Zhou, J., Cui, G., Hu, S., Zhang, Z., Yang, C., Liu, Z., Wang,
L., Li, C., and Sun, M. Graph neural networks: A review
of methods and applications. AI open , 1:57–81, 2020.
Z¨ugner, D., Akbarnejad, A., and G ¨unnemann, S. Adversarial
attacks on neural networks for graph data. In Proceed-
ings of the 24th ACM SIGKDD international conference
on knowledge discovery & data mining , pp. 2847–2856,
2018.
9