Graph Dataset 
=======================

We briefly introduce the dataset format of DeepRobust through self-contained examples.
In essence, DeepRobust-Graph provides the following main features:

.. contents::
    :local: 



Clean (Unattacked) Graphs for Node Classification
-------------------------------------------------
Graphs are ubiquitous data structures describing pairwise relations between entities.
A single clean graph in DeepRobust is described by an instance of :class:`deeprobust.graph.data.Dataset`, which holds the following attributes by default:

- :obj:`data.adj`: Graph adjacency matrix in scipy.sparse.csr_matrix format with shape :obj:`[num_nodes, num_nodes]`
- :obj:`data.features`: Node feature matrix with shape :obj:`[num_nodes, num_node_features]`
- :obj:`data.labels`: Target to train against (may have arbitrary shape), *e.g.*, node-level targets of shape :obj:`[num_nodes, *]`
- :obj:`data.idx_train`: Array of training node indices
- :obj:`data.idx_val`: Array of validation node indices
- :obj:`data.idx_test`: Array of test node indices

By default, the loaded :obj:`deeprobust.graph.data.Dataset` will select the largest connected
component of the graph, but users can choose different settings by passing different parameters.
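
To make the idea of keeping only the largest connected component concrete, here is a minimal sketch in plain Python. It works on an edge list rather than on the sparse adjacency matrix that DeepRobust stores, and ``largest_connected_component`` is a hypothetical helper written for illustration, not the library's own routine.

.. code-block:: python

   from collections import deque

   def largest_connected_component(num_nodes, edges):
       """Return the sorted node ids of the largest connected component."""
       # Build an undirected adjacency list.
       neighbors = [[] for _ in range(num_nodes)]
       for u, v in edges:
           neighbors[u].append(v)
           neighbors[v].append(u)

       seen = [False] * num_nodes
       best = []
       for start in range(num_nodes):
           if seen[start]:
               continue
           # Breadth-first search collects one component.
           seen[start] = True
           component = [start]
           queue = deque([start])
           while queue:
               node = queue.popleft()
               for nxt in neighbors[node]:
                   if not seen[nxt]:
                       seen[nxt] = True
                       component.append(nxt)
                       queue.append(nxt)
           if len(component) > len(best):
               best = component
       return sorted(best)

   # Toy graph: nodes 0-3 form one component, nodes 4-5 another, node 6 is isolated.
   toy_edges = [(0, 1), (1, 2), (2, 3), (4, 5)]
   lcc_nodes = largest_connected_component(7, toy_edges)
   print(lcc_nodes)  # [0, 1, 2, 3]

Nodes outside the returned set would be dropped, so the node count of a loaded graph can be smaller than that of the raw dataset.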

Currently DeepRobust supports the following datasets:
:obj:`Cora`,
:obj:`Cora-ML`,
:obj:`Citeseer`,
:obj:`Pubmed`,
:obj:`Polblogs`,
:obj:`ACM`,
:obj:`BlogCatalog`,
:obj:`Flickr`,
:obj:`UAI`.
More details about the datasets can be found `here <https://github.com/DSE-MSU/DeepRobust/tree/master/deeprobust/graph#supported-datasets>`_.


By default, the data splits are generated by :obj:`deeprobust.graph.utils.get_train_val_test`,
which randomly splits the nodes into 10%/10%/80% for training/validation/test. You can also generate
splits yourself with :obj:`deeprobust.graph.utils.get_train_val_test` or :obj:`deeprobust.graph.utils.get_train_val_test_gcn`.
Note that :class:`deeprobust.graph.data.Dataset` accepts a parameter :obj:`setting`, which can be chosen from ``["nettack", "gcn", "prognn"]``:

- :obj:`setting="nettack"`: use the largest connected component of the graph, with 10%/10%/80% data splits;
- :obj:`setting="gcn"`: use the full graph, with 20 nodes per class for training, 500 nodes for validation and 1000 nodes for testing (randomly chosen);
- :obj:`setting="prognn"`: use the largest connected component, with the data splits provided by `ProGNN <https://github.com/ChandlerBang/Pro-GNN>`_ (10%/10%/80%).


.. note::
    The 'nettack' and 'gcn' settings do not provide fixed splits, i.e.,
    different random seeds return different data splits.

.. note::
    If you hope to use the full graph, please use the 'gcn' setting. 
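
To illustrate why a random split depends on the seed, the following sketch implements a seeded 10%/10%/80% split in plain Python. ``random_split`` is a hypothetical stand-in written for this page. The shuffling procedure inside :obj:`deeprobust.graph.utils.get_train_val_test` may differ, so the indices produced here are not expected to match the library's.

.. code-block:: python

   import random

   def random_split(num_nodes, val_size=0.1, test_size=0.8, seed=None):
       """Shuffle node ids with a seeded generator and cut them into three parts."""
       rng = random.Random(seed)
       ids = list(range(num_nodes))
       rng.shuffle(ids)
       n_val = int(round(val_size * num_nodes))
       n_test = int(round(test_size * num_nodes))
       n_train = num_nodes - n_val - n_test
       idx_train = sorted(ids[:n_train])
       idx_val = sorted(ids[n_train:n_train + n_val])
       idx_test = sorted(ids[n_train + n_val:])
       return idx_train, idx_val, idx_test

   split_a = random_split(100, seed=15)
   split_b = random_split(100, seed=15)
   split_c = random_split(100, seed=16)
   print([len(part) for part in split_a])  # sizes of train/val/test
   print(split_a == split_b)               # same seed, same split
   print(split_a == split_c)               # a different seed generally changes the split

Fixing the seed is therefore what makes results comparable across runs under the 'nettack' and 'gcn' settings.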

The following example shows how to load DeepRobust datasets:

.. code-block:: python
   
   from deeprobust.graph.data import Dataset
   from deeprobust.graph.utils import get_train_val_test
   # load the cora dataset
   data = Dataset(root='/tmp/', name='cora', seed=15)
   adj, features, labels = data.adj, data.features, data.labels
   idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
   # you can also split the data yourself
   idx_train, idx_val, idx_test = get_train_val_test(adj.shape[0], val_size=0.1, test_size=0.8)

   # loading acm dataset
   data = Dataset(root='/tmp/', name='acm', seed=15) 


DeepRobust also provides access to the Amazon and Coauthor datasets loaded from PyTorch Geometric:
:obj:`Amazon-Computers`,
:obj:`Amazon-Photo`,
:obj:`Coauthor-CS`,
:obj:`Coauthor-Physics`.

Users can also easily create their own datasets by creating a class with the following attributes: :obj:`data.adj`, :obj:`data.features`, :obj:`data.labels`, :obj:`data.idx_train`, :obj:`data.idx_val`, :obj:`data.idx_test`.
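
As a rough illustration, a custom dataset can be as small as the container below. ``ToyDataset`` and its consistency checks are hypothetical and only sketch the idea. The split attributes are named ``idx_train``, ``idx_val`` and ``idx_test`` here, following the loading examples on this page, and storing the features as a sparse matrix is an assumption.

.. code-block:: python

   import numpy as np
   import scipy.sparse as sp

   class ToyDataset:
       """Hypothetical container exposing the attributes a DeepRobust dataset provides."""

       def __init__(self, adj, features, labels, idx_train, idx_val, idx_test):
           n = adj.shape[0]
           # Basic consistency checks between graph, features and labels.
           assert adj.shape == (n, n), "adjacency matrix must be square"
           assert features.shape[0] == n, "one feature row per node"
           assert labels.shape[0] == n, "one label per node"
           self.adj = sp.csr_matrix(adj)
           self.features = features
           self.labels = labels
           self.idx_train = np.asarray(idx_train)
           self.idx_val = np.asarray(idx_val)
           self.idx_test = np.asarray(idx_test)

   # A 4-node path graph 0-1-2-3 with 2-dimensional features and binary labels.
   rows = np.array([0, 1, 1, 2, 2, 3])
   cols = np.array([1, 0, 2, 1, 3, 2])
   adj = sp.csr_matrix((np.ones(6), (rows, cols)), shape=(4, 4))
   features = sp.csr_matrix(np.eye(4)[:, :2])
   labels = np.array([0, 0, 1, 1])

   data = ToyDataset(adj, features, labels,
                     idx_train=[0, 3], idx_val=[1], idx_test=[2])
   print(data.adj.shape, data.features.shape, data.labels.shape)

An object built this way can then be unpacked exactly like a loaded dataset, e.g. ``adj, features, labels = data.adj, data.features, data.labels``.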

Attacked Graphs for Node Classification
---------------------------------------
DeepRobust provides attacked graphs perturbed by `metattack <https://openreview.net/pdf?id=Bylnx209YX>`_ and `nettack <https://arxiv.org/abs/1805.07984>`_. The graphs were attacked using the authors' TensorFlow implementations, on a random split generated with seed 15. The download links can be found in the `ProGNN code <https://github.com/ChandlerBang/Pro-GNN/tree/master/splits>`_, and the performance of various GNNs on these graphs is reported in the `ProGNN paper <https://arxiv.org/abs/2005.10203>`_.

The attacked graphs are instances of :class:`deeprobust.graph.data.PrePtbDataset`, which has only one attribute, :obj:`adj`. Hence, :class:`deeprobust.graph.data.PrePtbDataset` is usually used together with :class:`deeprobust.graph.data.Dataset` to obtain the node features and labels.

For metattack, DeepRobust provides attacked graphs for Cora, Citeseer, Polblogs and Pubmed, 
and the perturbation rate can be chosen from [0.05, 0.1, 0.15, 0.2, 0.25].

.. code-block:: python
   
   from deeprobust.graph.data import Dataset, PrePtbDataset
   # Use either setting='prognn' or seed=15 to get the ProGNN splits,
   # since the attacked graphs were generated under seed 15.
   # data = Dataset(root='/tmp/', name='cora', seed=15)
   data = Dataset(root='/tmp/', name='cora', setting='prognn')
   adj, features, labels = data.adj, data.features, data.labels
   idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
   # Load the graph perturbed by metattack
   perturbed_data = PrePtbDataset(root='/tmp/',
                                  name='cora',
                                  attack_method='meta',
                                  ptb_rate=0.05)
   perturbed_adj = perturbed_data.adj
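
A quick sanity check on a perturbed graph is to count how many edges differ from the clean one. The sketch below does this on a toy graph with SciPy. ``count_edge_flips`` is a hypothetical helper, and reading the perturbation rate as the fraction of clean edges that were flipped is an assumption made for this example.

.. code-block:: python

   import numpy as np
   import scipy.sparse as sp

   def count_edge_flips(clean_adj, perturbed_adj):
       """Number of undirected edges added or removed between two symmetric graphs."""
       diff = abs(sp.csr_matrix(clean_adj) - sp.csr_matrix(perturbed_adj))
       # Each undirected edge appears twice in a symmetric adjacency matrix.
       return int(diff.sum()) // 2

   # Toy clean graph: a ring with 20 nodes, hence 20 undirected edges.
   n = 20
   src = np.arange(n)
   dst = (src + 1) % n
   clean = sp.csr_matrix((np.ones(n), (src, dst)), shape=(n, n))
   clean = clean + clean.T

   # Toy "attack": insert the single edge (0, 10).
   perturbed = clean.tolil()
   perturbed[0, 10] = 1
   perturbed[10, 0] = 1
   perturbed = perturbed.tocsr()

   num_edges = int(clean.sum()) // 2
   flips = count_edge_flips(clean, perturbed)
   print(flips, flips / num_edges)  # 1 0.05

With real data, the same helper could be called as ``count_edge_flips(data.adj, perturbed_data.adj)``.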

For nettack, DeepRobust provides attacked graphs for Cora, Citeseer, Polblogs and Pubmed.
Here :obj:`ptb_rate` indicates the number of perturbations made on each target node,
and it can be chosen from [1.0, 2.0, 3.0, 4.0, 5.0].

.. code-block:: python

   from deeprobust.graph.data import Dataset, PrePtbDataset
   # data = Dataset(root='/tmp/', name='cora', seed=15) 
   data = Dataset(root='/tmp/', name='cora', setting='prognn')    
   adj, features, labels = data.adj, data.features, data.labels
   idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
   # Load the graph perturbed by nettack
   perturbed_data = PrePtbDataset(root='/tmp/', name='cora',
                                  attack_method='nettack',
                                  ptb_rate=3.0) # here ptb_rate is the number of perturbations per target node
   perturbed_adj = perturbed_data.adj
   idx_test = perturbed_data.target_nodes



Converting Graph Data between DeepRobust and PyTorch Geometric 
---------------------------------------------------------------
Given the popularity of PyTorch Geometric in the graph representation learning community,
we also provide tools for converting data between DeepRobust and PyTorch Geometric. We can
use :class:`deeprobust.graph.data.Dpr2Pyg` to convert DeepRobust data to PyTorch Geometric 
and use :class:`deeprobust.graph.data.Pyg2Dpr` to convert PyTorch Geometric data to DeepRobust.
For example, we can first create an instance of the Dataset class and convert it to the PyTorch Geometric data format.

.. code-block:: python

    from deeprobust.graph.data import Dataset, Dpr2Pyg, Pyg2Dpr
    data = Dataset(root='/tmp/', name='cora') # load clean graph
    pyg_data = Dpr2Pyg(data) # convert dpr to pyg
    print(pyg_data)
    print(pyg_data[0])
    dpr_data = Pyg2Dpr(pyg_data) # convert pyg to dpr
    print(dpr_data.adj)
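
The main difference between the two formats is how connectivity is stored: DeepRobust keeps a sparse adjacency matrix, while PyTorch Geometric uses an ``edge_index`` of shape ``[2, num_edges]``. The sketch below shows that mapping with NumPy and SciPy only. The two helper functions are hypothetical and are not the actual implementation of ``Dpr2Pyg`` or ``Pyg2Dpr``, which operate on torch tensors and also carry features, labels and splits.

.. code-block:: python

    import numpy as np
    import scipy.sparse as sp

    def adj_to_edge_index(adj):
        """Return a [2, num_edges] integer array of (source, target) pairs."""
        coo = sp.coo_matrix(adj)
        return np.vstack((coo.row, coo.col)).astype(np.int64)

    def edge_index_to_adj(edge_index, num_nodes):
        """Rebuild an unweighted CSR adjacency matrix from an edge index array."""
        values = np.ones(edge_index.shape[1])
        return sp.csr_matrix((values, (edge_index[0], edge_index[1])),
                             shape=(num_nodes, num_nodes))

    # Triangle graph stored as a symmetric CSR matrix.
    adj = sp.csr_matrix(np.array([[0, 1, 1],
                                  [1, 0, 1],
                                  [1, 1, 0]]))
    edge_index = adj_to_edge_index(adj)
    print(edge_index.shape)  # (2, 6): each undirected edge is stored in both directions
    restored = edge_index_to_adj(edge_index, num_nodes=3)
    print((restored.toarray() == adj.toarray()).all())  # True

Because an undirected graph is stored symmetrically, the number of columns in ``edge_index`` is twice the number of undirected edges.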


Load OGB Datasets 
-----------------------
`Open Graph Benchmark (OGB) <https://ogb.stanford.edu/>`_ provides various benchmark
datasets. DeepRobust now provides an interface to convert the OGB dataset format (PyG data format)
to the DeepRobust format.

.. code-block:: python

    from ogb.nodeproppred import PygNodePropPredDataset
    from deeprobust.graph.data import Pyg2Dpr
    pyg_data = PygNodePropPredDataset(name = 'ogbn-arxiv')
    dpr_data = Pyg2Dpr(pyg_data) # convert pyg to dpr
    

Load PyTorch Geometric Amazon and Coauthor Datasets
---------------------------------------------------
DeepRobust also provides access to the Amazon and Coauthor datasets from PyTorch Geometric, i.e.,
:obj:`Amazon-Computers`, :obj:`Amazon-Photo`, :obj:`Coauthor-CS` and :obj:`Coauthor-Physics`.
Users can access them through
:class:`deeprobust.graph.data.AmazonPyg` and :class:`deeprobust.graph.data.CoauthorPyg`.
For example, we can load the Amazon datasets in PyG format directly from DeepRobust
as follows:

.. code-block:: python

    from deeprobust.graph.data import AmazonPyg
    computers = AmazonPyg(root='/tmp', name='computers')
    print(computers)
    print(computers[0])
    photo = AmazonPyg(root='/tmp', name='photo')
    print(photo)
    print(photo[0])


Similarly, we can also load the Coauthor datasets:

.. code-block:: python

    from deeprobust.graph.data import CoauthorPyg
    cs = CoauthorPyg(root='/tmp', name='cs')
    print(cs)
    print(cs[0])
    physics = CoauthorPyg(root='/tmp', name='physics')
    print(physics)
    print(physics[0])