Graph Dataset
=======================
We briefly introduce the dataset format of DeepRobust through self-contained examples.
In essence, DeepRobust-Graph provides the following main features:
.. contents::
    :local:
Clean (Unattacked) Graphs for Node Classification
-------------------------------------------------
Graphs are ubiquitous data structures describing pairwise relations between entities.
A single clean graph in DeepRobust is described by an instance of :class:`deeprobust.graph.data.Dataset`, which holds the following attributes by default:
- :obj:`data.adj`: Graph adjacency matrix in scipy.sparse.csr_matrix format with shape :obj:`[num_nodes, num_nodes]`
- :obj:`data.features`: Node feature matrix with shape :obj:`[num_nodes, num_node_features]`
- :obj:`data.labels`: Target to train against (may have arbitrary shape), *e.g.*, node-level targets of shape :obj:`[num_nodes, *]`
- :obj:`data.idx_train`: Array of training node indices
- :obj:`data.idx_val`: Array of validation node indices
- :obj:`data.idx_test`: Array of test node indices
By default, the loaded :obj:`deeprobust.graph.data.Dataset` selects the largest connected
component of the graph, but users can specify different settings by passing different parameters.
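
The largest-connected-component selection mentioned above can be illustrated with SciPy. This is a minimal sketch on a toy graph, not DeepRobust's own implementation; the helper name ``largest_connected_component`` is hypothetical.

.. code-block:: python

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.csgraph import connected_components


    def largest_connected_component(adj):
        """Return the node indices of the largest component and the induced subgraph."""
        _, component_ids = connected_components(adj, directed=False)
        sizes = np.bincount(component_ids)
        keep = np.flatnonzero(component_ids == np.argmax(sizes))
        return keep, adj[keep][:, keep]


    # Toy graph: nodes 0-1-2 form a path, nodes 3-4 form a separate edge.
    rows = np.array([0, 1, 1, 2, 3, 4])
    cols = np.array([1, 0, 2, 1, 4, 3])
    adj = sp.csr_matrix((np.ones(6), (rows, cols)), shape=(5, 5))

    nodes, lcc_adj = largest_connected_component(adj)
    print(nodes)          # indices of the nodes that are kept
    print(lcc_adj.shape)  # adjacency matrix restricted to those nodes

In such a scheme the feature matrix and the labels would be indexed with the same ``nodes`` array so that all three stay aligned.
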
Currently DeepRobust supports the following datasets:
:obj:`Cora`,
:obj:`Cora-ML`,
:obj:`Citeseer`,
:obj:`Pubmed`,
:obj:`Polblogs`,
:obj:`ACM`,
:obj:`BlogCatalog`,
:obj:`Flickr`,
:obj:`UAI`.
More details about the datasets can be found `here <https://github.com/DSE-MSU/DeepRobust/tree/master/deeprobust/graph#supported-datasets>`_.
By default, the data splits are generated by :obj:`deeprobust.graph.utils.get_train_val_test`,
which randomly splits the data into 10%/10%/80% for training/validation/test. You can also generate
splits yourself with :obj:`deeprobust.graph.utils.get_train_val_test` or :obj:`deeprobust.graph.utils.get_train_val_test_gcn`.
Note that the class accepts a :obj:`setting` parameter, which can be chosen from ``["nettack", "gcn", "prognn"]``:
- :obj:`setting="nettack"`: uses the largest connected component of the graph, with 10%/10%/80% data splits;
- :obj:`setting="gcn"`: uses the full graph, with 20 nodes per class for training, 500 nodes for validation and 1000 nodes for testing (randomly chosen);
- :obj:`setting="prognn"`: uses the largest connected component, with the data splits provided by `ProGNN <https://github.com/ChandlerBang/Pro-GNN>`_ (10%/10%/80%).

.. note::
    The 'nettack' and 'gcn' settings do not provide fixed splits, i.e.,
    different random seeds return different data splits.

.. note::
    If you want to use the full graph, please use the 'gcn' setting.
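
The seed dependence noted above can be illustrated with a small, self-contained sketch. It is not the implementation of :obj:`deeprobust.graph.utils.get_train_val_test`; the helper ``random_split`` is hypothetical and simply shuffles node indices with NumPy, without any stratification by label.

.. code-block:: python

    import numpy as np


    def random_split(num_nodes, train_size=0.1, val_size=0.1, seed=None):
        """Shuffle node indices and cut them into train/val/test parts."""
        rng = np.random.RandomState(seed)
        perm = rng.permutation(num_nodes)
        n_train = int(train_size * num_nodes)
        n_val = int(val_size * num_nodes)
        return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]


    idx_train, idx_val, idx_test = random_split(1000, seed=15)
    print(len(idx_train), len(idx_val), len(idx_test))  # 100 100 800

    # The same seed reproduces the split; another seed reshuffles the nodes.
    same_train, _, _ = random_split(1000, seed=15)
    other_train, _, _ = random_split(1000, seed=16)
    print(np.array_equal(idx_train, same_train))
    print(np.array_equal(idx_train, other_train))

Passing the same seed again reproduces the same split, so a fixed seed is needed whenever results have to be compared across runs.
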
The following example shows how to load DeepRobust datasets:

.. code-block:: python

    from deeprobust.graph.data import Dataset
    from deeprobust.graph.utils import get_train_val_test

    # load the cora dataset
    data = Dataset(root='/tmp/', name='cora', seed=15)
    adj, features, labels = data.adj, data.features, data.labels
    idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test

    # you can also split the data yourself
    idx_train, idx_val, idx_test = get_train_val_test(adj.shape[0], val_size=0.1, test_size=0.8)

    # load the acm dataset
    data = Dataset(root='/tmp/', name='acm', seed=15)

DeepRobust also provides access to the Amazon and Coauthor datasets loaded from PyTorch Geometric:
:obj:`Amazon-Computers`,
:obj:`Amazon-Photo`,
:obj:`Coauthor-CS`,
:obj:`Coauthor-Physics`.
Users can also easily create their own datasets by creating a class with the following attributes: :obj:`data.adj`, :obj:`data.features`, :obj:`data.labels`, :obj:`data.idx_train`, :obj:`data.idx_val`, :obj:`data.idx_test`.
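
As a rough illustration of such a user-defined dataset, the sketch below builds a tiny synthetic graph and exposes it under the attribute names used in the loading example above (``idx_train``, ``idx_val``, ``idx_test``). The class ``ToyDataset``, its constructor arguments and the random data are all hypothetical; storing the features as a sparse matrix is an assumption as well.

.. code-block:: python

    import numpy as np
    import scipy.sparse as sp


    class ToyDataset:
        """Hypothetical container with the attribute names used in the loading example."""

        def __init__(self, num_nodes=6, num_features=4, num_classes=2, seed=0):
            rng = np.random.RandomState(seed)
            # Ring graph: node i is connected to node (i + 1) % num_nodes.
            src = np.arange(num_nodes)
            dst = (src + 1) % num_nodes
            adj = sp.csr_matrix((np.ones(num_nodes), (src, dst)),
                                shape=(num_nodes, num_nodes))
            self.adj = adj.maximum(adj.T).tocsr()  # symmetrize: undirected graph
            dense = rng.randint(0, 2, size=(num_nodes, num_features))
            self.features = sp.csr_matrix(dense.astype(np.float32))
            self.labels = rng.randint(0, num_classes, size=num_nodes)
            perm = rng.permutation(num_nodes)
            self.idx_train, self.idx_val, self.idx_test = perm[:2], perm[2:4], perm[4:]


    data = ToyDataset()
    print(data.adj.shape, data.features.shape, data.labels.shape)
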
Attacked Graphs for Node Classification
---------------------------------------
DeepRobust provides pre-attacked graphs perturbed by `metattack <https://openreview.net/pdf?id=Bylnx209YX>`_ and `nettack <https://arxiv.org/abs/1805.07984>`_. The graphs were attacked with the authors' TensorFlow implementations on a random split generated with seed 15. The download links can be found in the `ProGNN code <https://github.com/ChandlerBang/Pro-GNN/tree/master/splits>`_, and the performance of various GNNs on these graphs is reported in the `ProGNN paper <https://arxiv.org/abs/2005.10203>`_. The attacked graphs are instances of :class:`deeprobust.graph.data.PrePtbDataset`, which has only one attribute, :obj:`adj`. Hence, :class:`deeprobust.graph.data.PrePtbDataset` is usually used together with :class:`deeprobust.graph.data.Dataset` to obtain node features and labels.
For metattack, DeepRobust provides attacked graphs for Cora, Citeseer, Polblogs and Pubmed,
and the perturbation rate can be chosen from [0.05, 0.1, 0.15, 0.2, 0.25].

.. code-block:: python

    from deeprobust.graph.data import Dataset, PrePtbDataset

    # You can use either setting='prognn' or seed=15 to get the ProGNN splits,
    # since the attacked graphs were generated under seed 15
    # data = Dataset(root='/tmp/', name='cora', seed=15)
    data = Dataset(root='/tmp/', name='cora', setting='prognn')
    adj, features, labels = data.adj, data.features, data.labels
    idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test

    # Load the graph attacked by metattack
    perturbed_data = PrePtbDataset(root='/tmp/',
                                   name='cora',
                                   attack_method='meta',
                                   ptb_rate=0.05)
    perturbed_adj = perturbed_data.adj

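
To check how strongly a perturbed graph differs from its clean counterpart, one can count the edges that were added or removed. The sketch below does this on a toy graph; the helpers ``adjacency_from_edges`` and ``count_edge_flips`` are hypothetical, and reading the perturbation rate as the number of flipped edges divided by the number of clean edges is an assumption, not something stated above.

.. code-block:: python

    import numpy as np
    import scipy.sparse as sp


    def adjacency_from_edges(edges, num_nodes):
        """Build a symmetric, unweighted CSR adjacency matrix from an edge list."""
        rows, cols = zip(*edges)
        adj = sp.csr_matrix((np.ones(len(edges)), (rows, cols)),
                            shape=(num_nodes, num_nodes))
        return adj.maximum(adj.T).tocsr()


    def count_edge_flips(adj, perturbed_adj):
        """Count undirected edges that were added or removed."""
        diff = sp.csr_matrix(adj) - sp.csr_matrix(perturbed_adj)
        diff.eliminate_zeros()
        return diff.nnz // 2  # each undirected edge appears twice


    clean = adjacency_from_edges([(0, 1), (1, 2), (2, 3)], num_nodes=4)
    # Toy perturbation: edge 2-3 removed, edge 0-3 added.
    attacked = adjacency_from_edges([(0, 1), (1, 2), (0, 3)], num_nodes=4)

    flips = count_edge_flips(clean, attacked)
    num_clean_edges = clean.nnz // 2
    print(flips, num_clean_edges, flips / num_clean_edges)

With real data, the clean graph has to be loaded under the same setting as the attacked one so that both adjacency matrices cover the same nodes.
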
For nettack, DeepRobust provides attacked graphs for Cora, Citeseer, Polblogs and Pubmed.
Here :obj:`ptb_rate` indicates the number of perturbations made on each target node and
can be chosen from [1.0, 2.0, 3.0, 4.0, 5.0].

.. code-block:: python

    from deeprobust.graph.data import Dataset, PrePtbDataset

    # data = Dataset(root='/tmp/', name='cora', seed=15)
    data = Dataset(root='/tmp/', name='cora', setting='prognn')
    adj, features, labels = data.adj, data.features, data.labels
    idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test

    # Load the graph attacked by nettack
    perturbed_data = PrePtbDataset(root='/tmp/', name='cora',
                                   attack_method='nettack',
                                   ptb_rate=3.0)  # here ptb_rate is the number of perturbations per node
    perturbed_adj = perturbed_data.adj
    idx_test = perturbed_data.target_nodes  # evaluate on the attacked target nodes

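
Because the nettack example replaces ``idx_test`` with the target nodes, evaluation is restricted to those nodes. A minimal sketch of such a restricted accuracy computation is given below; the helper ``accuracy_on_nodes`` and the toy predictions are hypothetical and independent of any particular model.

.. code-block:: python

    import numpy as np


    def accuracy_on_nodes(predictions, labels, node_idx):
        """Fraction of correctly classified nodes among ``node_idx``."""
        node_idx = np.asarray(node_idx)
        return float(np.mean(predictions[node_idx] == labels[node_idx]))


    labels = np.array([0, 1, 1, 0, 2, 2])
    predictions = np.array([0, 1, 0, 0, 2, 1])  # toy model output
    target_nodes = [2, 4, 5]                    # toy stand-in for the target nodes

    print(accuracy_on_nodes(predictions, labels, target_nodes))
    print(accuracy_on_nodes(predictions, labels, np.arange(len(labels))))
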
Converting Graph Data between DeepRobust and PyTorch Geometric
--------------------------------------------------------------
Given the popularity of PyTorch Geometric in the graph representation learning community,
we also provide tools for converting data between DeepRobust and PyTorch Geometric. We can
use :class:`deeprobust.graph.data.Dpr2Pyg` to convert DeepRobust data to PyTorch Geometric
and :class:`deeprobust.graph.data.Pyg2Dpr` to convert PyTorch Geometric data to DeepRobust.
For example, we can first create an instance of the Dataset class and then convert it to the PyTorch Geometric data format:

.. code-block:: python

    from deeprobust.graph.data import Dataset, Dpr2Pyg, Pyg2Dpr

    data = Dataset(root='/tmp/', name='cora')  # load clean graph
    pyg_data = Dpr2Pyg(data)  # convert dpr to pyg
    print(pyg_data)
    print(pyg_data[0])
    dpr_data = Pyg2Dpr(pyg_data)  # convert pyg to dpr
    print(dpr_data.adj)

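
The main structural difference between the two formats is how connectivity is stored: DeepRobust uses a sparse adjacency matrix, whereas PyTorch Geometric uses an ``edge_index`` array of shape ``[2, num_edges]``. The sketch below shows this correspondence with NumPy and SciPy only. The helpers ``adj_to_edge_index`` and ``edge_index_to_adj`` are hypothetical and do not reproduce what :class:`Dpr2Pyg` or :class:`Pyg2Dpr` do internally.

.. code-block:: python

    import numpy as np
    import scipy.sparse as sp


    def adj_to_edge_index(adj):
        """Return a [2, num_edges] array of (source, target) pairs."""
        coo = sp.coo_matrix(adj)
        return np.vstack([coo.row, coo.col]).astype(np.int64)


    def edge_index_to_adj(edge_index, num_nodes):
        """Rebuild an unweighted CSR adjacency matrix from an edge_index array."""
        values = np.ones(edge_index.shape[1])
        return sp.csr_matrix((values, (edge_index[0], edge_index[1])),
                             shape=(num_nodes, num_nodes))


    # Path graph 0-1-2 stored as a symmetric adjacency matrix.
    adj = sp.csr_matrix(np.array([[0, 1, 0],
                                  [1, 0, 1],
                                  [0, 1, 0]], dtype=np.float64))

    edge_index = adj_to_edge_index(adj)
    print(edge_index.shape)  # each undirected edge is listed in both directions

    restored = edge_index_to_adj(edge_index, adj.shape[0])
    print(abs(restored - adj).sum() == 0)  # round trip keeps the graph unchanged
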
Load OGB Datasets
-----------------------
`Open Graph Benchmark (OGB) <https://ogb.stanford.edu/>`_ provides a variety of benchmark
datasets. DeepRobust now provides an interface to convert the OGB dataset format (PyG data format)
to the DeepRobust format.

.. code-block:: python

    from ogb.nodeproppred import PygNodePropPredDataset
    from deeprobust.graph.data import Pyg2Dpr

    pyg_data = PygNodePropPredDataset(name='ogbn-arxiv')
    dpr_data = Pyg2Dpr(pyg_data)  # convert pyg to dpr

Load PyTorch Geometric Amazon and Coauthor Datasets
---------------------------------------------------
DeepRobust also provides access to the Amazon and Coauthor datasets from PyTorch
Geometric, i.e., :obj:`Amazon-Computers`, :obj:`Amazon-Photo`, :obj:`Coauthor-CS` and
:obj:`Coauthor-Physics`. Specifically, users can access them through
:class:`deeprobust.graph.data.AmazonPyg` and :class:`deeprobust.graph.data.CoauthorPyg`.
For example, we can load the Amazon datasets from DeepRobust directly in the PyG format
as follows:

.. code-block:: python

    from deeprobust.graph.data import AmazonPyg

    computers = AmazonPyg(root='/tmp', name='computers')
    print(computers)
    print(computers[0])
    photo = AmazonPyg(root='/tmp', name='photo')
    print(photo)
    print(photo[0])

Similarly, we can load the Coauthor datasets:

.. code-block:: python

    from deeprobust.graph.data import CoauthorPyg

    cs = CoauthorPyg(root='/tmp', name='cs')
    print(cs)
    print(cs[0])
    physics = CoauthorPyg(root='/tmp', name='physics')
    print(physics)
    print(physics[0])