# 🧠 Graph Neural Networks: A Comprehensive Implementation and Comparison

[![PyTorch](https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white)](https://pytorch.org/)
[![Python](https://img.shields.io/badge/Python-3776AB?style=for-the-badge&logo=python&logoColor=white)](https://python.org/)
[![MIT License](https://img.shields.io/badge/License-MIT-green.svg?style=for-the-badge)](https://choosealicense.com/licenses/mit/)

A complete implementation and comparison of three widely used Graph Neural Network architectures (**GCN**, **GraphSAGE**, and **GAT**) on the Cora citation network dataset.

## 🎯 **Project Overview**

This project demonstrates the implementation and comparative analysis of Graph Neural Networks for node classification tasks. Using the Cora citation network dataset, we train and evaluate three different GNN architectures to understand their strengths and performance characteristics.

### **Key Results**
- **🥇 GAT (Graph Attention Networks)**: 81.9% test accuracy
- **🥈 GCN (Graph Convolutional Networks)**: 79.3% test accuracy
- **🥉 GraphSAGE**: 76.8% test accuracy

## 📊 **Dataset: Cora Citation Network**

- **2,708 nodes** (machine learning papers)
- **10,556 edges** (citation relationships; PyTorch Geometric stores each undirected link in both directions)
- **1,433 features** per node (bag-of-words from abstracts)
- **7 classes** (research areas: Neural Networks, Rule Learning, etc.)
- **Semi-supervised setup**: 140 training, 500 validation, 1,000 test nodes

![Graph Visualization](graph_visualization.png)
*Cora citation network structure with nodes colored by research area*

## 🏗️ **Architecture Comparison**

| Model | Parameters | Key Innovation | Convergence | Test Accuracy |
|-------|------------|----------------|-------------|---------------|
| **GCN** | 46,119 | Spectral graph convolution | 90 epochs | 79.3% |
| **GraphSAGE** | 92,199 | Sampling and aggregation | 187 epochs | 76.8% |
| **GAT** | 369,429 | Multi-head attention | 46 epochs | **81.9%** |
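
The parameter counts in the table can be reproduced with simple arithmetic. The sketch below assumes two-layer models with the hidden size (32) and head count (8) from the training configuration, and assumes the default parameter layouts of PyTorch Geometric's `GCNConv`, `SAGEConv`, and `GATConv`. The helper names are hypothetical and the layer layout is an inference from the reported numbers, not something confirmed by the training script.

```python
# Assumed sizes: Cora input features, hidden width, classes, attention heads
IN_DIM, HIDDEN, CLASSES, HEADS = 1433, 32, 7, 8


def gcn_layer(n_in, n_out):
    # Assumed GCNConv layout: one weight matrix plus one bias vector
    return n_in * n_out + n_out


def sage_layer(n_in, n_out):
    # Assumed SAGEConv layout: neighbour weights, self weights, one bias vector
    return 2 * n_in * n_out + n_out


def gat_layer(n_in, n_out, heads, concat=True):
    # Assumed GATConv layout: shared projection, two attention vectors per head, one bias
    weight = n_in * heads * n_out
    attention = 2 * heads * n_out
    bias = heads * n_out if concat else n_out
    return weight + attention + bias


gcn_params = gcn_layer(IN_DIM, HIDDEN) + gcn_layer(HIDDEN, CLASSES)
sage_params = sage_layer(IN_DIM, HIDDEN) + sage_layer(HIDDEN, CLASSES)
# Second GAT layer is assumed to use a single head on the concatenated output
gat_params = gat_layer(IN_DIM, HIDDEN, HEADS) + gat_layer(HEADS * HIDDEN, CLASSES, heads=1)

print(gcn_params, sage_params, gat_params)  # 46119 92199 369429
```

Under these assumptions the totals match the table exactly, which suggests that almost all of GAT's extra parameters come from the eight-head first layer.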

## 📈 **Training Results**

![Training Curves](training_curves.png)
*Loss and accuracy curves showing training progression for all three models*

### **Key Training Insights**
- **GAT**: Fastest convergence (46 epochs) with the highest final accuracy (81.9%)
- **GCN**: Steady convergence (90 epochs) with the second-highest accuracy (79.3%)
- **GraphSAGE**: Slowest to converge (187 epochs) and the lowest final accuracy (76.8%)

## 🔍 **Learned Representations**

![Embeddings Visualization](embeddings_tsne.png)
*t-SNE visualization of learned node embeddings showing class separation quality*

The embeddings visualization reveals:
- **GAT**: Best class separation with clear clustering
- **GCN**: Good separation with some overlap
- **GraphSAGE**: Decent clustering with more mixed regions

## 🚀 **Quick Start**

### **Installation**

```bash
# Clone the repository
git clone https://github.com/GruheshKurra/GraphNeuralNetworks-GNN-.git
cd GraphNeuralNetworks-GNN-

# Install dependencies
pip install torch torchvision torchaudio
pip install torch-geometric torch-scatter torch-sparse torch-cluster torch-spline-conv
pip install matplotlib seaborn pandas numpy scikit-learn networkx
```

### **For Apple Silicon Macs (M1/M2/M3/M4)**
```bash
# The code automatically detects and uses MPS acceleration
pip install torch torchvision torchaudio
pip install torch-geometric torch-scatter torch-sparse torch-cluster torch-spline-conv
pip install matplotlib seaborn pandas numpy scikit-learn networkx
```

### **For Google Colab**
```python
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
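# The -f wheel index below targets PyTorch 2.0.0 (CPU); change it if a different PyTorch version is installed
!pip install torch-geometric torch-scatter torch-sparse torch-cluster torch-spline-conv -f https://data.pyg.org/whl/torch-2.0.0+cpu.html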
!pip install matplotlib seaborn pandas numpy scikit-learn networkx
```

### **Run the Training**

```bash
python gnn_comparison.py
```

## 📈 **Key Features**

- **🎯 Three GNN Architectures**: Complete implementations of GCN, GraphSAGE, and GAT
- **📊 Comprehensive Evaluation**: Accuracy, precision, recall, F1-score, confusion matrices
- **📈 Visualization**: Training curves, t-SNE embeddings, network structure
- **🛡️ Robust Training**: Early stopping, model checkpointing, cross-platform compatibility
- **📝 Detailed Logging**: Complete training logs written to `gnn_training.log`
- **💾 Artifact Saving**: Models, results, and visualizations saved automatically

## 🗂️ **Project Structure**

```
├── gnn_comparison.py          # Main training script
├── best_*_model.pth           # Best model checkpoints
├── *_full_model.pkl           # Complete model objects
├── training_curves.png        # Loss and accuracy visualizations
├── embeddings_tsne.png        # t-SNE embedding visualizations
├── graph_visualization.png    # Network structure visualization
├── results_summary.json       # Comprehensive metrics
├── gnn_training.log           # Complete training logs
└── README.md                  # This file
```

## 🧪 **Methodology**

### **Model Architectures**

1. **Graph Convolutional Networks (GCN)**
   - Spectral approach to graph convolutions
   - Simple and effective baseline
   - Fast convergence with good performance

2. **GraphSAGE (Sample and Aggregate)**
   - Sampling-based approach for scalability
   - Inductive learning capability
   - Handles large graphs efficiently

3. **Graph Attention Networks (GAT)**
   - Multi-head attention mechanism
   - Dynamic neighbor weighting
   - Best performance but highest complexity
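
To make the neighborhood-aggregation idea concrete, here is a minimal, dependency-free sketch of one GCN propagation step on a toy graph, using the symmetric normalization D^(-1/2) (A + I) D^(-1/2) X without learned weights. The function name and the toy graph are illustrative assumptions and are not taken from `gnn_comparison.py`. GraphSAGE's mean aggregator would replace the per-edge weight with a plain neighbor average, and GAT would replace it with a learned attention coefficient.

```python
import math


def gcn_propagate(edges, features):
    """One weight-free GCN step: D^-1/2 (A + I) D^-1/2 X on an undirected graph."""
    n = len(features)
    neighbours = {i: {i} for i in range(n)}  # start with self-loops (A + I)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    degree = {i: len(neighbours[i]) for i in range(n)}  # degree including self-loop
    dim = len(features[0])
    out = []
    for i in range(n):
        row = [0.0] * dim
        for j in neighbours[i]:
            weight = 1.0 / math.sqrt(degree[i] * degree[j])
            for k in range(dim):
                row[k] += weight * features[j][k]
        out.append(row)
    return out


# Toy path graph 0 - 1 - 2 with a one-dimensional signal on node 0 only
edges = [(0, 1), (1, 2)]
features = [[1.0], [0.0], [0.0]]
smoothed = gcn_propagate(edges, features)
print(smoothed)
```

After one step the signal has spread from node 0 to its direct neighbor but not to node 2, which is why a two-layer model sees a two-hop neighborhood.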

### **Training Configuration**

```python
config = {
    'hidden_dim': 32,        # Compact representation
    'num_layers': 2,         # Avoids over-smoothing
    'dropout': 0.5,          # Strong regularization
    'learning_rate': 0.001,  # Conservative learning
    'weight_decay': 5e-4,    # L2 regularization
    'epochs': 200,           # Maximum training
    'patience': 20,          # Early stopping
    'attention_heads': 8     # Multi-head attention (GAT)
}
```
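
The `patience` setting implies an early-stopping rule. The following is a minimal sketch of how such a rule is commonly written. It assumes validation loss is the monitored quantity, and the `EarlyStopping` class and `min_delta` parameter are hypothetical names for illustration; the actual script may monitor validation accuracy instead.

```python
class EarlyStopping:
    """Stop training once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=20, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch and return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0  # a checkpoint would typically be saved here
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Toy validation losses; a short patience keeps the example small
val_losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_epoch, stopper.best_loss)  # 6 3 0.7
```

With `patience=20` and a 200-epoch cap, the same logic would end a run 20 epochs after the last improvement.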

## 📊 **Results Analysis**

### **Performance Metrics**

| Model | Test Acc | Precision | Recall | F1-Score | Parameters |
|-------|----------|-----------|--------|----------|------------|
| GCN | 79.3% | 0.791 | 0.793 | 0.792 | 46K |
| GraphSAGE | 76.8% | 0.765 | 0.768 | 0.766 | 92K |
| GAT | **81.9%** | **0.819** | **0.819** | **0.819** | 369K |
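
In the table, each recall value equals the corresponding test accuracy. That is what support-weighted averaging produces, because weighted recall is mathematically identical to accuracy. The sketch below illustrates this on toy labels. Weighted averaging is an assumption about how the metrics were computed, and the function name is hypothetical.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1."""
    support = Counter(y_true)
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label in sorted(support):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / support[label]
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = support[label] / total  # class frequency in the true labels
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / total
    return accuracy, precision, recall, f1


# Toy labels for three classes
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 2, 2]
acc, prec, rec, f1 = weighted_metrics(y_true, y_pred)
print(acc, prec, rec, f1)
```

Macro averaging would weight every class equally instead and would generally not match accuracy on an imbalanced test set.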

### **Key Insights**

1. **GAT's Superior Performance**: Attention-based neighbor weighting gives GAT a 2.6-point lead over GCN and a 5.1-point lead over GraphSAGE in test accuracy
2. **Efficiency vs Performance**: GCN comes within 2.6 points of GAT with roughly one-eighth of the parameters (46K vs 369K)
3. **Convergence Speed**: GAT converges fastest (46 epochs) despite having the most parameters
4. **Regularization Impact**: Strong dropout (0.5) is crucial for the small training set (140 labeled nodes)

## 🎨 **Visualizations Generated**

The project automatically generates comprehensive visualizations:

### 1. **Network Structure Visualization**
![Graph Structure](graph_visualization.png)

Shows the Cora citation network with:
- Nodes colored by research area (7 classes)
- Spring layout for optimal visualization
- Clear community structure visible

### 2. **Training Progress Monitoring**
![Training Curves](training_curves.png)

Displays for each model:
- **Loss curves**: Training and validation loss progression
- **Accuracy curves**: Training and validation accuracy trends
- **Overfitting analysis**: Gap between train/validation performance

### 3. **Learned Representation Quality**
![Node Embeddings](embeddings_tsne.png)

t-SNE visualization showing:
- **Class separation**: How well models distinguish between research areas
- **Embedding quality**: Clustering strength in learned representations
- **Model comparison**: Visual comparison of representation learning

## 🛠️ **Technical Details**

### **Device Compatibility**
- **Apple Silicon MPS**: Automatic detection and acceleration
- **NVIDIA CUDA**: GPU acceleration support
- **CPU Fallback**: Universal compatibility
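
Device selection along these lines can be sketched as follows. The `pick_device` helper and the CUDA-before-MPS preference order are assumptions for illustration; the script's own detection logic may differ. `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are the standard PyTorch availability checks, and the import is guarded so the sketch also runs where PyTorch is not installed.

```python
def pick_device(cuda_available, mps_available):
    """Return a device name, preferring GPU backends over the CPU fallback."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


try:
    import torch

    # Older PyTorch builds have no MPS backend, so look it up defensively
    mps_backend = getattr(torch.backends, "mps", None)
    device_name = pick_device(
        torch.cuda.is_available(),
        mps_backend is not None and mps_backend.is_available(),
    )
except ImportError:
    # PyTorch is not installed; use the universal fallback
    device_name = pick_device(False, False)

print(device_name)
```

The resulting name can be passed to `torch.device(device_name)` before moving the model and data.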

### **Best Practices Implemented**
- Early stopping to prevent overfitting
- Model checkpointing for reproducibility
- Comprehensive logging for debugging
- Cross-platform compatibility
- Memory-efficient implementations

## 📚 **Learning Outcomes**

This implementation demonstrates:

1. **Graph Neural Network Fundamentals**
   - Message passing framework
   - Neighborhood aggregation
   - Semi-supervised node classification

2. **Architecture Comparison**
   - Spectral vs spatial approaches
   - Attention mechanisms in graphs
   - Scalability considerations

3. **Best Practices**
   - Hyperparameter selection for graphs
   - Regularization techniques
   - Evaluation methodologies

## 🔧 **Reproducibility**

All experiments are fully reproducible:
- Fixed random seeds for consistent results
- Complete configuration saved in `results_summary.json`
- Model checkpoints saved at best validation performance
- Comprehensive logging of all training steps
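
Fixing seeds usually means seeding every random number generator in use. The helper below is a minimal sketch: the name `set_seed` and the seed value 42 are hypothetical, and NumPy and PyTorch are seeded only if they are installed. Some GPU operations can remain nondeterministic even with fixed seeds.

```python
import random


def set_seed(seed=42):
    """Seed Python's RNG and, when installed, the NumPy and PyTorch RNGs."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch

        torch.manual_seed(seed)  # seeds the default PyTorch generators
    except ImportError:
        pass


# Re-seeding reproduces the same random sequence
set_seed(42)
first = [random.random() for _ in range(3)]
set_seed(42)
second = [random.random() for _ in range(3)]
print(first == second)  # True
```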

## 🀝 **Contributing**

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

### **Development Setup**
```bash
git clone https://github.com/GruheshKurra/GraphNeuralNetworks-GNN-.git
cd GraphNeuralNetworks-GNN-
pip install -r requirements.txt
```

## 📄 **License**

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 **Acknowledgments**

- **PyTorch Geometric** team for the excellent graph learning library
- **Cora Dataset** creators for the benchmark citation network
- **Graph Neural Network** researchers for their foundational work

## 📞 **Contact**

For questions or collaborations, please open an issue or reach out through GitHub.

---

⭐ **If you find this project helpful, please consider giving it a star!** ⭐