---
license: mit
tags:
  - graph-neural-networks
  - gnn
  - mathematical-proofs
  - curriculum-learning
  - pytorch
language:
  - en
library_name: pytorch
---

# Merge2Docs Models

This repository contains pre-trained models for [Merge2Docs](https://github.com/pechang03/merge2docs), a research project that combines LLMs with graph algorithms for document merging.

## Models

Total size: **4.38 MB**


### sparse_hierarchical_net_v1.pth

- **Type**: GNN
- **Size**: 0.375 MB
- **Description**: Sparse hierarchical GNN for analyzing hierarchical document structure
- **Path**: `sparse_hierarchical_net_v1.pth`


### curriculum_ultimate_model.pth

- **Type**: Math/Curriculum
- **Size**: 4.0 MB
- **Description**: Curriculum learning model for mathematical proof routing
- **Path**: `curriculum_ultimate_model.pth`


## Usage

### Download models

```python
from huggingface_hub import hf_hub_download

# Download GNN model
gnn_model_path = hf_hub_download(
    repo_id="peterliu06/merge2docs-models",
    filename="sparse_hierarchical_net_v1.pth"
)

# Download curriculum model
math_model_path = hf_hub_download(
    repo_id="peterliu06/merge2docs-models",
    filename="curriculum_ultimate_model.pth"
)
```
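By default `hf_hub_download` stores files in the Hugging Face cache and returns the cached path. The following is a minimal sketch, assuming you would rather have both files under a local `./models` directory (the directory used in the `.env_m2d` example in this README). The `fetch_models` helper and its `fetch` parameter are hypothetical names introduced here for illustration; they are not part of `huggingface_hub` or Merge2Docs.

```python
from pathlib import Path

REPO_ID = "peterliu06/merge2docs-models"
MODEL_FILES = (
    "sparse_hierarchical_net_v1.pth",
    "curriculum_ultimate_model.pth",
)


def fetch_models(target_dir="./models", fetch=None):
    """Download both model files into target_dir and return their paths.

    `fetch` is a hypothetical hook (not part of huggingface_hub) that lets
    callers swap in another downloader; it defaults to hf_hub_download.
    """
    if fetch is None:
        # Imported lazily so the helper can be defined without the package.
        from huggingface_hub import hf_hub_download as fetch

    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = {}
    for name in MODEL_FILES:
        # local_dir places the file at <target_dir>/<filename>.
        paths[name] = Path(
            fetch(repo_id=REPO_ID, filename=name, local_dir=str(target))
        )
    return paths
```

Calling `fetch_models()` with no arguments downloads from the Hub into `./models` and returns a dict that maps each filename to its local path.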

### Load in PyTorch

```python
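import torch

# Load on CPU first; move to a GPU afterwards if needed.
# This README does not state whether each file holds a state_dict or a
# pickled module, so inspect the returned object before using it.
# PyTorch 2.6+ defaults to weights_only=True, which refuses to unpickle
# arbitrary module classes.

# Load GNN model
gnn_model = torch.load(gnn_model_path, map_location="cpu")

# Load curriculum model
math_model = torch.load(math_model_path, map_location="cpu")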
```
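What `torch.load` returns depends on how the checkpoint was saved. The helper below is a sketch for inspecting the loaded object, assuming one of the two common layouts: a state dict (parameter name to tensor) or a pickled module with a `.state_dict()` method. Which layout these files use is not stated here, and `summarize_checkpoint` is a hypothetical name introduced for illustration.

```python
def summarize_checkpoint(obj):
    """Return (kind, n_tensors, n_params) for an object from torch.load.

    Assumes one of two layouts: a mapping of name -> tensor, or a module
    exposing .state_dict(). Anything else raises TypeError.
    """
    if hasattr(obj, "state_dict") and callable(obj.state_dict):
        kind, state = "module", obj.state_dict()
    elif isinstance(obj, dict):
        kind, state = "state_dict", obj
        # Some checkpoints nest the weights under a key such as "state_dict".
        for key in ("state_dict", "model_state_dict"):
            if isinstance(obj.get(key), dict):
                state = obj[key]
                break
    else:
        raise TypeError(f"unrecognised checkpoint type: {type(obj).__name__}")

    # Count only tensor-like entries (anything exposing numel()).
    tensors = [v for v in state.values() if hasattr(v, "numel")]
    return kind, len(tensors), sum(int(t.numel()) for t in tensors)
```

For example, `summarize_checkpoint(torch.load(gnn_model_path, map_location="cpu"))` reports whether the file holds weights only (in which case the matching model class from the Merge2Docs code is needed to rebuild the network) and how many parameters it contains.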

## Model Details

### Sparse Hierarchical GNN (sparse_hierarchical_net_v1.pth)

A Graph Neural Network designed for hierarchical document structure analysis.

**Architecture:**
- Sparse graph representation
- Hierarchical message passing
- Optimized for document-level relationships

**Training:**
- Dataset: Document graph structures
- Framework: PyTorch + PyTorch Geometric
- Hardware: GPU-accelerated training
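
The model's actual layers are not described in this card. As a generic illustration of the sparse graph representation and message passing listed under Architecture (not this model's implementation), the sketch below stores a graph as an edge list and runs one round of mean aggregation. It uses plain Python lists so it runs without PyTorch; the function name and the toy graph are made up for illustration.

```python
def mean_aggregate(num_nodes, edges, features):
    """One round of mean-aggregation message passing over an edge list.

    edges: iterable of (src, dst) pairs. Only existing edges are stored,
           which is what makes the representation sparse.
    features: list of per-node feature vectors (lists of floats).
    Returns a new feature list in which each node holds the mean of the
    features sent by its in-neighbours (unchanged if it has none).
    """
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_nodes)]
    counts = [0] * num_nodes
    for src, dst in edges:
        counts[dst] += 1
        for k in range(dim):
            sums[dst][k] += features[src][k]
    return [
        [s / counts[i] for s in sums[i]] if counts[i] else list(features[i])
        for i in range(num_nodes)
    ]


# Toy document graph: node 0 is a section, nodes 1 and 2 are its paragraphs.
edges = [(1, 0), (2, 0)]
features = [[0.0, 0.0], [1.0, 3.0], [3.0, 5.0]]
updated = mean_aggregate(3, edges, features)
```

Here the section node ends up with the average of its two paragraph vectors, which is the basic step a hierarchical scheme would repeat level by level.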

### Curriculum Ultimate Model (curriculum_ultimate_model.pth)

A curriculum learning-based model for mathematical proof routing and validation.

**Architecture:**
- Progressive difficulty learning
- Multi-task proof validation
- Curriculum-based training strategy

**Training:**
- Dataset: Mathematical proofs and reasoning tasks
- Framework: PyTorch
- Approach: Curriculum learning with increasing complexity
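
The training schedule used for this model is not published in this card. As a generic sketch of curriculum learning with increasing complexity, the example below sorts samples by a difficulty score and yields progressively larger training pools. The `curriculum_stages` function, the use of string length as a difficulty proxy, and the sample data are all assumptions made for illustration.

```python
def curriculum_stages(samples, difficulty, num_stages):
    """Yield training pools of increasing difficulty.

    samples: list of training items.
    difficulty: function mapping an item to a sortable score (hypothetical
                proxy, e.g. proof length).
    Stage k holds the easiest k/num_stages share of the data, so later
    stages keep the easy items and add harder ones.
    """
    ordered = sorted(samples, key=difficulty)
    for stage in range(1, num_stages + 1):
        cutoff = max(1, len(ordered) * stage // num_stages)
        yield ordered[:cutoff]


# Illustrative data only: statement length stands in for proof difficulty.
proofs = ["a=a", "a+b=b+a", "(a+b)+c=a+(b+c)", "a*(b+c)=a*b+a*c"]
stages = list(curriculum_stages(proofs, difficulty=len, num_stages=2))
```

A training loop would run one or more epochs on each stage in order, so the model sees the full dataset only in the final stage.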

## Integration with Merge2Docs

To use these models with Merge2Docs, update your `.env_m2d`:

```bash
# Local paths where the model files downloaded from Hugging Face are stored
GNN_MODEL_PATH="./models/sparse_hierarchical_net_v1.pth"
MATH_MODEL_PATH="./models/curriculum_ultimate_model.pth"
```

Then download the models:

```bash
cd merge2docs
python scripts/download_models_from_huggingface.py
```
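
After downloading, it can help to confirm that the paths in `.env_m2d` point at real files. This is a minimal sketch, assuming the file uses simple `KEY="value"` lines as in the example above; `read_model_paths` and `missing_models` are hypothetical helpers, not part of Merge2Docs.

```python
from pathlib import Path

MODEL_KEYS = ("GNN_MODEL_PATH", "MATH_MODEL_PATH")


def read_model_paths(env_file=".env_m2d"):
    """Parse KEY="value" lines and return the model path settings found."""
    settings = {}
    for line in Path(env_file).read_text().splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip("\"'")
    return {k: Path(settings[k]) for k in MODEL_KEYS if k in settings}


def missing_models(env_file=".env_m2d"):
    """Return the configured variables whose files do not exist on disk."""
    return [k for k, p in read_model_paths(env_file).items() if not p.is_file()]
```

Relative paths such as `./models/...` are resolved against the current working directory, so run the check from the same directory as Merge2Docs itself. An empty list from `missing_models()` means every configured file was found.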

## Citation

If you use these models in your research, please cite:

```bibtex
@software{merge2docs_models,
  author = {Chang, Peter},
  title = {Merge2Docs Pre-trained Models},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/peterliu06/merge2docs-models}
}
```

## License

MIT License - See [LICENSE](https://github.com/pechang03/merge2docs/blob/main/LICENSE)

## Project Links

- **GitHub**: https://github.com/pechang03/merge2docs
- **Documentation**: See project README
- **Issues**: https://github.com/pechang03/merge2docs/issues

## Model Updates

Models are periodically retrained and updated. Check the git history for version information.