Dataset Description
File descriptions
SampleSubmission.csv - A sample submission file that shows the expected output format.
Columns:

Index - The row index in the test set.
Predicted - The predicted result: whether or not the paper is recommended to the author. Values: {0, 1}.
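A submission in this format could be written as in the sketch below. It assumes that the header row uses exactly the column names Index and Predicted and that Index starts at 0; neither detail is stated here, so check SampleSubmission.csv before relying on it. The function name write_submission is illustrative only.

```python
import csv
import io

def write_submission(predictions, out):
    """Write 0/1 predictions as rows of (Index, Predicted)."""
    writer = csv.writer(out, lineterminator="\n")
    # Assumption: header names match the column list; verify against SampleSubmission.csv.
    writer.writerow(["Index", "Predicted"])
    # Assumption: Index counts test rows starting from 0.
    for index, predicted in enumerate(predictions):
        if predicted not in (0, 1):
            raise ValueError(f"row {index}: expected 0 or 1, got {predicted!r}")
        writer.writerow([index, predicted])

# Demo on an in-memory buffer with made-up predictions.
buffer = io.StringIO()
write_submission([1, 0, 1], buffer)
submission_text = buffer.getvalue()
```

To produce a real file, pass a handle opened with open("submission.csv", "w", newline="") instead of the in-memory buffer.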
bipartite_train_ann.txt - Contains an anonymized bipartite network with two integers per line (one edge). The first id refers to an author, and the second id refers to a paper that the author cites.
Columns:

author - The author id.
paper - The id of a paper cited by the author.
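One way to read such an edge list is sketched below. The delimiter between the two integers is not specified here, so the parser accepts whitespace or commas, and it assumes the file has no header row. The names read_edges and papers_by_author are hypothetical helpers, and the sample lines are made up.

```python
import re
from collections import defaultdict

def read_edges(lines):
    """Parse lines of two integers into (first_id, second_id) tuples."""
    edges = []
    for line in lines:
        # Assumption: fields are separated by whitespace and/or commas.
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        # Assumption: no header row, so both fields parse as integers.
        edges.append((int(fields[0]), int(fields[1])))
    return edges

def papers_by_author(edges):
    """Group the bipartite edges as {author_id: set of cited paper ids}."""
    cited = defaultdict(set)
    for author, paper in edges:
        cited[author].add(paper)
    return cited

# Made-up lines standing in for the contents of bipartite_train_ann.txt.
sample_lines = ["0 15", "0 27", "3 15", ""]
cited = papers_by_author(read_edges(sample_lines))
```

With the real file, an open file handle can be passed directly to read_edges, since iterating over it yields lines.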
bipartite_test_ann.txt - The list of author-paper pairs to be predicted (one pair per row). It is recommended to output your predictions in the same order as the pairs in this file. (Test set)
Columns: same as those of bipartite_train_ann.txt.

author_file_ann.txt - Contains an anonymized coauthor network with two integers per row (one edge). Each integer ranges from 0 to 6610 and identifies one unique author. An edge means that the two authors have collaborated.
Columns:

author - The id of one author.
author - The id of an author they have collaborated with.
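Because collaboration is symmetric, the coauthor edges can be treated as undirected. The sketch below stores each edge in both directions so that the coauthors of any author can be looked up directly; treating the edges this way is an interpretation of the description, and build_coauthor_graph and the sample pairs are illustrative only.

```python
from collections import defaultdict

def build_coauthor_graph(pairs):
    """Return {author_id: set of coauthor ids} for an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)  # collaboration is symmetric, so add the reverse edge
    return neighbors

# Made-up pairs standing in for parsed rows of author_file_ann.txt.
sample_pairs = [(0, 1), (1, 2), (2, 0), (3, 1)]
coauthors = build_coauthor_graph(sample_pairs)

# Shared coauthors of authors 0 and 3.
common = coauthors[0] & coauthors[3]
```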

paper_file_ann.txt - Contains an anonymized citation network with two integers per row (one edge). Each integer ranges from 0 to 79936 and identifies one unique paper. An edge means that the paper with the first id cites the paper with the second id.
Columns:

paper - The id of the citing paper.
paper - The id of the cited paper.
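Unlike the coauthor network, citation edges have a direction: the first id cites the second. A minimal sketch that keeps both directions separate is shown below; build_citation_graph and the sample pairs are hypothetical and only illustrate the edge orientation described above.

```python
from collections import defaultdict

def build_citation_graph(pairs):
    """Return (references, cited_by) dictionaries for a directed edge list."""
    references = defaultdict(set)  # paper -> papers it cites
    cited_by = defaultdict(set)    # paper -> papers that cite it
    for citing, cited in pairs:
        references[citing].add(cited)
        cited_by[cited].add(citing)
    return references, cited_by

# Made-up pairs standing in for parsed rows of paper_file_ann.txt.
sample_pairs = [(10, 20), (11, 20), (20, 30)]
references, cited_by = build_citation_graph(sample_pairs)

# Number of citations each paper receives within the sample.
citation_counts = {paper: len(sources) for paper, sources in cited_by.items()}
```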
feature.pkl - To support a sound and consistent experimental setup, we provide an initial 512-dimensional feature vector for each paper, generated by USE (Universal Sentence Encoder). The file must be read with pickle.load().
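Reading the features might look like the sketch below. The structure of the pickled object is not documented here, so the code only assumes that it can be indexed by paper id (for example a dict or an array-like with one row per paper). The stand-in data, load_features, and feature_for are illustrative and are not the real file contents.

```python
import io
import pickle

def load_features(binary_file):
    """Unpickle the feature object from a file opened in binary mode."""
    return pickle.load(binary_file)

def feature_for(features, paper_id):
    """Look up one paper's vector.

    Assumption: the object is indexable by paper id (dict or array-like).
    """
    return features[paper_id]

# Stand-in for feature.pkl: two made-up papers with 512-dimensional vectors.
fake_file = io.BytesIO(pickle.dumps({0: [0.0] * 512, 1: [1.0] * 512}))
features = load_features(fake_file)
vector = feature_for(features, 1)
```

With the real file, open it in binary mode, as in open("feature.pkl", "rb"), and pass the handle to load_features. If the pickled object is a NumPy array or a tensor, the corresponding library must be installed for unpickling to succeed.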
On Kaggle, you only need to submit your predictions in '.csv' format, but note that other materials must also be submitted on the Canvas platform.