Part 2 Rubric Explanation
1) Weak/strong ties and LCC change during removal
Tie strength is defined by edge weight in the LCC.
- Weak ties: weight <= median
- Strong ties: weight > median
I run two removal orders on the LCC:
- weakest to strongest
- strongest to weakest
Edges are removed one by one; after each removal, the LCC is recomputed and its size (node count) is recorded against the fraction of ties removed. The x-axis is the fraction of ties removed, and the y-axis is LCC size. This directly satisfies the rubric requirement to compare structural robustness under weak-first and strong-first deletions.
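The removal experiment above can be sketched with networkx. This is a minimal sketch, not the actual submission code: the function name, the `weight` attribute name, and the toy graph are assumptions.

```python
import networkx as nx

def lcc_under_removal(G, weakest_first=True, weight="weight"):
    """Remove edges one by one in weight order, recording
    (fraction of ties removed, LCC size in nodes) after each step."""
    H = G.copy()
    edges = sorted(H.edges(data=weight),
                   key=lambda e: e[2], reverse=not weakest_first)
    total = len(edges)
    curve = []
    for i, (u, v, _) in enumerate(edges, start=1):
        H.remove_edge(u, v)
        lcc = max(nx.connected_components(H), key=len)
        curve.append((i / total, len(lcc)))
    return curve

# Toy graph: a 4-node path with increasing edge weights
G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 0.1), (1, 2, 0.5), (2, 3, 0.9)])
weak_first = lcc_under_removal(G, weakest_first=True)
strong_first = lcc_under_removal(G, weakest_first=False)
```

Plotting `curve` for both orders gives exactly the fraction-removed vs. LCC-size comparison described above.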
2) Centrality, top papers, and correlation analysis
From the run output, the starting LCC has 1662 nodes and 26134 edges.
The code also prints exact weak/strong tie statistics:
- total number of ties in the LCC: 26134
- weak-tie threshold (median weight): 0.6276
- number of weak ties (weight <= 0.6276): 13067
- number of strong ties (weight > 0.6276): 13067
So both tie classification and total weak/strong counts are explicitly reported before the stepwise removal process.
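The reported counts follow from a median split of the edge weights. A minimal sketch of that classification, assuming a plain list of weights (the toy values are hypothetical; the actual run reported threshold 0.6276 over 26134 edges):

```python
from statistics import median

def classify_ties(weights):
    """Split tie weights at the median: weak (<= median) vs strong (> median)."""
    thr = median(weights)
    weak = [w for w in weights if w <= thr]
    strong = [w for w in weights if w > thr]
    return thr, len(weak), len(strong)

# Hypothetical weights for illustration only
thr, n_weak, n_strong = classify_ties([0.2, 0.5, 0.7, 0.9])
```

With an even edge count and a strict median split, the classes come out balanced, which matches the 13067/13067 split in the run.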
Centrality, central papers, interpretation, correlation
Three centrality measures are computed on the LCC:
- Degree
- Closeness
- Betweenness
For each metric, the top-10 papers are printed in ID<TAB>Title format.
For correlation, I first convert centrality scores to ranking vectors and then compute Pearson correlation between rankings.
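Pearson correlation on ranking vectors can be sketched as below. This is an illustrative sketch only: it uses `karate_club_graph` as a stand-in for the paper LCC, and ties in scores get arbitrary distinct ranks rather than averaged ranks (a simplification of full Spearman handling).

```python
import networkx as nx

def rank_vector(scores):
    """Map each node to its rank (1 = highest score), in a fixed node order."""
    nodes = sorted(scores)
    order = sorted(nodes, key=lambda n: scores[n], reverse=True)
    rank = {n: i + 1 for i, n in enumerate(order)}
    return [rank[n] for n in nodes]

def pearson(x, y):
    """Plain Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

G = nx.karate_club_graph()  # illustrative stand-in for the paper LCC
deg = rank_vector(nx.degree_centrality(G))
clo = rank_vector(nx.closeness_centrality(G))
bet = rank_vector(nx.betweenness_centrality(G))
r_deg_clo = pearson(deg, clo)  # Pearson on rankings = rank correlation
```

Repeating `pearson` for all three pairs fills in the 3x3 correlation matrix reported below.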
Results from the run:
| Metric | Degree | Closeness | Betweenness |
|---|---|---|---|
| Degree | 1.0000 | 0.9361 | 0.8114 |
| Closeness | 0.9361 | 1.0000 | 0.7684 |
| Betweenness | 0.8114 | 0.7684 | 1.0000 |
- Degree vs Closeness: 0.9361
- Degree vs Betweenness: 0.8114
- Closeness vs Betweenness: 0.7684 (lowest)

Lowest-correlation pair: Closeness vs Betweenness (0.7684), which the output explicitly reports. Interpretation:
- closeness measures overall proximity in the graph
- betweenness measures the bridge role on shortest paths
- these are related but different structural roles, so their rankings are less aligned

A node can be globally near many others without being a major bridge, so these two rankings diverge more than the other pairs.

Repeatedly central papers across the top-10 lists include: ahuja-etal-2023-mega, ding-etal-2020-discriminatively, shin-etal-2020-autoprompt, weller-etal-2020-learning, qin-etal-2023-chatgpt. The code also explicitly prints papers that appear in multiple metric top-10 lists (with metric names), which strengthens the evidence for identifying robustly central papers across multiple centrality notions.

3) Optional extra credit (50%): Theme shift before and after 2023
I compare two time groups: before 2023 and 2023+.
Steps used:
- split papers by year
- create text from title + abstract
- tokenize and clean
- build one shared vocabulary
- train LDA for each period
- extract topic-term matrices D (before) and S (after)
- compare topics with cosine similarity and rank by shift score
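The shift-score step above can be sketched with numpy, assuming the shift of a topic is one minus its best cosine match against all topics of the other period (the tiny matrices are illustrative; the real run uses shape (5, 5000)):

```python
import numpy as np

def shift_scores(A, B):
    """For each topic row of A, shift = 1 - max cosine similarity
    against every topic row of B."""
    An = A / np.linalg.norm(A, axis=1, keepdims=True)
    Bn = B / np.linalg.norm(B, axis=1, keepdims=True)
    sims = An @ Bn.T              # pairwise cosine similarities
    return 1.0 - sims.max(axis=1)

# Tiny illustrative topic-term matrices over a 3-word shared vocabulary
D = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # before 2023
S = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # 2023+
emerging = shift_scores(S, D)      # high = topic new in the later period
disappearing = shift_scores(D, S)  # high = topic faded after 2023
```

Ranking topics by these scores yields the "emerging" and "disappearing" lists reported below.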
Run evidence:
D shape: (5, 5000); S shape: (5, 5000)
Examples from output:
- emerging: After Topic 2 | shift=0.1989 | llms, large, data, tasks, knowledge, reasoning, generation, performance
- disappearing: Before Topic 4 | shift=0.1912 | question, knowledge, event, performance, questions, task, graph, can
This indicates a stronger LLM/reasoning focus in the later period.
Results (from current execution)
- Network loaded successfully; the LCC has 1662 nodes and 26134 edges.
- Weak/strong tie section reports:
  - total ties: 26134
  - median-weight threshold: 0.6276
  - weak ties: 13067
  - strong ties: 13067
- Centrality ranking correlations:
  - Degree-Closeness: 0.9361
  - Degree-Betweenness: 0.8114
  - Closeness-Betweenness: 0.7684
- Lowest-correlation pair: Closeness vs Betweenness.
- Top-10 central papers were produced for all three metrics in ID<TAB>Title format.
- Repeated papers across multiple centrality top-10 lists are explicitly reported.
- Topic-evolution matrices were produced: D (before 2023) = (5, 5000), S (2023+) = (5, 5000).
- Highest-shift emerging topic: After Topic 2 (shift=0.1989), with keywords around llms, reasoning, and generation.
- Highest-shift disappearing topic: Before Topic 4 (shift=0.1912), with keywords around question, knowledge, and graph.
Findings
- The centrality rankings are strongly related overall, but not identical.
- Degree and closeness are most aligned (0.9361), indicating that papers with strong local connectivity are often globally well-positioned.
- Closeness and betweenness are least aligned (0.7684), showing that global proximity and bridge-role influence capture different node functions.
- Repeated appearance of papers such as ahuja-etal-2023-mega, ding-etal-2020-discriminatively, and qin-etal-2023-chatgpt across multiple lists suggests robust influence across different centrality definitions.
- Topic-shift outputs indicate post-2023 movement toward LLM-oriented and reasoning-heavy themes.
- Overall, the network remains highly connected at baseline, and the analysis pipeline covers connectivity, influence, and temporal theme evolution in a consistent way.

Conclusion: post-2023 topics shift toward LLM- and reasoning-centered themes, while earlier topics are more question/knowledge/graph-oriented.