
Part 2 Rubric Explanation

1) Weak/strong ties and LCC change during removal

Tie strength is defined by edge weight in the LCC.

  • Weak ties: weight <= median
  • Strong ties: weight > median

I run two removal orders on the LCC:

  1. weakest to strongest
  2. strongest to weakest

Edges are removed one at a time. After every removal, the LCC is recomputed and its size (node count) is recorded against the fraction of ties removed, so the x-axis is the fraction of ties removed and the y-axis is LCC size. This directly satisfies the rubric requirement to compare structural robustness under weak-first and strong-first deletions.
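The removal loop described above can be sketched as follows. This is a minimal illustration assuming `networkx`; the toy four-node graph and its weights are invented for demonstration, not the assignment's actual LCC.

```python
import networkx as nx

def lcc_size(G):
    """Number of nodes in the largest connected component (0 if empty)."""
    if G.number_of_nodes() == 0:
        return 0
    return max(len(c) for c in nx.connected_components(G))

def removal_curve(G, weak_first=True):
    """Remove edges in weight order; record (fraction removed, LCC size) after each step."""
    H = G.copy()
    edges = sorted(H.edges(data="weight"), key=lambda e: e[2], reverse=not weak_first)
    total = len(edges)
    curve = [(0.0, lcc_size(H))]
    for i, (u, v, _) in enumerate(edges, start=1):
        H.remove_edge(u, v)
        curve.append((i / total, lcc_size(H)))
    return curve

# toy weighted graph: a triangle with a weakly attached pendant node
G = nx.Graph()
G.add_weighted_edges_from([(1, 2, 0.9), (2, 3, 0.8), (3, 1, 0.7), (3, 4, 0.2)])
weak = removal_curve(G, weak_first=True)     # weakest-to-strongest order
strong = removal_curve(G, weak_first=False)  # strongest-to-weakest order
```

On this toy graph, weak-first removal detaches the pendant node immediately (LCC drops to 3), while strong-first removal leaves the graph connected after the first deletion.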

2) Centrality, top papers, and correlation analysis

From the run output, the starting LCC has:

  • 1662 nodes
  • 26134 edges

The code also prints exact weak/strong tie statistics:

  • total number of ties in the LCC: 26134
  • weak-tie threshold (median weight): 0.6276
  • number of weak ties (weight <= 0.6276): 13067
  • number of strong ties (weight > 0.6276): 13067

So both tie classification and total weak/strong counts are explicitly reported before the stepwise removal process.
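The median-split classification can be sketched as below. A minimal version using only the standard library; the example weights are illustrative. Note that with an even number of ties and the median falling between two values, the split is exactly half and half, matching the 13067/13067 counts reported above.

```python
from statistics import median

def classify_ties(weights):
    """Split tie weights at the median: weak (<= median) vs strong (> median)."""
    thr = median(weights)
    weak = [w for w in weights if w <= thr]
    strong = [w for w in weights if w > thr]
    return thr, len(weak), len(strong)

# toy weights; an even count with the median between two values splits 50/50
thr, n_weak, n_strong = classify_ties([0.1, 0.3, 0.6276, 0.8, 0.9, 1.0])
```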

Centrality, central papers, interpretation, correlation

Three centrality measures are computed on the LCC:

  • Degree
  • Closeness
  • Betweenness

For each metric, the top-10 papers are printed in ID<TAB>Title format.
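Computing the three measures and extracting a top-k list can be sketched as follows, assuming `networkx`. The small star-plus-tail graph and `top_k` helper are illustrative, not the assignment's data or code.

```python
import networkx as nx

def top_k(scores, k=10):
    """Return the k highest-scoring node IDs from a centrality dict."""
    return [n for n, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:k]]

# toy LCC: node "a" is a local hub, "d" bridges to the tail node "e"
G = nx.Graph([("a", "b"), ("a", "c"), ("a", "d"), ("d", "e")])
deg = nx.degree_centrality(G)
clo = nx.closeness_centrality(G)
bet = nx.betweenness_centrality(G)
top_deg = top_k(deg, k=3)
```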

For correlation, I first convert centrality scores to ranking vectors and then compute Pearson correlation between the rankings (Pearson on ranks is equivalent to Spearman rank correlation).
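The rank-then-correlate step can be sketched as below, assuming `scipy`. The two small score dicts are invented for illustration.

```python
from scipy.stats import rankdata, pearsonr

def rank_correlation(scores_a, scores_b):
    """Convert two centrality dicts (same keys) to rank vectors, then Pearson r."""
    nodes = sorted(scores_a)
    ra = rankdata([scores_a[n] for n in nodes])
    rb = rankdata([scores_b[n] for n in nodes])
    return pearsonr(ra, rb)[0]

# two measures that order the nodes identically -> perfect rank correlation
a = {"x": 3.0, "y": 2.0, "z": 1.0}
b = {"x": 10.0, "y": 5.0, "z": 1.0}
r = rank_correlation(a, b)
```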

Results from the run:

| Metric      | Degree | Closeness | Betweenness |
|-------------|--------|-----------|-------------|
| Degree      | 1.0000 | 0.9361    | 0.8114      |
| Closeness   | 0.9361 | 1.0000    | 0.7684      |
| Betweenness | 0.8114 | 0.7684    | 1.0000      |

Lowest-correlation pair: Closeness vs Betweenness (0.7684).

  • Degree vs Closeness: 0.9361
  • Degree vs Betweenness: 0.8114
  • Closeness vs Betweenness: 0.7684 (lowest)

Interpretation of the lowest pair:

  • closeness measures overall proximity in the graph
  • betweenness measures bridge roles on shortest paths
  • these are related but different structural roles, so their rankings are less aligned: a node can be globally near many others without being a major bridge, which is why this pair diverges more than the others

The output explicitly reports the lowest-correlation pair. Papers that repeatedly appear across the top-10 lists include:
  • ahuja-etal-2023-mega
  • ding-etal-2020-discriminatively
  • shin-etal-2020-autoprompt
  • weller-etal-2020-learning
  • qin-etal-2023-chatgpt

The code also explicitly prints papers that appear in multiple metric top-10 lists (with metric names), which strengthens the evidence for identifying robustly central papers.
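The cross-list check can be sketched as follows, using only the standard library. The `repeated_across_lists` helper and the placeholder paper IDs (`p1`, `p2`, ...) are illustrative, not the assignment's actual identifiers.

```python
from collections import defaultdict

def repeated_across_lists(top_lists):
    """Map each paper to the metrics whose top-10 list contains it; keep only repeats."""
    seen = defaultdict(list)
    for metric, papers in top_lists.items():
        for p in papers:
            seen[p].append(metric)
    return {p: ms for p, ms in seen.items() if len(ms) > 1}

tops = {
    "degree": ["p1", "p2", "p3"],
    "closeness": ["p1", "p3", "p4"],
    "betweenness": ["p2", "p3", "p5"],
}
rep = repeated_across_lists(tops)  # p3 appears in all three lists
```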

Optional Extra Credit (50%): Theme shift before and after 2023

I compare two time groups: before 2023 and 2023+.

Steps used:

  1. split papers by year
  2. create text from title + abstract
  3. tokenize and clean
  4. build one shared vocabulary
  5. train LDA for each period
  6. extract topic-term matrices D (before) and S (after)
  7. compare topics with cosine similarity and rank by shift score
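The steps above can be sketched as below, assuming scikit-learn. The toy documents, k=2 topics, and the "1 minus best cosine match" shift score are illustrative stand-ins for the assignment's actual corpus and k=5 setup; the key point is the single shared vocabulary, which makes the rows of D and S directly comparable.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# toy stand-ins for title+abstract text in the two periods
before = ["question answering knowledge graph",
          "event extraction knowledge graph",
          "question generation task performance"]
after = ["llm reasoning generation task",
         "large language model reasoning data",
         "llm knowledge reasoning performance"]

# one shared vocabulary so topic-term rows of D and S are comparable
vec = CountVectorizer().fit(before + after)
Xb, Xa = vec.transform(before), vec.transform(after)

k = 2
lda_b = LatentDirichletAllocation(n_components=k, random_state=0).fit(Xb)
lda_a = LatentDirichletAllocation(n_components=k, random_state=0).fit(Xa)

# row-normalized topic-term matrices: D (before) and S (after)
D = lda_b.components_ / lda_b.components_.sum(axis=1, keepdims=True)
S = lda_a.components_ / lda_a.components_.sum(axis=1, keepdims=True)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# shift score: 1 - best cosine match against the other period's topics
emerging = [1 - max(cosine(S[i], D[j]) for j in range(k)) for i in range(k)]
disappearing = [1 - max(cosine(D[i], S[j]) for j in range(k)) for i in range(k)]
```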

Run evidence:

  • D shape: (5, 5000)
  • S shape: (5, 5000)

Examples from output:

  • emerging: After Topic 2 | shift=0.1989 | llms, large, data, tasks, knowledge, reasoning, generation, performance
  • disappearing: Before Topic 4 | shift=0.1912 | question, knowledge, event, performance, questions, task, graph, can

This indicates a stronger LLM/reasoning focus in the later period.

Results (from current execution)

  • Network loaded successfully; LCC size is 1662 nodes and 26134 edges.
  • Weak/strong tie section reports:
    • total ties: 26134
    • median-weight threshold: 0.6276
    • weak ties: 13067
    • strong ties: 13067
  • Centrality ranking correlations:
    • Degree-Closeness: 0.9361
    • Degree-Betweenness: 0.8114
    • Closeness-Betweenness: 0.7684
  • Lowest-correlation pair: Closeness vs Betweenness.
  • Top-10 central papers were produced for all three metrics in ID<TAB>Title format.
  • Repeated papers across multiple centrality top-10 lists are explicitly reported.
  • Topic-evolution matrices were produced:
    • D (before 2023): (5, 5000)
    • S (2023+): (5, 5000)
  • Highest-shift emerging topic: After Topic 2 (shift=0.1989) with keywords around llms, reasoning, and generation.
  • Highest-shift disappearing topic: Before Topic 4 (shift=0.1912) with keywords around question, knowledge, and graph.

Findings

Post-2023 topics shift toward LLM- and reasoning-centered themes, while earlier topics are more question/knowledge/graph-oriented.

  • The centrality rankings are strongly related overall, but not identical.
  • Degree and closeness are most aligned (0.9361), indicating that papers with strong local connectivity are often globally well-positioned.
  • Closeness and betweenness are least aligned (0.7684), showing that global proximity and bridge-role influence capture different node functions.
  • Repeated appearance of papers such as ahuja-etal-2023-mega, ding-etal-2020-discriminatively, and qin-etal-2023-chatgpt across multiple lists suggests robust influence across different centrality definitions.
  • Topic-shift outputs indicate post-2023 movement toward LLM-oriented and reasoning-heavy themes.
  • Overall, the network remains highly connected at baseline, and the analysis pipeline covers connectivity, influence, and temporal theme evolution in a consistent way.