| # Part 2 Rubric Explanation |
| ## 1) Weak/strong ties and LCC change during removal |
|
|
| Tie strength is defined by edge `weight` in the LCC. |
|
|
| - Weak ties: `weight <= median` |
| - Strong ties: `weight > median` |
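A minimal sketch of this classification, using `statistics.median` on a hypothetical edge-weight dict (the real code applies the same rule to the LCC's edges):

```python
from statistics import median

# Hypothetical (edge, weight) pairs standing in for the LCC's weighted edges.
edges = {("a", "b"): 0.2, ("b", "c"): 0.9, ("a", "c"): 0.5, ("c", "d"): 0.7}

threshold = median(edges.values())  # median edge weight

weak = [e for e, w in edges.items() if w <= threshold]   # weight <= median
strong = [e for e, w in edges.items() if w > threshold]  # weight > median

print(threshold, len(weak), len(strong))
```

With an even number of edges and a median that falls between two weights, the two classes split evenly, which matches the 13067/13067 split reported below.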
|
|
I run two removal orders on the LCC:

1. weakest to strongest
2. strongest to weakest

Edges are removed one at a time. After each removal, the LCC is recomputed and its size (node count) is recorded, so the plot shows the fraction of ties removed on the x-axis against LCC size on the y-axis. This directly satisfies the rubric requirement to compare structural robustness under weak-first and strong-first deletions.
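The removal loop can be sketched in pure Python (a toy weighted graph stands in for the citation-network LCC; the LCC size is found by BFS over the surviving edges):

```python
from collections import defaultdict

def lcc_size(nodes, edges):
    """Node count of the largest connected component, via BFS."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            comp += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, comp)
    return best

# Toy weighted graph standing in for the LCC.
weights = {("a", "b"): 0.2, ("b", "c"): 0.9, ("a", "c"): 0.5, ("c", "d"): 0.7}
nodes = {n for e in weights for n in e}

# Weakest-first order; reverse=True would give strongest-first.
order = sorted(weights, key=weights.get)

curve = []  # (fraction of ties removed, LCC size) after each deletion
remaining = list(order)
for i, edge in enumerate(order, start=1):
    remaining.remove(edge)
    curve.append((i / len(order), lcc_size(nodes, remaining)))

print(curve)
```

Recomputing the LCC after every single removal is O(E·(V+E)) on the toy scale shown here; the real run does the same stepwise bookkeeping on the 26134-edge LCC.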
From the run output, the starting LCC has:

- `1662` nodes
- `26134` edges
|
|
| The code also prints exact weak/strong tie statistics: |
|
|
| - total number of ties in the LCC: `26134` |
| - weak-tie threshold (median weight): `0.6276` |
| - number of weak ties (`weight <= 0.6276`): `13067` |
| - number of strong ties (`weight > 0.6276`): `13067` |
|
|
| So both tie classification and total weak/strong counts are explicitly reported before the stepwise removal process. |
|
|
## 2) Centrality, top papers, and correlation analysis
|
|
| Three centrality measures are computed on the LCC: |
| - Degree |
| - Closeness |
| - Betweenness |
|
|
For each metric, the top-10 papers are printed in `ID<TAB>Title` format.
|
|
| For correlation, I first convert centrality scores to ranking vectors and then compute Pearson correlation between rankings. |
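A minimal sketch of the rank-then-correlate step (hypothetical score dicts; computing Pearson correlation on rank vectors is equivalent to Spearman's rho when there are no ties):

```python
def ranks(scores):
    """Map each node to its rank (1 = highest score)."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {node: r for r, node in enumerate(order, start=1)}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical centrality scores over the same node set.
degree = {"p1": 0.9, "p2": 0.5, "p3": 0.1}
closeness = {"p1": 0.8, "p2": 0.6, "p3": 0.2}

nodes = sorted(degree)
rd, rc = ranks(degree), ranks(closeness)
r = pearson([rd[n] for n in nodes], [rc[n] for n in nodes])
print(r)  # identical orderings give a correlation of 1.0
```

The same pairwise computation over the three rank vectors yields the correlation matrix below.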
|
|
| Results from the run: |
| | Metric | Degree | Closeness | Betweenness | |
| |---|---:|---:|---:| |
| | Degree | 1.0000 | 0.9361 | 0.8114 | |
| | Closeness | 0.9361 | 1.0000 | 0.7684 | |
| | Betweenness | 0.8114 | 0.7684 | 1.0000 | |
|
|
Pairwise rank correlations:

- Degree vs Closeness: `0.9361`
- Degree vs Betweenness: `0.8114`
- Closeness vs Betweenness: `0.7684` (lowest)

The output explicitly reports the lowest-correlation pair: **Closeness vs Betweenness (`0.7684`)**.

Interpretation: closeness measures overall proximity in the graph, while betweenness measures the bridge role a node plays on shortest paths. A node can be globally near many others without being a major bridge, so these two rankings are less aligned than the other pairs.
Papers that repeatedly appear across the top-10 lists, indicating robust influence under multiple notions of centrality, include:

- `ahuja-etal-2023-mega`
- `ding-etal-2020-discriminatively`
- `shin-etal-2020-autoprompt`
- `weller-etal-2020-learning`
- `qin-etal-2023-chatgpt`
|
|
| The code also explicitly prints papers that appear in multiple metric top-10 lists (with metric names), which strengthens the evidence for identifying robustly central papers. |
|
|
|
|
## 3) Optional extra credit (50%): theme shift before and after 2023
|
|
| I compare two time groups: before 2023 and 2023+. |
|
|
| Steps used: |
|
|
| 1. split papers by year |
| 2. create text from title + abstract |
| 3. tokenize and clean |
| 4. build one shared vocabulary |
| 5. train LDA for each period |
| 6. extract topic-term matrices `D` (before) and `S` (after) |
| 7. compare topics with cosine similarity and rank by shift score |
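Step 7 can be sketched as follows, assuming the shift score of a topic is `1 - max cosine similarity` against every topic of the other period (small hypothetical matrices stand in for the real `(5, 5000)` topic-term matrices):

```python
def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

# Hypothetical topic-term rows standing in for D (before) and S (after).
D = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
S = [[1.0, 0.1, 0.0], [0.0, 0.0, 1.0]]

# Shift score per "after" topic: 1 minus its best cosine match in "before".
shifts = [1 - max(cosine(s, d) for d in D) for s in S]

# Rank "after" topics from most-shifted (emerging) to least-shifted.
ranked = sorted(range(len(S)), key=lambda i: shifts[i], reverse=True)
print(shifts, ranked)
```

A topic with no close counterpart in the other period gets a high shift score; the symmetric computation over `D` rows identifies disappearing topics.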
|
|
| Run evidence: |
|
|
| - `D` shape: `(5, 5000)` |
| - `S` shape: `(5, 5000)` |
|
|
| Examples from output: |
|
|
| - emerging: `After Topic 2 | shift=0.1989 | llms, large, data, tasks, knowledge, reasoning, generation, performance` |
| - disappearing: `Before Topic 4 | shift=0.1912 | question, knowledge, event, performance, questions, task, graph, can` |
|
|
| This indicates a stronger LLM/reasoning focus in the later period. |
|
|
| ## Results (from current execution) |
|
|
| - Network loaded successfully; LCC size is `1662` nodes and `26134` edges. |
| - Weak/strong tie section reports: |
| - total ties: `26134` |
| - median-weight threshold: `0.6276` |
| - weak ties: `13067` |
| - strong ties: `13067` |
| - Centrality ranking correlations: |
| - Degree-Closeness: `0.9361` |
| - Degree-Betweenness: `0.8114` |
| - Closeness-Betweenness: `0.7684` |
| - Lowest-correlation pair: Closeness vs Betweenness. |
| - Top-10 central papers were produced for all three metrics in `ID<TAB>Title` format. |
| - Repeated papers across multiple centrality top-10 lists are explicitly reported. |
| - Topic-evolution matrices were produced: |
| - `D` (before 2023): `(5, 5000)` |
| - `S` (2023+): `(5, 5000)` |
| - Highest-shift emerging topic: After Topic 2 (`shift=0.1989`) with keywords around `llms`, `reasoning`, and `generation`. |
| - Highest-shift disappearing topic: Before Topic 4 (`shift=0.1912`) with keywords around `question`, `knowledge`, and `graph`. |
## Findings

- The centrality rankings are strongly related overall, but not identical.
- Degree and closeness are most aligned (`0.9361`), indicating that papers with strong local connectivity are often globally well-positioned.
- Closeness and betweenness are least aligned (`0.7684`), showing that global proximity and bridge-role influence capture different node functions.
- Repeated appearance of papers such as `ahuja-etal-2023-mega`, `ding-etal-2020-discriminatively`, and `qin-etal-2023-chatgpt` across multiple lists suggests robust influence across different centrality definitions.
- Topic-shift outputs indicate post-2023 movement toward LLM-oriented and reasoning-heavy themes, while earlier topics are more question/knowledge/graph-oriented.
- Overall, the network remains highly connected at baseline, and the analysis pipeline covers connectivity, influence, and temporal theme evolution in a consistent way.
|
|