anishabhatnagar committed on
Commit 5e0b531 · 1 Parent(s): 25cd5d2

updated README

Files changed (2)
  1. README.md +62 -9
  2. app.py +1 -1
README.md CHANGED
@@ -12,24 +12,77 @@ license: apache-2.0
 short_description: Interpreting the latent space of Authorship Attribution
 ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

- ## Useful commands

- ### Prepare data training/test

- ### Clustering the background corpus

- python cluster_corpus.py ../../iarpa-hiatus/explanation_tool_files/reddit_cluster_training.pkl ../../iarpa-hiatus/explanation_tool_files/reddit_cluster_test.pkl "AnnaWegmann/Style-Embedding" ./datasets/reddit_clustered_authors.pkl --min_samples 2 --metric cosine --pca_dimensions 100 --eps 0.04
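The removed clustering command can be sketched in Python. This is a hypothetical helper mirroring only the documented flags (`--pca_dimensions 100`, `--metric cosine`, `--eps 0.04`, `--min_samples 2`), not the actual `cluster_corpus.py` implementation; it assumes the author embeddings are already loaded as a NumPy array.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

def cluster_author_embeddings(embeddings, pca_dimensions=100, eps=0.04, min_samples=2):
    """Density-cluster author embeddings (hypothetical sketch of the CLI flags)."""
    # PCA can keep at most min(n_samples, n_features) components,
    # so cap the requested dimensionality at what the data allows.
    n_components = min(pca_dimensions, *embeddings.shape)
    reduced = PCA(n_components=n_components).fit_transform(embeddings)
    # DBSCAN with cosine distance; points in no dense region get label -1.
    return DBSCAN(eps=eps, min_samples=min_samples, metric="cosine").fit_predict(reduced)
```

With the tight `eps` of 0.04 under cosine distance, only near-parallel embedding vectors end up in the same cluster; everything else is labeled noise (-1).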

- ### Generate explainability sample

- python prepare_data.py ../explanation_tool_files/reddit_cluster_test.pkl ./datasets/reddit_explanation_sample.json

- ### Generate static explanations for a sample

- python baseline_static_explanations.py generate_explanations ./datasets/reddit_explanation_sample.json ./datasets/reddit_explanation_sample_with_explanations.json --interp_space_path ./datasets/reddit_interp_space.json --model_name 'AnnaWegmann/Style-Embedding'
+ <h1 align="center">Authorship Attribution Explainability Tool</h1>
+ <div align="center">
+ </div>
+
+ An interactive demo for visualizing and explaining authorship attribution (AA) models. The tool shows how sentence-transformer models interpret writing style using two separate explanation types:
+ 1. LLM-based stylistic features
+ 2. Gram2Vec linguistic features
+
+ It also provides an interactive latent-space view of authors to support deeper analysis of stylistic similarity and attribution behavior.
+
+ ## 🎯 What This Demo Does
+
+ Given:
+ 1. a mystery document, and
+ 2. a set of candidate authors,
+
+ the tool:
+ 1. Embeds all documents with a sentence-transformer model
+ 2. Visualizes author neighborhoods in a 2D latent space
+ 3. Shows LLM-derived stylistic cues and Gram2Vec linguistic features separately
+ 4. Highlights influential spans in the text for each explanation
+
+ This helps you understand why the model prefers one author over another.
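The embed-and-compare step can be sketched in a few lines. To stay self-contained, the sketch uses TF-IDF vectors as a stand-in for the sentence-transformer embeddings the demo actually computes, and `rank_candidates` is a hypothetical helper, not part of the app:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_candidates(mystery_doc, candidate_docs):
    """Rank candidate authors by similarity to the mystery document.

    TF-IDF stands in for a sentence-transformer embedding here; the demo
    would embed the texts with the selected Hugging Face model instead.
    """
    vecs = TfidfVectorizer().fit_transform([mystery_doc] + candidate_docs)
    sims = cosine_similarity(vecs[0:1], vecs[1:]).ravel()
    # Return candidate indices, most similar first.
    return sorted(range(len(candidate_docs)), key=lambda i: -sims[i])
```

In the demo, this ranking intuition is what the latent-space view and the two explanation types then unpack.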
41
 
42
+ ## πŸ’‘ Key Features
43
+ 1. Two Feature Types
44
+ - LLM Features: semantic, discourse, and stylistic cues from LLMs
45
+ - Gram2Vec Features: n-grams, POS patterns, and stylistic markers
46
+ 2. Latent Space Visualization
47
+ - Explore global author clusters
48
+ - Zoom into local neighborhoods
49
+ - Filter explanations to authors visible in the zoom region
50
 
51
+ 3. Span-Level Highlighting
52
+ - View the exact text segments most influential for attribution for each feature type.
53
 
54
+ 4. Model-Agnostic
55
+ - Use any sentence-transformer model by entering its Hugging Face model name.
56
+
57
+ 5. Custom Data Upload
58
+ - Upload your own mystery and candidate texts for personalized analysis.
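The 2D latent-space view boils down to projecting high-dimensional embeddings onto a plane. The sketch below uses PCA purely for illustration; the demo's actual projection method is not specified in this README, and `project_to_2d` is a hypothetical helper:

```python
import numpy as np
from sklearn.decomposition import PCA

def project_to_2d(embeddings):
    """Project author embeddings to 2D coordinates for plotting.

    PCA is chosen here for determinism; nonlinear methods such as
    t-SNE or UMAP are common alternatives for this kind of view.
    """
    return PCA(n_components=2).fit_transform(np.asarray(embeddings))
```

Each row of the result is one author's (x, y) position, which is what the cluster and zoom interactions operate on.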
+
+ ## 📥 How to Use This Demo
+ 1. Choose a Model
+ - Select one of the provided embedding models or enter a custom HF model name.
+
+ 2. Provide Input Texts
+ - Upload:
+   - mystery author texts
+   - multiple candidate author texts
+ - Or use the predefined Reddit task
+
+ 3. Load Tasks and Visualizations
+ - The tool computes embeddings and displays the latent space.
+
+ 4. Explore the Results
+ - Inspect author clusters
+ - Zoom into local regions
+ - Load the feature lists for your chosen zoomed region
+ - Compare LLM vs. Gram2Vec explanations
+ - View highlighted spans in each document
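One way to get span-level influence scores is a leave-one-out ablation: drop each sentence of a candidate document and measure how much its similarity to the mystery document falls. The sketch below uses TF-IDF as a stand-in embedding and a naive period-based sentence split; it illustrates the idea only and is not the demo's actual highlighting method:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def sentence_influence(mystery_doc, candidate_doc):
    """Score each candidate sentence by how much removing it
    reduces similarity to the mystery document."""
    sentences = [s.strip() for s in candidate_doc.split(".") if s.strip()]
    vec = TfidfVectorizer().fit([mystery_doc, candidate_doc])
    mystery_vec = vec.transform([mystery_doc])
    base = cosine_similarity(mystery_vec, vec.transform([candidate_doc]))[0, 0]
    scores = []
    for i in range(len(sentences)):
        ablated = ". ".join(sentences[:i] + sentences[i + 1:])
        sim = cosine_similarity(mystery_vec, vec.transform([ablated]))[0, 0]
        scores.append(base - sim)  # larger drop = more influential sentence
    return list(zip(sentences, scores))
```

Sentences whose removal causes the largest similarity drop are the ones a highlighter would mark as most influential for the attribution.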
+
+ ## 🔗 Source Code & Development
+
+ The full implementation, including preprocessing scripts and development tools, is available on GitHub:
+
+ 👉 https://github.com/MiladAlshomary/explainability-for-style-analysis-demo
+
+ ## Funding Acknowledgment
+ This research is supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via the HIATUS Program contract #2022-22072200005. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.
app.py CHANGED
@@ -22,7 +22,7 @@ def load_config(path="config/config.yaml"):
  return yaml.safe_load(f)

  # A comment to trigger change in spaces
- # comment 6
+ # comment 7
  cfg = load_config()