Upload 2 files
evaluation_results/classification_report.csv
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7726c5b7899421298a3732569702f85b7584dd4e9b89229a46b8433c556ee026
+size 400
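Note that the three lines added for classification_report.csv are a Git LFS pointer, not the CSV itself: the 400-byte report is stored in LFS and identified by its SHA-256 digest. As a minimal sketch, assuming the pointer text is already available as a string, its fields could be split out as follows. `parse_lfs_pointer` is a hypothetical helper written for illustration and is not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split Git LFS pointer text into its fields (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid value has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Pointer content copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:7726c5b7899421298a3732569702f85b7584dd4e9b89229a46b8433c556ee026
size 400
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"])
```

To read the actual report, the LFS object has to be fetched first; the pointer only records where to find it and how to verify it.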
evaluation_results/evaluation_results.txt
ADDED
@@ -0,0 +1,144 @@
+✅ Loaded 1953 rows from 7 CSV files.
+Columns detected in CSVs: ['tweet_id', 'datetime', 'username', 'user_display_name', 'user_followers', 'user_following', 'user_verified', 'content', 'user_location', 'user_description', 'hashtags', 'mentions', 'phone_numbers', 'tweet_url', 'retweet_count', 'like_count', 'reply_count', 'kar_score', 'drug_score', 'crime_score', 'sentiment', 'sentiment_compound', 'is_drug_related', 'is_crime_related', 'has_contact_info', 'risk_level', 'content_hash', 'date_parsed']
+
+=== General Stats ===
+Columns: ['tweet_id', 'datetime', 'username', 'user_display_name', 'user_followers', 'user_following', 'user_verified', 'content', 'user_location', 'user_description', 'hashtags', 'mentions', 'phone_numbers', 'tweet_url', 'retweet_count', 'like_count', 'reply_count', 'kar_score', 'drug_score', 'crime_score', 'sentiment', 'sentiment_compound', 'is_drug_related', 'is_crime_related', 'has_contact_info', 'risk_level', 'content_hash', 'date_parsed']
+Total rows: 1953
+Missing values per column:
+tweet_id                 0
+datetime                 0
+username                 0
+user_display_name        0
+user_followers           0
+user_following           0
+user_verified            0
+content                  0
+user_location          362
+user_description      1953
+hashtags               978
+mentions              1349
+phone_numbers         1939
+tweet_url                0
+retweet_count            0
+like_count               0
+reply_count              0
+kar_score                0
+drug_score               0
+crime_score              0
+sentiment                0
+sentiment_compound       0
+is_drug_related          0
+is_crime_related         0
+has_contact_info         0
+risk_level               0
+content_hash             0
+date_parsed              0
+dtype: int64
+Duplicate rows: 1148
+
+Sample rows with missing values:
+              tweet_id             datetime         username  ...  risk_level                      content_hash          date_parsed
+0  1959959601048420576  25-08-2025 18:12:37     NewsMeter_In  ...        HIGH  8abf18065977e4493f16f4b165624ef2  2025-08-25 18:12:37
+1  1966399847793345021  12-09-2025 12:43:52       idencies05  ...      MEDIUM  d555008fafd825e05aca84ccffb13a77  2025-09-12 12:43:52
+2  1969293025831530539  20-09-2025 12:20:19       idencies05  ...    CRITICAL  9884c305ea49804875088cd5b72c1781  2025-09-20 12:20:19
+3  1963241947126018435  03-09-2025 19:35:30  Prathikthethith  ...    CRITICAL  f2de5dc092522d4e1af0e49442e66937  2025-09-03 19:35:30
+4  1969293025831530539  20-09-2025 12:20:19       idencies05  ...    CRITICAL  9884c305ea49804875088cd5b72c1781  2025-09-20 12:20:19
+
+[5 rows x 28 columns]
+
+Sample duplicate rows:
+              tweet_id             datetime         username  ...  risk_level                      content_hash          date_parsed
+0  1959959601048420576  25-08-2025 18:12:37     NewsMeter_In  ...        HIGH  8abf18065977e4493f16f4b165624ef2  2025-08-25 18:12:37
+1  1966399847793345021  12-09-2025 12:43:52       idencies05  ...      MEDIUM  d555008fafd825e05aca84ccffb13a77  2025-09-12 12:43:52
+2  1969293025831530539  20-09-2025 12:20:19       idencies05  ...    CRITICAL  9884c305ea49804875088cd5b72c1781  2025-09-20 12:20:19
+3  1963241947126018435  03-09-2025 19:35:30  Prathikthethith  ...    CRITICAL  f2de5dc092522d4e1af0e49442e66937  2025-09-03 19:35:30
+5  1978329389067641222  15-10-2025 10:47:36  XpressBengaluru  ...    CRITICAL  eef36dce5060c43923727565aa695b93  2025-10-15 10:47:36
+
+[5 rows x 28 columns]
+
+=== is_drug_related Distribution ===
+is_drug_related
+True     1473
+False     480
+Name: count, dtype: int64
+Proportion:
+is_drug_related
+True     0.7542
+False    0.2458
+Name: proportion, dtype: float64
+
+=== is_crime_related Distribution ===
+is_crime_related
+True     1576
+False     377
+Name: count, dtype: int64
+Proportion:
+is_crime_related
+True     0.807
+False    0.193
+Name: proportion, dtype: float64
+
+=== risk_level Distribution ===
+risk_level
+MEDIUM      1523
+LOW          290
+HIGH         127
+CRITICAL      13
+Name: count, dtype: int64
+Proportion:
+risk_level
+MEDIUM      0.7798
+LOW         0.1485
+HIGH        0.0650
+CRITICAL    0.0067
+Name: proportion, dtype: float64
+
+=== Date Range ===
+Earliest: 2025-03-14 16:42:38
+Latest: 2025-10-17 00:36:41
+
+=== Daily Counts of Posts ===
+date
+2025-03-14      2
+2025-07-19    112
+2025-07-20     35
+2025-07-21     23
+2025-07-22     27
+...
+2025-10-13     43
+2025-10-14     27
+2025-10-15     20
+2025-10-16     10
+2025-10-17      4
+Length: 91, dtype: int64
+
+=== User Analysis ===
+Total unique users: 554
+Top 10 users by post count:
+username
+Newskarnataka      118
+grok                57
+KannadaRepublic     30
+XpressBengaluru     25
+path2shah           23
+bangalore_22532     21
+ndtv                20
+ians_india          18
+UvEnglish           17
+wegro_app           17
+Name: count, dtype: int64
+
+=== Scraper Evaluation Metrics ===
+Completeness (all columns filled): 88.38%
+Duplicate rows rate: 58.78%
+is_drug_related relevance rate: 75.42%
+is_crime_related relevance rate: 80.7%
+Time coverage ratio (active days / total days): 41.94%
+
+=== Classification Metrics (is_drug_related vs is_crime_related) ===
+Accuracy: 0.6994
+Precision: 0.8357
+Recall: 0.7811
+F1-score: 0.8075
+
+Classification report saved as 'classification_report.csv'
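Several of the summary figures in evaluation_results.txt can be cross-checked from counts printed earlier in the same log. The sketch below recomputes the duplicate rate, the two relevance rates, the time coverage ratio, and the four classification metrics. Two inputs are inferred, not stated by the log: the coverage denominator is assumed to be the day difference between the earliest and latest post dates, and the confusion-matrix counts are back-solved from the class totals and the reported precision and recall, assuming `is_crime_related` served as the reference label and `is_drug_related` as the prediction. The evaluation script itself is not part of this commit, so this is a consistency check and not a reconstruction of its implementation.

```python
from datetime import date

TOTAL_ROWS = 1953

# Counts copied from the log.
duplicate_rows = 1148
drug_true = 1473    # rows with is_drug_related == True
crime_true = 1576   # rows with is_crime_related == True
active_days = 91    # length of the daily-counts series

duplicate_rate = duplicate_rows / TOTAL_ROWS
drug_rate = drug_true / TOTAL_ROWS
crime_rate = crime_true / TOTAL_ROWS

# Assumption: "total days" is the day difference between the earliest
# and latest post dates (not an inclusive day count).
span_days = (date(2025, 10, 17) - date(2025, 3, 14)).days
coverage = active_days / span_days

# Assumption: counts inferred from the reported metrics, with
# is_crime_related as the reference and is_drug_related as the prediction.
tp, fp, fn = 1231, 242, 345
tn = TOTAL_ROWS - tp - fp - fn

# Sanity checks against the class totals in the log.
assert tp + fp == drug_true
assert tp + fn == crime_true

accuracy = (tp + tn) / TOTAL_ROWS
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"Duplicate rows rate: {duplicate_rate:.2%}")
print(f"is_drug_related relevance rate: {drug_rate:.2%}")
print(f"is_crime_related relevance rate: {crime_rate:.2%}")
print(f"Time coverage ratio: {coverage:.2%}")
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-score: {f1:.4f}")
```

The reported completeness figure is left out on purpose: the log does not say how it is computed, so it cannot be recomputed here without guessing.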