gsr
- CONTRIBUTING.md +0 -35
- main.ipynb +0 -1044
CONTRIBUTING.md
DELETED
|
@@ -1,35 +0,0 @@
|
|
| 1 |
-
# Contributing to selenium-twitter-scraper
|
| 2 |
-
|
| 3 |
-
We love your input! We want to make contributing to this project as easy and transparent as possible, whether it's:
|
| 4 |
-
|
| 5 |
-
- Reporting a bug
|
| 6 |
-
- Discussing the current state of the code
|
| 7 |
-
- Submitting a fix
|
| 8 |
-
- Proposing new features
|
| 9 |
-
- Becoming a maintainer
|
| 10 |
-
|
| 11 |
-
## We Develop with GitHub
|
| 12 |
-
|
| 13 |
-
We use GitHub to host code, track issues and feature requests, and accept pull requests.
|
| 14 |
-
|
| 15 |
-
## We Use [GitHub Flow](https://guides.github.com/introduction/flow/index.html), So All Code Changes Happen Through Pull Requests
|
| 16 |
-
|
| 17 |
-
Pull requests are the best way to propose changes to the codebase (we use [GitHub Flow](https://docs.github.com/en/get-started/quickstart/github-flow)). We actively welcome your pull requests:
|
| 18 |
-
|
| 19 |
-
1. Fork the repo and create your branch from `master`.
|
| 20 |
-
2. If you've added code that should be tested, add tests.
|
| 21 |
-
3. Ensure the test suite passes.
|
| 22 |
-
4. Make sure your code lints.
|
| 23 |
-
5. Issue that pull request!
|
| 24 |
-
|
| 25 |
-
## Any contributions you make will be under the Apache License, Version 2.0
|
| 26 |
-
|
| 27 |
-
In short, when you submit code changes, your submissions are understood to be under the same [Apache License, Version 2.0](https://choosealicense.com/licenses/apache-2.0/) that covers the project. Feel free to contact the maintainers if that's a concern.
|
| 28 |
-
|
| 29 |
-
## Report bugs using GitHub's [issues](https://github.com/godkingjay/selenium-twitter-scraper/issues)
|
| 30 |
-
|
| 31 |
-
We use GitHub issues to track public bugs. Report a bug by [opening a new issue](https://github.com/godkingjay/selenium-twitter-scraper/issues/new); it's that easy!
|
| 32 |
-
|
| 33 |
-
## License
|
| 34 |
-
|
| 35 |
-
By contributing, you agree that your contributions will be licensed under the project's Apache License, Version 2.0.
main.ipynb
DELETED
|
@@ -1,1044 +0,0 @@
|
|
| 1 |
-
{
|
| 2 |
-
"cells": [
|
| 3 |
-
{
|
| 4 |
-
"attachments": {},
|
| 5 |
-
"cell_type": "markdown",
|
| 6 |
-
"metadata": {},
|
| 7 |
-
"source": [
|
| 8 |
-
"# Twitter Scraper using Selenium\n",
|
| 9 |
-
"\n",
|
| 10 |
-
"Scraper for Twitter Tweets using selenium. It can scrape tweets from:\n",
|
| 11 |
-
"- Home/New Feeds\n",
|
| 12 |
-
"- User Profile Tweets\n",
|
| 13 |
-
"- Query or Search Tweets\n",
|
| 14 |
-
"- Hashtags Tweets\n",
|
| 15 |
-
"- Advanced Search Tweets"
|
| 16 |
-
]
|
| 17 |
-
},
|
| 18 |
-
{
|
| 19 |
-
"cell_type": "code",
|
| 20 |
-
"execution_count": null,
|
| 21 |
-
"metadata": {},
|
| 22 |
-
"outputs": [],
|
| 23 |
-
"source": [
|
| 24 |
-
"import os\n",
|
| 25 |
-
"import sys\n",
|
| 26 |
-
"import pandas as pd\n",
|
| 27 |
-
"\n",
|
| 28 |
-
"from datetime import datetime\n",
|
| 29 |
-
"from fake_headers import Headers\n",
|
| 30 |
-
"from time import sleep\n",
|
| 31 |
-
"from selenium import webdriver\n",
|
| 32 |
-
"from selenium.webdriver import Chrome\n",
|
| 33 |
-
"from selenium.webdriver.common.keys import Keys\n",
|
| 34 |
-
"from selenium.common.exceptions import (\n",
|
| 35 |
-
" NoSuchElementException,\n",
|
| 36 |
-
" StaleElementReferenceException,\n",
|
| 37 |
-
" WebDriverException,\n",
|
| 38 |
-
")\n",
|
| 39 |
-
"from selenium.webdriver.common.action_chains import ActionChains\n",
|
| 40 |
-
"\n",
|
| 41 |
-
"from selenium.webdriver.chrome.webdriver import WebDriver\n",
|
| 42 |
-
"from selenium.webdriver.chrome.options import Options as ChromeOptions\n",
|
| 43 |
-
"from selenium.webdriver.chrome.service import Service as ChromeService\n",
|
| 44 |
-
"\n",
|
| 45 |
-
"from webdriver_manager.chrome import ChromeDriverManager"
|
| 46 |
-
]
|
| 47 |
-
},
|
| 48 |
-
{
|
| 49 |
-
"attachments": {},
|
| 50 |
-
"cell_type": "markdown",
|
| 51 |
-
"metadata": {},
|
| 52 |
-
"source": [
|
| 53 |
-
"# Progress Class\n",
|
| 54 |
-
"\n",
|
| 55 |
-
"Class for the progress of the scraper instance."
|
| 56 |
-
]
|
| 57 |
-
},
|
| 58 |
-
{
|
| 59 |
-
"cell_type": "code",
|
| 60 |
-
"execution_count": null,
|
| 61 |
-
"metadata": {},
|
| 62 |
-
"outputs": [],
|
| 63 |
-
"source": [
|
| 64 |
-
"class Progress:\n",
|
| 65 |
-
" def __init__(self, current, total) -> None:\n",
|
| 66 |
-
" self.current = current\n",
|
| 67 |
-
" self.total = total\n",
|
| 68 |
-
" pass\n",
|
| 69 |
-
"\n",
|
| 70 |
-
" def print_progress(self, current) -> None:\n",
|
| 71 |
-
" self.current = current\n",
|
| 72 |
-
" progress = current / self.total\n",
|
| 73 |
-
" bar_length = 40\n",
|
| 74 |
-
" progress_bar = (\n",
|
| 75 |
-
" \"[\"\n",
|
| 76 |
-
" + \"=\" * int(bar_length * progress)\n",
|
| 77 |
-
" + \"-\" * (bar_length - int(bar_length * progress))\n",
|
| 78 |
-
" + \"]\"\n",
|
| 79 |
-
" )\n",
|
| 80 |
-
" sys.stdout.write(\n",
|
| 81 |
-
" \"\\rProgress: {} {:.2%} {} of {}\".format(\n",
|
| 82 |
-
" progress_bar, progress, current, self.total\n",
|
| 83 |
-
" )\n",
|
| 84 |
-
" )\n",
|
| 85 |
-
" sys.stdout.flush()\n"
|
| 86 |
-
]
|
| 87 |
-
},
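The bar-drawing logic in the `Progress` cell above can be sketched as a pure function, which makes it easier to check without writing to `sys.stdout`. This is only a sketch under assumptions: the name `render_progress`, the zero-total guard, and the clamping are illustrative additions and do not appear in the notebook.

```python
def render_progress(current: int, total: int, bar_length: int = 40) -> str:
    """Return a one-line text progress bar, e.g. '[==--] 50.00% 2 of 4'."""
    # Assumption: treat a zero total as 0% instead of raising ZeroDivisionError.
    fraction = current / total if total > 0 else 0.0
    # Assumption: clamp so that overshooting the target never overflows the bar.
    fraction = max(0.0, min(1.0, fraction))
    filled = int(bar_length * fraction)
    bar = "[" + "=" * filled + "-" * (bar_length - filled) + "]"
    return "{} {:.2%} {} of {}".format(bar, fraction, current, total)


if __name__ == "__main__":
    print(render_progress(5, 50))
```

A caller that wants the in-place update used by the notebook could write `"\r" + render_progress(...)` to `sys.stdout` and flush it.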
|
| 88 |
-
{
|
| 89 |
-
"attachments": {},
|
| 90 |
-
"cell_type": "markdown",
|
| 91 |
-
"metadata": {},
|
| 92 |
-
"source": [
|
| 93 |
-
"# Scroller Class\n",
|
| 94 |
-
"\n",
|
| 95 |
-
"Class for the scrollbar of the web page."
|
| 96 |
-
]
|
| 97 |
-
},
|
| 98 |
-
{
|
| 99 |
-
"cell_type": "code",
|
| 100 |
-
"execution_count": null,
|
| 101 |
-
"metadata": {},
|
| 102 |
-
"outputs": [],
|
| 103 |
-
"source": [
|
| 104 |
-
"class Scroller:\n",
|
| 105 |
-
" def __init__(self, driver) -> None:\n",
|
| 106 |
-
" self.driver = driver\n",
|
| 107 |
-
" self.current_position = 0\n",
|
| 108 |
-
" self.last_position = driver.execute_script(\"return window.pageYOffset;\")\n",
|
| 109 |
-
" self.scrolling = True\n",
|
| 110 |
-
" self.scroll_count = 0\n",
|
| 111 |
-
" pass\n",
|
| 112 |
-
"\n",
|
| 113 |
-
" def reset(self) -> None:\n",
|
| 114 |
-
" self.current_position = 0\n",
|
| 115 |
-
" self.last_position = self.driver.execute_script(\"return window.pageYOffset;\")\n",
|
| 116 |
-
" self.scroll_count = 0\n",
|
| 117 |
-
" pass\n",
|
| 118 |
-
"\n",
|
| 119 |
-
" def scroll_to_top(self) -> None:\n",
|
| 120 |
-
" self.driver.execute_script(\"window.scrollTo(0, 0);\")\n",
|
| 121 |
-
" pass\n",
|
| 122 |
-
"\n",
|
| 123 |
-
" def scroll_to_bottom(self) -> None:\n",
|
| 124 |
-
" self.driver.execute_script(\"window.scrollTo(0, document.body.scrollHeight);\")\n",
|
| 125 |
-
" pass\n",
|
| 126 |
-
"\n",
|
| 127 |
-
" def update_scroll_position(self) -> None:\n",
|
| 128 |
-
" self.current_position = self.driver.execute_script(\"return window.pageYOffset;\")\n",
|
| 129 |
-
" pass\n"
|
| 130 |
-
]
|
| 131 |
-
},
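The `Scroller` cell above stores a last and a current `window.pageYOffset` value. One way those two values might be compared to decide that scrolling has stalled is sketched below. The `FakeDriver` class and the `scroll_until_stalled` function are hypothetical names used only for illustration; the fake stands in for a Selenium WebDriver so the loop can run without a browser, and a real page would likely need a short wait between scrolling and reading the offset.

```python
class FakeDriver:
    """Hypothetical stand-in for a Selenium WebDriver (not part of the notebook)."""

    def __init__(self, offsets):
        # Offsets that successive "return window.pageYOffset;" calls will yield.
        self._offsets = list(offsets)

    def execute_script(self, script):
        if script.startswith("return"):
            # Repeat the final offset once the scripted values run out.
            if len(self._offsets) > 1:
                return self._offsets.pop(0)
            return self._offsets[0]
        return None  # scroll commands return nothing


def scroll_until_stalled(driver, max_scrolls=10):
    """Scroll to the bottom repeatedly; stop when the offset stops changing."""
    last_position = driver.execute_script("return window.pageYOffset;")
    scroll_count = 0
    while scroll_count < max_scrolls:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        scroll_count += 1
        current_position = driver.execute_script("return window.pageYOffset;")
        if current_position == last_position:
            break  # no movement: assume the end of the page was reached
        last_position = current_position
    return scroll_count, last_position


if __name__ == "__main__":
    print(scroll_until_stalled(FakeDriver([0, 800, 1600, 1600])))
```

The `max_scrolls` cap is an assumption added here so the loop always terminates, even on a feed that keeps loading new content.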
|
| 132 |
-
{
|
| 133 |
-
"attachments": {},
|
| 134 |
-
"cell_type": "markdown",
|
| 135 |
-
"metadata": {},
|
| 136 |
-
"source": [
|
| 137 |
-
"# Tweet Class\n",
|
| 138 |
-
"\n",
|
| 139 |
-
"Object for the tweet. Including its data."
|
| 140 |
-
]
|
| 141 |
-
},
|
| 142 |
-
{
|
| 143 |
-
"cell_type": "code",
|
| 144 |
-
"execution_count": null,
|
| 145 |
-
"metadata": {},
|
| 146 |
-
"outputs": [],
|
| 147 |
-
"source": [
|
| 148 |
-
"class Tweet:\n",
|
| 149 |
-
" def __init__(\n",
|
| 150 |
-
" self,\n",
|
| 151 |
-
" card: WebDriver,\n",
|
| 152 |
-
" driver: WebDriver,\n",
|
| 153 |
-
" actions: ActionChains,\n",
|
| 154 |
-
" scrape_poster_details=False\n",
|
| 155 |
-
" ) -> None:\n",
|
| 156 |
-
" self.card = card\n",
|
| 157 |
-
" self.error = False\n",
|
| 158 |
-
" self.tweet = None\n",
|
| 159 |
-
"\n",
|
| 160 |
-
" try:\n",
|
| 161 |
-
" self.user = card.find_element(\n",
|
| 162 |
-
" \"xpath\", './/div[@data-testid=\"User-Name\"]//span'\n",
|
| 163 |
-
" ).text\n",
|
| 164 |
-
" except NoSuchElementException:\n",
|
| 165 |
-
" self.error = True\n",
|
| 166 |
-
" self.user = \"skip\"\n",
|
| 167 |
-
"\n",
|
| 168 |
-
" try:\n",
|
| 169 |
-
" self.handle = card.find_element(\n",
|
| 170 |
-
" \"xpath\", './/span[contains(text(), \"@\")]'\n",
|
| 171 |
-
" ).text\n",
|
| 172 |
-
" except NoSuchElementException:\n",
|
| 173 |
-
" self.error = True\n",
|
| 174 |
-
" self.handle = \"skip\"\n",
|
| 175 |
-
"\n",
|
| 176 |
-
" try:\n",
|
| 177 |
-
" self.date_time = card.find_element(\"xpath\", \".//time\").get_attribute(\n",
|
| 178 |
-
" \"datetime\"\n",
|
| 179 |
-
" )\n",
|
| 180 |
-
"\n",
|
| 181 |
-
" if self.date_time is not None:\n",
|
| 182 |
-
" self.is_ad = False\n",
|
| 183 |
-
" except NoSuchElementException:\n",
|
| 184 |
-
" self.is_ad = True\n",
|
| 185 |
-
" self.error = True\n",
|
| 186 |
-
" self.date_time = \"skip\"\n",
|
| 187 |
-
" \n",
|
| 188 |
-
" if self.error:\n",
|
| 189 |
-
" return\n",
|
| 190 |
-
"\n",
|
| 191 |
-
" try:\n",
|
| 192 |
-
" card.find_element(\n",
|
| 193 |
-
" \"xpath\", './/*[local-name()=\"svg\" and @data-testid=\"icon-verified\"]'\n",
|
| 194 |
-
" )\n",
|
| 195 |
-
"\n",
|
| 196 |
-
" self.verified = True\n",
|
| 197 |
-
" except NoSuchElementException:\n",
|
| 198 |
-
" self.verified = False\n",
|
| 199 |
-
"\n",
|
| 200 |
-
" self.content = \"\"\n",
|
| 201 |
-
" contents = card.find_elements(\n",
|
| 202 |
-
" \"xpath\",\n",
|
| 203 |
-
" '(.//div[@data-testid=\"tweetText\"])[1]/span | (.//div[@data-testid=\"tweetText\"])[1]/a',\n",
|
| 204 |
-
" )\n",
|
| 205 |
-
"\n",
|
| 206 |
-
" for index, content in enumerate(contents):\n",
|
| 207 |
-
" self.content += content.text\n",
|
| 208 |
-
"\n",
|
| 209 |
-
" try:\n",
|
| 210 |
-
" self.reply_cnt = card.find_element(\n",
|
| 211 |
-
" \"xpath\", './/div[@data-testid=\"reply\"]//span'\n",
|
| 212 |
-
" ).text\n",
|
| 213 |
-
" \n",
|
| 214 |
-
" if self.reply_cnt == \"\":\n",
|
| 215 |
-
" self.reply_cnt = \"0\"\n",
|
| 216 |
-
" except NoSuchElementException:\n",
|
| 217 |
-
" self.reply_cnt = \"0\"\n",
|
| 218 |
-
"\n",
|
| 219 |
-
" try:\n",
|
| 220 |
-
" self.retweet_cnt = card.find_element(\n",
|
| 221 |
-
" \"xpath\", './/div[@data-testid=\"retweet\"]//span'\n",
|
| 222 |
-
" ).text\n",
|
| 223 |
-
" \n",
|
| 224 |
-
" if self.retweet_cnt == \"\":\n",
|
| 225 |
-
" self.retweet_cnt = \"0\"\n",
|
| 226 |
-
" except NoSuchElementException:\n",
|
| 227 |
-
" self.retweet_cnt = \"0\"\n",
|
| 228 |
-
"\n",
|
| 229 |
-
" try:\n",
|
| 230 |
-
" self.like_cnt = card.find_element(\n",
|
| 231 |
-
" \"xpath\", './/div[@data-testid=\"like\"]//span'\n",
|
| 232 |
-
" ).text\n",
|
| 233 |
-
" \n",
|
| 234 |
-
" if self.like_cnt == \"\":\n",
|
| 235 |
-
" self.like_cnt = \"0\"\n",
|
| 236 |
-
" except NoSuchElementException:\n",
|
| 237 |
-
" self.like_cnt = \"0\"\n",
|
| 238 |
-
"\n",
|
| 239 |
-
" try:\n",
|
| 240 |
-
" self.analytics_cnt = card.find_element(\n",
|
| 241 |
-
" \"xpath\", './/a[contains(@href, \"/analytics\")]//span'\n",
|
| 242 |
-
" ).text\n",
|
| 243 |
-
" \n",
|
| 244 |
-
" if self.analytics_cnt == \"\":\n",
|
| 245 |
-
" self.analytics_cnt = \"0\"\n",
|
| 246 |
-
" except NoSuchElementException:\n",
|
| 247 |
-
" self.analytics_cnt = \"0\"\n",
|
| 248 |
-
"\n",
|
| 249 |
-
" try:\n",
|
| 250 |
-
" self.tags = card.find_elements(\n",
|
| 251 |
-
" \"xpath\",\n",
|
| 252 |
-
" './/a[contains(@href, \"src=hashtag_click\")]',\n",
|
| 253 |
-
" )\n",
|
| 254 |
-
"\n",
|
| 255 |
-
" self.tags = [tag.text for tag in self.tags]\n",
|
| 256 |
-
" except NoSuchElementException:\n",
|
| 257 |
-
" self.tags = []\n",
|
| 258 |
-
" \n",
|
| 259 |
-
" try:\n",
|
| 260 |
-
" self.mentions = card.find_elements(\n",
|
| 261 |
-
" \"xpath\",\n",
|
| 262 |
-
" '(.//div[@data-testid=\"tweetText\"])[1]//a[contains(text(), \"@\")]',\n",
|
| 263 |
-
" )\n",
|
| 264 |
-
"\n",
|
| 265 |
-
" self.mentions = [mention.text for mention in self.mentions]\n",
|
| 266 |
-
" except NoSuchElementException:\n",
|
| 267 |
-
" self.mentions = []\n",
|
| 268 |
-
" \n",
|
| 269 |
-
" try:\n",
|
| 270 |
-
" raw_emojis = card.find_elements(\n",
|
| 271 |
-
" \"xpath\",\n",
|
| 272 |
-
" '(.//div[@data-testid=\"tweetText\"])[1]/img[contains(@src, \"emoji\")]',\n",
|
| 273 |
-
" )\n",
|
| 274 |
-
" \n",
|
| 275 |
-
" self.emojis = [emoji.get_attribute(\"alt\").encode(\"unicode-escape\").decode(\"ASCII\") for emoji in raw_emojis]\n",
|
| 276 |
-
" except NoSuchElementException:\n",
|
| 277 |
-
" self.emojis = []\n",
|
| 278 |
-
" \n",
|
| 279 |
-
" try:\n",
|
| 280 |
-
" self.profile_img = card.find_element(\n",
|
| 281 |
-
" \"xpath\", './/div[@data-testid=\"Tweet-User-Avatar\"]//img'\n",
|
| 282 |
-
" ).get_attribute(\"src\")\n",
|
| 283 |
-
" except NoSuchElementException:\n",
|
| 284 |
-
" self.profile_img = \"\"\n",
|
| 285 |
-
" \n",
|
| 286 |
-
" try:\n",
|
| 287 |
-
" self.tweet_link = self.card.find_element(\n",
|
| 288 |
-
" \"xpath\",\n",
|
| 289 |
-
" \".//a[contains(@href, '/status/')]\",\n",
|
| 290 |
-
" ).get_attribute(\"href\")\n",
|
| 291 |
-
" self.tweet_id = str(self.tweet_link.split(\"/\")[-1])\n",
|
| 292 |
-
" except NoSuchElementException:\n",
|
| 293 |
-
" self.tweet_link = \"\"\n",
|
| 294 |
-
" self.tweet_id = \"\"\n",
|
| 295 |
-
" \n",
|
| 296 |
-
" self.following_cnt = \"0\"\n",
|
| 297 |
-
" self.followers_cnt = \"0\"\n",
|
| 298 |
-
" self.user_id = None\n",
|
| 299 |
-
" \n",
|
| 300 |
-
" if scrape_poster_details:\n",
|
| 301 |
-
" el_name = card.find_element(\n",
|
| 302 |
-
" \"xpath\", './/div[@data-testid=\"User-Name\"]//span'\n",
|
| 303 |
-
" )\n",
|
| 304 |
-
" \n",
|
| 305 |
-
" ext_hover_card = False\n",
|
| 306 |
-
" ext_user_id = False\n",
|
| 307 |
-
" ext_following = False\n",
|
| 308 |
-
" ext_followers = False\n",
|
| 309 |
-
" hover_attempt = 0\n",
|
| 310 |
-
" \n",
|
| 311 |
-
" while not ext_hover_card or not ext_user_id or not ext_following or not ext_followers:\n",
|
| 312 |
-
" try:\n",
|
| 313 |
-
" actions.move_to_element(el_name).perform()\n",
|
| 314 |
-
" \n",
|
| 315 |
-
" hover_card = driver.find_element(\n",
|
| 316 |
-
" \"xpath\",\n",
|
| 317 |
-
" '//div[@data-testid=\"hoverCardParent\"]'\n",
|
| 318 |
-
" )\n",
|
| 319 |
-
" \n",
|
| 320 |
-
" ext_hover_card = True\n",
|
| 321 |
-
" \n",
|
| 322 |
-
" while not ext_user_id:\n",
|
| 323 |
-
" try:\n",
|
| 324 |
-
" raw_user_id = hover_card.find_element(\n",
|
| 325 |
-
" \"xpath\",\n",
|
| 326 |
-
" '(.//div[contains(@data-testid, \"-follow\")]) | (.//div[contains(@data-testid, \"-unfollow\")])'\n",
|
| 327 |
-
" ).get_attribute(\"data-testid\")\n",
|
| 328 |
-
" \n",
|
| 329 |
-
" if raw_user_id == \"\":\n",
|
| 330 |
-
" self.user_id = None\n",
|
| 331 |
-
" else:\n",
|
| 332 |
-
" self.user_id = str(raw_user_id.split(\"-\")[0])\n",
|
| 333 |
-
" \n",
|
| 334 |
-
" ext_user_id = True\n",
|
| 335 |
-
" except NoSuchElementException:\n",
|
| 336 |
-
" continue\n",
|
| 337 |
-
" except StaleElementReferenceException:\n",
|
| 338 |
-
" self.error = True\n",
|
| 339 |
-
" return\n",
|
| 340 |
-
" \n",
|
| 341 |
-
" while not ext_following:\n",
|
| 342 |
-
" try:\n",
|
| 343 |
-
" self.following_cnt = hover_card.find_element(\n",
|
| 344 |
-
" \"xpath\",\n",
|
| 345 |
-
" './/a[contains(@href, \"/following\")]//span'\n",
|
| 346 |
-
" ).text\n",
|
| 347 |
-
" \n",
|
| 348 |
-
" if self.following_cnt == \"\":\n",
|
| 349 |
-
" self.following_cnt = \"0\"\n",
|
| 350 |
-
" \n",
|
| 351 |
-
" ext_following = True\n",
|
| 352 |
-
" except NoSuchElementException:\n",
|
| 353 |
-
" continue\n",
|
| 354 |
-
" except StaleElementReferenceException:\n",
|
| 355 |
-
" self.error = True\n",
|
| 356 |
-
" return\n",
|
| 357 |
-
" \n",
|
| 358 |
-
" while not ext_followers:\n",
|
| 359 |
-
" try:\n",
|
| 360 |
-
" self.followers_cnt = hover_card.find_element(\n",
|
| 361 |
-
" \"xpath\",\n",
|
| 362 |
-
" './/a[contains(@href, \"/verified_followers\")]//span'\n",
|
| 363 |
-
" ).text\n",
|
| 364 |
-
" \n",
|
| 365 |
-
" if self.followers_cnt == \"\":\n",
|
| 366 |
-
" self.followers_cnt = \"0\"\n",
|
| 367 |
-
" \n",
|
| 368 |
-
" ext_followers = True\n",
|
| 369 |
-
" except NoSuchElementException:\n",
|
| 370 |
-
" continue\n",
|
| 371 |
-
" except StaleElementReferenceException:\n",
|
| 372 |
-
" self.error = True\n",
|
| 373 |
-
" return\n",
|
| 374 |
-
" except NoSuchElementException:\n",
|
| 375 |
-
" if hover_attempt==3:\n",
|
| 376 |
-
" self.error = True\n",
|
| 377 |
-
" return\n",
|
| 378 |
-
" hover_attempt+=1\n",
|
| 379 |
-
" sleep(0.5)\n",
|
| 380 |
-
" continue\n",
|
| 381 |
-
" except StaleElementReferenceException:\n",
|
| 382 |
-
" self.error = True\n",
|
| 383 |
-
" return\n",
|
| 384 |
-
" \n",
|
| 385 |
-
" if ext_hover_card and ext_following and ext_followers:\n",
|
| 386 |
-
" actions.reset_actions()\n",
|
| 387 |
-
" \n",
|
| 388 |
-
" self.tweet = (\n",
|
| 389 |
-
" self.user,\n",
|
| 390 |
-
" self.handle,\n",
|
| 391 |
-
" self.date_time,\n",
|
| 392 |
-
" self.verified,\n",
|
| 393 |
-
" self.content,\n",
|
| 394 |
-
" self.reply_cnt,\n",
|
| 395 |
-
" self.retweet_cnt,\n",
|
| 396 |
-
" self.like_cnt,\n",
|
| 397 |
-
" self.analytics_cnt,\n",
|
| 398 |
-
" self.tags,\n",
|
| 399 |
-
" self.mentions,\n",
|
| 400 |
-
" self.emojis,\n",
|
| 401 |
-
" self.profile_img,\n",
|
| 402 |
-
" self.tweet_link,\n",
|
| 403 |
-
" self.tweet_id,\n",
|
| 404 |
-
" self.user_id,\n",
|
| 405 |
-
" self.following_cnt,\n",
|
| 406 |
-
" self.followers_cnt,\n",
|
| 407 |
-
" )\n",
|
| 408 |
-
"\n",
|
| 409 |
-
" pass\n"
|
| 410 |
-
]
|
| 411 |
-
},
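The `Tweet` cell above derives `tweet_id` by taking the last path segment of the status link, and maps empty counter text to `"0"`. A slightly more defensive version of both steps could look like the sketch below. The helper names `tweet_id_from_link` and `normalize_count` are hypothetical, and the example URL is made up for illustration.

```python
from urllib.parse import urlparse


def tweet_id_from_link(link: str) -> str:
    """Return the path segment that follows 'status', or '' if there is none."""
    # urlparse drops the query string, so '?s=20' style suffixes are ignored.
    parts = urlparse(link).path.split("/")
    if "status" in parts:
        index = parts.index("status")
        if index + 1 < len(parts):
            return parts[index + 1]
    return ""


def normalize_count(text: str) -> str:
    """Map an empty counter label to '0', mirroring the checks in the cell above."""
    return text if text != "" else "0"


if __name__ == "__main__":
    print(tweet_id_from_link("https://twitter.com/someuser/status/1234567890/photo/1"))
    print(normalize_count(""))
```

Looking for the segment after `status` rather than the final segment keeps the ID stable when the matched link points at a sub-page such as `/photo/1`.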
|
| 412 |
-
{
|
| 413 |
-
"attachments": {},
|
| 414 |
-
"cell_type": "markdown",
|
| 415 |
-
"metadata": {},
|
| 416 |
-
"source": [
|
| 417 |
-
"# Twitter Scraper Class\n",
|
| 418 |
-
"\n",
|
| 419 |
-
"Class for the Twitter Scraper."
|
| 420 |
-
]
|
| 421 |
-
},
|
| 422 |
-
{
|
| 423 |
-
"cell_type": "code",
|
| 424 |
-
"execution_count": null,
|
| 425 |
-
"metadata": {},
|
| 426 |
-
"outputs": [],
|
| 427 |
-
"source": [
|
| 428 |
-
"TWITTER_LOGIN_URL = \"https://twitter.com/i/flow/login\"\n",
|
| 429 |
-
"\n",
|
| 430 |
-
"class Twitter_Scraper:\n",
|
| 431 |
-
" def __init__(\n",
|
| 432 |
-
" self,\n",
|
| 433 |
-
" username,\n",
|
| 434 |
-
" password,\n",
|
| 435 |
-
" max_tweets=50,\n",
|
| 436 |
-
" scrape_username=None,\n",
|
| 437 |
-
" scrape_hashtag=None,\n",
|
| 438 |
-
" scrape_query=None,\n",
|
| 439 |
-
" scrape_poster_details=False,\n",
|
| 440 |
-
" scrape_latest=True,\n",
|
| 441 |
-
" scrape_top=False,\n",
|
| 442 |
-
" ):\n",
|
| 443 |
-
" print(\"Initializing Twitter Scraper...\")\n",
|
| 444 |
-
" self.username = username\n",
|
| 445 |
-
" self.password = password\n",
|
| 446 |
-
" self.interrupted = False\n",
|
| 447 |
-
" self.tweet_ids = set()\n",
|
| 448 |
-
" self.data = []\n",
|
| 449 |
-
" self.tweet_cards = []\n",
|
| 450 |
-
" self.scraper_details = {\n",
|
| 451 |
-
" \"type\": None,\n",
|
| 452 |
-
" \"username\": None,\n",
|
| 453 |
-
" \"hashtag\": None,\n",
|
| 454 |
-
" \"query\": None,\n",
|
| 455 |
-
" \"tab\": None,\n",
|
| 456 |
-
" \"poster_details\": False,\n",
|
| 457 |
-
" }\n",
|
| 458 |
-
" self.max_tweets = max_tweets\n",
|
| 459 |
-
" self.progress = Progress(0, max_tweets)\n",
|
| 460 |
-
" self.router = self.go_to_home\n",
|
| 461 |
-
" self.driver = self._get_driver()\n",
|
| 462 |
-
" self.actions = ActionChains(self.driver)\n",
|
| 463 |
-
" self.scroller = Scroller(self.driver)\n",
|
| 464 |
-
" self._config_scraper(\n",
|
| 465 |
-
" max_tweets,\n",
|
| 466 |
-
" scrape_username,\n",
|
| 467 |
-
" scrape_hashtag,\n",
|
| 468 |
-
" scrape_query,\n",
|
| 469 |
-
" scrape_latest,\n",
|
| 470 |
-
" scrape_top,\n",
|
| 471 |
-
" scrape_poster_details,\n",
|
| 472 |
-
" )\n",
|
| 473 |
-
"\n",
|
| 474 |
-
" def _config_scraper(\n",
|
| 475 |
-
" self,\n",
|
| 476 |
-
" max_tweets=50,\n",
|
| 477 |
-
" scrape_username=None,\n",
|
| 478 |
-
" scrape_hashtag=None,\n",
|
| 479 |
-
" scrape_query=None,\n",
|
| 480 |
-
" scrape_latest=True,\n",
|
| 481 |
-
" scrape_top=False,\n",
|
| 482 |
-
" scrape_poster_details=False,\n",
|
| 483 |
-
" ):\n",
|
| 484 |
-
" self.tweet_ids = set()\n",
|
| 485 |
-
" self.data = []\n",
|
| 486 |
-
" self.tweet_cards = []\n",
|
| 487 |
-
" self.max_tweets = max_tweets\n",
|
| 488 |
-
" self.progress = Progress(0, max_tweets)\n",
|
| 489 |
-
" self.scraper_details = {\n",
|
| 490 |
-
" \"type\": None,\n",
|
| 491 |
-
" \"username\": scrape_username,\n",
|
| 492 |
-
" \"hashtag\": str(scrape_hashtag).replace(\"#\", \"\")\n",
|
| 493 |
-
" if scrape_hashtag is not None\n",
|
| 494 |
-
" else None,\n",
|
| 495 |
-
" \"query\": scrape_query,\n",
|
| 496 |
-
" \"tab\": \"Latest\" if scrape_latest else \"Top\" if scrape_top else \"Latest\",\n",
|
| 497 |
-
" \"poster_details\": scrape_poster_details,\n",
|
| 498 |
-
" }\n",
|
| 499 |
-
" self.router = self.go_to_home\n",
|
| 500 |
-
" self.scroller = Scroller(self.driver)\n",
|
| 501 |
-
"\n",
|
| 502 |
-
" if scrape_username is not None:\n",
|
| 503 |
-
" self.scraper_details[\"type\"] = \"Username\"\n",
|
| 504 |
-
" self.router = self.go_to_profile\n",
|
| 505 |
-
" elif scrape_hashtag is not None:\n",
|
| 506 |
-
" self.scraper_details[\"type\"] = \"Hashtag\"\n",
|
| 507 |
-
" self.router = self.go_to_hashtag\n",
|
| 508 |
-
" elif scrape_query is not None:\n",
|
| 509 |
-
" self.scraper_details[\"type\"] = \"Query\"\n",
|
| 510 |
-
" self.router = self.go_to_search\n",
|
| 511 |
-
" else:\n",
|
| 512 |
-
" self.scraper_details[\"type\"] = \"Home\"\n",
|
| 513 |
-
" self.router = self.go_to_home\n",
|
| 514 |
-
" pass\n",
|
| 515 |
-
"\n",
|
| 516 |
-
" def _get_driver(self):\n",
|
| 517 |
-
" print(\"Setup WebDriver...\")\n",
|
| 518 |
-
" header = Headers().generate()[\"User-Agent\"]\n",
|
| 519 |
-
"\n",
|
| 520 |
-
" browser_option = ChromeOptions()\n",
|
| 521 |
-
" browser_option.add_argument(\"--no-sandbox\")\n",
|
| 522 |
-
" browser_option.add_argument(\"--disable-dev-shm-usage\")\n",
|
| 523 |
-
" browser_option.add_argument(\"--ignore-certificate-errors\")\n",
|
| 524 |
-
" browser_option.add_argument(\"--disable-gpu\")\n",
|
| 525 |
-
" browser_option.add_argument(\"--log-level=3\")\n",
|
| 526 |
-
" browser_option.add_argument(\"--disable-notifications\")\n",
|
| 527 |
-
" browser_option.add_argument(\"--disable-popup-blocking\")\n",
|
| 528 |
-
" browser_option.add_argument(\"--user-agent={}\".format(header))\n",
|
| 529 |
-
"\n",
|
| 530 |
-
" # For Hiding Browser\n",
|
| 531 |
-
" browser_option.add_argument(\"--headless\")\n",
|
| 532 |
-
"\n",
|
| 533 |
-
" try:\n",
|
| 534 |
-
" print(\"Initializing ChromeDriver...\")\n",
|
| 535 |
-
" driver = webdriver.Chrome(\n",
|
| 536 |
-
" options=browser_option,\n",
|
| 537 |
-
" )\n",
|
| 538 |
-
"\n",
|
| 539 |
-
" print(\"WebDriver Setup Complete\")\n",
|
| 540 |
-
" return driver\n",
|
| 541 |
-
" except WebDriverException:\n",
|
| 542 |
-
" try:\n",
|
| 543 |
-
" print(\"Downloading ChromeDriver...\")\n",
|
| 544 |
-
" chromedriver_path = ChromeDriverManager().install()\n",
|
| 545 |
-
" chrome_service = ChromeService(executable_path=chromedriver_path)\n",
|
| 546 |
-
"\n",
|
| 547 |
-
" print(\"Initializing ChromeDriver...\")\n",
|
| 548 |
-
" driver = webdriver.Chrome(\n",
|
| 549 |
-
" service=chrome_service,\n",
|
| 550 |
-
" options=browser_option,\n",
|
| 551 |
-
" )\n",
|
| 552 |
-
"\n",
|
| 553 |
-
" print(\"WebDriver Setup Complete\")\n",
|
| 554 |
-
" return driver\n",
|
| 555 |
-
" except Exception as e:\n",
|
| 556 |
-
" print(f\"Error setting up WebDriver: {e}\")\n",
|
| 557 |
-
" sys.exit(1)\n",
|
| 558 |
-
" pass\n",
|
| 559 |
-
"\n",
|
| 560 |
-
" def login(self):\n",
|
| 561 |
-
" print()\n",
|
| 562 |
-
" print(\"Logging in to Twitter...\")\n",
|
| 563 |
-
"\n",
|
| 564 |
-
" try:\n",
|
| 565 |
-
" self.driver.maximize_window()\n",
|
| 566 |
-
" self.driver.get(TWITTER_LOGIN_URL)\n",
|
| 567 |
-
" sleep(3)\n",
|
| 568 |
-
"\n",
|
| 569 |
-
" self._input_username()\n",
|
| 570 |
-
" self._input_unusual_activity()\n",
|
| 571 |
-
" self._input_password()\n",
|
| 572 |
-
"\n",
|
| 573 |
-
" cookies = self.driver.get_cookies()\n",
|
| 574 |
-
"\n",
|
| 575 |
-
" auth_token = None\n",
|
| 576 |
-
"\n",
|
| 577 |
-
" for cookie in cookies:\n",
|
| 578 |
-
" if cookie[\"name\"] == \"auth_token\":\n",
|
| 579 |
-
" auth_token = cookie[\"value\"]\n",
|
| 580 |
-
" break\n",
|
| 581 |
-
"\n",
|
| 582 |
-
" if auth_token is None:\n",
|
| 583 |
-
" raise ValueError(\n",
|
| 584 |
-
" \"\"\"This may be due to the following:\n",
|
| 585 |
-
"\n",
|
| 586 |
-
"- Internet connection is unstable\n",
|
| 587 |
-
"- Username is incorrect\n",
|
| 588 |
-
"- Password is incorrect\n",
|
| 589 |
-
"\"\"\"\n",
|
| 590 |
-
" )\n",
|
| 591 |
-
"\n",
|
| 592 |
-
" print()\n",
|
| 593 |
-
" print(\"Login Successful\")\n",
|
| 594 |
-
" print()\n",
|
| 595 |
-
" except Exception as e:\n",
|
| 596 |
-
" print()\n",
|
| 597 |
-
" print(f\"Login Failed: {e}\")\n",
|
| 598 |
-
" sys.exit(1)\n",
|
| 599 |
-
"\n",
|
| 600 |
-
" pass\n",
|
| 601 |
-
"\n",
|
| 602 |
-
" def _input_username(self):\n",
|
| 603 |
-
" input_attempt = 0\n",
|
| 604 |
-
"\n",
|
| 605 |
-
" while True:\n",
|
| 606 |
-
" try:\n",
|
| 607 |
-
" username = self.driver.find_element(\n",
|
| 608 |
-
" \"xpath\", \"//input[@autocomplete='username']\"\n",
|
| 609 |
-
" )\n",
|
| 610 |
-
"\n",
|
| 611 |
-
" username.send_keys(self.username)\n",
|
| 612 |
-
" username.send_keys(Keys.RETURN)\n",
|
| 613 |
-
" sleep(3)\n",
|
| 614 |
-
" break\n",
|
| 615 |
-
" except NoSuchElementException:\n",
|
| 616 |
-
" input_attempt += 1\n",
|
| 617 |
-
" if input_attempt >= 3:\n",
|
| 618 |
-
" print()\n",
|
| 619 |
-
" print(\n",
|
| 620 |
-
" \"\"\"There was an error inputting the username.\n",
|
| 621 |
-
"\n",
|
| 622 |
-
"It may be due to the following:\n",
|
| 623 |
-
"- Internet connection is unstable\n",
|
| 624 |
-
"- Username is incorrect\n",
|
| 625 |
-
"- Twitter is experiencing unusual activity\"\"\"\n",
|
| 626 |
-
" )\n",
|
| 627 |
-
" self.driver.quit()\n",
|
| 628 |
-
" sys.exit(1)\n",
|
| 629 |
-
" else:\n",
|
| 630 |
-
" print(\"Re-attempting to input username...\")\n",
|
| 631 |
-
" sleep(2)\n",
|
| 632 |
-
"\n",
|
| 633 |
-
" def _input_unusual_activity(self):\n",
|
| 634 |
-
" input_attempt = 0\n",
|
| 635 |
-
"\n",
|
| 636 |
-
" while True:\n",
|
| 637 |
-
" try:\n",
|
| 638 |
-
" unusual_activity = self.driver.find_element(\n",
|
| 639 |
-
" \"xpath\", \"//input[@data-testid='ocfEnterTextTextInput']\"\n",
|
| 640 |
-
" )\n",
|
| 641 |
-
" unusual_activity.send_keys(self.username)\n",
|
| 642 |
-
" unusual_activity.send_keys(Keys.RETURN)\n",
|
| 643 |
-
" sleep(3)\n",
|
| 644 |
-
" break\n",
|
| 645 |
-
" except NoSuchElementException:\n",
|
| 646 |
-
" input_attempt += 1\n",
|
| 647 |
-
" if input_attempt >= 3:\n",
|
| 648 |
-
" break\n",
|
| 649 |
-
"\n",
|
| 650 |
-
" def _input_password(self):\n",
|
| 651 |
-
" input_attempt = 0\n",
|
| 652 |
-
"\n",
|
| 653 |
-
" while True:\n",
|
| 654 |
-
" try:\n",
|
| 655 |
-
" password = self.driver.find_element(\n",
|
| 656 |
-
" \"xpath\", \"//input[@autocomplete='current-password']\"\n",
|
| 657 |
-
" )\n",
|
| 658 |
-
"\n",
|
| 659 |
-
" password.send_keys(self.password)\n",
|
| 660 |
-
" password.send_keys(Keys.RETURN)\n",
|
| 661 |
-
" sleep(3)\n",
|
| 662 |
-
" break\n",
|
| 663 |
-
" except NoSuchElementException:\n",
|
| 664 |
-
" input_attempt += 1\n",
|
| 665 |
-
" if input_attempt >= 3:\n",
|
| 666 |
-
" print()\n",
|
| 667 |
-
" print(\n",
|
| 668 |
-
" \"\"\"There was an error inputting the password.\n",
|
| 669 |
-
"\n",
|
| 670 |
-
"It may be due to the following:\n",
|
| 671 |
-
"- Internet connection is unstable\n",
|
| 672 |
-
"- Password is incorrect\n",
|
| 673 |
-
"- Twitter is experiencing unusual activity\"\"\"\n",
|
| 674 |
-
" )\n",
|
| 675 |
-
" self.driver.quit()\n",
|
| 676 |
-
" sys.exit(1)\n",
|
| 677 |
-
" else:\n",
|
| 678 |
-
" print(\"Re-attempting to input password...\")\n",
|
| 679 |
-
" sleep(2)\n",
|
| 680 |
-
"\n",
|
| 681 |
-
" def go_to_home(self):\n",
|
| 682 |
-
" self.driver.get(\"https://twitter.com/home\")\n",
|
| 683 |
-
" sleep(3)\n",
|
| 684 |
-
" pass\n",
|
| 685 |
-
"\n",
|
| 686 |
-
" def go_to_profile(self):\n",
|
| 687 |
-
" if (\n",
|
| 688 |
-
" self.scraper_details[\"username\"] is None\n",
|
| 689 |
-
" or self.scraper_details[\"username\"] == \"\"\n",
|
| 690 |
-
" ):\n",
|
| 691 |
-
" print(\"Username is not set.\")\n",
|
| 692 |
-
" sys.exit(1)\n",
|
| 693 |
-
" else:\n",
|
| 694 |
-
" self.driver.get(f\"https://twitter.com/{self.scraper_details['username']}\")\n",
|
| 695 |
-
" sleep(3)\n",
|
| 696 |
-
" pass\n",
|
| 697 |
-
"\n",
|
| 698 |
-
" def go_to_hashtag(self):\n",
|
| 699 |
-
" if (\n",
|
| 700 |
-
" self.scraper_details[\"hashtag\"] is None\n",
|
| 701 |
-
" or self.scraper_details[\"hashtag\"] == \"\"\n",
|
| 702 |
-
" ):\n",
|
| 703 |
-
" print(\"Hashtag is not set.\")\n",
|
| 704 |
-
" sys.exit(1)\n",
|
| 705 |
-
" else:\n",
|
| 706 |
-
" url = f\"https://twitter.com/hashtag/{self.scraper_details['hashtag']}?src=hashtag_click\"\n",
|
| 707 |
-
" if self.scraper_details[\"tab\"] == \"Latest\":\n",
|
| 708 |
-
" url += \"&f=live\"\n",
|
| 709 |
-
"\n",
|
| 710 |
-
" self.driver.get(url)\n",
|
| 711 |
-
" sleep(3)\n",
|
| 712 |
-
" pass\n",
|
| 713 |
-
"\n",
|
| 714 |
-
" def go_to_search(self):\n",
|
| 715 |
-
" if self.scraper_details[\"query\"] is None or self.scraper_details[\"query\"] == \"\":\n",
|
| 716 |
-
" print(\"Query is not set.\")\n",
|
| 717 |
-
" sys.exit(1)\n",
|
| 718 |
-
" else:\n",
|
| 719 |
-
" url = f\"https://twitter.com/search?q={self.scraper_details['query']}&src=typed_query\"\n",
|
| 720 |
-
" if self.scraper_details[\"tab\"] == \"Latest\":\n",
|
| 721 |
-
" url += \"&f=live\"\n",
|
| 722 |
-
"\n",
|
| 723 |
-
" self.driver.get(url)\n",
|
| 724 |
-
" sleep(3)\n",
|
| 725 |
-
" pass\n",
|
| 726 |
-
"\n",
|
| 727 |
-
" def get_tweet_cards(self):\n",
|
| 728 |
-
" self.tweet_cards = self.driver.find_elements(\n",
|
| 729 |
-
" \"xpath\", '//article[@data-testid=\"tweet\" and not(@disabled)]'\n",
|
| 730 |
-
" )\n",
|
| 731 |
-
" pass\n",
|
| 732 |
-
"\n",
|
| 733 |
-
" def remove_hidden_cards(self):\n",
|
| 734 |
-
" try:\n",
|
| 735 |
-
" hidden_cards = self.driver.find_elements(\n",
|
| 736 |
-
" \"xpath\", '//article[@data-testid=\"tweet\" and @disabled]'\n",
|
| 737 |
-
" )\n",
|
| 738 |
-
"\n",
|
| 739 |
-
" for card in hidden_cards[1:-2]:\n",
|
| 740 |
-
" self.driver.execute_script(\n",
|
| 741 |
-
" \"arguments[0].parentNode.parentNode.parentNode.remove();\", card\n",
|
| 742 |
-
" )\n",
|
| 743 |
-
" except Exception as e:\n",
|
| 744 |
-
" return\n",
|
| 745 |
-
" pass\n",
|
| 746 |
-
"\n",
|
| 747 |
-
" def scrape_tweets(\n",
|
| 748 |
-
" self,\n",
|
| 749 |
-
" max_tweets=50,\n",
|
| 750 |
-
" scrape_username=None,\n",
|
| 751 |
-
" scrape_hashtag=None,\n",
|
| 752 |
-
" scrape_query=None,\n",
|
| 753 |
-
" scrape_latest=True,\n",
|
| 754 |
-
" scrape_top=False,\n",
|
| 755 |
-
" scrape_poster_details=False,\n",
|
| 756 |
-
" router=None,\n",
|
| 757 |
-
" ):\n",
|
| 758 |
-
" self._config_scraper(\n",
|
| 759 |
-
" max_tweets,\n",
|
| 760 |
-
" scrape_username,\n",
|
| 761 |
-
" scrape_hashtag,\n",
|
| 762 |
-
" scrape_query,\n",
|
| 763 |
-
" scrape_latest,\n",
|
| 764 |
-
" scrape_top,\n",
|
| 765 |
-
" scrape_poster_details,\n",
|
| 766 |
-
" )\n",
|
| 767 |
-
"\n",
|
| 768 |
-
" if router is None:\n",
|
| 769 |
-
" router = self.router\n",
|
| 770 |
-
"\n",
|
| 771 |
-
" router()\n",
|
| 772 |
-
"\n",
|
| 773 |
-
" if self.scraper_details[\"type\"] == \"Username\":\n",
|
| 774 |
-
" print(\n",
|
| 775 |
-
" \"Scraping Tweets from @{}...\".format(self.scraper_details[\"username\"])\n",
|
| 776 |
-
" )\n",
|
| 777 |
-
" elif self.scraper_details[\"type\"] == \"Hashtag\":\n",
|
| 778 |
-
" print(\n",
|
| 779 |
-
" \"Scraping {} Tweets from #{}...\".format(\n",
|
| 780 |
-
" self.scraper_details[\"tab\"], self.scraper_details[\"hashtag\"]\n",
|
| 781 |
-
" )\n",
|
| 782 |
-
" )\n",
|
| 783 |
-
" elif self.scraper_details[\"type\"] == \"Query\":\n",
|
| 784 |
-
" print(\n",
|
| 785 |
-
" \"Scraping {} Tweets from {} search...\".format(\n",
|
| 786 |
-
" self.scraper_details[\"tab\"], self.scraper_details[\"query\"]\n",
|
| 787 |
-
" )\n",
|
| 788 |
-
" )\n",
|
| 789 |
-
" elif self.scraper_details[\"type\"] == \"Home\":\n",
|
| 790 |
-
" print(\"Scraping Tweets from Home...\")\n",
|
| 791 |
-
"\n",
|
| 792 |
-
" self.progress.print_progress(0)\n",
|
| 793 |
-
"\n",
|
| 794 |
-
" refresh_count = 0\n",
|
| 795 |
-
" added_tweets = 0\n",
|
| 796 |
-
" empty_count = 0\n",
|
| 797 |
-
"\n",
|
| 798 |
-
" while self.scroller.scrolling:\n",
|
| 799 |
-
" try:\n",
|
| 800 |
-
" self.get_tweet_cards()\n",
|
| 801 |
-
" added_tweets = 0\n",
|
| 802 |
-
"\n",
|
| 803 |
-
" for card in self.tweet_cards[-15:]:\n",
|
| 804 |
-
" try:\n",
|
| 805 |
-
" tweet_id = str(card)\n",
|
| 806 |
-
"\n",
|
| 807 |
-
" if tweet_id not in self.tweet_ids:\n",
|
| 808 |
-
" self.tweet_ids.add(tweet_id)\n",
|
| 809 |
-
"\n",
|
| 810 |
-
" if not self.scraper_details[\"poster_details\"]:\n",
|
| 811 |
-
" self.driver.execute_script(\n",
|
| 812 |
-
" \"arguments[0].scrollIntoView();\", card\n",
|
| 813 |
-
" )\n",
|
| 814 |
-
"\n",
|
| 815 |
-
" tweet = Tweet(\n",
|
| 816 |
-
" card=card,\n",
|
| 817 |
-
" driver=self.driver,\n",
|
| 818 |
-
" actions=self.actions,\n",
|
| 819 |
-
" scrape_poster_details=self.scraper_details[\n",
|
| 820 |
-
" \"poster_details\"\n",
|
| 821 |
-
" ],\n",
|
| 822 |
-
" )\n",
|
| 823 |
-
"\n",
|
| 824 |
-
" if tweet:\n",
|
| 825 |
-
" if not tweet.error and tweet.tweet is not None:\n",
|
| 826 |
-
" if not tweet.is_ad:\n",
|
| 827 |
-
" self.data.append(tweet.tweet)\n",
|
| 828 |
-
" added_tweets += 1\n",
|
| 829 |
-
" self.progress.print_progress(len(self.data))\n",
|
| 830 |
-
"\n",
|
| 831 |
-
" if len(self.data) >= self.max_tweets:\n",
|
| 832 |
-
" self.scroller.scrolling = False\n",
|
| 833 |
-
" break\n",
|
| 834 |
-
" else:\n",
|
| 835 |
-
" continue\n",
|
| 836 |
-
" else:\n",
|
| 837 |
-
" continue\n",
|
| 838 |
-
" else:\n",
|
| 839 |
-
" continue\n",
|
| 840 |
-
" else:\n",
|
| 841 |
-
" continue\n",
|
| 842 |
-
" except NoSuchElementException:\n",
|
| 843 |
-
" continue\n",
|
| 844 |
-
"\n",
|
| 845 |
-
" if len(self.data) >= self.max_tweets:\n",
|
| 846 |
-
" break\n",
|
| 847 |
-
"\n",
|
| 848 |
-
" if added_tweets == 0:\n",
|
| 849 |
-
" if empty_count >= 5:\n",
|
| 850 |
-
" if refresh_count >= 3:\n",
|
| 851 |
-
" print()\n",
|
| 852 |
-
" print(\"No more tweets to scrape\")\n",
|
| 853 |
-
" break\n",
|
| 854 |
-
" refresh_count += 1\n",
|
| 855 |
-
" empty_count += 1\n",
|
| 856 |
-
" sleep(1)\n",
|
| 857 |
-
" else:\n",
|
| 858 |
-
" empty_count = 0\n",
|
| 859 |
-
" refresh_count = 0\n",
|
| 860 |
-
" except StaleElementReferenceException:\n",
|
| 861 |
-
" sleep(2)\n",
|
| 862 |
-
" continue\n",
|
| 863 |
-
" except KeyboardInterrupt:\n",
|
| 864 |
-
" print(\"\\n\")\n",
|
| 865 |
-
" print(\"Keyboard Interrupt\")\n",
|
| 866 |
-
" self.interrupted = True\n",
|
| 867 |
-
" break\n",
|
| 868 |
-
" except Exception as e:\n",
|
| 869 |
-
" print(\"\\n\")\n",
|
| 870 |
-
" print(f\"Error scraping tweets: {e}\")\n",
|
| 871 |
-
" break\n",
|
| 872 |
-
"\n",
|
| 873 |
-
" print(\"\")\n",
|
| 874 |
-
"\n",
|
| 875 |
-
" if len(self.data) >= self.max_tweets:\n",
|
| 876 |
-
" print(\"Scraping Complete\")\n",
|
| 877 |
-
" else:\n",
|
| 878 |
-
" print(\"Scraping Incomplete\")\n",
|
| 879 |
-
"\n",
|
| 880 |
-
" print(\"Tweets: {} out of {}\\n\".format(len(self.data), self.max_tweets))\n",
|
| 881 |
-
"\n",
|
| 882 |
-
" pass\n",
|
| 883 |
-
"\n",
|
| 884 |
-
" def save_to_csv(self):\n",
|
| 885 |
-
" print(\"Saving Tweets to CSV...\")\n",
|
| 886 |
-
" now = datetime.now()\n",
|
| 887 |
-
" folder_path = \"./tweets/\"\n",
|
| 888 |
-
"\n",
|
| 889 |
-
" if not os.path.exists(folder_path):\n",
|
| 890 |
-
" os.makedirs(folder_path)\n",
|
| 891 |
-
" print(\"Created Folder: {}\".format(folder_path))\n",
|
| 892 |
-
"\n",
|
| 893 |
-
" data = {\n",
|
| 894 |
-
" \"Name\": [tweet[0] for tweet in self.data],\n",
|
| 895 |
-
" \"Handle\": [tweet[1] for tweet in self.data],\n",
|
| 896 |
-
" \"Timestamp\": [tweet[2] for tweet in self.data],\n",
|
| 897 |
-
" \"Verified\": [tweet[3] for tweet in self.data],\n",
|
| 898 |
-
" \"Content\": [tweet[4] for tweet in self.data],\n",
|
| 899 |
-
" \"Comments\": [tweet[5] for tweet in self.data],\n",
|
| 900 |
-
" \"Retweets\": [tweet[6] for tweet in self.data],\n",
|
| 901 |
-
" \"Likes\": [tweet[7] for tweet in self.data],\n",
|
| 902 |
-
" \"Analytics\": [tweet[8] for tweet in self.data],\n",
|
| 903 |
-
" \"Tags\": [tweet[9] for tweet in self.data],\n",
|
| 904 |
-
" \"Mentions\": [tweet[10] for tweet in self.data],\n",
|
| 905 |
-
" \"Emojis\": [tweet[11] for tweet in self.data],\n",
|
| 906 |
-
" \"Profile Image\": [tweet[12] for tweet in self.data],\n",
|
| 907 |
-
" \"Tweet Link\": [tweet[13] for tweet in self.data],\n",
|
| 908 |
-
" \"Tweet ID\": [f'tweet_id:{tweet[14]}' for tweet in self.data],\n",
|
| 909 |
-
" }\n",
|
| 910 |
-
"\n",
|
| 911 |
-
" if self.scraper_details[\"poster_details\"]:\n",
|
| 912 |
-
" data[\"Tweeter ID\"] = [f'user_id:{tweet[15]}' for tweet in self.data]\n",
|
| 913 |
-
" data[\"Following\"] = [tweet[16] for tweet in self.data]\n",
|
| 914 |
-
" data[\"Followers\"] = [tweet[17] for tweet in self.data]\n",
|
| 915 |
-
"\n",
|
| 916 |
-
" df = pd.DataFrame(data)\n",
|
| 917 |
-
"\n",
|
| 918 |
-
" current_time = now.strftime(\"%Y-%m-%d_%H-%M-%S\")\n",
|
| 919 |
-
" file_path = f\"{folder_path}{current_time}_tweets_1-{len(self.data)}.csv\"\n",
|
| 920 |
-
" pd.set_option(\"display.max_colwidth\", None)\n",
|
| 921 |
-
" df.to_csv(file_path, index=False, encoding=\"utf-8\")\n",
|
| 922 |
-
"\n",
|
| 923 |
-
" print(\"CSV Saved: {}\".format(file_path))\n",
|
| 924 |
-
"\n",
|
| 925 |
-
" pass\n",
|
| 926 |
-
"\n",
|
| 927 |
-
" def get_tweets(self):\n",
|
| 928 |
-
" return self.data"
|
| 929 |
-
]
|
| 930 |
-
},
|
| 931 |
-
{
|
| 932 |
-
"attachments": {},
|
| 933 |
-
"cell_type": "markdown",
|
| 934 |
-
"metadata": {},
|
| 935 |
-
"source": [
|
| 936 |
-
"# Create a new instance of the Twitter Scraper class"
|
| 937 |
-
]
|
| 938 |
-
},
|
| 939 |
-
{
|
| 940 |
-
"cell_type": "code",
|
| 941 |
-
"execution_count": null,
|
| 942 |
-
"metadata": {},
|
| 943 |
-
"outputs": [],
|
| 944 |
-
"source": [
|
| 945 |
-
"USER_UNAME = os.environ['TWITTER_USERNAME']\n",
|
| 946 |
-
"USER_PASSWORD = os.environ['TWITTER_PASSWORD']\n",
|
| 947 |
-
"\n",
|
| 948 |
-
"scraper = Twitter_Scraper(\n",
|
| 949 |
-
" username=USER_UNAME,\n",
|
| 950 |
-
" password=USER_PASSWORD,\n",
|
| 951 |
-
" # max_tweets=10,\n",
|
| 952 |
-
" # scrape_username=\"something\",\n",
|
| 953 |
-
" # scrape_hashtag=\"something\",\n",
|
| 954 |
-
" # scrape_query=\"something\",\n",
|
| 955 |
-
" # scrape_latest=False,\n",
|
| 956 |
-
" # scrape_top=True,\n",
|
| 957 |
-
" # scrape_poster_details=True\n",
|
| 958 |
-
")"
|
| 959 |
-
]
|
| 960 |
-
},
|
| 961 |
-
{
|
| 962 |
-
"cell_type": "code",
|
| 963 |
-
"execution_count": null,
|
| 964 |
-
"metadata": {},
|
| 965 |
-
"outputs": [],
|
| 966 |
-
"source": [
|
| 967 |
-
"scraper.login()"
|
| 968 |
-
]
|
| 969 |
-
},
|
| 970 |
-
{
|
| 971 |
-
"attachments": {},
|
| 972 |
-
"cell_type": "markdown",
|
| 973 |
-
"metadata": {},
|
| 974 |
-
"source": [
|
| 975 |
-
"# Run Twitter Scraper"
|
| 976 |
-
]
|
| 977 |
-
},
|
| 978 |
-
{
|
| 979 |
-
"cell_type": "code",
|
| 980 |
-
"execution_count": null,
|
| 981 |
-
"metadata": {},
|
| 982 |
-
"outputs": [],
|
| 983 |
-
"source": [
|
| 984 |
-
"scraper.scrape_tweets(\n",
|
| 985 |
-
" # max_tweets=100,\n",
|
| 986 |
-
" # scrape_username=\"something\",\n",
|
| 987 |
-
" # scrape_hashtag=\"something\",\n",
|
| 988 |
-
" # scrape_query=\"something\",\n",
|
| 989 |
-
" # scrape_latest=False,\n",
|
| 990 |
-
" # scrape_top=True,\n",
|
| 991 |
-
" # scrape_poster_details=True,\n",
|
| 992 |
-
")"
|
| 993 |
-
]
|
| 994 |
-
},
|
| 995 |
-
{
|
| 996 |
-
"attachments": {},
|
| 997 |
-
"cell_type": "markdown",
|
| 998 |
-
"metadata": {},
|
| 999 |
-
"source": [
|
| 1000 |
-
"# Save Scraped Tweets in a CSV"
|
| 1001 |
-
]
|
| 1002 |
-
},
|
| 1003 |
-
{
|
| 1004 |
-
"cell_type": "code",
|
| 1005 |
-
"execution_count": null,
|
| 1006 |
-
"metadata": {},
|
| 1007 |
-
"outputs": [],
|
| 1008 |
-
"source": [
|
| 1009 |
-
"scraper.save_to_csv()"
|
| 1010 |
-
]
|
| 1011 |
-
},
|
| 1012 |
-
{
|
| 1013 |
-
"cell_type": "code",
|
| 1014 |
-
"execution_count": null,
|
| 1015 |
-
"metadata": {},
|
| 1016 |
-
"outputs": [],
|
| 1017 |
-
"source": [
|
| 1018 |
-
"scraper.driver.close()"
|
| 1019 |
-
]
|
| 1020 |
-
}
|
| 1021 |
-
],
|
| 1022 |
-
"metadata": {
|
| 1023 |
-
"kernelspec": {
|
| 1024 |
-
"display_name": "ml",
|
| 1025 |
-
"language": "python",
|
| 1026 |
-
"name": "python3"
|
| 1027 |
-
},
|
| 1028 |
-
"language_info": {
|
| 1029 |
-
"codemirror_mode": {
|
| 1030 |
-
"name": "ipython",
|
| 1031 |
-
"version": 3
|
| 1032 |
-
},
|
| 1033 |
-
"file_extension": ".py",
|
| 1034 |
-
"mimetype": "text/x-python",
|
| 1035 |
-
"name": "python",
|
| 1036 |
-
"nbconvert_exporter": "python",
|
| 1037 |
-
"pygments_lexer": "ipython3",
|
| 1038 |
-
"version": "3.11.5"
|
| 1039 |
-
},
|
| 1040 |
-
"orig_nbformat": 4
|
| 1041 |
-
},
|
| 1042 |
-
"nbformat": 4,
|
| 1043 |
-
"nbformat_minor": 2
|
| 1044 |
-
}
|