# Indian News Scraper

A collection of web scrapers for Indian news websites that extract articles matching a given topic.

## Features

- Scrapes articles from major Indian news sources:
  - Times of India (TOI)
  - NDTV
  - WION
  - Scroll.in
- Command-line interface for choosing the source and topic
- Multithreaded scraping with a configurable number of worker threads
- Periodic auto-saving of progress at a configurable interval, to reduce data loss if a run is interrupted
- CSV output for easy analysis
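This README does not describe the scraper's internals. The following is only a minimal sketch, using Python's standard library, of how a fixed pool of worker threads can be combined with a periodic auto-save. `scrape_all` and its `fetch_article` and `save` callbacks are hypothetical names, not this project's API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List

Article = Dict[str, str]


def scrape_all(
    urls: Iterable[str],
    fetch_article: Callable[[str], Article],
    save: Callable[[List[Article]], None],
    workers: int = 4,
    interval: float = 300.0,
) -> List[Article]:
    """Fetch every URL on a thread pool and save a snapshot every `interval` seconds.

    Hypothetical sketch: `fetch_article` and `save` stand in for the real
    scraping and CSV-writing code, which this README does not show.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")

    results: List[Article] = []
    lock = threading.Lock()
    done = threading.Event()

    def autosave() -> None:
        # Event.wait() returns False on timeout and True once `done` is set,
        # so this saves once per interval until the scrape finishes.
        while not done.wait(interval):
            with lock:
                snapshot = list(results)
            save(snapshot)

    def worker(url: str) -> None:
        article = fetch_article(url)
        with lock:
            results.append(article)

    saver = threading.Thread(target=autosave, daemon=True)
    saver.start()
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Consuming the iterator re-raises any exception from a worker.
            list(pool.map(worker, urls))
    finally:
        done.set()
        saver.join()
        save(list(results))  # final save, even if a worker failed
    return results
```

Snapshots are copied under the lock, so a slow save (for example, writing a CSV) does not block the workers, and a final save runs even if a worker raises.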

## Requirements

- Python 3.7+
- Chrome browser
- ChromeDriver (compatible with your Chrome version)

## Installation

1. Clone this repository:
   ```bash
   git clone https://github.com/yourusername/indian-news-scraper.git
   cd indian-news-scraper
   ```

2. Install the required dependencies:
   ```bash
   pip install -r requirements.txt
   ```

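3. Install Chrome and a ChromeDriver version that matches it:
   - Install Chrome: [https://www.google.com/chrome/](https://www.google.com/chrome/)
   - Download ChromeDriver: [https://chromedriver.chromium.org/downloads](https://chromedriver.chromium.org/downloads) (for Chrome 115 and newer, ChromeDriver builds are published through the Chrome for Testing availability dashboard linked from that page)
   - Make sure the `chromedriver` executable is on your `PATH`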
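To confirm that ChromeDriver is reachable before scraping, a small check with `shutil.which` from the Python standard library can help. `find_chromedriver` is a hypothetical helper, not part of this repository.

```python
import shutil
from typing import Optional


def find_chromedriver() -> Optional[str]:
    """Return the full path of the first `chromedriver` on PATH, or None."""
    # shutil.which applies the platform's PATH lookup (and PATHEXT on Windows).
    return shutil.which("chromedriver")


if __name__ == "__main__":
    path = find_chromedriver()
    if path is None:
        print("chromedriver not found on PATH")
    else:
        print(f"chromedriver found at {path}")
```

To check compatibility, compare the major version printed by `chromedriver --version` with the one shown on Chrome's About page.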

## Usage

Run the main script with the desired news source and topic:

```bash
python run_scraper.py --source toi --topic "Climate Change"
```

### Available News Sources

- `toi` - Times of India
- `ndtv` - NDTV
- `wion` - WION News
- `scroll` - Scroll.in

### Command Line Options

```
usage: run_scraper.py [-h] --source {toi,ndtv,wion,scroll} --topic TOPIC [--workers WORKERS] [--interval INTERVAL]

Scrape news articles from Indian news websites

optional arguments:
  -h, --help            show this help message and exit
  --source {toi,ndtv,wion,scroll}, -s {toi,ndtv,wion,scroll}
                        News source to scrape from
  --topic TOPIC, -t TOPIC
                        Topic to search for (e.g., "Climate Change", "Politics")
  --workers WORKERS, -w WORKERS
                        Number of worker threads (default: 4)
  --interval INTERVAL, -i INTERVAL
                        Auto-save interval in seconds (default: 300)
```
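The help text above matches what a standard `argparse` parser prints. The sketch below is a hypothetical reconstruction that reproduces the documented options and defaults; the actual `run_scraper.py` may be written differently, and the integer type for `--interval` is an assumption.

```python
import argparse
from typing import List, Optional

# Source keys as listed under "Available News Sources".
SOURCES = ["toi", "ndtv", "wion", "scroll"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command-line options."""
    parser = argparse.ArgumentParser(
        prog="run_scraper.py",
        description="Scrape news articles from Indian news websites",
    )
    parser.add_argument("--source", "-s", required=True, choices=SOURCES,
                        help="News source to scrape from")
    parser.add_argument("--topic", "-t", required=True,
                        help='Topic to search for (e.g., "Climate Change", "Politics")')
    parser.add_argument("--workers", "-w", type=int, default=4,
                        help="Number of worker threads (default: 4)")
    # Assumption: the interval is parsed as whole seconds.
    parser.add_argument("--interval", "-i", type=int, default=300,
                        help="Auto-save interval in seconds (default: 300)")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # argv=None makes argparse read sys.argv[1:].
    return build_parser().parse_args(argv)
```

Because `parse_args` accepts an explicit list, the parser can be exercised without a shell, e.g. `parse_args(["-s", "ndtv", "-t", "Elections", "-w", "8"])`.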

### Examples

Scrape articles about "COVID" from Times of India:
```bash
python run_scraper.py --source toi --topic COVID
```

Scrape articles about "Elections" from NDTV with 8 worker threads:
```bash
python run_scraper.py --source ndtv --topic Elections --workers 8
```

Scrape articles about "Climate Change" from Scroll.in with auto-save every minute:
```bash
python run_scraper.py --source scroll --topic "Climate Change" --interval 60
```

## Output

The scraped articles are saved in CSV format in the `output` directory with filenames in the following format:
```
{source}_{topic}articles_{timestamp}_{status}.csv
```

For example:
```
output/toi_COVIDarticles_20250407_121530_final.csv
```
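A sketch of how the documented naming pattern can be built, and how it can be used to locate the most recent final CSV for analysis. `output_path` and `latest_output` are hypothetical helpers. The `YYYYMMDD_HHMMSS` timestamp layout is inferred from the example file name, and `{status}` values other than `final` are not documented here.

```python
import glob
from datetime import datetime
from pathlib import Path
from typing import Optional


def output_path(source: str, topic: str, status: str,
                when: Optional[datetime] = None,
                output_dir: str = "output") -> Path:
    """Build a path of the form {source}_{topic}articles_{timestamp}_{status}.csv."""
    if when is None:
        when = datetime.now()
    # Timestamp layout inferred from the example file name: YYYYMMDD_HHMMSS.
    timestamp = when.strftime("%Y%m%d_%H%M%S")
    # As in the documented pattern, there is no separator before "articles".
    return Path(output_dir) / f"{source}_{topic}articles_{timestamp}_{status}.csv"


def latest_output(source: str, topic: str, status: str = "final",
                  output_dir: str = "output") -> Optional[Path]:
    """Return the newest matching CSV in output_dir, or None if there is none."""
    # Escape the literal prefix so topics containing *, ? or [ are not wildcards.
    prefix = glob.escape(f"{source}_{topic}articles_")
    # Fixed-width timestamps make lexical order match chronological order.
    matches = sorted(Path(output_dir).glob(f"{prefix}*_{status}.csv"))
    return matches[-1] if matches else None
```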

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Disclaimer

This tool is meant for research and educational purposes only. Please respect the terms of service of the websites you scrape and use the data responsibly.