```markdown
# Nepal Kanoon Patrika Scraper
This web application, deployed on Hugging Face Spaces (Free Tier), scrapes legal case data from the Nepal Kanoon Patrika website (https://nkp.gov.np/). Users select a case type (mudda type) and a Nepali year; the scraped case details are stored in a SQLite database, and the associated HTML files are saved in a folder.
## Features
- Scrapes legal case details including decision number, court, judges, parties, and more.
- Stores data in a SQLite database (`legal_cases.db`).
- Saves raw HTML files in the `scraped_html` folder for future reference.
- Uses existing HTML files when available to reduce redundant web requests.
- Provides a user-friendly Gradio interface for initiating scraping tasks.
## Usage Instructions
1. Open the Gradio interface in your browser.
2. Select a **Mudda Type** from the dropdown menu. Options include:
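   - दुनियाबादी देवानी (civil, private party as plaintiff)
   - सरकारबादी देवानी (civil, government as plaintiff)
   - दुनियावादी फौजदारी (criminal, private party as plaintiff)
   - सरकारवादी फौजदारी (criminal, government as plaintiff)
   - रिट (writ)
   - निवेदन (petition)
   - विविध (miscellaneous)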
3. Enter a **Nepali Year** (e.g., २०७३) in the textbox.
4. Click the **Run Scraper** button to start scraping.
5. Monitor the progress and results in the status output box.
## Technical Details
- **Backend**: The scraping logic is implemented in `Kanun_Patrika_Scraper_For_HFSpaces.py`, which handles web requests, HTML parsing, and data storage.
- **Storage**: Scraped data is stored in a SQLite database (`legal_cases.db`) to keep file sizes manageable within Hugging Face Spaces' free tier storage limits.
- **HTML Files**: Raw HTML content is saved in the `scraped_html` folder for reuse, reducing the need for repeated web requests.
- **Dependencies**: Listed in `requirements.txt`, including `requests`, `beautifulsoup4`, `pandas`, `nepali-datetime`, and `gradio`.
- **Environment**: Designed to run on Hugging Face Spaces (Free Tier) on CPU only; no GPU is required.
## Notes
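- The database and HTML files are written to the Space's local disk. On the free tier this disk is typically ephemeral, so the files may be lost when the Space restarts unless persistent storage is enabled.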
- The application is modular, allowing updates to the backend script (`Kanun_Patrika_Scraper_For_HFSpaces.py`) without modifying the Gradio interface (`app.py`).
- Ensure the Nepali year entered is valid (e.g., between 2015 and the current year in the Nepali (Bikram Sambat) calendar) to avoid errors.
For issues or contributions, please contact the repository maintainer.
```
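
The year check from the Notes and the "reuse existing HTML" behaviour from the Features list can be sketched roughly as follows. This is an illustrative sketch only, not the code in `Kanun_Patrika_Scraper_For_HFSpaces.py`: the function names `parse_nepali_year` and `load_html`, the `fetch` callback, the `case_id` argument, and the `<case_id>.html` file naming are all assumptions made for the example.

```python
from pathlib import Path
from typing import Callable

MIN_YEAR = 2015  # lower bound mentioned in the Notes section


def parse_nepali_year(text: str, current_year: int) -> int:
    """Convert a year typed in Devanagari or ASCII digits to an int and range-check it.

    Hypothetical helper; current_year is passed in by the caller.
    """
    cleaned = text.strip()
    if not cleaned.isdecimal():
        raise ValueError(f"not a numeric year: {text!r}")
    year = int(cleaned)  # int() accepts Unicode decimal digits such as '२०७३'
    if not MIN_YEAR <= year <= current_year:
        raise ValueError(f"year {year} is outside {MIN_YEAR}-{current_year}")
    return year


def load_html(
    case_id: str,
    fetch: Callable[[str], str],
    folder: Path = Path("scraped_html"),
) -> str:
    """Return cached HTML for case_id if present; otherwise fetch it and save a copy.

    Hypothetical helper; the '<case_id>.html' naming scheme is an assumption.
    """
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{case_id}.html"
    if path.exists():
        # Cache hit: no web request is made.
        return path.read_text(encoding="utf-8")
    html = fetch(case_id)  # caller supplies the actual download function
    path.write_text(html, encoding="utf-8")
    return html
```

In a real run, `current_year` could come from the `nepali-datetime` package listed in `requirements.txt`, and `fetch` would wrap a `requests` call to the site. Passing both in as arguments keeps the sketch free of network and calendar dependencies.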