| ```markdown |
| # Nepal Kanoon Patrika Scraper |
| |
| This web application, deployed on Hugging Face Spaces (Free Tier), scrapes legal case data from the Nepal Kanoon Patrika website (https://nkp.gov.np/). Users select a case type (mudda type) and a Nepali year; the scraped case details are stored in a SQLite database, and the associated raw HTML files are saved in a folder. |
| |
| ## Features |
| - Scrapes legal case details including decision number, court, judges, parties, and more. |
| - Stores data in a SQLite database (`legal_cases.db`). |
| - Saves raw HTML files in the `scraped_html` folder for future reference. |
| - Uses existing HTML files when available to reduce redundant web requests. |
| - Provides a user-friendly Gradio interface for initiating scraping tasks. |
| |
| ## Usage Instructions |
| 1. Open the Gradio interface in your browser. |
| 2. Select a **Mudda Type** from the dropdown menu. Options include: |
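|    - दुनियाबादी देवानी (civil, private plaintiff) |
|    - सरकारबादी देवानी (civil, government plaintiff) |
|    - दुनियावादी फौजदारी (criminal, private plaintiff) |
|    - सरकारवादी फौजदारी (criminal, government plaintiff) |
|    - रिट (writ) |
|    - निवेदन (petition) |
|    - विविध (miscellaneous) |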
| 3. Enter a **Nepali Year** (e.g., २०७३) in the textbox. |
| 4. Click the **Run Scraper** button to start scraping. |
| 5. Monitor the progress and results in the status output box. |
| |
| ## Technical Details |
| - **Backend**: The scraping logic is implemented in `Kanun_Patrika_Scraper_For_HFSpaces.py`, which handles web requests, HTML parsing, and data storage. |
| - **Storage**: Scraped data is stored in a SQLite database (`legal_cases.db`) to keep file sizes manageable within Hugging Face Spaces' free tier storage limits. |
| - **HTML Files**: Raw HTML content is saved in the `scraped_html` folder for reuse, reducing the need for repeated web requests. |
| - **Dependencies**: Listed in `requirements.txt`, including `requests`, `beautifulsoup4`, `pandas`, `nepali-datetime`, and `gradio`. |
| - **Environment**: Designed to run on Hugging Face Spaces (Free Tier) with CPU-only requirements. |
| |
| ## Notes |
| - The database and HTML files are stored persistently in the Hugging Face Spaces environment. |
| - The application is modular, allowing updates to the backend script (`Kanun_Patrika_Scraper_For_HFSpaces.py`) without modifying the Gradio interface (`app.py`). |
| - Ensure the Nepali year entered is valid (e.g., between 2015 (२०१५) and the current year in the Nepali calendar) to avoid errors. |
| |
| For issues or contributions, please contact the repository maintainer. |
| ``` |
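
The flow described above (validate the year, reuse cached HTML when it exists, store parsed fields in SQLite) can be sketched roughly as follows. This is a minimal illustration and not the code in `Kanun_Patrika_Scraper_For_HFSpaces.py`: the function names, the `cases` table schema, the `fetch` callback, and the `max_year` default are all assumptions made for the example. A real deployment would likely derive the upper bound from the current Nepali year.

```python
import sqlite3
from pathlib import Path
from typing import Callable


def parse_nepali_year(text: str, min_year: int = 2015, max_year: int = 2100) -> int:
    """Convert a year typed in Devanagari or ASCII digits to an int and range-check it.

    max_year is a placeholder assumption, not a value taken from the application.
    """
    cleaned = text.strip()
    if not cleaned.isdecimal():
        raise ValueError(f"not a year: {text!r}")
    year = int(cleaned)  # int() accepts Unicode decimal digits such as २०७३
    if not min_year <= year <= max_year:
        raise ValueError(f"year {year} outside {min_year}..{max_year}")
    return year


def load_or_fetch_html(case_id: str, cache_dir: Path, fetch: Callable[[str], str]) -> str:
    """Return cached HTML for case_id if present; otherwise call fetch and cache the result."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{case_id}.html"  # hypothetical file-naming scheme
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = fetch(case_id)  # fetch is supplied by the caller (e.g. an HTTP GET wrapper)
    path.write_text(html, encoding="utf-8")
    return html


def store_case(conn: sqlite3.Connection, decision_no: str, court: str, html_path: str) -> None:
    """Insert or update one case row; the schema here is illustrative only."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cases ("
        "decision_no TEXT PRIMARY KEY, court TEXT, html_path TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO cases (decision_no, court, html_path) VALUES (?, ?, ?)",
        (decision_no, court, html_path),
    )
    conn.commit()
```

Passing the fetch function in as a parameter keeps the caching logic independent of the HTTP library, which also makes it easy to exercise without network access.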