Upload 4 files
- Kanun_Patrika_Scrapper_For_HFSpaces.py +0 -0
- README.markdown +40 -0
- app.py +63 -0
- requirements.txt +7 -0
Kanun_Patrika_Scrapper_For_HFSpaces.py
ADDED
The diff for this file is too large to render.
README.markdown
ADDED
@@ -0,0 +1,40 @@
```markdown
# Nepal Kanoon Patrika Scraper

This web application, deployed on Hugging Face Spaces (Free Tier), scrapes legal case data from the Nepal Kanoon Patrika website (https://nkp.gov.np/). Users select a case type (mudda type) and a Nepali year; the scraper stores the case details in a SQLite database and saves the associated HTML files in a folder.

## Features
- Scrapes legal case details, including decision number, court, judges, and parties, among other fields.
- Stores data in a SQLite database (`legal_cases.db`).
- Saves raw HTML files in the `scraped_html` folder for future reference.
- Uses existing HTML files when available to reduce redundant web requests.
- Provides a Gradio interface for starting scraping tasks.

## Usage Instructions
1. Open the Gradio interface in your browser.
2. Select a **Mudda Type** from the dropdown menu. Options include:
   - दुनियाबादी देवानी (civil, private party as plaintiff)
   - सरकारबादी देवानी (civil, government as plaintiff)
   - दुनियावादी फौजदारी (criminal, private party as plaintiff)
   - सरकारवादी फौजदारी (criminal, government as plaintiff)
   - रिट (writ)
   - निवेदन (petition)
   - विविध (miscellaneous)
3. Enter a **Nepali Year** (e.g., २०७३) in the textbox.
4. Click the **Run Scraper** button to start scraping.
5. Monitor the progress and results in the status output box.

## Technical Details
- **Backend**: The scraping logic is implemented in `Kanun_Patrika_Scrapper_For_HFSpaces.py`, which handles web requests, HTML parsing, and data storage.
- **Storage**: Scraped data is stored in a SQLite database (`legal_cases.db`) to keep file sizes manageable within Hugging Face Spaces' free tier storage limits.
- **HTML Files**: Raw HTML content is saved in the `scraped_html` folder for reuse, reducing the need for repeated web requests.
- **Dependencies**: Listed in `requirements.txt`: `requests`, `beautifulsoup4`, `pandas`, `nepali-datetime`, and `gradio`.
- **Environment**: Designed to run on Hugging Face Spaces (Free Tier) on CPU only.

## Notes
- The database and HTML files are written to the Space's local disk. On the free tier this disk is not persistent by default, so the data may be lost when the Space restarts unless persistent storage is attached.
- The application is modular: the backend script (`Kanun_Patrika_Scrapper_For_HFSpaces.py`) can be updated without modifying the Gradio interface (`app.py`).
- Ensure the Nepali year entered is valid (e.g., between 2015 and the current year in the Nepali calendar) to avoid errors.

For issues or contributions, please contact the repository maintainer.
```
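The README's note about entering a valid Nepali year could be enforced before the scraper runs. The following is a minimal sketch, not part of the uploaded files: the helper name `parse_nepali_year` is hypothetical, the lower bound of 2015 comes from the README note, and the upper bound is supplied by the caller because computing the current Nepali year needs a calendar library.

```python
def parse_nepali_year(text, max_year, min_year=2015):
    """Return the year as an int, or raise ValueError with a readable message.

    Hypothetical helper: min_year follows the README note, and max_year
    should be the current year in the Nepali calendar.
    """
    cleaned = text.strip()
    # str.isdecimal() is True for ASCII and Devanagari digits; int() parses both.
    if not cleaned.isdecimal():
        raise ValueError(f"Not a year: {text!r}")
    year = int(cleaned)
    if not min_year <= year <= max_year:
        raise ValueError(f"Year {year} is outside {min_year}-{max_year}")
    return year
```

Whether the backend expects the year as a string or an integer is not visible in this commit, so the result may need converting back before it is passed as `sal`.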
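The README says existing HTML files are reused to avoid redundant web requests, but the backend diff is not rendered, so the actual mechanism is unknown. One way such a cache might look is sketched below; `cached_fetch`, the hashed file names, and the injected `fetch` callable are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def cached_fetch(url, html_folder, fetch):
    """Return the HTML for url, reading from html_folder when a copy exists.

    Hypothetical helper: `fetch` is any callable that takes a URL and returns
    the page text (for example, a thin wrapper around requests.get).
    """
    folder = Path(html_folder)
    folder.mkdir(parents=True, exist_ok=True)
    # Hash the URL so the file name is filesystem-safe and stable.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"
    path = folder / name
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = fetch(url)
    path.write_text(html, encoding="utf-8")
    return html
```

Passing the download function in as an argument keeps the sketch free of network access, which makes it easy to check that a second call for the same URL never reaches the website.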
app.py
ADDED
@@ -0,0 +1,63 @@
```python
import gradio as gr
# The module name must match the uploaded file, Kanun_Patrika_Scrapper_For_HFSpaces.py.
from Kanun_Patrika_Scrapper_For_HFSpaces import LegalCaseScraper


def run_scraper(mudda_type, nepali_year, progress=gr.Progress()):
    """
    Run the scraper with the given inputs and update progress.
    Returns a message indicating success or failure.
    """
    # Validate inputs before creating the scraper so nothing is opened needlessly.
    if not mudda_type or not nepali_year:
        return "Error: Please select a mudda type and enter a Nepali year."

    scraper = None  # lets the finally block run safely if construction fails
    try:
        # Initialize scraper
        scraper = LegalCaseScraper(output_db="legal_cases.db", html_folder="scraped_html")

        # Run scraper
        progress(0.1, desc="Starting scraper...")
        scraper.run_scraper(mudda_type=mudda_type, sal=nepali_year, use_saved=True)

        progress(1.0, desc="Scraping completed!")
        return f"Scraping completed for mudda_type: {mudda_type}, year: {nepali_year}. Data saved to SQLite database."

    except Exception as e:
        return f"Error: {e}"

    finally:
        # Release scraper resources only if the scraper was created.
        if scraper is not None:
            scraper.close()


# Define Gradio interface using Blocks
with gr.Blocks(title="Nepal Kanoon Patrika Scraper") as demo:
    gr.Markdown("# Nepal Kanoon Patrika Scraper")
    gr.Markdown("Scrape legal case data from Nepal Kanoon Patrika website. Select a mudda type and enter a Nepali year to begin.")

    with gr.Row():
        mudda_type = gr.Dropdown(
            choices=[
                "दुनियाबादी देवानी",
                "सरकारबादी देवानी",
                "दुनियावादी फौजदारी",
                "सरकारवादी फौजदारी",
                "रिट",
                "निवेदन",
                "विविध"
            ],
            label="Mudda Type",
            info="Select the type of legal case"
        )
        nepali_year = gr.Textbox(label="Nepali Year", placeholder="e.g., २०७३", max_lines=1)

    run_button = gr.Button("Run Scraper")
    output = gr.Textbox(label="Status", interactive=False)

    run_button.click(
        fn=run_scraper,
        inputs=[mudda_type, nepali_year],
        outputs=output
    )

# Launch the interface
demo.launch()
```
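`app.py` depends on only three members of `LegalCaseScraper`: the constructor arguments `output_db` and `html_folder`, the method `run_scraper(mudda_type, sal, use_saved)`, and `close()`. Because the backend diff is not rendered, the following is a hypothetical stand-in rather than the real class; the name `StubScraper`, the `cases` table, and its columns are all assumptions. Something like it could help exercise the interface without network access.

```python
import os
import sqlite3


class StubScraper:
    """Hypothetical stand-in exposing the same calls app.py makes."""

    def __init__(self, output_db="legal_cases.db", html_folder="scraped_html"):
        os.makedirs(html_folder, exist_ok=True)
        self.html_folder = html_folder
        self.conn = sqlite3.connect(output_db)
        # Assumed schema; the real table layout is not shown in this commit.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cases ("
            "decision_no TEXT PRIMARY KEY, mudda_type TEXT, sal TEXT)"
        )

    def run_scraper(self, mudda_type, sal, use_saved=True):
        # Record one placeholder row instead of contacting the website.
        # use_saved is accepted only to mirror the call in app.py.
        self.conn.execute(
            "INSERT OR REPLACE INTO cases VALUES (?, ?, ?)",
            (f"stub-{sal}", mudda_type, sal),
        )
        self.conn.commit()

    def close(self):
        self.conn.close()
```

Using `INSERT OR REPLACE` with a primary key means that rerunning the same type and year overwrites the earlier row instead of duplicating it, which is one plausible way to keep repeated scrapes idempotent.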
requirements.txt
ADDED
@@ -0,0 +1,7 @@
```text
requests==2.32.3
beautifulsoup4==4.12.3
pandas==2.2.3
nepali-datetime==1.0.2
gradio==4.44.0
```