rbbist committed on
Commit 39044e6 · verified · 1 Parent(s): c686250

Upload 4 files
Kanun_Patrika_Scrapper_For_HFSpaces.py ADDED
The diff for this file is too large to render. See raw diff
 
README.markdown ADDED
@@ -0,0 +1,38 @@
+ # Nepal Kanoon Patrika Scraper
+
+ This web application, deployed on Hugging Face Spaces (free tier), scrapes legal case data from the Nepal Kanoon Patrika website (https://nkp.gov.np/). Users select a case type (mudda type) and a Nepali year; the scraped case details are stored in a SQLite database, and the associated HTML files are saved to a folder.
+
+ ## Features
+ - Scrapes legal case details including decision number, court, judges, parties, and more.
+ - Stores data in a SQLite database (`legal_cases.db`).
+ - Saves raw HTML files in the `scraped_html` folder for future reference.
+ - Reuses existing HTML files when available to avoid redundant web requests.
+ - Provides a user-friendly Gradio interface for initiating scraping tasks.
+
+ ## Usage Instructions
+ 1. Open the Gradio interface in your browser.
+ 2. Select a **Mudda Type** from the dropdown menu. Options include:
+    - दुनियाबादी देवानी (civil, private party)
+    - सरकारबादी देवानी (civil, government party)
+    - दुनियावादी फौजदारी (criminal, private party)
+    - सरकारवादी फौजदारी (criminal, government party)
+    - रिट (writ)
+    - निवेदन (petition)
+    - विविध (miscellaneous)
+ 3. Enter a **Nepali Year** (e.g., २०७३) in the textbox.
+ 4. Click the **Run Scraper** button to start scraping.
+ 5. Monitor progress and results in the status output box.
+
+ ## Technical Details
+ - **Backend**: The scraping logic lives in `Kanun_Patrika_Scrapper_For_HFSpaces.py`, which handles web requests, HTML parsing, and data storage.
+ - **Storage**: Scraped data is stored in a SQLite database (`legal_cases.db`) to keep file sizes manageable within the storage limits of Hugging Face Spaces' free tier.
+ - **HTML Files**: Raw HTML content is saved in the `scraped_html` folder and reused on later runs, reducing repeated web requests.
+ - **Dependencies**: Listed in `requirements.txt`: `requests`, `beautifulsoup4`, `pandas`, `nepali-datetime`, and `gradio`.
+ - **Environment**: Designed to run on Hugging Face Spaces (free tier) with CPU-only requirements.
+
+ ## Notes
+ - The database and HTML files are written to the Space's filesystem; on the free tier this storage is ephemeral and may be lost when the Space restarts unless persistent storage is enabled.
+ - The application is modular: the backend script (`Kanun_Patrika_Scrapper_For_HFSpaces.py`) can be updated without modifying the Gradio interface (`app.py`).
+ - Make sure the Nepali year entered is valid (e.g., between 2015 BS and the current year in the Bikram Sambat calendar) to avoid errors.
+
+ For issues or contributions, please contact the repository maintainer.
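The cache-first behavior the README describes (reuse a saved HTML file when present, otherwise fetch and save) can be sketched as below. This is a minimal illustration, not the scraper's actual code: `get_html`, the URL-hash file naming, and the `fetch` callable are all assumptions standing in for the real logic in `Kanun_Patrika_Scrapper_For_HFSpaces.py`.

```python
import hashlib
import os
import tempfile

def get_html(url, html_folder, fetch):
    """Return page HTML, preferring a previously saved copy on disk.

    `fetch` is any callable mapping url -> HTML string (a stand-in for
    the real request logic); saved pages are keyed by a hash of the URL.
    """
    os.makedirs(html_folder, exist_ok=True)
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"
    path = os.path.join(html_folder, name)
    if os.path.exists(path):
        # Reuse the saved copy and skip the web request entirely.
        with open(path, encoding="utf-8") as f:
            return f.read()
    html = fetch(url)  # first visit: fetch, then save for next time
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return html

# Demo with a counting fake fetcher: the second call hits the disk cache.
calls = []
def fake_fetch(url):
    calls.append(url)
    return "<html>case</html>"

with tempfile.TemporaryDirectory() as d:
    a = get_html("https://nkp.gov.np/case/1", d, fake_fetch)
    b = get_html("https://nkp.gov.np/case/1", d, fake_fetch)

print(len(calls), a == b)  # → 1 True
```

Keying the cache file on a hash of the URL avoids filesystem-unsafe characters in case URLs; the trade-off is that the folder contents are not human-readable without a mapping.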
app.py ADDED
@@ -0,0 +1,64 @@
+ import gradio as gr
+ from Kanun_Patrika_Scrapper_For_HFSpaces import LegalCaseScraper
+
+ def run_scraper(mudda_type, nepali_year, progress=gr.Progress()):
+     """
+     Run the scraper with the given inputs and update progress.
+     Returns a message indicating success or failure.
+     """
+     # Validate inputs before doing any work
+     if not mudda_type or not nepali_year:
+         return "Error: Please select a mudda type and enter a Nepali year."
+
+     scraper = None
+     try:
+         # Initialize scraper
+         scraper = LegalCaseScraper(output_db="legal_cases.db", html_folder="scraped_html")
+
+         # Run scraper
+         progress(0.1, desc="Starting scraper...")
+         scraper.run_scraper(mudda_type=mudda_type, sal=nepali_year, use_saved=True)
+
+         progress(1.0, desc="Scraping completed!")
+         return f"Scraping completed for mudda_type: {mudda_type}, year: {nepali_year}. Data saved to SQLite database."
+
+     except Exception as e:
+         return f"Error: {str(e)}"
+
+     finally:
+         # Close only if the scraper was created successfully
+         if scraper is not None:
+             scraper.close()
+
+ # Define Gradio interface using Blocks
+ with gr.Blocks(title="Nepal Kanoon Patrika Scraper") as demo:
+     gr.Markdown("# Nepal Kanoon Patrika Scraper")
+     gr.Markdown("Scrape legal case data from the Nepal Kanoon Patrika website. Select a mudda type and enter a Nepali year to begin.")
+
+     with gr.Row():
+         mudda_type = gr.Dropdown(
+             choices=[
+                 "दुनियाबादी देवानी",
+                 "सरकारबादी देवानी",
+                 "दुनियावादी फौजदारी",
+                 "सरकारवादी फौजदारी",
+                 "रिट",
+                 "निवेदन",
+                 "विविध"
+             ],
+             label="Mudda Type",
+             info="Select the type of legal case"
+         )
+         nepali_year = gr.Textbox(label="Nepali Year", placeholder="e.g., २०७३", max_lines=1)
+
+     run_button = gr.Button("Run Scraper")
+     output = gr.Textbox(label="Status", interactive=False)
+
+     run_button.click(
+         fn=run_scraper,
+         inputs=[mudda_type, nepali_year],
+         outputs=output
+     )
+
+ # Launch the interface
+ demo.launch()
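The README warns that the entered Nepali year must be valid, while `run_scraper` only checks that the field is non-empty. A stricter check would normalize Devanagari digits and range-check the value before starting a scrape. The sketch below is a hypothetical helper, not part of the committed code, and the accepted range (2015–2081 BS) is an illustrative assumption:

```python
# Map Devanagari digits ०–९ onto ASCII 0–9 for parsing.
DEVANAGARI_DIGITS = str.maketrans("०१२३४५६७८९", "0123456789")

def parse_nepali_year(text, lo=2015, hi=2081):
    """Parse a year given in Devanagari or ASCII digits.

    Returns the year as an int, or raises ValueError if the input is
    not a number or falls outside the supported Bikram Sambat range.
    """
    ascii_text = text.strip().translate(DEVANAGARI_DIGITS)
    if not ascii_text.isdigit():
        raise ValueError(f"Not a year: {text!r}")
    year = int(ascii_text)
    if not lo <= year <= hi:
        raise ValueError(f"Year {year} outside supported range {lo}-{hi}")
    return year

print(parse_nepali_year("२०७३"))  # → 2073
```

Calling such a helper at the top of `run_scraper` would turn a malformed year into an immediate, readable error message instead of a failure deep inside the scraping loop.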
requirements.txt ADDED
@@ -0,0 +1,5 @@
+ requests==2.32.3
+ beautifulsoup4==4.12.3
+ pandas==2.2.3
+ nepali-datetime==1.0.2
+ gradio==4.44.0
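Once a scrape finishes, the resulting `legal_cases.db` can be inspected with Python's built-in `sqlite3` module. The table and column names below are hypothetical stand-ins, since the actual schema is defined inside `Kanun_Patrika_Scrapper_For_HFSpaces.py`; the in-memory database here only mimics it for illustration.

```python
import sqlite3

# Stand-in schema: the real table/column names come from the scraper.
conn = sqlite3.connect(":memory:")  # use "legal_cases.db" against a real scrape
conn.execute("CREATE TABLE cases (decision_no TEXT, court TEXT, year TEXT)")
conn.execute("INSERT INTO cases VALUES ('9601', 'Supreme Court', '२०७३')")
conn.commit()

# Parameterized query: filter cases by the (Devanagari) year string.
rows = conn.execute(
    "SELECT decision_no, court FROM cases WHERE year = ?", ("२०७३",)
).fetchall()
print(rows)  # → [('9601', 'Supreme Court')]
conn.close()
```

Because SQLite stores the whole database in a single file, the same query pattern works on a `legal_cases.db` downloaded from the Space's file browser.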