Adive01 committed
Commit 9a8cb16 · verified · 1 Parent(s): 032a4fa

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +75 -10
README.md CHANGED
@@ -1,10 +1,75 @@
- ---
- title: SummaryGenerator
- emoji: 👁
- colorFrom: yellow
- colorTo: red
- sdk: docker
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Text Summarization Tool
+
+ This repo contains an end-to-end abstractive summarization project built around Hugging Face Transformers, the XSum dataset, and a Gradio demo app.
+
+ ## Project Layout
+
+ ```text
+ requirements.txt
+ mlplo/
+ app.py # Gradio UI for inference (single + batch mode)
+ common.py # Shared utilities
+ compare.py # Compare two models side-by-side
+ data_cleaning.py # Dataset preparation
+ eval.py # Standalone evaluation (ROUGE + BERTScore)
+ report.py # HTML Evaluation Report generator
+ train.py # Fine-tuning loop
+ tests/ # Pytest suite
+ ```
+
+ ## Quick Start
+
+ 1. Create and activate a virtual environment.
+ 2. Install dependencies:
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ 3. Prepare a small debug dataset first:
+
+ ```bash
+ python -m mlplo.data_cleaning --debug --output-dir mlplo/data/processed/xsum_debug
+ ```
+
+ 4. Run a smoke-test training job:
+
+ ```bash
+ python -m mlplo.train --dataset-dir mlplo/data/processed/xsum_debug --output-dir mlplo/checkpoints/bart-base-xsum-debug --num-train-epochs 1 --per-device-train-batch-size 2 --per-device-eval-batch-size 2 --gradient-accumulation-steps 2 --run-test-eval
+ ```
+
+ 5. Evaluate the trained checkpoint:
+
+ ```bash
+ python -m mlplo.eval --dataset-dir mlplo/data/processed/xsum_debug --model-path mlplo/checkpoints/bart-base-xsum-debug --include-bertscore
+ ```
+
+ 6. Generate an Evaluation Report:
+
+ ```bash
+ python -m mlplo.report --checkpoint-dir mlplo/checkpoints/bart-base-xsum-debug
+ ```
+
+ 7. Launch the Gradio app:
+
+ ```bash
+ python -m mlplo.app --model-path mlplo/checkpoints/bart-base-xsum-debug
+ ```
+
+ ## Running Tests
+
+ To run the full test suite for edge cases:
+ ```bash
+ python -m pytest tests/ -v
+ ```
+
+ ## Colab Portability
+
+ The scripts are path-based and CLI-driven, so the same commands work in Google Colab after cloning the repo and installing `requirements.txt`. If you want a faster first pass, keep using `--debug` or provide `--train-samples`, `--validation-samples`, and `--test-samples`.
+
+ ## Notes
+
+ - Training defaults to `facebook/bart-base` for fine-tuning.
+ - The Gradio app falls back to `facebook/bart-large-xsum` if no local checkpoint is supplied, which makes the UI useful before fine-tuning finishes.
+ - Mixed precision is enabled automatically when CUDA is available.
+ - BERTScore is excluded from the training loop (to keep it fast) and is opt-in for evaluation using the `--include-bertscore` flag.
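
The checkpoint fallback described in the Notes could be sketched roughly as below. This is a minimal illustration and not the code in `mlplo/app.py`: the helper name `resolve_model_path` and the "is it an existing directory" check are assumptions; only the fallback model ID `facebook/bart-large-xsum` comes from the README.

```python
from pathlib import Path
from typing import Optional

# Fallback model ID taken from the README notes.
FALLBACK_MODEL = "facebook/bart-large-xsum"


def resolve_model_path(model_path: Optional[str]) -> str:
    """Return a local checkpoint directory if one exists, else the fallback model ID.

    Hypothetical helper: the real app may make this decision differently.
    """
    # Use the supplied path only when it points at an existing directory.
    if model_path and Path(model_path).is_dir():
        return model_path
    # Otherwise fall back to the pretrained hub model.
    return FALLBACK_MODEL
```

The returned string could then be handed to a loader such as `transformers.pipeline("summarization", model=...)`, so the UI works the same way with or without a fine-tuned checkpoint.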