Martin L (GitHub Actions) committed on
Commit ·
76cb64f
1
Parent(s): d614eb1
Automated deployment from GitHub (source commit: 95e226)
docs/ADRs/001_design_of_huggingface_space_dockerfile.md
ADDED
|
@@ -0,0 +1,32 @@
| 1 |
+
# Design of HuggingFace Space Dockerfile
|
| 2 |
+
|
| 3 |
+
- Status: Ongoing
|
| 4 |
+
- Deciders: Martin Lellep (@PellelNitram)
|
| 5 |
+
- Drivers: Martin Lellep (@PellelNitram)
|
| 6 |
+
- PRD: None
|
| 7 |
+
- Date: 2025-10-04
|
| 8 |
+
|
| 9 |
+
## Context
|
| 10 |
+
|
| 11 |
+
*Explain the background and the context in which the decision is being made. Include any relevant information about the problem, constraints, or goals.*
|
| 12 |
+
|
| 13 |
+
## Decisions
|
| 14 |
+
|
| 15 |
+
*State the decision that has been made. Be clear and concise.*
|
| 16 |
+
|
| 17 |
+
- In the future, download models at build time into the Docker image from the GitHub release page. In the
|
| 18 |
+
distant future, pull them from Hugging Face at run time.
|
| 19 |
+
- Add the `xournalpp` binary to the Docker image so that the `xopp` file can be exported to PDF before the
|
| 20 |
+
HTR pipeline runs.
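
The two decisions above could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual build code: all function names are hypothetical, the owner, repository, tag, and asset values are placeholders supplied by the caller, and the `--create-pdf` flag is, to my knowledge, the `xournalpp` command-line option for PDF export and should be checked against the version installed in the image.

```python
import subprocess
import urllib.request
from pathlib import Path


def release_asset_url(owner: str, repo: str, tag: str, asset: str) -> str:
    """Build the download URL of a GitHub release asset (all arguments are placeholders)."""
    return f"https://github.com/{owner}/{repo}/releases/download/{tag}/{asset}"


def download_model(url: str, target: Path) -> Path:
    """Fetch a model file, e.g. from a Dockerfile RUN step at image build time."""
    target.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(target))
    return target


def xopp_to_pdf_command(xopp_file: Path, pdf_file: Path) -> list[str]:
    """Build the xournalpp command line that exports a .xopp file as PDF.

    Assumption: `--create-pdf=<file>` is the export flag of the installed xournalpp.
    """
    return ["xournalpp", f"--create-pdf={pdf_file}", str(xopp_file)]


def export_xopp_to_pdf(xopp_file: Path, pdf_file: Path) -> Path:
    """Run the export before the HTR pipeline; raises if xournalpp exits with an error."""
    subprocess.run(xopp_to_pdf_command(xopp_file, pdf_file), check=True)
    return pdf_file
```

Keeping the command construction separate from the `subprocess` call makes the export step testable without the `xournalpp` binary being present.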
|
| 21 |
+
|
| 22 |
+
## Consequences
|
| 23 |
+
|
| 24 |
+
*Describe the consequences of the decision. Include both positive and negative outcomes, as well as any trade-offs.*
|
| 25 |
+
|
| 26 |
+
## Alternatives Considered
|
| 27 |
+
|
| 28 |
+
*List and briefly describe other options that were considered and why they were not chosen.*
|
| 29 |
+
|
| 30 |
+
## References
|
| 31 |
+
|
| 32 |
+
*Include links or references to any supporting documentation, discussions, or resources.*
|
docs/ADRs/002_use_HuggingFace_ecosystem_for_ML.md
ADDED
|
@@ -0,0 +1,52 @@
| 1 |
+
# ADR 002 – Use Hugging Face Ecosystem for Machine Learning
|
| 2 |
+
|
| 3 |
+
- Date: 2025-11-03
|
| 4 |
+
- Status: Accepted
|
| 5 |
+
- PRD: None
|
| 6 |
+
- Drivers: Martin Lellep ([@PellelNitram](https://github.com/PellelNitram/))
|
| 7 |
+
- Deciders: Martin Lellep ([@PellelNitram](https://github.com/PellelNitram/))
|
| 8 |
+
|
| 9 |
+
## Context
|
| 10 |
+
|
| 11 |
+
The project originally used plain PyTorch with local files for datasets and model storage.
|
| 12 |
+
This made models hard to share and version: once trained, they essentially lived on a hard drive with no central management or deployment integration, and I had to rely on Google Drive and Dropbox links to distribute them.
|
| 13 |
+
Training models on different machines was also cumbersome because the dataset had to be downloaded manually every time.
|
| 14 |
+
|
| 15 |
+
## Decision
|
| 16 |
+
|
| 17 |
+
Adopt the **[Hugging Face ecosystem](https://huggingface.co/)** for all machine learning–related components, including:
|
| 18 |
+
|
| 19 |
+
* Model Hub: hosting and versioning trained models
|
| 20 |
+
* Dataset Hub: storing and sharing datasets
|
| 21 |
+
* `transformers` and `datasets` libraries: for training and data handling
|
| 22 |
+
* Trainer API: for standard training workflows
|
| 23 |
+
|
| 24 |
+
Note: We need to agree on a naming scheme that makes models and datasets easy to store and retrieve on Hugging Face.
|
| 25 |
+
This will be the subject of a future ADR.
|
| 26 |
+
|
| 27 |
+
## Rationale
|
| 28 |
+
|
| 29 |
+
Hugging Face offers a platform that is free to use for public models and datasets, is backed by a large community, and has become a de facto standard for open ML projects.
|
| 30 |
+
It provides built-in versioning and sharing, making it easy to pull models directly in demos or end-user environments.
|
| 31 |
+
The same applies to retrieving properly versioned datasets for training, benchmarking, and various demo use cases (e.g., providing sample data in Gradio applications).
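
As an illustration of retrieving a versioned dataset, here is a minimal sketch that assumes the `datasets` library is installed. The helper names are hypothetical, and the repository ID and revision are placeholders chosen by the caller, since the naming scheme has not been decided yet. The `revision` argument of `load_dataset` accepts a branch, tag, or commit hash on the Hub.

```python
from typing import Any, Dict


def dataset_request(repo_id: str, revision: str, split: str = "train") -> Dict[str, Any]:
    """Collect the arguments that pin a dataset to one exact Hub revision."""
    return {"path": repo_id, "revision": revision, "split": split}


def load_pinned_dataset(repo_id: str, revision: str, split: str = "train"):
    """Load one split of a Hub dataset at a fixed revision (needs network access)."""
    from datasets import load_dataset  # imported lazily; requires `pip install datasets`

    return load_dataset(**dataset_request(repo_id, revision, split))
```

Pinning the revision means that training runs, benchmarks, and demos on different machines all read the same data snapshot.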
|
| 32 |
+
|
| 33 |
+
## Consequences
|
| 34 |
+
|
| 35 |
+
### Pros
|
| 36 |
+
|
| 37 |
+
* Centralized and versioned model/dataset hosting
|
| 38 |
+
* Easier sharing, collaboration, and reproducibility
|
| 39 |
+
* Straightforward integration into deployments, since the Hugging Face libraries can download the model automatically
|
| 40 |
+
* Large and active community support
|
| 41 |
+
* One can either fully integrate the model by subclassing `PreTrainedModel` or use the Hub as plain artifact storage for
|
| 42 |
+
the binary weights file
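
The "plain artifact storage" option from the list above might look like the following minimal sketch, assuming the `huggingface_hub` package is installed. `WeightsRef` and `fetch_weights` are hypothetical names, and the repository ID and filename are placeholders; `hf_hub_download` caches the file locally and returns its path.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class WeightsRef:
    """Identifies one weights file on the Hub (all values are placeholders)."""

    repo_id: str
    filename: str
    revision: str = "main"  # a branch, tag, or commit hash

    def as_kwargs(self) -> Dict[str, str]:
        """Return the keyword arguments passed on to `hf_hub_download`."""
        return {
            "repo_id": self.repo_id,
            "filename": self.filename,
            "revision": self.revision,
        }


def fetch_weights(ref: WeightsRef) -> str:
    """Download the weights file (or reuse the local cache) and return its local path."""
    from huggingface_hub import hf_hub_download  # imported lazily; needs network access

    return hf_hub_download(**ref.as_kwargs())
```

The returned path can then be loaded with whatever plain PyTorch code the project already uses, without subclassing `PreTrainedModel`.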
|
| 43 |
+
|
| 44 |
+
### Cons
|
| 45 |
+
|
| 46 |
+
* Requires learning new APIs and conventions
|
| 47 |
+
* Custom training routines may need workarounds
|
| 48 |
+
* Time must be invested in learning how to convert a PyTorch model into a Hugging Face model, including pre- and post-processing code
|
| 49 |
+
|
| 50 |
+
## Alternatives
|
| 51 |
+
|
| 52 |
+
Continuing with plain PyTorch and local storage would have been simpler but would have offered no versioning, reproducibility, or sharing capabilities.
|
docs/ADRs/template.md
ADDED
|
@@ -0,0 +1,33 @@
| 1 |
+
# ADR NNN – Title
|
| 2 |
+
|
| 3 |
+
- Date: YYYY-MM-DD
|
| 4 |
+
- Status: Accepted or Ongoing or Superseded by [ADR]()
|
| 5 |
+
- PRD: None
|
| 6 |
+
- Drivers: Name ([Link to GitHub handle](https://github.com/))
|
| 7 |
+
- Deciders: Name ([Link to GitHub handle](https://github.com/))
|
| 8 |
+
|
| 9 |
+
## Context
|
| 10 |
+
|
| 11 |
+
*(Add text here.)*
|
| 12 |
+
|
| 13 |
+
## Decision
|
| 14 |
+
|
| 15 |
+
*(Add text here.)*
|
| 16 |
+
|
| 17 |
+
## Rationale
|
| 18 |
+
|
| 19 |
+
*(Add text here.)*
|
| 20 |
+
|
| 21 |
+
## Consequences
|
| 22 |
+
|
| 23 |
+
### Pros
|
| 24 |
+
|
| 25 |
+
*(Add bullet points here.)*
|
| 26 |
+
|
| 27 |
+
### Cons
|
| 28 |
+
|
| 29 |
+
*(Add bullet points here.)*
|
| 30 |
+
|
| 31 |
+
## Alternatives
|
| 32 |
+
|
| 33 |
+
*(Add text here.)*
|
mkdocs.yml
CHANGED
|
@@ -26,6 +26,9 @@ nav:
| 26 |
- Getting Started as Developer:
|
| 27 |
- Installation: 'installation_developer.md'
|
| 28 |
- Developer Guide: 'developer_guide.md'
|
| 29 |
# - Data Collection: 'data_collection.md' # Unclear if even needed
|
| 30 |
# - Developing New Models: 'developing_new_models.md' # Very unclear what to write as I haven't built anything yet
|
| 31 |
- Contributing: 'contributing.md'
|
| 26 |
- Getting Started as Developer:
|
| 27 |
- Installation: 'installation_developer.md'
|
| 28 |
- Developer Guide: 'developer_guide.md'
|
| 29 |
+
- ADRs:
|
| 30 |
+
- 001 Design of HuggingFace Space Dockerfile: 'ADRs/001_design_of_huggingface_space_dockerfile.md'
|
| 31 |
+
- 002 Use HuggingFace ecosystem for ML: 'ADRs/002_use_HuggingFace_ecosystem_for_ML.md'
|
| 32 |
# - Data Collection: 'data_collection.md' # Unclear if even needed
|
| 33 |
# - Developing New Models: 'developing_new_models.md' # Very unclear what to write as I haven't built anything yet
|
| 34 |
- Contributing: 'contributing.md'
|