Spaces: PellelNitram/xournalpp_htr

Martin L (GitHub Actions) committed on Nov 3, 2025

Commit

76cb64f

1 Parent(s): d614eb1

Automated deployment from GitHub (source commit: 95e226)

Files changed (4)

docs/ADRs/001_design_of_huggingface_space_dockerfile.md +32 -0
docs/ADRs/002_use_HuggingFace_ecosystem_for_ML.md +52 -0
docs/ADRs/template.md +33 -0
mkdocs.yml +3 -0

docs/ADRs/001_design_of_huggingface_space_dockerfile.md ADDED

	@@ -0,0 +1,32 @@

+# Design of HuggingFace Space Dockerfile
+- Status: Ongoing
+- Deciders: Martin Lellep (@PellelNitram)
+- Drivers: Martin Lellep (@PellelNitram)
+- PRD: None
+- Date: 2025-10-04
+## Context
+*Explain the background and the context in which the decision is being made. Include any relevant information about the problem, constraints, or goals.*
+## Decisions
+*State the decision that has been made. Be clear and concise.*
+- In the future, download models at build time into the Docker image from the GitHub release page. In the
+  longer term, pull them from Hugging Face at run time.
+- Add the `xournalpp` binary to the Docker image so that the `xopp` file can be exported as a PDF prior to
+  execution of the HTR pipeline.
+## Consequences
+*Describe the consequences of the decision. Include both positive and negative outcomes, as well as any trade-offs.*
+## Alternatives Considered
+*List and briefly describe other options that were considered and why they were not chosen.*
+## References
+*Include links or references to any supporting documentation, discussions, or resources.*
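
The second decision in ADR 001 (exporting the `xopp` file to PDF before the HTR pipeline runs) could look roughly like the Python sketch below. It is an illustration under stated assumptions, not the project's implementation: the function names `build_export_command` and `export_xopp_to_pdf` are hypothetical, and the sketch assumes the `xournalpp` binary is on `PATH` inside the image and accepts its `--create-pdf` command-line option.

```python
import shutil
import subprocess
from pathlib import Path


def build_export_command(xopp_path: Path, pdf_path: Path, binary: str = "xournalpp") -> list[str]:
    """Build the command line that asks Xournal++ to export a PDF.

    Assumption: the installed Xournal++ supports `--create-pdf=FILE`.
    """
    return [binary, f"--create-pdf={pdf_path}", str(xopp_path)]


def export_xopp_to_pdf(xopp_path: Path, pdf_path: Path, binary: str = "xournalpp") -> Path:
    """Export `xopp_path` to `pdf_path` and return the PDF path (hypothetical helper)."""
    # Fail early with a clear message if the binary was not added to the image.
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"'{binary}' was not found on PATH")
    # check=True raises CalledProcessError if the export exits with a non-zero code.
    subprocess.run(build_export_command(xopp_path, pdf_path, binary), check=True)
    return pdf_path
```

Keeping command construction separate from execution means the command can be checked in a unit test without Xournal++ being installed; only `export_xopp_to_pdf` needs the binary.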

docs/ADRs/002_use_HuggingFace_ecosystem_for_ML.md ADDED

	@@ -0,0 +1,52 @@

+# ADR 002 – Use Hugging Face Ecosystem for Machine Learning
+- Date: 2025-11-03
+- Status: Accepted
+- PRD: None
+- Drivers: Martin Lellep ([@PellelNitram](https://github.com/PellelNitram/))
+- Deciders: Martin Lellep ([@PellelNitram](https://github.com/PellelNitram/))
+## Context
+The project originally used plain PyTorch with local files for datasets and model storage.
+This made models hard to share and version: once trained, they essentially lived on a hard drive with no central management or deployment integration, and I had to rely on Google Drive and Dropbox links to share them.
+Training models on different machines was also cumbersome because the dataset had to be downloaded manually every time.
+## Decision
+Adopt the **[Hugging Face ecosystem](https://huggingface.co/)** for all machine learning–related components, including:
+* Model Hub: hosting and versioning trained models
+* Dataset Hub: storing and sharing datasets
+* `transformers` and `datasets` libraries: for training and data handling
+* Trainer API: for standard training workflows
+Note: We need to agree on a good naming scheme for storing and retrieving models and datasets efficiently on HuggingFace.
+This will be the subject of a future ADR.
+## Rationale
+Hugging Face offers a free, community-driven platform that has become a de facto standard for open ML projects.
+It provides built-in versioning and sharing, making it easy to pull models directly in demos or end-user environments.
+The same applies to retrieving properly versioned datasets for training, benchmarking, and various demo use cases (e.g., providing sample data in Gradio applications).
+## Consequences
+### Pros
+* Centralized and versioned model/dataset hosting
+* Easier sharing, collaboration, and reproducibility
+* Straightforward integration in deployments by letting HuggingFace download the model automatically
+* Large and active community support
+* One can either fully integrate the model by subclassing `PreTrainedModel` or use it as plain artifact storage of
+  the binary weights file
+### Cons
+* Requires learning new APIs and conventions
+* Custom training routines may need workarounds
+* Requires investing time to learn how to convert a PyTorch model into a Hugging Face model, including pre- and post-processing code
+## Alternatives
+Continuing with plain PyTorch and local storage would have been simpler but would have lacked any versioning, reproducibility, or sharing capabilities.
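
The "plain artifact storage" option and the versioned dataset retrieval described in ADR 002 could be sketched as follows. This is a hedged illustration, not the project's code: `ModelArtifact`, `download_weights`, and `load_versioned_dataset` are hypothetical names, and no real repository id or file name is implied, since the naming scheme is left to a future ADR. The sketch relies on `hf_hub_download` from `huggingface_hub` and `load_dataset` from `datasets`, both of which accept a `revision` argument for pinning a version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelArtifact:
    """Identifies one weights file on the Hugging Face Hub (hypothetical helper)."""

    repo_id: str            # e.g. "<user>/<model-name>"; placeholder, not a real repo
    filename: str           # name of the binary weights file inside the repo
    revision: str = "main"  # branch, tag, or commit hash used to pin the version


def download_weights(artifact: ModelArtifact) -> str:
    """Download the weights file (or reuse the local cache) and return its local path."""
    # Imported lazily so the module can be imported without the dependency installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=artifact.repo_id,
        filename=artifact.filename,
        revision=artifact.revision,
    )


def load_versioned_dataset(repo_id: str, revision: str = "main"):
    """Load a dataset from the Hub at a pinned revision."""
    from datasets import load_dataset

    return load_dataset(repo_id, revision=revision)
```

Pinning `revision` to a tag or commit hash rather than `main` is what makes a demo or benchmark reproducible: the same call keeps returning the same files even after newer versions are pushed.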

docs/ADRs/template.md ADDED

	@@ -0,0 +1,33 @@

+# ADR NNN – Title of the Decision
+- Date: YYYY-MM-DD
+- Status: Accepted or Ongoing or Superseded by [ADR]()
+- PRD: None
+- Drivers: Name ([Link to GitHub handle](https://github.com/))
+- Deciders: Name ([Link to GitHub handle](https://github.com/))
+## Context
+*(Add text here.)*
+## Decision
+*(Add text here.)*
+## Rationale
+*(Add text here.)*
+## Consequences
+### Pros
+*(Add bullet points here.)*
+### Cons
+*(Add bullet points here.)*
+## Alternatives
+*(Add text here.)*

mkdocs.yml CHANGED

@@ -26,6 +26,9 @@ nav:
   - Getting Started as Developer:
     - Installation: 'installation_developer.md'
     - Developer Guide: 'developer_guide.md'
+    - ADRs:
+      - 001 Design of HuggingFace Space Dockerfile: 'ADRs/001_design_of_huggingface_space_dockerfile.md'
+      - 002 Use HuggingFace ecosystem for ML: 'ADRs/002_use_HuggingFace_ecosystem_for_ML.md'
     # - Data Collection: 'data_collection.md' # Unclear if even needed
     # - Developing New Models: 'developing_new_models.md' # Very unclear what to write as I haven't built anything yet
   - Contributing: 'contributing.md'