Martin L (GitHub Actions) committed on
Commit 76cb64f · 1 Parent(s): d614eb1

Automated deployment from GitHub (source commit: 95e226)

docs/ADRs/001_design_of_huggingface_space_dockerfile.md ADDED
@@ -0,0 +1,32 @@
+ # Design of HuggingFace Space Dockerfile
+
+ - Status: Ongoing
+ - Deciders: Martin Lellep (@PellelNitram)
+ - Drivers: Martin Lellep (@PellelNitram)
+ - PRD: None
+ - Date: 2025-10-04
+
+ ## Context
+
+ *Explain the background and the context in which the decision is being made. Include any relevant information about the problem, constraints, or goals.*
+
+ ## Decisions
+
+ *State the decision that has been made. Be clear and concise.*
+
+ - In the future, download models at build time into the Docker image from the GitHub release page. In the
+ very far future, pull them from HuggingFace at run-time.
+ - Add the `xournalpp` binary to the Docker image so that the `xopp` file can be exported as PDF prior to
+ execution of the HTR pipeline.
+
+ ## Consequences
+
+ *Describe the consequences of the decision. Include both positive and negative outcomes, as well as any trade-offs.*
+
+ ## Alternatives Considered
+
+ *List and briefly describe other options that were considered and why they were not chosen.*
+
+ ## References
+
+ *Include links or references to any supporting documentation, discussions, or resources.*
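The two decisions recorded in ADR 001 could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual code: the function names and the owner, repository, tag, and asset values are hypothetical. The URL follows GitHub's standard release-asset pattern, and `--create-pdf` is the Xournal++ command-line export option; it is worth confirming against `xournalpp --help` for the version installed in the image.

```python
from pathlib import Path
import subprocess


def release_asset_url(owner: str, repo: str, tag: str, asset: str) -> str:
    """Build the download URL of a GitHub release asset.

    All four arguments are placeholders; the ADR does not name the
    repository, tag, or asset that will hold the model weights.
    """
    return f"https://github.com/{owner}/{repo}/releases/download/{tag}/{asset}"


def xopp_to_pdf_command(xopp_path: Path, pdf_path: Path) -> list[str]:
    """Return the Xournal++ command that exports a .xopp file as PDF."""
    return ["xournalpp", f"--create-pdf={pdf_path}", str(xopp_path)]


def export_xopp_to_pdf(xopp_path: Path, pdf_path: Path) -> Path:
    """Run the export; requires the `xournalpp` binary on PATH."""
    # check=True raises CalledProcessError if the export fails, so the
    # HTR pipeline never starts on a missing or partial PDF.
    subprocess.run(xopp_to_pdf_command(xopp_path, pdf_path), check=True)
    return pdf_path
```

Keeping the command construction separate from the `subprocess.run` call lets the command be inspected or tested on machines where `xournalpp` is not installed.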
docs/ADRs/002_use_HuggingFace_ecosystem_for_ML.md ADDED
@@ -0,0 +1,52 @@
+ # ADR 002 – Use Hugging Face Ecosystem for Machine Learning
+
+ - Date: 2025-11-03
+ - Status: Accepted
+ - PRD: None
+ - Drivers: Martin Lellep ([@PellelNitram](https://github.com/PellelNitram/))
+ - Deciders: Martin Lellep ([@PellelNitram](https://github.com/PellelNitram/))
+
+ ## Context
+
+ The project originally used plain PyTorch with local files for datasets and model storage.
+ This made it hard to share and version models: once trained, they essentially lived on a hard drive with no central management or deployment integration, and I had to rely on Google Drive and Dropbox links.
+ Training models on different machines was also cumbersome because the dataset had to be downloaded manually every time.
+
+ ## Decision
+
+ Adopt the **[Hugging Face ecosystem](https://huggingface.co/)** for all machine learning–related components, including:
+
+ * Model Hub: hosting and versioning trained models
+ * Dataset Hub: storing and sharing datasets
+ * `transformers` and `datasets` libraries: for training and data handling
+ * Trainer API: for standard training workflows
+
+ Note: We need to agree on a good naming scheme for storing and retrieving models and datasets efficiently on Hugging Face.
+ This will be the subject of a future ADR.
+
+ ## Rationale
+
+ Hugging Face offers a free, community-driven platform that has become a de facto standard for open ML projects.
+ It provides built-in versioning and sharing, making it easy to pull models directly in demos or end-user environments.
+ The same applies to retrieving properly versioned datasets for training, benchmarking, and various demo use cases (e.g., providing sample data in Gradio applications).
+
+ ## Consequences
+
+ ### Pros
+
+ * Centralized and versioned model/dataset hosting
+ * Easier sharing, collaboration, and reproducibility
+ * Straightforward integration in deployments by letting the Hugging Face libraries download the model automatically
+ * Large and active community support
+ * Flexible integration: either subclass `PreTrainedModel` for full integration or use the Hub as plain artifact storage for
+ the binary weights file
+
+ ### Cons
+
+ * Requires learning new APIs and conventions
+ * Custom training routines may need workarounds
+ * Requires time to learn how to convert a PyTorch model into an HF model, including pre- and post-processing code
+
+ ## Alternatives
+
+ Continuing with plain PyTorch and local storage would have been simpler but would have lacked any versioning, reproducibility, or sharing capabilities.
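The "plain artifact storage" option from ADR 002 could look roughly like the sketch below, which fetches a single weights file from the Hub at a pinned revision. It is an illustration only: `fetch_weights`, the repository ID, and the file name are hypothetical, since the naming scheme is deferred to a future ADR. `hf_hub_download` is the `huggingface_hub` function for downloading one file from a repository; it is imported lazily so the helper can be exercised without the package installed.

```python
from typing import Callable, Optional


def fetch_weights(
    repo_id: str,
    filename: str,
    revision: str = "main",
    downloader: Optional[Callable[..., str]] = None,
) -> str:
    """Return the local path of a weights file stored on the Hugging Face Hub.

    `revision` may be a branch, tag, or commit hash; pinning a tag or hash
    is what makes a deployment reproducible.
    """
    if downloader is None:
        # Lazy import: only needed when actually talking to the Hub.
        from huggingface_hub import hf_hub_download

        downloader = hf_hub_download
    return downloader(repo_id=repo_id, filename=filename, revision=revision)


# Hypothetical usage; repo ID, file name, and tag are placeholders:
# path = fetch_weights("some-user/some-model", "model.pt", revision="v1.0")
# state_dict = torch.load(path, map_location="cpu")
```

Passing the downloader as a parameter is a design choice for testability; in application code the default is usually all that is needed.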
docs/ADRs/template.md ADDED
@@ -0,0 +1,33 @@
+ # ADR 002 – Use Hugging Face Ecosystem for Machine Learning
+
+ - Date: YYYY-MM-DD
+ - Status: Accepted or Ongoing or Superseded by [ADR]()
+ - PRD: None
+ - Drivers: Name ([Link to GitHub handle](https://github.com/))
+ - Deciders: Name ([Link to GitHub handle](https://github.com/))
+
+ ## Context
+
+ *(Add text here.)*
+
+ ## Decision
+
+ *(Add text here.)*
+
+ ## Rationale
+
+ *(Add text here.)*
+
+ ## Consequences
+
+ ### Pros
+
+ *(Add bullet points here.)*
+
+ ### Cons
+
+ *(Add bullet points here.)*
+
+ ## Alternatives
+
+ *(Add text here.)*
mkdocs.yml CHANGED
@@ -26,6 +26,9 @@ nav:
  - Getting Started as Developer:
  - Installation: 'installation_developer.md'
  - Developer Guide: 'developer_guide.md'
+ - ADRs:
+ - 001 Design of HuggingFace Space Dockerfile: 'ADRs/001_design_of_huggingface_space_dockerfile.md'
+ - 002 Use HuggingFace ecosystem for ML: 'ADRs/002_use_HuggingFace_ecosystem_for_ML.md'
  # - Data Collection: 'data_collection.md' # Unclear if even needed
  # - Developing New Models: 'developing_new_models.md' # Very unclear what to write as I haven't built anything yet
  - Contributing: 'contributing.md'