ModerRAS committed on
Commit 3197202 · 1 Parent(s): be5f706

Document AniFileBERT maintenance workflow

Files changed (3)
  1. ANDROID.md +6 -5
  2. MAINTENANCE.md +111 -0
  3. README.md +10 -0
ANDROID.md CHANGED
@@ -1,10 +1,11 @@
  # Android export and runtime
 
- This folder is a normal MiruPlay subdirectory, not a Git submodule. It contains
- the Python training pipeline plus an ONNX export path for Android.
+ This repository is used by MiruPlay as a Git submodule at
+ `tools/anime_parser`. It contains the Python training pipeline plus an ONNX
+ export path for Android.
 
  For the full scanner integration notes, file-vs-folder behavior, and device
- test procedure, see [`../../docs/anime-filename-parser.md`](../../docs/anime-filename-parser.md).
+ test procedure, see MiruPlay's `docs/anime-filename-parser.md`.
 
  ## Export
 
@@ -12,7 +13,7 @@ From `tools/anime_parser`:
 
  ```bash
  python -m pip install -r requirements.txt
- python export_onnx.py --model-dir checkpoints/final --android-assets-dir ../../scraper/src/main/assets/anime_parser
+ python export_onnx.py --model-dir checkpoints/dmhy-finetune/final --android-assets-dir ../../scraper/src/main/assets/anime_parser
  ```
 
  The exporter writes:
@@ -30,7 +31,7 @@ The ONNX graph uses fixed Android inputs:
  - `logits`: `float32[1,64,15]`
 
  The current export was verified against PyTorch with max absolute logits
- difference `2.5033950805664062e-05`.
+ difference `1.621246337890625e-05`.
 
  ## Runtime
 
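Note on the ANDROID.md change: the updated "max absolute logits difference" is the largest element-wise gap between the PyTorch and ONNX outputs. The commit does not show how it was measured, so the following is only a minimal sketch of that kind of check. The helper name `max_abs_diff`, the sample values, and the `1e-4` tolerance are assumptions for illustration, not part of `export_onnx.py`.

```python
def max_abs_diff(a, b):
    """Return the largest |x - y| over two equally shaped nested lists of floats."""
    a_is_seq = isinstance(a, (list, tuple))
    b_is_seq = isinstance(b, (list, tuple))
    if a_is_seq != b_is_seq:
        raise ValueError("shape mismatch")
    if a_is_seq:
        if len(a) != len(b):
            raise ValueError("shape mismatch")
        # Recurse into each pair of sub-lists; an empty list contributes 0.0.
        return max((max_abs_diff(x, y) for x, y in zip(a, b)), default=0.0)
    return abs(a - b)


# Hypothetical logits, far smaller than the real float32[1,64,15] tensor.
reference = [[[0.10, -1.25], [2.00, 0.50]]]
exported = [[[0.10, -1.25], [2.00, 0.50001]]]

TOLERANCE = 1e-4  # assumed threshold, not taken from the repository
print(max_abs_diff(reference, exported) <= TOLERANCE)
```

With real model outputs the two nested lists would come from the PyTorch forward pass and the ONNX runtime session on the same input.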
MAINTENANCE.md ADDED
@@ -0,0 +1,111 @@
+ # AniFileBERT Maintenance
+
+ This repository is the standalone Hugging Face model repo used by MiruPlay as
+ `tools/anime_parser`.
+
+ ## Related Repositories
+
+ | Repository | URL | Purpose |
+ |------------|-----|---------|
+ | AniFileBERT | `https://huggingface.co/ModerRAS/AniFileBERT` | Model, training scripts, ONNX export |
+ | AnimeName | `https://huggingface.co/datasets/ModerRAS/AnimeName` | Training datasets and manifests |
+ | MiruPlay | `https://github.com/ModerRAS/MiruPlay` | Android app and runtime integration |
+
+ Nested structure:
+
+ ```text
+ AniFileBERT
+ datasets/AnimeName -> ModerRAS/AnimeName
+ ```
+
+ ## Clone
+
+ ```bash
+ git clone --recursive https://huggingface.co/ModerRAS/AniFileBERT
+ ```
+
+ After a normal clone:
+
+ ```bash
+ git submodule update --init --recursive
+ ```
+
+ ## Dataset Waterline
+
+ Current DMHY snapshot:
+
+ ```text
+ last_file_id: 689304
+ next_min_id: 689305
+ labeled_samples: 263042
+ mixed_train_samples: 363042
+ ```
+
+ The authoritative dataset files live in `datasets/AnimeName`.
+
+ ## Train
+
+ ```bash
+ python -m pip install -r requirements.txt
+ python train.py \
+ --data-file datasets/AnimeName/mixed_train.jsonl \
+ --vocab-file datasets/AnimeName/vocab.json \
+ --save-dir checkpoints/dmhy-finetune \
+ --init-model-dir . \
+ --epochs 1 \
+ --batch-size 128 \
+ --learning-rate 0.0003 \
+ --warmup-steps 300 \
+ --seed 42
+ ```
+
+ ## Publish a New Checkpoint
+
+ Copy the final checkpoint to the repository root:
+
+ ```powershell
+ Copy-Item checkpoints/dmhy-finetune/final/config.json . -Force
+ Copy-Item checkpoints/dmhy-finetune/final/model.safetensors . -Force
+ Copy-Item checkpoints/dmhy-finetune/final/tokenizer_config.json . -Force
+ Copy-Item checkpoints/dmhy-finetune/final/training_args.bin . -Force
+ Copy-Item checkpoints/dmhy-finetune/final/vocab.json . -Force
+ ```
+
+ Then commit and push:
+
+ ```bash
+ git add .
+ git commit -m "Update AniFileBERT checkpoint"
+ git push origin main
+ ```
+
+ ## Update the Dataset Submodule
+
+ After pushing new files to `ModerRAS/AnimeName`, update the nested pointer:
+
+ ```bash
+ git submodule update --remote datasets/AnimeName
+ git add datasets/AnimeName
+ git commit -m "Update AnimeName dataset pointer"
+ git push origin main
+ ```
+
+ ## Update MiruPlay
+
+ From the MiruPlay root:
+
+ ```bash
+ git submodule update --remote --recursive tools/anime_parser
+ git add tools/anime_parser
+ git commit -m "Update AniFileBERT submodule"
+ git push origin master
+ ```
+
+ If a new ONNX export changed Android runtime assets, also stage:
+
+ ```text
+ scraper/src/main/assets/anime_parser/anime_filename_parser.onnx
+ scraper/src/main/assets/anime_parser/config.json
+ scraper/src/main/assets/anime_parser/vocab.json
+ ```
+
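Note on the "Publish a New Checkpoint" step: the `Copy-Item` commands are PowerShell-only, while the rest of the workflow uses `bash`. A cross-platform equivalent might look like the sketch below. It copies the same five files named in MAINTENANCE.md; the function name `publish_checkpoint` and the fail-before-copy behavior are assumptions for illustration and are not part of the repository.

```python
import shutil
from pathlib import Path

# The five files listed in the "Publish a New Checkpoint" section.
CHECKPOINT_FILES = (
    "config.json",
    "model.safetensors",
    "tokenizer_config.json",
    "training_args.bin",
    "vocab.json",
)


def publish_checkpoint(src_dir, dst_dir="."):
    """Copy the checkpoint files from src_dir into dst_dir, overwriting existing copies."""
    src = Path(src_dir)
    dst = Path(dst_dir)
    # Check everything first so a partial checkpoint is never published.
    missing = [name for name in CHECKPOINT_FILES if not (src / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing checkpoint files: {missing}")
    copied = []
    for name in CHECKPOINT_FILES:
        shutil.copy2(src / name, dst / name)  # overwrites, like Copy-Item -Force
        copied.append(dst / name)
    return copied


# Example call from the repository root (hypothetical usage):
# publish_checkpoint("checkpoints/dmhy-finetune/final")
```

Checking for missing files before copying means the repository root never ends up with a mix of old and new checkpoint files.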
README.md CHANGED
@@ -128,6 +128,16 @@ python export_onnx.py --model-dir checkpoints/dmhy-finetune/final --output expor
  - `export_onnx.py`: ONNX export for Android integration
  - `exports/`: exported ONNX model and metadata
  - `data/dmhy/*.manifest.json`: dataset waterlines and counts
+ - `datasets/AnimeName/`: nested dataset submodule
+
+ ## Maintenance Notes
+
+ MiruPlay tracks this repository as `tools/anime_parser`, and this repository
+ tracks `ModerRAS/AnimeName` as `datasets/AnimeName`. After updating either
+ repo, remember to commit the submodule pointer in the parent repo.
+
+ For the full maintenance workflow, see MiruPlay's
+ `docs/anifilebert-maintenance.md`.
 
 
 
143