# AniFileBERT Maintenance

This repository is the standalone Hugging Face model repo that MiruPlay
consumes as the `tools/anime_parser` submodule.

## Related Repositories

| Repository | URL | Purpose |
|------------|-----|---------|
| AniFileBERT | `https://huggingface.co/ModerRAS/AniFileBERT` | Model, training scripts, ONNX export |
| AnimeName | `https://huggingface.co/datasets/ModerRAS/AnimeName` | Training datasets and manifests |
| MiruPlay | `https://github.com/ModerRAS/MiruPlay` | Android app and runtime integration |

The dataset repo is nested inside the model repo as a Git submodule:

```text
AniFileBERT
  datasets/AnimeName -> ModerRAS/AnimeName
```

## Clone

```bash
git clone --recursive https://huggingface.co/ModerRAS/AniFileBERT
```

If you cloned without `--recursive`, initialize the submodule afterwards:

```bash
git submodule update --init --recursive
```
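To confirm the submodule is actually initialized, the output of `git submodule status` can be inspected: Git prefixes a line with `-` when the submodule is not initialized, `+` when the checked-out commit differs from the recorded pointer, and `U` on merge conflicts. A minimal sketch of reading one such line follows; the helper name `parse_submodule_status` is hypothetical and not part of this repository.

```python
def parse_submodule_status(line: str) -> dict:
    """Parse one line of `git submodule status` output (hypothetical helper)."""
    # Prefix characters documented by git-submodule; a leading space means clean.
    flags = {"-": "uninitialized", "+": "modified", "U": "conflict", " ": "ok"}
    if line and line[0] in flags:
        prefix, body = line[0], line[1:]
    else:
        # Line was stripped of its leading space before reaching us.
        prefix, body = " ", line
    parts = body.split()
    # Assumes the submodule path contains no spaces.
    return {"state": flags[prefix], "sha": parts[0], "path": parts[1]}
```

A result with `state == "uninitialized"` for `datasets/AnimeName` would mean the `git submodule update --init --recursive` step has not been run yet.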

## Dataset Waterline

Current DMHY snapshot:

```text
last_file_id: 689304
next_min_id: 689305
labeled_samples: 263042
mixed_train_samples: 363042
```

The authoritative dataset files live in `datasets/AnimeName`.
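The snapshot values above suggest that `next_min_id` is simply `last_file_id + 1`. The sketch below parses the snapshot text and checks that relationship; the invariant is inferred from the numbers shown here, not from a documented rule, and `parse_waterline` is a hypothetical helper.

```python
def parse_waterline(text: str) -> dict:
    """Parse `key: integer` lines into a dict (hypothetical helper)."""
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, _, raw = line.partition(":")
        values[key.strip()] = int(raw.strip())
    return values


SNAPSHOT = """\
last_file_id: 689304
next_min_id: 689305
labeled_samples: 263042
mixed_train_samples: 363042
"""

waterline = parse_waterline(SNAPSHOT)

# Assumed invariant: the next crawl starts right after the last seen file id.
assert waterline["next_min_id"] == waterline["last_file_id"] + 1

# Samples in the mixed training set beyond the labeled DMHY samples.
extra_samples = waterline["mixed_train_samples"] - waterline["labeled_samples"]
```

With the snapshot above, `extra_samples` comes out to 100000. What those additional samples consist of is not stated in this document.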

## Train

```bash
python -m pip install -r requirements.txt
python train.py \
  --data-file datasets/AnimeName/mixed_train.jsonl \
  --vocab-file datasets/AnimeName/vocab.json \
  --save-dir checkpoints/dmhy-finetune \
  --init-model-dir . \
  --epochs 1 \
  --batch-size 128 \
  --learning-rate 0.0003 \
  --warmup-steps 300 \
  --seed 42
```
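To put `--warmup-steps 300` in context, the number of optimizer steps in one epoch can be estimated from the snapshot's `mixed_train_samples` and the batch size. This is a rough sketch that assumes `train.py` trains on every line of `mixed_train.jsonl`, holds nothing out, uses no gradient accumulation, and keeps the last partial batch; none of that is confirmed here.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Optimizer steps in one epoch, counting the final partial batch."""
    return math.ceil(num_samples / batch_size)


# Values taken from the dataset snapshot and the training command above.
total_steps = steps_per_epoch(363042, 128)

# Share of the single epoch spent in learning-rate warmup.
warmup_fraction = 300 / total_steps
```

Under those assumptions `total_steps` is 2837, so warmup covers roughly a tenth of the run.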

## Publish a New Checkpoint

Copy the final checkpoint to the repository root:

```powershell
Copy-Item checkpoints/dmhy-finetune/final/config.json . -Force
Copy-Item checkpoints/dmhy-finetune/final/model.safetensors . -Force
Copy-Item checkpoints/dmhy-finetune/final/tokenizer_config.json . -Force
Copy-Item checkpoints/dmhy-finetune/final/training_args.bin . -Force
Copy-Item checkpoints/dmhy-finetune/final/vocab.json . -Force
```
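On systems without PowerShell, the same copy can be done with Python's standard library. This is a minimal sketch, not a script shipped with the repository: `publish_checkpoint` is a hypothetical name, and the file list mirrors the five files copied above.

```python
import shutil
from pathlib import Path

# The five files the publish step copies to the repository root.
CHECKPOINT_FILES = [
    "config.json",
    "model.safetensors",
    "tokenizer_config.json",
    "training_args.bin",
    "vocab.json",
]


def publish_checkpoint(src_dir, dest_dir="."):
    """Copy checkpoint files from src_dir into dest_dir, overwriting existing ones."""
    src, dest = Path(src_dir), Path(dest_dir)
    # Fail before copying anything if the checkpoint is incomplete.
    missing = [name for name in CHECKPOINT_FILES if not (src / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing checkpoint files: {missing}")
    copied = []
    for name in CHECKPOINT_FILES:
        shutil.copy2(src / name, dest / name)  # overwrites, like Copy-Item -Force
        copied.append(dest / name)
    return copied
```

From the repository root this would be called as `publish_checkpoint("checkpoints/dmhy-finetune/final")`. Checking for missing files first avoids leaving the root with a mix of old and new checkpoint files.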

Then commit and push:

```bash
git add .
git commit -m "Update AniFileBERT checkpoint"
git push origin main
```

## Update the Dataset Submodule

After pushing new files to `ModerRAS/AnimeName`, update the nested pointer:

```bash
git submodule update --remote datasets/AnimeName
git add datasets/AnimeName
git commit -m "Update AnimeName dataset pointer"
git push origin main
```

## Update MiruPlay

From the MiruPlay root:

```bash
git submodule update --remote --recursive tools/anime_parser
git add tools/anime_parser
git commit -m "Update AniFileBERT submodule"
git push origin master
```

If a new ONNX export changed the Android runtime assets, also stage these files before committing:

```text
scraper/src/main/assets/anime_parser/anime_filename_parser.onnx
scraper/src/main/assets/anime_parser/config.json
scraper/src/main/assets/anime_parser/vocab.json
```
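One way to tell whether an export actually changed those assets is to compare file hashes between the export output and the app's asset directory. The sketch below is illustrative only: `changed_assets` and the export directory are hypothetical, and it assumes the export writes files under the same three names listed above.

```python
import hashlib
from pathlib import Path

# File names taken from the asset paths listed above.
ASSET_NAMES = ["anime_filename_parser.onnx", "config.json", "vocab.json"]


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_assets(export_dir, assets_dir):
    """Return asset names whose exported copy is new or differs from the app copy."""
    export, assets = Path(export_dir), Path(assets_dir)
    changed = []
    for name in ASSET_NAMES:
        new, old = export / name, assets / name
        if not new.is_file():
            continue  # nothing was exported under this name
        if not old.is_file() or sha256_of(new) != sha256_of(old):
            changed.append(name)
    return changed
```

An empty result means the submodule pointer update alone is enough. Otherwise the listed files would need to be copied into `scraper/src/main/assets/anime_parser/` and staged.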