| 2026-06-21 18:28:11 | INFO | dapt | Environment: |
| { |
| : , |
| : , |
| : , |
| : , |
| : , |
| : , |
| : true, |
| : , |
| : , |
| : 95.0, |
| : , |
| : true, |
| : [ |
| , |
| , |
| , |
| , |
| , |
| |
| ], |
| : true |
| } |
| 2026-06-21 18:28:12 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/xlm-roberta-large/resolve/main/config.json |
| 2026-06-21 18:28:12 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/xlm-roberta-large/resolve/main/tokenizer_config.json |
| 2026-06-21 18:28:12 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/models/xlm-roberta-large/tree/main/additional_chat_templates?recursive=false&expand=false |
| 2026-06-21 18:28:12 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/models/FacebookAI/xlm-roberta-large/tree/main/additional_chat_templates?recursive=false&expand=false |
| 2026-06-21 18:28:12 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/models/xlm-roberta-large/tree/main?recursive=true&expand=false |
| 2026-06-21 18:28:12 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/models/FacebookAI/xlm-roberta-large/tree/main?recursive=true&expand=false |
| 2026-06-21 18:28:14 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/models/xlm-roberta-large |
| 2026-06-21 18:28:14 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/models/FacebookAI/xlm-roberta-large |
| 2026-06-21 18:28:14 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/main/README.md |
| 2026-06-21 18:28:14 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/datasets/mschonhardt/MedBerta |
| 2026-06-21 18:28:14 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/MedBerta.py |
| 2026-06-21 18:28:15 | INFO | httpx | HTTP Request: HEAD https://s3.amazonaws.com/datasets.huggingface.co/datasets/datasets/mschonhardt/MedBerta/mschonhardt/MedBerta.py |
| 2026-06-21 18:28:15 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/README.md |
| 2026-06-21 18:28:15 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/datasets/mschonhardt/MedBerta/revision/bb4994d1d2ac8765e4df863a5846210c8ecaacd1 |
| 2026-06-21 18:28:15 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/.huggingface.yaml |
| 2026-06-21 18:28:15 | INFO | httpx | HTTP Request: GET https://datasets-server.huggingface.co/info?dataset=mschonhardt/MedBerta |
| 2026-06-21 18:28:16 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/datasets/mschonhardt/MedBerta/tree/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/data?recursive=true&expand=false |
| 2026-06-21 18:28:16 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/datasets/mschonhardt/MedBerta/tree/bb4994d1d2ac8765e4df863a5846210c8ecaacd1?recursive=false&expand=false |
| 2026-06-21 18:28:16 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/dataset_infos.json |
| 2026-06-21 18:28:16 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/data-00000-of-00002.arrow |
| 2026-06-21 18:28:22 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/data-00001-of-00002.arrow |
| 2026-06-21 18:28:28 | WARNING | dapt | Dataset mschonhardt/MedBerta is PRE-TOKENIZED (columns=['input_ids', 'attention_mask', 'doc_id', 'category']). Skipping raw-text tokenization and DOC-SENTENCES packing. |
| 2026-06-21 18:28:28 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/main/README.md |
| 2026-06-21 18:28:28 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/datasets/mschonhardt/MedBerta |
| 2026-06-21 18:28:28 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/MedBerta.py |
| 2026-06-21 18:28:28 | INFO | httpx | HTTP Request: HEAD https://s3.amazonaws.com/datasets.huggingface.co/datasets/datasets/mschonhardt/MedBerta/mschonhardt/MedBerta.py |
| 2026-06-21 18:28:28 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/README.md |
| 2026-06-21 18:28:29 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/.huggingface.yaml |
| 2026-06-21 18:28:29 | INFO | httpx | HTTP Request: GET https://datasets-server.huggingface.co/info?dataset=mschonhardt/MedBerta |
| 2026-06-21 18:28:29 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/datasets/mschonhardt/MedBerta/tree/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/data?recursive=true&expand=false |
| 2026-06-21 18:28:29 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/dataset_infos.json |
| 2026-06-21 18:28:29 | INFO | dapt | loaded pretokenized canon -> 291630 sequences (doc_id=doc_id) |
| 2026-06-21 18:28:35 | INFO | dapt | Pretokenized diagnostics: unk_rate=0.0037% mean_len=500.1 bos=100% eos=100% |
| 2026-06-21 18:28:36 | INFO | dapt | Applying STRATIFIED document split based on column 'category'. |
| 2026-06-21 18:28:40 | INFO | dapt | Pretokenized split (document-level STRATIFIED group-aware) -> train=286062 val=5568 test=0 |
| 2026-06-21 18:29:04 | INFO | dapt | Saved pretokenized datasets to .cache_packed/packed_a891125d9987e2ce527ec74b |
| 2026-06-21 18:29:28 | INFO | dapt | Packed dataset statistics: |
| { |
| : { |
| : 286062, |
| : 142794676, |
| : 499.2 |
| }, |
| : { |
| : 5568, |
| : 2812986, |
| : 505.2 |
| } |
| } |
| 2026-06-21 18:29:28 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/xlm-roberta-large/resolve/main/config.json |
| 2026-06-21 18:29:28 | INFO | dapt | Model parameters: 560.1M |
| 2026-06-22 04:00:28 | INFO | dapt | Final evaluation: |
| { |
| : 1.0483721494674683, |
| : 0.7699716443872566, |
| : 25.2687, |
| : 220.351, |
| : 3.443, |
| : 10.0, |
| : 1464637440, |
| : 2.853003073356264 |
| } |
| 2026-06-22 04:00:28 | INFO | dapt | Done. Best model + tokenizer saved to ./MediBerta |