2026-06-21 18:28:11 | INFO | dapt | Environment: {
  "timestamp": "2026-06-21T18:28:11",
  "python": "3.11.2",
  "platform": "Linux-6.1.0-49-amd64-x86_64-with-glibc2.36",
  "torch": "2.12.1+cu130",
  "transformers": "5.12.1",
  "datasets": "5.0.0",
  "cuda_available": true,
  "cuda": "13.0",
  "gpu_name": "NVIDIA RTX PRO 6000 Blackwell Server Edition",
  "gpu_total_memory_gb": 95.0,
  "gpu_capability": "12.0",
  "bf16_supported": true,
  "torch_arch_list": [
    "sm_75",
    "sm_80",
    "sm_86",
    "sm_90",
    "sm_100",
    "sm_120"
  ],
  "gpu_arch_supported_by_torch": true
}
2026-06-21 18:28:12 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/xlm-roberta-large/resolve/main/config.json "HTTP/1.1 200 OK"
2026-06-21 18:28:12 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/xlm-roberta-large/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
2026-06-21 18:28:12 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/models/xlm-roberta-large/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 307 Temporary Redirect"
2026-06-21 18:28:12 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/models/FacebookAI/xlm-roberta-large/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 404 Not Found"
2026-06-21 18:28:12 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/models/xlm-roberta-large/tree/main?recursive=true&expand=false "HTTP/1.1 307 Temporary Redirect"
2026-06-21 18:28:12 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/models/FacebookAI/xlm-roberta-large/tree/main?recursive=true&expand=false "HTTP/1.1 200 OK"
2026-06-21 18:28:14 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/models/xlm-roberta-large "HTTP/1.1 307 Temporary Redirect"
2026-06-21 18:28:14 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/models/FacebookAI/xlm-roberta-large "HTTP/1.1 200 OK"
2026-06-21 18:28:14 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/main/README.md "HTTP/1.1 404 Not Found"
2026-06-21 18:28:14 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/datasets/mschonhardt/MedBerta "HTTP/1.1 200 OK"
2026-06-21 18:28:14 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/MedBerta.py "HTTP/1.1 404 Not Found"
2026-06-21 18:28:15 | INFO | httpx | HTTP Request: HEAD https://s3.amazonaws.com/datasets.huggingface.co/datasets/datasets/mschonhardt/MedBerta/mschonhardt/MedBerta.py "HTTP/1.1 404 Not Found"
2026-06-21 18:28:15 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/README.md "HTTP/1.1 404 Not Found"
2026-06-21 18:28:15 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/datasets/mschonhardt/MedBerta/revision/bb4994d1d2ac8765e4df863a5846210c8ecaacd1 "HTTP/1.1 200 OK"
2026-06-21 18:28:15 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/.huggingface.yaml "HTTP/1.1 404 Not Found"
2026-06-21 18:28:15 | INFO | httpx | HTTP Request: GET https://datasets-server.huggingface.co/info?dataset=mschonhardt/MedBerta "HTTP/1.1 501 Not Implemented"
2026-06-21 18:28:16 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/datasets/mschonhardt/MedBerta/tree/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/data?recursive=true&expand=false "HTTP/1.1 404 Not Found"
2026-06-21 18:28:16 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/datasets/mschonhardt/MedBerta/tree/bb4994d1d2ac8765e4df863a5846210c8ecaacd1?recursive=false&expand=false "HTTP/1.1 200 OK"
2026-06-21 18:28:16 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/dataset_infos.json "HTTP/1.1 404 Not Found"
2026-06-21 18:28:16 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/data-00000-of-00002.arrow "HTTP/1.1 302 Found"
2026-06-21 18:28:22 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/data-00001-of-00002.arrow "HTTP/1.1 302 Found"
2026-06-21 18:28:28 | WARNING | dapt | Dataset mschonhardt/MedBerta is PRE-TOKENIZED (columns=['input_ids', 'attention_mask', 'doc_id', 'category']). Skipping raw-text tokenization and DOC-SENTENCES packing.
2026-06-21 18:28:28 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/main/README.md "HTTP/1.1 404 Not Found"
2026-06-21 18:28:28 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/datasets/mschonhardt/MedBerta "HTTP/1.1 200 OK"
2026-06-21 18:28:28 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/MedBerta.py "HTTP/1.1 404 Not Found"
2026-06-21 18:28:28 | INFO | httpx | HTTP Request: HEAD https://s3.amazonaws.com/datasets.huggingface.co/datasets/datasets/mschonhardt/MedBerta/mschonhardt/MedBerta.py "HTTP/1.1 404 Not Found"
2026-06-21 18:28:28 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/README.md "HTTP/1.1 404 Not Found"
2026-06-21 18:28:29 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/.huggingface.yaml "HTTP/1.1 404 Not Found"
2026-06-21 18:28:29 | INFO | httpx | HTTP Request: GET https://datasets-server.huggingface.co/info?dataset=mschonhardt/MedBerta "HTTP/1.1 501 Not Implemented"
2026-06-21 18:28:29 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/datasets/mschonhardt/MedBerta/tree/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/data?recursive=true&expand=false "HTTP/1.1 404 Not Found"
2026-06-21 18:28:29 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/dataset_infos.json "HTTP/1.1 404 Not Found"
2026-06-21 18:28:29 | INFO | dapt | loaded pretokenized canon -> 291630 sequences (doc_id=doc_id)
2026-06-21 18:28:35 | INFO | dapt | Pretokenized diagnostics: unk_rate=0.0037% mean_len=500.1 bos=100% eos=100%
2026-06-21 18:28:36 | INFO | dapt | Applying STRATIFIED document split based on column 'category'.
2026-06-21 18:28:40 | INFO | dapt | Pretokenized split (document-level STRATIFIED group-aware) -> train=286062 val=5568 test=0
2026-06-21 18:29:04 | INFO | dapt | Saved pretokenized datasets to .cache_packed/packed_a891125d9987e2ce527ec74b
2026-06-21 18:29:28 | INFO | dapt | Packed dataset statistics: {
  "train": {
    "sequences": 286062,
    "tokens": 142794676,
    "mean_seq_len": 499.2
  },
  "validation": {
    "sequences": 5568,
    "tokens": 2812986,
    "mean_seq_len": 505.2
  }
}
2026-06-21 18:29:28 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/xlm-roberta-large/resolve/main/config.json "HTTP/1.1 200 OK"
2026-06-21 18:29:28 | INFO | dapt | Model parameters: 560.1M
2026-06-22 04:00:28 | INFO | dapt | Final evaluation: {
  "final_val_loss": 1.0483721494674683,
  "final_val_masked_accuracy": 0.7699716443872566,
  "final_val_runtime": 25.2687,
  "final_val_samples_per_second": 220.351,
  "final_val_steps_per_second": 3.443,
  "epoch": 10.0,
  "num_input_tokens_seen": 1464637440,
  "final_val_perplexity": 2.853003073356264
}
2026-06-22 04:00:28 | INFO | dapt | Done. Best model + tokenizer saved to ./MediBerta
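The derived figures in the packed-statistics and final-evaluation entries can be cross-checked against the raw counts logged alongside them. The sketch below is a minimal Python check, not part of the dapt script; the helper names are hypothetical. It assumes that perplexity is exp of the mean masked-LM validation loss (the log does not state the formula) and that every training sequence is padded to a fixed 512 positions. That length is not logged; it is assumed only because 286062 × 512 × 10 epochs reproduces num_input_tokens_seen exactly.

```python
import math

# Raw values copied from the log entries above.
TRAIN_SEQS, TRAIN_TOKENS = 286_062, 142_794_676
VAL_SEQS, VAL_TOKENS = 5_568, 2_812_986
FINAL_VAL_LOSS = 1.0483721494674683
EPOCHS = 10
TOKENS_SEEN = 1_464_637_440

# ASSUMPTION: hypothetical padded length per training sequence (not logged).
ASSUMED_MAX_SEQ_LEN = 512


def mean_seq_len(tokens: int, seqs: int) -> float:
    """Mean sequence length, rounded to one decimal like the log output."""
    return round(tokens / seqs, 1)


def perplexity(loss: float) -> float:
    """ASSUMPTION: perplexity = exp(mean cross-entropy loss in nats)."""
    return math.exp(loss)


def padded_tokens_seen(seqs: int, max_len: int, epochs: int) -> int:
    """Token positions processed if every sequence is padded to max_len."""
    return seqs * max_len * epochs


if __name__ == "__main__":
    print(mean_seq_len(TRAIN_TOKENS, TRAIN_SEQS))  # 499.2
    print(mean_seq_len(VAL_TOKENS, VAL_SEQS))  # 505.2
    print(perplexity(FINAL_VAL_LOSS))  # ≈ 2.853
    print(padded_tokens_seen(TRAIN_SEQS, ASSUMED_MAX_SEQ_LEN, EPOCHS) == TOKENS_SEEN)  # True
    print(TRAIN_SEQS + VAL_SEQS)  # 291630, the "loaded pretokenized canon" count
```

Note that num_input_tokens_seen (about 1.46B) exceeds 10 × the 142.8M unpadded training tokens. Under the padding assumption above, the difference is padding positions rather than additional corpus text.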