MediBerta / train.log
2026-06-21 18:28:11 | INFO | dapt | Environment:
{
"timestamp": "2026-06-21T18:28:11",
"python": "3.11.2",
"platform": "Linux-6.1.0-49-amd64-x86_64-with-glibc2.36",
"torch": "2.12.1+cu130",
"transformers": "5.12.1",
"datasets": "5.0.0",
"cuda_available": true,
"cuda": "13.0",
"gpu_name": "NVIDIA RTX PRO 6000 Blackwell Server Edition",
"gpu_total_memory_gb": 95.0,
"gpu_capability": "12.0",
"bf16_supported": true,
"torch_arch_list": [
"sm_75",
"sm_80",
"sm_86",
"sm_90",
"sm_100",
"sm_120"
],
"gpu_arch_supported_by_torch": true
}
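The `gpu_arch_supported_by_torch` flag above is consistent with a simple membership test: compute capability 12.0 corresponds to `sm_120`, which appears in `torch_arch_list`. The sketch below reproduces that check from the logged values. The log does not show how the `dapt` script derives the flag, so `arch_supported` is a hypothetical helper, not the script's own code. In a live session the two inputs would likely come from `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()`.

```python
def arch_supported(capability: str, arch_list: list[str]) -> bool:
    """Return True if a capability string like '12.0' has a matching sm_XY entry."""
    major, minor = capability.split(".")
    return f"sm_{major}{minor}" in arch_list


# Values copied from the Environment block of the log.
logged_arch_list = ["sm_75", "sm_80", "sm_86", "sm_90", "sm_100", "sm_120"]
supported = arch_supported("12.0", logged_arch_list)
print(supported)
```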
2026-06-21 18:28:12 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/xlm-roberta-large/resolve/main/config.json "HTTP/1.1 200 OK"
2026-06-21 18:28:12 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/xlm-roberta-large/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
2026-06-21 18:28:12 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/models/xlm-roberta-large/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 307 Temporary Redirect"
2026-06-21 18:28:12 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/models/FacebookAI/xlm-roberta-large/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 404 Not Found"
2026-06-21 18:28:12 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/models/xlm-roberta-large/tree/main?recursive=true&expand=false "HTTP/1.1 307 Temporary Redirect"
2026-06-21 18:28:12 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/models/FacebookAI/xlm-roberta-large/tree/main?recursive=true&expand=false "HTTP/1.1 200 OK"
2026-06-21 18:28:14 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/models/xlm-roberta-large "HTTP/1.1 307 Temporary Redirect"
2026-06-21 18:28:14 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/models/FacebookAI/xlm-roberta-large "HTTP/1.1 200 OK"
2026-06-21 18:28:14 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/main/README.md "HTTP/1.1 404 Not Found"
2026-06-21 18:28:14 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/datasets/mschonhardt/MedBerta "HTTP/1.1 200 OK"
2026-06-21 18:28:14 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/MedBerta.py "HTTP/1.1 404 Not Found"
2026-06-21 18:28:15 | INFO | httpx | HTTP Request: HEAD https://s3.amazonaws.com/datasets.huggingface.co/datasets/datasets/mschonhardt/MedBerta/mschonhardt/MedBerta.py "HTTP/1.1 404 Not Found"
2026-06-21 18:28:15 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/README.md "HTTP/1.1 404 Not Found"
2026-06-21 18:28:15 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/datasets/mschonhardt/MedBerta/revision/bb4994d1d2ac8765e4df863a5846210c8ecaacd1 "HTTP/1.1 200 OK"
2026-06-21 18:28:15 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/.huggingface.yaml "HTTP/1.1 404 Not Found"
2026-06-21 18:28:15 | INFO | httpx | HTTP Request: GET https://datasets-server.huggingface.co/info?dataset=mschonhardt/MedBerta "HTTP/1.1 501 Not Implemented"
2026-06-21 18:28:16 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/datasets/mschonhardt/MedBerta/tree/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/data?recursive=true&expand=false "HTTP/1.1 404 Not Found"
2026-06-21 18:28:16 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/datasets/mschonhardt/MedBerta/tree/bb4994d1d2ac8765e4df863a5846210c8ecaacd1?recursive=false&expand=false "HTTP/1.1 200 OK"
2026-06-21 18:28:16 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/dataset_infos.json "HTTP/1.1 404 Not Found"
2026-06-21 18:28:16 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/data-00000-of-00002.arrow "HTTP/1.1 302 Found"
2026-06-21 18:28:22 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/data-00001-of-00002.arrow "HTTP/1.1 302 Found"
2026-06-21 18:28:28 | WARNING | dapt | Dataset mschonhardt/MedBerta is PRE-TOKENIZED (columns=['input_ids', 'attention_mask', 'doc_id', 'category']). Skipping raw-text tokenization and DOC-SENTENCES packing.
2026-06-21 18:28:28 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/main/README.md "HTTP/1.1 404 Not Found"
2026-06-21 18:28:28 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/datasets/mschonhardt/MedBerta "HTTP/1.1 200 OK"
2026-06-21 18:28:28 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/MedBerta.py "HTTP/1.1 404 Not Found"
2026-06-21 18:28:28 | INFO | httpx | HTTP Request: HEAD https://s3.amazonaws.com/datasets.huggingface.co/datasets/datasets/mschonhardt/MedBerta/mschonhardt/MedBerta.py "HTTP/1.1 404 Not Found"
2026-06-21 18:28:28 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/README.md "HTTP/1.1 404 Not Found"
2026-06-21 18:28:29 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/.huggingface.yaml "HTTP/1.1 404 Not Found"
2026-06-21 18:28:29 | INFO | httpx | HTTP Request: GET https://datasets-server.huggingface.co/info?dataset=mschonhardt/MedBerta "HTTP/1.1 501 Not Implemented"
2026-06-21 18:28:29 | INFO | httpx | HTTP Request: GET https://huggingface.co/api/datasets/mschonhardt/MedBerta/tree/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/data?recursive=true&expand=false "HTTP/1.1 404 Not Found"
2026-06-21 18:28:29 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/datasets/mschonhardt/MedBerta/resolve/bb4994d1d2ac8765e4df863a5846210c8ecaacd1/dataset_infos.json "HTTP/1.1 404 Not Found"
2026-06-21 18:28:29 | INFO | dapt | loaded pretokenized canon -> 291630 sequences (doc_id=doc_id)
2026-06-21 18:28:35 | INFO | dapt | Pretokenized diagnostics: unk_rate=0.0037% mean_len=500.1 bos=100% eos=100%
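The diagnostics line reports an unknown-token rate, a mean length, and the share of sequences that start with BOS and end with EOS. A minimal sketch of how such numbers could be computed from `input_ids` follows. It assumes unpadded sequences and the XLM-RoBERTa special-token IDs (`<s>`=0, `</s>`=2, `<unk>`=3). The function name and the toy data are hypothetical, and the actual `dapt` implementation may differ, for example by sampling.

```python
def pretokenized_diagnostics(sequences, bos_id=0, eos_id=2, unk_id=3):
    """Summarize a list of token-ID lists (assumed unpadded and non-empty)."""
    n = len(sequences)
    total_tokens = sum(len(seq) for seq in sequences)
    unk_tokens = sum(tok == unk_id for seq in sequences for tok in seq)
    return {
        "unk_rate": unk_tokens / total_tokens,
        "mean_len": total_tokens / n,
        "bos": sum(seq[0] == bos_id for seq in sequences) / n,
        "eos": sum(seq[-1] == eos_id for seq in sequences) / n,
    }


# Toy input for illustration only; not data from the run.
toy = [[0, 5, 3, 2], [0, 7, 8, 2]]
stats = pretokenized_diagnostics(toy)
print(stats)
```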
2026-06-21 18:28:36 | INFO | dapt | Applying STRATIFIED document split based on column 'category'.
2026-06-21 18:28:40 | INFO | dapt | Pretokenized split (document-level STRATIFIED group-aware) -> train=286062 val=5568 test=0
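The split line describes a document-level, stratified, group-aware split: all sequences sharing a `doc_id` land on the same side, and validation documents are drawn per `category`. The sketch below shows one way to do this, not the method the `dapt` script uses, which the log does not reveal. `stratified_doc_split`, `val_fraction`, and `seed` are hypothetical names, and each document is assumed to belong to one category.

```python
import random
from collections import defaultdict


def stratified_doc_split(rows, val_fraction=0.02, seed=0):
    """Split a list of row dicts (with 'doc_id' and 'category') by whole documents."""
    docs_by_category = defaultdict(set)
    for row in rows:
        docs_by_category[row["category"]].add(row["doc_id"])

    rng = random.Random(seed)
    val_docs = set()
    for category in sorted(docs_by_category):
        docs = sorted(docs_by_category[category])
        rng.shuffle(docs)
        # Take at least one validation document from every category.
        n_val = max(1, round(len(docs) * val_fraction))
        val_docs.update(docs[:n_val])

    train = [r for r in rows if r["doc_id"] not in val_docs]
    val = [r for r in rows if r["doc_id"] in val_docs]
    return train, val


# Toy data: 2 categories x 50 documents x 3 sequences each (illustrative only).
toy_rows = [
    {"doc_id": f"{cat}-{d}", "category": cat, "seq": s}
    for cat in ("a", "b")
    for d in range(50)
    for s in range(3)
]
train_rows, val_rows = stratified_doc_split(toy_rows, val_fraction=0.1)
print(len(train_rows), len(val_rows))
```

In the logged run the validation share works out to 5568 / 291630, roughly 1.9% of sequences. The fraction configured at the document level is not recorded.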
2026-06-21 18:29:04 | INFO | dapt | Saved pretokenized datasets to .cache_packed/packed_a891125d9987e2ce527ec74b
2026-06-21 18:29:28 | INFO | dapt | Packed dataset statistics:
{
"train": {
"sequences": 286062,
"tokens": 142794676,
"mean_seq_len": 499.2
},
"validation": {
"sequences": 5568,
"tokens": 2812986,
"mean_seq_len": 505.2
}
}
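The `mean_seq_len` fields above agree with a plain `tokens / sequences` ratio rounded to one decimal. The short check below recomputes them from the logged counts. The dictionary layout mirrors the log, while the variable names are illustrative.

```python
# Counts copied from the "Packed dataset statistics" block.
packed_stats = {
    "train": {"sequences": 286062, "tokens": 142794676},
    "validation": {"sequences": 5568, "tokens": 2812986},
}

mean_seq_len = {
    split: round(s["tokens"] / s["sequences"], 1)
    for split, s in packed_stats.items()
}
print(mean_seq_len)
```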
2026-06-21 18:29:28 | INFO | httpx | HTTP Request: HEAD https://huggingface.co/xlm-roberta-large/resolve/main/config.json "HTTP/1.1 200 OK"
2026-06-21 18:29:28 | INFO | dapt | Model parameters: 560.1M
2026-06-22 04:00:28 | INFO | dapt | Final evaluation:
{
"final_val_loss": 1.0483721494674683,
"final_val_masked_accuracy": 0.7699716443872566,
"final_val_runtime": 25.2687,
"final_val_samples_per_second": 220.351,
"final_val_steps_per_second": 3.443,
"epoch": 10.0,
"num_input_tokens_seen": 1464637440,
"final_val_perplexity": 2.853003073356264
}
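Several of the final metrics can be cross-checked against each other. The reported perplexity matches `exp(final_val_loss)`. Dividing `num_input_tokens_seen` by the 10 epochs and by the 286062 training sequences gives exactly 512 tokens per sequence, which suggests the counter includes padding up to a 512-token length. The throughput figures imply 5568 validation samples over 87 steps, an evaluation batch size of 64. The padded length and batch size are inferences from the arithmetic, not values stated in the log.

```python
import math

# Values copied from the "Final evaluation" and packed-statistics blocks.
final_val_loss = 1.0483721494674683
num_input_tokens_seen = 1464637440
epochs = 10.0
train_sequences = 286062
runtime_s = 25.2687
samples_per_second = 220.351
steps_per_second = 3.443

perplexity = math.exp(final_val_loss)
tokens_per_sequence = num_input_tokens_seen / epochs / train_sequences
val_samples = round(samples_per_second * runtime_s)
val_steps = round(steps_per_second * runtime_s)
inferred_eval_batch = val_samples / val_steps

print(perplexity, tokens_per_sequence, val_samples, val_steps, inferred_eval_batch)
```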
2026-06-22 04:00:28 | INFO | dapt | Done. Best model + tokenizer saved to ./MediBerta