Vvbvv / training.log (commit 9cf0f22, verified: Periodic upload)
2025-05-06 16:30:06,158 INFO: Gradio: Loading model and tokenizer from arnir0/Tiny-LLM (model), arnir0/Tiny-LLM (tokenizer)
2025-05-06 16:30:08,577 INFO: Gradio: Model arnir0/Tiny-LLM and tokenizer loaded.
2025-05-06 16:30:08,578 INFO: Gradio interface started at http://0.0.0.0:7860
2025-05-06 16:30:08,587 INFO: Prometheus metrics server started on port 8000.
2025-05-06 16:30:08,587 INFO: Fetching new list of datasets...
2025-05-06 16:30:41,406 INFO: HTTP Request: GET http://localhost:7860/gradio_api/startup-events "HTTP/1.1 200 OK"
2025-05-06 16:30:41,433 INFO: HTTP Request: HEAD http://localhost:7860/ "HTTP/1.1 200 OK"
2025-05-06 16:31:07,974 INFO: Fetched 379843 datasets to process.
2025-05-06 16:31:08,585 INFO: Preparing data for nvidia/Nemotron-CrossThink, config: default
2025-05-06 16:31:12,688 INFO: Upload successful.
2025-05-06 16:31:12,688 INFO: Preparing data for nvidia/OpenMathReasoning, config: default
2025-05-06 16:31:14,303 INFO: Starting model update for nvidia/Nemotron-CrossThink, config: default
2025-05-06 16:31:17,591 INFO: Finished training and saved model/tokenizer for nvidia/Nemotron-CrossThink config default
2025-05-06 16:31:17,595 ERROR: Error in background_training_loop task scheduling: local variable 'merged_model' referenced before assignment
2025-05-06 16:31:17,970 INFO: Preparing data for nvidia/OpenCodeReasoning, config: split_0
2025-05-06 16:31:18,439 INFO: Starting model update for nvidia/OpenMathReasoning, config: default
2025-05-06 16:31:21,695 INFO: Finished training and saved model/tokenizer for nvidia/OpenMathReasoning config default
2025-05-06 16:31:21,776 ERROR: Failed to get configs for rajpurkarlab/ReXGradient-160K: Dataset 'rajpurkarlab/ReXGradient-160K' is a gated dataset on the Hub. Visit the dataset page at https://huggingface.co/datasets/rajpurkarlab/ReXGradient-160K to ask for access.
2025-05-06 16:31:29,921 INFO: Starting model update for nvidia/OpenCodeReasoning, config: split_0
2025-05-06 16:31:39,156 INFO: Finished training and saved model/tokenizer for nvidia/OpenCodeReasoning config split_0
2025-05-06 16:32:15,293 INFO: Upload successful.
2025-05-06 16:32:17,716 WARNING: Repo card metadata block was not found. Setting CardData to empty.
2025-05-06 16:32:18,098 INFO: Preparing data for deepseek-ai/DeepSeek-ProverBench, config: default
2025-05-06 16:32:18,216 WARNING: Repo card metadata block was not found. Setting CardData to empty.
2025-05-06 16:32:18,265 INFO: Preparing data for fka/awesome-chatgpt-prompts, config: default
2025-05-06 16:32:20,585 INFO: Starting model update for fka/awesome-chatgpt-prompts, config: default
2025-05-06 16:32:22,050 INFO: Finished training and saved model/tokenizer for fka/awesome-chatgpt-prompts config default
2025-05-06 16:32:22,050 INFO: Starting model update for deepseek-ai/DeepSeek-ProverBench, config: default
2025-05-06 16:32:22,051 ERROR: Error in background_training_loop task scheduling: local variable 'merged_model' referenced before assignment
2025-05-06 16:32:23,356 INFO: Finished training and saved model/tokenizer for deepseek-ai/DeepSeek-ProverBench config default
2025-05-06 16:32:23,963 INFO: Preparing data for OpenGVLab/InternVL-Data, config: default
2025-05-06 16:32:24,197 INFO: Preparing data for nvidia/Llama-Nemotron-Post-Training-Dataset, config: SFT
2025-05-06 16:32:26,974 INFO: Starting model update for nvidia/Llama-Nemotron-Post-Training-Dataset, config: SFT
2025-05-06 16:32:29,446 INFO: Finished training and saved model/tokenizer for nvidia/Llama-Nemotron-Post-Training-Dataset config SFT
2025-05-06 16:32:30,119 ERROR: Error during data preparation for OpenGVLab/InternVL-Data config default: JSON parse error: Invalid value. in row 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/datasets/packaged_modules/json/json.py", line 160, in _generate_tables
    df = pandas_read_json(f)
  File "/usr/local/lib/python3.10/site-packages/datasets/packaged_modules/json/json.py", line 38, in pandas_read_json
    return pd.read_json(path_or_buf, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pandas/io/json/_json.py", line 791, in read_json
    json_reader = JsonReader(
  File "/usr/local/lib/python3.10/site-packages/pandas/io/json/_json.py", line 905, in __init__
    self.data = self._preprocess_data(data)
  File "/usr/local/lib/python3.10/site-packages/pandas/io/json/_json.py", line 917, in _preprocess_data
    data = data.read()
  File "/usr/local/lib/python3.10/site-packages/datasets/utils/file_utils.py", line 827, in read_with_retries
    out = read(*args, **kwargs)
  File "/usr/local/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/app/app.py", line 233, in process_and_train
    first_item = await asyncio.to_thread(lambda: next(iter(train_ds_instance), None))
  File "/usr/local/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/user/app/app.py", line 233, in <lambda>
    first_item = await asyncio.to_thread(lambda: next(iter(train_ds_instance), None))
  File "/usr/local/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 2266, in __iter__
    for key, example in ex_iterable:
  File "/usr/local/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 222, in __iter__
    for key_example in islice(self.generate_examples_fn(**gen_kwargs), shard_example_idx_start, None):
  File "/usr/local/lib/python3.10/site-packages/datasets/packaged_modules/generator/generator.py", line 33, in _generate_examples
    yield from enumerate(self.config.generator(**gen_kwargs))
  File "/home/user/app/app.py", line 214, in gen_data_for_cfg
    for ex in dataset_split:
  File "/usr/local/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 2266, in __iter__
    for key, example in ex_iterable:
  File "/usr/local/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 302, in __iter__
    for key, pa_table in self.generate_tables_fn(**gen_kwargs):
  File "/usr/local/lib/python3.10/site-packages/datasets/packaged_modules/json/json.py", line 163, in _generate_tables
    raise e
  File "/usr/local/lib/python3.10/site-packages/datasets/packaged_modules/json/json.py", line 137, in _generate_tables
    pa_table = paj.read_json(
  File "pyarrow/_json.pyx", line 342, in pyarrow._json.read_json
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: JSON parse error: Invalid value. in row 0
2025-05-06 16:33:18,163 INFO: Upload successful.
2025-05-06 16:33:22,552 INFO: Preparing data for Eureka-Lab/PHYBench, config: default
2025-05-06 16:33:22,655 INFO: Preparing data for nyuuzyou/svgfind, config: default
2025-05-06 16:33:24,226 INFO: Starting model update for Eureka-Lab/PHYBench, config: default
2025-05-06 16:33:25,829 ERROR: Error during data preparation for nyuuzyou/svgfind config default: Compression type zstd not supported
Traceback (most recent call last):
  File "/home/user/app/app.py", line 233, in process_and_train
    first_item = await asyncio.to_thread(lambda: next(iter(train_ds_instance), None))
  File "/usr/local/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/user/app/app.py", line 233, in <lambda>
    first_item = await asyncio.to_thread(lambda: next(iter(train_ds_instance), None))
  File "/usr/local/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 2266, in __iter__
    for key, example in ex_iterable:
  File "/usr/local/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 222, in __iter__
    for key_example in islice(self.generate_examples_fn(**gen_kwargs), shard_example_idx_start, None):
  File "/usr/local/lib/python3.10/site-packages/datasets/packaged_modules/generator/generator.py", line 33, in _generate_examples
    yield from enumerate(self.config.generator(**gen_kwargs))
  File "/home/user/app/app.py", line 214, in gen_data_for_cfg
    for ex in dataset_split:
  File "/usr/local/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 2266, in __iter__
    for key, example in ex_iterable:
  File "/usr/local/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 302, in __iter__
    for key, pa_table in self.generate_tables_fn(**gen_kwargs):
  File "/usr/local/lib/python3.10/site-packages/datasets/packaged_modules/json/json.py", line 99, in _generate_tables
    for file_idx, file in enumerate(itertools.chain.from_iterable(files)):
  File "/usr/local/lib/python3.10/site-packages/datasets/utils/track.py", line 49, in __iter__
    for x in self.generator(*self.args):
  File "/usr/local/lib/python3.10/site-packages/datasets/utils/file_utils.py", line 1366, in _iter_from_urlpaths
    elif xisdir(urlpath, download_config=download_config):
  File "/usr/local/lib/python3.10/site-packages/datasets/utils/file_utils.py", line 799, in xisdir
    return fs.isdir(inner_path)
  File "/usr/local/lib/python3.10/site-packages/fsspec/spec.py", line 701, in isdir
    return self.info(path)["type"] == "directory"
  File "/usr/local/lib/python3.10/site-packages/fsspec/archive.py", line 40, in info
    self._get_dirs()
  File "/usr/local/lib/python3.10/site-packages/datasets/filesystems/compression.py", line 66, in _get_dirs
    f = {**self._open_with_fsspec().fs.info(self.fo), "name": self.uncompressed_name}
  File "/usr/local/lib/python3.10/site-packages/fsspec/core.py", line 491, in open
    out = open_files(
  File "/usr/local/lib/python3.10/site-packages/fsspec/core.py", line 314, in open_files
    [
  File "/usr/local/lib/python3.10/site-packages/fsspec/core.py", line 315, in <listcomp>
    OpenFile(
  File "/usr/local/lib/python3.10/site-packages/fsspec/core.py", line 78, in __init__
    self.compression = get_compression(path, compression)
  File "/usr/local/lib/python3.10/site-packages/fsspec/core.py", line 544, in get_compression
    raise ValueError(f"Compression type {compression} not supported")
ValueError: Compression type zstd not supported
2025-05-06 16:33:27,737 INFO: Finished training and saved model/tokenizer for Eureka-Lab/PHYBench config default
2025-05-06 16:33:27,738 ERROR: Error in background_training_loop task scheduling: local variable 'merged_model' referenced before assignment
2025-05-06 16:33:28,089 INFO: Preparing data for FreedomIntelligence/medical-o1-reasoning-SFT, config: en
2025-05-06 16:33:28,153 INFO: Preparing data for BramVanroy/CommonCrawl-CreativeCommons, config: v1
2025-05-06 16:33:32,653 INFO: Starting model update for FreedomIntelligence/medical-o1-reasoning-SFT, config: en
2025-05-06 16:33:36,683 INFO: Finished training and saved model/tokenizer for FreedomIntelligence/medical-o1-reasoning-SFT config en
2025-05-06 16:33:36,683 INFO: Starting model update for BramVanroy/CommonCrawl-CreativeCommons, config: v1
2025-05-06 16:33:38,967 INFO: Finished training and saved model/tokenizer for BramVanroy/CommonCrawl-CreativeCommons config v1
2025-05-06 16:34:26,492 INFO: Upload successful.
2025-05-06 16:34:28,175 INFO: Preparing data for Anthropic/values-in-the-wild, config: values_frequencies
2025-05-06 16:34:29,468 INFO: Starting model update for Anthropic/values-in-the-wild, config: values_frequencies
2025-05-06 16:34:30,869 INFO: Finished training and saved model/tokenizer for Anthropic/values-in-the-wild config values_frequencies
2025-05-06 16:34:30,872 ERROR: Error in background_training_loop task scheduling: local variable 'merged_model' referenced before assignment
2025-05-06 16:34:31,335 INFO: Preparing data for zwhe99/DeepMath-103K, config: default
2025-05-06 16:34:36,224 INFO: Starting model update for zwhe99/DeepMath-103K, config: default
2025-05-06 16:34:43,124 INFO: Finished training and saved model/tokenizer for zwhe99/DeepMath-103K config default
2025-05-06 16:34:43,784 INFO: Preparing data for nvidia/When2Call, config: test
2025-05-06 16:34:45,485 INFO: Starting model update for nvidia/When2Call, config: test
2025-05-06 16:34:48,284 INFO: Finished training and saved model/tokenizer for nvidia/When2Call config test
2025-05-06 16:34:48,390 INFO: Preparing data for HuggingFaceFW/fineweb, config: default
2025-05-06 16:35:16,738 INFO: Starting model update for HuggingFaceFW/fineweb, config: default
2025-05-06 16:35:19,184 INFO: Finished training and saved model/tokenizer for HuggingFaceFW/fineweb config default
2025-05-06 16:35:29,000 INFO: Upload successful.
2025-05-06 16:35:31,652 INFO: Preparing data for Amod/mental_health_counseling_conversations, config: default
2025-05-06 16:35:31,699 INFO: Preparing data for Giova-tech/sentiment-analysis-test, config: default
2025-05-06 16:35:34,308 INFO: Starting model update for Giova-tech/sentiment-analysis-test, config: default
2025-05-06 16:35:35,832 INFO: Finished training and saved model/tokenizer for Giova-tech/sentiment-analysis-test config default
2025-05-06 16:35:35,833 INFO: Starting model update for Amod/mental_health_counseling_conversations, config: default
2025-05-06 16:35:35,833 ERROR: Error in background_training_loop task scheduling: local variable 'merged_model' referenced before assignment
2025-05-06 16:35:37,841 INFO: Finished training and saved model/tokenizer for Amod/mental_health_counseling_conversations config default
2025-05-06 16:35:38,679 INFO: Preparing data for Mxode/Chinese-Instruct, config: stem_zh
2025-05-06 16:35:39,297 INFO: Preparing data for syCen/CameraBench, config: default
2025-05-06 16:35:40,980 INFO: Starting model update for Mxode/Chinese-Instruct, config: stem_zh
2025-05-06 16:35:43,130 INFO: Finished training and saved model/tokenizer for Mxode/Chinese-Instruct config stem_zh
2025-05-06 16:35:43,130 INFO: Starting model update for syCen/CameraBench, config: default
2025-05-06 16:35:44,612 INFO: Finished training and saved model/tokenizer for syCen/CameraBench config default
2025-05-06 16:36:31,841 INFO: Upload successful.
2025-05-06 16:36:36,228 INFO: Preparing data for open-r1/OpenR1-Math-220k, config: all
2025-05-06 16:36:36,551 INFO: Preparing data for quotientai/HalluMix, config: default
2025-05-06 16:36:40,037 INFO: Starting model update for quotientai/HalluMix, config: default
2025-05-06 16:36:43,175 INFO: Finished training and saved model/tokenizer for quotientai/HalluMix config default
2025-05-06 16:36:43,176 ERROR: Error in background_training_loop task scheduling: local variable 'merged_model' referenced before assignment
2025-05-06 16:36:43,349 INFO: Starting model update for open-r1/OpenR1-Math-220k, config: all
2025-05-06 16:36:43,350 ERROR: Failed to get configs for LLM360/MegaMath: Dataset 'LLM360/MegaMath' is a gated dataset on the Hub. Visit the dataset page at https://huggingface.co/datasets/LLM360/MegaMath to ask for access.
2025-05-06 16:36:45,068 INFO: Preparing data for Aleph-Alpha/Aleph-Alpha-GermanWeb, config: fineweb2
2025-05-06 16:36:47,376 INFO: Finished training and saved model/tokenizer for open-r1/OpenR1-Math-220k config all
2025-05-06 16:36:52,446 INFO: Starting model update for Aleph-Alpha/Aleph-Alpha-GermanWeb, config: fineweb2
2025-05-06 16:36:57,163 INFO: Finished training and saved model/tokenizer for Aleph-Alpha/Aleph-Alpha-GermanWeb config fineweb2
2025-05-06 16:37:45,107 INFO: Upload successful.
2025-05-06 16:37:45,675 INFO: Preparing data for ZennyKenny/synthetic_vc_financial_decisions_reasoning_dataset, config: default
2025-05-06 16:37:45,680 INFO: Preparing data for qwertychri/sentiment-analysis-test, config: default
2025-05-06 16:37:47,830 INFO: Starting model update for ZennyKenny/synthetic_vc_financial_decisions_reasoning_dataset, config: default
2025-05-06 16:37:50,035 INFO: Finished training and saved model/tokenizer for ZennyKenny/synthetic_vc_financial_decisions_reasoning_dataset config default
2025-05-06 16:37:50,035 INFO: Starting model update for qwertychri/sentiment-analysis-test, config: default
2025-05-06 16:37:50,036 ERROR: Error in background_training_loop task scheduling: local variable 'merged_model' referenced before assignment
2025-05-06 16:37:51,657 INFO: Finished training and saved model/tokenizer for qwertychri/sentiment-analysis-test config default
2025-05-06 16:37:51,753 INFO: Preparing data for openai/gsm8k, config: main
2025-05-06 16:37:52,059 INFO: Preparing data for open-thoughts/OpenThoughts-114k, config: default
2025-05-06 16:37:53,926 INFO: Starting model update for openai/gsm8k, config: main
2025-05-06 16:37:55,591 INFO: Finished training and saved model/tokenizer for openai/gsm8k config main
2025-05-06 16:37:56,166 INFO: Starting model update for open-thoughts/OpenThoughts-114k, config: default
2025-05-06 16:37:59,075 INFO: Finished training and saved model/tokenizer for open-thoughts/OpenThoughts-114k config default
2025-05-06 16:38:47,926 INFO: Upload successful.
2025-05-06 16:38:50,587 INFO: Preparing data for Felipeit/sentiment-analysis-test, config: default
2025-05-06 16:38:50,652 INFO: Preparing data for Riccardoschillaci7/sentiment-analysis-test, config: default
2025-05-06 16:38:52,831 INFO: Starting model update for Riccardoschillaci7/sentiment-analysis-test, config: default
2025-05-06 16:38:54,734 INFO: Finished training and saved model/tokenizer for Riccardoschillaci7/sentiment-analysis-test config default
2025-05-06 16:38:54,735 INFO: Starting model update for Felipeit/sentiment-analysis-test, config: default
2025-05-06 16:38:54,735 ERROR: Error in background_training_loop task scheduling: local variable 'merged_model' referenced before assignment
2025-05-06 16:38:56,381 INFO: Finished training and saved model/tokenizer for Felipeit/sentiment-analysis-test config default
2025-05-06 16:38:56,556 INFO: Preparing data for Cocciadipollo/sentiment-analysis-test, config: default
2025-05-06 16:38:56,925 INFO: Preparing data for Merlinooooo/sentiment-analysis-test, config: default
2025-05-06 16:38:59,252 INFO: Starting model update for Cocciadipollo/sentiment-analysis-test, config: default
2025-05-06 16:39:00,796 INFO: Finished training and saved model/tokenizer for Cocciadipollo/sentiment-analysis-test config default
2025-05-06 16:39:00,797 INFO: Starting model update for Merlinooooo/sentiment-analysis-test, config: default
2025-05-06 16:39:02,326 INFO: Finished training and saved model/tokenizer for Merlinooooo/sentiment-analysis-test config default
2025-05-06 16:39:59,725 INFO: Upload successful.
2025-05-06 16:40:00,379 INFO: Preparing data for nvidia/describe-anything-dataset, config: COCOStuff
2025-05-06 16:40:00,492 INFO: Preparing data for happycircus1/sentiment-analysis-test, config: default
2025-05-06 16:40:02,910 INFO: Starting model update for happycircus1/sentiment-analysis-test, config: default
2025-05-06 16:40:04,558 INFO: Finished training and saved model/tokenizer for happycircus1/sentiment-analysis-test config default
2025-05-06 16:40:04,558 INFO: Starting model update for nvidia/describe-anything-dataset, config: COCOStuff
2025-05-06 16:40:04,558 ERROR: Error in background_training_loop task scheduling: local variable 'merged_model' referenced before assignment
2025-05-06 16:40:07,289 INFO: Preparing data for wikimedia/wikipedia, config: 20231101.ab
2025-05-06 16:40:07,531 INFO: Finished training and saved model/tokenizer for nvidia/describe-anything-dataset config COCOStuff
2025-05-06 16:40:08,176 ERROR: Failed to get configs for Genius-Society/hoyoTTS: Loading Genius-Society/hoyoTTS requires you to execute the dataset script in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.
2025-05-06 16:40:10,192 INFO: Starting model update for wikimedia/wikipedia, config: 20231101.ab
2025-05-06 16:40:12,067 INFO: Finished training and saved model/tokenizer for wikimedia/wikipedia config 20231101.ab
2025-05-06 16:41:05,668 INFO: Upload successful.
2025-05-06 16:41:05,804 WARNING: Repo card metadata block was not found. Setting CardData to empty.
2025-05-06 16:41:06,132 ERROR: Failed to get configs for Genius-Society/wwTTS: Loading Genius-Society/wwTTS requires you to execute the dataset script in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.
2025-05-06 16:41:06,355 INFO: Preparing data for reyavir/PromptEvals, config: default
2025-05-06 16:41:06,429 WARNING: Repo card metadata block was not found. Setting CardData to empty.
2025-05-06 16:41:07,539 INFO: Preparing data for GeneralReasoning/GeneralThought-430K, config: default
2025-05-06 16:41:08,448 INFO: Starting model update for reyavir/PromptEvals, config: default
2025-05-06 16:41:10,496 INFO: Finished training and saved model/tokenizer for reyavir/PromptEvals config default
2025-05-06 16:41:10,496 ERROR: Error in background_training_loop task scheduling: local variable 'merged_model' referenced before assignment
2025-05-06 16:41:10,563 ERROR: Failed to get configs for whitecircle-ai/circleguardbench_public: Dataset 'whitecircle-ai/circleguardbench_public' is a gated dataset on the Hub. Visit the dataset page at https://huggingface.co/datasets/whitecircle-ai/circleguardbench_public to ask for access.
2025-05-06 16:41:15,167 INFO: Starting model update for GeneralReasoning/GeneralThought-430K, config: default
2025-05-06 16:41:20,197 INFO: Finished training and saved model/tokenizer for GeneralReasoning/GeneralThought-430K config default
2025-05-06 16:42:10,057 INFO: Upload successful.
2025-05-06 16:42:11,040 INFO: Preparing data for Liux69/sentiment-analysis-test, config: default
2025-05-06 16:42:14,025 INFO: Starting model update for Liux69/sentiment-analysis-test, config: default
2025-05-06 16:42:16,068 ERROR: Failed to get configs for nvidia/dynpose-100k: No (supported) data files found in nvidia/dynpose-100k
2025-05-06 16:42:16,263 INFO: Finished training and saved model/tokenizer for Liux69/sentiment-analysis-test config default
2025-05-06 16:42:16,267 ERROR: Error in background_training_loop task scheduling: local variable 'merged_model' referenced before assignment
2025-05-06 16:42:16,631 INFO: Preparing data for reasonir/reasonir-data, config: hq
2025-05-06 16:42:16,917 INFO: Preparing data for SWE-bench/SWE-smith, config: default
2025-05-06 16:42:19,700 INFO: Starting model update for reasonir/reasonir-data, config: hq
2025-05-06 16:42:23,851 INFO: Finished training and saved model/tokenizer for reasonir/reasonir-data config hq
2025-05-06 16:42:23,852 INFO: Starting model update for SWE-bench/SWE-smith, config: default
2025-05-06 16:42:27,711 INFO: Finished training and saved model/tokenizer for SWE-bench/SWE-smith config default
2025-05-06 16:43:14,405 INFO: Upload successful.
2025-05-06 16:43:16,748 INFO: Preparing data for ZennyKenny/cosa-benchmark-dataset, config: default
2025-05-06 16:43:16,887 INFO: Preparing data for kindred-soul-ltd/kindred-ecommerce-merchant-deals-dataset, config: default
2025-05-06 16:43:18,940 INFO: Starting model update for kindred-soul-ltd/kindred-ecommerce-merchant-deals-dataset, config: default
2025-05-06 16:43:20,769 INFO: Finished training and saved model/tokenizer for kindred-soul-ltd/kindred-ecommerce-merchant-deals-dataset config default
2025-05-06 16:43:20,770 INFO: Starting model update for ZennyKenny/cosa-benchmark-dataset, config: default
2025-05-06 16:43:20,770 ERROR: Error in background_training_loop task scheduling: local variable 'merged_model' referenced before assignment
2025-05-06 16:43:22,456 INFO: Finished training and saved model/tokenizer for ZennyKenny/cosa-benchmark-dataset config default
2025-05-06 16:43:22,496 INFO: Preparing data for deepmind/aqua_rat, config: raw
2025-05-06 16:43:30,605 INFO: Starting model update for deepmind/aqua_rat, config: raw
2025-05-06 16:43:30,617 INFO: Preparing data for allenai/c4, config: en
2025-05-06 16:43:33,126 INFO: Finished training and saved model/tokenizer for deepmind/aqua_rat config raw
2025-05-06 16:43:38,702 INFO: Starting model update for allenai/c4, config: en
2025-05-06 16:43:40,978 INFO: Finished training and saved model/tokenizer for allenai/c4 config en
2025-05-06 16:44:17,507 INFO: Upload successful.
2025-05-06 16:44:20,888 ERROR: Failed to get configs for gaia-benchmark/GAIA: Dataset 'gaia-benchmark/GAIA' is a gated dataset on the Hub. Visit the dataset page at https://huggingface.co/datasets/gaia-benchmark/GAIA to ask for access.
2025-05-06 16:44:21,431 INFO: Preparing data for HuggingFaceH4/MATH-500, config: default
2025-05-06 16:44:22,903 INFO: Starting model update for HuggingFaceH4/MATH-500, config: default
2025-05-06 16:44:24,553 INFO: Finished training and saved model/tokenizer for HuggingFaceH4/MATH-500 config default
2025-05-06 16:44:24,554 ERROR: Error in background_training_loop task scheduling: local variable 'merged_model' referenced before assignment
2025-05-06 16:44:24,623 ERROR: Failed to get configs for cais/hle: Dataset 'cais/hle' is a gated dataset on the Hub. Visit the dataset page at https://huggingface.co/datasets/cais/hle to ask for access.
2025-05-06 16:44:30,479 INFO: Preparing data for MLCommons/unsupervised_peoples_speech, config: default
2025-05-06 16:44:43,691 ERROR: Error during data preparation for MLCommons/unsupervised_peoples_speech config default: To support encoding audio data, please install 'soundfile'.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/datasets/features/audio.py", line 88, in encode_example
    import soundfile as sf  # soundfile is a dependency of librosa, needed to decode audio files.
ModuleNotFoundError: No module named 'soundfile'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/app/app.py", line 233, in process_and_train
    first_item = await asyncio.to_thread(lambda: next(iter(train_ds_instance), None))
  File "/usr/local/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/user/app/app.py", line 233, in <lambda>
    first_item = await asyncio.to_thread(lambda: next(iter(train_ds_instance), None))
  File "/usr/local/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 2266, in __iter__
    for key, example in ex_iterable:
  File "/usr/local/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 222, in __iter__
    for key_example in islice(self.generate_examples_fn(**gen_kwargs), shard_example_idx_start, None):
  File "/usr/local/lib/python3.10/site-packages/datasets/packaged_modules/generator/generator.py", line 33, in _generate_examples
    yield from enumerate(self.config.generator(**gen_kwargs))
  File "/home/user/app/app.py", line 214, in gen_data_for_cfg
    for ex in dataset_split:
  File "/usr/local/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 2266, in __iter__
    for key, example in ex_iterable:
  File "/usr/local/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 1869, in __iter__
    example = _apply_feature_types_on_example(
  File "/usr/local/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 1779, in _apply_feature_types_on_example
    encoded_example = features.encode_example(example)
  File "/usr/local/lib/python3.10/site-packages/datasets/features/features.py", line 2049, in encode_example
    return encode_nested_example(self, example)
  File "/usr/local/lib/python3.10/site-packages/datasets/features/features.py", line 1292, in encode_nested_example
    {k: encode_nested_example(schema[k], obj.get(k), level=level + 1) for k in schema}
  File "/usr/local/lib/python3.10/site-packages/datasets/features/features.py", line 1292, in <dictcomp>
    {k: encode_nested_example(schema[k], obj.get(k), level=level + 1) for k in schema}
  File "/usr/local/lib/python3.10/site-packages/datasets/features/features.py", line 1362, in encode_nested_example
    return schema.encode_example(obj) if obj is not None else None
  File "/usr/local/lib/python3.10/site-packages/datasets/features/audio.py", line 90, in encode_example
    raise ImportError("To support encoding audio data, please install 'soundfile'.") from err
ImportError: To support encoding audio data, please install 'soundfile'.