2025-05-06 16:30:06,158 INFO: Gradio: Loading model and tokenizer from arnir0/Tiny-LLM (model), arnir0/Tiny-LLM (tokenizer)
2025-05-06 16:30:08,577 INFO: Gradio: Model arnir0/Tiny-LLM and tokenizer loaded.
2025-05-06 16:30:08,578 INFO: Gradio interface started at http://0.0.0.0:7860
2025-05-06 16:30:08,587 INFO: Prometheus metrics server started on port 8000.
2025-05-06 16:30:08,587 INFO: Fetching new list of datasets...
2025-05-06 16:30:41,406 INFO: HTTP Request: GET http://localhost:7860/gradio_api/startup-events "HTTP/1.1 200 OK"
2025-05-06 16:30:41,433 INFO: HTTP Request: HEAD http://localhost:7860/ "HTTP/1.1 200 OK"
2025-05-06 16:31:07,974 INFO: Fetched 379843 datasets to process.
2025-05-06 16:31:08,585 INFO: Preparing data for nvidia/Nemotron-CrossThink, config: default
2025-05-06 16:31:12,688 INFO: Upload successful.
2025-05-06 16:31:12,688 INFO: Preparing data for nvidia/OpenMathReasoning, config: default
2025-05-06 16:31:14,303 INFO: Starting model update for nvidia/Nemotron-CrossThink, config: default
2025-05-06 16:31:17,591 INFO: Finished training and saved model/tokenizer for nvidia/Nemotron-CrossThink config default
2025-05-06 16:31:17,595 ERROR: Error in background_training_loop task scheduling: local variable 'merged_model' referenced before assignment
2025-05-06 16:31:17,970 INFO: Preparing data for nvidia/OpenCodeReasoning, config: split_0
2025-05-06 16:31:18,439 INFO: Starting model update for nvidia/OpenMathReasoning, config: default
2025-05-06 16:31:21,695 INFO: Finished training and saved model/tokenizer for nvidia/OpenMathReasoning config default
2025-05-06 16:31:21,776 ERROR: Failed to get configs for rajpurkarlab/ReXGradient-160K: Dataset 'rajpurkarlab/ReXGradient-160K' is a gated dataset on the Hub. Visit the dataset page at https://huggingface.co/datasets/rajpurkarlab/ReXGradient-160K to ask for access.
2025-05-06 16:31:29,921 INFO: Starting model update for nvidia/OpenCodeReasoning, config: split_0
2025-05-06 16:31:39,156 INFO: Finished training and saved model/tokenizer for nvidia/OpenCodeReasoning config split_0
2025-05-06 16:32:15,293 INFO: Upload successful.
2025-05-06 16:32:17,716 WARNING: Repo card metadata block was not found. Setting CardData to empty.
2025-05-06 16:32:18,098 INFO: Preparing data for deepseek-ai/DeepSeek-ProverBench, config: default
2025-05-06 16:32:18,216 WARNING: Repo card metadata block was not found. Setting CardData to empty.
2025-05-06 16:32:18,265 INFO: Preparing data for fka/awesome-chatgpt-prompts, config: default
2025-05-06 16:32:20,585 INFO: Starting model update for fka/awesome-chatgpt-prompts, config: default
2025-05-06 16:32:22,050 INFO: Finished training and saved model/tokenizer for fka/awesome-chatgpt-prompts config default
2025-05-06 16:32:22,050 INFO: Starting model update for deepseek-ai/DeepSeek-ProverBench, config: default
2025-05-06 16:32:22,051 ERROR: Error in background_training_loop task scheduling: local variable 'merged_model' referenced before assignment
2025-05-06 16:32:23,356 INFO: Finished training and saved model/tokenizer for deepseek-ai/DeepSeek-ProverBench config default
2025-05-06 16:32:23,963 INFO: Preparing data for OpenGVLab/InternVL-Data, config: default
2025-05-06 16:32:24,197 INFO: Preparing data for nvidia/Llama-Nemotron-Post-Training-Dataset, config: SFT
2025-05-06 16:32:26,974 INFO: Starting model update for nvidia/Llama-Nemotron-Post-Training-Dataset, config: SFT
2025-05-06 16:32:29,446 INFO: Finished training and saved model/tokenizer for nvidia/Llama-Nemotron-Post-Training-Dataset config SFT
2025-05-06 16:32:30,119 ERROR: Error during data preparation for OpenGVLab/InternVL-Data config default: JSON parse error: Invalid value. in row 0
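The "local variable 'merged_model' referenced before assignment" error recurs throughout this log and always fires right after a successful training step, which suggests the loop binds `merged_model` on only one branch (e.g. when an adapter merge actually runs) but reads it unconditionally afterwards. A minimal sketch of that bug pattern and its guard, with every name hypothetical:

```python
def finish_update(adapter_trained: bool) -> str:
    """Sketch of the UnboundLocalError pattern; names are hypothetical."""
    merged_model = None  # fix: bind before the conditional branch
    if adapter_trained:
        merged_model = "base+adapter"  # stand-in for an actual merge step
    # Without the initialization above, this read raises
    # UnboundLocalError ("referenced before assignment") whenever
    # adapter_trained is False.
    return merged_model if merged_model is not None else "base"
```

The same guard applies to any variable assigned inside a `try` or `if` and used after it.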
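Gated-dataset failures like the one above are predictable from the Hub's listing metadata, so they can be filtered out before any config lookup is attempted. A sketch assuming each listing entry is a dict carrying an `id` key and an optional `gated` flag (the Hub reports `False`, `"auto"`, or `"manual"`; treating the flag this way is an assumption):

```python
def skip_gated(listings) -> list[str]:
    """Keep only dataset ids whose listing does not mark them gated.

    `listings` is assumed to be an iterable of dicts with an "id" key
    and an optional "gated" flag; any truthy flag means access must be
    requested on the dataset page first.
    """
    return [entry["id"] for entry in listings if not entry.get("gated")]
```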
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/datasets/packaged_modules/json/json.py", line 160, in _generate_tables
    df = pandas_read_json(f)
  File "/usr/local/lib/python3.10/site-packages/datasets/packaged_modules/json/json.py", line 38, in pandas_read_json
    return pd.read_json(path_or_buf, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pandas/io/json/_json.py", line 791, in read_json
    json_reader = JsonReader(
  File "/usr/local/lib/python3.10/site-packages/pandas/io/json/_json.py", line 905, in __init__
    self.data = self._preprocess_data(data)
  File "/usr/local/lib/python3.10/site-packages/pandas/io/json/_json.py", line 917, in _preprocess_data
    data = data.read()
  File "/usr/local/lib/python3.10/site-packages/datasets/utils/file_utils.py", line 827, in read_with_retries
    out = read(*args, **kwargs)
  File "/usr/local/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/app/app.py", line 233, in process_and_train
    first_item = await asyncio.to_thread(lambda: next(iter(train_ds_instance), None))
  File "/usr/local/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/user/app/app.py", line 233, in <lambda>
    first_item = await asyncio.to_thread(lambda: next(iter(train_ds_instance), None))
  File "/usr/local/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 2266, in __iter__
    for key, example in ex_iterable:
  File "/usr/local/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 222, in __iter__
    for key_example in islice(self.generate_examples_fn(**gen_kwargs), shard_example_idx_start, None):
  File "/usr/local/lib/python3.10/site-packages/datasets/packaged_modules/generator/generator.py", line 33, in _generate_examples
    yield from enumerate(self.config.generator(**gen_kwargs))
  File "/home/user/app/app.py", line 214, in gen_data_for_cfg
    for ex in dataset_split:
  File "/usr/local/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 2266, in __iter__
    for key, example in ex_iterable:
  File "/usr/local/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 302, in __iter__
    for key, pa_table in self.generate_tables_fn(**gen_kwargs):
  File "/usr/local/lib/python3.10/site-packages/datasets/packaged_modules/json/json.py", line 163, in _generate_tables
    raise e
  File "/usr/local/lib/python3.10/site-packages/datasets/packaged_modules/json/json.py", line 137, in _generate_tables
    pa_table = paj.read_json(
  File "pyarrow/_json.pyx", line 342, in pyarrow._json.read_json
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: JSON parse error: Invalid value. in row 0
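The root cause in the traceback above is the 0xff start byte: no valid UTF-8 sequence begins with 0xff, so the file the JSON builder opened is binary, not text. Sniffing the first bytes before handing a payload to a JSON parser avoids this nested failure; a stdlib-only heuristic sketch:

```python
def looks_like_json_text(payload: bytes, sample: int = 1024) -> bool:
    """Heuristic pre-check: payload decodes as UTF-8 and starts like JSON.

    A multibyte character split at the sample boundary can cause a false
    negative, which is acceptable for a cheap pre-filter.
    """
    try:
        head = payload[:sample].decode("utf-8")
    except UnicodeDecodeError:
        return False  # e.g. a 0xff first byte: binary data, not JSON text
    return head.lstrip()[:1] in ("{", "[")
```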
2025-05-06 16:33:18,163 INFO: Upload successful.
2025-05-06 16:33:22,552 INFO: Preparing data for Eureka-Lab/PHYBench, config: default
2025-05-06 16:33:22,655 INFO: Preparing data for nyuuzyou/svgfind, config: default
2025-05-06 16:33:24,226 INFO: Starting model update for Eureka-Lab/PHYBench, config: default
2025-05-06 16:33:25,829 ERROR: Error during data preparation for nyuuzyou/svgfind config default: Compression type zstd not supported
Traceback (most recent call last):
  File "/home/user/app/app.py", line 233, in process_and_train
    first_item = await asyncio.to_thread(lambda: next(iter(train_ds_instance), None))
  File "/usr/local/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/user/app/app.py", line 233, in <lambda>
    first_item = await asyncio.to_thread(lambda: next(iter(train_ds_instance), None))
  File "/usr/local/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 2266, in __iter__
    for key, example in ex_iterable:
  File "/usr/local/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 222, in __iter__
    for key_example in islice(self.generate_examples_fn(**gen_kwargs), shard_example_idx_start, None):
  File "/usr/local/lib/python3.10/site-packages/datasets/packaged_modules/generator/generator.py", line 33, in _generate_examples
    yield from enumerate(self.config.generator(**gen_kwargs))
  File "/home/user/app/app.py", line 214, in gen_data_for_cfg
    for ex in dataset_split:
  File "/usr/local/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 2266, in __iter__
    for key, example in ex_iterable:
  File "/usr/local/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 302, in __iter__
    for key, pa_table in self.generate_tables_fn(**gen_kwargs):
  File "/usr/local/lib/python3.10/site-packages/datasets/packaged_modules/json/json.py", line 99, in _generate_tables
    for file_idx, file in enumerate(itertools.chain.from_iterable(files)):
  File "/usr/local/lib/python3.10/site-packages/datasets/utils/track.py", line 49, in __iter__
    for x in self.generator(*self.args):
  File "/usr/local/lib/python3.10/site-packages/datasets/utils/file_utils.py", line 1366, in _iter_from_urlpaths
    elif xisdir(urlpath, download_config=download_config):
  File "/usr/local/lib/python3.10/site-packages/datasets/utils/file_utils.py", line 799, in xisdir
    return fs.isdir(inner_path)
  File "/usr/local/lib/python3.10/site-packages/fsspec/spec.py", line 701, in isdir
    return self.info(path)["type"] == "directory"
  File "/usr/local/lib/python3.10/site-packages/fsspec/archive.py", line 40, in info
    self._get_dirs()
  File "/usr/local/lib/python3.10/site-packages/datasets/filesystems/compression.py", line 66, in _get_dirs
    f = {**self._open_with_fsspec().fs.info(self.fo), "name": self.uncompressed_name}
  File "/usr/local/lib/python3.10/site-packages/fsspec/core.py", line 491, in open
    out = open_files(
  File "/usr/local/lib/python3.10/site-packages/fsspec/core.py", line 314, in open_files
    [
  File "/usr/local/lib/python3.10/site-packages/fsspec/core.py", line 315, in <listcomp>
    OpenFile(
  File "/usr/local/lib/python3.10/site-packages/fsspec/core.py", line 78, in __init__
    self.compression = get_compression(path, compression)
  File "/usr/local/lib/python3.10/site-packages/fsspec/core.py", line 544, in get_compression
    raise ValueError(f"Compression type {compression} not supported")
ValueError: Compression type zstd not supported
2025-05-06 16:33:27,737 INFO: Finished training and saved model/tokenizer for Eureka-Lab/PHYBench config default
2025-05-06 16:33:27,738 ERROR: Error in background_training_loop task scheduling: local variable 'merged_model' referenced before assignment
2025-05-06 16:33:28,089 INFO: Preparing data for FreedomIntelligence/medical-o1-reasoning-SFT, config: en
2025-05-06 16:33:28,153 INFO: Preparing data for BramVanroy/CommonCrawl-CreativeCommons, config: v1
2025-05-06 16:33:32,653 INFO: Starting model update for FreedomIntelligence/medical-o1-reasoning-SFT, config: en
2025-05-06 16:33:36,683 INFO: Finished training and saved model/tokenizer for FreedomIntelligence/medical-o1-reasoning-SFT config en
2025-05-06 16:33:36,683 INFO: Starting model update for BramVanroy/CommonCrawl-CreativeCommons, config: v1
2025-05-06 16:33:38,967 INFO: Finished training and saved model/tokenizer for BramVanroy/CommonCrawl-CreativeCommons config v1
2025-05-06 16:34:26,492 INFO: Upload successful.
2025-05-06 16:34:28,175 INFO: Preparing data for Anthropic/values-in-the-wild, config: values_frequencies
2025-05-06 16:34:29,468 INFO: Starting model update for Anthropic/values-in-the-wild, config: values_frequencies
2025-05-06 16:34:30,869 INFO: Finished training and saved model/tokenizer for Anthropic/values-in-the-wild config values_frequencies
2025-05-06 16:34:30,872 ERROR: Error in background_training_loop task scheduling: local variable 'merged_model' referenced before assignment
2025-05-06 16:34:31,335 INFO: Preparing data for zwhe99/DeepMath-103K, config: default
2025-05-06 16:34:36,224 INFO: Starting model update for zwhe99/DeepMath-103K, config: default
2025-05-06 16:34:43,124 INFO: Finished training and saved model/tokenizer for zwhe99/DeepMath-103K config default
2025-05-06 16:34:43,784 INFO: Preparing data for nvidia/When2Call, config: test
2025-05-06 16:34:45,485 INFO: Starting model update for nvidia/When2Call, config: test
2025-05-06 16:34:48,284 INFO: Finished training and saved model/tokenizer for nvidia/When2Call config test
2025-05-06 16:34:48,390 INFO: Preparing data for HuggingFaceFW/fineweb, config: default
2025-05-06 16:35:16,738 INFO: Starting model update for HuggingFaceFW/fineweb, config: default
2025-05-06 16:35:19,184 INFO: Finished training and saved model/tokenizer for HuggingFaceFW/fineweb config default
2025-05-06 16:35:29,000 INFO: Upload successful.
2025-05-06 16:35:31,652 INFO: Preparing data for Amod/mental_health_counseling_conversations, config: default
2025-05-06 16:35:31,699 INFO: Preparing data for Giova-tech/sentiment-analysis-test, config: default
2025-05-06 16:35:34,308 INFO: Starting model update for Giova-tech/sentiment-analysis-test, config: default
2025-05-06 16:35:35,832 INFO: Finished training and saved model/tokenizer for Giova-tech/sentiment-analysis-test config default
2025-05-06 16:35:35,833 INFO: Starting model update for Amod/mental_health_counseling_conversations, config: default
2025-05-06 16:35:35,833 ERROR: Error in background_training_loop task scheduling: local variable 'merged_model' referenced before assignment
2025-05-06 16:35:37,841 INFO: Finished training and saved model/tokenizer for Amod/mental_health_counseling_conversations config default
2025-05-06 16:35:38,679 INFO: Preparing data for Mxode/Chinese-Instruct, config: stem_zh
2025-05-06 16:35:39,297 INFO: Preparing data for syCen/CameraBench, config: default
2025-05-06 16:35:40,980 INFO: Starting model update for Mxode/Chinese-Instruct, config: stem_zh
2025-05-06 16:35:43,130 INFO: Finished training and saved model/tokenizer for Mxode/Chinese-Instruct config stem_zh
2025-05-06 16:35:43,130 INFO: Starting model update for syCen/CameraBench, config: default
2025-05-06 16:35:44,612 INFO: Finished training and saved model/tokenizer for syCen/CameraBench config default
2025-05-06 16:36:31,841 INFO: Upload successful.
2025-05-06 16:36:36,228 INFO: Preparing data for open-r1/OpenR1-Math-220k, config: all
2025-05-06 16:36:36,551 INFO: Preparing data for quotientai/HalluMix, config: default
2025-05-06 16:36:40,037 INFO: Starting model update for quotientai/HalluMix, config: default
2025-05-06 16:36:43,175 INFO: Finished training and saved model/tokenizer for quotientai/HalluMix config default
2025-05-06 16:36:43,176 ERROR: Error in background_training_loop task scheduling: local variable 'merged_model' referenced before assignment
2025-05-06 16:36:43,349 INFO: Starting model update for open-r1/OpenR1-Math-220k, config: all
2025-05-06 16:36:43,350 ERROR: Failed to get configs for LLM360/MegaMath: Dataset 'LLM360/MegaMath' is a gated dataset on the Hub. Visit the dataset page at https://huggingface.co/datasets/LLM360/MegaMath to ask for access.
2025-05-06 16:36:45,068 INFO: Preparing data for Aleph-Alpha/Aleph-Alpha-GermanWeb, config: fineweb2
2025-05-06 16:36:47,376 INFO: Finished training and saved model/tokenizer for open-r1/OpenR1-Math-220k config all
2025-05-06 16:36:52,446 INFO: Starting model update for Aleph-Alpha/Aleph-Alpha-GermanWeb, config: fineweb2
2025-05-06 16:36:57,163 INFO: Finished training and saved model/tokenizer for Aleph-Alpha/Aleph-Alpha-GermanWeb config fineweb2
2025-05-06 16:37:45,107 INFO: Upload successful.
2025-05-06 16:37:45,675 INFO: Preparing data for ZennyKenny/synthetic_vc_financial_decisions_reasoning_dataset, config: default
2025-05-06 16:37:45,680 INFO: Preparing data for qwertychri/sentiment-analysis-test, config: default
2025-05-06 16:37:47,830 INFO: Starting model update for ZennyKenny/synthetic_vc_financial_decisions_reasoning_dataset, config: default
2025-05-06 16:37:50,035 INFO: Finished training and saved model/tokenizer for ZennyKenny/synthetic_vc_financial_decisions_reasoning_dataset config default
2025-05-06 16:37:50,035 INFO: Starting model update for qwertychri/sentiment-analysis-test, config: default
2025-05-06 16:37:50,036 ERROR: Error in background_training_loop task scheduling: local variable 'merged_model' referenced before assignment
2025-05-06 16:37:51,657 INFO: Finished training and saved model/tokenizer for qwertychri/sentiment-analysis-test config default
2025-05-06 16:37:51,753 INFO: Preparing data for openai/gsm8k, config: main
2025-05-06 16:37:52,059 INFO: Preparing data for open-thoughts/OpenThoughts-114k, config: default
2025-05-06 16:37:53,926 INFO: Starting model update for openai/gsm8k, config: main
2025-05-06 16:37:55,591 INFO: Finished training and saved model/tokenizer for openai/gsm8k config main
2025-05-06 16:37:56,166 INFO: Starting model update for open-thoughts/OpenThoughts-114k, config: default
2025-05-06 16:37:59,075 INFO: Finished training and saved model/tokenizer for open-thoughts/OpenThoughts-114k config default
2025-05-06 16:38:47,926 INFO: Upload successful.
2025-05-06 16:38:50,587 INFO: Preparing data for Felipeit/sentiment-analysis-test, config: default
2025-05-06 16:38:50,652 INFO: Preparing data for Riccardoschillaci7/sentiment-analysis-test, config: default
2025-05-06 16:38:52,831 INFO: Starting model update for Riccardoschillaci7/sentiment-analysis-test, config: default
2025-05-06 16:38:54,734 INFO: Finished training and saved model/tokenizer for Riccardoschillaci7/sentiment-analysis-test config default
2025-05-06 16:38:54,735 INFO: Starting model update for Felipeit/sentiment-analysis-test, config: default
2025-05-06 16:38:54,735 ERROR: Error in background_training_loop task scheduling: local variable 'merged_model' referenced before assignment
2025-05-06 16:38:56,381 INFO: Finished training and saved model/tokenizer for Felipeit/sentiment-analysis-test config default
2025-05-06 16:38:56,556 INFO: Preparing data for Cocciadipollo/sentiment-analysis-test, config: default
2025-05-06 16:38:56,925 INFO: Preparing data for Merlinooooo/sentiment-analysis-test, config: default
2025-05-06 16:38:59,252 INFO: Starting model update for Cocciadipollo/sentiment-analysis-test, config: default
2025-05-06 16:39:00,796 INFO: Finished training and saved model/tokenizer for Cocciadipollo/sentiment-analysis-test config default
2025-05-06 16:39:00,797 INFO: Starting model update for Merlinooooo/sentiment-analysis-test, config: default
2025-05-06 16:39:02,326 INFO: Finished training and saved model/tokenizer for Merlinooooo/sentiment-analysis-test config default
2025-05-06 16:39:59,725 INFO: Upload successful.
2025-05-06 16:40:00,379 INFO: Preparing data for nvidia/describe-anything-dataset, config: COCOStuff
2025-05-06 16:40:00,492 INFO: Preparing data for happycircus1/sentiment-analysis-test, config: default
2025-05-06 16:40:02,910 INFO: Starting model update for happycircus1/sentiment-analysis-test, config: default
2025-05-06 16:40:04,558 INFO: Finished training and saved model/tokenizer for happycircus1/sentiment-analysis-test config default
2025-05-06 16:40:04,558 INFO: Starting model update for nvidia/describe-anything-dataset, config: COCOStuff
2025-05-06 16:40:04,558 ERROR: Error in background_training_loop task scheduling: local variable 'merged_model' referenced before assignment
2025-05-06 16:40:07,289 INFO: Preparing data for wikimedia/wikipedia, config: 20231101.ab
2025-05-06 16:40:07,531 INFO: Finished training and saved model/tokenizer for nvidia/describe-anything-dataset config COCOStuff
2025-05-06 16:40:08,176 ERROR: Failed to get configs for Genius-Society/hoyoTTS: Loading Genius-Society/hoyoTTS requires you to execute the dataset script in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.
2025-05-06 16:40:10,192 INFO: Starting model update for wikimedia/wikipedia, config: 20231101.ab
2025-05-06 16:40:12,067 INFO: Finished training and saved model/tokenizer for wikimedia/wikipedia config 20231101.ab
2025-05-06 16:41:05,668 INFO: Upload successful.
2025-05-06 16:41:05,804 WARNING: Repo card metadata block was not found. Setting CardData to empty.
2025-05-06 16:41:06,132 ERROR: Failed to get configs for Genius-Society/wwTTS: Loading Genius-Society/wwTTS requires you to execute the dataset script in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.
2025-05-06 16:41:06,355 INFO: Preparing data for reyavir/PromptEvals, config: default
2025-05-06 16:41:06,429 WARNING: Repo card metadata block was not found. Setting CardData to empty.
2025-05-06 16:41:07,539 INFO: Preparing data for GeneralReasoning/GeneralThought-430K, config: default
2025-05-06 16:41:08,448 INFO: Starting model update for reyavir/PromptEvals, config: default
2025-05-06 16:41:10,496 INFO: Finished training and saved model/tokenizer for reyavir/PromptEvals config default
2025-05-06 16:41:10,496 ERROR: Error in background_training_loop task scheduling: local variable 'merged_model' referenced before assignment
2025-05-06 16:41:10,563 ERROR: Failed to get configs for whitecircle-ai/circleguardbench_public: Dataset 'whitecircle-ai/circleguardbench_public' is a gated dataset on the Hub. Visit the dataset page at https://huggingface.co/datasets/whitecircle-ai/circleguardbench_public to ask for access.
2025-05-06 16:41:15,167 INFO: Starting model update for GeneralReasoning/GeneralThought-430K, config: default
2025-05-06 16:41:20,197 INFO: Finished training and saved model/tokenizer for GeneralReasoning/GeneralThought-430K config default
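The `trust_remote_code` failures above are deliberate safety behavior: loading these datasets executes Python from the repository on the local machine. Rather than enabling the flag globally, an explicit allow-list keeps the opt-in auditable; a sketch where the list contents are hypothetical:

```python
# Repos whose loading scripts have been reviewed by hand; contents hypothetical.
TRUSTED_SCRIPT_REPOS = frozenset({"example-org/reviewed-dataset"})

def allow_remote_code(repo_id: str) -> bool:
    """Only opt in to remote code for repos reviewed by hand."""
    return repo_id in TRUSTED_SCRIPT_REPOS

# Usage sketch with the datasets API:
#   load_dataset(repo_id, trust_remote_code=allow_remote_code(repo_id))
```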
2025-05-06 16:42:10,057 INFO: Upload successful.
2025-05-06 16:42:11,040 INFO: Preparing data for Liux69/sentiment-analysis-test, config: default
2025-05-06 16:42:14,025 INFO: Starting model update for Liux69/sentiment-analysis-test, config: default
2025-05-06 16:42:16,068 ERROR: Failed to get configs for nvidia/dynpose-100k: No (supported) data files found in nvidia/dynpose-100k
2025-05-06 16:42:16,263 INFO: Finished training and saved model/tokenizer for Liux69/sentiment-analysis-test config default
2025-05-06 16:42:16,267 ERROR: Error in background_training_loop task scheduling: local variable 'merged_model' referenced before assignment
2025-05-06 16:42:16,631 INFO: Preparing data for reasonir/reasonir-data, config: hq
2025-05-06 16:42:16,917 INFO: Preparing data for SWE-bench/SWE-smith, config: default
2025-05-06 16:42:19,700 INFO: Starting model update for reasonir/reasonir-data, config: hq
2025-05-06 16:42:23,851 INFO: Finished training and saved model/tokenizer for reasonir/reasonir-data config hq
2025-05-06 16:42:23,852 INFO: Starting model update for SWE-bench/SWE-smith, config: default
2025-05-06 16:42:27,711 INFO: Finished training and saved model/tokenizer for SWE-bench/SWE-smith config default
2025-05-06 16:43:14,405 INFO: Upload successful.
2025-05-06 16:43:16,748 INFO: Preparing data for ZennyKenny/cosa-benchmark-dataset, config: default
2025-05-06 16:43:16,887 INFO: Preparing data for kindred-soul-ltd/kindred-ecommerce-merchant-deals-dataset, config: default
2025-05-06 16:43:18,940 INFO: Starting model update for kindred-soul-ltd/kindred-ecommerce-merchant-deals-dataset, config: default
2025-05-06 16:43:20,769 INFO: Finished training and saved model/tokenizer for kindred-soul-ltd/kindred-ecommerce-merchant-deals-dataset config default
2025-05-06 16:43:20,770 INFO: Starting model update for ZennyKenny/cosa-benchmark-dataset, config: default
2025-05-06 16:43:20,770 ERROR: Error in background_training_loop task scheduling: local variable 'merged_model' referenced before assignment
2025-05-06 16:43:22,456 INFO: Finished training and saved model/tokenizer for ZennyKenny/cosa-benchmark-dataset config default
2025-05-06 16:43:22,496 INFO: Preparing data for deepmind/aqua_rat, config: raw
2025-05-06 16:43:30,605 INFO: Starting model update for deepmind/aqua_rat, config: raw
2025-05-06 16:43:30,617 INFO: Preparing data for allenai/c4, config: en
2025-05-06 16:43:33,126 INFO: Finished training and saved model/tokenizer for deepmind/aqua_rat config raw
2025-05-06 16:43:38,702 INFO: Starting model update for allenai/c4, config: en
2025-05-06 16:43:40,978 INFO: Finished training and saved model/tokenizer for allenai/c4 config en
2025-05-06 16:44:17,507 INFO: Upload successful.
2025-05-06 16:44:20,888 ERROR: Failed to get configs for gaia-benchmark/GAIA: Dataset 'gaia-benchmark/GAIA' is a gated dataset on the Hub. Visit the dataset page at https://huggingface.co/datasets/gaia-benchmark/GAIA to ask for access.
2025-05-06 16:44:21,431 INFO: Preparing data for HuggingFaceH4/MATH-500, config: default
2025-05-06 16:44:22,903 INFO: Starting model update for HuggingFaceH4/MATH-500, config: default
2025-05-06 16:44:24,553 INFO: Finished training and saved model/tokenizer for HuggingFaceH4/MATH-500 config default
2025-05-06 16:44:24,554 ERROR: Error in background_training_loop task scheduling: local variable 'merged_model' referenced before assignment
2025-05-06 16:44:24,623 ERROR: Failed to get configs for cais/hle: Dataset 'cais/hle' is a gated dataset on the Hub. Visit the dataset page at https://huggingface.co/datasets/cais/hle to ask for access.
2025-05-06 16:44:30,479 INFO: Preparing data for MLCommons/unsupervised_peoples_speech, config: default
2025-05-06 16:44:43,691 ERROR: Error during data preparation for MLCommons/unsupervised_peoples_speech config default: To support encoding audio data, please install 'soundfile'.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/datasets/features/audio.py", line 88, in encode_example
    import soundfile as sf  # soundfile is a dependency of librosa, needed to decode audio files.
ModuleNotFoundError: No module named 'soundfile'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/app/app.py", line 233, in process_and_train
    first_item = await asyncio.to_thread(lambda: next(iter(train_ds_instance), None))
  File "/usr/local/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/user/app/app.py", line 233, in <lambda>
    first_item = await asyncio.to_thread(lambda: next(iter(train_ds_instance), None))
  File "/usr/local/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 2266, in __iter__
    for key, example in ex_iterable:
  File "/usr/local/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 222, in __iter__
    for key_example in islice(self.generate_examples_fn(**gen_kwargs), shard_example_idx_start, None):
  File "/usr/local/lib/python3.10/site-packages/datasets/packaged_modules/generator/generator.py", line 33, in _generate_examples
    yield from enumerate(self.config.generator(**gen_kwargs))
  File "/home/user/app/app.py", line 214, in gen_data_for_cfg
    for ex in dataset_split:
  File "/usr/local/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 2266, in __iter__
    for key, example in ex_iterable:
  File "/usr/local/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 1869, in __iter__
    example = _apply_feature_types_on_example(
  File "/usr/local/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 1779, in _apply_feature_types_on_example
    encoded_example = features.encode_example(example)
  File "/usr/local/lib/python3.10/site-packages/datasets/features/features.py", line 2049, in encode_example
    return encode_nested_example(self, example)
  File "/usr/local/lib/python3.10/site-packages/datasets/features/features.py", line 1292, in encode_nested_example
    {k: encode_nested_example(schema[k], obj.get(k), level=level + 1) for k in schema}
  File "/usr/local/lib/python3.10/site-packages/datasets/features/features.py", line 1292, in <dictcomp>
    {k: encode_nested_example(schema[k], obj.get(k), level=level + 1) for k in schema}
  File "/usr/local/lib/python3.10/site-packages/datasets/features/features.py", line 1362, in encode_nested_example
    return schema.encode_example(obj) if obj is not None else None
  File "/usr/local/lib/python3.10/site-packages/datasets/features/audio.py", line 90, in encode_example
    raise ImportError("To support encoding audio data, please install 'soundfile'.") from err
ImportError: To support encoding audio data, please install 'soundfile'.