AttributeError: module 'datasets.load' has no attribute 'init_dynamic_modules'

#556
by tusharagashe - opened

I am trying to run Ray hyperparameter optimization to fine-tune a cell classification model, and every time I execute my script I get the error below. I followed the basic installation: I created a clean environment and ran 'pip install .' inside the Geneformer folder. To run Ray I also had to pip install hyperopt. With those steps, Ray starts loading the trials, but every trial fails with this error. I have tried to resolve it with various LLMs, and they always have me downgrade to datasets==2.13.0, which leads to compounding errors with pyarrow, numpy, and more. Has anyone dealt with this? I feel like this is a simple package fix, but I cannot get it to work.

Full error:
2025-08-06 06:46:12,364 ERROR tune_controller.py:1331 -- Trial task failed for trial _objective_b720fe25
Traceback (most recent call last):
File "/home/ubuntu/geneformer_env/lib/python3.10/site-packages/ray/air/execution/_internal/event_manager.py", line 110, in resolve_future
result = ray.get(future)
File "/home/ubuntu/geneformer_env/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
return fn(*args, **kwargs)
File "/home/ubuntu/geneformer_env/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
return func(*args, **kwargs)
File "/home/ubuntu/geneformer_env/lib/python3.10/site-packages/ray/_private/worker.py", line 2858, in get
values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
File "/home/ubuntu/geneformer_env/lib/python3.10/site-packages/ray/_private/worker.py", line 958, in get_objects
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(AttributeError): ray::ImplicitFunc.train() (pid=101641, ip=172.27.29.225, actor_id=4a604a9ddd7c524de6a2b0a701000000, repr=_objective)
File "/home/ubuntu/geneformer_env/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 331, in train
raise skipped from exception_cause(skipped)
File "/home/ubuntu/geneformer_env/lib/python3.10/site-packages/ray/air/_internal/util.py", line 107, in run
self._ret = self._target(*self._args, **self._kwargs)
File "/home/ubuntu/geneformer_env/lib/python3.10/site-packages/ray/tune/trainable/function_trainable.py", line 45, in <lambda>
training_func=lambda: self._trainable_func(self.config),
File "/home/ubuntu/geneformer_env/lib/python3.10/site-packages/ray/tune/trainable/function_trainable.py", line 261, in _trainable_func
output = fn()
File "/home/ubuntu/geneformer_env/lib/python3.10/site-packages/transformers/integrations/integration_utils.py", line 396, in dynamic_modules_import_trainable
dynamic_modules_path = os.path.join(datasets.load.init_dynamic_modules(), "__init__.py")
AttributeError: module 'datasets.load' has no attribute 'init_dynamic_modules'

Trial _objective_b720fe25 errored after 0 iterations at 2025-08-06 06:46:12. Total running time: 2s
Error file: /tmp/ray/session_2025-08-06_06-46-05_584242_100689/artifacts/2025-08-06_06-46-09/_objective_2025-08-06_06-46-09/driver_artifacts/_objective_b720fe25_1_learning_rate=0.0002,lr_scheduler_type=polynomial,num_train_epochs=1,per_device_train_batch_size=16,seed=72._2025-08-06_06-46-09/error.txt
^C2025-08-06 06:46:15,346 WARNING tune.py:219 -- Stop signal received (e.g. via SIGINT/Ctrl+C), ending Ray Tune run. This will try to checkpoint the experiment state one last time. Press CTRL+C (or send SIGINT/SIGKILL/SIGTERM) to skip.
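
Before changing any package versions, it can help to confirm which side of the mismatch is present in the environment. The snippet below is a minimal sketch, not part of Geneformer, transformers, or datasets: `module_has_attr` is a hypothetical helper name, and it only checks whether a module can be imported and exposes a given attribute.

```python
import importlib


def module_has_attr(module_name: str, attr: str) -> bool:
    """Return True if `module_name` imports cleanly and exposes `attr`.

    Hypothetical helper for illustration; it is not an API of any library
    mentioned in this thread.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module (or one of its parents) is not installed or not importable.
        return False
    return hasattr(module, attr)


# Example check for the attribute named in the traceback above:
# module_has_attr("datasets.load", "init_dynamic_modules")
```

If the check returns False for `datasets.load` and `init_dynamic_modules`, the installed transformers code is calling a function that the installed datasets release does not provide, which matches the AttributeError in the traceback.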

I was also experiencing this error. I had datasets version 4.0.0 and tried downgrading to 1.18.3 and then 2.12 to get around the error, but as you predicted this led to downstream errors with pyarrow, a rabbit hole I didn't want to go down. I do think I have a workaround, though: replace the line in transformers' integration_utils.py that calls init_dynamic_modules (this is with datasets back at 4.0.0, the version my install originally set up).

Replace:
dynamic_modules_path = os.path.join(datasets.load.init_dynamic_modules(), "__init__.py")

With:
# Path is pathlib.Path; add "from pathlib import Path" if the file does not already import it
modules_dir = datasets.config.HF_MODULES_CACHE
os.makedirs(modules_dir, exist_ok=True)
dynamic_modules_path = os.path.join(modules_dir, "__init__.py")
Path(dynamic_modules_path).touch(exist_ok=True)

Everything appears to run smoothly after that. I'm still troubleshooting some memory errors, but those should be entirely unrelated, as I've gotten some trials to complete.
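
For anyone who wants to try the directory-and-file logic of this workaround in isolation before editing site-packages, here is a minimal sketch. The function name `ensure_modules_init` is hypothetical, and the directory is passed in as a plain argument; in the actual patch described above it would be `datasets.config.HF_MODULES_CACHE`.

```python
import os
from pathlib import Path


def ensure_modules_init(modules_dir: str) -> str:
    """Create `modules_dir` if needed and make sure it holds an __init__.py.

    Mirrors the replacement lines in the workaround above; the name and
    signature are illustrative assumptions, not a transformers or datasets API.
    """
    os.makedirs(modules_dir, exist_ok=True)
    init_path = os.path.join(modules_dir, "__init__.py")
    # touch() creates an empty file, or leaves an existing one in place.
    Path(init_path).touch(exist_ok=True)
    return init_path
```

Calling it twice on the same directory is safe, since both `os.makedirs(..., exist_ok=True)` and `Path.touch(exist_ok=True)` tolerate existing paths. Note that any edit made inside site-packages is lost when transformers is reinstalled or upgraded.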

Thanks for your comments on this. It appears this error is coming from the transformers package, so it would be best to open an issue in their repository.

ctheodoris changed discussion status to closed
