AttributeError: 'GptOssTopKRouter' object has no attribute 'weight'

#6
by TroyDoesAI - opened

(base) troy@BlackSheep:/media/troy/e6fcee36-1c26-46d6-9776-bd28a9dce652$ conda create -n unsloth_env python=3.12 -y
Channels:
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done

Package Plan

environment location: /home/troy/anaconda3/envs/unsloth_env

added / updated specs:
- python=3.12

The following packages will be downloaded:

package                    |            build
---------------------------|-----------------
python-3.12.11             |       h22baa00_0        34.6 MB
wheel-0.45.1               |  py312h06a4308_0         147 KB
------------------------------------------------------------
                                       Total:        34.7 MB

The following NEW packages will be INSTALLED:

_libgcc_mutex pkgs/main/linux-64::_libgcc_mutex-0.1-main
_openmp_mutex pkgs/main/linux-64::_openmp_mutex-5.1-1_gnu
bzip2 pkgs/main/linux-64::bzip2-1.0.8-h5eee18b_6
ca-certificates pkgs/main/linux-64::ca-certificates-2025.7.15-h06a4308_0
expat pkgs/main/linux-64::expat-2.7.1-h6a678d5_0
ld_impl_linux-64 pkgs/main/linux-64::ld_impl_linux-64-2.40-h12ee557_0
libffi pkgs/main/linux-64::libffi-3.4.4-h6a678d5_1
libgcc-ng pkgs/main/linux-64::libgcc-ng-11.2.0-h1234567_1
libgomp pkgs/main/linux-64::libgomp-11.2.0-h1234567_1
libstdcxx-ng pkgs/main/linux-64::libstdcxx-ng-11.2.0-h1234567_1
libuuid pkgs/main/linux-64::libuuid-1.41.5-h5eee18b_0
libxcb pkgs/main/linux-64::libxcb-1.17.0-h9b100fa_0
ncurses pkgs/main/linux-64::ncurses-6.5-h7934f7d_0
openssl pkgs/main/linux-64::openssl-3.0.17-h5eee18b_0
pip pkgs/main/noarch::pip-25.1-pyhc872135_2
pthread-stubs pkgs/main/linux-64::pthread-stubs-0.3-h0ce48e5_1
python pkgs/main/linux-64::python-3.12.11-h22baa00_0
readline pkgs/main/linux-64::readline-8.3-hc2a1206_0
setuptools pkgs/main/linux-64::setuptools-78.1.1-py312h06a4308_0
sqlite pkgs/main/linux-64::sqlite-3.50.2-hb25bd0a_1
tk pkgs/main/linux-64::tk-8.6.14-h993c535_1
tzdata pkgs/main/noarch::tzdata-2025b-h04d1e81_0
wheel pkgs/main/linux-64::wheel-0.45.1-py312h06a4308_0
xorg-libx11 pkgs/main/linux-64::xorg-libx11-1.8.12-h9b100fa_1
xorg-libxau pkgs/main/linux-64::xorg-libxau-1.0.12-h9b100fa_0
xorg-libxdmcp pkgs/main/linux-64::xorg-libxdmcp-1.1.5-h9b100fa_0
xorg-xorgproto pkgs/main/linux-64::xorg-xorgproto-2024.1-h5eee18b_1
xz pkgs/main/linux-64::xz-5.6.4-h5eee18b_1
zlib pkgs/main/linux-64::zlib-1.2.13-h5eee18b_1

Downloading and Extracting Packages:

Preparing transaction: done
Verifying transaction: done
Executing transaction: done

To activate this environment, use

$ conda activate unsloth_env

To deactivate an active environment, use

$ conda deactivate

(base) troy@BlackSheep:/media/troy/e6fcee36-1c26-46d6-9776-bd28a9dce652$ conda activate unsloth_env
(unsloth_env) troy@BlackSheep:/media/troy/e6fcee36-1c26-46d6-9776-bd28a9dce652$ pip install unsloth
Collecting unsloth
Using cached unsloth-2025.8.4-py3-none-any.whl.metadata (47 kB)
Collecting unsloth_zoo>=2025.8.3 (from unsloth)
Using cached unsloth_zoo-2025.8.3-py3-none-any.whl.metadata (9.4 kB)
Collecting torch>=2.4.0 (from unsloth)
Using cached torch-2.8.0-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (30 kB)
Collecting xformers>=0.0.27.post2 (from unsloth)
Using cached xformers-0.0.31.post1-cp39-abi3-manylinux_2_28_x86_64.whl.metadata (1.1 kB)
Collecting bitsandbytes (from unsloth)
Using cached bitsandbytes-0.46.1-py3-none-manylinux_2_24_x86_64.whl.metadata (10 kB)
Collecting triton>=3.0.0 (from unsloth)
Using cached triton-3.4.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (1.7 kB)
Collecting packaging (from unsloth)
Using cached packaging-25.0-py3-none-any.whl.metadata (3.3 kB)
Collecting tyro (from unsloth)
Using cached tyro-0.9.27-py3-none-any.whl.metadata (11 kB)
Collecting transformers!=4.47.0,!=4.52.0,!=4.52.1,!=4.52.2,!=4.52.3,!=4.53.0,>=4.51.3 (from unsloth)
Using cached transformers-4.55.0-py3-none-any.whl.metadata (39 kB)
Collecting datasets<4.0.0,>=3.4.1 (from unsloth)
Using cached datasets-3.6.0-py3-none-any.whl.metadata (19 kB)
Collecting sentencepiece>=0.2.0 (from unsloth)
Using cached sentencepiece-0.2.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.7 kB)
Collecting tqdm (from unsloth)
Using cached tqdm-4.67.1-py3-none-any.whl.metadata (57 kB)
Collecting psutil (from unsloth)
Using cached psutil-7.0.0-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (22 kB)
Requirement already satisfied: wheel>=0.42.0 in /home/troy/anaconda3/envs/unsloth_env/lib/python3.12/site-packages (from unsloth) (0.45.1)
Collecting numpy (from unsloth)
Using cached numpy-2.3.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (62 kB)
Collecting accelerate>=0.34.1 (from unsloth)
Using cached accelerate-1.10.0-py3-none-any.whl.metadata (19 kB)
Collecting trl!=0.15.0,!=0.19.0,!=0.9.0,!=0.9.1,!=0.9.2,!=0.9.3,>=0.7.9 (from unsloth)
Using cached trl-0.21.0-py3-none-any.whl.metadata (11 kB)
Collecting peft!=0.11.0,>=0.7.1 (from unsloth)
Using cached peft-0.17.0-py3-none-any.whl.metadata (14 kB)
Collecting protobuf (from unsloth)
Using cached protobuf-6.31.1-cp39-abi3-manylinux2014_x86_64.whl.metadata (593 bytes)
Collecting huggingface_hub>=0.34.0 (from unsloth)
Using cached huggingface_hub-0.34.4-py3-none-any.whl.metadata (14 kB)
Collecting hf_transfer (from unsloth)
Using cached hf_transfer-0.1.9-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.7 kB)
Collecting diffusers (from unsloth)
Using cached diffusers-0.34.0-py3-none-any.whl.metadata (20 kB)
Collecting torchvision (from unsloth)
Using cached torchvision-0.23.0-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (6.1 kB)
Collecting filelock (from datasets<4.0.0,>=3.4.1->unsloth)
Using cached filelock-3.18.0-py3-none-any.whl.metadata (2.9 kB)
Collecting pyarrow>=15.0.0 (from datasets<4.0.0,>=3.4.1->unsloth)
Using cached pyarrow-21.0.0-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets<4.0.0,>=3.4.1->unsloth)
Using cached dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting pandas (from datasets<4.0.0,>=3.4.1->unsloth)
Using cached pandas-2.3.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (91 kB)
Collecting requests>=2.32.2 (from datasets<4.0.0,>=3.4.1->unsloth)
Using cached requests-2.32.4-py3-none-any.whl.metadata (4.9 kB)
Collecting xxhash (from datasets<4.0.0,>=3.4.1->unsloth)
Using cached xxhash-3.5.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess<0.70.17 (from datasets<4.0.0,>=3.4.1->unsloth)
Using cached multiprocess-0.70.16-py312-none-any.whl.metadata (7.2 kB)
Collecting fsspec<=2025.3.0,>=2023.1.0 (from fsspec[http]<=2025.3.0,>=2023.1.0->datasets<4.0.0,>=3.4.1->unsloth)
Using cached fsspec-2025.3.0-py3-none-any.whl.metadata (11 kB)
Collecting pyyaml>=5.1 (from datasets<4.0.0,>=3.4.1->unsloth)
Using cached PyYAML-6.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.1 kB)
Collecting aiohttp!=4.0.0a0,!=4.0.0a1 (from fsspec[http]<=2025.3.0,>=2023.1.0->datasets<4.0.0,>=3.4.1->unsloth)
Using cached aiohttp-3.12.15-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.7 kB)
Collecting safetensors>=0.4.3 (from accelerate>=0.34.1->unsloth)
Using cached safetensors-0.6.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.1 kB)
Collecting aiohappyeyeballs>=2.5.0 (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets<4.0.0,>=3.4.1->unsloth)
Using cached aiohappyeyeballs-2.6.1-py3-none-any.whl.metadata (5.9 kB)
Collecting aiosignal>=1.4.0 (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets<4.0.0,>=3.4.1->unsloth)
Using cached aiosignal-1.4.0-py3-none-any.whl.metadata (3.7 kB)
Collecting attrs>=17.3.0 (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets<4.0.0,>=3.4.1->unsloth)
Using cached attrs-25.3.0-py3-none-any.whl.metadata (10 kB)
Collecting frozenlist>=1.1.1 (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets<4.0.0,>=3.4.1->unsloth)
Using cached frozenlist-1.7.0-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (18 kB)
Collecting multidict<7.0,>=4.5 (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets<4.0.0,>=3.4.1->unsloth)
Using cached multidict-6.6.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (5.3 kB)
Collecting propcache>=0.2.0 (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets<4.0.0,>=3.4.1->unsloth)
Using cached propcache-0.3.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting yarl<2.0,>=1.17.0 (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets<4.0.0,>=3.4.1->unsloth)
Using cached yarl-1.20.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (73 kB)
Collecting idna>=2.0 (from yarl<2.0,>=1.17.0->aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets<4.0.0,>=3.4.1->unsloth)
Using cached idna-3.10-py3-none-any.whl.metadata (10 kB)
Collecting typing-extensions>=4.2 (from aiosignal>=1.4.0->aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets<4.0.0,>=3.4.1->unsloth)
Using cached typing_extensions-4.14.1-py3-none-any.whl.metadata (3.0 kB)
Collecting hf-xet<2.0.0,>=1.1.3 (from huggingface_hub>=0.34.0->unsloth)
Using cached hf_xet-1.1.7-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (703 bytes)
Collecting charset_normalizer<4,>=2 (from requests>=2.32.2->datasets<4.0.0,>=3.4.1->unsloth)
Using cached charset_normalizer-3.4.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (35 kB)
Collecting urllib3<3,>=1.21.1 (from requests>=2.32.2->datasets<4.0.0,>=3.4.1->unsloth)
Using cached urllib3-2.5.0-py3-none-any.whl.metadata (6.5 kB)
Collecting certifi>=2017.4.17 (from requests>=2.32.2->datasets<4.0.0,>=3.4.1->unsloth)
Using cached certifi-2025.8.3-py3-none-any.whl.metadata (2.4 kB)
Requirement already satisfied: setuptools in /home/troy/anaconda3/envs/unsloth_env/lib/python3.12/site-packages (from torch>=2.4.0->unsloth) (78.1.1)
Collecting sympy>=1.13.3 (from torch>=2.4.0->unsloth)
Using cached sympy-1.14.0-py3-none-any.whl.metadata (12 kB)
Collecting networkx (from torch>=2.4.0->unsloth)
Using cached networkx-3.5-py3-none-any.whl.metadata (6.3 kB)
Collecting jinja2 (from torch>=2.4.0->unsloth)
Using cached jinja2-3.1.6-py3-none-any.whl.metadata (2.9 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.8.93 (from torch>=2.4.0->unsloth)
Using cached nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB)
Collecting nvidia-cuda-runtime-cu12==12.8.90 (from torch>=2.4.0->unsloth)
Using cached nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB)
Collecting nvidia-cuda-cupti-cu12==12.8.90 (from torch>=2.4.0->unsloth)
Using cached nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB)
Collecting nvidia-cudnn-cu12==9.10.2.21 (from torch>=2.4.0->unsloth)
Using cached nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl.metadata (1.8 kB)
Collecting nvidia-cublas-cu12==12.8.4.1 (from torch>=2.4.0->unsloth)
Using cached nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl.metadata (1.7 kB)
Collecting nvidia-cufft-cu12==11.3.3.83 (from torch>=2.4.0->unsloth)
Using cached nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB)
Collecting nvidia-curand-cu12==10.3.9.90 (from torch>=2.4.0->unsloth)
Using cached nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl.metadata (1.7 kB)
Collecting nvidia-cusolver-cu12==11.7.3.90 (from torch>=2.4.0->unsloth)
Using cached nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl.metadata (1.8 kB)
Collecting nvidia-cusparse-cu12==12.5.8.93 (from torch>=2.4.0->unsloth)
Using cached nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.8 kB)
Collecting nvidia-cusparselt-cu12==0.7.1 (from torch>=2.4.0->unsloth)
Using cached nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl.metadata (7.0 kB)
Collecting nvidia-nccl-cu12==2.27.3 (from torch>=2.4.0->unsloth)
Using cached nvidia_nccl_cu12-2.27.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.0 kB)
Collecting nvidia-nvtx-cu12==12.8.90 (from torch>=2.4.0->unsloth)
Using cached nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.8 kB)
Collecting nvidia-nvjitlink-cu12==12.8.93 (from torch>=2.4.0->unsloth)
Using cached nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB)
Collecting nvidia-cufile-cu12==1.13.1.3 (from torch>=2.4.0->unsloth)
Using cached nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB)
Collecting mpmath<1.4,>=1.1.0 (from sympy>=1.13.3->torch>=2.4.0->unsloth)
Using cached mpmath-1.3.0-py3-none-any.whl.metadata (8.6 kB)
Collecting regex!=2019.12.17 (from transformers!=4.47.0,!=4.52.0,!=4.52.1,!=4.52.2,!=4.52.3,!=4.53.0,>=4.51.3->unsloth)
Using cached regex-2025.7.34-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (40 kB)
Collecting tokenizers<0.22,>=0.21 (from transformers!=4.47.0,!=4.52.0,!=4.52.1,!=4.52.2,!=4.52.3,!=4.53.0,>=4.51.3->unsloth)
Using cached tokenizers-0.21.4-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Collecting cut_cross_entropy (from unsloth_zoo>=2025.8.3->unsloth)
Using cached cut_cross_entropy-25.1.1-py3-none-any.whl.metadata (9.3 kB)
Collecting pillow (from unsloth_zoo>=2025.8.3->unsloth)
Using cached pillow-11.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (9.0 kB)
Collecting msgspec (from unsloth_zoo>=2025.8.3->unsloth)
Using cached msgspec-0.19.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.9 kB)
Collecting torch>=2.4.0 (from unsloth)
Using cached torch-2.7.1-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (29 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.6.77 (from torch>=2.4.0->unsloth)
Using cached nvidia_cuda_nvrtc_cu12-12.6.77-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.6.77 (from torch>=2.4.0->unsloth)
Using cached nvidia_cuda_runtime_cu12-12.6.77-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.6.80 (from torch>=2.4.0->unsloth)
Using cached nvidia_cuda_cupti_cu12-12.6.80-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.5.1.17 (from torch>=2.4.0->unsloth)
Using cached nvidia_cudnn_cu12-9.5.1.17-py3-none-manylinux_2_28_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12 (from nvidia-cudnn-cu12==9.10.2.21->torch>=2.4.0->unsloth)
Using cached nvidia_cublas_cu12-12.6.4.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.3.0.4 (from torch>=2.4.0->unsloth)
Using cached nvidia_cufft_cu12-11.3.0.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-curand-cu12==10.3.7.77 (from torch>=2.4.0->unsloth)
Using cached nvidia_curand_cu12-10.3.7.77-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cusolver-cu12==11.7.1.2 (from torch>=2.4.0->unsloth)
Using cached nvidia_cusolver_cu12-11.7.1.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cusparse-cu12 (from nvidia-cusolver-cu12==11.7.3.90->torch>=2.4.0->unsloth)
Using cached nvidia_cusparse_cu12-12.5.4.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cusparselt-cu12==0.6.3 (from torch>=2.4.0->unsloth)
Using cached nvidia_cusparselt_cu12-0.6.3-py3-none-manylinux2014_x86_64.whl.metadata (6.8 kB)
Collecting nvidia-nccl-cu12==2.26.2 (from torch>=2.4.0->unsloth)
Using cached nvidia_nccl_cu12-2.26.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.0 kB)
Collecting nvidia-nvtx-cu12==12.6.77 (from torch>=2.4.0->unsloth)
Using cached nvidia_nvtx_cu12-12.6.77-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-nvjitlink-cu12 (from nvidia-cufft-cu12==11.3.3.83->torch>=2.4.0->unsloth)
Using cached nvidia_nvjitlink_cu12-12.6.85-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufile-cu12==1.11.1.6 (from torch>=2.4.0->unsloth)
Using cached nvidia_cufile_cu12-1.11.1.6-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.5 kB)
Collecting triton>=3.0.0 (from unsloth)
Using cached triton-3.3.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (1.5 kB)
Collecting importlib_metadata (from diffusers->unsloth)
Using cached importlib_metadata-8.7.0-py3-none-any.whl.metadata (4.8 kB)
Collecting zipp>=3.20 (from importlib_metadata->diffusers->unsloth)
Using cached zipp-3.23.0-py3-none-any.whl.metadata (3.6 kB)
Collecting MarkupSafe>=2.0 (from jinja2->torch>=2.4.0->unsloth)
Using cached MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.0 kB)
Collecting python-dateutil>=2.8.2 (from pandas->datasets<4.0.0,>=3.4.1->unsloth)
Using cached python_dateutil-2.9.0.post0-py2.py3-none-any.whl.metadata (8.4 kB)
Collecting pytz>=2020.1 (from pandas->datasets<4.0.0,>=3.4.1->unsloth)
Using cached pytz-2025.2-py2.py3-none-any.whl.metadata (22 kB)
Collecting tzdata>=2022.7 (from pandas->datasets<4.0.0,>=3.4.1->unsloth)
Using cached tzdata-2025.2-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting six>=1.5 (from python-dateutil>=2.8.2->pandas->datasets<4.0.0,>=3.4.1->unsloth)
Using cached six-1.17.0-py2.py3-none-any.whl.metadata (1.7 kB)
INFO: pip is looking at multiple versions of torchvision to determine which version is compatible with other requirements. This could take a while.
Collecting torchvision (from unsloth)
Using cached torchvision-0.22.1-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (6.1 kB)
Collecting docstring-parser>=0.15 (from tyro->unsloth)
Using cached docstring_parser-0.17.0-py3-none-any.whl.metadata (3.5 kB)
Collecting rich>=11.1.0 (from tyro->unsloth)
Using cached rich-14.1.0-py3-none-any.whl.metadata (18 kB)
Collecting shtab>=1.5.6 (from tyro->unsloth)
Using cached shtab-1.7.2-py3-none-any.whl.metadata (7.4 kB)
Collecting typeguard>=4.0.0 (from tyro->unsloth)
Using cached typeguard-4.4.4-py3-none-any.whl.metadata (3.3 kB)
Collecting markdown-it-py>=2.2.0 (from rich>=11.1.0->tyro->unsloth)
Using cached markdown_it_py-3.0.0-py3-none-any.whl.metadata (6.9 kB)
Collecting pygments<3.0.0,>=2.13.0 (from rich>=11.1.0->tyro->unsloth)
Using cached pygments-2.19.2-py3-none-any.whl.metadata (2.5 kB)
Collecting mdurl~=0.1 (from markdown-it-py>=2.2.0->rich>=11.1.0->tyro->unsloth)
Using cached mdurl-0.1.2-py3-none-any.whl.metadata (1.6 kB)
Using cached unsloth-2025.8.4-py3-none-any.whl (306 kB)
Using cached datasets-3.6.0-py3-none-any.whl (491 kB)
Using cached dill-0.3.8-py3-none-any.whl (116 kB)
Using cached fsspec-2025.3.0-py3-none-any.whl (193 kB)
Using cached multiprocess-0.70.16-py312-none-any.whl (146 kB)
Using cached accelerate-1.10.0-py3-none-any.whl (374 kB)
Using cached numpy-2.3.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (16.6 MB)
Using cached aiohttp-3.12.15-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB)
Using cached multidict-6.6.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (256 kB)
Using cached yarl-1.20.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (355 kB)
Using cached aiohappyeyeballs-2.6.1-py3-none-any.whl (15 kB)
Using cached aiosignal-1.4.0-py3-none-any.whl (7.5 kB)
Using cached attrs-25.3.0-py3-none-any.whl (63 kB)
Using cached frozenlist-1.7.0-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (241 kB)
Using cached huggingface_hub-0.34.4-py3-none-any.whl (561 kB)
Using cached hf_xet-1.1.7-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.2 MB)
Using cached idna-3.10-py3-none-any.whl (70 kB)
Using cached packaging-25.0-py3-none-any.whl (66 kB)
Using cached peft-0.17.0-py3-none-any.whl (503 kB)
Using cached propcache-0.3.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (224 kB)
Using cached pyarrow-21.0.0-cp312-cp312-manylinux_2_28_x86_64.whl (42.8 MB)
Using cached PyYAML-6.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (767 kB)
Using cached requests-2.32.4-py3-none-any.whl (64 kB)
Using cached charset_normalizer-3.4.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (148 kB)
Using cached urllib3-2.5.0-py3-none-any.whl (129 kB)
Using cached certifi-2025.8.3-py3-none-any.whl (161 kB)
Using cached safetensors-0.6.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (485 kB)
Using cached sentencepiece-0.2.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Using cached sympy-1.14.0-py3-none-any.whl (6.3 MB)
Using cached mpmath-1.3.0-py3-none-any.whl (536 kB)
Using cached tqdm-4.67.1-py3-none-any.whl (78 kB)
Using cached transformers-4.55.0-py3-none-any.whl (11.3 MB)
Using cached tokenizers-0.21.4-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
Using cached regex-2025.7.34-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (801 kB)
Using cached trl-0.21.0-py3-none-any.whl (511 kB)
Using cached typing_extensions-4.14.1-py3-none-any.whl (43 kB)
Using cached unsloth_zoo-2025.8.3-py3-none-any.whl (176 kB)
Using cached xformers-0.0.31.post1-cp39-abi3-manylinux_2_28_x86_64.whl (117.1 MB)
Using cached torch-2.7.1-cp312-cp312-manylinux_2_28_x86_64.whl (821.0 MB)
Using cached nvidia_cublas_cu12-12.6.4.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (393.1 MB)
Using cached nvidia_cuda_cupti_cu12-12.6.80-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (8.9 MB)
Using cached nvidia_cuda_nvrtc_cu12-12.6.77-py3-none-manylinux2014_x86_64.whl (23.7 MB)
Using cached nvidia_cuda_runtime_cu12-12.6.77-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (897 kB)
Using cached nvidia_cudnn_cu12-9.5.1.17-py3-none-manylinux_2_28_x86_64.whl (571.0 MB)
Using cached nvidia_cufft_cu12-11.3.0.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (200.2 MB)
Using cached nvidia_cufile_cu12-1.11.1.6-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (1.1 MB)
Using cached nvidia_curand_cu12-10.3.7.77-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (56.3 MB)
Using cached nvidia_cusolver_cu12-11.7.1.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (158.2 MB)
Using cached nvidia_cusparse_cu12-12.5.4.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (216.6 MB)
Using cached nvidia_cusparselt_cu12-0.6.3-py3-none-manylinux2014_x86_64.whl (156.8 MB)
Using cached nvidia_nccl_cu12-2.26.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (201.3 MB)
Using cached nvidia_nvjitlink_cu12-12.6.85-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (19.7 MB)
Using cached nvidia_nvtx_cu12-12.6.77-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (89 kB)
Using cached triton-3.3.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (155.7 MB)
Using cached bitsandbytes-0.46.1-py3-none-manylinux_2_24_x86_64.whl (72.9 MB)
Using cached cut_cross_entropy-25.1.1-py3-none-any.whl (22 kB)
Using cached diffusers-0.34.0-py3-none-any.whl (3.8 MB)
Using cached filelock-3.18.0-py3-none-any.whl (16 kB)
Using cached hf_transfer-0.1.9-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.6 MB)
Using cached importlib_metadata-8.7.0-py3-none-any.whl (27 kB)
Using cached zipp-3.23.0-py3-none-any.whl (10 kB)
Using cached jinja2-3.1.6-py3-none-any.whl (134 kB)
Using cached MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (23 kB)
Using cached msgspec-0.19.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (213 kB)
Using cached networkx-3.5-py3-none-any.whl (2.0 MB)
Using cached pandas-2.3.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.0 MB)
Using cached python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
Using cached pytz-2025.2-py2.py3-none-any.whl (509 kB)
Using cached six-1.17.0-py2.py3-none-any.whl (11 kB)
Using cached tzdata-2025.2-py2.py3-none-any.whl (347 kB)
Using cached pillow-11.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (6.6 MB)
Using cached protobuf-6.31.1-cp39-abi3-manylinux2014_x86_64.whl (321 kB)
Using cached psutil-7.0.0-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (277 kB)
Using cached torchvision-0.22.1-cp312-cp312-manylinux_2_28_x86_64.whl (7.5 MB)
Using cached tyro-0.9.27-py3-none-any.whl (129 kB)
Using cached docstring_parser-0.17.0-py3-none-any.whl (36 kB)
Using cached rich-14.1.0-py3-none-any.whl (243 kB)
Using cached pygments-2.19.2-py3-none-any.whl (1.2 MB)
Using cached markdown_it_py-3.0.0-py3-none-any.whl (87 kB)
Using cached mdurl-0.1.2-py3-none-any.whl (10.0 kB)
Using cached shtab-1.7.2-py3-none-any.whl (14 kB)
Using cached typeguard-4.4.4-py3-none-any.whl (34 kB)
Using cached xxhash-3.5.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (194 kB)
Installing collected packages: sentencepiece, pytz, nvidia-cusparselt-cu12, mpmath, zipp, xxhash, urllib3, tzdata, typing-extensions, triton, tqdm, sympy, six, shtab, safetensors, regex, pyyaml, pygments, pyarrow, psutil, protobuf, propcache, pillow, packaging, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufile-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, networkx, multidict, msgspec, mdurl, MarkupSafe, idna, hf-xet, hf_transfer, fsspec, frozenlist, filelock, docstring-parser, dill, charset_normalizer, certifi, attrs, aiohappyeyeballs, yarl, typeguard, requests, python-dateutil, nvidia-cusparse-cu12, nvidia-cufft-cu12, nvidia-cudnn-cu12, multiprocess, markdown-it-py, jinja2, importlib_metadata, aiosignal, rich, pandas, nvidia-cusolver-cu12, huggingface_hub, aiohttp, tyro, torch, tokenizers, diffusers, xformers, transformers, torchvision, datasets, cut_cross_entropy, bitsandbytes, accelerate, trl, peft, unsloth_zoo, unsloth
Successfully installed MarkupSafe-3.0.2 accelerate-1.10.0 aiohappyeyeballs-2.6.1 aiohttp-3.12.15 aiosignal-1.4.0 attrs-25.3.0 bitsandbytes-0.46.1 certifi-2025.8.3 charset_normalizer-3.4.2 cut_cross_entropy-25.1.1 datasets-3.6.0 diffusers-0.34.0 dill-0.3.8 docstring-parser-0.17.0 filelock-3.18.0 frozenlist-1.7.0 fsspec-2025.3.0 hf-xet-1.1.7 hf_transfer-0.1.9 huggingface_hub-0.34.4 idna-3.10 importlib_metadata-8.7.0 jinja2-3.1.6 markdown-it-py-3.0.0 mdurl-0.1.2 mpmath-1.3.0 msgspec-0.19.0 multidict-6.6.3 multiprocess-0.70.16 networkx-3.5 numpy-2.3.2 nvidia-cublas-cu12-12.6.4.1 nvidia-cuda-cupti-cu12-12.6.80 nvidia-cuda-nvrtc-cu12-12.6.77 nvidia-cuda-runtime-cu12-12.6.77 nvidia-cudnn-cu12-9.5.1.17 nvidia-cufft-cu12-11.3.0.4 nvidia-cufile-cu12-1.11.1.6 nvidia-curand-cu12-10.3.7.77 nvidia-cusolver-cu12-11.7.1.2 nvidia-cusparse-cu12-12.5.4.2 nvidia-cusparselt-cu12-0.6.3 nvidia-nccl-cu12-2.26.2 nvidia-nvjitlink-cu12-12.6.85 nvidia-nvtx-cu12-12.6.77 packaging-25.0 pandas-2.3.1 peft-0.17.0 pillow-11.3.0 propcache-0.3.2 protobuf-6.31.1 psutil-7.0.0 pyarrow-21.0.0 pygments-2.19.2 python-dateutil-2.9.0.post0 pytz-2025.2 pyyaml-6.0.2 regex-2025.7.34 requests-2.32.4 rich-14.1.0 safetensors-0.6.2 sentencepiece-0.2.0 shtab-1.7.2 six-1.17.0 sympy-1.14.0 tokenizers-0.21.4 torch-2.7.1 torchvision-0.22.1 tqdm-4.67.1 transformers-4.55.0 triton-3.3.1 trl-0.21.0 typeguard-4.4.4 typing-extensions-4.14.1 tyro-0.9.27 tzdata-2025.2 unsloth-2025.8.4 unsloth_zoo-2025.8.3 urllib3-2.5.0 xformers-0.0.31.post1 xxhash-3.5.0 yarl-1.20.1 zipp-3.23.0
(unsloth_env) troy@BlackSheep:/media/troy/e6fcee36-1c26-46d6-9776-bd28a9dce652$ python3 gpt_oss_(20b)_fine_tuning.py
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.8.4: Fast Gpt_Oss patching. Transformers: 4.55.0.
   \\   /|    NVIDIA GeForce RTX 3090. Num GPUs = 1. Max memory: 23.537 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.7.1+cu126. CUDA: 8.6. CUDA Toolkit: 12.6. Triton: 3.3.1
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.31.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Traceback (most recent call last):
File "/media/troy/e6fcee36-1c26-46d6-9776-bd28a9dce652/gpt_oss_(20b)_fine_tuning.py", line 381, in <module>
main()
File "/media/troy/e6fcee36-1c26-46d6-9776-bd28a9dce652/gpt_oss_(20b)_fine_tuning.py", line 285, in main
model, tokenizer = FastLanguageModel.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/troy/anaconda3/envs/unsloth_env/lib/python3.12/site-packages/unsloth/models/loader.py", line 340, in from_pretrained
return FastModel.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/troy/anaconda3/envs/unsloth_env/lib/python3.12/site-packages/unsloth/models/loader.py", line 812, in from_pretrained
model, tokenizer = FastBaseModel.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/troy/anaconda3/envs/unsloth_env/lib/python3.12/site-packages/unsloth/models/vision.py", line 444, in from_pretrained
model = auto_model.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/troy/anaconda3/envs/unsloth_env/lib/python3.12/site-packages/transformers/models/auto/auto_factory.py", line 600, in from_pretrained
return model_class.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/troy/anaconda3/envs/unsloth_env/lib/python3.12/site-packages/transformers/modeling_utils.py", line 316, in _wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/troy/anaconda3/envs/unsloth_env/lib/python3.12/site-packages/transformers/modeling_utils.py", line 5061, in from_pretrained
) = cls._load_pretrained_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/troy/anaconda3/envs/unsloth_env/lib/python3.12/site-packages/transformers/modeling_utils.py", line 5378, in _load_pretrained_model
model._initialize_missing_keys(checkpoint_keys, ignore_mismatched_sizes, is_quantized)
File "/home/troy/anaconda3/envs/unsloth_env/lib/python3.12/site-packages/transformers/modeling_utils.py", line 5950, in _initialize_missing_keys
self.initialize_weights()
File "/home/troy/anaconda3/envs/unsloth_env/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/troy/anaconda3/envs/unsloth_env/lib/python3.12/site-packages/transformers/modeling_utils.py", line 2989, in initialize_weights
self.smart_apply(self._initialize_weights)
File "/home/troy/anaconda3/envs/unsloth_env/lib/python3.12/site-packages/transformers/modeling_utils.py", line 2980, in smart_apply
module.smart_apply(module._initialize_weights)
File "/home/troy/anaconda3/envs/unsloth_env/lib/python3.12/site-packages/transformers/modeling_utils.py", line 2982, in smart_apply
module.smart_apply(fn)
File "/home/troy/anaconda3/envs/unsloth_env/lib/python3.12/site-packages/transformers/modeling_utils.py", line 2982, in smart_apply
module.smart_apply(fn)
File "/home/troy/anaconda3/envs/unsloth_env/lib/python3.12/site-packages/transformers/modeling_utils.py", line 2982, in smart_apply
module.smart_apply(fn)
[Previous line repeated 1 more time]
File "/home/troy/anaconda3/envs/unsloth_env/lib/python3.12/site-packages/transformers/modeling_utils.py", line 2983, in smart_apply
fn(self)
File "/home/troy/anaconda3/envs/unsloth_env/lib/python3.12/site-packages/transformers/modeling_utils.py", line 2957, in _initialize_weights
self._init_weights(module)
File "/home/troy/anaconda3/envs/unsloth_env/lib/python3.12/site-packages/transformers/models/gpt_oss/modeling_gpt_oss.py", line 419, in _init_weights
module.weight.data.normal_(mean=0.0, std=std)
^^^^^^^^^^^^^
File "/home/troy/anaconda3/envs/unsloth_env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1940, in __getattr__
raise AttributeError(
AttributeError: 'GptOssTopKRouter' object has no attribute 'weight'
(unsloth_env) troy@BlackSheep:/media/troy/e6fcee36-1c26-46d6-9776-bd28a9dce652$ python3 gpt_oss_(20b)_fine_tuning.py
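
For anyone else hitting this, here is a small sketch for collecting the versions of the packages named in the log above in one go, so they can be pasted into a reply. It only uses the standard library; the package tuple and the function name `installed_versions` are my own choices for illustration, not part of Unsloth or Transformers.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages that appear in the install log above (assumed list; edit as needed).
PACKAGES = ("unsloth", "unsloth_zoo", "transformers", "torch", "triton", "xformers")


def installed_versions(packages=PACKAGES):
    """Return a dict mapping each package name to its version or 'not installed'."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # The package is absent from the active environment.
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver}")
```

Run it inside the same conda environment as the failing script, otherwise it reports the versions of a different interpreter.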

Having the same issue. Did you download it manually from this page by any chance?

Nope, I downloaded it with the Hugging Face CLI into the hub folder under .cache.
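
In case it helps to confirm where that download landed, below is a rough sketch of how the hub cache folder is resolved, as far as I understand the documented defaults: `HF_HUB_CACHE` wins if set, then `HF_HOME/hub`, then `~/.cache/huggingface/hub`. The helper names and the repo id in the usage line are hypothetical placeholders, not something taken from this thread.

```python
import os
from pathlib import Path


def hub_cache_dir(env=None):
    """Resolve the hub cache directory from environment variables (assumed defaults)."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser() / "hub"
    xdg = env.get("XDG_CACHE_HOME")
    base = Path(xdg).expanduser() if xdg else Path.home() / ".cache"
    return base / "huggingface" / "hub"


def cached_repo_dir(repo_id, env=None):
    """Folder a model repo is stored under, e.g. models--org--name."""
    return hub_cache_dir(env) / ("models--" + repo_id.replace("/", "--"))


if __name__ == "__main__":
    # "some-org/some-model" is a placeholder; substitute the repo you downloaded.
    path = cached_repo_dir("some-org/some-model")
    print(path, "exists" if path.exists() else "missing")
```

If the folder is missing, the files were probably downloaded somewhere else, for example with an explicit local directory.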
