{
"cells": [
{
"cell_type": "markdown",
"id": "131de9a7-0245-4e12-9554-c8f91eec2d21",
"metadata": {},
"source": [
"# Hugging Face Setup\n",
"\n",
"Let's quickly make sure Hugging Face is set up and that you can download from the Hugging Face Hub using your access token."
]
},
{
"cell_type": "markdown",
"id": "2e0b59f0-1180-4297-94dc-b26115761264",
"metadata": {},
"source": [
"## Installing the Python Libraries\n",
"\n",
"Note that we use various versions of these libraries throughout the course. Make sure to watch the video to know which version to use!"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "9a2f2dca-fefc-4f36-91f3-4dbcc07345a6",
"metadata": {},
"outputs": [],
"source": [
"# !pip install transformers diffusers datasets evaluate accelerate"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "ac9d74b6-a136-4858-ab8b-1e7791993380",
"metadata": {},
"outputs": [],
"source": [
"from huggingface_hub import notebook_login"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "0899dcf9-7504-4097-beb8-c42a519305c3",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "92c0780a54384af9ae5d07a3f5d4766d",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"VBox(children=(HTML(value='<center> <img\\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"notebook_login()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "7d706a5b-456f-41d1-83a9-98fa397ce90a",
"metadata": {},
"outputs": [],
"source": [
"from huggingface_hub import scan_cache_dir\n",
"\n",
"hf_cache_info = scan_cache_dir()\n",
"# print(hf_cache_info)"
]
},
{
"cell_type": "markdown",
"id": "0d0cc79a-6eb3-44c5-b98b-5deb40acd62b",
"metadata": {},
"source": [
"When you work with Hugging Face's Python libraries, such as the `transformers` library, you'll often download pre-trained models and datasets. These downloaded files are stored locally on your machine to avoid repeated downloads and to ensure quick access in future uses. Let's explore where and how these files are stored.\n",
"\n",
"## Where Are Hugging Face Models Stored?\n",
"\n",
"By default, Hugging Face stores downloaded files in a folder named `.cache` under your home directory; files downloaded from the Hub go in the `hub` subfolder of the path below. The typical path looks like this:\n",
"\n",
"- On Unix-based systems (Linux, macOS):\n",
" ```\n",
" ~/.cache/huggingface/ \n",
" ```\n",
"\n",
"- On Windows systems:\n",
" ```\n",
" C:\\Users\\<YourUsername>\\.cache\\huggingface\\ \n",
" ```\n",
" \n",
"**NOTE:** On macOS and Linux, `.cache` is hidden by default, so you will need to make hidden files visible to see it.\n",
"\n",
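"Assuming no environment overrides are set, a minimal sketch like the following checks whether that default folder exists on your machine and lists its top-level entries. The variable name `default_hf_dir` is only illustrative.\n",
"\n",
"```python\n",
"from pathlib import Path\n",
"\n",
"# Default location described above; this sketch assumes no environment overrides are set.\n",
"default_hf_dir = Path.home() / '.cache' / 'huggingface'\n",
"\n",
"if default_hf_dir.is_dir():\n",
"    # Show the top-level entries, for example the hub folder that holds downloaded repos\n",
"    for entry in sorted(default_hf_dir.iterdir()):\n",
"        print(entry.name)\n",
"else:\n",
"    print(f'{default_hf_dir} does not exist yet; nothing has been saved there so far.')\n",
"```\n",
"\n",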
"----\n",
"\n",
"Hidden directories are often used to store configuration files and caches. These directories are typically not shown in default file explorer views. Here's how you can view hidden directories on different operating systems:\n",
"\n",
"## Viewing Hidden Directories on Different Operating Systems\n",
"\n",
"### macOS\n",
"\n",
"On macOS, hidden directories and files (those starting with a dot, such as `.cache`) can be made visible in Finder:\n",
"\n",
"1. **Using Finder:**\n",
" - Open Finder.\n",
" - Press `Command + Shift + .` (period). This will toggle the visibility of hidden files and directories.\n",
"\n",
"2. **Using Terminal:**\n",
" - Open Terminal.\n",
" - To list hidden files in a directory, use the following command:\n",
" ```bash\n",
" ls -la\n",
" ```\n",
" - The `-a` flag shows all files, including hidden ones, and the `-l` flag gives a detailed listing.\n",
"\n",
"### Linux\n",
"\n",
"On Linux, hidden files and directories can be viewed in the file manager or terminal:\n",
"\n",
"1. **Using File Manager (e.g., Nautilus):**\n",
" - Open your file manager.\n",
" - Press `Ctrl + H`. This will toggle the visibility of hidden files and directories.\n",
"\n",
"2. **Using Terminal:**\n",
" - Open Terminal.\n",
" - To list hidden files in a directory, use the following command:\n",
" ```bash\n",
" ls -la\n",
" ```\n",
" - The `-a` flag shows all files, including hidden ones, and the `-l` flag gives a detailed listing.\n",
"\n",
"### Windows\n",
"\n",
"On Windows, hidden files and directories can be viewed in File Explorer:\n",
"\n",
"1. **Using File Explorer:**\n",
" - Open File Explorer.\n",
" - Click on the `View` tab at the top.\n",
" - Check the box for `Hidden items` in the Show/hide group. This will toggle the visibility of hidden files and directories.\n",
"\n",
"2. **Using Command Prompt:**\n",
" - Open Command Prompt.\n",
" - To list hidden files in a directory, use the following command:\n",
" ```cmd\n",
" dir /a\n",
" ```\n",
" - The `/a` flag lists all files, including hidden ones.\n",
"\n",
"## Summary\n",
"\n",
"Viewing hidden directories on different operating systems is straightforward:\n",
"\n",
"- **macOS:** Press `Command + Shift + .` in Finder or use `ls -la` in Terminal.\n",
"- **Linux:** Press `Ctrl + H` in the file manager or use `ls -la` in Terminal.\n",
"- **Windows:** Check `Hidden items` in File Explorer's View tab or use `dir /a` in Command Prompt.\n",
"\n",
"These methods allow you to easily access and manage hidden files and directories on your system.\n",
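"\n",
"If you would rather check from Python, here is a small sketch that lists entries whose names start with a dot. The helper name `hidden_entries` is hypothetical, and the dot-prefix test matches the macOS and Linux convention only; Windows marks hidden files with a file attribute instead.\n",
"\n",
"```python\n",
"import tempfile\n",
"from pathlib import Path\n",
"\n",
"\n",
"def hidden_entries(directory):\n",
"    # Return the sorted names of entries whose name starts with a dot\n",
"    return sorted(p.name for p in Path(directory).iterdir() if p.name.startswith('.'))\n",
"\n",
"\n",
"# Demonstrate on a throwaway folder so the example does not depend on your home directory\n",
"with tempfile.TemporaryDirectory() as tmp:\n",
"    (Path(tmp) / '.cache').mkdir()\n",
"    (Path(tmp) / 'notes.txt').write_text('visible file')\n",
"    demo_hidden = hidden_entries(tmp)\n",
"\n",
"print(demo_hidden)\n",
"```\n",
"\n",
"To inspect your own home directory, call `hidden_entries(Path.home())`.\n",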
"\n",
"----\n",
"\n",
"**Ok, let's move on, back to Hugging Face topics!**"
]
},
{
"cell_type": "markdown",
"id": "b5346ae4-c22c-4bc6-ab7d-21124699bce8",
"metadata": {},
"source": [
"The Hugging Face Python libraries store downloaded models in a centralized cache directory. This cache system is designed to be shared across various libraries that depend on the Hugging Face Hub. Here is a detailed explanation of where and how these models are stored:\n",
"\n",
"## Cache Directory Structure\n",
"\n",
"The cache directory is typically located in the user's home directory, but it can be customized using the `cache_dir` argument in methods or by setting the `HF_HOME` or `HF_HUB_CACHE` environment variables. The structure of the cache directory is as follows:\n",
"\n",
"```\n",
"<CACHE_DIR>\n",
"├─ <MODELS>\n",
"├─ <DATASETS>\n",
"├─ <SPACES>\n",
"```\n",
"\n",
"Each entry is a single folder whose name encodes the repository type, the namespace (the user or organization name, if applicable), and the repository name, joined by `--`. For example:\n",
"\n",
"```\n",
"<CACHE_DIR>\n",
"├─ models--julien-c--EsperBERTo-small\n",
"├─ models--lysandrejik--arxiv-nlp\n",
"├─ models--bert-base-cased\n",
"├─ datasets--glue\n",
"├─ datasets--huggingface--DataMeasurementsFiles\n",
"├─ spaces--dalle-mini--dalle-mini\n",
"```\n",
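"\n",
"To make the naming scheme concrete, the sketch below splits such a folder name back into a repository type and a repository ID. The helper `parse_cache_folder_name` is hypothetical and written for illustration; it is not a function from `huggingface_hub`.\n",
"\n",
"```python\n",
"def parse_cache_folder_name(folder_name):\n",
"    # Split 'models--julien-c--EsperBERTo-small' into ('model', 'julien-c/EsperBERTo-small')\n",
"    repo_type_plural, _, rest = folder_name.partition('--')\n",
"    repo_type = repo_type_plural[:-1]  # 'models' -> 'model'\n",
"    repo_id = rest.replace('--', '/')\n",
"    return repo_type, repo_id\n",
"\n",
"\n",
"for name in ['models--julien-c--EsperBERTo-small', 'models--bert-base-cased', 'datasets--glue']:\n",
"    print(name, '->', parse_cache_folder_name(name))\n",
"```\n",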
"\n",
"## Detailed Folder Structure\n",
"\n",
"Each repository folder contains subfolders that store different types of files, such as references, blobs, and snapshots. Here is an example of the folder structure for a dataset:\n",
"\n",
"```\n",
"<CACHE_DIR>\n",
"├─ datasets--glue\n",
"│  ├─ refs\n",
"│  ├─ blobs\n",
"│  ├─ snapshots\n",
"```\n",
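"\n",
"As a rough illustration of walking this layout, the sketch below builds a throwaway folder with the same three subfolders and lists what sits under `snapshots`. The helper `list_snapshots` and the revision names `rev-a` and `rev-b` are made up for the demo; in a real cache the snapshot folders are named after commit hashes.\n",
"\n",
"```python\n",
"import tempfile\n",
"from pathlib import Path\n",
"\n",
"\n",
"def list_snapshots(repo_dir):\n",
"    # Return the sorted names of the revision folders under <repo_dir>/snapshots\n",
"    snapshots_dir = Path(repo_dir) / 'snapshots'\n",
"    if not snapshots_dir.is_dir():\n",
"        return []\n",
"    return sorted(p.name for p in snapshots_dir.iterdir() if p.is_dir())\n",
"\n",
"\n",
"# Build a throwaway folder that mimics the layout above (the revision names are made up)\n",
"with tempfile.TemporaryDirectory() as tmp:\n",
"    repo_dir = Path(tmp) / 'datasets--glue'\n",
"    for sub in ['refs', 'blobs', 'snapshots/rev-a', 'snapshots/rev-b']:\n",
"        (repo_dir / sub).mkdir(parents=True)\n",
"    demo_snapshots = list_snapshots(repo_dir)\n",
"\n",
"print(demo_snapshots)\n",
"```\n",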
"\n",
"## Managing the Cache\n",
"\n",
"### Scanning the Cache\n",
"\n",
"To manage and inspect the cache, you can use the `huggingface-cli` tool or the `scan_cache_dir` function from the `huggingface_hub` library. This allows you to see which repositories and revisions are taking up disk space. For example:\n",
"\n",
"```python\n",
"from huggingface_hub import scan_cache_dir\n",
"\n",
"hf_cache_info = scan_cache_dir()\n",
"print(hf_cache_info)\n",
"```\n",
"\n",
"`scan_cache_dir()` returns an `HFCacheInfo` object with details about the cached repositories, their sizes, and any warnings raised for corrupted cache entries.\n",
"\n",
"### Example Command\n",
"\n",
"Using the `huggingface-cli` to scan the cache:\n",
"\n",
"```bash\n",
"huggingface-cli scan-cache\n",
"```\n",
"\n",
"This command will output a detailed report of the cache, including repository IDs, types, sizes, and paths.\n",
"\n",
"## Customizing the Cache Directory\n",
"\n",
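"You can customize the cache directory by passing the `cache_dir` argument to download methods or by setting environment variables. Libraries built on the Hub can also keep their own extra files in a separate assets cache, whose location `cached_assets_path` returns. For example:\n",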
"\n",
"```python\n",
"from huggingface_hub import cached_assets_path\n",
"\n",
"path = cached_assets_path(library_name=\"datasets\", namespace=\"SQuAD\", subfolder=\"download\")\n",
"print(path)\n",
"```\n",
"\n",
"This returns the path to the assets folder for the specified library, namespace, and subfolder.\n",
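"\n",
"For the environment-variable route, the following sketch shows a simplified view of how the two variables named earlier, `HF_HUB_CACHE` and `HF_HOME`, could determine the Hub cache location. The function `resolve_hub_cache` and the example paths are hypothetical; the real library may consult additional settings.\n",
"\n",
"```python\n",
"from pathlib import Path\n",
"\n",
"\n",
"def resolve_hub_cache(env):\n",
"    # Simplified precedence: HF_HUB_CACHE, then HF_HOME/hub, then the default location\n",
"    if env.get('HF_HUB_CACHE'):\n",
"        return Path(env['HF_HUB_CACHE']).expanduser()\n",
"    if env.get('HF_HOME'):\n",
"        return Path(env['HF_HOME']).expanduser() / 'hub'\n",
"    return Path.home() / '.cache' / 'huggingface' / 'hub'\n",
"\n",
"\n",
"# The paths below are placeholders, not recommendations\n",
"print(resolve_hub_cache({}))\n",
"print(resolve_hub_cache({'HF_HOME': '/data/hf'}))\n",
"print(resolve_hub_cache({'HF_HUB_CACHE': '/data/hub', 'HF_HOME': '/data/hf'}))\n",
"```\n",
"\n",
"In practice, set these variables before importing `huggingface_hub`, since the library reads them when it is first imported.\n",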
"\n",
"## Conclusion\n",
"\n",
"The Hugging Face cache system stores models, datasets, and other resources in one shared location so that they are downloaded only once. Knowing its structure and the tools for inspecting it lets you see what is taking up disk space and move the cache when needed.\n",
"\n",
"For more detailed information, you can refer to the Hugging Face documentation on managing the cache system[1][3][5][7].\n",
"\n",
"Citations:\n",
"[1] https://huggingface.co/docs/huggingface_hub/guides/manage-cache\n",
"[2] https://discuss.huggingface.co/t/model-caching-and-locking/44152\n",
"[3] https://huggingface.co/docs/huggingface_hub/en/guides/manage-cache\n",
"[4] https://huggingface.co/docs/hub/en/models\n",
"[5] https://huggingface.co/docs/huggingface_hub/package_reference/cache\n",
"[6] https://huggingface.co/docs/hub/en/models-libraries\n",
"[7] https://huggingface.co/docs/huggingface_hub/en/package_reference/cache\n",
"[8] https://huggingface.co/docs/hub/en/models-adding-libraries\n",
"[9] https://discuss.huggingface.co/t/how-to-save-my-model-to-use-it-later/20568\n",
"[10] https://huggingface.co/docs/transformers/en/main_classes/model"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d40ac023-8adb-4206-8954-457f352b4d76",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}