{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "131de9a7-0245-4e12-9554-c8f91eec2d21",
   "metadata": {},
   "source": [
    "# Hugging Face Setup\n",
    "\n",
    "Let's quickly make sure HF is set up and that you are able to access downloads from Hugging Face Hub using your Token."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2e0b59f0-1180-4297-94dc-b26115761264",
   "metadata": {},
   "source": [
    "## Python Libraries Install:\n",
    "\n",
    "Note that we use various versions of these libraries throughout the course. Watch the video for each section to see which version to install."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "9a2f2dca-fefc-4f36-91f3-4dbcc07345a6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !pip install transformers diffusers datasets evaluate accelerate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "ac9d74b6-a136-4858-ab8b-1e7791993380",
   "metadata": {},
   "outputs": [],
   "source": [
    "from huggingface_hub import notebook_login"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "0899dcf9-7504-4097-beb8-c42a519305c3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "92c0780a54384af9ae5d07a3f5d4766d",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "VBox(children=(HTML(value='<center> <img\\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "notebook_login()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "7d706a5b-456f-41d1-83a9-98fa397ce90a",
   "metadata": {},
   "outputs": [],
   "source": [
    "from huggingface_hub import scan_cache_dir\n",
    "\n",
    "hf_cache_info = scan_cache_dir()\n",
    "# print(hf_cache_info)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0d0cc79a-6eb3-44c5-b98b-5deb40acd62b",
   "metadata": {},
   "source": [
    "When you work with Hugging Face's Python libraries, such as the `transformers` library, you'll often download pre-trained models and datasets. These downloaded files are stored locally on your machine to avoid repeated downloads and to ensure quick access in future uses. Let's explore where and how these files are stored.\n",
    "\n",
    "## Where Are Hugging Face Models Stored?\n",
    "\n",
    "By default, Hugging Face stores downloaded models in a directory under your home directory. Specifically, it uses a hidden folder named `.cache`. The typical path looks like this:\n",
    "\n",
    "- On Unix-based systems (Linux, macOS):\n",
    "  ```\n",
    "  ~/.cache/huggingface/ \n",
    "  ```\n",
    "\n",
    "- On Windows systems:\n",
    "  ```\n",
    "  C:\\Users\\<YourUsername>\\.cache\\huggingface\\ \n",
    "  ```\n",
    " \n",
    "**NOTE:** `.cache` is hidden by default. You will need to make hidden files visible to see it.\n",
    "\n",
    "----\n",
    "\n",
    "Hidden directories are often used to store configuration files and caches. These directories are typically not shown in default file explorer views. Here's how you can view hidden directories on different operating systems:\n",
    "\n",
    "## Viewing Hidden Directories on Different Operating Systems\n",
    "\n",
    "### macOS\n",
    "\n",
    "On macOS, hidden directories and files (those starting with a dot, such as `.cache`) can be made visible in Finder:\n",
    "\n",
    "1. **Using Finder:**\n",
    "   - Open Finder.\n",
    "   - Press `Command + Shift + .` (period). This will toggle the visibility of hidden files and directories.\n",
    "\n",
    "2. **Using Terminal:**\n",
    "   - Open Terminal.\n",
    "   - To list hidden files in a directory, use the following command:\n",
    "     ```bash\n",
    "     ls -la\n",
    "     ```\n",
    "   - The `-a` flag shows all files, including hidden ones, and the `-l` flag gives a detailed listing.\n",
    "\n",
    "### Linux\n",
    "\n",
    "On Linux, hidden files and directories can be viewed in the file manager or terminal:\n",
    "\n",
    "1. **Using File Manager (e.g., Nautilus):**\n",
    "   - Open your file manager.\n",
    "   - Press `Ctrl + H`. This will toggle the visibility of hidden files and directories.\n",
    "\n",
    "2. **Using Terminal:**\n",
    "   - Open Terminal.\n",
    "   - To list hidden files in a directory, use the following command:\n",
    "     ```bash\n",
    "     ls -la\n",
    "     ```\n",
    "   - The `-a` flag shows all files, including hidden ones, and the `-l` flag gives a detailed listing.\n",
    "\n",
    "### Windows\n",
    "\n",
    "On Windows, hidden files and directories can be viewed in File Explorer:\n",
    "\n",
    "1. **Using File Explorer:**\n",
    "   - Open File Explorer.\n",
    "   - Click on the `View` tab at the top.\n",
    "   - Check the box for `Hidden items` in the Show/hide group. This will toggle the visibility of hidden files and directories.\n",
    "\n",
    "2. **Using Command Prompt:**\n",
    "   - Open Command Prompt.\n",
    "   - To list hidden files in a directory, use the following command:\n",
    "     ```cmd\n",
    "     dir /a\n",
    "     ```\n",
    "   - The `/a` flag lists all files, including hidden ones.\n",
    "\n",
    "## Summary\n",
    "\n",
    "Viewing hidden directories on different operating systems is straightforward:\n",
    "\n",
    "- **macOS:** Press `Command + Shift + .` in Finder or use `ls -la` in Terminal.\n",
    "- **Linux:** Press `Ctrl + H` in the file manager or use `ls -la` in Terminal.\n",
    "- **Windows:** Check `Hidden items` in File Explorer's View tab or use `dir /a` in Command Prompt.\n",
    "\n",
    "These methods allow you to easily access and manage hidden files and directories on your system.\n",
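    "\n",
    "The same check can also be done from Python, which is convenient inside a notebook. The following is a minimal sketch using only the standard library. The helper name `list_hidden_entries` is hypothetical, and it only applies the leading-dot naming convention (Windows marks hidden files with a file attribute instead, although `.cache` still starts with a dot there).\n",
    "\n",
    "```python\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def list_hidden_entries(directory):\n",
    "    # Return the sorted names of entries whose name starts with a dot.\n",
    "    # Path.iterdir() does not skip dot-prefixed entries, unlike ls without -a.\n",
    "    return sorted(p.name for p in Path(directory).iterdir() if p.name.startswith('.'))\n",
    "\n",
    "\n",
    "# Example: check whether the home directory contains a .cache folder.\n",
    "home = Path.home()\n",
    "if home.is_dir():\n",
    "    print('.cache' in list_hidden_entries(home))\n",
    "```\n",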
    "\n",
    "----\n",
    "\n",
    "**Ok, let's move on, back to Hugging Face topics!**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b5346ae4-c22c-4bc6-ab7d-21124699bce8",
   "metadata": {},
   "source": [
    "The Hugging Face Python libraries store downloaded models in a centralized cache directory. This cache system is designed to be shared across various libraries that depend on the Hugging Face Hub. Here is a detailed explanation of where and how these models are stored:\n",
    "\n",
    "## Cache Directory Structure\n",
    "\n",
    "The cache directory is typically located in the user's home directory, but it can be customized using the `cache_dir` argument in methods or by setting the `HF_HOME` or `HF_HUB_CACHE` environment variables. The structure of the cache directory is as follows:\n",
    "\n",
    "```\n",
    "<CACHE_DIR>\n",
    "├─ <MODELS>\n",
    "├─ <DATASETS>\n",
    "├─ <SPACES>\n",
    "```\n",
    "\n",
    "In practice, each cached repository gets a single folder whose name encodes the repository type, the namespace (if applicable), and the repository name. For example:\n",
    "\n",
    "```\n",
    "<CACHE_DIR>\n",
    "├─ models--julien-c--EsperBERTo-small\n",
    "├─ models--lysandrejik--arxiv-nlp\n",
    "├─ models--bert-base-cased\n",
    "├─ datasets--glue\n",
    "├─ datasets--huggingface--DataMeasurementsFiles\n",
    "├─ spaces--dalle-mini--dalle-mini\n",
    "```\n",
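    "\n",
    "The folder names above follow a `<type>s--<namespace>--<name>` pattern. As a minimal sketch, assuming that pattern holds for every entry, the hypothetical helper `parse_repo_folder` below recovers the repository type and ID from a folder name with plain string operations. It is an illustration only and not part of `huggingface_hub`.\n",
    "\n",
    "```python\n",
    "def parse_repo_folder(folder_name):\n",
    "    # Split 'models--julien-c--EsperBERTo-small' into its '--'-separated parts.\n",
    "    repo_type_plural, *id_parts = folder_name.split('--')\n",
    "    if repo_type_plural not in ('models', 'datasets', 'spaces') or not id_parts:\n",
    "        raise ValueError(f'not a cache repo folder: {folder_name!r}')\n",
    "    # Drop the trailing 's' from the type; re-join namespace and name with '/'.\n",
    "    return repo_type_plural[:-1], '/'.join(id_parts)\n",
    "\n",
    "\n",
    "for name in ['models--julien-c--EsperBERTo-small', 'models--bert-base-cased', 'datasets--glue']:\n",
    "    print(parse_repo_folder(name))\n",
    "```\n",
    "\n",
    "Repositories without a namespace, such as `bert-base-cased`, simply have one fewer `--` separator.\n",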
    "\n",
    "## Detailed Folder Structure\n",
    "\n",
    "Each repository folder contains subfolders that store different types of files, such as references, blobs, and snapshots. Here is an example of the folder structure for a dataset:\n",
    "\n",
    "```\n",
    "<CACHE_DIR>\n",
    "├─ datasets--glue\n",
    "│   ├─ refs\n",
    "│   ├─ blobs\n",
    "│   ├─ snapshots\n",
    "```\n",
    "\n",
    "## Managing the Cache\n",
    "\n",
    "### Scanning the Cache\n",
    "\n",
    "To manage and inspect the cache, you can use the `huggingface-cli` tool or the `scan_cache_dir` function from the `huggingface_hub` library. This allows you to see which repositories and revisions are taking up disk space. For example:\n",
    "\n",
    "```python\n",
    "from huggingface_hub import scan_cache_dir\n",
    "\n",
    "hf_cache_info = scan_cache_dir()\n",
    "print(hf_cache_info)\n",
    "```\n",
    "\n",
    "This will return an `HFCacheInfo` object containing details about the cached repositories, their sizes, and any warnings about corrupted caches.\n",
    "\n",
    "### Example Command\n",
    "\n",
    "Using the `huggingface-cli` to scan the cache:\n",
    "\n",
    "```bash\n",
    "huggingface-cli scan-cache\n",
    "```\n",
    "\n",
    "This command will output a detailed report of the cache, including repository IDs, types, sizes, and paths.\n",
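    "\n",
    "To get a feel for what such a report measures, here is a rough standard-library approximation. It is a sketch under stated assumptions, not how `scan_cache_dir` is implemented: the hypothetical helper `repo_sizes` assumes the file contents live in each repository's `blobs` folder, and that the default cache location is `~/.cache/huggingface/hub`.\n",
    "\n",
    "```python\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def repo_sizes(cache_dir):\n",
    "    # Map each repo folder name to the total size in bytes of its blobs/ files.\n",
    "    sizes = {}\n",
    "    for repo_dir in sorted(Path(cache_dir).iterdir()):\n",
    "        blobs = repo_dir / 'blobs'\n",
    "        if not blobs.is_dir():\n",
    "            continue  # skip anything that is not laid out like a cached repo\n",
    "        sizes[repo_dir.name] = sum(f.stat().st_size for f in blobs.iterdir() if f.is_file())\n",
    "    return sizes\n",
    "\n",
    "\n",
    "hub_cache = Path.home() / '.cache' / 'huggingface' / 'hub'  # assumed default location\n",
    "if hub_cache.is_dir():\n",
    "    for name, size in repo_sizes(hub_cache).items():\n",
    "        print(f'{name}: {size / 1e6:.1f} MB')\n",
    "```\n",
    "\n",
    "For real cache management, prefer `scan_cache_dir` or the CLI, which also report revisions and warnings.\n",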
    "\n",
    "## Customizing the Cache Directory\n",
    "\n",
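    "You can customize the cache directory by passing the `cache_dir` argument to download methods or by setting the `HF_HOME` or `HF_HUB_CACHE` environment variables.\n",
    "\n",
    "Separately, libraries built on the Hub can keep other files (preprocessed data, for example) in an assets cache. `cached_assets_path` returns the folder to use for a given library, namespace, and subfolder:\n",
    "\n",
    "```python\n",
    "from huggingface_hub import cached_assets_path\n",
    "\n",
    "path = cached_assets_path(library_name=\"datasets\", namespace=\"SQuAD\", subfolder=\"download\")\n",
    "print(path)\n",
    "```\n",
    "\n",
    "This returns the path to the assets cache for the specified library, namespace, and subfolder.\n",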
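    "\n",
    "For the `HF_HOME` and `HF_HUB_CACHE` environment variables, the sketch below shows one way to work out which hub cache folder would be used. The helper `resolve_hub_cache` is hypothetical and simplified: it assumes `HF_HUB_CACHE` takes precedence over `HF_HOME`, that the hub cache sits in a `hub` subfolder of `HF_HOME`, and it ignores any other variables the library may consult.\n",
    "\n",
    "```python\n",
    "import os\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def resolve_hub_cache(env=None):\n",
    "    # Simplified precedence: HF_HUB_CACHE, then HF_HOME/hub, then the default.\n",
    "    env = os.environ if env is None else env\n",
    "    if env.get('HF_HUB_CACHE'):\n",
    "        return Path(env['HF_HUB_CACHE']).expanduser()\n",
    "    if env.get('HF_HOME'):\n",
    "        return Path(env['HF_HOME']).expanduser() / 'hub'\n",
    "    return Path.home() / '.cache' / 'huggingface' / 'hub'\n",
    "\n",
    "\n",
    "print(resolve_hub_cache())\n",
    "```\n",
    "\n",
    "Environment variables need to be set before the Hugging Face libraries are imported for them to take effect in the same session.\n",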
    "\n",
    "## Conclusion\n",
    "\n",
    "The Hugging Face cache system is designed to efficiently store and manage models, datasets, and other resources. By understanding the structure and management tools available, users can effectively control their cache usage and ensure optimal performance.\n",
    "\n",
    "For more detailed information, you can refer to the Hugging Face documentation on managing the cache system[1][3][5][7].\n",
    "\n",
    "Citations:\n",
    "[1] https://huggingface.co/docs/huggingface_hub/guides/manage-cache\n",
    "[2] https://discuss.huggingface.co/t/model-caching-and-locking/44152\n",
    "[3] https://huggingface.co/docs/huggingface_hub/en/guides/manage-cache\n",
    "[4] https://huggingface.co/docs/hub/en/models\n",
    "[5] https://huggingface.co/docs/huggingface_hub/package_reference/cache\n",
    "[6] https://huggingface.co/docs/hub/en/models-libraries\n",
    "[7] https://huggingface.co/docs/huggingface_hub/en/package_reference/cache\n",
    "[8] https://huggingface.co/docs/hub/en/models-adding-libraries\n",
    "[9] https://discuss.huggingface.co/t/how-to-save-my-model-to-use-it-later/20568\n",
    "[10] https://huggingface.co/docs/transformers/en/main_classes/model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d40ac023-8adb-4206-8954-457f352b4d76",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}