Spaces: Running
Commit · 4145e94
1 Parent(s): c3413c4
feat: Expand Hugging Face module to comprehensive deep-dive
- 9 in-depth concept sections: Transformers, Pipelines, Tokenizers,
Datasets, Trainer API, Accelerate, Model Hub, Libraries, Spaces
- 7 comprehensive code examples: Pipelines, Tokenizers, Datasets,
Model Loading (basic/fp16/4bit/flash), Trainer, Gradio, Hub API
- 6 expert interview questions covering from_pretrained vs pipeline,
device_map sharding, Arrow internals, chat templates, gated models,
safetensors security
- GenAI-AgenticAI/app.js +257 -42
GenAI-AgenticAI/app.js
CHANGED
|
@@ -169,81 +169,296 @@ attn_layer0 = outputs.attentions[<span class="number">0</span>]
|
|
| 169 |
'huggingface': {
|
| 170 |
concepts: `
|
| 171 |
<div class="section">
|
| 172 |
-
<h2>🤗 Hugging Face Ecosystem</h2>
|
| 173 |
<div class="info-box">
|
| 174 |
<div class="box-title">⚡ The GitHub of AI</div>
|
| 175 |
-
<div class="box-content">Hugging Face (HF) is the central hub for the ML community. With
|
| 176 |
</div>
|
| 177 |
-
|
| 178 |
<table>
|
| 179 |
-
<tr><th>
|
| 180 |
-
<tr><td><
|
| 181 |
-
<tr><td><
|
| 182 |
-
<tr><td><
|
| 183 |
-
<tr><td><code>peft</code></td><td>Parameter-efficient fine-tuning</td><td>LoraConfig, get_peft_model</td></tr>
|
| 184 |
-
<tr><td><code>accelerate</code></td><td>Distributed training / mixed precision</td><td>Accelerator, prepare()</td></tr>
|
| 185 |
-
<tr><td><code>huggingface_hub</code></td><td>Interact with Model Hub</td><td>hf_hub_download, push_to_hub</td></tr>
|
| 186 |
</table>
|
| 187 |
-
<
|
| 188 |
-
|
| 189 |
-
|
| 190 |
-
|
| 191 |
-
|
| 192 |
-
|
| 193 |
</div>`,
|
| 194 |
code: `
|
| 195 |
<div class="section">
|
| 196 |
-
<h2>💻 Hugging Face Code Examples</h2>
|
| 197 |
-
|
| 198 |
<div class="code-block"><span class="keyword">from</span> transformers <span class="keyword">import</span> pipeline
|
| 199 |
|
| 200 |
-
<span class="comment"># Text
|
| 201 |
gen = pipeline(<span class="string">"text-generation"</span>, model=<span class="string">"meta-llama/Llama-3.2-1B-Instruct"</span>)
|
| 202 |
result = gen(<span class="string">"Explain RAG in one paragraph:"</span>, max_new_tokens=<span class="number">200</span>)
|
| 203 |
<span class="function">print</span>(result[<span class="number">0</span>][<span class="string">"generated_text"</span>])
|
| 204 |
|
| 205 |
-
<span class="comment"># Sentiment
|
| 206 |
sa = pipeline(<span class="string">"sentiment-analysis"</span>)
|
| 207 |
-
<span class="function">print</span>(sa(<span class="string">"
|
| 208 |
|
| 209 |
-
<
|
| 210 |
-
summ = pipeline(<span class="string">"summarization"</span>, model=<span class="string">"facebook/bart-large-cnn"</span>)
|
| 211 |
-
<span class="function">print</span>(summ(long_article, max_length=<span class="number">130</span>))</div>
|
| 212 |
-
<h3>Loading Models with BitsAndBytes Quantization</h3>
|
| 213 |
<div class="code-block"><span class="keyword">from</span> transformers <span class="keyword">import</span> AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
|
| 214 |
<span class="keyword">import</span> torch
|
| 215 |
|
| 216 |
bnb_config = BitsAndBytesConfig(
|
| 217 |
load_in_4bit=<span class="keyword">True</span>,
|
| 218 |
-
bnb_4bit_quant_type=<span class="string">"nf4"</span>,
|
| 219 |
bnb_4bit_compute_dtype=torch.bfloat16,
|
| 220 |
-
bnb_4bit_use_double_quant=<span class="keyword">True</span>
|
| 221 |
)
|
| 222 |
-
|
| 223 |
model = AutoModelForCausalLM.from_pretrained(
|
| 224 |
<span class="string">"meta-llama/Llama-3.1-8B-Instruct"</span>,
|
| 225 |
quantization_config=bnb_config,
|
| 226 |
device_map=<span class="string">"auto"</span>
|
| 227 |
)
|
| 228 |
-
tokenizer = AutoTokenizer.from_pretrained(<span class="string">"meta-llama/Llama-3.1-8B-Instruct"</span>)</div>
|
| 229 |
-
<h3>Push Model to Hub</h3>
|
| 230 |
-
<div class="code-block"><span class="keyword">from</span> huggingface_hub <span class="keyword">import</span> HfApi
|
| 231 |
-
<span class="keyword">from</span> transformers <span class="keyword">import</span> AutoModelForCausalLM
|
| 232 |
|
| 233 |
-
<span class="comment">#
|
| 234 |
-
model
|
| 235 |
-
|
| 236 |
|
| 237 |
-
<span class="comment">#
|
| 238 |
-
|
| 239 |
</div>`,
|
| 240 |
interview: `
|
| 241 |
<div class="section">
|
| 242 |
-
<h2>🎯 Hugging Face Interview Questions</h2>
|
| 243 |
-
<div class="interview-box"><strong>Q1: What's the difference between <code>from_pretrained</code> and <code>pipeline</code>?</strong><p><strong>Answer:</strong> <code>pipeline()</code> is a high-level convenience wrapper
|
| 244 |
-
<div class="interview-box"><strong>Q2: What is <code>device_map="auto"</code>?</strong><p><strong>Answer:</strong> It uses the <code>accelerate</code> library to automatically
|
| 245 |
-
<div class="interview-box"><strong>Q3:
|
| 246 |
-
<div class="interview-box"><strong>Q4:
|
| 247 |
</div>`
|
| 248 |
},
|
| 249 |
'finetuning': {
|
| 169 |
'huggingface': {
|
| 170 |
concepts: `
|
| 171 |
<div class="section">
|
| 172 |
+
<h2>🤗 Hugging Face Deep Dive — The Complete Ecosystem</h2>
|
| 173 |
<div class="info-box">
|
| 174 |
<div class="box-title">⚡ The GitHub of AI</div>
|
| 175 |
+
<div class="box-content">Hugging Face (HF) is the central hub for the ML community. With <strong>700,000+ models</strong>, <strong>150,000+ datasets</strong>, and 15+ libraries, it's the standard toolchain for modern AI — from experimentation to production deployment. Understanding HF deeply is essential for any GenAI practitioner.</div>
|
| 176 |
</div>
|
| 177 |
+
|
| 178 |
+
<h3>1. Transformers Library — The Core Engine</h3>
|
| 179 |
+
<p>The <code>transformers</code> library provides a unified API to load, run, and fine-tune any model architecture. It wraps 200+ architectures (GPT, LLaMA, Mistral, Gemma, T5, BERT, ViT, Whisper, etc.) behind <strong>Auto Classes</strong> that detect architecture automatically from <code>config.json</code>.</p>
|
| 180 |
+
<table>
|
| 181 |
+
<tr><th>AutoClass</th><th>Use Case</th><th>Example Models</th></tr>
|
| 182 |
+
<tr><td>AutoModelForCausalLM</td><td>Text generation (decoder-only)</td><td>LLaMA, GPT-2, Mistral, Gemma</td></tr>
|
| 183 |
+
<tr><td>AutoModelForSeq2SeqLM</td><td>Translation, summarization</td><td>T5, BART, mT5, Flan-T5</td></tr>
|
| 184 |
+
<tr><td>AutoModelForSequenceClassification</td><td>Text classification, sentiment</td><td>BERT, RoBERTa, DeBERTa</td></tr>
|
| 185 |
+
<tr><td>AutoModelForTokenClassification</td><td>NER, POS tagging</td><td>BERT-NER, SpanBERT</td></tr>
|
| 186 |
+
<tr><td>AutoModelForQuestionAnswering</td><td>Extractive QA</td><td>BERT-QA, RoBERTa-QA</td></tr>
|
| 187 |
+
<tr><td>AutoModel (base)</td><td>Embeddings, custom heads</td><td>Any backbone</td></tr>
|
| 188 |
+
</table>
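The Auto-class dispatch described above — pick the concrete architecture from the `model_type` field in `config.json` — can be sketched in plain Python. This is a toy illustration with a stand-in registry, not the real transformers internals:

```python
import json

# Stand-in registry: model_type (from config.json) -> concrete class name.
# The real library maintains a much larger mapping per Auto class.
MODEL_REGISTRY = {"llama": "LlamaForCausalLM", "gpt2": "GPT2LMHeadModel"}

def resolve_architecture(config_json: str) -> str:
    """Mimic what AutoModelForCausalLM does: read model_type, look up the class."""
    config = json.loads(config_json)
    return MODEL_REGISTRY[config["model_type"]]

arch = resolve_architecture('{"model_type": "llama", "hidden_size": 4096}')
print(arch)  # LlamaForCausalLM
```

This is why a single `AutoModelForCausalLM.from_pretrained(...)` call works across hundreds of architectures: the repo's `config.json` decides which class is actually instantiated.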
|
| 189 |
+
<div class="callout tip">
|
| 190 |
+
<div class="callout-title">💡 Key from_pretrained() Arguments</div>
|
| 191 |
+
<p><code>torch_dtype=torch.bfloat16</code> — half precision, saves 50% memory<br>
|
| 192 |
+
<code>device_map="auto"</code> — auto-shard across GPUs/CPU/disk<br>
|
| 193 |
+
<code>load_in_4bit=True</code> — 4-bit quantization via BitsAndBytes<br>
|
| 194 |
+
<code>attn_implementation="flash_attention_2"</code> — use FlashAttention for 2-4x faster inference<br>
|
| 195 |
+
<code>trust_remote_code=True</code> — needed for custom architectures with code on the Hub</p>
|
| 196 |
+
</div>
|
| 197 |
+
|
| 198 |
+
<h3>2. Pipelines — 20+ Tasks in One Line</h3>
|
| 199 |
+
<p>The <code>pipeline()</code> function wraps tokenization + model + post-processing into a single call. Under the hood: tokenize → model forward pass → decode/format output. Supports these key tasks:</p>
|
| 200 |
+
<table>
|
| 201 |
+
<tr><th>Task</th><th>Pipeline Name</th><th>Default Model</th></tr>
|
| 202 |
+
<tr><td>Text Generation</td><td><code>"text-generation"</code></td><td>gpt2</td></tr>
|
| 203 |
+
<tr><td>Sentiment Analysis</td><td><code>"sentiment-analysis"</code></td><td>distilbert-sst2</td></tr>
|
| 204 |
+
<tr><td>Named Entity Recognition</td><td><code>"ner"</code></td><td>dbmdz/bert-large-NER</td></tr>
|
| 205 |
+
<tr><td>Summarization</td><td><code>"summarization"</code></td><td>sshleifer/distilbart-cnn</td></tr>
|
| 206 |
+
<tr><td>Translation</td><td><code>"translation_en_to_fr"</code></td><td>Helsinki-NLP/opus-mt</td></tr>
|
| 207 |
+
<tr><td>Zero-Shot Classification</td><td><code>"zero-shot-classification"</code></td><td>facebook/bart-large-mnli</td></tr>
|
| 208 |
+
<tr><td>Feature Extraction (Embeddings)</td><td><code>"feature-extraction"</code></td><td>sentence-transformers</td></tr>
|
| 209 |
+
<tr><td>Image Classification</td><td><code>"image-classification"</code></td><td>google/vit-base</td></tr>
|
| 210 |
+
<tr><td>Speech Recognition</td><td><code>"automatic-speech-recognition"</code></td><td>openai/whisper-base</td></tr>
|
| 211 |
+
<tr><td>Text-to-Image</td><td><code>"text-to-image"</code></td><td>stabilityai/sdxl</td></tr>
|
| 212 |
+
</table>
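The tokenize → forward → post-process flow described above can be sketched with stand-in components. The lambdas below are hypothetical placeholders so the sketch runs without downloading a model; they are not the real pipeline internals:

```python
def run_pipeline(text, tokenizer, model, postprocess):
    inputs = tokenizer(text)      # 1. tokenize the raw text
    outputs = model(inputs)       # 2. model forward pass
    return postprocess(outputs)   # 3. decode / format the output

# Stand-in components (placeholders, not real transformers objects):
result = run_pipeline(
    "great movie",
    tokenizer=lambda t: t.lower().split(),
    model=lambda toks: {"label": "POSITIVE", "score": 0.99},
    postprocess=lambda out: [out],
)
print(result)  # [{'label': 'POSITIVE', 'score': 0.99}]
```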
|
| 213 |
+
|
| 214 |
+
<h3>3. Tokenizers Library — Rust-Powered Speed</h3>
|
| 215 |
+
<p>The <code>tokenizers</code> library is written in Rust with Python bindings. It's 10-100x faster than pure-Python tokenizers. Key tokenization algorithms used by modern LLMs:</p>
|
| 216 |
<table>
|
| 217 |
+
<tr><th>Algorithm</th><th>Used By</th><th>How It Works</th></tr>
|
| 218 |
+
<tr><td>BPE (Byte-Pair Encoding)</td><td>GPT-2, GPT-4, LLaMA</td><td>Repeatedly merges most frequent byte pairs. "unbelievable" → ["un", "believ", "able"]</td></tr>
|
| 219 |
+
<tr><td>SentencePiece (Unigram)</td><td>T5, ALBERT, XLNet</td><td>Statistical model that finds optimal subword segmentation probabilistically</td></tr>
|
| 220 |
+
<tr><td>WordPiece</td><td>BERT, DistilBERT</td><td>Greedy algorithm; splits by maximizing likelihood. Uses "##" prefix for sub-tokens</td></tr>
|
| 221 |
</table>
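The "merge the most frequent pair" idea behind BPE can be shown in a few lines of plain Python. This is a toy training step on symbol tuples, not the real Rust tokenizers library:

```python
from collections import Counter

def most_frequent_pair(words):
    """words: list of symbol tuples, e.g. [('l','o','w'), ...]."""
    pairs = Counter()
    for w in words:
        for a, b in zip(w, w[1:]):
            pairs[(a, b)] += 1
    return max(pairs, key=pairs.get)   # most frequent adjacent pair

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = []
    for w in words:
        out, i = [], 0
        while i < len(w):
            if i + 1 < len(w) and (w[i], w[i + 1]) == pair:
                out.append(w[i] + w[i + 1])   # fuse the pair
                i += 2
            else:
                out.append(w[i])
                i += 1
        merged.append(tuple(out))
    return merged

words = [tuple("lower"), tuple("lowest"), tuple("low")]
pair = most_frequent_pair(words)      # ('l', 'o') — appears in all three words
words = merge_pair(words, pair)
print(words[2])  # ('lo', 'w')
```

Real BPE training repeats this merge step thousands of times, and the learned merge list is exactly what ships in a model's tokenizer files.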
|
| 222 |
+
<div class="callout warning">
|
| 223 |
+
<div class="callout-title">⚠️ Tokenizer Gotchas</div>
|
| 224 |
+
<p>• Numbers tokenize unpredictably: "1234" might be 1-4 tokens depending on the model<br>
|
| 225 |
+
• Whitespace matters: " hello" and "hello" produce different tokens in GPT<br>
|
| 226 |
+
• Non-English languages use more tokens per word (higher cost per concept)<br>
|
| 227 |
+
• Always use the model's own tokenizer — never mix tokenizers between models</p>
|
| 228 |
+
</div>
|
| 229 |
+
|
| 230 |
+
<h3>4. Datasets Library — Apache Arrow Under the Hood</h3>
|
| 231 |
+
<p><code>datasets</code> uses <strong>Apache Arrow</strong> for columnar, memory-mapped storage. A 100GB dataset can be iterated without loading into RAM. Key features:</p>
|
| 232 |
+
<p><strong>Memory Mapping:</strong> Data stays on disk; only accessed rows are loaded into memory. <strong>Streaming:</strong> <code>load_dataset(..., streaming=True)</code> returns an iterable — process terabytes with constant memory. <strong>Map/Filter:</strong> Apply transformations with automatic caching and multiprocessing. <strong>Hub Integration:</strong> 150,000+ datasets available via <code>load_dataset("dataset_name")</code>.</p>
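The memory-mapping mechanism Arrow relies on can be demonstrated with the Python stdlib alone — the OS pages in only the bytes you actually touch. A minimal sketch with a stand-in file (not the datasets library itself):

```python
import mmap
import os
import tempfile

# Write a stand-in "dataset" file; pretend it is far bigger than RAM.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"row0\nrow1\nrow2\n" * 1000)

# Memory-map it: no bytes are read until a slice is actually accessed,
# and only the touched pages are faulted into memory.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        first_row = bytes(mm[:5])   # faults in one page, not the whole file

os.remove(path)
print(first_row)  # b'row0\n'
```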
|
| 233 |
+
|
| 234 |
+
<h3>5. Trainer API — High-Level Training Loop</h3>
|
| 235 |
+
<p>The <code>Trainer</code> class handles: training loop, evaluation, checkpointing, logging (to TensorBoard/W&B), mixed precision, gradient accumulation, distributed training, and early stopping. You just provide model + dataset + TrainingArguments. For instruction-tuning LLMs, use <strong>TRL's SFTTrainer</strong> (built on top of Trainer) which handles chat templates and packing automatically.</p>
|
| 236 |
+
|
| 237 |
+
<h3>6. Accelerate — Distributed Training Made Easy</h3>
|
| 238 |
+
<p><code>accelerate</code> abstracts away multi-GPU, TPU, and mixed-precision complexity. Write your training loop once; run on 1 GPU or 64 GPUs with zero code changes. Key feature: <code>Accelerator</code> class wraps your model, optimizer, and dataloader. It handles data sharding, gradient synchronization, and device placement automatically.</p>
|
| 239 |
+
|
| 240 |
+
<h3>7. Model Hub — Everything Is a Git Repo</h3>
|
| 241 |
+
<p>Every model on HF Hub is a <strong>Git LFS repo</strong> containing: <code>config.json</code> (architecture), <code>model.safetensors</code> (weights), <code>tokenizer.json</code>, and a <code>README.md</code> (model card). You can push your own models with <code>model.push_to_hub()</code>. The Hub supports: model versioning (Git branches/tags), automatic model cards, gated access (license agreements), and API inference endpoints.</p>
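The repo contents listed above look roughly like this on disk (file names follow HF convention; exact files vary by model, and large models shard their weights):

```text
my-model/
├── config.json             # architecture + hyperparameters (drives the Auto classes)
├── model.safetensors       # weights (sharded for large models)
├── tokenizer.json          # fast-tokenizer vocab + merges
├── tokenizer_config.json   # tokenizer settings, chat template
├── generation_config.json  # default sampling parameters
└── README.md               # model card (YAML metadata + documentation)
```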
|
| 242 |
+
|
| 243 |
+
<h3>8. Additional HF Libraries</h3>
|
| 244 |
+
<table>
|
| 245 |
+
<tr><th>Library</th><th>Purpose</th><th>Key Feature</th></tr>
|
| 246 |
+
<tr><td><code>peft</code></td><td>Parameter-efficient fine-tuning</td><td>LoRA, QLoRA, Adapters, Prompt Tuning</td></tr>
|
| 247 |
+
<tr><td><code>trl</code></td><td>RLHF and alignment training</td><td>SFTTrainer, DPOTrainer, PPOTrainer, RewardTrainer</td></tr>
|
| 248 |
+
<tr><td><code>diffusers</code></td><td>Image/video generation</td><td>Stable Diffusion, SDXL, ControlNet, IP-Adapter</td></tr>
|
| 249 |
+
<tr><td><code>evaluate</code></td><td>Metrics computation</td><td>BLEU, ROUGE, accuracy, perplexity, and 100+ metrics</td></tr>
|
| 250 |
+
<tr><td><code>gradio</code></td><td>Build ML demos</td><td>Web UI for any model in 5 lines of code</td></tr>
|
| 251 |
+
<tr><td><code>smolagents</code></td><td>Lightweight AI agents</td><td>Code-based tool calling, HF model integration</td></tr>
|
| 252 |
+
<tr><td><code>safetensors</code></td><td>Safe model format</td><td>Fast, safe, and efficient tensor serialization (replaces pickle)</td></tr>
|
| 253 |
+
<tr><td><code>huggingface_hub</code></td><td>Hub API client</td><td>Download files, push models, create repos, manage spaces</td></tr>
|
| 254 |
+
</table>
|
| 255 |
+
|
| 256 |
+
<h3>9. Spaces β Deploy ML Apps Free</h3>
|
| 257 |
+
<p>HF Spaces lets you deploy <strong>Gradio</strong> or <strong>Streamlit</strong> apps on managed infrastructure. Free CPU tier for demos; upgrade to T4 ($0.60/hr) or A100 ($3.09/hr) for GPU workloads. Spaces support Docker, static HTML, and custom environments. They auto-build from a Git repo with a simple <code>requirements.txt</code>. Ideal for: model demos, portfolio projects, internal tools, and quick prototypes.</p>
|
| 258 |
</div>`,
|
| 259 |
code: `
|
| 260 |
<div class="section">
|
| 261 |
+
<h2>💻 Hugging Face — Comprehensive Code Examples</h2>
|
| 262 |
+
|
| 263 |
+
<h3>1. Pipelines — Every Task</h3>
|
| 264 |
<div class="code-block"><span class="keyword">from</span> transformers <span class="keyword">import</span> pipeline
|
| 265 |
|
| 266 |
+
<span class="comment"># ─── Text Generation ───</span>
|
| 267 |
gen = pipeline(<span class="string">"text-generation"</span>, model=<span class="string">"meta-llama/Llama-3.2-1B-Instruct"</span>)
|
| 268 |
result = gen(<span class="string">"Explain RAG in one paragraph:"</span>, max_new_tokens=<span class="number">200</span>)
|
| 269 |
<span class="function">print</span>(result[<span class="number">0</span>][<span class="string">"generated_text"</span>])
|
| 270 |
|
| 271 |
+
<span class="comment"># ─── Sentiment Analysis ───</span>
|
| 272 |
sa = pipeline(<span class="string">"sentiment-analysis"</span>)
|
| 273 |
+
<span class="function">print</span>(sa(<span class="string">"Hugging Face is amazing!"</span>))
|
| 274 |
+
<span class="comment"># [{'label': 'POSITIVE', 'score': 0.9998}]</span>
|
| 275 |
+
|
| 276 |
+
<span class="comment"># ─── Named Entity Recognition ───</span>
|
| 277 |
+
ner = pipeline(<span class="string">"ner"</span>, grouped_entities=<span class="keyword">True</span>)
|
| 278 |
+
<span class="function">print</span>(ner(<span class="string">"Elon Musk founded SpaceX in California"</span>))
|
| 279 |
+
<span class="comment"># [{'entity_group': 'PER', 'word': 'Elon Musk'}, ...]</span>
|
| 280 |
+
|
| 281 |
+
<span class="comment"># ─── Zero-Shot Classification (no training needed!) ───</span>
|
| 282 |
+
zsc = pipeline(<span class="string">"zero-shot-classification"</span>)
|
| 283 |
+
result = zsc(<span class="string">"I need to fix a bug in my Python code"</span>,
|
| 284 |
+
candidate_labels=[<span class="string">"programming"</span>, <span class="string">"cooking"</span>, <span class="string">"sports"</span>])
|
| 285 |
+
<span class="function">print</span>(result[<span class="string">"labels"</span>][<span class="number">0</span>]) <span class="comment"># "programming"</span>
|
| 286 |
+
|
| 287 |
+
<span class="comment"># ─── Speech Recognition (Whisper) ───</span>
|
| 288 |
+
asr = pipeline(<span class="string">"automatic-speech-recognition"</span>, model=<span class="string">"openai/whisper-large-v3"</span>)
|
| 289 |
+
<span class="function">print</span>(asr(<span class="string">"audio.mp3"</span>)[<span class="string">"text"</span>])</div>
|
| 290 |
+
|
| 291 |
+
<h3>2. Tokenizers Deep Dive</h3>
|
| 292 |
+
<div class="code-block"><span class="keyword">from</span> transformers <span class="keyword">import</span> AutoTokenizer
|
| 293 |
+
|
| 294 |
+
tokenizer = AutoTokenizer.from_pretrained(<span class="string">"meta-llama/Llama-3.1-8B-Instruct"</span>)
|
| 295 |
+
|
| 296 |
+
<span class="comment"># Basic tokenization</span>
|
| 297 |
+
text = <span class="string">"Hugging Face transformers are powerful!"</span>
|
| 298 |
+
tokens = tokenizer.tokenize(text)
|
| 299 |
+
ids = tokenizer.encode(text)
|
| 300 |
+
<span class="function">print</span>(<span class="string">f"Tokens: {tokens}"</span>)
|
| 301 |
+
<span class="function">print</span>(<span class="string">f"IDs: {ids}"</span>)
|
| 302 |
+
<span class="function">print</span>(<span class="string">f"Decoded: {tokenizer.decode(ids)}"</span>)
|
| 303 |
+
|
| 304 |
+
<span class="comment"># Chat template (critical for instruction models)</span>
|
| 305 |
+
messages = [
|
| 306 |
+
{<span class="string">"role"</span>: <span class="string">"system"</span>, <span class="string">"content"</span>: <span class="string">"You are a helpful assistant."</span>},
|
| 307 |
+
{<span class="string">"role"</span>: <span class="string">"user"</span>, <span class="string">"content"</span>: <span class="string">"What is LoRA?"</span>}
|
| 308 |
+
]
|
| 309 |
+
formatted = tokenizer.apply_chat_template(messages, tokenize=<span class="keyword">False</span>)
|
| 310 |
+
<span class="function">print</span>(formatted) <span class="comment"># Proper &lt;|start_header_id|&gt; format for Llama</span>
|
| 311 |
+
|
| 312 |
+
<span class="comment"># Batch tokenization with padding</span>
|
| 313 |
+
batch = tokenizer(
|
| 314 |
+
[<span class="string">"short"</span>, <span class="string">"a much longer sentence here"</span>],
|
| 315 |
+
padding=<span class="keyword">True</span>,
|
| 316 |
+
truncation=<span class="keyword">True</span>,
|
| 317 |
+
max_length=<span class="number">512</span>,
|
| 318 |
+
return_tensors=<span class="string">"pt"</span> <span class="comment"># Returns PyTorch tensors</span>
|
| 319 |
+
)
|
| 320 |
+
<span class="function">print</span>(batch.keys()) <span class="comment"># input_ids, attention_mask</span></div>
|
| 321 |
+
|
| 322 |
+
<h3>3. Datasets Library — Load, Process, Stream</h3>
|
| 323 |
+
<div class="code-block"><span class="keyword">from</span> datasets <span class="keyword">import</span> load_dataset, Dataset
|
| 324 |
+
|
| 325 |
+
<span class="comment"># Load from Hub</span>
|
| 326 |
+
ds = load_dataset(<span class="string">"imdb"</span>)
|
| 327 |
+
<span class="function">print</span>(ds) <span class="comment"># DatasetDict with 'train' and 'test' splits</span>
|
| 328 |
+
<span class="function">print</span>(ds[<span class="string">"train"</span>][<span class="number">0</span>]) <span class="comment"># First example</span>
|
| 329 |
+
|
| 330 |
+
<span class="comment"># Streaming (constant memory for huge datasets)</span>
|
| 331 |
+
stream = load_dataset(<span class="string">"allenai/c4"</span>, split=<span class="string">"train"</span>, streaming=<span class="keyword">True</span>)
|
| 332 |
+
<span class="keyword">for</span> i, example <span class="keyword">in</span> enumerate(stream):
|
| 333 |
+
<span class="keyword">if</span> i >= <span class="number">5</span>: <span class="keyword">break</span>
|
| 334 |
+
<span class="function">print</span>(example[<span class="string">"text"</span>][:<span class="number">100</span>])
|
| 335 |
+
|
| 336 |
+
<span class="comment"># Map with parallel processing</span>
|
| 337 |
+
<span class="keyword">def</span> <span class="function">tokenize_fn</span>(examples):
|
| 338 |
+
<span class="keyword">return</span> tokenizer(examples[<span class="string">"text"</span>], truncation=<span class="keyword">True</span>, max_length=<span class="number">512</span>)
|
| 339 |
+
|
| 340 |
+
tokenized = ds[<span class="string">"train"</span>].map(tokenize_fn, batched=<span class="keyword">True</span>, num_proc=<span class="number">4</span>)
|
| 341 |
+
|
| 342 |
+
<span class="comment"># Create custom dataset from dict/pandas</span>
|
| 343 |
+
my_data = Dataset.from_dict({
|
| 344 |
+
<span class="string">"text"</span>: [<span class="string">"Hello world"</span>, <span class="string">"AI is great"</span>],
|
| 345 |
+
<span class="string">"label"</span>: [<span class="number">1</span>, <span class="number">0</span>]
|
| 346 |
+
})
|
| 347 |
+
|
| 348 |
+
<span class="comment"># Push your dataset to Hub</span>
|
| 349 |
+
my_data.push_to_hub(<span class="string">"your-username/my-dataset"</span>)</div>
|
| 350 |
|
| 351 |
+
<h3>4. Model Loading — From Basic to Production</h3>
|
| 352 |
<div class="code-block"><span class="keyword">from</span> transformers <span class="keyword">import</span> AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
|
| 353 |
<span class="keyword">import</span> torch
|
| 354 |
|
| 355 |
+
<span class="comment"># ─── Basic Loading (full precision) ───</span>
|
| 356 |
+
model = AutoModelForCausalLM.from_pretrained(<span class="string">"gpt2"</span>)
|
| 357 |
+
|
| 358 |
+
<span class="comment"># ─── Half Precision (saves 50% VRAM) ───</span>
|
| 359 |
+
model = AutoModelForCausalLM.from_pretrained(
|
| 360 |
+
<span class="string">"meta-llama/Llama-3.1-8B-Instruct"</span>,
|
| 361 |
+
torch_dtype=torch.bfloat16,
|
| 362 |
+
device_map=<span class="string">"auto"</span>
|
| 363 |
+
)
|
| 364 |
+
|
| 365 |
+
<span class="comment"># ─── 4-bit Quantization (QLoRA-ready) ───</span>
|
| 366 |
bnb_config = BitsAndBytesConfig(
|
| 367 |
load_in_4bit=<span class="keyword">True</span>,
|
| 368 |
+
bnb_4bit_quant_type=<span class="string">"nf4"</span>, <span class="comment"># NormalFloat4 — better than uniform int4</span>
|
| 369 |
bnb_4bit_compute_dtype=torch.bfloat16,
|
| 370 |
+
bnb_4bit_use_double_quant=<span class="keyword">True</span> <span class="comment"># Quantize the quantization constants too</span>
|
| 371 |
)
|
| 372 |
model = AutoModelForCausalLM.from_pretrained(
|
| 373 |
<span class="string">"meta-llama/Llama-3.1-8B-Instruct"</span>,
|
| 374 |
quantization_config=bnb_config,
|
| 375 |
device_map=<span class="string">"auto"</span>
|
| 376 |
)
|
| 377 |
|
| 378 |
+
<span class="comment"># ─── Flash Attention 2 (2-4x faster) ───</span>
|
| 379 |
+
model = AutoModelForCausalLM.from_pretrained(
|
| 380 |
+
<span class="string">"meta-llama/Llama-3.1-8B-Instruct"</span>,
|
| 381 |
+
torch_dtype=torch.bfloat16,
|
| 382 |
+
attn_implementation=<span class="string">"flash_attention_2"</span>,
|
| 383 |
+
device_map=<span class="string">"auto"</span>
|
| 384 |
+
)</div>
|
| 385 |
+
|
| 386 |
+
<h3>5. Trainer API — Full Training Loop</h3>
|
| 387 |
+
<div class="code-block"><span class="keyword">from</span> transformers <span class="keyword">import</span> AutoModelForSequenceClassification, AutoTokenizer, Trainer, TrainingArguments
|
| 388 |
+
<span class="keyword">from</span> datasets <span class="keyword">import</span> load_dataset
|
| 389 |
+
|
| 390 |
+
<span class="comment"># Load model, tokenizer, and dataset</span>
|
| 391 |
+
model = AutoModelForSequenceClassification.from_pretrained(<span class="string">"bert-base-uncased"</span>, num_labels=<span class="number">2</span>)
tokenizer = AutoTokenizer.from_pretrained(<span class="string">"bert-base-uncased"</span>)
|
| 392 |
+
ds = load_dataset(<span class="string">"imdb"</span>)
|
| 393 |
+
tokenized = ds.map(<span class="keyword">lambda</span> x: tokenizer(x[<span class="string">"text"</span>], truncation=<span class="keyword">True</span>, max_length=<span class="number">512</span>), batched=<span class="keyword">True</span>)
|
| 394 |
+
|
| 395 |
+
<span class="comment"># Configure training</span>
|
| 396 |
+
args = TrainingArguments(
|
| 397 |
+
output_dir=<span class="string">"./results"</span>,
|
| 398 |
+
num_train_epochs=<span class="number">3</span>,
|
| 399 |
+
per_device_train_batch_size=<span class="number">16</span>,
|
| 400 |
+
per_device_eval_batch_size=<span class="number">64</span>,
|
| 401 |
+
learning_rate=<span class="number">2e-5</span>,
|
| 402 |
+
weight_decay=<span class="number">0.01</span>,
|
| 403 |
+
eval_strategy=<span class="string">"epoch"</span>,
|
| 404 |
+
save_strategy=<span class="string">"epoch"</span>,
|
| 405 |
+
load_best_model_at_end=<span class="keyword">True</span>,
|
| 406 |
+
fp16=<span class="keyword">True</span>, <span class="comment"># Mixed precision</span>
|
| 407 |
+
gradient_accumulation_steps=<span class="number">4</span>,
|
| 408 |
+
logging_steps=<span class="number">100</span>,
|
| 409 |
+
report_to=<span class="string">"wandb"</span>, <span class="comment"># Log to Weights & Biases</span>
|
| 410 |
+
)
|
| 411 |
+
|
| 412 |
+
trainer = Trainer(model=model, args=args, train_dataset=tokenized[<span class="string">"train"</span>], eval_dataset=tokenized[<span class="string">"test"</span>], processing_class=tokenizer)
|
| 413 |
+
trainer.train()
|
| 414 |
+
trainer.push_to_hub() <span class="comment"># Push trained model directly</span></div>
|
| 415 |
+
|
| 416 |
+
<h3>6. Gradio — Build a Demo in 5 Lines</h3>
|
| 417 |
+
<div class="code-block"><span class="keyword">import</span> gradio <span class="keyword">as</span> gr
|
| 418 |
+
<span class="keyword">from</span> transformers <span class="keyword">import</span> pipeline
|
| 419 |
+
|
| 420 |
+
pipe = pipeline(<span class="string">"sentiment-analysis"</span>)
|
| 421 |
+
|
| 422 |
+
<span class="keyword">def</span> <span class="function">analyze</span>(text):
|
| 423 |
+
result = pipe(text)[<span class="number">0</span>]
|
| 424 |
+
<span class="keyword">return</span> <span class="string">f"{result['label']} ({result['score']:.2%})"</span>
|
| 425 |
+
|
| 426 |
+
gr.Interface(fn=analyze, inputs=<span class="string">"text"</span>, outputs=<span class="string">"text"</span>,
|
| 427 |
+
title=<span class="string">"Sentiment Analyzer"</span>).launch()
|
| 428 |
+
<span class="comment"># Runs at http://localhost:7860 — deploy to HF Spaces for free!</span>
|
| 429 |
+
|
| 430 |
+
<h3>7. Hub API — Programmatic Access</h3>
|
| 431 |
+
<div class="code-block"><span class="keyword">from</span> huggingface_hub <span class="keyword">import</span> HfApi, hf_hub_download, login
|
| 432 |
+
|
| 433 |
+
<span class="comment"># Login</span>
|
| 434 |
+
login(token=<span class="string">"hf_your_token"</span>) <span class="comment"># or: huggingface-cli login</span>
|
| 435 |
+
|
| 436 |
+
api = HfApi()
|
| 437 |
+
|
| 438 |
+
<span class="comment"># List models by task</span>
|
| 439 |
+
models = api.list_models(filter=<span class="string">"text-generation"</span>, sort=<span class="string">"downloads"</span>, limit=<span class="number">5</span>)
|
| 440 |
+
<span class="keyword">for</span> m <span class="keyword">in</span> models:
|
| 441 |
+
<span class="function">print</span>(<span class="string">f"{m.id}: {m.downloads} downloads"</span>)
|
| 442 |
+
|
| 443 |
+
<span class="comment"># Download specific file</span>
|
| 444 |
+
path = hf_hub_download(repo_id=<span class="string">"meta-llama/Llama-3.1-8B"</span>, filename=<span class="string">"config.json"</span>)
|
| 445 |
+
|
| 446 |
+
<span class="comment"># Push model to Hub</span>
|
| 447 |
+
model.push_to_hub(<span class="string">"your-username/my-model"</span>)
|
| 448 |
+
tokenizer.push_to_hub(<span class="string">"your-username/my-model"</span>)
|
| 449 |
|
| 450 |
+
<span class="comment"># Create a new Space</span>
|
| 451 |
+
api.create_repo(<span class="string">"your-username/my-demo"</span>, repo_type=<span class="string">"space"</span>, space_sdk=<span class="string">"gradio"</span>)</div>
|
| 452 |
</div>`,
|
| 453 |
interview: `
|
| 454 |
<div class="section">
|
| 455 |
+
<h2>🎯 Hugging Face — In-Depth Interview Questions</h2>
|
| 456 |
+
<div class="interview-box"><strong>Q1: What's the difference between <code>from_pretrained</code> and <code>pipeline</code>?</strong><p><strong>Answer:</strong> <code>pipeline()</code> is a high-level convenience wrapper — it auto-detects the task, loads both model + tokenizer, handles tokenization/decoding, and returns human-readable output. <code>from_pretrained()</code> gives raw access to model weights for: custom inference loops, fine-tuning, extracting embeddings, modifying the model architecture, or anything beyond standard inference. Rule: prototyping → pipeline, production/training → from_pretrained.</p></div>
|
| 457 |
+
<div class="interview-box"><strong>Q2: What is <code>device_map="auto"</code> and how does model sharding work?</strong><p><strong>Answer:</strong> It uses the <code>accelerate</code> library to automatically distribute model layers across available hardware. The algorithm: (1) Measure available memory on each GPU, CPU, and disk; (2) Place layers sequentially, filling GPU first, spilling to CPU, then disk. For a 70B model on two 24GB GPUs: layers 0-40 on GPU 0, layers 41-80 on GPU 1. CPU/disk offloading adds latency but enables running models that don't fit in GPU memory at all. Use <code>max_memory</code> param to control allocation.</p></div>
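The greedy "fill GPU first, then spill" placement described in Q2 can be sketched in plain Python. This is a toy model of the idea — the function name, sizes, and device budgets are illustrative, not accelerate's actual algorithm:

```python
def place_layers(layer_sizes, budgets):
    """Greedy sequential placement: `budgets` is an ordered
    {device: free_memory} mapping, filled front to back."""
    devices = list(budgets.items())
    placement, d = {}, 0
    for i, size in enumerate(layer_sizes):
        while d < len(devices) and size > devices[d][1]:
            d += 1                      # current device full -> spill to next
        if d == len(devices):
            raise MemoryError("model does not fit on the given devices")
        name, free = devices[d]
        devices[d] = (name, free - size)
        placement[f"layer.{i}"] = name
    return placement

# Four equal layers, two 8GB GPUs -> first two on cuda:0, rest on cuda:1
plan = place_layers([4, 4, 4, 4], {"cuda:0": 8, "cuda:1": 8})
print(plan)
```

The real `device_map="auto"` additionally accounts for tied weights, reserves headroom for activations, and can append `"cpu"` and `"disk"` as final fallback devices.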
|
| 458 |
+
<div class="interview-box"><strong>Q3: Why use HF Datasets over pandas, and how does Apache Arrow help?</strong><p><strong>Answer:</strong> Datasets uses <strong>Apache Arrow</strong> — a columnar, memory-mapped format. Key advantages: (1) <strong>Memory mapping:</strong> A 100GB dataset uses near-zero RAM — data stays on disk but is accessed at near-RAM speed via the OS page cache. (2) <strong>Zero-copy:</strong> Slicing doesn't duplicate data. (3) <strong>Streaming:</strong> Process datasets larger than disk with <code>streaming=True</code>. (4) <strong>Parallel map:</strong> <code>num_proc=N</code> for multi-core preprocessing. (5) <strong>Caching:</strong> Processed results are automatically cached to disk. Pandas loads everything into RAM — impossible for large-scale ML datasets.</p></div>
|
| 459 |
+
<div class="interview-box"><strong>Q4: What is a chat template and why does it matter?</strong><p><strong>Answer:</strong> Each instruction-tuned model is trained with a specific format for system/user/assistant messages. Llama uses <code>&lt;|begin_of_text|&gt;&lt;|start_header_id|&gt;system&lt;|end_header_id|&gt;</code>, while ChatML uses <code>&lt;|im_start|&gt;system</code>. If you format input incorrectly, the model behaves like a base model (no instruction following). <code>tokenizer.apply_chat_template()</code> auto-formats messages correctly for any model. This is the #1 mistake beginners make — using raw text instead of the chat template.</p></div>
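What a chat template actually renders can be shown by writing the ChatML variant by hand. A minimal sketch — the real templates are Jinja strings stored in `tokenizer_config.json`, and the special token names vary per model:

```python
def apply_chatml(messages, add_generation_prompt=True):
    """Render messages in ChatML style, as described in the answer above."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")   # cue the model to answer
    return "\n".join(parts)

msgs = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is LoRA?"},
]
out = apply_chatml(msgs)
print(out)
```

Sending the raw string "What is LoRA?" instead of this rendered form is exactly the mistake the answer warns about.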
|
| 460 |
+
<div class="interview-box"><strong>Q5: How do you handle gated models (Llama, Gemma) in production?</strong><p><strong>Answer:</strong> (1) Accept the model license on the Hub model page. (2) Create a read token at hf.co/settings/tokens. (3) For local: <code>huggingface-cli login</code>. (4) In CI/CD: set <code>HF_TOKEN</code> environment variable. (5) In code: pass <code>token="hf_xxx"</code> to <code>from_pretrained()</code>. For Docker: bake the token as a secret, never in the image. For Kubernetes: use a Secret mounted as an env var. The token is only needed for download — once cached locally, no token is needed for inference.</p></div>
|
| 461 |
+
<div class="interview-box"><strong>Q6: What is safetensors and why replace pickle?</strong><p><strong>Answer:</strong> Traditional PyTorch models use Python's <code>pickle</code> format, which can execute arbitrary code during loading — a <strong>security vulnerability</strong>. A malicious model file could run code on your machine when loaded. <code>safetensors</code> is a safe, fast tensor format that: (1) Cannot execute code (pure data), (2) Supports zero-copy loading (memory-mapped), (3) Is 2-5x faster to load than pickle, (4) Supports lazy loading (load only specific tensors). It's now the default format on HF Hub.</p></div>
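The "pickle executes code on load" claim in Q6 is easy to demonstrate with a harmless payload. A minimal sketch — `record` and `Payload` are illustrative names, and here the "attack" just appends to a list:

```python
import pickle

executed = []

def record(msg):
    executed.append(msg)   # harmless stand-in for attacker-controlled code
    return msg

class Payload:
    # __reduce__ tells pickle "to rebuild me, call record(...)" — so simply
    # loading the bytes runs record, no constructor involved.
    def __reduce__(self):
        return (record, ("code ran during pickle.loads!",))

blob = pickle.dumps(Payload())
result = pickle.loads(blob)   # executes record(...) as a side effect
print(executed)  # ['code ran during pickle.loads!']
```

A safetensors file, by contrast, is pure tensor data plus a JSON header — there is no callable to invoke at load time.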
|
| 462 |
</div>`
|
| 463 |
},
|
| 464 |
'finetuning': {
|