# Using Local Pre-Cached Models
## Option 1: Download Models & Commit to Git (RECOMMENDED for your setup)
This approach stores models **directly in the repo**, so they're always available without any network dependency.
### Step 1: Download Lightweight Models
```bash
python3 scripts/download_lightweight_models.py
```
This downloads the smaller models (roughly 500-700 MB in total) and saves them to the `models/` directory.
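The contents of `scripts/download_lightweight_models.py` aren't shown in this guide. As a rough sketch of what such a script might do, assuming the `transformers` library and the `models/<task>/model` plus `models/<task>/tokenizer` layout used here, the hypothetical helper `download_to_repo` fetches a model once and writes it to disk with `save_pretrained`. The Hub ID is a parameter rather than a hard-coded name.

```python
from pathlib import Path

MODELS_DIR = Path("models")  # assumed location, matching this guide


def target_dirs(task: str, models_dir: Path = MODELS_DIR) -> dict:
    """Return the folders a task's model and tokenizer are saved to."""
    base = models_dir / task
    return {"model": base / "model", "tokenizer": base / "tokenizer"}


def download_to_repo(hub_id: str, task: str, models_dir: Path = MODELS_DIR) -> dict:
    """Download `hub_id` once and save it under `models_dir`.

    Hypothetical helper: the real script may be organised differently.
    """
    # Imported lazily so the path helper above works without transformers.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    dirs = target_dirs(task, models_dir)
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)

    # Sequence classification fits a sentiment model; other tasks need
    # a different Auto* class.
    model = AutoModelForSequenceClassification.from_pretrained(hub_id)
    model.save_pretrained(dirs["model"])
    AutoTokenizer.from_pretrained(hub_id).save_pretrained(dirs["tokenizer"])
    return dirs
```

Calling `download_to_repo(<hub id>, "sentiment")` needs network access once; after that, everything the app needs sits under `models/sentiment/`.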
### Step 2: Commit Models to Git
```bash
cd /Users/shouryaangrish/Documents/Work/HugginFaceInfy/infy
git add models/
git commit -m "Add pre-cached models for offline use"
git push origin main
```
### Step 3: Update App to Use Local Models
Option A - Modify your app to use local models:
```python
# In app.py, change:
import config
# To (list whichever model constants app.py actually uses):
from scripts.config_local import SENTIMENT_MODEL, NER_MODEL  # ...
```
Option B - Replace config.py entirely:
```bash
cp scripts/config_local.py config.py
git add config.py
git commit -m "Switch to local model loading"
git push origin main
```
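Whichever option you pick, the app ends up handing a local folder to the model loader instead of a Hub ID. A minimal sketch, assuming the app builds its models with `transformers.pipeline` and that the model and tokenizer live in separate `model/` and `tokenizer/` subfolders as described in this guide (both helper names are hypothetical):

```python
from pathlib import Path


def local_pipeline_args(task_dir: str) -> dict:
    """Build the model/tokenizer arguments for a locally saved model."""
    base = Path(task_dir)
    return {
        "model": str(base / "model"),
        "tokenizer": str(base / "tokenizer"),
    }


def load_sentiment_pipeline(task_dir: str = "models/sentiment"):
    """Create a sentiment pipeline from the files saved under task_dir."""
    from transformers import pipeline  # lazy import: only needed when loading

    return pipeline("sentiment-analysis", **local_pipeline_args(task_dir))
```

Passing the tokenizer folder explicitly matters with this layout, because the tokenizer files are not stored next to the model weights.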
### Step 4: Test Locally
```bash
python3 app.py
```
Then click any button in the app. Models load from the `models/` directory with no download, so the response should be near-instant.
---
## Benefits of This Approach
- ✅ **No network dependency**: Models stored locally in repo
- ✅ **Bypasses HF whitelist**: Company firewall won't block
- ✅ **Instant loading**: Models already on disk
- ✅ **Consistent deployments**: Same models for everyone
- ✅ **Reproducible**: Models don't change versions
- ✅ **Works on Spaces**: If you push to Spaces, models go with it
---
## What Models Are Included
| Model | Size | Task |
|-------|------|------|
| DistilBERT (Sentiment) | ~260 MB | Sentiment Analysis |
| BERT (Tokenizer) | ~440 MB | Tokenization |
| **Total** | **~500-700 MB** | |
*Note: the NER, QA, and summarization models still download from HF (too large for the repo), but they can be added if needed.*
---
## How It Works
When you load models:
```python
from pathlib import Path

# config.py checks whether the local model folder exists
if Path("models/sentiment").exists():
    SENTIMENT_MODEL = "models/sentiment/model"  # Load locally
else:
    SENTIMENT_MODEL = "distilbert-base-uncased-..."  # Download from HF
```
So if models are in the repo, they load instantly. If not, they download from HF as fallback.
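The same local-first fallback can be written as a small reusable function. This is a sketch, not the actual `config.py`: `resolve_model` is a hypothetical name, and it checks the exact folder it returns (and that the folder is non-empty), so a half-finished download does not count as "local".

```python
from pathlib import Path


def resolve_model(local_dir: str, hub_id: str) -> str:
    """Return the local path if it holds files, otherwise the Hub ID."""
    path = Path(local_dir)
    # An empty or missing folder means the download step has not run yet.
    if path.is_dir() and any(path.iterdir()):
        return str(path)
    return hub_id
```

For example, `resolve_model("models/sentiment/model", hub_id)` returns the local folder once the download script has populated it, and `hub_id` before that.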
---
## Step-by-Step Setup
### For Your Laptop (Quick Demo Prep)
```bash
# 1. Download lightweight models (~500-700 MB)
python3 scripts/download_lightweight_models.py
# 2. Test locally
python3 app.py
# Click "Analyze Sentiment" - should be instant (models loaded from "models/" dir)
# 3. Ready for demo!
```
### For Spaces Deployment
```bash
# 1. Models already in repo from above
# 2. Push to Spaces
git push origin main
# 3. Spaces auto-deploys with pre-cached models
# Demos run instantly!
```
---
## File Structure After Setup
```
infy/
├── models/                  ← Pre-downloaded models
│   ├── sentiment/
│   │   ├── model/           ← Model files
│   │   └── tokenizer/       ← Tokenizer files
│   └── tokenizer/
│       ├── model/
│       └── tokenizer/
├── app.py                   ← Uses local models
├── config.py                ← Loads from "models/"
├── utils.py
├── requirements.txt
└── scripts/
    ├── download_lightweight_models.py
    ├── config_local.py
    └── README.md
```
---
## Troubleshooting
### Models directory too large for git?
Git itself has no hard file-size limit, but hosting services do; GitHub, for example, rejects files over 100 MB. If you exceed a limit:
```bash
# Install Git LFS (Large File Storage); this command assumes macOS with Homebrew
brew install git-lfs
git lfs install
# Then add models to LFS
git lfs track "models/**/*.bin"
git lfs track "models/**/*.safetensors"
git add .gitattributes models/
git commit -m "Use Git LFS for large model files"
git push origin main
```
Note: *Repo already has `.gitattributes` set up for this!*
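To confirm what `.gitattributes` actually routes through LFS before you push, you can read the file directly. A small sketch (the function name `lfs_patterns` is made up for illustration) that relies only on the `filter=lfs` attribute that `git lfs track` writes:

```python
from pathlib import Path


def lfs_patterns(gitattributes_text: str) -> list:
    """Return the path patterns that .gitattributes routes through Git LFS."""
    patterns = []
    for line in gitattributes_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        pattern, *attrs = line.split()
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns


if __name__ == "__main__":
    attrs_file = Path(".gitattributes")
    if attrs_file.exists():
        print(lfs_patterns(attrs_file.read_text()))
```

After committing, `git lfs ls-files` lists the files that really ended up in LFS.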
### "Models still downloading during demo"?
- Make sure `python3 scripts/download_lightweight_models.py` completed
- Check `models/` directory exists: `ls -la models/`
- Verify config.py is using local paths
- Restart app: `python3 app.py`
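The checks above can be scripted so you can run them right before a demo. This sketch assumes the folder layout from "File Structure After Setup"; `EXPECTED_DIRS` and `missing_model_dirs` are illustrative names, not part of the repo.

```python
from pathlib import Path

# Layout assumed from the "File Structure After Setup" section.
EXPECTED_DIRS = [
    "sentiment/model",
    "sentiment/tokenizer",
    "tokenizer/model",
    "tokenizer/tokenizer",
]


def missing_model_dirs(models_dir: str = "models", expected=EXPECTED_DIRS) -> list:
    """List expected folders that are absent or empty."""
    root = Path(models_dir)
    missing = []
    for rel in expected:
        folder = root / rel
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    problems = missing_model_dirs()
    if problems:
        print("Missing or empty:", ", ".join(problems))
    else:
        print("All expected model folders are populated.")
```

An empty result means every expected folder exists and contains at least one file; it does not verify that the files themselves are complete.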
### Want offline-only (no HF fallback)?
Edit `scripts/config_local.py`:
```python
# Change this (current):
NER_MODEL = "dslim/bert-base-NER"
# To this (local only):
NER_MODEL = str(MODELS_DIR / "ner" / "model")
# Then download it: python3 scripts/download_lightweight_models.py
```
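If you want a missing local model to fail loudly instead of quietly reaching for the network, a stricter lookup helps. The sketch below is an assumption about how you might structure it (`require_local` is a hypothetical helper); it also sets the `HF_HUB_OFFLINE` environment variable, which tells the Hugging Face libraries not to make network calls and has to be set before they are imported.

```python
import os
from pathlib import Path

# Ask huggingface_hub / transformers to stay offline.
# Set this before importing those libraries.
os.environ["HF_HUB_OFFLINE"] = "1"

MODELS_DIR = Path("models")  # assumed location, matching this guide


def require_local(task: str, models_dir: Path = MODELS_DIR) -> str:
    """Return the local model path for `task`, or raise if it is missing."""
    path = models_dir / task / "model"
    if not path.is_dir():
        raise FileNotFoundError(
            f"{path} not found; run scripts/download_lightweight_models.py first"
        )
    return str(path)
```

With this in place, `NER_MODEL = require_local("ner")` raises at startup if the folder is absent. Passing `local_files_only=True` to `from_pretrained` is another way to enforce the same thing per call.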
---
## Estimated File Sizes
| Component | Size |
|-----------|------|
| DistilBERT (sentiment) | ~260 MB |
| BERT base (tokenizer) | ~440 MB |
| Config/tokenizer files | ~5 MB |
| **Total for 2 models** | **~700 MB** |
| Git repo (with models) | ~750 MB |
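Git itself can store a repo of this size, but hosting services cap individual file sizes (GitHub, for example, rejects files over 100 MB), so model weight files should go through Git LFS, which is already configured in `.gitattributes`.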
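The figures in the table are estimates. To measure what is actually on disk, and to spot files that exceed a host's per-file limit, something like the following works. It is a sketch: `size_report` is an illustrative name, and the 100 MB default reflects GitHub's limit, so adjust it for your host.

```python
from pathlib import Path

LIMIT_BYTES = 100 * 1024 * 1024  # GitHub's per-file limit; other hosts differ


def size_report(models_dir: str = "models", limit: int = LIMIT_BYTES):
    """Return (total_bytes, files_over_limit) for everything under models_dir."""
    root = Path(models_dir)
    if not root.is_dir():
        return 0, []
    total = 0
    oversized = []
    for path in root.rglob("*"):
        if path.is_file():
            size = path.stat().st_size
            total += size
            if size > limit:
                oversized.append((str(path), size))
    return total, sorted(oversized)


if __name__ == "__main__":
    total, oversized = size_report()
    print(f"models/ total: {total / 1e6:.1f} MB")
    for name, size in oversized:
        print(f"  over limit: {name} ({size / 1e6:.1f} MB)")
```

Any file listed as over the limit should match one of the patterns tracked by Git LFS before you push.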
---
## Next Steps
1. **Run:** `python3 scripts/download_lightweight_models.py`
2. **Test:** `python3 app.py` → click a button → instant loading ✅
3. **Commit:** `git add models/` → `git commit -m "Add pre-cached models for offline use"` → `git push origin main`
4. **Demo:** Perfect for your session!
---
## Why This Solves Your Problem
| Issue | Solution |
|-------|----------|
| Company firewall blocks HF | ✅ Models stored locally, no external download |
| Slow network during demo | ✅ Instant loading from disk |
| Attendees can't download | ✅ Everything in repo, cloneable |
| Spaces issues | ✅ Models come with Spaces push |
| Repeatability | ✅ Same models for everyone |
---
**Ready?** Run this on your laptop now:
```bash
python3 scripts/download_lightweight_models.py
```
Then let me know what the total size is, and we can decide whether to add more models!