# Using Local Pre-Cached Models

## Option 1: Download Models & Commit to Git (RECOMMENDED for your setup)

This approach stores models **directly in the repo**, so they are always available without any network dependency.

### Step 1: Download Lightweight Models

```bash
python3 scripts/download_lightweight_models.py
```

This downloads smaller models (~700 MB total) and saves them to the `models/` directory.

### Step 2: Commit Models to Git

```bash
# Adjust this path to wherever your clone lives
cd /Users/shouryaangrish/Documents/Work/HugginFaceInfy/infy
git add models/
git commit -m "Add pre-cached models for offline use"
git push origin main
```

### Step 3: Update App to Use Local Models

Option A: modify your app to use local models:

```python
# In app.py, change:
import config
# To:
from scripts.config_local import SENTIMENT_MODEL, NER_MODEL  # ...plus the remaining model constants
```

Option B: replace config.py entirely:

```bash
cp scripts/config_local.py config.py
git add config.py
git commit -m "Switch to local model loading"
git push origin main
```

### Step 4: Test Locally

```bash
python3 app.py
```

Then click any button. Models load from the `models/` directory, so there is no download and loading is near-instant.

---

## Benefits of This Approach

- ✅ **No network dependency**: models are stored locally in the repo
- ✅ **Bypasses HF whitelist**: the company firewall has nothing to block
- ✅ **Instant loading**: models are already on disk
- ✅ **Consistent deployments**: same models for everyone
- ✅ **Reproducible**: model versions do not change
- ✅ **Works on Spaces**: if you push to Spaces, the models go with the push

---

## What Models Are Included

| Model | Size | Task |
|-------|------|------|
| DistilBERT (Sentiment) | ~260 MB | Sentiment Analysis |
| BERT (Tokenizer) | ~440 MB | Tokenization |
| **Total** | **~700 MB** | |

*Note: the NER, QA, and Summarization models still download from HF (too large for the repo), but can be added if needed.*

---

## How It Works

When you load models:

```python
from pathlib import Path

# config.py checks whether local models exist
if Path("models/sentiment").exists():
    SENTIMENT_MODEL = "models/sentiment/model"  # Load locally
else:
    SENTIMENT_MODEL = "distilbert-base-uncased-..."  # Download from HF
```

So if the models are in the repo, they load instantly. If not, they download from HF as a fallback.
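
The same local-first check can be wrapped in a small helper so every model constant follows one rule. This is only a sketch: `resolve_model` and its `root` parameter are hypothetical names, not something taken from the actual `config.py`, and the "non-empty directory" test is an added assumption to guard against half-finished downloads.

```python
from pathlib import Path


def resolve_model(local_dir, hub_id, root=Path(".")):
    """Return the local model path if it is populated, else the Hub ID."""
    candidate = Path(root) / local_dir
    # Require a non-empty directory so an empty or half-created folder
    # does not shadow the Hub fallback.
    if candidate.is_dir() and any(candidate.iterdir()):
        return str(candidate)
    return hub_id


# Example with the NER model ID that appears later in this guide.
NER_MODEL = resolve_model("models/ner/model", "dslim/bert-base-NER")
```

With this shape, adding a model to the repo later needs no config change: the helper starts returning the local path as soon as the directory has files in it.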

---

## Step-by-Step Setup

### For Your Laptop (Quick Demo Prep)

```bash
# 1. Download lightweight models (~700 MB)
python3 scripts/download_lightweight_models.py
# 2. Test locally
python3 app.py
# Click "Analyze Sentiment" - should be instant (models loaded from "models/" dir)
# 3. Ready for demo!
```

### For Spaces Deployment

```bash
# 1. Models already in repo from above
# 2. Push to Spaces
git push origin main
# 3. Spaces auto-deploys with pre-cached models
# Demos run instantly!
```

---

## File Structure After Setup

```
infy/
├── models/                 ← Pre-downloaded models
│   ├── sentiment/
│   │   ├── model/          ← Model files
│   │   └── tokenizer/      ← Tokenizer files
│   └── tokenizer/
│       ├── model/
│       └── tokenizer/
├── app.py                  ← Uses local models
├── config.py               ← Loads from "models/"
├── utils.py
├── requirements.txt
└── scripts/
    ├── download_lightweight_models.py
    ├── config_local.py
    └── README.md
```

---

## Troubleshooting

### Models directory too large for git?

Git hosts cap the size of individual files (GitHub, for example, rejects files over 100 MB). If you exceed the limit:

```bash
# Install Git LFS (Large File Storage)
brew install git-lfs
git lfs install
# Then add models to LFS
git lfs track "models/**/*.bin"
git lfs track "models/**/*.safetensors"
git add .gitattributes models/
git commit -m "Use Git LFS for large model files"
git push origin main
```

Note: *Repo already has `.gitattributes` set up for this!*
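
To confirm what `.gitattributes` actually routes through LFS before committing, a short parser along these lines may help. It is a sketch, not part of the repo: `lfs_patterns` is a hypothetical helper, and it only looks for the `filter=lfs` attribute that `git lfs track` writes, without evaluating the glob patterns themselves.

```python
from pathlib import Path


def lfs_patterns(gitattributes_text):
    """Return the path patterns that carry the filter=lfs attribute."""
    patterns = []
    for line in gitattributes_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        pattern, *attrs = line.split()
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns


if __name__ == "__main__":
    path = Path(".gitattributes")
    found = lfs_patterns(path.read_text()) if path.exists() else []
    print("LFS-tracked patterns:", found or "none")
```

If the `.bin` and `.safetensors` patterns from the commands above are missing from the output, the weights would be committed as regular Git objects.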

### "Models still downloading during demo"?

- Make sure `python3 scripts/download_lightweight_models.py` ran to completion
- Check that the `models/` directory exists: `ls -la models/`
- Verify that config.py is using local paths
- Restart the app: `python3 app.py`
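
The first two checks in this list can be automated. The sketch below assumes the directory layout shown under "File Structure After Setup"; `EXPECTED_DIRS` and `missing_model_dirs` are hypothetical names, and an empty directory is treated as missing on the assumption that it indicates an interrupted download.

```python
from pathlib import Path

# Subdirectories taken from the "File Structure After Setup" tree.
EXPECTED_DIRS = [
    "sentiment/model",
    "sentiment/tokenizer",
    "tokenizer/model",
    "tokenizer/tokenizer",
]


def missing_model_dirs(models_root="models", expected=EXPECTED_DIRS):
    """Return the expected subdirectories that are absent or empty."""
    root = Path(models_root)
    missing = []
    for rel in expected:
        path = root / rel
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    problems = missing_model_dirs()
    if problems:
        print("Missing or empty:", ", ".join(problems))
    else:
        print("All expected model directories are populated.")
```

Running it from the repo root right before a demo gives a quick yes/no answer without starting the app.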

### Want offline-only (no HF fallback)?

Edit `scripts/config_local.py`:

```python
# Change this (current):
NER_MODEL = "dslim/bert-base-NER"
# To this (local only):
NER_MODEL = str(MODELS_DIR / "ner" / "model")
# Then download it: python3 scripts/download_lightweight_models.py
```
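
Pointing the constants at local paths removes the fallback in config, but the Hugging Face libraries can still attempt network calls on their own. One way to rule that out, sketched here under the assumption that it runs before `transformers` is imported, is to set the `HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE` environment variables after checking that the local directories exist. `enable_offline_mode` is a hypothetical helper, not part of the repo.

```python
import os
from pathlib import Path


def enable_offline_mode(model_dirs):
    """Fail fast if a local model is missing, then block Hub access."""
    absent = [str(d) for d in model_dirs if not Path(d).is_dir()]
    if absent:
        raise FileNotFoundError(f"Local model directories not found: {absent}")
    # Both variables are read by the Hugging Face libraries; set them
    # before importing transformers so they take effect.
    os.environ["HF_HUB_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"
```

Calling `enable_offline_mode(["models/ner/model"])` at the top of `app.py` would then turn a missing model into an immediate, readable error instead of a stalled download mid-demo.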

---

## Estimated File Sizes

| Component | Size |
|-----------|------|
| DistilBERT (sentiment) | ~260 MB |
| BERT base (tokenizer) | ~440 MB |
| Config/tokenizer files | ~5 MB |
| **Total for 2 models** | **~700 MB** |
| Git repo (with models) | ~750 MB |

Git itself can store a repo of this size, but each weight file here is larger than the per-file limit that hosts such as GitHub enforce (100 MB), so push the weights through Git LFS (already configured in `.gitattributes`).
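
The figures in the table are estimates, so it is worth measuring the real `models/` directory before committing. The sketch below totals the directory and flags files above a threshold. `audit_sizes` is a hypothetical helper, and the 100 MiB default is an assumption based on GitHub's per-file limit; other hosts use different limits.

```python
from pathlib import Path

# Assumption: 100 MiB, the per-file limit GitHub enforces; other hosts differ.
LIMIT_BYTES = 100 * 1024 * 1024


def audit_sizes(root="models", limit=LIMIT_BYTES):
    """Return (total_bytes, [(path, size), ...]) for files above the limit."""
    total = 0
    oversized = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            total += size
            if size > limit:
                oversized.append((str(path), size))
    return total, oversized


if __name__ == "__main__":
    total, oversized = audit_sizes()
    print(f"Total: {total / 1e6:.1f} MB")
    for name, size in oversized:
        print(f"Needs LFS: {name} ({size / 1e6:.1f} MB)")
```

Any file listed as needing LFS should match one of the patterns tracked in `.gitattributes` before you push.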

---

## Next Steps

1. **Run:** `python3 scripts/download_lightweight_models.py`
2. **Test:** `python3 app.py` → click a button → instant loading ✅
3. **Commit:** `git add models/` → `git push origin main`
4. **Demo:** Perfect for your session!

---

## Why This Solves Your Problem

| Issue | Solution |
|-------|----------|
| Company firewall blocks HF | ✅ Models stored locally, no external download |
| Slow network during demo | ✅ Instant loading from disk |
| Attendees can't download | ✅ Everything in repo, cloneable |
| Spaces issues | ✅ Models come with Spaces push |
| Repeatability | ✅ Same models for everyone |

---

**Ready?** Run this on your laptop now:

```bash
python3 scripts/download_lightweight_models.py
```

Then let me know what the size is and we can decide whether to add more models!