# Using Local Pre-Cached Models

## Option 1: Download Models & Commit to Git (RECOMMENDED for your setup)

This approach stores models **directly in the repo**, so they're always available without any network dependency.

### Step 1: Download Lightweight Models

```bash
python3 scripts/download_lightweight_models.py
```

This downloads the smaller models (roughly 500-700 MB in total) and saves them to the `models/` directory.

### Step 2: Commit Models to Git

```bash
# Change to the root of your local clone (the path below is machine-specific)
cd /Users/shouryaangrish/Documents/Work/HugginFaceInfy/infy
git add models/
git commit -m "Add pre-cached models for offline use"
git push origin main
```

### Step 3: Update App to Use Local Models

Option A - Modify your app to import the local configuration:
```python
# In app.py, replace this import:
import config
# with this one (list every model name that app.py actually uses):
from scripts.config_local import SENTIMENT_MODEL, NER_MODEL  # , ...
```

Option B - Replace config.py entirely:
```bash
cp scripts/config_local.py config.py
git add config.py
git commit -m "Switch to local model loading"
git push origin main
```

### Step 4: Test Locally

```bash
python3 app.py
```

Then click any button in the app. The models load from the `models/` directory, with no download step.

---

## Benefits of This Approach

✅ **No network dependency**: the cached models are stored in the repo  
✅ **Bypasses the HF whitelist**: nothing for the company firewall to block  
✅ **Instant loading**: models are already on disk, so there is no download wait  
✅ **Consistent deployments**: same models for everyone  
✅ **Reproducible**: model versions are pinned by the commit  
✅ **Works on Spaces**: if you push to Spaces, the models go with it  

---

## What Models Are Included

| Model | Size | Task |
|-------|------|------|
| DistilBERT (Sentiment) | ~260 MB | Sentiment Analysis |
| BERT (Tokenizer) | ~440 MB | Tokenization |
| **Total** | **~500-700 MB** | |

*Note: NER, QA, and Summarization models still download from HF (too large for the repo), but they can be added if needed.*

---

## How It Works

When you load models:

```python
# config.py checks whether the local model directory exists
from pathlib import Path

if Path("models/sentiment").exists():
    SENTIMENT_MODEL = "models/sentiment/model"  # Load from the repo
else:
    SENTIMENT_MODEL = "distilbert-base-uncased-..."  # Hub ID (elided here); downloads from HF
```

So if models are in the repo, they load instantly. If not, they download from HF as fallback.
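The same local-first fallback can be wrapped in a small helper so that each model does not need its own `if`/`else`. This is only a sketch: `resolve_model` and the placeholder Hub ID are hypothetical names, not part of the existing `config.py`, and the extra "directory is not empty" check is an assumption about what counts as a usable cache.

```python
from pathlib import Path


def resolve_model(local_dir, hub_id):
    """Return the local model path if it looks usable, otherwise the Hub ID."""
    local = Path(local_dir)
    # A missing or empty directory should not count as a cached model.
    if local.is_dir() and any(local.iterdir()):
        return str(local)
    return hub_id


# Hypothetical usage; the Hub ID below is a placeholder, not the project's real one.
SENTIMENT_MODEL = resolve_model("models/sentiment/model", "some-org/some-sentiment-model")
```

Checking for a non-empty directory guards against a half-finished download that created the folder but saved nothing into it.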

---

## Step-by-Step Setup

### For Your Laptop (Quick Demo Prep)

```bash
# 1. Download lightweight models (~500MB)
python3 scripts/download_lightweight_models.py

# 2. Test locally
python3 app.py
# Click "Analyze Sentiment" - should be instant (models loaded from "models/" dir)

# 3. Ready for demo!
```

### For Spaces Deployment

```bash
# 1. Models already in repo from above
# 2. Push to Spaces
git push origin main

# 3. Spaces auto-deploys with pre-cached models
# 🎉 Demos run instantly!
```

---

## File Structure After Setup

```
infy/
├── models/                          ← Pre-downloaded models
│   ├── sentiment/
│   │   ├── model/                   ← Model files
│   │   └── tokenizer/               ← Tokenizer files
│   └── tokenizer/
│       ├── model/
│       └── tokenizer/
├── app.py                           ← Uses local models
├── config.py                        ← Loads from "models/"
├── utils.py
├── requirements.txt
└── scripts/
    ├── download_lightweight_models.py
    ├── config_local.py
    └── README.md
```

---

## Troubleshooting

### Models directory too large for git?

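Git itself has no hard per-file limit, but hosting services do (GitHub, for example, rejects files larger than 100 MB). If a push is rejected because of file size: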

```bash
# Install Git LFS (Large File Storage); Homebrew shown, use your platform's package manager elsewhere
brew install git-lfs
git lfs install

# Then add models to LFS
git lfs track "models/**/*.bin"
git lfs track "models/**/*.safetensors"
git add .gitattributes models/
git commit -m "Use Git LFS for large model files"
git push origin main
```

Note: *Repo already has `.gitattributes` set up for this!*
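To see which files would actually need LFS before pushing, a short script can list everything under `models/` above a size threshold. This is a sketch under assumptions: the 100 MB default mirrors GitHub's per-file limit and may differ on your remote, and `files_over_limit` is a hypothetical helper, not part of the repo.

```python
from pathlib import Path

LIMIT_BYTES = 100 * 1024 * 1024  # assumed host limit; adjust for your remote


def files_over_limit(root, limit_bytes=LIMIT_BYTES):
    """Return (path, size) pairs for files under root larger than limit_bytes."""
    root = Path(root)
    if not root.is_dir():
        return []
    oversized = [
        (p, p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file() and p.stat().st_size > limit_bytes
    ]
    # Largest files first, since those are the ones most likely to block a push.
    return sorted(oversized, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for path, size in files_over_limit("models"):
        print(f"{size / 1e6:8.1f} MB  {path}")
```

Any file this prints should match one of the `git lfs track` patterns before it is committed.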

### "Models still downloading during demo"?

- Make sure `python3 scripts/download_lightweight_models.py` completed
- Check that the `models/` directory exists: `ls -la models/`
- Verify that `config.py` is using local paths
- Restart app: `python3 app.py`
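The first three checks can be automated with a small script. This is a sketch that assumes the models were saved with `save_pretrained`, which writes a `config.json` next to the weights. `check_model_dir` is a hypothetical helper, and the path in the example follows the file structure shown above.

```python
from pathlib import Path


def check_model_dir(model_dir):
    """Return a list of problems found in one saved-model directory."""
    path = Path(model_dir)
    if not path.is_dir():
        return [f"{path} is missing"]
    problems = []
    # Assumption: the directory was written by save_pretrained, which emits config.json.
    if not (path / "config.json").is_file():
        problems.append(f"{path} has no config.json")
    # Weights are normally stored as .safetensors or .bin files.
    weights = [p for p in path.iterdir() if p.suffix in {".safetensors", ".bin"}]
    if not weights:
        problems.append(f"{path} has no weight file")
    return problems


if __name__ == "__main__":
    for problem in check_model_dir("models/sentiment/model"):
        print("PROBLEM:", problem)
```

An empty result means the directory looks complete. It does not prove that `config.py` points at it, so the third bullet still needs a manual look.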

### Want offline-only (no HF fallback)?

Edit `scripts/config_local.py`:
```python
# Change this (current):
NER_MODEL = "dslim/bert-base-NER"

# To this (local only):
NER_MODEL = str(MODELS_DIR / "ner" / "model")
# Then download it: python3 scripts/download_lightweight_models.py
```
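To make a missing local model fail loudly instead of silently falling back, the path lookup can be wrapped in a check. This is a sketch under assumptions: `local_only` is a hypothetical helper, `MODELS_DIR` is assumed to match the variable of the same name in `config_local.py`, and `HF_HUB_OFFLINE` is the Hugging Face Hub environment variable that disables network access when set before the libraries are imported.

```python
import os
from pathlib import Path

# Ask the Hugging Face libraries not to touch the network.
# Set this before importing transformers so that it takes effect.
os.environ["HF_HUB_OFFLINE"] = "1"

MODELS_DIR = Path("models")  # assumed to match MODELS_DIR in config_local.py


def local_only(*parts, models_dir=MODELS_DIR):
    """Return a path under models_dir, raising if it is not on disk."""
    path = Path(models_dir).joinpath(*parts)
    if not path.is_dir():
        raise FileNotFoundError(f"Expected a pre-downloaded model at {path}")
    return str(path)


# Hypothetical usage in config_local.py:
# NER_MODEL = local_only("ner", "model")
```

With this in place, a forgotten download shows up as a `FileNotFoundError` at startup rather than as a blocked network request in the middle of a demo.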

---

## Estimated File Sizes

| Component | Size |
|-----------|------|
| DistilBERT (sentiment) | ~260 MB |
| BERT base (tokenizer) | ~440 MB |
| Config/tokenizer files | ~5 MB |
| **Total for 2 models** | **~700 MB** |
| Git repo (with models) | ~750 MB |

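Git itself can store a repo of this size, but hosting services often cap individual files (GitHub, for example, rejects files over 100 MB), so the weight files will usually need Git LFS, which is already configured in `.gitattributes`.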
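The figures above are estimates. To measure what actually landed on disk, a short script can total each model directory. This is a sketch: `dir_size_mb` and `report` are hypothetical helpers, and sizes are reported in decimal megabytes (10^6 bytes).

```python
from pathlib import Path


def dir_size_mb(directory):
    """Total size of all files under directory, in megabytes (10**6 bytes)."""
    return sum(p.stat().st_size for p in Path(directory).rglob("*") if p.is_file()) / 1e6


def report(models_root="models"):
    """Map each immediate subdirectory of models_root to its size in MB."""
    root = Path(models_root)
    if not root.is_dir():
        return {}
    return {child.name: dir_size_mb(child) for child in sorted(root.iterdir()) if child.is_dir()}


if __name__ == "__main__":
    for name, size in report().items():
        print(f"{name:20s} {size:8.1f} MB")
```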

---

## Next Steps

1. **Run:** `python3 scripts/download_lightweight_models.py`
2. **Test:** `python3 app.py` → click a button → instant loading ✅
3. **Commit:** `git add models/` → `git commit -m "Add pre-cached models for offline use"` → `git push origin main`
4. **Demo:** Perfect for your session!

---

## Why This Solves Your Problem

| Issue | Solution |
|-------|----------|
| Company firewall blocks HF | ✅ Models stored locally, no external download |
| Slow network during demo | ✅ Instant loading from disk |
| Attendees can't download | ✅ Everything in repo, cloneable |
| Spaces issues | ✅ Models come with Spaces push |
| Repeatability | ✅ Same models for everyone |

---

**Ready?** Run this on your laptop now:
```bash
python3 scripts/download_lightweight_models.py
```

Then let me know the resulting size, and we can decide whether to add more models! 🚀