# Environment Configuration for Hugging Face Spaces
This document lists all environment variables for the crossword generator backend when deployed on Hugging Face Spaces.
## Required Variables
### Core Application Settings
```env
NODE_ENV=production
PORT=7860
PYTHONPATH=/app/crossword-app/backend-py
PYTHONUNBUFFERED=1
```
### Cache Configuration
```env
CACHE_DIR=/app/cache
```
### AI/ML Model Configuration
```env
THEMATIC_MODEL_NAME=all-mpnet-base-v2
THEMATIC_VOCAB_SIZE_LIMIT=100000
VOCAB_SOURCE=norvig
```
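Since the backend is Python (see `PYTHONPATH` above), these variables are presumably read with `os.getenv`, falling back to the documented defaults. A minimal sketch, with names and defaults taken from this document (the actual settings module may structure this differently):

```python
import os

# Hypothetical settings-loading sketch; variable names and defaults
# mirror this document, not the project's actual source code.
CACHE_DIR = os.getenv("CACHE_DIR", "/app/cache")
THEMATIC_MODEL_NAME = os.getenv("THEMATIC_MODEL_NAME", "all-mpnet-base-v2")
THEMATIC_VOCAB_SIZE_LIMIT = int(os.getenv("THEMATIC_VOCAB_SIZE_LIMIT", "100000"))
VOCAB_SOURCE = os.getenv("VOCAB_SOURCE", "norvig")
```

Parsing numeric variables with `int(...)` at startup surfaces a misconfigured value immediately rather than mid-request.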
## Optional Variables (with defaults)
### Word Selection & Quality Control
```env
SIMILARITY_TEMPERATURE=0.2
USE_SOFTMAX_SELECTION=true
DIFFICULTY_WEIGHT=0.5
THEMATIC_POOL_SIZE=150
```
### Multi-Topic Intersection Configuration
```env
MULTI_TOPIC_METHOD=soft_minimum
SOFT_MIN_BETA=10.0
SOFT_MIN_ADAPTIVE=true
SOFT_MIN_MIN_WORDS=15
SOFT_MIN_MAX_RETRIES=5
SOFT_MIN_BETA_DECAY=0.7
```
### Distribution Normalization (Experimental)
```env
ENABLE_DISTRIBUTION_NORMALIZATION=false
NORMALIZATION_METHOD=similarity_range
```
### Debug & Development
```env
ENABLE_DEBUG_TAB=false
```
## Variable Explanations
### **CACHE_DIR** (Required)
- Directory for caching models, embeddings, and vocabulary
- Contains sentence-transformer models, word embeddings, and NLTK data
- Should be persistent across deployments
### **THEMATIC_MODEL_NAME** (Default: all-mpnet-base-v2)
- Sentence transformer model for semantic embeddings
- Options: all-mpnet-base-v2, all-MiniLM-L6-v2 (smaller/faster)
- Affects quality vs performance trade-off
### **THEMATIC_VOCAB_SIZE_LIMIT** (Default: 100000)
- Maximum vocabulary size for word generation
- Higher = more word variety, more memory usage
- Norvig vocabulary contains ~100K words
### **VOCAB_SOURCE** (Default: norvig)
- Vocabulary source for word generation
- Currently only "norvig" is supported
- Uses Norvig word frequency dataset
### **SIMILARITY_TEMPERATURE** (Default: 0.2)
- Controls randomness in word selection
- Lower = more deterministic (top similarity words)
- Higher = more random selection from similar words
- Range: 0.1-2.0
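The temperature's effect can be illustrated with a standard temperature-scaled softmax over similarity scores (an illustrative formulation, not necessarily the project's exact selection code):

```python
import math

def softmax_sample_weights(similarities, temperature=0.2):
    """Convert similarity scores into sampling weights.

    Lower temperature sharpens the distribution toward the
    top-similarity words; higher temperature flattens it.
    (Illustrative sketch, not the project's actual code.)
    """
    scaled = [s / temperature for s in similarities]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

low = softmax_sample_weights([0.9, 0.8, 0.5], temperature=0.1)
high = softmax_sample_weights([0.9, 0.8, 0.5], temperature=2.0)
```

At temperature 0.1 nearly all sampling weight lands on the 0.9-similarity word; at 2.0 the weights are close to uniform, which is why high temperatures make selections feel random.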
### **DIFFICULTY_WEIGHT** (Default: 0.5)
- Balances similarity vs frequency for difficulty levels
- 0.0 = pure similarity, 1.0 = pure frequency
- Affects easy/medium/hard word selection
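The description above suggests a linear blend of the two signals. A hypothetical sketch of such a scoring function (the project's actual formula may differ):

```python
def difficulty_score(similarity, frequency, difficulty_weight=0.5):
    """Blend semantic similarity with word frequency.

    difficulty_weight=0.0 ranks purely by similarity;
    difficulty_weight=1.0 ranks purely by frequency.
    (Hypothetical formula matching the description above.)
    """
    return (1.0 - difficulty_weight) * similarity + difficulty_weight * frequency
```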
### **MULTI_TOPIC_METHOD** (Default: soft_minimum)
- Method for multi-topic word intersection
- Options: soft_minimum, geometric_mean, harmonic_mean, averaging
- soft_minimum finds words relevant to ALL topics
### **SOFT_MIN_BETA** (Default: 10.0)
- Beta parameter for soft minimum calculation
- Higher = stricter intersection requirement
- Automatically adjusted if SOFT_MIN_ADAPTIVE=true
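A common soft-minimum formulation is `softmin_beta(x) = -(1/beta) * ln(mean(exp(-beta * x_i)))`, which approaches the true minimum as beta grows; whether the project uses exactly this variant is an assumption. A sketch over per-topic similarity scores:

```python
import math

def soft_minimum(scores, beta=10.0):
    """Smooth approximation of min(scores).

    As beta grows the result approaches the true minimum, so a word
    must score well against EVERY topic to rank highly -- hence
    higher beta means a stricter intersection requirement.
    (Assumed formulation; the project's exact formula may differ.)
    """
    n = len(scores)
    return -math.log(sum(math.exp(-beta * s) for s in scores) / n) / beta
```

For scores `[0.9, 0.3]`, beta=50 yields a value very close to 0.3 (strict), while beta=5 drifts noticeably upward (more permissive), which matches the retry strategy of decaying beta via SOFT_MIN_BETA_DECAY when too few words survive.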
### **ENABLE_DEBUG_TAB** (Default: false)
- Shows debug information in frontend
- Displays word selection process and parameters
- Useful for development and analysis
### **ENABLE_DISTRIBUTION_NORMALIZATION** (Default: false)
- Experimental feature for normalizing similarity distributions
- Disabled by default, since normalization can distort genuine semantic similarity
- See docs/distribution_normalization_analysis.md
## Recommended HF Spaces Configuration
**Minimal Setup (Core functionality):**
```env
NODE_ENV=production
PORT=7860
PYTHONPATH=/app/crossword-app/backend-py
PYTHONUNBUFFERED=1
CACHE_DIR=/app/cache
THEMATIC_VOCAB_SIZE_LIMIT=100000
THEMATIC_MODEL_NAME=all-mpnet-base-v2
VOCAB_SOURCE=norvig
```
**Optimized Setup (Better performance & debugging):**
```env
NODE_ENV=production
PORT=7860
PYTHONPATH=/app/crossword-app/backend-py
PYTHONUNBUFFERED=1
CACHE_DIR=/app/cache
THEMATIC_VOCAB_SIZE_LIMIT=100000
THEMATIC_MODEL_NAME=all-mpnet-base-v2
VOCAB_SOURCE=norvig
SIMILARITY_TEMPERATURE=0.2
DIFFICULTY_WEIGHT=0.5
ENABLE_DEBUG_TAB=true
MULTI_TOPIC_METHOD=soft_minimum
SOFT_MIN_ADAPTIVE=true
```
## Deprecated Variables (Safe to Remove)
These variables are no longer used and can be deleted from HF Spaces:
- `EMBEDDING_MODEL` (replaced by THEMATIC_MODEL_NAME)
- `WORD_SIMILARITY_THRESHOLD` (deprecated with old vector search)
- `USE_AI_WORDS` (always true now)
- `FALLBACK_TO_STATIC` (no static fallback in current system)
- `SEARCH_RANDOMNESS` (replaced by SIMILARITY_TEMPERATURE)
- `MAX_CACHED_WORDS` (deprecated with old caching)
- `CACHE_EXPIRY_HOURS` (deprecated with old caching)
- `USE_HIERARCHICAL_SEARCH` (deprecated with old vector search)
- `MAX_USED_WORDS_MEMORY` (deprecated with old word tracking)
## Performance Notes
- **Startup Time**: ~30-60 seconds (model download + cache creation)
- **Memory Usage**: ~500MB-1GB (sentence-transformers + embeddings + vocabulary)
- **Response Time**: ~200-500ms (word generation + clue creation + grid fitting)
- **Disk Usage**: ~500MB for full model cache (vocabulary, embeddings, models)
## Troubleshooting
**If puzzle generation fails:**
1. Check CACHE_DIR is writable and has sufficient space
2. Monitor startup logs for cache creation progress
3. Verify THEMATIC_VOCAB_SIZE_LIMIT isn't too restrictive
**If words seem too random:**
1. Lower SIMILARITY_TEMPERATURE (try 0.1)
2. Increase DIFFICULTY_WEIGHT for frequency-based selection
3. Check debug tab with ENABLE_DEBUG_TAB=true
**If multi-topic queries return too few words:**
1. Enable SOFT_MIN_ADAPTIVE=true for automatic threshold adjustment
2. Lower SOFT_MIN_BETA manually (try 5.0)
3. Try different MULTI_TOPIC_METHOD (geometric_mean is more permissive)
**If startup is too slow:**
1. Use smaller model: THEMATIC_MODEL_NAME=all-MiniLM-L6-v2
2. Reduce vocabulary: THEMATIC_VOCAB_SIZE_LIMIT=50000
3. Cache should speed up subsequent startups significantly