Ishu8904 committed on
Commit
f5cf0fb
·
1 Parent(s): 40b0aa5

FIX: Handle NLTK download reliably with nltk.txt

Browse files
Files changed (3) hide show
  1. build.sh +2 -1
  2. build_vocab.py +0 -7
  3. nltk.txt +1 -0
build.sh CHANGED
@@ -4,6 +4,7 @@ set -e
4
  # Install all the python packages
5
  echo "--- Installing dependencies ---"
6
  pip install -r requirements.txt
 
7
 
8
  # Download ONLY the large model files from the GitHub Release
9
  echo "--- Downloading model files ---"
@@ -11,7 +12,7 @@ wget -O decoder-model.pth "https://github.com/Ishu-Kaur/Image-Caption-AI/release
11
  wget -O encoder-model.pth "https://github.com/Ishu-Kaur/Image-Caption-AI/releases/download/v1.0.1/encoder-model.pth"
12
  echo "--- Model files downloaded successfully ---"
13
 
14
- # CRITICAL STEP: Build the vocabulary file directly on the server
15
  echo "--- Building vocabulary file ---"
16
  python build_vocab.py
17
  echo "--- Vocabulary file built successfully ---"
 
4
  # Install all the python packages
5
  echo "--- Installing dependencies ---"
6
  pip install -r requirements.txt
7
+ python -m nltk.downloader -d /opt/render/project/src/nltk_data punkt  # download the NLTK data listed in nltk.txt (pip cannot install NLTK corpora)
8
 
9
  # Download ONLY the large model files from the GitHub Release
10
  echo "--- Downloading model files ---"
 
12
  wget -O encoder-model.pth "https://github.com/Ishu-Kaur/Image-Caption-AI/releases/download/v1.0.1/encoder-model.pth"
13
  echo "--- Model files downloaded successfully ---"
14
 
15
+ # Build the vocabulary file directly on the server
16
  echo "--- Building vocabulary file ---"
17
  python build_vocab.py
18
  echo "--- Vocabulary file built successfully ---"
build_vocab.py CHANGED
@@ -56,13 +56,6 @@ class Vocabulary:
56
  if __name__ == "__main__":
57
  print("Starting vocabulary creation process...")
58
 
59
- # Download the NLTK tokenizer model (only needs to be done once)
60
- try:
61
- nltk.data.find('tokenizers/punkt')
62
- except LookupError: # <-- This is a more robust way to check
63
- print("Downloading NLTK 'punkt' model...")
64
- nltk.download('punkt')
65
-
66
  # Load the Flickr8k training data from Hugging Face
67
  print("Loading Flickr8k dataset from Hugging Face...")
68
  train_dataset = load_dataset("jxie/flickr8k", split="train")
 
56
  if __name__ == "__main__":
57
  print("Starting vocabulary creation process...")
58
 
 
 
 
 
 
 
 
59
  # Load the Flickr8k training data from Hugging Face
60
  print("Loading Flickr8k dataset from Hugging Face...")
61
  train_dataset = load_dataset("jxie/flickr8k", split="train")
nltk.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ punkt