Aya777 committed
Commit f30516c · verified · 1 Parent(s): 766273e

Upload 8 files

Files changed (8)
  1. README.md +94 -11
  2. _app.py_archive_old_versions.py +1086 -0
  3. _main.py_archive_old_versions.py +607 -0
  4. app.py +134 -0
  5. config.toml +5 -0
  6. gitignore.txt +39 -0
  7. main.py +198 -0
  8. requirements.txt +9 -0
README.md CHANGED
@@ -1,14 +1,97 @@
  ---
- title: VisionSort AIChallenge
- emoji: 🐠
- colorFrom: indigo
- colorTo: blue
- sdk: streamlit
- sdk_version: 1.44.1
- app_file: app.py
- pinned: false
- license: mit
- short_description: AI tool to search images or video frames using natural promp
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # VisionSort
+ *AI-powered visual search tool for finding key moments in large batches of images or video frames using natural prompts.*
+
+ ---
+
+ ## Concept Summary
+
+ VisionSort helps users avoid manually scrubbing through thousands of images or hours of video footage. Instead, they simply describe what they’re looking for in natural language — for example:
+
+ > “Show me images with meteors,”
+ > “Find the person wearing a blue hoodie,”
+ > “Only frames with the cat near the window.”
+
+ The app uses OpenAI's CLIP model to semantically compare the prompt to the visual content of uploaded images or video frames, then displays the most relevant matches with ranked confidence scores and, for videos, timestamps.
+
+ ---
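The comparison step above can be sketched as a minimal, dependency-free illustration: a prompt embedding is scored against each frame embedding with cosine similarity, and frames are ranked by that score. The three-dimensional vectors and the `frame_001.jpg`/`frame_002.jpg` names below are placeholders; in the app both embeddings come from CLIP ViT-B/32 and are 512-dimensional.

```python
from math import sqrt

def match_score(text_vec, image_vec):
    """Cosine similarity between a prompt embedding and an image embedding, scaled to 0-100."""
    dot = sum(t * v for t, v in zip(text_vec, image_vec))
    norm = sqrt(sum(t * t for t in text_vec)) * sqrt(sum(v * v for v in image_vec))
    return 100.0 * dot / norm

# Placeholder embeddings — real ones would come from CLIP's text and image encoders.
prompt_vec = [0.8, 0.6, 0.0]               # e.g. "show me images with meteors"
frames = {
    "frame_001.jpg": [0.8, 0.6, 0.05],     # visually close to the prompt
    "frame_002.jpg": [0.0, 0.1, 1.0],      # unrelated content
}
ranked = sorted(frames, key=lambda n: match_score(prompt_vec, frames[n]), reverse=True)
```

Because both vectors are L2-normalized inside the score, the result depends only on direction, not magnitude — which is why raw CLIP logits can be mapped onto the 0–100 confidence scale the UI shows.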
+
+ ## Target Users
+
+ - **Astrophotographers & skywatchers** — spotting rare meteor events
+ - **Surveillance teams / CCTV users** — locating key moments in footage
+ - **Researchers or satellite image analysts** — filtering massive visual datasets
+ - **Drone operators or hobbyists** — identifying key subjects
+ - **Anyone with a large photo/video archive** — looking for specific visuals
+
+ ---
+
+ ## Tech Stack
+
+ - VS Code (development)
+ - Jupyter Notebook (early prototyping)
+ - Python 3.11.11
+ - Streamlit (app UI)
+ - OpenAI CLIP (ViT-B/32 model for image-text matching)
+ - OpenCV (video frame extraction)
+ - Pillow (image handling and processing)
+
+ ---
+
+ ## Key Features
+
+ - Upload multiple images or videos
+ - Auto-extract frames from video (1 frame/sec)
+ - Search using natural language prompts
+ - Semantic similarity matching using CLIP embeddings + cosine similarity
+ - Results sorted into:
+   - 🎯 Confident Matches
+   - ⚠️ Potential Matches (borderline)
+   - ❓ Low-Confidence Matches
+ - Interactive configuration panel:
+   - Adjust the confidence threshold and borderline minimum
+   - Toggle display of borderline and low-confidence results
+ - Timestamp support for video frames
+ - Download displayed results as a `.zip` based on the current filter settings
+ - Temp-file cleanup on each run
+
  ---
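The three-way grouping described above reduces to a small pure function. The sketch below is illustrative only: the `confidence` field name and the default thresholds of 25 / 15 mirror the configuration panel, but the helper and file names are hypothetical.

```python
def bucket_results(results, min_confidence=25.0, borderline_min=15.0):
    """Split scored results into confident / borderline / low groups,
    each sorted by confidence, highest first."""
    groups = {"confident": [], "borderline": [], "low": []}
    for res in results:
        score = res["confidence"]
        if score >= min_confidence:
            groups["confident"].append(res)
        elif score >= borderline_min:
            groups["borderline"].append(res)
        else:
            groups["low"].append(res)
    for group in groups.values():
        group.sort(key=lambda r: r["confidence"], reverse=True)
    return groups

demo = [{"path": "a.jpg", "confidence": 31.0},
        {"path": "b.jpg", "confidence": 18.5},
        {"path": "c.jpg", "confidence": 4.2}]
grouped = bucket_results(demo)
```

Keeping the grouping in one function like this is what lets the sidebar sliders re-bucket results without re-running the (expensive) CLIP scoring.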
+
+ ## Archived & Upcoming Features
+
+ This version focuses on a clean, working CLIP-based prototype.
+
+ The following features were previously implemented but later removed (archived in `_main.py_archive_old_versions.py` and `_app.py_archive_old_versions.py`) to improve performance and simplify the user experience — but are preserved for future updates:
+
+ - BLIP captioning for fallback logic
+ - GPT-4 integration for prompt refinement when user input was vague or misspelled
+ - User-controlled frame sampling rate (choose how many frames to extract from videos)
+ - Optional fallback triggers — the user could decide when to use BLIP or GPT help
+ - Alternative UI versions with more interactive elements
+
  ---
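The user-controlled frame sampling rate mentioned above comes down to a small amount of index arithmetic. A hedged sketch under stated assumptions: the helper names are hypothetical, and the extraction function follows the standard OpenCV `cv2.VideoCapture` API (it is not the app's actual `analyze_media` code).

```python
def frame_indices(fps, total_frames, every_n_seconds=1.0):
    """(frame index, timestamp in seconds) pairs, one sample every N seconds."""
    step = max(1, round(fps * every_n_seconds))
    return [(i, i / fps) for i in range(0, total_frames, step)]

def extract_frames(video_path, every_n_seconds=1.0):
    """Yield (timestamp, BGR frame) pairs sampled from a video with OpenCV."""
    import cv2  # imported lazily so frame_indices stays usable without OpenCV installed
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0   # fall back if FPS metadata is missing
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    for idx, ts in frame_indices(fps, total, every_n_seconds):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if ok:
            yield ts, frame
    cap.release()
```

Exposing `every_n_seconds` as a slider is all it would take to restore the archived "frame sampling rate" control; the timestamp computed here is also what the results grid displays.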
 
+ ## Challenges Faced
+
+ - Balancing scope under tight time pressure — an important lesson in **scope control** under deadlines.
+ - First time independently building an AI project — and my first serious work in Python.
+ - Initially aimed to integrate CLIP (semantic search), GPT-4 (prompt refinement), and BLIP (fallback captioning), but the stack proved too complex for the challenge timeline.
+ - Faced performance issues when processing large batches of images and videos, which taught me to write code that handles batch operations efficiently.
+ - Experimented with BLIP as a fallback model:
+   - Helped add context, but often lacked precision.
+   - Highlighted the need for smarter fallback triggers and sparked interest in future models like PaLI or GIT.
+ - Streamlit-specific challenges:
+   - Managing multiple file uploads and temp files
+   - Keeping the UI responsive with real-time feedback (match scores, timestamps, toggles)
+ - Key takeaway: **build a stable core first**, then layer in advanced features.
+ - Although GPT-4 and BLIP weren’t fully integrated in the final version, their experiments are preserved and documented for future improvements.
+
_app.py_archive_old_versions.py ADDED
@@ -0,0 +1,1086 @@
+ """
+ 🗃️ ARCHIVED CODE — Not used in the final submitted app
+
+ This file contains earlier experimental versions and alternative implementations
+ of the VisionSort app. It includes:
+
+ - Initial UI structures that were later refactored
+ - GPT-4 prompt suggestion and fallback logic (commented out)
+ - BLIP captioning integration attempts (eventually removed)
+ - Other design variations and logic blocks
+
+ These sections were removed from main.py and app.py to simplify the final submission,
+ but are preserved here to document the development process, thought flow, and future plans.
+
+ Do not import or execute this file — it is for reference only.
+ """
+
+
+ # # Imports
+ # import os
+ # import tempfile
+ # import streamlit as st
+ # from main import analyze_media, process_with_blip
+ # from PIL import Image
+ # import openai
+
+ # # Initialize OpenAI API
+ # from dotenv import load_dotenv
+ # load_dotenv()
+ # api_key = os.getenv("OPENAI_API_KEY")
+ # openai.api_key = api_key
+
+ # # --- Streamlit Setup ---
+ # st.set_page_config(layout="wide", page_title="VisionSort Pro")
+ # st.sidebar.header("Configuration")
+
+ # # --- Sidebar Config ---
+ # min_confidence = st.sidebar.number_input("Confidence Threshold", min_value=0, max_value=100, value=25, step=1)
+ # borderline_min = st.sidebar.number_input("Borderline Minimum", min_value=0, max_value=100, value=15, step=1)
+
+
+ # # --- Main Interface ---
+ # st.title("🔍 VisionSort Pro")
+ # uploaded_files = st.file_uploader("Upload images/videos", type=["jpg", "jpeg", "png", "mp4", "mov"], accept_multiple_files=True)
+ # user_prompt = st.text_input("Search prompt", placeholder="e.g. 'find the cat'")
+
+ # if uploaded_files and user_prompt:
+ #     results = {"high": [], "borderline": [], "low": []}
+ #     temp_paths = []
+
+ #     with st.spinner(f"Processing {len(uploaded_files)} files..."):
+ #         for file in uploaded_files:
+ #             with tempfile.NamedTemporaryFile(delete=False, suffix=os.path.splitext(file.name)[1]) as f:
+ #                 f.write(file.read())
+ #                 temp_paths.append(f.name)
+ #                 media_results = analyze_media(
+ #                     f.name,
+ #                     user_prompt,
+ #                     min_confidence,
+ #                     (borderline_min, min_confidence)
+ #                 )
+
+ #             for res in media_results:
+ #                 results[res["status"]].append(res)
+
+ #     # Sort all groups by confidence descending
+ #     for group in results.values():
+ #         group.sort(key=lambda r: r["confidence"], reverse=True)
+
+ #     # --- Display Confident Matches ---
+ #     if results["high"]:
+ #         st.subheader(f"🎯 Confident Matches ({len(results['high'])})")
+ #         cols = st.columns(4)
+ #         for idx, res in enumerate(results["high"]):
+ #             with cols[idx % 4]:
+ #                 st.image(Image.open(res["path"]), use_container_width=True)
+ #                 st.caption(f"{res['confidence']:.1f}% | {res['timestamp']:.2f}s")
+
+ #     # --- Display Borderline Matches ---
+ #     if results["borderline"]:
+ #         st.subheader(f"⚠️ Potential Matches ({len(results['borderline'])})")
+ #         if st.checkbox("Show borderline results", True):
+ #             cols = st.columns(4)
+ #             for idx, res in enumerate(results["borderline"]):
+ #                 with cols[idx % 4]:
+ #                     st.image(Image.open(res["path"]), use_container_width=True)
+ #                     st.caption(f"{res['confidence']:.1f}% | {res['timestamp']:.2f}s")
+ #                     if st.button("🧠 Explain Match", key=f"blip_{idx}"):
+ #                         with st.expander("🔍 BLIP Analysis"):
+ #                             st.write(f"**BLIP Description:** {process_with_blip(res['path'])}")
+ #                             if "gpt_suggestion" in res:
+ #                                 st.write(f"**GPT Suggestion:** {res['gpt_suggestion']}")
+
+ #     # --- Display Low Confidence Matches Only If GPT Enabled ---
+ #     if results["low"] and openai.api_key:
+ #         st.subheader(f"❓ Low Confidence Matches ({len(results['low'])})")
+ #         if st.checkbox("Show low confidence results"):
+ #             for res in results["low"]:
+ #                 st.image(Image.open(res["path"]), use_container_width=True)
+ #                 st.caption(f"{res['confidence']:.1f}% | {res['timestamp']:.2f}s")
+ #                 if "gpt_suggestion" in res:
+ #                     st.markdown(f"**💡 GPT Suggestion:** {res['gpt_suggestion']}")
+
+ #     # --- Cleanup Temporary Files ---
+ #     for path in temp_paths:
+ #         if os.path.exists(path):
+ #             os.unlink(path)
+
+ # ------ original VisionSort chat -------------------------------------------------------
+ # import os
+ # import tempfile
+ # import streamlit as st
+ # from main import analyze_media, process_with_blip
+ # from PIL import Image
+ # import openai
+
+ # # Load OpenAI key from .env file
+ # from dotenv import load_dotenv
+ # load_dotenv()
+ # api_key = os.getenv("OPENAI_API_KEY")
+ # openai.api_key = api_key
+
+ # # Set Streamlit layout
+ # st.set_page_config(layout="wide", page_title="VisionSort Pro")
+ # st.sidebar.header("Configuration")
+
+ # # === USER CONFIG ===
+ # min_confidence = st.sidebar.number_input(
+ #     "Confidence Threshold", min_value=0, max_value=100, value=25, step=1
+ # )
+
+ # # Helpful explanation
+ # st.sidebar.caption("💡 All results below the threshold will use fallback logic (BLIP/GPT).")
+
+ # # === UI: Upload Files ===
+ # st.title("🔍 VisionSort Pro")
+ # uploaded_files = st.file_uploader(
+ #     "Upload images or a video",
+ #     type=["jpg", "jpeg", "png", "mp4", "mov"],
+ #     accept_multiple_files=True,
+ #     key="file_uploader"
+ # )
+
+ # # Clear All Button
+ # if st.button("❌ Clear All"):
+ #     st.session_state["file_uploader"] = []  # Reset uploaded files
+
+ # # Prompt
+ # user_prompt = st.text_input("Search prompt", placeholder="e.g. 'find the cat'")
+
+ # # === MEDIA TYPE CHECK ===
+ # if uploaded_files:
+ #     exts = {os.path.splitext(f.name)[1].lower() for f in uploaded_files}
+ #     if {".mp4", ".mov"}.intersection(exts) and {".jpg", ".jpeg", ".png"}.intersection(exts):
+ #         st.error("⚠️ Please upload only images OR only a video. Mixing is not supported.")
+ #         st.stop()
+
+ # # === MAIN LOGIC ===
+ # if uploaded_files and user_prompt:
+ #     temp_paths, results = [], {"confident": [], "fallback": []}
+
+ #     with st.spinner("Analyzing..."):
+ #         for file in uploaded_files:
+ #             with tempfile.NamedTemporaryFile(delete=False, suffix=os.path.splitext(file.name)[1]) as f:
+ #                 f.write(file.read())
+ #                 temp_paths.append(f.name)
+ #                 res = analyze_media(f.name, user_prompt, min_confidence)
+ #                 for r in res:
+ #                     group = "confident" if r["confidence"] >= min_confidence else "fallback"
+ #                     results[group].append(r)
+
+ #     # Sort all result groups high → low confidence
+ #     results["confident"].sort(key=lambda x: x["confidence"], reverse=True)
+ #     results["fallback"].sort(key=lambda x: x["confidence"], reverse=True)
+
+ #     # === DISPLAY: CONFIDENT MATCHES ===
+ #     if results["confident"]:
+ #         with st.expander(f"🎯 Confident Matches ({len(results['confident'])})", expanded=True):
+ #             cols = st.columns(4)
+ #             for idx, res in enumerate(results["confident"]):
+ #                 with cols[idx % 4]:
+ #                     st.image(Image.open(res["path"]), use_container_width=True)
+ #                     st.caption(f"🕒 {res['timestamp']:.2f}s | 📊 {res['confidence']:.1f}%")
+ #                     with st.expander("📌 Details"):
+ #                         st.write(f"**File:** {os.path.basename(res['path'])}")
+ #                         st.write(f"**Confidence:** {res['confidence']:.1f}%")
+ #                         st.write(f"**Timestamp:** {res['timestamp']:.2f}s")
+ #                         st.write("**Location:** (Unavailable)")
+
+ #     # === DISPLAY: FALLBACK MATCHES ===
+ #     if results["fallback"]:
+ #         with st.expander(f"⚠️ Fallback Matches ({len(results['fallback'])})", expanded=True):
+ #             cols = st.columns(4)
+ #             for idx, res in enumerate(results["fallback"]):
+ #                 with cols[idx % 4]:
+ #                     st.image(Image.open(res["path"]), use_container_width=True)
+ #                     st.caption(f"🕒 {res['timestamp']:.2f}s | 📊 {res['confidence']:.1f}%")
+ #                     with st.expander("📌 Details"):
+ #                         st.write(f"**File:** {os.path.basename(res['path'])}")
+ #                         st.write(f"**Confidence:** {res['confidence']:.1f}%")
+ #                         st.write(f"**Timestamp:** {res['timestamp']:.2f}s")
+ #                         st.write("**Location:** (Unavailable)")
+
+ #     # === FALLBACK: GPT PROMPT SUGGESTION ===
+ #     if not results["confident"] and openai.api_key:
+ #         st.markdown("---")
+ #         st.warning("😕 No confident results found.")
+ #         try:
+ #             captions = [process_with_blip(r["path"]) for r in results["fallback"][:3]]
+ #             suggestion_prompt = openai.ChatCompletion.create(
+ #                 model="gpt-4",
+ #                 messages=[
+ #                     {"role": "system", "content": "Suggest a clearer image prompt from captions."},
+ #                     {"role": "user", "content": "Captions:\n" + "\n".join(captions)}
+ #                 ],
+ #                 max_tokens=50
+ #             )
+ #             suggested = suggestion_prompt.choices[0].message.content.strip()
+ #             st.info(f"💡 Try this instead: **{suggested}**")
+ #         except Exception as e:
+ #             st.error(f"Error getting prompt suggestion: {str(e)}")
+
+ #     # === CLEANUP TEMP FILES ===
+ #     for path in temp_paths:
+ #         if os.path.exists(path):
+ #             os.remove(path)
+
+
+ # ------ updated VisionSort chat --------------------------------------------------------------
+
+ # # ✅ Confidence confirmation button
+ # # ✅ Prompt auto-fill with GPT suggestion
+ # # ✅ Clear button appears only when media is uploaded
+ # # ✅ Sidebar UI cleaned up
+ # # ✅ App name centered
+ # # ✅ Loading spinner during Streamlit runs
+ # # ✅ Fallback cleanup logic (including crash safety)
+
+ # import os
+ # import tempfile
+ # import streamlit as st
+ # from main import analyze_media, process_with_blip
+ # from PIL import Image
+ # from dotenv import load_dotenv
+ # import openai
+
+ # # Load OpenAI key from .env file
+ # load_dotenv()
+ # openai.api_key = os.getenv("OPENAI_API_KEY")
+
+ # # --- Streamlit Setup ---
+ # st.set_page_config(layout="wide", page_title="VisionSort Pro")
+ # st.markdown("<h1 style='text-align: center;'>VisionSort Pro</h1>", unsafe_allow_html=True)
+
+ # # --- Sidebar UI ---
+ # with st.sidebar:
+ #     st.header("⚙️ Configuration")
+ #     st.caption("Adjust filtering behavior before analyzing media.")
+ #     min_conf_slider = st.slider("Confidence Threshold", 0, 100, 25, step=1, key="threshold_slider")
+ #     confirm_threshold = st.button("Apply Threshold")
+ #     st.caption("Only frames above this confidence are shown as strong matches.")
+
+ # # Only update threshold when user confirms
+ # if confirm_threshold:
+ #     st.session_state["confirmed_threshold"] = st.session_state["threshold_slider"]
+ # elif "confirmed_threshold" not in st.session_state:
+ #     st.session_state["confirmed_threshold"] = 25
+ # min_confidence = st.session_state["confirmed_threshold"]
+
+ # # --- Upload Section ---
+ # st.markdown("---")
+ # uploaded_files = st.file_uploader(
+ #     "Upload images or a video",
+ #     type=["jpg", "jpeg", "png", "mp4", "mov"],
+ #     accept_multiple_files=True,
+ #     key="file_uploader"
+ # )
+
+ # if uploaded_files:
+ #     if st.button("❌ Clear All"):
+ #         st.session_state["file_uploader"] = []
+
+ # # Prompt
+ # user_prompt = st.text_input("Search prompt", placeholder="e.g. 'find the cat'", key="user_prompt")
+
+ # # Media type check
+ # if uploaded_files:
+ #     exts = {os.path.splitext(f.name)[1].lower() for f in uploaded_files}
+ #     if {".mp4", ".mov"}.intersection(exts) and {".jpg", ".jpeg", ".png"}.intersection(exts):
+ #         st.error("⚠️ Please upload only images OR only a video. Mixing is not supported.")
+ #         st.stop()
+
+ # # Main app logic
+ # if uploaded_files and user_prompt:
+ #     temp_paths, results = [], {"confident": [], "fallback": []}
+
+ #     with st.spinner("Analyzing media..."):
+ #         for file in uploaded_files:
+ #             try:
+ #                 with tempfile.NamedTemporaryFile(delete=False, suffix=os.path.splitext(file.name)[1]) as f:
+ #                     f.write(file.read())
+ #                     temp_paths.append(f.name)
+ #                     res = analyze_media(f.name, user_prompt, min_confidence)
+ #                     for r in res:
+ #                         group = "confident" if r["confidence"] >= min_confidence else "fallback"
+ #                         results[group].append(r)
+ #             except Exception as e:
+ #                 st.error(f"Failed to process file {file.name}: {e}")
+
+ #     results["confident"].sort(key=lambda x: x["confidence"], reverse=True)
+ #     results["fallback"].sort(key=lambda x: x["confidence"], reverse=True)
+
+ #     # Display confident matches
+ #     if results["confident"]:
+ #         with st.expander(f"🎯 Confident Matches ({len(results['confident'])})", expanded=True):
+ #             cols = st.columns(4)
+ #             for idx, res in enumerate(results["confident"]):
+ #                 with cols[idx % 4]:
+ #                     st.image(Image.open(res["path"]), use_container_width=True)
+ #                     st.caption(f"🕒 {res['timestamp']:.2f}s | 📊 {res['confidence']:.1f}%")
+ #                     with st.expander("📌 Details"):
+ #                         st.write(f"**File:** {os.path.basename(res['path'])}")
+ #                         st.write(f"**Confidence:** {res['confidence']:.1f}%")
+ #                         st.write(f"**Timestamp:** {res['timestamp']:.2f}s")
+
+ #     # Display fallback matches
+ #     if results["fallback"]:
+ #         with st.expander(f"⚠️ Fallback Matches ({len(results['fallback'])})", expanded=True):
+ #             cols = st.columns(4)
+ #             for idx, res in enumerate(results["fallback"]):
+ #                 with cols[idx % 4]:
+ #                     st.image(Image.open(res["path"]), use_container_width=True)
+ #                     st.caption(f"🕒 {res['timestamp']:.2f}s | 📊 {res['confidence']:.1f}%")
+ #                     with st.expander("📌 Details"):
+ #                         st.write(f"**File:** {os.path.basename(res['path'])}")
+ #                         st.write(f"**Confidence:** {res['confidence']:.1f}%")
+ #                         st.write(f"**Timestamp:** {res['timestamp']:.2f}s")
+
+ #     # Prompt suggestion fallback
+ #     if not results["confident"]:
+ #         st.warning("😕 No confident matches found.")
+ #         try:
+ #             captions = [process_with_blip(r["path"]) for r in results["fallback"][:3]]
+ #             suggestion_prompt = openai.ChatCompletion.create(
+ #                 model="gpt-4",
+ #                 messages=[
+ #                     {"role": "system", "content": "Suggest a clearer image prompt from captions."},
+ #                     {"role": "user", "content": "Captions:\n" + "\n".join(captions)}
+ #                 ],
+ #                 max_tokens=50
+ #             )
+
+ #             suggested = suggestion_prompt.choices[0].message.content.strip()
+ #             st.info(f"💡 Try this instead: **{suggested}**")
+ #             if st.button("Use Suggested Prompt"):
+ #                 st.session_state["user_prompt"] = suggested
+ #                 st.rerun()
+ #         except Exception as e:
+ #             st.error(f"Error generating prompt suggestion: {str(e)}")
+
+ #     # Clean up temp files even on crash
+ #     for path in temp_paths:
+ #         try:
+ #             if os.path.exists(path):
+ #                 os.remove(path)
+ #         except Exception as e:
+ #             st.warning(f"Couldn't delete temp file: {e}")
+
+ # ----------- updated --------------------------------------------------
+ # analyze_media now takes frame_interval as a parameter (make sure main.py supports that)
+ # The Clear All fix avoids touching Streamlit widgets directly (you can’t modify file-uploader state post-init)
+ # GPT fallback is untouched for now — it can be re-added in the fallback expander if needed
+ # Crash-safe cleanup: added try/except around os.remove() in case files are locked or used elsewhere
+ # Streamlit loading speed is mostly I/O-bound; slowdowns are likely due to frame extraction or model loading.
+ # This optimization helps by skipping unnecessary reloading unless a button is clicked.
+
+ # import os
+ # import tempfile
+ # import streamlit as st
+ # from main import analyze_media, process_with_blip
+ # from PIL import Image
+ # from dotenv import load_dotenv
+ # import openai
+
+ # # Initialize OpenAI API
+ # load_dotenv()
+ # api_key = os.getenv("OPENAI_API_KEY")
+ # openai.api_key = api_key
+
+ # # === App Setup ===
+ # st.set_page_config(layout="wide", page_title="VisionSort Pro")
+ # st.markdown("<h1 style='text-align: center;'>VisionSort Pro</h1>", unsafe_allow_html=True)
+
+ # # === Sidebar ===
+ # st.sidebar.title("⚙️ Configuration")
+
+ # # Confidence slider + apply button
+ # confidence = st.sidebar.slider("Confidence Threshold", 0, 100, 25)
+ # apply_conf = st.sidebar.button("Apply Threshold")
+
+ # # Frame sampling
+ # frame_interval = st.sidebar.slider("Video Frame Interval (1 = every frame)", 1, 120, 60)
+ # apply_frame = st.sidebar.button("Apply Frame Interval")
+
+ # # Store settings in session_state
+ # if apply_conf:
+ #     st.session_state["min_conf"] = confidence
+
+ # if apply_frame:
+ #     st.session_state["frame_interval"] = frame_interval
+
+ # # Set defaults if not set
+ # if "min_conf" not in st.session_state:
+ #     st.session_state["min_conf"] = 25
+ # if "frame_interval" not in st.session_state:
+ #     st.session_state["frame_interval"] = 60
+
+ # # === Upload Media ===
+ # uploaded_files = st.file_uploader(
+ #     "Upload images or a video",
+ #     type=["jpg", "jpeg", "png", "mp4", "mov"],
+ #     accept_multiple_files=True,
+ # )
+
+ # # Clear All
+ # if uploaded_files and st.button("❌ Clear All"):
+ #     uploaded_files.clear()  # Clear uploads
+ #     st.experimental_rerun()
+
+ # # Prompt
+ # user_prompt = st.text_input("Search prompt", placeholder="e.g. 'find the dog'")
+
+ # # === Main Logic ===
+ # if uploaded_files and user_prompt:
+ #     temp_paths, results = [], {"confident": [], "fallback": []}
+ #     st.info("⏳ Processing media... please wait.")
+ #     with st.spinner("Analyzing..."):
+ #         for file in uploaded_files:
+ #             with tempfile.NamedTemporaryFile(delete=False, suffix=os.path.splitext(file.name)[1]) as f:
+ #                 f.write(file.read())
+ #                 temp_paths.append(f.name)
+ #                 res = analyze_media(f.name, user_prompt,
+ #                                     min_confidence=st.session_state["min_conf"],
+ #                                     frame_interval=st.session_state["frame_interval"])
+ #                 for r in res:
+ #                     group = "confident" if r["confidence"] >= st.session_state["min_conf"] else "fallback"
+ #                     results[group].append(r)
+
+ #     results["confident"].sort(key=lambda x: x["confidence"], reverse=True)
+ #     results["fallback"].sort(key=lambda x: x["confidence"], reverse=True)
+
+ #     # === Confident Results ===
+ #     if results["confident"]:
+ #         st.subheader(f"🎯 Confident Matches ({len(results['confident'])})")
+ #         cols = st.columns(4)
+ #         for idx, res in enumerate(results["confident"]):
+ #             with cols[idx % 4]:
+ #                 st.image(Image.open(res["path"]), use_container_width=True)
+ #                 st.caption(f"🕒 {res['timestamp']:.2f}s | 📊 {res['confidence']:.1f}%")
+
+ #     # === Fallback (Optional Reveal) ===
+ #     if results["fallback"]:
+ #         with st.expander(f"⚠️ Show Potential Matches ({len(results['fallback'])})"):
+ #             fallback_slider = st.slider("Show matches above this confidence", 0, st.session_state["min_conf"], 10)
+ #             filtered_fallback = [r for r in results["fallback"] if r["confidence"] >= fallback_slider]
+ #             filtered_fallback.sort(key=lambda x: x["confidence"], reverse=True)
+
+ #             cols = st.columns(4)
+ #             for idx, res in enumerate(filtered_fallback):
+ #                 with cols[idx % 4]:
+ #                     st.image(Image.open(res["path"]), use_container_width=True)
+ #                     st.caption(f"🕒 {res['timestamp']:.2f}s | 📊 {res['confidence']:.1f}%")
+
+ #     # CLEANUP
+ #     for path in temp_paths:
+ #         if os.path.exists(path):
+ #             try:
+ #                 os.remove(path)
+ #             except Exception:
+ #                 pass
+
+ # -------------------------------------------------------------------------------------------
+ # Metadata display on image click
+ # Smart frame-interval UI (only for videos)
+ # Proper “Clear All” logic
+ # Removal of sidebar clutter
+ # A working “Apply Frame Interval” flow
+ # Confidence-based filtering with “Potential Matches” toggle
+ # Download option for selected images
+
+ # import os
+ # import tempfile
+ # import streamlit as st
+ # from main import analyze_media, process_with_blip
+ # from PIL import Image
+ # from dotenv import load_dotenv
+ # import openai
+ # import zipfile
+
+ # # Load OpenAI key from .env
+ # load_dotenv()
+ # openai.api_key = os.getenv("OPENAI_API_KEY")
+
+ # # App title
+ # st.set_page_config(layout="wide", page_title="VisionSort Pro")
+
+ # # Centered title
+ # st.markdown("<h1 style='text-align: center;'>VisionSort Pro</h1>", unsafe_allow_html=True)
+
+ # # === USER INPUT SECTION ===
+ # uploaded_files = st.file_uploader(
+ #     "Upload images or a video",
+ #     type=["jpg", "jpeg", "png", "mp4", "mov"],
+ #     accept_multiple_files=True
+ # )
+
+ # if uploaded_files:
+ #     media_type = "video" if any(f.name.lower().endswith(('.mp4', '.mov')) for f in uploaded_files) else "image"
+ # else:
+ #     media_type = None
+
+ # # Only show frame interval if video uploaded
+ # if media_type == "video":
+ #     frame_interval = st.slider("Video Frame Interval (1 = every frame)", 1, 120, 30)
+ #     if st.button("Apply Frame Interval"):
+ #         st.session_state["frame_ready"] = True
+ # else:
+ #     frame_interval = None
+
+ # # Prompt
+ # user_prompt = st.text_input("Search prompt", placeholder="e.g. 'find the cat'")
+
+ # # Clear All Button
+ # if st.button("❌ Clear All"):
+ #     st.session_state["clear_all"] = True
+
+ # if st.session_state.get("clear_all"):
+ #     uploaded_files = []
+ #     st.session_state["clear_all"] = False
+
+ # # === MAIN LOGIC ===
+ # if uploaded_files and user_prompt and (media_type == "image" or st.session_state.get("frame_ready")):
+ #     st.info("⏳ Processing media... please wait.")
+ #     temp_paths, all_results = [], []
+
+ #     for file in uploaded_files:
+ #         with tempfile.NamedTemporaryFile(delete=False, suffix=os.path.splitext(file.name)[1]) as f:
+ #             f.write(file.read())
+ #             temp_path = f.name
+ #         temp_paths.append(temp_path)
+ #         res = analyze_media(temp_path, user_prompt, frame_interval=frame_interval if frame_interval else 30)
+ #         all_results.extend(res)
+
+ #     # Split results
+ #     min_confidence = 25
+ #     confident_results = [r for r in all_results if r["confidence"] >= min_confidence]
+ #     potential_results = [r for r in all_results if r["confidence"] < min_confidence]
+
+ #     # Hide processing state
+ #     st.empty()
+
+ #     # === CONFIDENT RESULTS ===
+ #     if confident_results:
+ #         st.subheader(f"🎯 Confident Matches ({len(confident_results)})")
+ #         selected = st.multiselect("Select images to download", [r["path"] for r in confident_results], key="confident_select")
+ #         if st.button("📥 Download Selected"):
+ #             zip_path = "selected_images.zip"
+ #             with zipfile.ZipFile(zip_path, "w") as zipf:
+ #                 for p in selected:
+ #                     zipf.write(p, arcname=os.path.basename(p))
+ #             with open(zip_path, "rb") as f:
+ #                 st.download_button("Download ZIP", f, file_name="selected_images.zip")
+ #             os.remove(zip_path)
+
+ #         cols = st.columns(4)
+ #         for idx, res in enumerate(confident_results):
+ #             with cols[idx % 4]:
+ #                 if st.button("Show Details", key=f"detail-{idx}"):
+ #                     st.image(Image.open(res["path"]), use_container_width=True)
+ #                     st.caption(f"🕒 {res['timestamp']:.2f}s | 📊 {res['confidence']:.1f}%")
+ #                     st.markdown(f"**File:** {os.path.basename(res['path'])}")
+ #                     st.markdown(f"**Confidence:** {res['confidence']:.1f}%")
+ #                     st.markdown(f"**Timestamp:** {res['timestamp']:.2f}s")
+
+ #     # === POTENTIAL RESULTS ===
+ #     if potential_results:
+ #         if st.checkbox("Show Potential Matches (below threshold)"):
+ #             min_potential = st.slider("Minimum confidence to show", 5, min_confidence - 1, 10)
+ #             filtered = [r for r in potential_results if r["confidence"] >= min_potential]
+ #             filtered.sort(key=lambda x: x["confidence"], reverse=True)
+
+ #             with st.expander(f"🌀 Potential Matches ({len(filtered)})", expanded=True):
+ #                 for r in filtered:
+ #                     try:
+ #                         caption = process_with_blip(r["path"])
+ #                         st.image(Image.open(r["path"]), use_container_width=True)
+ #                         st.caption(f"{caption}")
+ #                         st.write(f"🕒 {r['timestamp']:.2f}s | 📊 {r['confidence']:.1f}%")
+ #                     except Exception:
+ #                         st.write("⚠️ BLIP captioning failed")
+
+ #     # === CLEANUP TEMP FILES ===
+ #     for path in temp_paths:
+ #         if os.path.exists(path):
+ #             try:
+ #                 os.remove(path)
+ #             except Exception:
+ #                 st.warning(f"Could not delete: {path}")
610
+ #DEEPSEEK UPDATES-----------------------------------------------------------------------------------------------------
+ # import os
+ # import tempfile
+ # import streamlit as st
+ # from main import analyze_media, process_with_blip
+ # from PIL import Image
+ # from dotenv import load_dotenv
+ # import openai
+ # import zipfile
+ # import time
+
+ # # Load OpenAI key from .env
+ # load_dotenv()
+ # openai.api_key = os.getenv("OPENAI_API_KEY")
+
+ # # App title and config
+ # st.set_page_config(layout="wide", page_title="VisionSort Pro")
+ # st.markdown("<h1 style='text-align: center;'>VisionSort Pro</h1>", unsafe_allow_html=True)
+
+ # # Initialize session state for selections
+ # if 'selected_images' not in st.session_state:
+ # st.session_state.selected_images = set()
+ # if 'selection_mode' not in st.session_state:
+ # st.session_state.selection_mode = False
+
+ # # === USER INPUT SECTION ===
+ # uploaded_files = st.file_uploader(
+ # "Upload images or a video",
+ # type=["jpg", "jpeg", "png", "mp4", "mov"],
+ # accept_multiple_files=True
+ # )
+
+ # if uploaded_files:
+ # media_type = "video" if any(f.name.lower().endswith(('.mp4', '.mov')) for f in uploaded_files) else "image"
+ # else:
+ # media_type = None
+
+ # # Frame interval for videos
+ # frame_interval = st.slider("Video Frame Interval (frames to skip)", 1, 120, 30) if media_type == "video" else None
+
+ # # Prompt input
+ # user_prompt = st.text_input("Search prompt", placeholder="e.g. 'find the cat'")
+
+ # # === MAIN PROCESSING ===
+ # if uploaded_files and user_prompt:
+ # st.info("⏳ Processing media... please wait.")
+ # temp_paths, all_results = [], []
+
+ # progress_bar = st.progress(0)
+ # for i, file in enumerate(uploaded_files):
+ # with tempfile.NamedTemporaryFile(delete=False, suffix=os.path.splitext(file.name)[1]) as f:
+ # f.write(file.read())
+ # temp_path = f.name
+ # temp_paths.append(temp_path)
+ # res = analyze_media(temp_path, user_prompt, frame_interval=frame_interval if frame_interval else 30)
+ # all_results.extend(res)
+ # progress_bar.progress((i + 1) / len(uploaded_files))
+
+ # progress_bar.empty()
+
+ # # Split results
+ # min_confidence = 25
+ # confident_results = [r for r in all_results if r["confidence"] >= min_confidence]
+ # potential_results = [r for r in all_results if r["confidence"] < min_confidence]
+
+ # # === SELECTION CONTROLS ===
+ # col1, col2, col3 = st.columns([1, 1, 2])
+ # with col1:
+ # if st.button("🔘 Toggle Selection Mode"):
+ # st.session_state.selection_mode = not st.session_state.selection_mode
+ # with col2:
+ # if st.session_state.selection_mode:
+ # if st.button("📌 Select All"):
+ # st.session_state.selected_images.update(r["path"] for r in all_results)
+ # if st.button("❌ Deselect All"):
+ # st.session_state.selected_images.clear()
+
+ # # === CONFIDENT RESULTS ===
+ # if confident_results:
+ # st.subheader(f"🎯 Confident Matches ({len(confident_results)})")
+
+ # # Display in 5-column grid
+ # cols = st.columns(5)
+ # for idx, res in enumerate(confident_results):
+ # with cols[idx % 5]:
+ # img = Image.open(res["path"])
+
+ # # Selection overlay
+ # is_selected = res["path"] in st.session_state.selected_images
+ # if st.session_state.selection_mode:
+ # st.checkbox(
+ # f"Select {os.path.basename(res['path'])}",
+ # value=is_selected,
+ # key=f"select_conf_{idx}",
+ # on_change=lambda idx=idx, path=res["path"]: st.session_state.selected_images.add(path) if st.session_state[f"select_conf_{idx}"] else st.session_state.selected_images.discard(path)
+ # )
+
+ # # Display image with optional selection highlight
+ # if is_selected:
+ # st.markdown("<div style='border: 3px solid #4CAF50; padding: 5px; border-radius: 5px;'>", unsafe_allow_html=True)
+
+ # st.image(img, use_container_width=True)
+
+ # if is_selected:
+ # st.markdown("</div>", unsafe_allow_html=True)
+
+ # # Show details button
+ # if st.button(f"Details {idx+1}", key=f"detail_conf_{idx}"):
+ # st.image(img, width=400)
+ # st.write(f"**Confidence:** {res['confidence']:.1f}%")
+ # if res['timestamp'] > 0:
+ # mins, secs = divmod(res['timestamp'], 60)
+ # st.write(f"**Timestamp:** {int(mins):02d}:{secs:05.2f}")
+ # if res['datetime']:
+ # st.write(f"**Date Taken:** {res['datetime'].strftime('%Y-%m-%d %H:%M:%S')}")
+
+ # # === POTENTIAL RESULTS ===
+ # if potential_results:
+ # with st.expander(f"🌀 Potential Matches ({len(potential_results)})", expanded=False):
+ # # Display in 5-column grid
+ # cols = st.columns(5)
+ # for idx, res in enumerate(potential_results):
+ # with cols[idx % 5]:
+ # try:
+ # img = Image.open(res["path"])
+
+ # # Selection overlay
+ # is_selected = res["path"] in st.session_state.selected_images
+ # if st.session_state.selection_mode:
+ # st.checkbox(
+ # f"Select {os.path.basename(res['path'])}",
+ # value=is_selected,
+ # key=f"select_pot_{idx}",
+ # on_change=lambda idx=idx, path=res["path"]: st.session_state.selected_images.add(path) if st.session_state[f"select_pot_{idx}"] else st.session_state.selected_images.discard(path)
+ # )
+
+ # # Display image with optional selection highlight
+ # if is_selected:
+ # st.markdown("<div style='border: 3px solid #FFA500; padding: 5px; border-radius: 5px;'>", unsafe_allow_html=True)
+
+ # st.image(img, use_container_width=True)
+
+ # if is_selected:
+ # st.markdown("</div>", unsafe_allow_html=True)
+
+ # # Show details button
+ # if st.button(f"Details P{idx+1}", key=f"detail_pot_{idx}"):
+ # st.image(img, width=400)
+ # caption = process_with_blip(res["path"])
+ # st.write(f"**BLIP Caption:** {caption}")
+ # st.write(f"**Confidence:** {res['confidence']:.1f}%")
+ # if res['timestamp'] > 0:
+ # mins, secs = divmod(res['timestamp'], 60)
+ # st.write(f"**Timestamp:** {int(mins):02d}:{secs:05.2f}")
+ # except Exception as e:
+ # st.error(f"Error displaying image: {e}")
+
+ # # === DOWNLOAD SELECTED ===
+ # if st.session_state.selected_images:
+ # if st.button("📥 Download Selected"):
+ # zip_path = "selected_images.zip"
+ # with zipfile.ZipFile(zip_path, "w") as zipf:
+ # for path in st.session_state.selected_images:
+ # if os.path.exists(path):
+ # zipf.write(path, arcname=os.path.basename(path))
+
+ # with open(zip_path, "rb") as f:
+ # st.download_button(
+ # "Download ZIP",
+ # f,
+ # file_name="selected_images.zip",
+ # mime="application/zip"
+ # )
+ # os.remove(zip_path)
+
+ # # === CLEAR ALL BUTTON ===
+ # if st.button("🧹 Clear All"):
+ # st.session_state.clear()
+ # st.experimental_rerun()
+
+ # # Cleanup temp files
+ # for path in temp_paths:
+ # if os.path.exists(path):
+ # try:
+ # os.remove(path)
+ # except Exception as e:
+ # print(f"Could not delete temp file: {e}")
+
+ #GPT UPDATE-----------------------------------------------------------------------------------------------------------------------
+
+ # import os
+ # import tempfile
+ # import streamlit as st
+ # from main import analyze_media, process_with_blip
+ # from PIL import Image
+ # from datetime import datetime
+ # from dotenv import load_dotenv
+ # import shutil
+
+ # load_dotenv()
+ # import openai
+ # openai.api_key = os.getenv("OPENAI_API_KEY")
+
+ # # App layout config
+ # st.set_page_config(layout="wide", page_title="VisionSort Pro")
+ # st.markdown("<h1 style='text-align: center;'>VisionSort Pro</h1>", unsafe_allow_html=True)
+
+ # # Session state setup
+ # if "selection_mode" not in st.session_state:
+ # st.session_state.selection_mode = False
+ # if "selected" not in st.session_state:
+ # st.session_state.selected = set()
+ # if "clear_trigger" not in st.session_state:
+ # st.session_state.clear_trigger = False
+
+ # # Sidebar was removed — all controls below the upload
+ # uploaded_files = st.file_uploader(
+ # "Upload images or a video",
+ # type=["jpg", "jpeg", "png", "mp4", "mov"],
+ # accept_multiple_files=True,
+ # key="media_upload"
+ # )
+
+ # # Frame Interval (only show if video is detected)
+ # frame_interval = 30
+ # if uploaded_files:
+ # if any(file.name.endswith(('.mp4', '.mov')) for file in uploaded_files):
+ # frame_interval = st.slider("Video Frame Interval (1 = every frame)", 1, 120, 30)
+ # if st.button("Apply Frame Interval"):
+ # st.session_state.frame_interval = frame_interval
+
+ # # Prompt
+ # user_prompt = st.text_input("Search prompt", placeholder="e.g. 'find the cat'")
+
+ # # Clear All button
+ # if st.button("❌ Clear All"):
+ # st.session_state.clear_trigger = True
+ # st.session_state.selected.clear()
+
+ # # Main analysis logic
+ # if st.session_state.clear_trigger:
+ # uploaded_files = []
+ # st.session_state.clear_trigger = False
+
+ # if uploaded_files and user_prompt:
+ # st.info("⏳ Processing media... please wait.")
+ # temp_paths = []
+ # all_results = []
+
+ # for file in uploaded_files:
+ # with tempfile.NamedTemporaryFile(delete=False, suffix=os.path.splitext(file.name)[1]) as f:
+ # f.write(file.read())
+ # temp_paths.append(f.name)
+ # results = analyze_media(f.name, user_prompt, min_confidence=25, frame_interval=st.session_state.get("frame_interval", 30))
+ # all_results.extend(results)
+
+ # confident = [r for r in all_results if r["confidence"] >= 25]
+ # potential = [r for r in all_results if r["confidence"] < 25]
+ # confident.sort(key=lambda x: x["confidence"], reverse=True)
+ # potential.sort(key=lambda x: x["confidence"], reverse=True)
+
+ # # Global select toggle
+ # st.subheader(f"🎯 Confident Matches ({len(confident)})")
+ # col1, col2 = st.columns([1, 6])
+ # with col1:
+ # if st.button("Select"):
+ # st.session_state.selection_mode = not st.session_state.selection_mode
+ # with col2:
+ # if st.session_state.selection_mode:
+ # if st.button("Select All" if len(st.session_state.selected) < len(confident) + len(potential) else "Deselect All"):
+ # if len(st.session_state.selected) < len(confident) + len(potential):
+ # st.session_state.selected = {r["path"] for r in confident + potential}
+ # else:
+ # st.session_state.selected.clear()
+
+ # # Download logic
+ # if st.session_state.selection_mode and st.session_state.selected:
+ # if st.download_button("⬇️ Download Selected", data=b"", file_name="selected_placeholder.txt"):
+ # for path in st.session_state.selected:
+ # shutil.copy(path, os.path.join(os.getcwd(), os.path.basename(path)))
+
+ # # Display confident matches
+ # cols = st.columns(5)
+ # for idx, r in enumerate(confident):
+ # with cols[idx % 5]:
+ # img = Image.open(r["path"])
+ # if st.session_state.selection_mode:
+ # if st.button("✅" if r["path"] in st.session_state.selected else "☐", key=f"sel_{r['path']}"):
+ # if r["path"] in st.session_state.selected:
+ # st.session_state.selected.remove(r["path"])
+ # else:
+ # st.session_state.selected.add(r["path"])
+ # st.image(img, use_container_width=True)
+ # if st.button("Show Details", key=f"meta_conf_{idx}"):
+ # st.write(f"🕒 {r['timestamp']:.2f}s")
+ # st.write(f"📊 {r['confidence']:.1f}%")
+
+ # # Low confidence section
+ # if st.checkbox("Show Potential Matches (below threshold)"):
+ # st.subheader(f"🌀 Potential Matches ({len(potential)})")
+ # min_potential = st.slider("Minimum confidence to show", 1, 24, 10)
+ # filtered = [r for r in potential if r["confidence"] >= min_potential]
+ # cols = st.columns(5)
+ # for idx, r in enumerate(filtered):
+ # with cols[idx % 5]:
+ # try:
+ # caption = process_with_blip(r["path"])
+ # st.image(Image.open(r["path"]), use_container_width=True)
+ # st.caption(f"{caption}")
+ # st.caption(f"🕒 {r['timestamp']:.2f}s | 📊 {r['confidence']:.1f}%")
+ # except:
+ # st.caption("⚠️ BLIP captioning failed")
+
+ # # Cleanup
+ # for path in temp_paths:
+ # if os.path.exists(path):
+ # try:
+ # os.remove(path)
+ # except Exception as e:
+ # st.warning(f"Could not delete: {path}")
+ #GPT cleanup new python---------------------------------------------------------------------------------------------------------------------------
+ # vision_sort_pro.py (COMPLETE: Spec-Matching Version)
+
+ # import os
+ # import tempfile
+ # import shutil
+ # from PIL import Image
+ # import streamlit as st
+ # from main import analyze_media, process_with_blip
+ # from sentence_transformers import SentenceTransformer, util
+ # from spellchecker import SpellChecker
+
+ # # Page Configuration
+ # st.set_page_config(layout="wide", page_title="VisionSort Pro")
+ # st.markdown("<h1 style='text-align: center;'>Vision Sort</h1>", unsafe_allow_html=True)
+
+ # # Init NLP models
+ # spell = SpellChecker()
+ # embedder = SentenceTransformer("all-MiniLM-L6-v2")
+
+ # # Session State Initialization
+ # if "selection_mode" not in st.session_state:
+ # st.session_state.selection_mode = False
+ # if "selected" not in st.session_state:
+ # st.session_state.selected = set()
+ # if "frame_interval" not in st.session_state:
+ # st.session_state.frame_interval = 30
+
+ # # Upload & Media Handling
+ # uploaded_files = st.file_uploader("Upload images or a video", type=["jpg", "jpeg", "png", "mp4", "mov"], accept_multiple_files=True)
+ # mixed_upload = False
+ # video_uploaded = False
+
+ # if uploaded_files:
+ # extensions = {os.path.splitext(f.name)[1].lower() for f in uploaded_files}
+ # if any(ext in extensions for ext in [".mp4", ".mov"]):
+ # video_uploaded = True
+ # if len(extensions) > 1 and video_uploaded:
+ # mixed_upload = True
+ # st.error("🚨 Please upload either images *or* a video. Mixed uploads are not supported.")
+
+ # if video_uploaded and not mixed_upload:
+ # st.session_state.frame_interval = st.slider("Video Frame Interval (1 = every frame)", 1, 120, 30, key="video_interval")
+
+ # # Prompt Input Section
+ # user_prompt = st.text_input("Search for a scene or object...", placeholder="e.g. find the cat")
+
+
+ # # Clear Button (only shown if uploads exist)
+ # if uploaded_files:
+ # if st.button("Clear All"):
+ # st.session_state.selected.clear()
+ # uploaded_files.clear()
+
+ # # Main Logic
+ # if uploaded_files and user_prompt and not mixed_upload:
+ # st.info("⏳ Processing media... please wait.")
+ # all_results, temp_paths = [], []
+
+ # for file in uploaded_files:
+ # with tempfile.NamedTemporaryFile(delete=False, suffix=os.path.splitext(file.name)[1]) as tmp:
+ # tmp.write(file.read())
+ # temp_paths.append(tmp.name)
+ # results = analyze_media(tmp.name, user_prompt, frame_interval=st.session_state.frame_interval)
+ # all_results.extend(results)
+
+ # confident = [r for r in all_results if r["confidence"] >= 25]
+ # potential = [r for r in all_results if 15 <= r["confidence"] < 25]
+ # confident.sort(key=lambda x: x["confidence"], reverse=True)
+ # potential.sort(key=lambda x: x["confidence"], reverse=True)
+
+ # if not confident:
+ # st.warning("No confident matches found. Want a closer look?")
+ # st.session_state.show_potential = True
+
+ # st.subheader(f"✅ Confident Matches ({len(confident)})")
+ # if confident:
+ # col1, col2 = st.columns([1, 6])
+ # with col1:
+ # st.session_state.selection_mode = st.toggle("Select Mode", value=st.session_state.selection_mode)
+ # with col2:
+ # if st.session_state.selection_mode:
+ # if st.button("Select All"):
+ # st.session_state.selected = {r["path"] for r in confident + potential}
+ # if st.button("Deselect All"):
+ # st.session_state.selected.clear()
+
+ # cols = st.columns(5)
+ # for idx, r in enumerate(confident):
+ # with cols[idx % 5]:
+ # st.image(Image.open(r["path"]), use_container_width=True)
+ # if st.session_state.selection_mode:
+ # toggle_label = "✅" if r["path"] in st.session_state.selected else "☐"
+ # if st.button(toggle_label, key=f"select_{r['path']}"):
+ # if r["path"] in st.session_state.selected:
+ # st.session_state.selected.remove(r["path"])
+ # else:
+ # st.session_state.selected.add(r["path"])
+ # if st.button("Show Details", key=f"conf_details_{idx}"):
+ # st.write(f"🕒 {r['timestamp']:.2f}s | 📊 {r['confidence']:.1f}%")
+
+ # # Show Low Confidence Section
+ # show_potential = st.session_state.get("show_potential", False)
+ # if show_potential or st.checkbox("⚠️ Show Potential Matches (below threshold)"):
+ # min_thresh = st.slider("Min confidence to show", 15, 24, 20)
+ # filtered = [r for r in potential if r["confidence"] >= min_thresh]
+ # st.subheader(f"🌀 Potential Matches ({len(filtered)})")
+ # cols = st.columns(5)
+ # captions = []
+ # for idx, r in enumerate(filtered):
+ # with cols[idx % 5]:
+ # img = Image.open(r["path"])
+ # st.image(img, use_container_width=True)
+ # caption = process_with_blip(r["path"])
+ # captions.append((caption, r["path"]))
+ # st.caption(f"{caption}\n🕒 {r['timestamp']:.2f}s | 📊 {r['confidence']:.1f}%")
+
+ #----------#GPT PROMPT TUNING-----------------------------------------------------------------------------------------------
+ # # Prompt Tuning with GPT-like logic
+ # corrected_prompt = " ".join([spell.correction(word) for word in user_prompt.split()])
+ # user_embed = embedder.encode(corrected_prompt, convert_to_tensor=True)
+ # caption_texts = [c[0] for c in captions]
+ # caption_embeds = embedder.encode(caption_texts, convert_to_tensor=True)
+ # sims = util.pytorch_cos_sim(user_embed, caption_embeds)[0]
+ # ranked = sorted(zip(caption_texts, sims, captions), key=lambda x: x[1], reverse=True)
+ # top_captions = [r[0] for r in ranked[:5]]
+
+ # st.markdown("---")
+ # st.markdown(f"**Prompt Assistant:**\nUser prompt: \"{user_prompt}\" → Corrected: \"{corrected_prompt}\"")
+ # st.markdown("**Image Captions Most Similar:**")
+ # for cap in top_captions:
+ # st.markdown(f"- {cap}")
+
+ # # Suggest new prompts
+ # suggestions = [
+ # f"Find a scene showing {cap.split()[0]}..." for cap in top_captions if len(cap.split()) > 1
+ # ][:3]
+ # if suggestions:
+ # new_prompt = st.selectbox("💡 Try a refined prompt?", suggestions)
+ # if st.button("🔁 Re-run with refined prompt"):
+ # st.experimental_rerun()
+
+ # # Download Selected
+ # if st.session_state.selection_mode and st.session_state.selected:
+ # if st.download_button("⬇️ Download Selected", data=b"", file_name="selected_placeholder.txt"):
+ # for path in st.session_state.selected:
+ # shutil.copy(path, os.path.join(os.getcwd(), os.path.basename(path)))
+
+ # # Cleanup Temporary Files
+ # for path in temp_paths:
+ # if os.path.exists(path):
+ # try:
+ # os.remove(path)
+ # except Exception as e:
+ # st.warning(f"⚠️ Could not delete temporary file: {path}")
+
+ #------------GITHUB CODE ORIGINAL--------------------------------------------------------------------------------------------------------
_main.py_archive_old_versions.py ADDED
@@ -0,0 +1,607 @@
+ """
+ 🗃️ ARCHIVED CODE — Not used in the final submitted app
+
+ This file contains earlier experimental versions and alternative implementations
+ of the VisionSort app. It includes:
+
+ - Initial UI structures that were later refactored
+ - GPT-4 prompt suggestion and fallback logic (commented out)
+ - BLIP captioning integration attempts (eventually removed)
+ - Other design variations and logic blocks
+
+ These sections were removed from main.py and app.py to simplify the final submission,
+ but are preserved here to document the development process, thought flow, and future plans.
+
+ Do not import or execute this file — it is for reference only.
+ """
+
+ # #Imports
+ # import os
+ # import cv2
+ # import torch
+ # import clip
+ # import openai
+ # from PIL import Image
+ # from datetime import datetime
+ # from functools import lru_cache
+ # from transformers import BlipProcessor, BlipForConditionalGeneration
+
+ # # Initialize OpenAI API
+ # from dotenv import load_dotenv
+ # load_dotenv()
+ # api_key = os.getenv("OPENAI_API_KEY")
+ # openai.api_key = api_key
+
+ # # Initialize models
+ # device = "cuda" if torch.cuda.is_available() else "cpu"
+ # clip_model, clip_preprocess = clip.load("ViT-B/32", device=device)
+ # blip_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
+ # blip_model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base").to(device)
+
+ # # Video processing
+ # def extract_frames(video_path, frame_interval=30):
+ # frames = []
+ # vidcap = cv2.VideoCapture(video_path)
+ # fps = vidcap.get(cv2.CAP_PROP_FPS)
+ # total_frames = int(vidcap.get(cv2.CAP_PROP_FRAME_COUNT))
+
+ # for i in range(0, total_frames, frame_interval):
+ # vidcap.set(cv2.CAP_PROP_POS_FRAMES, i)
+ # success, frame = vidcap.read()
+ # if success:
+ # frame_path = f"temp_frame_{i}.jpg"
+ # cv2.imwrite(frame_path, frame)
+ # frames.append(frame_path)
+ # vidcap.release()
+ # return frames, fps
+
+ # @lru_cache(maxsize=100)
+ # def process_with_blip(image_path):
+ # try:
+ # image = Image.open(image_path).convert("RGB")
+ # inputs = blip_processor(image, return_tensors="pt").to(device)
+ # caption = blip_model.generate(**inputs, max_new_tokens=50)[0]
+ # return blip_processor.decode(caption, skip_special_tokens=True)
+ # except Exception as e:
+ # return f"Error: {str(e)}"
+
+ # def analyze_media(file_path, prompt, min_confidence=25):
+ # # Handle both images and videos
+ # if file_path.endswith(('.mp4', '.mov')):
+ # frame_paths, fps = extract_frames(file_path)
+ # timestamps = [i/fps for i in range(0, len(frame_paths)*30, 30)]
+ # else:
+ # frame_paths = [file_path]
+ # timestamps = [0]
+
+ # results = []
+ # for path, timestamp in zip(frame_paths, timestamps):
+ # try:
+ # image = clip_preprocess(Image.open(path)).unsqueeze(0).to(device)
+ # text = clip.tokenize([prompt]).to(device)
+
+ # with torch.no_grad():
+ # image_features = clip_model.encode_image(image)
+ # text_features = clip_model.encode_text(text)
+ # similarity = torch.nn.functional.cosine_similarity(image_features, text_features)
+
+ # confidence = similarity.item() * 100
+ # result = {
+ # "path": path,
+ # "confidence": confidence,
+ # "timestamp": timestamp,
+ # "source": "CLIP",
+ # "status": "confident" if confidence >= min_confidence else "fallback"
+ # }
+ # results.append(result)
+ # except Exception as e:
+ # print(f"[ERROR] Processing frame failed: {e}")
+ # return results
+
+
+ #------updates^ original visionSort chat--------------------------------------------------------------
+ # We can simplify analyze_media() like this:
+
+ # ✅ Key Changes:
+ # def analyze_media(file_path, prompt, min_confidence=25):
+ # - borderline_range = (15, 25) # ❌ remove this
+ # ...
+ # - "status": (
+ # - "high_confidence" if confidence >= min_confidence else
+ # - "borderline" if confidence >= borderline_range[0] else
+ # - "low_confidence"
+ # - )
+ # + "status": "confident" if confidence >= min_confidence else "fallback"
+ # This will align it with the refactored logic in app.py, making your data flow more consistent and easier to debug.
+
+ #-------------------Below updates visionsort chat-----------------------------------------------------------------------------
+ # analyze_media() now accepts and passes frame_interval directly to extract_frames()
+ # Frame timestamps are correctly calculated based on your chosen interval
+ # Still supports both images and videos without breaking compatibility
+ # Cleanup-safe and GPU-friendly if available
+ # import os
+ # import cv2
+ # import torch
+ # import clip
+ # import openai
+ # from PIL import Image
+ # from datetime import datetime
+ # from functools import lru_cache
+ # from transformers import BlipProcessor, BlipForConditionalGeneration
+
+ # # Initialize OpenAI API
+ # from dotenv import load_dotenv
+ # load_dotenv()
+ # api_key = os.getenv("OPENAI_API_KEY")
+ # openai.api_key = api_key
+
+ # # Init device & models
+ # device = "cuda" if torch.cuda.is_available() else "cpu"
+ # clip_model, clip_preprocess = clip.load("ViT-B/32", device=device)
+ # blip_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
+ # blip_model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base").to(device)
+
+ # # === Video Frame Extractor ===
+ # def extract_frames(video_path, frame_interval=60):
+ # frames = []
+ # vidcap = cv2.VideoCapture(video_path)
+ # fps = vidcap.get(cv2.CAP_PROP_FPS)
+ # total_frames = int(vidcap.get(cv2.CAP_PROP_FRAME_COUNT))
+
+ # for i in range(0, total_frames, frame_interval):
+ # vidcap.set(cv2.CAP_PROP_POS_FRAMES, i)
+ # success, frame = vidcap.read()
+ # if success:
+ # frame_path = f"temp_frame_{i}.jpg"
+ # cv2.imwrite(frame_path, frame)
+ # frames.append(frame_path)
+ # vidcap.release()
+ # return frames, fps
+
+ # # === BLIP Captioning ===
+ # @lru_cache(maxsize=100)
+ # def process_with_blip(image_path):
+ # try:
+ # image = Image.open(image_path).convert("RGB")
+ # inputs = blip_processor(image, return_tensors="pt").to(device)
+ # caption = blip_model.generate(**inputs, max_new_tokens=50)[0]
+ # return blip_processor.decode(caption, skip_special_tokens=True)
+ # except Exception as e:
+ # return f"Error: {str(e)}"
+
+ # === Main Inference Logic ===
+ # def analyze_media(file_path, prompt, min_confidence=25, frame_interval=60):
+ # # Choose logic based on media type
+ # if file_path.endswith(('.mp4', '.mov')):
+ # frame_paths, fps = extract_frames(file_path, frame_interval)
+ # timestamps = [i/fps for i in range(0, len(frame_paths)*frame_interval, frame_interval)]
+ # else:
+ # frame_paths = [file_path]
+ # timestamps = [0]
+
+ # results = []
+ # for path, timestamp in zip(frame_paths, timestamps):
+ # try:
+ # image = clip_preprocess(Image.open(path)).unsqueeze(0).to(device)
+ # text = clip.tokenize([prompt]).to(device)
+
+ # with torch.no_grad():
+ # image_features = clip_model.encode_image(image)
+ # text_features = clip_model.encode_text(text)
+ # similarity = torch.nn.functional.cosine_similarity(image_features, text_features)
+
+ # confidence = similarity.item() * 100
+ # result = {
+ # "path": path,
+ # "confidence": confidence,
+ # "timestamp": timestamp,
+ # "source": "CLIP",
+ # "status": "confident" if confidence >= min_confidence else "fallback"
+ # }
+ # results.append(result)
+ # except Exception as e:
+ # print(f"[ERROR] Failed on {path}: {e}")
+ # return results
+
+
+ # def analyze_media(file_path, prompt, min_confidence=25, frame_interval=30):
+ # # Handle both images and videos
+ # if file_path.endswith(('.mp4', '.mov')):
+ # frame_paths, fps = extract_frames(file_path, frame_interval)
+ # timestamps = [i / fps for i in range(len(frame_paths))]
+ # else:
+ # frame_paths = [file_path]
+ # timestamps = [0]
+
+ # results = []
+ # for path, timestamp in zip(frame_paths, timestamps):
+ # try:
+ # image = clip_preprocess(Image.open(path)).unsqueeze(0).to(device)
+ # text = clip.tokenize([prompt]).to(device)
+
+ # with torch.no_grad():
+ # image_features = clip_model.encode_image(image)
+ # text_features = clip_model.encode_text(text)
+ # similarity = torch.nn.functional.cosine_similarity(image_features, text_features)
+
+ # confidence = similarity.item() * 100
+ # result = {
+ # "path": path,
+ # "confidence": confidence,
+ # "timestamp": timestamp,
+ # "source": "CLIP",
+ # "status": "confident" if confidence >= min_confidence else "fallback"
+ # }
+ # results.append(result)
+ # except Exception as e:
+ # print(f"[ERROR] Processing frame failed: {e}")
+ # return results
+
+ #DEEPSEEK UPDATES testing------------------------------------------------------------------------------------------------------------
+ # import os
+ # import cv2
+ # import torch
+ # import clip
+ # import openai
+ # from PIL import Image, ExifTags
+ # from datetime import datetime
+ # from functools import lru_cache
+ # from transformers import BlipProcessor, BlipForConditionalGeneration
+
+ # # Initialize OpenAI API
+ # from dotenv import load_dotenv
+ # load_dotenv()
+ # api_key = os.getenv("OPENAI_API_KEY")
+ # openai.api_key = api_key
+
+ # # Init device & models
+ # device = "cuda" if torch.cuda.is_available() else "cpu"
+ # clip_model, clip_preprocess = clip.load("ViT-B/32", device=device)
+ # blip_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base", use_fast=True) # Fix for warning
+ # blip_model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base").to(device)
+
+ # def get_image_datetime(image_path):
+ # """Extract datetime from image EXIF data if available"""
+ # try:
+ # img = Image.open(image_path)
+ # if hasattr(img, '_getexif'):
+ # exif = img._getexif()
+ # if exif:
+ # for tag, value in exif.items():
+ # if tag in ExifTags.TAGS and ExifTags.TAGS[tag] == 'DateTimeOriginal':
+ # return datetime.strptime(value, '%Y:%m:%d %H:%M:%S')
+ # except Exception:
+ # pass
+ # return None
+
+ # def extract_frames(video_path, frame_interval=30):
+ # """Improved video frame extraction with better error handling"""
+ # frames = []
+ # timestamps = []
+
+ # try:
+ # vidcap = cv2.VideoCapture(video_path)
+ # if not vidcap.isOpened():
+ # raise ValueError(f"Could not open video file: {video_path}")
+
+ # fps = vidcap.get(cv2.CAP_PROP_FPS)
+ # total_frames = int(vidcap.get(cv2.CAP_PROP_FRAME_COUNT))
+
+ # for i in range(0, total_frames, frame_interval):
+ # vidcap.set(cv2.CAP_PROP_POS_FRAMES, i)
+ # success, frame = vidcap.read()
296
+ # if success:
297
+ # frame_path = f"temp_frame_{i}.jpg"
298
+ # cv2.imwrite(frame_path, frame)
299
+ # frames.append(frame_path)
300
+ # timestamps.append(i / fps)
301
+
302
+ # vidcap.release()
303
+ # return frames, timestamps
304
+
305
+ # except Exception as e:
306
+ # print(f"[ERROR] Video processing failed: {e}")
307
+ # if 'vidcap' in locals():
308
+ # vidcap.release()
309
+ # return [], []
310
+
311
+ # @lru_cache(maxsize=100)
312
+ # def process_with_blip(image_path):
313
+ # """BLIP captioning with better error handling"""
314
+ # try:
315
+ # image = Image.open(image_path).convert("RGB")
316
+ # inputs = blip_processor(image, return_tensors="pt").to(device)
317
+ # caption = blip_model.generate(**inputs, max_new_tokens=50)[0]
318
+ # return blip_processor.decode(caption, skip_special_tokens=True)
319
+ # except Exception as e:
320
+ # print(f"[BLIP Error] {str(e)}")
321
+ # return "Could not generate caption"
322
+
323
+ # def analyze_media(file_path, prompt, min_confidence=25, frame_interval=30):
324
+ # """Improved media analysis with better metadata handling"""
325
+ # # Handle both images and videos
326
+ # if file_path.lower().endswith(('.mp4', '.mov')):
327
+ # frame_paths, timestamps = extract_frames(file_path, frame_interval)
328
+ # if not frame_paths:
329
+ # return []
330
+ # else:
331
+ # frame_paths = [file_path]
332
+ # timestamps = [0]
333
+
334
+ # results = []
335
+ # for path, timestamp in zip(frame_paths, timestamps):
336
+ # try:
337
+ # image = clip_preprocess(Image.open(path)).unsqueeze(0).to(device)
338
+ # text = clip.tokenize([prompt]).to(device)
339
+
340
+ # with torch.no_grad():
341
+ # image_features = clip_model.encode_image(image)
342
+ # text_features = clip_model.encode_text(text)
343
+ # similarity = torch.nn.functional.cosine_similarity(image_features, text_features)
344
+
345
+ # confidence = similarity.item() * 100
346
+ # datetime_info = get_image_datetime(path) if not file_path.lower().endswith(('.mp4', '.mov')) else None
347
+
348
+ # result = {
349
+ # "path": path,
350
+ # "confidence": confidence,
351
+ # "timestamp": timestamp,
352
+ # "datetime": datetime_info,
353
+ # "source": "CLIP",
354
+ # "status": "confident" if confidence >= min_confidence else "fallback"
355
+ # }
356
+ # results.append(result)
357
+ # except Exception as e:
358
+ # print(f"[ERROR] Processing frame failed: {e}")
359
+ # return results
360
+ #GPT bugs UPDATE---------------------------------------------------------------------------------------------------------------
361
+ # main.py
362
+ # import os
363
+ # import cv2
364
+ # import torch
365
+ # import clip
366
+ # import openai
367
+ # from PIL import Image
368
+ # from datetime import datetime
369
+ # from functools import lru_cache
370
+ # from transformers import BlipProcessor, BlipForConditionalGeneration
371
+ # from dotenv import load_dotenv
372
+
373
+ # # Load .env for OpenAI
374
+ # load_dotenv()
375
+ # openai.api_key = os.getenv("OPENAI_API_KEY")
376
+
377
+ # # Init models (lazy loaded for performance)
378
+ # device = "cuda" if torch.cuda.is_available() else "cpu"
379
+
380
+ # @lru_cache(maxsize=1)
381
+ # def get_clip_model():
382
+ # return clip.load("ViT-B/32", device=device)
383
+
384
+ # @lru_cache(maxsize=1)
385
+ # def get_blip_models():
386
+ # processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
387
+ # model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base").to(device)
388
+ # return processor, model
389
+
390
+ # # Video frame extraction
391
+ # def extract_frames(video_path, frame_interval=30):
392
+ # frames = []
393
+ # vidcap = cv2.VideoCapture(video_path)
394
+ # fps = vidcap.get(cv2.CAP_PROP_FPS)
395
+ # total_frames = int(vidcap.get(cv2.CAP_PROP_FRAME_COUNT))
396
+
397
+ # for i in range(0, total_frames, frame_interval):
398
+ # vidcap.set(cv2.CAP_PROP_POS_FRAMES, i)
399
+ # success, frame = vidcap.read()
400
+ # if success:
401
+ # frame_path = f"temp_frame_{i}.jpg"
402
+ # cv2.imwrite(frame_path, frame)
403
+ # frames.append(frame_path)
404
+ # vidcap.release()
405
+ # return frames, fps
406
+
407
+ # # BLIP fallback
408
+ # @lru_cache(maxsize=100)
409
+ # def process_with_blip(image_path):
410
+ # processor, model = get_blip_models()
411
+ # try:
412
+ # image = Image.open(image_path).convert("RGB")
413
+ # inputs = processor(image, return_tensors="pt").to(device)
414
+ # caption_ids = model.generate(**inputs, max_new_tokens=50)[0]
415
+ # return processor.decode(caption_ids, skip_special_tokens=True)
416
+ # except Exception as e:
417
+ # return f"BLIP error: {str(e)}"
418
+
419
+ # # Core logic
420
+ # def analyze_media(file_path, prompt, min_confidence=25, frame_interval=30):
421
+ # clip_model, clip_preprocess = get_clip_model()
422
+
423
+ # # Handle video vs image
424
+ # if file_path.endswith(('.mp4', '.mov', '.mpeg4')):
425
+ # frame_paths, fps = extract_frames(file_path, frame_interval)
426
+ # timestamps = [i / fps for i in range(len(frame_paths))]
427
+ # else:
428
+ # frame_paths = [file_path]
429
+ # timestamps = [0]
430
+
431
+ # results = []
432
+ # for path, timestamp in zip(frame_paths, timestamps):
433
+ # try:
434
+ # image = clip_preprocess(Image.open(path)).unsqueeze(0).to(device)
435
+ # text = clip.tokenize([prompt]).to(device)
436
+ # with torch.no_grad():
437
+ # image_features = clip_model.encode_image(image)
438
+ # text_features = clip_model.encode_text(text)
439
+ # similarity = torch.nn.functional.cosine_similarity(image_features, text_features)
440
+ # confidence = similarity.item() * 100
441
+ # results.append({
442
+ # "path": path,
443
+ # "confidence": confidence,
444
+ # "timestamp": timestamp,
445
+ # "source": "CLIP",
446
+ # "status": "confident" if confidence >= min_confidence else "fallback"
447
+ # })
448
+ # except Exception as e:
449
+ # print(f"[ERROR] Processing frame failed: {e}")
450
+ # return results
451
+
452
+ #GPT cleanup new python---------------------------------------------------------------------------------------------------------------------------
453
+ # main.py (COMPLETE: Spec-Matching Version)
454
+ # main.py (Refactored for batching, async, EXIF)
455
+
456
+ # main.py (Refactored for batching, async, EXIF, and video fix)
457
+
458
+ # main.py (Optimized: Max 60 frames, 1 FPS, Removed Interval Slider)
459
+
460
+ # import os
461
+ # import cv2
462
+ # import torch
463
+ # import clip
464
+ # import openai
465
+ # import asyncio
466
+ # import concurrent.futures
467
+ # from PIL import Image, UnidentifiedImageError, ExifTags
468
+ # from datetime import datetime
469
+ # from functools import lru_cache
470
+ # from transformers import BlipProcessor, BlipForConditionalGeneration
471
+ # from dotenv import load_dotenv
472
+ # from torchvision import transforms
473
+
474
+ # # Load API Keys
475
+ # load_dotenv()
476
+ # openai.api_key = os.getenv("OPENAI_API_KEY")
477
+
478
+ # # Device Setup
479
+ # device = "cuda" if torch.cuda.is_available() else "cpu"
480
+
481
+ # # Init Models (Lazy Cache)
482
+ # @lru_cache(maxsize=1)
483
+ # def get_clip_model():
484
+ # return clip.load("ViT-B/32", device=device)
485
+
486
+ # @lru_cache(maxsize=1)
487
+ # def get_blip_models():
488
+ # processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
489
+ # model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base").to(device)
490
+ # return processor, model
491
+
492
+ # # Extract up to 60 frames at 1 FPS
493
+ # def extract_frames(video_path):
494
+ # frames = []
495
+ # vidcap = cv2.VideoCapture(video_path)
496
+ # fps = vidcap.get(cv2.CAP_PROP_FPS)
497
+ # total_frames = int(vidcap.get(cv2.CAP_PROP_FRAME_COUNT))
498
+ # interval = int(fps) # 1 frame per second
499
+ # max_frames = 60
500
+
501
+ # for i in range(0, min(total_frames, max_frames * interval), interval):
502
+ # vidcap.set(cv2.CAP_PROP_POS_FRAMES, i)
503
+ # success, frame = vidcap.read()
504
+ # if success:
505
+ # frame_path = f"temp_frame_{i}.jpg"
506
+ # cv2.imwrite(frame_path, frame)
507
+ # frames.append(frame_path)
508
+ # vidcap.release()
509
+ # return frames, fps
510
+
511
+ # # BLIP fallback captioning
512
+ # @lru_cache(maxsize=100)
513
+ # def process_with_blip(image_path):
514
+ # processor, model = get_blip_models()
515
+ # try:
516
+ # image = Image.open(image_path).convert("RGB")
517
+ # inputs = processor(image, return_tensors="pt").to(device)
518
+ # caption_ids = model.generate(**inputs, max_new_tokens=50)[0]
519
+ # return processor.decode(caption_ids, skip_special_tokens=True)
520
+ # except Exception as e:
521
+ # return f"BLIP error: {str(e)}"
522
+
523
+ # # Optional EXIF extractor
524
+ # def extract_metadata(image_path):
525
+ # try:
526
+ # image = Image.open(image_path)
527
+ # exif_data = image._getexif()
528
+ # if not exif_data:
529
+ # return {}
530
+ # labeled = {
531
+ # ExifTags.TAGS.get(k, k): v for k, v in exif_data.items()
532
+ # if k in ExifTags.TAGS
533
+ # }
534
+ # return labeled
535
+ # except Exception:
536
+ # return {}
537
+
538
+ # # Resize & preprocess
539
+ # clip_resize = transforms.Compose([
540
+ # transforms.Resize((224, 224)),
541
+ # transforms.ToTensor()
542
+ # ])
543
+
544
+ # # Batch processing helper
545
+ # def get_clip_features_batch(image_paths, model, preprocess, batch_size=32):
546
+ # images = []
547
+ # for p in image_paths:
548
+ # try:
549
+ # img = preprocess(Image.open(p).convert("RGB"))
550
+ # images.append(img)
551
+ # except UnidentifiedImageError:
552
+ # continue # Skip bad frames
553
+ # if not images:
554
+ # return torch.empty(0)
555
+ # image_batches = [torch.stack(images[i:i+batch_size]) for i in range(0, len(images), batch_size)]
556
+ # encoded = []
557
+ # with torch.no_grad():
558
+ # for batch in image_batches:
559
+ # encoded.append(model.encode_image(batch.to(device)))
560
+ # return torch.cat(encoded)
561
+
562
+ # # Async helper
563
+ # async def run_async_batches(func, items):
564
+ # loop = asyncio.get_event_loop()
565
+ # with concurrent.futures.ThreadPoolExecutor() as pool:
566
+ # return await asyncio.gather(*[loop.run_in_executor(pool, func, *item) for item in items])
567
+
568
+ # # Main media analysis logic
569
+ # def analyze_media(file_path, prompt, min_confidence=25):
570
+ # clip_model, clip_preprocess = get_clip_model()
571
+ # frame_paths = []
572
+ # timestamps = []
573
+
574
+ # # Detect if video
575
+ # if file_path.endswith((".mp4", ".mov", ".mpeg4")):
576
+ # frame_paths, fps = extract_frames(file_path)
577
+ # timestamps = [i for i in range(len(frame_paths))] # 1 second per frame
578
+ # else:
579
+ # frame_paths = [file_path]
580
+ # timestamps = [0]
581
+
582
+ # # Prepare text features
583
+ # text = clip.tokenize([prompt]).to(device)
584
+ # with torch.no_grad():
585
+ # text_features = clip_model.encode_text(text)
586
+
587
+ # # Batch encode images
588
+ # image_features = get_clip_features_batch(frame_paths, clip_model, clip_preprocess)
589
+ # if image_features.shape[0] == 0:
590
+ # return []
591
+
592
+ # results = []
593
+ # for idx, (img_path, img_feat, ts) in enumerate(zip(frame_paths, image_features, timestamps)):
594
+ # sim = torch.nn.functional.cosine_similarity(img_feat.unsqueeze(0), text_features)
595
+ # confidence = sim.item() * 100
596
+ # if confidence >= 15:
597
+ # results.append({
598
+ # "path": img_path,
599
+ # "confidence": confidence,
600
+ # "timestamp": ts,
601
+ # "source": "CLIP",
602
+ # "status": "confident" if confidence >= min_confidence else "fallback",
603
+ # "metadata": extract_metadata(img_path)
604
+ # })
605
+ # return results
606
+
607
+ # return results
app.py ADDED
@@ -0,0 +1,134 @@
+
+ # Imports
+ import os
+ import tempfile
+ import streamlit as st
+ from PIL import Image
+ # from main import analyze_media, process_with_blip
+ from main import analyze_media
+ # import openai
+ import io
+ import zipfile
+
+ # Initialize OpenAI API
+ # from dotenv import load_dotenv
+ # load_dotenv()
+ # api_key = os.getenv("OPENAI_API_KEY")
+ # openai.api_key = api_key
+
+ # --- Streamlit Setup ---
+ st.set_page_config(layout="wide", page_title="Vision Sort")
+ st.sidebar.header("Configuration")
+
+ # --- Sidebar Config ---
+ min_confidence = st.sidebar.number_input("Confidence Threshold", min_value=0, max_value=100, value=25, step=1)
+ borderline_min = st.sidebar.number_input("Borderline Minimum", min_value=0, max_value=100, value=15, step=1)
+
+
+ # --- Main Interface ---
+ st.title("🔍 VisionSort Pro")
+ uploaded_files = st.file_uploader("Upload images/videos", type=["jpg", "jpeg", "png", "mp4", "mov"], accept_multiple_files=True)
+ user_prompt = st.text_input("Search prompt", placeholder="e.g. 'find the cat'")
+
+ if uploaded_files and user_prompt:
+     results = {"high": [], "borderline": [], "low": []}
+     temp_paths = []
+
+     with st.spinner(f"Processing {len(uploaded_files)} files..."):
+         for file in uploaded_files:
+             with tempfile.NamedTemporaryFile(delete=False, suffix=os.path.splitext(file.name)[1]) as f:
+                 f.write(file.read())
+                 temp_paths.append(f.name)
+             media_results = analyze_media(
+                 f.name,
+                 user_prompt,
+                 min_confidence,
+                 (borderline_min, min_confidence)
+             )
+
+             for res in media_results:
+                 results[res["status"]].append(res)
+
+     # Sort all groups by confidence descending
+     for group in results.values():
+         group.sort(key=lambda r: r["confidence"], reverse=True)
+
+     # --- Display Confident Matches ---
+     if results["high"]:
+         st.subheader(f"🎯 Confident Matches ({len(results['high'])})")
+         cols = st.columns(4)
+         for idx, res in enumerate(results["high"]):
+             with cols[idx % 4]:
+                 st.image(Image.open(res["path"]), use_container_width=True)
+                 st.caption(f"{res['confidence']:.1f}% | {res['timestamp']:.2f}s")
+
+     # --- Display Borderline Matches ---
+     if results["borderline"]:
+         st.subheader(f"⚠️ Potential Matches ({len(results['borderline'])})")
+         # if st.checkbox("Show borderline results", True):
+         if st.checkbox("Show borderline results", True, key="show_borderline"):
+             cols = st.columns(4)
+             for idx, res in enumerate(results["borderline"]):
+                 with cols[idx % 4]:
+                     st.image(Image.open(res["path"]), use_container_width=True)
+                     st.caption(f"{res['confidence']:.1f}% | {res['timestamp']:.2f}s")
+                     # if st.button("🧠 Explain Match", key=f"blip_{idx}"):
+                     #     with st.expander("🔍 BLIP Analysis"):
+                     #         st.write(f"**BLIP Description:** {process_with_blip(res['path'])}")
+                     #         if "gpt_suggestion" in res:
+                     #             st.write(f"**GPT Suggestion:** {res['gpt_suggestion']}")
+
+     # --- Display Low Confidence Matches Only If GPT Enabled ---
+     # if results["low"] and openai.api_key:
+     #     st.subheader(f"❓ Low Confidence Matches ({len(results['low'])})")
+     #     if st.checkbox("Show low confidence results"):
+     #         for res in results["low"]:
+     #             st.image(Image.open(res["path"]), use_container_width=True)
+     #             st.caption(f"{res['confidence']:.1f}% | {res['timestamp']:.2f}s")
+     #             if "gpt_suggestion" in res:
+     #                 st.markdown(f"**💡 GPT Suggestion:** {res['gpt_suggestion']}")
+
+     # --- Display Low Confidence Matches ------------------------------------------------------
+     if results["low"]:
+         st.subheader(f"❓ Low Confidence Matches ({len(results['low'])})")
+         # if st.checkbox("Show low confidence results"):
+         if st.checkbox("Show low confidence results", key="show_low"):
+             for res in results["low"]:
+                 st.image(Image.open(res["path"]), use_container_width=True)
+                 st.caption(f"{res['confidence']:.1f}% | {res['timestamp']:.2f}s")
+
+     # --- Prepare Downloadable Results ---
+     download_ready = []
+
+     if results["high"]:
+         download_ready += results["high"]
+
+     if results["borderline"] and st.session_state.get("show_borderline", True):
+         download_ready += results["borderline"]
+
+     if results["low"] and st.session_state.get("show_low", False):
+         download_ready += results["low"]
+
+     if download_ready:
+         zip_buffer = io.BytesIO()
+         with zipfile.ZipFile(zip_buffer, "w") as zipf:
+             for res in download_ready:
+                 try:
+                     filename = os.path.basename(res["path"])
+                     zipf.write(res["path"], arcname=filename)
+                 except Exception:
+                     continue
+         zip_buffer.seek(0)
+
+         st.download_button(
+             label="⬇️ Download Displayed Images",
+             data=zip_buffer,
+             file_name="visionSort_results.zip",
+             mime="application/zip"
+         )
+
+     # --- Cleanup Temporary Files ---
+     for path in temp_paths:
+         if os.path.exists(path):
+             os.unlink(path)
+
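The download section of `app.py` builds the zip archive entirely in memory with `io.BytesIO` and `zipfile`, skipping any file it cannot read. A minimal standalone sketch of that packaging step (the file names and the `bundle_results` helper are illustrative stand-ins, not part of the app):

```python
import io
import os
import tempfile
import zipfile

def bundle_results(paths):
    """Write the given files into an in-memory zip and return the rewound buffer."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zipf:
        for p in paths:
            try:
                # Store each file under its base name, as the app does
                zipf.write(p, arcname=os.path.basename(p))
            except OSError:
                continue  # skip unreadable files instead of failing the whole download
    buf.seek(0)  # rewind so the buffer can be handed to a download widget
    return buf

# Usage: create two throwaway files and bundle them
tmp = tempfile.mkdtemp()
paths = []
for name in ("frame_a.jpg", "frame_b.jpg"):
    p = os.path.join(tmp, name)
    with open(p, "wb") as f:
        f.write(b"fake image bytes")
    paths.append(p)

buf = bundle_results(paths)
names = zipfile.ZipFile(buf).namelist()
```

Because the archive never touches disk, the same buffer can be passed directly to `st.download_button(data=buf, ...)`.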
config.toml ADDED
@@ -0,0 +1,4 @@
+ # Streamlit Cloud defaults to Python 3.12, which can break native modules like opencv-python.
+ # Pin Python 3.11 to match the local dev environment (3.11.11) and avoid OpenCV/Torch compatibility issues.
+ [tool.streamlit]
+ pythonVersion = "3.11"
gitignore.txt ADDED
@@ -0,0 +1,24 @@
+ # Python cache files - generated automatically, not needed in Git
+ __pycache__/
+ *.pyc
+
+ # Secrets (OpenAI API key)
+ .env
+
+ # macOS system clutter
+ *.DS_Store
+
+ # JetBrains IDE settings (like PyCharm) - user-specific, not project code
+ .idea/
+
+ # VS Code settings folder - personal workspace config, not app logic
+ .vscode/
+
+ # Dev test files I don't want to push
+ test_video.MOV
+ *.ipynb
+ frames/
+ *.jpg
+ *.jpeg
+ *.webp
+ example.py
main.py ADDED
@@ -0,0 +1,198 @@
+ # Imports
+ import os
+ import cv2
+ import torch
+ import clip
+ from PIL import Image
+ from datetime import datetime
+ # import openai
+ # from functools import lru_cache
+ # from transformers import BlipProcessor, BlipForConditionalGeneration
+
+ # Initialize OpenAI API
+ # from dotenv import load_dotenv
+ # load_dotenv()
+ # api_key = os.getenv("OPENAI_API_KEY")
+ # openai.api_key = api_key
+
+ # Initialize models
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ clip_model, clip_preprocess = clip.load("ViT-B/32", device=device)
+ # blip_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
+ # blip_model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base").to(device)
+
+ # Video processing
+ def extract_frames(video_path, frame_interval=30):
+     frames = []
+     timestamps = []
+
+     vidcap = cv2.VideoCapture(video_path)
+     fps = vidcap.get(cv2.CAP_PROP_FPS)
+     total_frames = int(vidcap.get(cv2.CAP_PROP_FRAME_COUNT))
+
+     for i in range(0, total_frames, frame_interval):
+         vidcap.set(cv2.CAP_PROP_POS_FRAMES, i)
+         success, frame = vidcap.read()
+         if success:
+             timestamp = i / fps  # 🕒 actual second into the video
+             frame_path = f"temp_frame_{i}.jpg"
+             cv2.imwrite(frame_path, frame)
+             frames.append(frame_path)
+             timestamps.append(timestamp)
+
+     vidcap.release()
+     # return frames, fps
+     return frames, timestamps
+
+
+ # @lru_cache(maxsize=100)
+ # def process_with_blip(image_path):
+ #     try:
+ #         image = Image.open(image_path).convert("RGB")
+ #         inputs = blip_processor(image, return_tensors="pt").to(device)
+ #         caption = blip_model.generate(**inputs, max_new_tokens=50)[0]
+ #         return blip_processor.decode(caption, skip_special_tokens=True)
+ #     except Exception as e:
+ #         return f"Error: {str(e)}"
+
+
+ # Updated analyze_media() function with:
+ #   - Video frame timestamps
+ #   - Try/except with Streamlit warnings
+ #   - GPT fallback logic for low-confidence matches (currently disabled)
+ #   - Supports both images and videos
+
+ def analyze_media(file_path, prompt, min_confidence=25, borderline_range=(15, 25)):
+     from PIL import Image
+     import streamlit as st
+
+     # Handle different input types: image or video
+     if file_path.lower().endswith((".jpg", ".jpeg", ".png")):
+         frame_paths = [file_path]
+         timestamps = [0]  # Static images get timestamp 0
+     elif file_path.lower().endswith((".mp4", ".mov")):
+         # Extract frames and their timestamps
+         frame_paths, timestamps = extract_frames(file_path)
+     else:
+         st.warning(f"⚠️ Unsupported file type: {os.path.basename(file_path)}")
+         return []
+
+     results = []
+
+     # Process each frame or image
+     for path, timestamp in zip(frame_paths, timestamps):
+         try:
+             # Open and convert image to RGB (avoids channel issues)
+             pil_image = Image.open(path).convert("RGB")
+         except Exception as e:
+             # Warn the user and skip the frame if it's not readable
+             st.warning(f"⚠️ Skipped: `{os.path.basename(path)}` — couldn't load image ({e}).")
+             continue
+
+         # Preprocess image for CLIP
+         image = clip_preprocess(pil_image).unsqueeze(0).to(device)
+         text = clip.tokenize([prompt]).to(device)
+
+         # Get similarity score from CLIP
+         with torch.no_grad():
+             image_features = clip_model.encode_image(image)
+             text_features = clip_model.encode_text(text)
+             similarity = torch.nn.functional.cosine_similarity(image_features, text_features)
+
+         confidence = similarity.item() * 100  # Convert to %
+
+         # Assign confidence category
+         if confidence >= min_confidence:
+             status = "high"
+         elif confidence >= borderline_range[0]:
+             status = "borderline"
+         else:
+             status = "low"
+
+         # Base result
+         result = {
+             "path": path,
+             "confidence": confidence,
+             "timestamp": timestamp,
+             "source": "CLIP",
+             "status": status
+         }
+
+         # If low confidence and GPT available, add fallback suggestion
+         # if status == "low" and openai.api_key:
+         #     try:
+         #         blip_desc = process_with_blip(path)
+         #         response = openai.ChatCompletion.create(
+         #             model="gpt-4",
+         #             messages=[
+         #                 {"role": "system", "content": "Suggest one improved image search prompt based on:"},
+         #                 {"role": "user", "content": blip_desc}
+         #             ],
+         #             max_tokens=50
+         #         )
+         #         result["gpt_suggestion"] = response.choices[0].message.content
+         #     except Exception as e:
+         #         st.warning(f"⚠️ GPT fallback failed for `{os.path.basename(path)}`")
+
+         results.append(result)
+
+     return results
+
+ # def analyze_media(file_path, prompt, min_confidence=25, borderline_range=(15,25)):
+ #     # Handle both images and videos
+ #     if file_path.endswith(('.mp4', '.mov')):
+ #         frame_paths, fps = extract_frames(file_path)
+ #         timestamps = [i/fps for i in range(0, len(frame_paths)*30, 30)]
+ #     else:
+ #         frame_paths = [file_path]
+ #         timestamps = [0]
+
+ #     results = []
+ #     for path, timestamp in zip(frame_paths, timestamps):
+ #         # CLIP analysis
+ #         image = clip_preprocess(Image.open(path)).unsqueeze(0).to(device)
+ #         text = clip.tokenize([prompt]).to(device)
+
+ #         with torch.no_grad():
+ #             image_features = clip_model.encode_image(image)
+ #             text_features = clip_model.encode_text(text)
+ #             similarity = torch.nn.functional.cosine_similarity(image_features, text_features)
+
+ #         confidence = similarity.item() * 100
+ #         result = {
+ #             "path": path,
+ #             "confidence": confidence,
+ #             "timestamp": timestamp,
+ #             "source": "CLIP",
+ #             "status": (
+ #                 "high_confidence" if confidence >= min_confidence else
+ #                 "borderline" if confidence >= borderline_range[0] else
+ #                 "low_confidence"
+ #             )
+ #         }
+
+ #         # Only use GPT-4 for very low confidence if available
+ #         if confidence < borderline_range[0] and openai.api_key:
+ #             try:
+ #                 blip_desc = process_with_blip(path)
+ #                 response = openai.ChatCompletion.create(
+ #                     model="gpt-4",
+ #                     messages=[{
+ #                         "role": "system",
+ #                         "content": "Suggest one improved image search prompt based on:"
+ #                     }, {
+ #                         "role": "user",
+ #                         "content": blip_desc
+ #                     }],
+ #                     max_tokens=50
+ #                 )
+ #                 result["gpt_suggestion"] = response.choices[0].message.content
+ #             except Exception:
+ #                 pass
+
+ #         results.append(result)
+
+ #     return results
+
+
+ #---------------------------------------------------------------------------------------------------
requirements.txt ADDED
@@ -0,0 +1,11 @@
+ torch
+ torchvision
+ torchaudio
+ ftfy
+ regex
+ tqdm
+ numpy
+ Pillow
+ # cv2 is imported by main.py; the headless build avoids GUI dependencies on servers
+ opencv-python-headless
+ git+https://github.com/openai/CLIP.git