prioritize some TODOs
app.py
@@ -1,31 +1,35 @@
+# prioritized todos
+# supabase to store index (apparently can't rely on vector db to do it?) and user's curations / popular curations
+# - paused after 1 week inactivity (and i believe pinecone index DELETED after some days of inactivity?!)
+# - - TODO backup both pinecone and supabase daily (this should count as the activity), and make publicly accessible
+# TODO add discord/github/google auth...via custom js? see supabase docs
+# - make users maintainers of their own curations, restrict add perms, introduce edit/delete/clone perms
+# - add stars to curations+users profile -> display starred curations first, then sort by most popular
+# - securely store user's openai key in supabase for convenience
+# TODO better ai arch
+# - eg let user customize instr (via langchain's jinja support?)
+# - better: make easy to experiment with langchain's chains/agents
+# - maybe something like model_laboratory with gradio's Parallel block?
+# - account for mpnet's limit of 384 word pieces per chunk (is it done already?)
+# - - more deliberate chunking strat in general
+# TODO summarize a vid (and optionally add to curation)
+# TODO support yt playlists and yt channels in addition to just one-off videos
+# - can i make this really easy to add via a well designed api?
+
+# unprioritized todos
 # TODO some inline todos below that should reduce need to reset/rollback DBs
 # - how to easily rollback bad data?
 # TODO harrison thinks editing vectorDB abstraction to consume Embedding class vs func is a good approach -> need to PR this
 # TODO can i generalize the query filter approach (add to langchain?) to remove coupling to pinecone?
 # - i believe elastic8.5 supports rdb and vdb, but need nontrivial specs to run it i think
-# TODO account for mpnet's limit of 384 word pieces per chunk (is it done already?)
-# supabase to store index (apparently can't rely on vector db to do it?) and user's curations / popular curations
-# - paused after 1 week inactivity (and i believe pinecone index DELETED after some days of inactivity?!)
-# - - TODO backup both pinecone and supabase daily (this should count as the activity), and make publicly accessible
 # TODO user prefs data model (their curations)
-# - meh not needed at first
-# TODO summarize a vid (and optionally add to curation)
-# TODO support yt playlists in addition to just one-off videos
-# - can i make this really easy to add via a well designed api?
 # TODO finalize deployment strategy
 # - supabase free tier for db + blob storage of transcripts
 # - hf space to host model computations (langchain bits need to run here)
 # - replit or supabase to host edge functions to call hf space
 # TODO gradio global state to track recently asked questions from everyone
-# TODO add discord/github/google auth...via custom js? see supabase docs
-# - make users maintainers of their own curations, restrict add perms, introduce edit/delete/clone perms
-# - add stars to curations+users profile -> display starred curations first, then sort by most popular
-# - securely store user's openai key in supabase for convenience
 # TODO create pinecone index without indexing text metadata field for performance: https://docs.pinecone.io/docs/manage-indexes#selective-metadata-indexing
 # TODO could use pinecone namespace per embedding model
-# TODO let user customize instr (via langchain's jinja support?)
-# - better: make easy to experiment with langchain's chains/agents
-# - maybe something like model_laboratory with gradio's Parallel block?
 
 # TODO deploy txtai to fly.io free tier? not sure compute reqs
 # - or haystack?
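The mpnet TODO asks whether chunks stay within 384 word pieces. Below is a minimal sketch of one way to enforce that, assuming the transcript has already been turned into a token list by the embedding model's own tokenizer. The helper name `chunk_tokens`, the sliding-window approach and the 32-token overlap are illustrative assumptions, not what app.py currently does. Depending on the tokenizer, special start/end tokens may also count toward the limit, so the usable budget can be slightly below 384.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_tokens(tokens: Sequence[T], max_tokens: int = 384, overlap: int = 32) -> List[List[T]]:
    """Hypothetical helper: split tokens into overlapping windows of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    step = max_tokens - overlap  # how far each window advances
    chunks: List[List[T]] = []
    start = 0
    while start < len(tokens):
        chunks.append(list(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the sequence
        start += step
    return chunks


if __name__ == "__main__":
    windows = chunk_tokens(list(range(1000)))
    print([len(w) for w in windows])  # [384, 384, 296]
```

With a Hugging Face tokenizer one could pass `tokenizer.tokenize(text)` in and turn each window back into text with `tokenizer.convert_tokens_to_string(window)`; that pairing is an assumption about how the app tokenizes, not something the TODO states. The overlap keeps a little shared context at chunk boundaries, at the cost of some duplicated embeddings.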
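For the gradio global-state TODO: in a Gradio app, objects defined at module level are shared by every session, whereas `gr.State` holds per-session values, so a module-level store is one way to collect recently asked questions from everyone. The sketch below uses a hypothetical `RecentQuestions` class (name and the capacity of 20 are assumptions) built on a bounded `collections.deque` plus a `threading.Lock`, since requests may be handled concurrently. It keeps data in memory only, so it would not survive a restart of the Space.

```python
import threading
from collections import deque
from typing import Deque, List


class RecentQuestions:
    """Hypothetical shared store of the most recent questions from all users."""

    def __init__(self, maxlen: int = 20) -> None:
        # deque(maxlen=...) silently drops the oldest entry once full
        self._items: Deque[str] = deque(maxlen=maxlen)
        self._lock = threading.Lock()

    def add(self, question: str) -> None:
        question = question.strip()
        if not question:
            return  # ignore empty submissions
        with self._lock:
            self._items.append(question)

    def latest(self) -> List[str]:
        # newest first; copy under the lock so callers get a stable snapshot
        with self._lock:
            return list(reversed(self._items))


# Module-level instance: in a Gradio app this would be shared by every session.
RECENT = RecentQuestions(maxlen=20)


if __name__ == "__main__":
    for q in ["what is rag?", "  ", "how do embeddings work?"]:
        RECENT.add(q)
    print(RECENT.latest())  # ['how do embeddings work?', 'what is rag?']
```

A question-answering callback could call `RECENT.add(question)` and render `RECENT.latest()` in an output component; how that wires into the app's existing handlers is left open here.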