Spaces: marcgreen/semantic_curations


marcgreen committed on Dec 25, 2022

Commit 421ff50

1 Parent(s): deffc44

prioritize some TODOs

Files changed (1)

app.py +19 -15

@@ -1,31 +1,35 @@
 # TODO some inline todos below that should reduce need to reset/rollback DBs
 # - how to easily rollback bad data?
 # TODO harrison thinks editing vectorDB abstraction to consume Embedding class vs func is a good approach -> need to PR this
 # TODO can i generalize the query filter approach (add to langchain?) to remove coupling to pinecone?
 # - i believe elastic8.5 supports rdb and vdb, but need nontrivial specs to run it i think
-# TODO account for mpnet's limit of 384 word pieces per chunk (is it done already?)
-# supabase to store index (apparently can't rely on vector db to do it?) and user's curations / popular curations
-# - paused after 1 week inactivity (and i believe pinecone index DELETED after some days of inactivity?!)
-# - - TODO backup both pinecone and supabase daily (this should count as the activity), and make publicly accessible
 # TODO user prefs data model (their curations)
-# - meh not needed at first
-# TODO summarize a vid (and optionally add to curation)
-# TODO support yt playlists in addition to just one-off videos
-# - can i make this really easy to add via a well designed api?
 # TODO finalize deployment strategy
 # - supabase free tier for db + blob storage of transcripts
 # - hf space to host model computations (langchain bits need to run here)
 # - replit or supabase to host edge functions to call hf space
 # TODO gradio global state to track recently asked questions from everyone
-# TODO add discord/github/google auth...via custom js? see supabase docs
-# - make users maintainers of their own curations, restrict add perms, introduce edit/delete/clone perms
-# - add stars to curations+users profile -> display starred curations first, then sort by most popular
-# - securely store user's openai key in supabase for convenience
 # TODO create pinecone index without indexing text metadata field for performance: https://docs.pinecone.io/docs/manage-indexes#selective-metadata-indexing
 # TODO could use pinecone namespace per embedding model
-# TODO let user customize instr (via langchain's jinja support?)
-# - better: make easy to experiment with langchain's chains/agents
-# - maybe something like model_laboratory with gradio's Parallel block?
 # TODO deploy txtai to fly.io free tier? not sure compute reqs
 # - or haystack?

+# prioritized todos
+# supabase to store index (apparently can't rely on vector db to do it?) and user's curations / popular curations
+# - paused after 1 week inactivity (and i believe pinecone index DELETED after some days of inactivity?!)
+# - - TODO backup both pinecone and supabase daily (this should count as the activity), and make publicly accessible
+# TODO add discord/github/google auth...via custom js? see supabase docs
+# - make users maintainers of their own curations, restrict add perms, introduce edit/delete/clone perms
+# - add stars to curations+users profile -> display starred curations first, then sort by most popular
+# - securely store user's openai key in supabase for convenience
+# TODO better ai arch
+# - eg let user customize instr (via langchain's jinja support?)
+# - better: make easy to experiment with langchain's chains/agents
+# - maybe something like model_laboratory with gradio's Parallel block?
+# - account for mpnet's limit of 384 word pieces per chunk (is it done already?)
+# - - more deliberate chunking strat in general
+# TODO summarize a vid (and optionally add to curation)
+# TODO support yt playlists and yt channels in addition to just one-off videos
+# - can i make this really easy to add via a well designed api?
+# unprioritized todos
 # TODO some inline todos below that should reduce need to reset/rollback DBs
 # - how to easily rollback bad data?
 # TODO harrison thinks editing vectorDB abstraction to consume Embedding class vs func is a good approach -> need to PR this
 # TODO can i generalize the query filter approach (add to langchain?) to remove coupling to pinecone?
 # - i believe elastic8.5 supports rdb and vdb, but need nontrivial specs to run it i think
 # TODO user prefs data model (their curations)
 # TODO finalize deployment strategy
 # - supabase free tier for db + blob storage of transcripts
 # - hf space to host model computations (langchain bits need to run here)
 # - replit or supabase to host edge functions to call hf space
 # TODO gradio global state to track recently asked questions from everyone
 # TODO create pinecone index without indexing text metadata field for performance: https://docs.pinecone.io/docs/manage-indexes#selective-metadata-indexing
 # TODO could use pinecone namespace per embedding model
 # TODO deploy txtai to fly.io free tier? not sure compute reqs
 # - or haystack?