marcgreen committed on
Commit
421ff50
·
1 Parent(s): deffc44

prioritize some TODOs

Files changed (1)
  1. app.py +19 -15
app.py CHANGED
@@ -1,31 +1,35 @@
+ # prioritized todos
+ # supabase to store index (apparently can't rely on vector db to do it?) and user's curations / popular curations
+ # - paused after 1 week inactivity (and i believe pinecone index DELETED after some days of inactivity?!)
+ # - - TODO backup both pinecone and supabase daily (this should count as the activity), and make publicly accessible
+ # TODO add discord/github/google auth...via custom js? see supabase docs
+ # - make users maintainers of their own curations, restrict add perms, introduce edit/delete/clone perms
+ # - add stars to curations+users profile -> display starred curations first, then sort by most popular
+ # - securely store user's openai key in supabase for convenience
+ # TODO better ai arch
+ # - eg let user customize instr (via langchain's jinja support?)
+ # - better: make easy to experiment with langchain's chains/agents
+ # - maybe something like model_laboratory with gradio's Parallel block?
+ # - account for mpnet's limit of 384 word pieces per chunk (is it done already?)
+ # - - more deliberate chunking strat in general
+ # TODO summarize a vid (and optionally add to curation)
+ # TODO support yt playlists and yt channels in addition to just one-off videos
+ # - can i make this really easy to add via a well designed api?
+
+ # unprioritized todos
  # TODO some inline todos below that should reduce need to reset/rollback DBs
  # - how to easily rollback bad data?
  # TODO harrison thinks editing vectorDB abstraction to consume Embedding class vs func is a good approach -> need to PR this
  # TODO can i generalize the query filter approach (add to langchain?) to remove coupling to pinecone?
  # - i believe elastic8.5 supports rdb and vdb, but need nontrivial specs to run it i think
- # TODO account for mpnet's limit of 384 word pieces per chunk (is it done already?)
- # supabase to store index (apparently can't rely on vector db to do it?) and user's curations / popular curations
- # - paused after 1 week inactivity (and i believe pinecone index DELETED after some days of inactivity?!)
- # - - TODO backup both pinecone and supabase daily (this should count as the activity), and make publicly accessible
  # TODO user prefs data model (their curations)
- # - meh not needed at first
- # TODO summarize a vid (and optionally add to curation)
- # TODO support yt playlists in addition to just one-off videos
- # - can i make this really easy to add via a well designed api?
  # TODO finalize deployment strategy
  # - supabase free tier for db + blob storage of transcripts
  # - hf space to host model computations (langchain bits need to run here)
  # - replit or supabase to host edge functions to call hf space
  # TODO gradio global state to track recently asked questions from everyone
- # TODO add discord/github/google auth...via custom js? see supabase docs
- # - make users maintainers of their own curations, restrict add perms, introduce edit/delete/clone perms
- # - add stars to curations+users profile -> display starred curations first, then sort by most popular
- # - securely store user's openai key in supabase for convenience
  # TODO create pinecone index without indexing text metadata field for performance: https://docs.pinecone.io/docs/manage-indexes#selective-metadata-indexing
  # TODO could use pinecone namespace per embedding model
- # TODO let user customize instr (via langchain's jinja support?)
- # - better: make easy to experiment with langchain's chains/agents
- # - maybe something like model_laboratory with gradio's Parallel block?
 
  # TODO deploy txtai to fly.io free tier? not sure compute reqs
  # - or haystack?
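
One of the prioritized todos asks whether chunks respect mpnet's limit of 384 word pieces, and calls for a more deliberate chunking strategy. Below is a minimal sketch of a token-capped chunker with optional overlap, not the code in app.py. The names `chunk_by_token_limit` and `whitespace_tokenize` are hypothetical, and the default tokenizer splits on whitespace only as a stand-in: a real word-piece tokenizer typically produces more pieces than words, so the embedding model's own tokenizer would need to be passed in for the limit to hold.

```python
from typing import Callable, List

MAX_WORD_PIECES = 384  # limit quoted in the TODO for mpnet


def whitespace_tokenize(text: str) -> List[str]:
    """Stand-in tokenizer (assumption): split on whitespace.

    Word-piece tokenizers usually emit more pieces than words, so this
    undercounts; pass the embedding model's tokenizer for real use.
    """
    return text.split()


def chunk_by_token_limit(
    text: str,
    max_tokens: int = MAX_WORD_PIECES,
    overlap: int = 0,
    tokenize: Callable[[str], List[str]] = whitespace_tokenize,
) -> List[str]:
    """Split text into chunks of at most max_tokens tokens.

    Consecutive chunks share `overlap` tokens so that a sentence cut at a
    boundary still appears whole in at least one chunk when overlap is
    large enough.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")

    tokens = tokenize(text)
    step = max_tokens - overlap
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        # Joining with spaces is only faithful for the whitespace stand-in.
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks
```

For a transcript, something like `chunk_by_token_limit(transcript, overlap=32)` would give overlapping windows; the overlap value here is arbitrary and would need tuning against retrieval quality.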