| title: PogcastGPT | |
| emoji: 💻 | |
| colorFrom: blue | |
| colorTo: indigo | |
| sdk: streamlit | |
| sdk_version: 1.10.0 | |
| app_file: app.py | |
| pinned: false | |
| duplicated_from: somuch4subtlety/pogcastGPT | |
| This app uses semantic search to find and summarize relevant sections of the Pogcast to answer a user's question. | |
| The process began by downloading and transcribing Pogcast episodes using [OpenAI’s Whisper](https://github.com/openai/whisper). | |
| The transcriptions were then chunked into sections of ~500 words and each chunk was vectorized using [OpenAI’s embedding endpoint](https://beta.openai.com/docs/guides/embeddings). | |
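The chunking step can be sketched as follows. This is a minimal illustration, not the app's actual code: the function name `chunk_words` and the choice to split on whitespace are assumptions, and the embedding call is left out because it needs an API key.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 500) -> List[str]:
    """Split text into consecutive chunks of at most chunk_size words.

    Hypothetical helper: the original app may chunk differently
    (for example, on sentence or timestamp boundaries).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()  # whitespace tokenization is an assumption
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


# Stand-in for a Whisper transcript: 1200 placeholder words.
transcript = " ".join(f"word{i}" for i in range(1200))
chunks = chunk_words(transcript)
print([len(c.split()) for c in chunks])  # [500, 500, 200]
```

Each resulting chunk would then be sent to the embedding endpoint and stored together with its vector.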
| The embeddings and text were then stored in a [vector database](https://www.pinecone.io) (Pinecone). | |
| When you ask a question, its text is run through the embedding endpoint, and the resulting vector is compared to all of the vectorized sections using cosine similarity. | |
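To make the comparison concrete, here is a small brute-force sketch of cosine-similarity ranking in plain Python. In the app itself this search is presumably delegated to the vector database; the names `cosine_similarity` and `top_matches` and the toy two-dimensional vectors are assumptions for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_matches(query_vec, section_vecs, k=3):
    """Return (index, score) pairs for the k most similar sections."""
    scored = [
        (idx, cosine_similarity(query_vec, vec))
        for idx, vec in enumerate(section_vecs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D vectors; real embeddings have many more dimensions.
section_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_vec = [2.0, 0.1]
best = top_matches(query_vec, section_vecs, k=2)
print([idx for idx, _ in best])  # [0, 2]
```

Cosine similarity ignores vector length, so a long and a short section pointing in the same direction score the same.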
| The top results are used as context and passed to [OpenAI’s GPT-3 completion endpoint](https://beta.openai.com/docs/api-reference/completions) along with your question and instructions on how GPT-3 should answer it. | |
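One possible way to assemble such a prompt is sketched below. The instruction wording, the `build_prompt` name, and the section labels are all hypothetical; the app's real prompt is not shown in this README. The call to the completion endpoint is omitted.

```python
def build_prompt(question, sections):
    """Combine instructions, retrieved sections, and the question.

    Hypothetical layout: instructions first, then numbered context
    sections, then the user's question.
    """
    context = "\n\n".join(
        f"Section {i}: {text.strip()}"
        for i, text in enumerate(sections, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


# Placeholder sections standing in for the top search results.
prompt = build_prompt(
    "What was discussed about streaming?",
    ["First matching transcript section.", "Second matching transcript section."],
)
print(prompt)
```

Telling the model to admit when the context lacks an answer is one common way to reduce, though not eliminate, hallucinated replies.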
| Lastly, the summary answer and top matching sections are displayed. | |
| Note | |
| The parameters and completion prompt are set loosely, so the bot is likely to hallucinate in its answers. |