| title: Capstone | |
| emoji: π¨ | |
| colorFrom: green | |
| colorTo: indigo | |
| sdk: gradio | |
| sdk_version: 5.1.0 | |
| app_file: app.py | |
| pinned: false | |
| license: mit | |
| short_description: A multimodal LLM | |
| Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference | |
| ### Hugging Face Gradio App | |
| - The app.py script is a multimodal application that accepts image, audio, and text inputs and builds on pre-trained models: CLIP (image encoding), Phi-2 | |
| (text generation), and WhisperX (audio transcription). It sets up the tokenizers and processors for each input type and defines a custom residual | |
| block (SimpleResBlock) that transforms embeddings and is intended to make learning more stable. After loading the pre-trained and fine-tuned weights for the | |
| projection and residual layers, it implements the model_generate_ans function, which extracts image embeddings, transcribes and embeds audio, tokenizes the | |
| text query, combines the resulting embeddings, and generates the response sequentially. The Gradio interface lets users upload an image, record or upload audio, | |
| and submit a text query; the generated answer is returned in the same web page. | |
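
The embedding path described above can be sketched roughly as follows. This is a minimal illustration, not the code in app.py: only the name `SimpleResBlock` comes from the description, while its internal layers, the `combine_embeddings` helper, the toy dimensions, and the order in which the modalities are concatenated are all assumptions made for the example. Random tensors stand in for the CLIP features and the Phi-2 token embeddings.

```python
import torch
import torch.nn as nn


class SimpleResBlock(nn.Module):
    """Residual MLP block (assumed layout): normalize, transform, add the input back."""

    def __init__(self, dim: int):
        super().__init__()
        self.pre_norm = nn.LayerNorm(dim)
        self.proj = nn.Sequential(
            nn.Linear(dim, dim),
            nn.GELU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pre_norm(x)
        return x + self.proj(x)  # skip connection around the MLP


def combine_embeddings(image_feats, audio_embeds, text_embeds, projection, resblock):
    """Hypothetical helper: project image features into the language-model
    embedding space, then concatenate all modalities along the sequence axis."""
    image_embeds = resblock(projection(image_feats))
    return torch.cat([image_embeds, audio_embeds, text_embeds], dim=1)


# Toy sizes for illustration only; the real vision and language-model
# hidden sizes are much larger.
VISION_DIM, LM_DIM = 8, 16
projection = nn.Linear(VISION_DIM, LM_DIM)
resblock = SimpleResBlock(LM_DIM)

image_feats = torch.randn(1, 4, VISION_DIM)  # stand-in for 4 image patch features
audio_embeds = torch.randn(1, 3, LM_DIM)     # stand-in for 3 embedded transcript tokens
text_embeds = torch.randn(1, 5, LM_DIM)      # stand-in for 5 embedded query tokens

with torch.no_grad():
    combined = combine_embeddings(
        image_feats, audio_embeds, text_embeds, projection, resblock
    )

print(combined.shape)  # torch.Size([1, 12, 16])
```

In a setup like this, the combined sequence would be passed to the language model as input embeddings rather than token IDs, which is why every modality has to end up with the same embedding width before concatenation.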