import streamlit as st
# Custom CSS for better styling
st.markdown("""
<style>
    .main-title {
        font-size: 36px;
        color: #4A90E2;
        font-weight: bold;
        text-align: center;
    }
    .sub-title {
        font-size: 24px;
        color: #4A90E2;
        margin-top: 20px;
    }
    .section {
        background-color: #f9f9f9;
        padding: 15px;
        border-radius: 10px;
        margin-top: 20px;
    }
    .section p, .section ul {
        color: #666666;
    }
    .link {
        color: #4A90E2;
        text-decoration: none;
    }
    .benchmark-table {
        width: 100%;
        border-collapse: collapse;
        margin-top: 20px;
    }
    .benchmark-table th, .benchmark-table td {
        border: 1px solid #ddd;
        padding: 8px;
        text-align: left;
    }
    .benchmark-table th {
        background-color: #4A90E2;
        color: white;
    }
    .benchmark-table td {
        background-color: #f2f2f2;
    }
</style>
""", unsafe_allow_html=True)
# Main Title
st.markdown('<div class="main-title">Wav2Vec2 for Speech Recognition</div>', unsafe_allow_html=True)
# Description
st.markdown("""
<div class="section">
    <p><strong>Wav2Vec2</strong> is an Automatic Speech Recognition (ASR) model that learns speech representations directly from raw audio. It is first pretrained on large amounts of unlabeled speech and then fine-tuned on transcribed audio, which lets it reach high accuracy with comparatively little labeled data and makes it well suited to low-resource settings. Spark NLP provides it as the <code>Wav2Vec2ForCTC</code> annotator, so transcription can run inside scalable, production-ready Spark pipelines.</p>
</div>
""", unsafe_allow_html=True)
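# Worked Example: From Audio Samples to Frames
st.markdown(r"""
Wav2Vec2 reads the raw waveform directly: a stack of seven 1-D convolutions (the feature encoder) turns the sample sequence into a much shorter sequence of feature frames, which the Transformer layers and the CTC output head then process. The sketch below computes how many frames a clip produces, assuming the kernel widths (10, 3, 3, 3, 3, 2, 2) and strides (5, 2, 2, 2, 2, 2, 2) reported in the wav2vec 2.0 paper, no padding, and 16 kHz input. It is illustrative arithmetic, not code taken from Spark NLP.

```python
from math import prod

# Feature-encoder kernel widths and strides reported in the wav2vec 2.0 paper
# (assumed here to match this XLSR-53-based checkpoint).
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)
SAMPLE_RATE = 16_000  # Wav2Vec2 checkpoints expect 16 kHz audio


def num_frames(num_samples: int) -> int:
    # Apply the unpadded convolution length formula layer by layer:
    # out = floor((in - kernel) / stride) + 1
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        if length < kernel:
            return 0  # too short to produce even one frame
        length = (length - kernel) // stride + 1
    return length


def receptive_field() -> int:
    # Number of input samples that influence a single output frame.
    field, jump = 1, 1
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        field += (kernel - 1) * jump
        jump *= stride
    return field


if __name__ == "__main__":
    hop = prod(CONV_STRIDES)  # samples between consecutive frames
    print(f"hop: {hop} samples ({1000 * hop / SAMPLE_RATE} ms)")
    print(f"receptive field: {receptive_field()} samples")
    print(f"frames for 1 s of audio: {num_frames(SAMPLE_RATE)}")
```

Running it reports a hop of 320 samples (20 ms), a receptive field of 400 samples (25 ms), and 49 frames for one second of audio, matching the figures given in the wav2vec 2.0 paper. Each frame is one step at which the CTC head predicts a character.
""")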
# Why, Where, and When to Use Wav2Vec2
st.markdown('<div class="sub-title">Why, Where, and When to Use Wav2Vec2</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <p>Use <strong>Wav2Vec2</strong> when you need a robust ASR solution for scenarios with limited labeled data. It suits speech-to-text applications where both scalability and accuracy matter. Typical use cases include:</p>
    <ul>
        <li><strong>Transcription Services:</strong> Convert large volumes of recorded speech into text for the media, legal, and healthcare industries.</li>
        <li><strong>Voice-Activated Assistants:</strong> Improve the recognition of spoken commands in smart devices and personal assistants.</li>
        <li><strong>Meeting Transcription:</strong> Transcribe meetings so the text can be searched, reviewed by absentees, or passed to a separate summarization model.</li>
        <li><strong>Language Learning Tools:</strong> Help learners improve their pronunciation with speech-to-text feedback.</li>
        <li><strong>Accessibility Enhancements:</strong> Generate captions for videos and live events, making content accessible to deaf and hard-of-hearing audiences.</li>
        <li><strong>Call Center Analytics:</strong> Transcribe customer calls for downstream analysis and quality monitoring.</li>
    </ul>
</div>
""", unsafe_allow_html=True)
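# How to Use the Model
st.markdown('<div class="sub-title">How to Use the Model</div>', unsafe_allow_html=True)
st.code('''
import sparknlp
from sparknlp.base import AudioAssembler
from sparknlp.annotator import Wav2Vec2ForCTC
from pyspark.ml import Pipeline
import librosa  # assumed to be installed; used only to load the audio file

spark = sparknlp.start()

# Load a mono waveform resampled to 16 kHz (the file path is a placeholder)
samples, _ = librosa.load("path/to/audio.wav", sr=16000, mono=True)
audioDf = spark.createDataFrame([[samples.tolist()]], ["audio_content"])

# Wrap the raw float array into an audio annotation
audio_assembler = AudioAssembler() \\
    .setInputCol("audio_content") \\
    .setOutputCol("audio_assembler")

# Load the pretrained English model; transcriptions go to the "text" column
speech_to_text = Wav2Vec2ForCTC \\
    .pretrained("asr_wav2vec2_large_xlsr_53_english_by_jonatasgrosman", "en") \\
    .setInputCols("audio_assembler") \\
    .setOutputCol("text")

pipeline = Pipeline(stages=[audio_assembler, speech_to_text])
pipelineModel = pipeline.fit(audioDf)
pipelineDF = pipelineModel.transform(audioDf)
pipelineDF.select("text.result").show(truncate=False)
''', language='python')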
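# Worked Example: Greedy CTC Decoding
st.markdown(r"""
The "CTC" in `Wav2Vec2ForCTC` stands for Connectionist Temporal Classification: the model scores every token in its character vocabulary for each short audio frame, and a decoding step turns those frame-level scores into text. A common choice is greedy (best-path) decoding: keep the highest-scoring token per frame, merge consecutive repeats, then drop the special blank token. The sketch below uses a hypothetical five-token vocabulary and made-up scores; it illustrates the technique and is not Spark NLP's internal decoder.

```python
from itertools import groupby
from typing import List, Sequence

# Hypothetical toy vocabulary. Index 0 is the CTC blank and "|" marks word
# boundaries, following the convention of Hugging Face Wav2Vec2 tokenizers.
VOCAB = ["<pad>", "|", "C", "A", "T"]
BLANK_ID = 0
WORD_DELIMITER = "|"


def greedy_ctc_decode(frame_scores: Sequence[Sequence[float]]) -> str:
    # 1. Best path: keep the highest-scoring token id for every frame.
    best_ids = [max(range(len(scores)), key=lambda i: scores[i]) for scores in frame_scores]
    # 2. Merge runs of identical ids (one character spanning several frames).
    collapsed = [token_id for token_id, _ in groupby(best_ids)]
    # 3. Drop blanks; a blank between two identical ids keeps both characters.
    chars = [VOCAB[token_id] for token_id in collapsed if token_id != BLANK_ID]
    # 4. Map word delimiters to spaces.
    return "".join(chars).replace(WORD_DELIMITER, " ").strip()


def one_hot(token_id: int) -> List[float]:
    # Fake frame scores that put all the weight on one token (demo only).
    return [1.0 if i == token_id else 0.0 for i in range(len(VOCAB))]


if __name__ == "__main__":
    # Frame-level ids for "CAT CAT", including repeats, blanks and a delimiter.
    frame_ids = [2, 2, 0, 3, 3, 4, 1, 1, 2, 3, 0, 4]
    print(greedy_ctc_decode([one_hot(i) for i in frame_ids]))
```

Running the demo prints `CAT CAT`. The blank token is what lets CTC spell genuinely doubled letters: `A`, blank, `A` decodes to `AA`, while two consecutive `A` frames collapse to a single `A`.
""")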
# Best Practices & Tips
st.markdown('<div class="sub-title">Best Practices & Tips</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <ul>
        <li><strong>Preprocessing:</strong> Resample audio to 16 kHz mono, the sampling rate Wav2Vec2 checkpoints are trained on, and where possible reduce background noise and normalize volume levels for the best transcription results.</li>
        <li><strong>Fine-tuning:</strong> For other languages or a specific domain, accent, or vocabulary, consider fine-tuning a Wav2Vec2 checkpoint on your own transcribed audio (for example with Hugging Face Transformers) and importing the result into Spark NLP.</li>
        <li><strong>Batch Processing:</strong> Use Spark's distributed execution to transcribe large audio collections in parallel.</li>
        <li><strong>Model Evaluation:</strong> Measure Word Error Rate (WER) on a representative sample of your own data to confirm the model meets your accuracy requirements, and re-check it as your data changes.</li>
        <li><strong>Resource Management:</strong> The model is about 1.2 GB, so in production monitor memory and CPU/GPU usage to balance throughput against cost.</li>
    </ul>
</div>
""", unsafe_allow_html=True)
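# Worked Example: Word Error Rate
st.markdown(r"""
Word Error Rate (WER), recommended above for model evaluation, is the minimum number of word substitutions (S), deletions (D), and insertions (I) needed to align the model output with the reference transcript, divided by the number of reference words N: WER = (S + D + I) / N. Below is a minimal pure-Python sketch using word-level Levenshtein distance. It assumes both transcripts are already normalized (for example lowercased, with punctuation removed); for larger evaluations an established package such as jiwer may be more convenient.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    # Split on whitespace; assumes both strings are already normalized.
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    print(f"WER = {word_error_rate(ref, hyp):.3f}")
```

For the example pair, one substitution (sat to sit) and one deletion (the second "the") give WER = 2/6, printed as `WER = 0.333`. Because insertions count as errors, WER can exceed 1.0 when the output is much longer than the reference.
""")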
# Model Information
st.markdown('<div class="sub-title">Model Information</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <table class="benchmark-table">
        <tr>
            <th>Attribute</th>
            <th>Description</th>
        </tr>
        <tr>
            <td><strong>Model Name</strong></td>
            <td>asr_wav2vec2_large_xlsr_53_english_by_jonatasgrosman</td>
        </tr>
        <tr>
            <td><strong>Compatibility</strong></td>
            <td>Spark NLP 4.2.0+</td>
        </tr>
        <tr>
            <td><strong>License</strong></td>
            <td>Open Source</td>
        </tr>
        <tr>
            <td><strong>Edition</strong></td>
            <td>Official</td>
        </tr>
        <tr>
            <td><strong>Input Labels</strong></td>
            <td>[audio_assembler]</td>
        </tr>
        <tr>
            <td><strong>Output Labels</strong></td>
            <td>[text]</td>
        </tr>
        <tr>
            <td><strong>Language</strong></td>
            <td>en</td>
        </tr>
        <tr>
            <td><strong>Size</strong></td>
            <td>1.2 GB</td>
        </tr>
    </table>
</div>
""", unsafe_allow_html=True)
# Data Source Section
st.markdown('<div class="sub-title">Data Source</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <p>The original model is available on <a class="link" href="https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-english" target="_blank">Hugging Face</a>. It was fine-tuned by <em>jonatasgrosman</em> from the multilingual XLSR-53 checkpoint and has been converted for use with Spark NLP, so it can run in large-scale distributed pipelines.</p>
</div>
""", unsafe_allow_html=True)
# Conclusion
st.markdown('<div class="sub-title">Conclusion</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <p><strong>Wav2Vec2</strong> is a versatile ASR model that performs well even when labeled data is limited. Its integration with Spark NLP makes it straightforward to deploy at scale in real-world applications, from transcription services to voice-activated systems.</p>
</div>
""", unsafe_allow_html=True)
# References
st.markdown('<div class="sub-title">References</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <ul>
        <li><a class="link" href="https://sparknlp.org/2022/09/24/asr_wav2vec2_large_xlsr_53_english_by_jonatasgrosman_en.html" target="_blank">Wav2Vec2 Model on Spark NLP</a></li>
        <li><a class="link" href="https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-english" target="_blank">Wav2Vec2 Model on Hugging Face</a></li>
        <li><a class="link" href="https://arxiv.org/abs/2006.11477" target="_blank">wav2vec 2.0 Paper</a></li>
        <li><a class="link" href="https://github.com/pytorch/fairseq/tree/master/examples/wav2vec" target="_blank">Wav2Vec2 GitHub Repository</a></li>
    </ul>
</div>
""", unsafe_allow_html=True)
# Community & Support
st.markdown('<div class="sub-title">Community & Support</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <ul>
        <li><a class="link" href="https://sparknlp.org/" target="_blank">Official Website</a>: Comprehensive documentation and examples.</li>
        <li><a class="link" href="https://join.slack.com/t/spark-nlp/shared_invite/zt-198dipu77-L3UWNe_AJ8xqDk0ivmih5Q" target="_blank">Slack</a>: Join the community for live discussions and support.</li>
        <li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp" target="_blank">GitHub</a>: Report issues, request features, and contribute to the project.</li>
        <li><a class="link" href="https://medium.com/spark-nlp" target="_blank">Medium</a>: Read articles and tutorials about Spark NLP.</li>
        <li><a class="link" href="https://www.youtube.com/channel/UCmFOjlpYEhxf_wJUDuz6xxQ/videos" target="_blank">YouTube</a>: Watch video tutorials and demonstrations.</li>
    </ul>
</div>
""", unsafe_allow_html=True)