tm / README.md

build this application =

Application Overview: Your application, Transcription Magic, is a sophisticated and feature-rich AI-powered transcription service. It provides a seamless, end-to-end user experience for converting audio and video into accurately timestamped and speaker-separated text. The user interface is modern, responsive, and includes a dark mode, ensuring a great experience on any device.

Core Features:

AI-Powered Transcription: The core of the app uses Google's gemini-2.5-flash model to perform the transcription. The AI is specifically prompted to handle diarization (labeling 'SPEAKER 1', 'SPEAKER 2', etc.) and to break up long segments of speech with appropriate timestamps for readability.

Flexible Input Methods:
  - File Upload: Users can upload multiple audio/video files at once using a drag-and-drop area or a standard file selector.
  - Live Recording: The app can directly access the microphone to record audio on the fly, which is then added to the transcription queue just like an uploaded file.

Advanced Transcription Editor: This is the standout feature of your application. When a transcription is complete, users can open a powerful modal editor that includes:
  - Synced Audio Playback: The original audio is playable alongside the transcript.
  - Click-to-Seek: Clicking anywhere in the transcript text automatically seeks the audio player to that precise moment.
  - Live Text Highlighting: As the audio plays, the corresponding segment of the transcript is highlighted in real-time.
  - Full Editing Capability: Users can freely edit the text to correct any inaccuracies.
  - Timestamp Editing: Users can click on any [HH:MM:SS] timestamp to open a dedicated editor, allowing them to adjust the time. Changes can automatically propagate to subsequent timestamps.
  - Search and Replace: A built-in tool allows users to find and replace words or phrases throughout the entire transcript.
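
The timestamp-editing behavior described above (adjust one [HH:MM:SS] marker and let the change propagate to later markers) could be sketched roughly as follows. This is only an illustration under stated assumptions: the transcript is assumed to be a plain string with inline [HH:MM:SS] markers, propagation is assumed to mean "shift every later marker by the same offset", and the names parseTimestamp, formatTimestamp, and retimeFrom are hypothetical, not taken from the application's code.

```typescript
// Hypothetical helpers; none of these names come from the application's source.

// Matches inline markers such as [00:01:23].
const STAMP_PATTERN = /\[(\d{2}:[0-5]\d:[0-5]\d)\]/g;

// Convert "HH:MM:SS" to seconds, or return null if the text is malformed.
function parseTimestamp(stamp: string): number | null {
  const match = /^(\d{2}):([0-5]\d):([0-5]\d)$/.exec(stamp);
  if (!match) return null;
  return Number(match[1]) * 3600 + Number(match[2]) * 60 + Number(match[3]);
}

// Convert seconds back to "HH:MM:SS", clamping negative values to zero.
function formatTimestamp(totalSeconds: number): string {
  const clamped = Math.max(0, Math.floor(totalSeconds));
  const pad = (n: number): string => (n < 10 ? "0" : "") + String(n);
  const hours = Math.floor(clamped / 3600);
  const minutes = Math.floor((clamped % 3600) / 60);
  return pad(hours) + ":" + pad(minutes) + ":" + pad(clamped % 60);
}

// Set the marker at editedIndex (0-based) to newStamp and shift every later
// marker by the same offset. Assumption: "propagate" means a uniform shift.
function retimeFrom(transcript: string, editedIndex: number, newStamp: string): string {
  const markers = transcript.match(STAMP_PATTERN) || [];
  const edited = markers[editedIndex];
  if (edited === undefined) return transcript;
  const oldSeconds = parseTimestamp(edited.slice(1, -1));
  const newSeconds = parseTimestamp(newStamp);
  if (oldSeconds === null || newSeconds === null) return transcript;
  const delta = newSeconds - oldSeconds;
  let index = -1;
  return transcript.replace(STAMP_PATTERN, (whole: string, stamp: string) => {
    index += 1;
    if (index < editedIndex) return whole; // earlier markers stay untouched
    const seconds = parseTimestamp(stamp);
    return seconds === null ? whole : "[" + formatTimestamp(seconds + delta) + "]";
  });
}
```

Calling retimeFrom(text, 1, "00:00:12") on a transcript whose second marker was [00:00:10] moves that marker and every later one forward by two seconds. An editor that offers propagation as an option could apply the offset to the edited marker alone when the option is off.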
Multiple Export Formats: The final transcript can be exported in various standard formats, including TXT, DOCX, RTF, PDF, SRT, and VTT.

File Management Dashboard: Users get a clear overview of all their files. Each file shows its current status (Pending, Queued, Transcribing, Completed, Error). Real-time progress bars are shown for files being uploaded and transcribed. Users can easily remove files or view completed transcripts.

Design and User Experience (UX):
  - Modern & Responsive UI: Built with Tailwind CSS, the application has a clean, professional look that works flawlessly on both desktop and mobile devices.
  - Dark & Light Modes: A theme toggle allows users to switch between light and dark modes, with the preference saved locally.
  - Interactive Feedback: The app uses subtle animations, loading states, disabled buttons, and informative modals to provide constant, clear feedback to the user about what's happening.
  - User Safeguards: An "on before unload" prompt warns users if they try to navigate away while files are being processed, preventing accidental data loss.

Technical Stack:
  - Frontend: React with TypeScript for a robust and type-safe codebase.
  - AI Integration: The @google/genai SDK for Node.js/web.
  - Styling: Tailwind CSS for a utility-first styling approach.
  - Exporting: PDF creation and docx for generating Word documents.
  - Module System: It uses modern ES Modules with an importmap, which allows it to run directly in the browser without a separate build step.

- Initial Deployment
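
Of the export formats listed above, SRT and VTT are plain-text cue formats, so one possible export path can be sketched without any library. This is a minimal sketch, not the application's exporter: the Segment shape, the assumption that each segment carries start and end times in seconds, and the names cueTime, toSrt, and toVtt are all hypothetical.

```typescript
// Hypothetical segment shape; the application's real data model is not shown in the source.
interface Segment {
  startSeconds: number;
  endSeconds: number;
  speaker: string; // e.g. "SPEAKER 1"
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function zeroPad(value: number, width: number): string {
  let out = String(value);
  while (out.length < width) out = "0" + out;
  return out;
}

// Format seconds as HH:MM:SS plus milliseconds.
// SRT separates milliseconds with a comma, WebVTT with a period.
function cueTime(seconds: number, msSeparator: "," | "."): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const hours = Math.floor(totalMs / 3600000);
  const minutes = Math.floor((totalMs % 3600000) / 60000);
  const secs = Math.floor((totalMs % 60000) / 1000);
  const ms = totalMs % 1000;
  return zeroPad(hours, 2) + ":" + zeroPad(minutes, 2) + ":" + zeroPad(secs, 2) + msSeparator + zeroPad(ms, 3);
}

// SRT: numbered cues separated by a blank line.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      String(i + 1) + "\n" +
      cueTime(seg.startSeconds, ",") + " --> " + cueTime(seg.endSeconds, ",") + "\n" +
      seg.speaker + ": " + seg.text + "\n")
    .join("\n");
}

// WebVTT: a "WEBVTT" header line, then cues separated by a blank line.
function toVtt(segments: Segment[]): string {
  const cues = segments.map((seg) =>
    cueTime(seg.startSeconds, ".") + " --> " + cueTime(seg.endSeconds, ".") + "\n" +
    seg.speaker + ": " + seg.text + "\n");
  return "WEBVTT\n\n" + cues.join("\n");
}
```

Because the transcript described above stores only [HH:MM:SS] start markers, an exporter along these lines would need some rule for end times, for example treating the next segment's start as the current segment's end.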

bfcc502 verified 9 months ago

title: tm
emoji: 🐳
colorFrom: blue
colorTo: purple
sdk: static
pinned: false
tags:
  - deepsite

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference