Commits · mic3333/asr

fix app.py to remove redundant generate_kwargs in transcription calls and enable demo sharing during launch

3e64dd3

michaeltangz committed on Dec 8, 2025

fix app.py to enable demo sharing during launch

dd7a42e

michaeltangz committed on Dec 8, 2025

refactor app.py to update demo launch logic; ensure proper execution within main guard

a3d95dd

michaeltangz committed on Dec 8, 2025

fix app.py to correct dtype parameter usage in model initialization and pipeline; remove redundant torch_dtype argument

574825e

michaeltangz committed on Dec 8, 2025

fix app.py to correct variable name from dtype to torch_dtype in model and pipeline initialization

6cd2d8c

michaeltangz committed on Dec 8, 2025

fix app.py to correct variable name for dtype in model and pipeline initialization

1a404ef

michaeltangz committed on Dec 8, 2025

refactor app.py to remove flash attention installation logic and simplify attention implementation; enhance error handling in transcription functions

721ab04

michaeltangz committed on Dec 8, 2025

refactor app.py to move flash attention installation into a try-except block; ensure proper fallback to sdpa if installation fails

078e579

michaeltangz committed on Dec 8, 2025

refactor app.py to streamline flash attention installation and model loading; remove fallback mechanisms and enhance transcription parameters

8f2a46b

michaeltangz committed on Dec 8, 2025

refactor app.py to remove condition_on_previous_text parameter from transcription functions; enhance output consistency

f8af19e

michaeltangz committed on Dec 8, 2025

refactor app.py to enhance pipeline configuration; adjust chunk length for better context and simplify demo interface

31502e6

michaeltangz committed on Dec 8, 2025

refactor app.py to correct parameter name for model loading; streamline microphone input handling

d64755c

michaeltangz committed on Dec 8, 2025

refactor app.py to improve flash attention installation and model loading; add fallback mechanism for model loading

47751fa

michaeltangz committed on Dec 8, 2025

refactor app.py to streamline flash attention installation and model loading; enhance voice activity detection and transcription accuracy

62fccb4

michaeltangz committed on Dec 8, 2025

refactor app.py to enhance transcription pipeline by adding condition_on_previous_text parameter; improve transcription accuracy

ae149f3

michaeltangz committed on Dec 8, 2025

refactor app.py to specify language parameter in transcription pipeline; enhance accuracy for English transcriptions

aebccdb

michaeltangz committed on Dec 8, 2025

refactor app.py to update model loading parameters for flash attention; improve code clarity and maintainability

26bcc10

michaeltangz committed on Dec 8, 2025

refactor app.py to remove concurrency limit from audio stream; improve streaming performance

a6b63fd

michaeltangz committed on Dec 8, 2025

refactor app.py to remove unnecessary time limit from audio stream; enhance streaming functionality

35686e6

michaeltangz committed on Dec 8, 2025

refactor app.py to improve flash-attn installation and model loading with error handling; update demo launch condition

fce6c08

michaeltangz committed on Dec 8, 2025

refactor app.py to streamline flash-attn installation and model loading; update requirements.txt to remove unnecessary dependencies

6fdae11

michaeltangz committed on Dec 8, 2025

refactor app.py to improve model loading with flash attention and enhance streaming transcription; add packages.txt for dependencies

7b34cad

michaeltangz committed on Dec 8, 2025

install Flash Attention 2 and optimize Whisper model loading; enhance streaming transcription with pipeline approach and latency tracking

a9a8aec

michaeltangz committed on Dec 8, 2025

update demo launch settings for Spaces to share=False by default

03bd1f9

michaeltangz committed on Dec 8, 2025

adjust max_new_tokens in transcription functions to leave room for special tokens

e542369

michaeltangz committed on Dec 8, 2025

enhance audio transcription accuracy with librosa resampling and voice activity detection; increase buffer duration for better context retention

2afefc3

michaeltangz committed on Dec 8, 2025

refactor streaming transcription to only emit complete sentences and clear buffer after capture, removing incremental text accumulation logic

e5caa82

mic3333 committed on Dec 4, 2025

add incremental text accumulation to prevent duplicate transcription of buffered audio

846c0d7

mic3333 committed on Dec 4, 2025

simplify streaming transcription by removing VAD, diarization, and complex buffering logic

55d67f9

mic3333 committed on Dec 4, 2025

add speaker diarization with pyannote and optimize streaming pipeline with static caching and deque-based buffering

1f8fa97

mic3333 committed on Dec 4, 2025

increase buffer duration and adjust transcription parameters to prioritize accuracy over latency

52f6b26

mic3333 committed on Dec 4, 2025

reduce buffer duration and adjust transcription parameters for lower latency

8493d7f

mic3333 committed on Dec 4, 2025

improve sentence splitting to avoid breaking on common abbreviations like Mr., Ms., Dr., etc.

649d6e6

mic3333 committed on Dec 4, 2025

remove live subtitle display and only show full transcript in the UI

41e2ea2

mic3333 committed on Dec 4, 2025

improve near-duplicate detection to compare against all existing sentences instead of only the last one

5db4d4b

mic3333 committed on Dec 4, 2025

add near-duplicate detection to suppress streaming variants of the same sentence

67ce003

mic3333 committed on Dec 4, 2025

update live subtitle to persist last sentence when no incomplete audio

8d1f680

mic3333 committed on Dec 4, 2025

refactor to use per-session state instead of global variables

8a63ea7

mic3333 committed on Dec 4, 2025

update the instruction(copy) for ui

bd1f625

mic3333 committed on Nov 9, 2025

fix the transcription-focused config

bb22cc6

mic3333 committed on Nov 8, 2025

fix duplication issue - 2

8dbd3ce

mic3333 committed on Nov 8, 2025

fix duplication issue

17e0529

mic3333 committed on Nov 8, 2025

update the further bugs

d6a23f4

mic3333 committed on Nov 8, 2025

fix bugs

cd9ed00

mic3333 committed on Nov 8, 2025

fix buffer_duration issue

5348723

mic3333 committed on Nov 8, 2025

full transcription

e038230

mic3333 committed on Nov 8, 2025

simpler version

87ebad7

mic3333 committed on Nov 8, 2025

update to word by word version

47b3415

mic3333 committed on Nov 8, 2025

optimize the repetition

c5cb0f3

mic3333 committed on Nov 8, 2025

update the title

1b06df3

mic3333 committed on Nov 8, 2025
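
Commits 67ce003 and 5db4d4b describe near-duplicate detection that suppresses streaming variants of the same sentence, first against the last sentence only and then against every sentence already in the transcript. The log does not show how app.py implements this, so the following is only a minimal sketch of the idea under stated assumptions: the function names, the use of Python's standard `difflib.SequenceMatcher`, and the 0.85 similarity threshold are all hypothetical and not taken from the repository.

```python
from difflib import SequenceMatcher


def is_near_duplicate(candidate, existing, threshold=0.85):
    """Return True if candidate closely matches any sentence already kept.

    Hypothetical helper: the threshold and the similarity measure are
    illustrative assumptions, not values from the repository.
    """
    normalized = candidate.strip().lower()
    for sentence in existing:
        ratio = SequenceMatcher(None, normalized, sentence.strip().lower()).ratio()
        if ratio >= threshold:
            return True
    return False


def append_unique(transcript, candidate, threshold=0.85):
    """Append candidate to transcript unless it is empty or a near-duplicate."""
    text = candidate.strip()
    if text and not is_near_duplicate(text, transcript, threshold):
        transcript.append(text)
    return transcript


transcript = []
append_unique(transcript, "Hello, how are you today?")
append_unique(transcript, "Hello how are you today?")   # streaming variant, skipped
append_unique(transcript, "The weather is nice.")
append_unique(transcript, "hello, how are you today")   # matches an earlier sentence, skipped
```

Comparing against the whole transcript rather than only the last sentence catches a variant that reappears after other sentences have been emitted, at the cost of a scan that grows with transcript length.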
