Spaces: webmobriltechnologies/BoomConnext-demo

Its-OMG committed 15 days ago

Commit a2eb473 (1 parent: a26a4aa)

Bumped torch to 2.7+ and torchcodec to 0.3 to restore ASR auto transcribe

Files changed (3)

Dockerfile CHANGED

@@ -27,14 +27,17 @@ RUN pip install --user --upgrade pip setuptools wheel \
  && pip install --user "numpy>=1.26.0" Cython \
  && pip install --user --no-build-isolation pkuseg==0.0.25
-# chatterbox-tts==0.1.7 hard-pins transformers==5.2.0, but OmniVoice needs
-# >=5.3.0 (HiggsAudioV2 was added in 5.3.0). Install chatterbox first so it
-# pulls in vocos / encodec / librosa / etc., then force-upgrade transformers
-# above its pin with --no-deps. chatterbox runs fine on transformers 5.3+
-# in practice; pip will print a "broken requirement" warning that's safe
-# to ignore here.
+# chatterbox-tts==0.1.7 hard-pins transformers==5.2.0 and torch==2.6.0, but
+# OmniVoice needs transformers>=5.3.0 (HiggsAudioV2) and the transformers ASR
+# pipeline calls torchcodec.decoders.AudioDecoder which only exists in
+# torchcodec>=0.3, which in turn needs torch>=2.7. Install chatterbox first
+# so it pulls in vocos/encodec/librosa/etc., then force-upgrade transformers,
+# torch, and torchaudio above their pins with --no-deps. chatterbox uses
+# stable PyTorch APIs and runs fine on torch 2.7 in practice; pip will print
+# a "broken requirement" warning that's safe to ignore here.
 RUN pip install --user chatterbox-tts==0.1.7 \
- && pip install --user --no-deps --upgrade 'transformers>=5.3.0,<6'
+ && pip install --user --no-deps --upgrade 'transformers>=5.3.0,<6' \
+ && pip install --user --no-deps --upgrade 'torch>=2.7,<2.8' 'torchaudio>=2.7,<2.8'
 # Install the rest of the Python deps (chatterbox is already satisfied,
 # so it won't be re-resolved here).
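
The Dockerfile comment hinges on whether `torchcodec.decoders.AudioDecoder` can be imported in the built image. A minimal sketch of a probe that could be run inside the container follows; the helper name `has_attribute` is hypothetical and not part of this repository, and the broad `except` is an assumption made so that native-library load failures count as "unavailable" instead of crashing the probe.

```python
import importlib


def has_attribute(module_name: str, attribute: str) -> bool:
    """Return True if `module_name` imports cleanly and exposes `attribute`.

    Hypothetical helper for illustration; not taken from the repository.
    """
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # Covers a missing package as well as errors raised while a
        # compiled extension is being loaded.
        return False
    return hasattr(module, attribute)


if __name__ == "__main__":
    available = has_attribute("torchcodec.decoders", "AudioDecoder")
    print("torchcodec AudioDecoder available:", available)
```

A result of `False` after the build would suggest that the torch/torchcodec pairing did not resolve the way the comment describes.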

main.py CHANGED

@@ -312,6 +312,16 @@ def omnivoice_generate(
     if mode == "clone":
         if not ref_audio_path:
             raise HTTPException(400, "Voice Clone requires a reference audio file.")
+        if not (ref_text and ref_text.strip()) and not LOAD_ASR:
+            # Auto-transcribe (Whisper) is unavailable when LOAD_ASR is off
+            # (the transformers ASR pipeline needs torchcodec>=0.3, which
+            # needs torch>=2.7), so a reference transcript must be supplied.
+            raise HTTPException(
+                400,
+                "Reference text is required: auto-transcribe is disabled in "
+                "this deployment. Please paste the transcript of your "
+                "reference audio in the 'Reference text' field.",
+            )
         kw["voice_clone_prompt"] = models.omnivoice.create_voice_clone_prompt(
             ref_audio=ref_audio_path,
             ref_text=ref_text or None,
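
The guard added in this hunk reduces to a small predicate: a request is rejected only when the reference text is missing or blank and ASR is not loaded. The sketch below isolates that condition so it can be checked without FastAPI; the function name `needs_manual_transcript` is hypothetical, and `load_asr` stands in for the module-level `LOAD_ASR` flag.

```python
from typing import Optional


def needs_manual_transcript(ref_text: Optional[str], load_asr: bool) -> bool:
    """Mirror the condition from the diff: True means the request should be
    rejected because no usable transcript was given and ASR cannot fill it in.

    Hypothetical helper for illustration; not taken from the repository.
    """
    has_text = bool(ref_text and ref_text.strip())
    return not has_text and not load_asr


if __name__ == "__main__":
    # Whitespace-only text counts as missing, exactly as in the diff.
    print(needs_manual_transcript("   ", load_asr=False))
    print(needs_manual_transcript("hello there", load_asr=False))
```

Keeping the condition in a pure function like this is one way to unit-test the four input combinations without starting the server.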

requirements.txt CHANGED

@@ -4,12 +4,14 @@ uvicorn[standard]>=0.32
 python-multipart>=0.0.18
 # --- Core ML stack ----------------------------------------------------------
-# torch is pinned to 2.6.x because chatterbox-tts==0.1.7 installs torch 2.6.0
-# and torchcodec must match torch's AOTI ABI (aoti_torch_abi_version landed
-# in 2.7). torchcodec 0.2.x is the line that targets torch 2.6.
-torch>=2.6,<2.7
-torchaudio>=2.6,<2.7
-torchcodec>=0.2,<0.3
+# torch 2.7 is required because transformers 5.3's ASR pipeline calls
+# torchcodec.decoders.AudioDecoder, which only exists from torchcodec 0.3+,
+# which in turn requires torch>=2.7 (uses the 2.7 AOTI ABI). chatterbox-tts
+# 0.1.7 metadata pins torch==2.6.0 defensively, but it runs fine on 2.7 in
+# practice; the Dockerfile bypasses that pin with --no-deps.
+torch>=2.7,<2.8
+torchaudio>=2.7,<2.8
+torchcodec>=0.3,<0.4
 transformers>=5.3.0
 accelerate
 numpy>=1.26
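
Because the Dockerfile bypasses pip's resolver with `--no-deps`, nothing automatically confirms that the final environment lands inside the ranges above. The following is a rough sketch of a post-install check using only the standard library. The bounds are copied from this commit (the `<6` cap on transformers comes from the Dockerfile, not requirements.txt); the helper names are hypothetical, and the version parsing is a deliberate simplification that ignores pre-release and local suffixes instead of implementing full PEP 440 ordering.

```python
import re
from importlib import metadata

# (inclusive lower bound, exclusive upper bound) per package, from this commit.
BOUNDS = {
    "torch": ("2.7", "2.8"),
    "torchaudio": ("2.7", "2.8"),
    "torchcodec": ("0.3", "0.4"),
    "transformers": ("5.3.0", "6"),
}


def release_tuple(version: str) -> tuple:
    """Parse the leading numeric part of a version, padded to three fields."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    parts = tuple(int(p) for p in match.group().split("."))
    return parts + (0,) * max(0, 3 - len(parts))


def in_bounds(version: str, lower: str, upper: str) -> bool:
    """Check lower <= version < upper on the numeric release fields only."""
    return release_tuple(lower) <= release_tuple(version) < release_tuple(upper)


def out_of_bounds(installed: dict) -> list:
    """Return the sorted names that are missing or outside their range."""
    return sorted(
        name
        for name, (lower, upper) in BOUNDS.items()
        if name not in installed or not in_bounds(installed[name], lower, upper)
    )


def installed_versions(names) -> dict:
    """Look up installed distribution versions, skipping absent packages."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return found


if __name__ == "__main__":
    print("outside expected ranges:", out_of_bounds(installed_versions(BOUNDS)))
```

Running this as a final build step would turn a silently mismatched torch/torchcodec pair into a visible list of offending packages.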