ForStream Claude Opus 4.8 committed on
Commit 4a178d5 · 1 Parent(s): 5454e77

Fix Gemma StopIteration: gemma-4-26B-A4B-it (Novita provider)

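HF_GEMMA_MODEL=gemma-4-E4B-it has zero HF Inference Providers, so
InferenceClient's provider resolution hits next()→StopIteration (it fails immediately, before any network call).
Switch to gemma-4-26B-A4B-it (served by Novita, not gated, MoE with 3.8B active parameters) and
pass provider="novita" explicitly to the InferenceClient behind both chat_completion calls (answer generation and intent parser).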
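
The failure mode described above can be reproduced without any network or library: calling next() on an empty iterator raises StopIteration. The following is a minimal sketch; pick_first_provider and pick_first_provider_safe are hypothetical names for illustration and are not huggingface_hub's actual internals.

```python
# Hypothetical stand-in for a provider lookup; NOT huggingface_hub's real code.
def pick_first_provider(providers):
    """Return the first provider, the way a bare next() would."""
    # With an empty list there is nothing to yield, so next() raises StopIteration.
    return next(iter(providers))


def pick_first_provider_safe(providers, default=None):
    """Same lookup, but fall back to a default instead of raising."""
    return next(iter(providers), default)


try:
    pick_first_provider([])  # a model with zero providers
except StopIteration:
    print("no providers: StopIteration")

# Supplying a fallback avoids the exception entirely.
print(pick_first_provider_safe([], default="novita"))
```

Passing an explicit provider, as this commit does, sidesteps the automatic lookup instead of relying on a fallback.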

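- llm_adapters.py: new GEMMA_MODEL_HF default, new HF_PROVIDER setting, provider passed at 2 call sites
- Dockerfile: CACHE_BUST v3→v4 (invalidates the cache for COPY api/)
- ⚠️ The Space Variable HF_GEMMA_MODEL must also be changed to google/gemma-4-26B-A4B-it in the Space settings UI; otherwise it overrides the new code default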
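
The warning in the last bullet follows from how os.environ.get treats its second argument: the default is used only when the variable is not set at all. A minimal sketch, assuming a plain dict stands in for the Space's environment (resolve_model is a hypothetical helper, not part of llm_adapters.py):

```python
DEFAULT_MODEL = "google/gemma-4-26B-A4B-it"  # the new code default from this commit


def resolve_model(env):
    """Mirror os.environ.get("HF_GEMMA_MODEL", default) on a dict-like env."""
    return env.get("HF_GEMMA_MODEL", DEFAULT_MODEL)


# Variable unset: the code default applies.
print(resolve_model({}))

# Variable still holding the old value: the default is ignored.
print(resolve_model({"HF_GEMMA_MODEL": "google/gemma-4-E4B-it"}))
```

So a stale Space Variable keeps pointing the app at the model with no providers, regardless of the new default in the code.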

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Files changed (2)
  1. Dockerfile +1 -1
  2. api/llm_adapters.py +6 -3
Dockerfile CHANGED
@@ -39,7 +39,7 @@ RUN pip install --upgrade pip && pip install -r /app/api/requirements.txt
 
 # Code (core modules from active/code, such as rag_engine and semantic_search)
 # ARG for cache invalidation (a different commit SHA invalidates the cache)
-ARG CACHE_BUST=v3
+ARG CACHE_BUST=v4
 COPY code/ /app/code/
 # Backend
 COPY api/ /app/api/
api/llm_adapters.py CHANGED
@@ -13,7 +13,10 @@ from typing import Optional
 
 LLM_BACKEND = os.environ.get("LLM_BACKEND", "ollama")
 GEMMA_MODEL_OLLAMA = os.environ.get("OLLAMA_MODEL", "gemma4:e4b")
-GEMMA_MODEL_HF = os.environ.get("HF_GEMMA_MODEL", "google/gemma-4-E4B-it")
+GEMMA_MODEL_HF = os.environ.get("HF_GEMMA_MODEL", "google/gemma-4-26B-A4B-it")
+# HF Inference Provider: gemma-4-26B-A4B-it is served serverless by Novita. Update this together with the model.
+# (Without an explicit provider, InferenceClient dies with next()→StopIteration on a model that has zero providers.)
+HF_PROVIDER = os.environ.get("HF_PROVIDER", "novita")
 HF_TOKEN = os.environ.get("HF_TOKEN", "")
 
 # transformers_local backend cache
@@ -56,7 +59,7 @@ def call_gemma_intent_parser(question: str, system: str) -> tuple[str | None, bo
     elif LLM_BACKEND == "hf_inference":
         try:
             from huggingface_hub import InferenceClient
-            client = InferenceClient(model=GEMMA_MODEL_HF, token=HF_TOKEN)
+            client = InferenceClient(model=GEMMA_MODEL_HF, token=HF_TOKEN, provider=HF_PROVIDER)
             resp = client.chat_completion(
                 messages=[{"role": "system", "content": system},
                           {"role": "user", "content": question}],
@@ -107,7 +110,7 @@ def _call_hf_inference(system, prompt, max_tokens, temperature):
         return None, False, "HF_TOKEN 환경변수 미설정"
     try:
         start = time.time()
-        client = InferenceClient(model=GEMMA_MODEL_HF, token=HF_TOKEN)
+        client = InferenceClient(model=GEMMA_MODEL_HF, token=HF_TOKEN, provider=HF_PROVIDER)
         resp = client.chat_completion(
             messages=[{"role": "system", "content": system},
                       {"role": "user", "content": prompt}],
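
Because the same three arguments are now passed in two places, one possible follow-up is to build them in a single spot so that model and provider cannot drift apart. This is only a sketch under assumptions: hf_client_kwargs and make_hf_client are hypothetical helpers that do not exist in llm_adapters.py, and the module-level names simply repeat the ones in the diff.

```python
import os

# Same settings as in the diff, repeated here so the sketch is self-contained.
GEMMA_MODEL_HF = os.environ.get("HF_GEMMA_MODEL", "google/gemma-4-26B-A4B-it")
HF_PROVIDER = os.environ.get("HF_PROVIDER", "novita")
HF_TOKEN = os.environ.get("HF_TOKEN", "")


def hf_client_kwargs():
    """Hypothetical helper: the keyword arguments shared by both call sites."""
    return {"model": GEMMA_MODEL_HF, "token": HF_TOKEN, "provider": HF_PROVIDER}


def make_hf_client():
    """Hypothetical helper: construct the client with an explicit provider."""
    # Imported lazily, as the intent parser does, so that other backends
    # do not require huggingface_hub to be installed.
    from huggingface_hub import InferenceClient
    return InferenceClient(**hf_client_kwargs())
```

Both the answer generator and the intent parser could then call make_hf_client(), leaving one place to update the next time the model or its provider changes.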