Shreekant Kalwar (Nokia) committed
Commit 11ff83b · 0 Parent(s)

initial commit
Browse files
- .gitignore +3 -0
- Dockerfile +13 -0
- app.py +30 -0
- requirements.txt +0 -0
.gitignore ADDED
@@ -0,0 +1,3 @@
+/venv
+.env
+__pycache__
Dockerfile ADDED
@@ -0,0 +1,13 @@
+FROM python:3.9
+
+RUN useradd -m -u 1000 user
+USER user
+ENV PATH="/home/user/.local/bin:$PATH"
+
+WORKDIR /app
+
+COPY --chown=user ./requirements.txt requirements.txt
+RUN pip install --no-cache-dir --upgrade -r requirements.txt
+
+COPY --chown=user . /app
+CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
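For reference, a minimal sketch of how this image could be built and run locally (the deepseek-chat tag is an arbitrary name for illustration, not something defined in this commit):

    # Build the image from the repository root
    docker build -t deepseek-chat .

    # Run it, publishing the port 8000 that the CMD above listens on
    docker run -p 8000:8000 deepseek-chat

The Dockerfile runs as a non-root user and prepends ~/.local/bin to PATH so that pip's per-user installs (including the uvicorn executable) are found at runtime.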
app.py ADDED
@@ -0,0 +1,30 @@
+from fastapi import FastAPI
+from pydantic import BaseModel
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import torch
+
+# Load DeepSeek model (small one for local use)
+# Try bigger models if you have a GPU with >12GB VRAM
+model_name = "deepseek-ai/deepseek-coder-1.3b-instruct"
+
+print("Loading model... this may take a minute ⏳")
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
+    device_map="auto"
+)
+print("Model loaded ✅")
+
+app = FastAPI()
+
+class ChatRequest(BaseModel):
+    message: str
+
+@app.post("/chat")
+def chat(request: ChatRequest):
+    """Chat endpoint using DeepSeek model"""
+    inputs = tokenizer(request.message, return_tensors="pt").to(model.device)
+    outputs = model.generate(**inputs, max_new_tokens=200)
+    reply = tokenizer.decode(outputs[0], skip_special_tokens=True)
+    return {"reply": reply}
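Once the server is up, the /chat endpoint accepts a JSON body matching the ChatRequest model; a sketch of a test request with curl, assuming the port mapping shown earlier:

    # POST a prompt to the chat endpoint on the mapped host port
    curl -X POST http://localhost:8000/chat \
      -H "Content-Type: application/json" \
      -d '{"message": "Write a hello world function in Python"}'

Note that the reply will echo the original prompt, since chat() decodes the full output sequence (prompt tokens plus generated tokens) rather than slicing off the input.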
requirements.txt ADDED
Binary file (1.38 kB)