visamram02 commited on
Commit
47a7f51
·
verified ·
1 Parent(s): 69c40e4

Upload folder using huggingface_hub

Browse files
Files changed (2) hide show
  1. Dockerfile +19 -0
  2. README.md +6 -6
Dockerfile ADDED
@@ -0,0 +1,19 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ FROM python:3.10-slim
2
+
3
+ # Install system dependencies
4
+ RUN apt-get update && apt-get install -y \
5
+ build-essential \
6
+ wget \
7
+ && rm -rf /var/lib/apt/lists/*
8
+
9
+ # Install llama-cpp-python with server extra
10
+ RUN pip install --no-cache-dir llama-cpp-python[server]
11
+
12
+ # Download the model (Qwen 2.5 7B Instruct Quantized)
13
+ RUN wget https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-GGUF/resolve/main/qwen2.5-7b-instruct-q4_k_m.gguf -O model.gguf
14
+
15
+ # Expose the HF Space port
16
+ EXPOSE 7860
17
+
18
+ # Run the OpenAI-compatible server
19
+ CMD ["python3", "-m", "llama_cpp.server", "--model", "model.gguf", "--host", "0.0.0.0", "--port", "7860", "--n_ctx", "4096"]
README.md CHANGED
@@ -1,10 +1,10 @@
1
  ---
2
- title: VisamIntelli Flash
3
- emoji: 🦀
4
- colorFrom: red
5
- colorTo: indigo
6
  sdk: docker
7
  pinned: false
8
  ---
9
-
10
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
1
  ---
2
+ title: VisamIntelli-Flash
3
+ emoji:
4
+ colorFrom: blue
5
+ colorTo: green
6
  sdk: docker
7
  pinned: false
8
  ---
9
+ # VisamIntelli-Flash
10
+ Custom efficient inference server for VisamIntelliAI using Qwen 2.5 7B (Quantized).