---
license: other
pipeline_tag: text-generation
tags:
- text-generation
- coding
- korean
- vllm
- open-webui
- local-llm
- lora
- qwen
- 8b
language:
- ko
- en
---
# 8bcustom-model
**8bcustom-model** is an 8B-class local coding assistant model/runtime release built for Korean developers who need practical help with Linux, Docker, vLLM, Open-WebUI, CUDA, JSONL datasets, and LoRA workflows.
This repository is part of a DGX AI Factory-style local LLM deployment project: data preparation, LoRA repair, model merge, vLLM serving, Open-WebUI integration, systemd autostart, benchmarking, and Hugging Face release packaging.
## What this model is for
This model is designed as a practical development assistant for:
- Linux command troubleshooting
- Docker and service deployment
- vLLM OpenAI-compatible serving
- Open-WebUI connection setup
- CUDA/PyTorch environment checks
- JSONL dataset validation
- LoRA training and repair workflows
- Korean step-by-step developer support
The target behavior is direct, procedural, and operational: diagnose the problem, provide exact commands, and explain the result clearly in polite, honorific Korean.
## Validated local runtime
The model was validated in a local production-style runtime:
| Component | Status |
|---|---|
| vLLM OpenAI-compatible API | Working |
| Open-WebUI integration | Working |
| systemd autostart | Working |
| Local model name | `dgx-stable-current` |
| Public release name | `8bcustom-model` |
| Hugging Face public repo | `koreallmdev/8bcustom-model` |
## Benchmark summary
The final deployment benchmark was run with the router/template hardening layer enabled (see "Runtime policy"), so the scores describe the hybrid runtime as deployed.
| Metric | Result |
|---|---:|
| Average score | 97.75 |
| Pass ≥ 70 | 20 / 20 |
| Strong ≥ 85 | 20 / 20 |
| Critical failures | 0 |
| Decision | DEPLOY_CANDIDATE |
The benchmark focused on practical developer operations such as Linux, Docker, CUDA checks, vLLM serving, JSONL validation, FastAPI, systemd troubleshooting, LoRA policy, and Korean response quality.
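
The thresholds in the table can be applied to per-task scores as in the minimal Python sketch below. The `TaskResult` type, the `summarize` helper, the `HOLD` label, and the decision rule itself are assumptions made for illustration; this README does not document how `DEPLOY_CANDIDATE` was actually decided.

```python
from dataclasses import dataclass
from statistics import mean

PASS_THRESHOLD = 70    # "Pass ≥ 70" row of the table
STRONG_THRESHOLD = 85  # "Strong ≥ 85" row of the table


@dataclass
class TaskResult:
    """One benchmark task (hypothetical structure)."""
    name: str
    score: float
    critical_failure: bool = False


def summarize(results):
    """Aggregate per-task scores into the metrics shown in the table."""
    if not results:
        raise ValueError("no benchmark results to summarize")
    scores = [r.score for r in results]
    passed = sum(s >= PASS_THRESHOLD for s in scores)
    critical = sum(r.critical_failure for r in results)
    # Assumed rule: every task passes and nothing failed critically.
    deployable = passed == len(results) and critical == 0
    return {
        "average": round(mean(scores), 2),
        "passed": passed,
        "strong": sum(s >= STRONG_THRESHOLD for s in scores),
        "critical_failures": critical,
        "total": len(results),
        "decision": "DEPLOY_CANDIDATE" if deployable else "HOLD",
    }
```

Keeping the thresholds as named constants makes it easy to re-run the same summary when the benchmark set or the pass criteria change.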
## Runtime policy
For production usage, the local deployment uses a hybrid approach:
- General coding questions: model generation
- Linux/vLLM/CUDA/systemd known operational routes: guarded templates
- LoRA/stable/rejected model policy: fixed policy templates
- CJK leakage and style regressions: post-check and route hardening
This approach keeps the model useful for open-ended coding while making high-risk operational answers more deterministic.
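
A hybrid policy of this kind could be sketched in Python roughly as follows. Everything here is hypothetical: the keyword lists, the route names, the `respond` helper, and the fallback behavior are illustrative assumptions, not the actual router used in the deployment. The sketch also assumes that "CJK leakage" refers to Han or Kana characters appearing in an answer that should be Korean.

```python
import re

# Hypothetical keyword lists; the real routes are not documented here.
POLICY_KEYWORDS = ("lora", "stable", "rejected")
OPS_KEYWORDS = ("linux", "vllm", "cuda", "systemd")

# Assumption: leakage = Hiragana/Katakana or CJK Unified Ideographs.
HAN_OR_KANA = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")


def choose_route(prompt: str) -> str:
    """Pick a route name from simple keyword matching."""
    text = prompt.lower()
    if any(keyword in text for keyword in POLICY_KEYWORDS):
        return "policy_template"
    if any(keyword in text for keyword in OPS_KEYWORDS):
        return "guarded_template"
    return "model_generation"


def has_cjk_leakage(answer: str) -> bool:
    """Return True if the answer contains Han or Kana characters."""
    return HAN_OR_KANA.search(answer) is not None


def respond(prompt, generate, templates):
    """Serve a template for known routes, otherwise post-check the model."""
    route = choose_route(prompt)
    if route in templates:
        return templates[route]
    answer = generate(prompt)
    if has_cjk_leakage(answer):
        # Fall back to a fixed answer instead of returning leaked text.
        return templates.get("fallback", "")
    return answer
```

Here `generate` stands for any callable that sends the prompt to the model, and `templates` is a plain mapping from route name to a fixed answer.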
## Quick start with vLLM
After downloading the model files, run vLLM from the directory that contains them (the command below passes `--model ./`):
```bash
python -m vllm.entrypoints.openai.api_server \
  --model ./ \
  --served-model-name 8bcustom-model \
  --dtype float16 \
  --host 0.0.0.0 \
  --port 8000 \
  --max-model-len 1536 \
  --gpu-memory-utilization 0.50 \
  --max-num-seqs 8
```
Check the model endpoint:
```bash
curl http://127.0.0.1:8000/v1/models
```
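
The same check can be scripted. The sketch below, using only the Python standard library, reads `/v1/models` and confirms that the expected name is being served. It assumes the usual OpenAI-style list response (`{"data": [{"id": ...}]}`); the helper names are hypothetical.

```python
import json
import urllib.request


def fetch_models(base_url: str = "http://127.0.0.1:8000/v1",
                 timeout: float = 10.0) -> dict:
    """GET {base_url}/models and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return json.load(resp)


def served_model_ids(models_payload: dict) -> list:
    """Extract model ids from an OpenAI-style model list response."""
    return [entry["id"] for entry in models_payload.get("data", [])]


def is_served(name: str, models_payload: dict) -> bool:
    """True if `name` matches one of the served model ids exactly."""
    return name in served_model_ids(models_payload)


# Example (requires the vLLM server to be running):
# print(is_served("8bcustom-model", fetch_models()))
```

A name mismatch here is worth ruling out first when a client cannot find the model, because requests must use the exact value passed to `--served-model-name`.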
Send a test request:
```bash
curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "8bcustom-model",
    "messages": [
      {
        "role": "user",
        "content": "Docker 컨테이너가 실행 중인지 확인하는 명령어를 알려주세요."
      }
    ],
    "temperature": 0.2,
    "max_tokens": 300
  }'
```
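
For scripted use, the same request can be sent from Python with the standard library alone. This is a minimal sketch: the `build_chat_request` and `chat` helpers are hypothetical names, and the URL, model name, and sampling values simply mirror the `curl` example.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000/v1"  # vLLM URL from the quick start
MODEL = "8bcustom-model"               # must match --served-model-name


def build_chat_request(prompt: str, model: str = MODEL,
                       base_url: str = BASE_URL,
                       temperature: float = 0.2,
                       max_tokens: int = 300) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, timeout: float = 60.0, **kwargs) -> str:
    """Send the prompt and return the first choice's message text."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (requires the vLLM server to be running):
# print(chat("Docker 컨테이너가 실행 중인지 확인하는 명령어를 알려주세요."))
```

Building the request separately from sending it keeps the payload easy to inspect without a running server.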
## Open-WebUI connection
For Open-WebUI, add an OpenAI-compatible API connection.
If Open-WebUI runs in Docker:
```text
Base URL: http://host.docker.internal:8000/v1
API Key : dummy
Model : 8bcustom-model
```
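If you serve the model under the local deployment name from the original DGX runtime (that is, vLLM was started with `--served-model-name dgx-stable-current`), use that name instead: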
```text
Model: dgx-stable-current
```
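If `host.docker.internal` does not resolve in your Docker environment (common on Linux unless the container is started with `--add-host=host.docker.internal:host-gateway`), try the default Docker bridge gateway address: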
```text
Base URL: http://172.17.0.1:8000/v1
```
## Example prompts
Korean developer support:
```text
Ubuntu에서 8000 포트를 사용 중인 프로세스를 확인하고 종료하는 절차를 알려주세요.
```
vLLM troubleshooting:
```text
vLLM 서버가 Open-WebUI에 모델을 표시하지 못할 때 확인해야 할 순서를 알려주세요.
```
LoRA workflow:
```text
LoRA adapter를 merge한 뒤 vLLM에서 서빙하기 전 확인해야 할 파일 목록을 알려주세요.
```
Dataset validation:
```text
JSONL 학습 데이터에서 깨진 JSON과 중복 instruction을 검사하는 Python 스크립트를 만들어주세요.
```
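
For reference, the kind of check this last prompt asks for might look like the sketch below. It is an illustration written for this README, not output from the model. It assumes each record is a JSON object with an `instruction` string field, as the prompt implies; the `validate_jsonl` name is hypothetical.

```python
import json
from collections import defaultdict


def validate_jsonl(lines):
    """Return (broken_line_numbers, duplicate_instructions).

    `lines` is any iterable of strings, for example an open file.
    Line numbers are 1-based; blank lines are skipped.
    """
    broken = []
    seen = defaultdict(list)
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            broken.append(lineno)
            continue
        if not isinstance(record, dict):
            # Valid JSON, but not an object: treat it as a broken record.
            broken.append(lineno)
            continue
        instruction = record.get("instruction")
        if isinstance(instruction, str):
            seen[instruction.strip()].append(lineno)
    duplicates = {text: nums for text, nums in seen.items() if len(nums) > 1}
    return broken, duplicates


# Example with a hypothetical file name:
# with open("train.jsonl", encoding="utf-8") as f:
#     broken, duplicates = validate_jsonl(f)
```

Duplicates are matched on the exact instruction text after trimming whitespace; near-duplicates would need a separate normalization step.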
## Intended use
This release is intended for:
- Local developer assistants
- On-premise coding assistant experiments
- vLLM/Open-WebUI deployment practice
- Korean-language coding support
- LoRA and dataset pipeline testing
## Out-of-scope use
This model is not intended to be treated as a fully audited security, legal, medical, or financial advisor. Operational outputs should be reviewed before applying them to production systems.
## Deployment notes
The original local deployment used:
```text
Local served model name: dgx-stable-current
Open-WebUI URL : http://127.0.0.1:3000
vLLM URL : http://127.0.0.1:8000/v1
Open-WebUI Base URL : http://host.docker.internal:8000/v1
```
The public release name is:
```text
8bcustom-model
```
## Project highlights
This project demonstrates an end-to-end local LLM workflow:
1. Dataset filtering and repair
2. LoRA candidate testing
3. Regression rejection
4. Stable adapter preservation
5. Model merge for vLLM
6. Open-WebUI integration
7. systemd autostart
8. Private backup upload
9. Public Hugging Face release
10. Runtime route/template hardening
## Collaboration
This repository can be used as a portfolio reference for:
- Local LLM deployment
- vLLM serving
- Open-WebUI integration
- Korean coding assistant customization
- LoRA fine-tuning and repair workflows
- On-premise AI assistant setup
For collaboration, please reach out through the Hugging Face profile associated with this repository.
## Disclaimer
This is an experimental local LLM deployment release. Validate outputs before use in production environments.