---
license: other
pipeline_tag: text-generation
tags:
- text-generation
- coding
- korean
- vllm
- open-webui
- local-llm
- lora
- qwen
- 8b
language:
- ko
- en
---

# 8bcustom-model

**8bcustom-model** is an 8B-class local coding assistant, released as a model plus its serving runtime, for Korean developers who need practical help with Linux, Docker, vLLM, Open-WebUI, CUDA, JSONL datasets, and LoRA workflows.

This repository is part of a DGX AI Factory-style local LLM deployment project covering data preparation, LoRA repair, model merging, vLLM serving, Open-WebUI integration, systemd autostart, benchmarking, and Hugging Face release packaging.

## What this model is for

This model is designed as a practical development assistant for:

- Linux command troubleshooting
- Docker and service deployment
- vLLM OpenAI-compatible serving
- Open-WebUI connection setup
- CUDA/PyTorch environment checks
- JSONL dataset validation
- LoRA training and repair workflows
- Korean step-by-step developer support

The target behavior is direct, procedural, and operational: diagnose the problem, provide exact commands, and explain the result clearly in Korean honorific style.

## Validated local runtime

The model was validated in a local, production-style runtime:

| Item | Status / value |
|---|---|
| vLLM OpenAI-compatible API | Working |
| Open-WebUI integration | Working |
| systemd autostart | Working |
| Local model name | `dgx-stable-current` |
| Public release name | `8bcustom-model` |
| Hugging Face public repo | `koreallmdev/8bcustom-model` |

## Benchmark summary

The final deployment benchmark was run with the router/template runtime hardening layer enabled (see Runtime policy below), so the scores reflect the combined system rather than raw model generation alone.

| Metric | Result |
|---|---:|
| Average score | 97.75 |
| Pass (score ≥ 70) | 20 / 20 |
| Strong (score ≥ 85) | 20 / 20 |
| Critical failures | 0 |
| Decision | DEPLOY_CANDIDATE |

The benchmark covered practical developer operations: Linux, Docker, CUDA checks, vLLM serving, JSONL validation, FastAPI, systemd troubleshooting, LoRA policy, and Korean response quality.

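
The summary metrics can be recomputed from per-item scores. The sketch below is hypothetical: it assumes a plain-text file with one numeric score per line (the benchmark's real output format is not published here) and applies the same thresholds as the table, 70 for pass and 85 for strong.

```bash
# Summarize benchmark scores: average, pass (>= 70) and strong (>= 85) counts.
# Assumed input format: one numeric score per line; blank lines are skipped.
# Reads the files given as arguments, or stdin if none are given.
summarize_scores() {
  awk '
    NF == 0 { next }
    {
      n++
      sum += $1
      if ($1 >= 70) pass++
      if ($1 >= 85) strong++
    }
    END {
      if (n == 0) { print "no scores" > "/dev/stderr"; exit 1 }
      printf "average=%.2f pass=%d/%d strong=%d/%d\n", sum / n, pass, n, strong, n
    }' "$@"
}
```

For example, `printf '90\n80\n60\n' | summarize_scores` prints `average=76.67 pass=2/3 strong=1/3`.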
## Runtime policy

For production use, the local deployment takes a hybrid approach:

- General coding questions: answered by model generation
- Known Linux/vLLM/CUDA/systemd operational routes: answered from guarded templates
- LoRA/stable/rejected model policy questions: answered from fixed policy templates
- CJK leakage (stray Chinese or Japanese characters in Korean answers) and style regressions: caught by post-checks and route hardening

This keeps the model useful for open-ended coding while making high-risk operational answers more deterministic.

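
A minimal sketch of the idea, not the project's actual router: `route_prompt` sends prompts that match hypothetical keyword lists to a named template and everything else to the model, and `has_cjk_leak` flags answers containing Chinese ideographs or Japanese kana. Route names and keywords are placeholders.

```bash
# Hypothetical keyword router: picks a template route or falls back to the model.
route_prompt() {
  # Lowercase ASCII letters so matching is case-insensitive.
  prompt=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
  case "$prompt" in
    *lora*|*adapter*|*rejected*)  echo "template:policy" ;;
    *systemd*|*systemctl*)        echo "template:systemd" ;;
    *vllm*|*cuda*|*nvidia-smi*)   echo "template:vllm-cuda" ;;
    *)                            echo "model" ;;
  esac
}

# Post-check: succeed (exit 0) if the text contains CJK ideographs or kana.
# Works on raw UTF-8 bytes in the C locale:
#   0xE4-0xE9            = lead bytes of U+4000..U+9FFF (mostly CJK ideographs)
#   0xE3 then 0x81-0x83  = U+3040..U+30FF (hiragana and katakana)
# Hangul syllables use lead bytes 0xEA-0xED, so Korean text is not flagged.
has_cjk_leak() {
  pattern=$(printf '[\344-\351]|\343[\201-\203]')
  printf '%s\n' "$1" | LC_ALL=C grep -qE "$pattern"
}
```

In such a pipeline, `route_prompt` would run before generation and `has_cjk_leak` on the model's answer; a flagged answer could then be regenerated or replaced by a template.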
## Quick start with vLLM

After downloading the model files, run vLLM from the directory that contains them:

```bash
# --model ./ serves the model files in the current directory.
# --host 0.0.0.0 listens on all interfaces so Docker containers can reach the API;
# the API is unauthenticated unless vLLM is also started with --api-key.
python -m vllm.entrypoints.openai.api_server \
  --model ./ \
  --served-model-name 8bcustom-model \
  --dtype float16 \
  --host 0.0.0.0 \
  --port 8000 \
  --max-model-len 1536 \
  --gpu-memory-utilization 0.50 \
  --max-num-seqs 8
```

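
The validated setup starts vLLM with systemd autostart. The helper below prints a hypothetical unit file for the command above; the service name `vllm-8bcustom`, the model directory, and the Python path are placeholders, not the project's actual values. Printing instead of writing lets you review the unit before installing it.

```bash
# Print a systemd unit that runs the vLLM command above at boot.
#   $1 = directory containing the model files (required)
#   $2 = Python interpreter with vLLM installed (default: /usr/bin/python3)
# The service runs as root unless you add a User= line under [Service].
print_vllm_unit() {
  if [ -z "$1" ]; then
    echo "usage: print_vllm_unit MODEL_DIR [PYTHON]" >&2
    return 1
  fi
  model_dir="$1"
  python_bin="${2:-/usr/bin/python3}"
  cat <<EOF
[Unit]
Description=vLLM OpenAI-compatible server for 8bcustom-model
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
WorkingDirectory=${model_dir}
ExecStart=${python_bin} -m vllm.entrypoints.openai.api_server --model ${model_dir} --served-model-name 8bcustom-model --dtype float16 --host 0.0.0.0 --port 8000 --max-model-len 1536 --gpu-memory-utilization 0.50 --max-num-seqs 8
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF
}

# Example (paths and service name are placeholders):
#   print_vllm_unit /opt/8bcustom-model /opt/venv/bin/python \
#     | sudo tee /etc/systemd/system/vllm-8bcustom.service
#   sudo systemctl daemon-reload
#   sudo systemctl enable --now vllm-8bcustom.service
```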
Check that the served model is listed:

```bash
curl http://127.0.0.1:8000/v1/models
```

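
The vLLM API server does not answer until the model has finished loading, so the check above can fail right after start-up. A small polling sketch; the default URL and model name come from this card, while the 300-second limit and 2-second interval are arbitrary assumptions:

```bash
# Poll /v1/models until the served model name appears or the limit expires.
#   $1 = models URL, $2 = served model name, $3 = approximate seconds to wait
wait_for_model() {
  url="${1:-http://127.0.0.1:8000/v1/models}"
  name="${2:-8bcustom-model}"
  limit="${3:-300}"
  elapsed=0
  while [ "$elapsed" -lt "$limit" ]; do
    # -s silent, -f fail on HTTP errors, --max-time caps each request at 5 s
    if curl -sf --max-time 5 "$url" | grep -q "\"$name\""; then
      echo "ready: $name"
      return 0
    fi
    sleep 2
    elapsed=$((elapsed + 2))
  done
  echo "timed out after ${limit}s waiting for $name at $url" >&2
  return 1
}

# Example:
#   wait_for_model http://127.0.0.1:8000/v1/models 8bcustom-model 600
```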
Send a test request (the Korean prompt asks for a command that checks whether a Docker container is running):

```bash
curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "8bcustom-model",
    "messages": [
      {
        "role": "user",
        "content": "Docker 컨테이너가 실행 중인지 확인하는 명령어를 알려주세요."
      }
    ],
    "temperature": 0.2,
    "max_tokens": 300
  }'
```

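
Writing the JSON body by hand inside single quotes breaks as soon as a prompt contains a quote or a newline. The sketch below builds the body with Python's `json` module and prints only `choices[0].message.content` from the OpenAI-compatible response. It assumes `python3` is on `PATH`; the function names are illustrative.

```bash
# Build an OpenAI-style chat request body. json.dumps escapes quotes and
# newlines; ensure_ascii=False keeps Korean text readable in the payload.
#   $1 = model name, $2 = user prompt
build_chat_payload() {
  python3 -c '
import json, sys
print(json.dumps({
    "model": sys.argv[1],
    "messages": [{"role": "user", "content": sys.argv[2]}],
    "temperature": 0.2,
    "max_tokens": 300,
}, ensure_ascii=False))
' "$1" "$2"
}

# Print only the assistant text from a chat completion response on stdin.
extract_reply() {
  python3 -c '
import json, sys
print(json.load(sys.stdin)["choices"][0]["message"]["content"])
'
}

# Example (requires the vLLM server from the quick start):
#   build_chat_payload 8bcustom-model "Docker 컨테이너가 실행 중인지 확인하는 명령어를 알려주세요." \
#     | curl -s http://127.0.0.1:8000/v1/chat/completions \
#         -H "Content-Type: application/json" --data-binary @- \
#     | extract_reply
```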
## Open-WebUI connection

In Open-WebUI, add an OpenAI-compatible API connection.

If Open-WebUI runs in Docker:

```text
Base URL: http://host.docker.internal:8000/v1
API Key : dummy
Model   : 8bcustom-model
```

vLLM does not check the API key unless it was started with `--api-key`, so any placeholder value works.

To use the local deployment name from the original DGX runtime instead (this only matches if vLLM was started with `--served-model-name dgx-stable-current`):

```text
Model: dgx-stable-current
```

If `host.docker.internal` does not resolve in your Docker environment (common on Linux without extra configuration), try the default Docker bridge gateway address:

```text
Base URL: http://172.17.0.1:8000/v1
```

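
On Linux with Docker Engine, `host.docker.internal` is not defined inside containers by default; Docker 20.10 and later can map it with `--add-host=host.docker.internal:host-gateway`. The sketch below starts Open-WebUI that way and pre-configures the connection through the `OPENAI_API_BASE_URL` and `OPENAI_API_KEY` environment variables. The container name, volume name, and port mapping (host port 3000, matching the deployment notes) are assumptions to adapt.

```bash
# Start Open-WebUI with host.docker.internal mapped to the Docker host gateway.
#   $1 = vLLM base URL as seen from inside the container
start_openwebui() {
  base_url="${1:-http://host.docker.internal:8000/v1}"
  docker run -d \
    --name open-webui \
    -p 3000:8080 \
    --add-host=host.docker.internal:host-gateway \
    -e OPENAI_API_BASE_URL="$base_url" \
    -e OPENAI_API_KEY=dummy \
    -v open-webui:/app/backend/data \
    --restart always \
    ghcr.io/open-webui/open-webui:main
}

# Check that vLLM is reachable from inside the container.
# Assumes the container is named open-webui and has curl available.
probe_vllm_from_openwebui() {
  docker exec open-webui curl -s "${1:-http://host.docker.internal:8000/v1}/models"
}
```

If the probe prints the model list, the same base URL should work in the Open-WebUI connection settings.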
## Example prompts

Korean developer support (find and stop the process using port 8000 on Ubuntu):

```text
Ubuntu에서 8000 포트를 사용 중인 프로세스를 확인하고 종료하는 절차를 알려주세요.
```

vLLM troubleshooting (what to check, in order, when the vLLM model does not appear in Open-WebUI):

```text
vLLM 서버가 Open-WebUI에 모델을 표시하지 못할 때 확인해야 할 순서를 알려주세요.
```

LoRA workflow (which files to check after merging a LoRA adapter and before serving it with vLLM):

```text
LoRA adapter를 merge한 뒤 vLLM에서 서빙하기 전 확인해야 할 파일 목록을 알려주세요.
```

Dataset validation (write a Python script that detects broken JSON and duplicate instructions in JSONL training data):

```text
JSONL 학습 데이터에서 깨진 JSON과 중복 instruction을 검사하는 Python 스크립트를 만들어주세요.
```

## Intended use

This release is intended for:

- Local developer assistants
- On-premises coding assistant experiments
- vLLM/Open-WebUI deployment practice
- Korean-language coding support
- LoRA and dataset pipeline testing

## Out-of-scope use

This model should not be treated as a fully audited security, legal, medical, or financial advisor. Review operational outputs before applying them to production systems.

## Deployment notes

The original local deployment used:

```text
Local served model name: dgx-stable-current
Open-WebUI URL         : http://127.0.0.1:3000
vLLM URL               : http://127.0.0.1:8000/v1
Open-WebUI Base URL    : http://host.docker.internal:8000/v1
```

The public release name is:

```text
8bcustom-model
```

## Project highlights

This project demonstrates an end-to-end local LLM workflow:

1. Dataset filtering and repair
2. LoRA candidate testing
3. Regression rejection
4. Stable adapter preservation
5. Model merge for vLLM
6. Open-WebUI integration
7. systemd autostart
8. Private backup upload
9. Public Hugging Face release
10. Runtime route/template hardening

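
Before serving a merged model (step 5), it helps to confirm the directory holds a full checkpoint rather than a bare LoRA adapter, since vLLM's `--model` expects complete weights. A pre-flight sketch assuming a standard Hugging Face layout; the required-file list is a minimal assumption, not an exhaustive vLLM requirement.

```bash
# Pre-flight check for a merged model directory before serving it with vLLM.
#   $1 = model directory (default: current directory)
# Prints "ok: DIR" on success; reports problems on stderr and returns 1.
check_model_dir() {
  dir="${1:-.}"
  status=0
  for f in config.json tokenizer_config.json; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f" >&2
      status=1
    fi
  done
  # At least one safetensors file must exist (e.g. model-00001-of-00004.safetensors).
  if ! ls "$dir"/*.safetensors >/dev/null 2>&1; then
    echo "missing: *.safetensors weights" >&2
    status=1
  fi
  # adapter_config.json without config.json usually means an unmerged LoRA adapter.
  if [ -f "$dir/adapter_config.json" ] && [ ! -f "$dir/config.json" ]; then
    echo "looks like an unmerged LoRA adapter; merge it before serving" >&2
    status=1
  fi
  if [ "$status" -eq 0 ]; then
    echo "ok: $dir"
  fi
  return "$status"
}
```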
## Collaboration

This repository can serve as a portfolio reference for:

- Local LLM deployment
- vLLM serving
- Open-WebUI integration
- Korean coding assistant customization
- LoRA fine-tuning and repair workflows
- On-premises AI assistant setup

For collaboration, please get in touch through the Hugging Face profile associated with this repository.

## Disclaimer

This is an experimental local LLM deployment release. Validate outputs before use in production environments.
|