---
license: other
pipeline_tag: text-generation
tags:
- text-generation
- coding
- korean
- vllm
- open-webui
- local-llm
- lora
- qwen
- 8b
language:
- ko
- en
---

# 8bcustom-model

**8bcustom-model** is an 8B-class local coding assistant model/runtime release built for Korean developers who need practical help with Linux, Docker, vLLM, Open-WebUI, CUDA, JSONL datasets, and LoRA workflows.

This repository is part of a DGX AI Factory-style local LLM deployment project: data preparation, LoRA repair, model merge, vLLM serving, Open-WebUI integration, systemd autostart, benchmarking, and Hugging Face release packaging.

## What this model is for

This model is designed as a practical development assistant for:

- Linux command troubleshooting
- Docker and service deployment
- vLLM OpenAI-compatible serving
- Open-WebUI connection setup
- CUDA/PyTorch environment checks
- JSONL dataset validation
- LoRA training and repair workflows
- Korean step-by-step developer support

The target behavior is direct, procedural, and operational: diagnose the problem, provide exact commands, and explain the result clearly in Korean honorific style.

## Validated local runtime

The model was validated in a local production-style runtime:

| Component | Status |
|---|---|
| vLLM OpenAI-compatible API | Working |
| Open-WebUI integration | Working |
| systemd autostart | Working |
| Local model name | `dgx-stable-current` |
| Public release name | `8bcustom-model` |
| Hugging Face public repo | `koreallmdev/8bcustom-model` |

## Benchmark summary

The final deployment benchmark used a router/template runtime hardening layer for operational reliability.
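
The Pass and Strong rows in the table below count benchmark cases whose score reaches the stated threshold, and Average score is presumably the mean over all 20 cases. The benchmark harness is not part of this release, so the following is only a hypothetical Python sketch of how such summary metrics can be derived from per-case scores. The `CaseResult` structure, the 0-100 scale, and the `critical` flag are assumptions; only the 70 and 85 thresholds come from the table.

```python
from dataclasses import dataclass

# Thresholds copied from the summary table; everything else is assumed.
PASS_THRESHOLD = 70
STRONG_THRESHOLD = 85


@dataclass
class CaseResult:
    """One benchmark case (hypothetical structure, not the real harness)."""

    name: str
    score: float  # assumed 0-100 grade for the model's answer
    critical: bool = False  # assumed flag marking a critical failure


def summarize(results: list[CaseResult]) -> dict:
    """Aggregate per-case scores into the metrics shown in the summary table."""
    if not results:
        raise ValueError("no benchmark results to summarize")
    total = len(results)
    passed = sum(r.score >= PASS_THRESHOLD for r in results)
    strong = sum(r.score >= STRONG_THRESHOLD for r in results)
    return {
        "average": round(sum(r.score for r in results) / total, 2),
        "pass": f"{passed} / {total}",
        "strong": f"{strong} / {total}",
        "critical_failures": sum(r.critical for r in results),
    }


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    demo = [
        CaseResult("docker-status", 95),
        CaseResult("vllm-serve", 88),
        CaseResult("jsonl-check", 72),
    ]
    print(summarize(demo))
```

Run as a script, it prints `{'average': 85.0, 'pass': '3 / 3', 'strong': '2 / 3', 'critical_failures': 0}` for the made-up scores. The rule behind the `DEPLOY_CANDIDATE` decision is not documented here, so the sketch does not try to reproduce it.
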

| Metric | Result |
|---|---:|
| Average score | 97.75 |
| Pass (score ≥ 70) | 20 / 20 |
| Strong (score ≥ 85) | 20 / 20 |
| Critical failures | 0 |
| Decision | DEPLOY_CANDIDATE |

The 20 benchmark cases focused on practical developer operations such as Linux, Docker, CUDA checks, vLLM serving, JSONL validation, FastAPI, systemd troubleshooting, LoRA policy, and Korean response quality.

## Runtime policy

For production usage, the local deployment uses a hybrid approach:

- General coding questions: answered by model generation
- Known Linux/vLLM/CUDA/systemd operational questions: routed to guarded templates
- LoRA/stable/rejected model policy questions: answered from fixed policy templates
- CJK leakage and style regressions: caught by post-checks and route hardening

This approach keeps the model useful for open-ended coding while making high-risk operational answers more deterministic.

## Quick start with vLLM

After downloading the model files, start vLLM from the directory that contains them (`--model ./`), or replace `./` with the path to that directory:

```bash
python -m vllm.entrypoints.openai.api_server \
  --model ./ \
  --served-model-name 8bcustom-model \
  --dtype float16 \
  --host 0.0.0.0 \
  --port 8000 \
  --max-model-len 1536 \
  --gpu-memory-utilization 0.50 \
  --max-num-seqs 8
```

`--max-model-len 1536` caps prompt plus completion at 1,536 tokens, and `--gpu-memory-utilization 0.50` limits vLLM to about half of the GPU memory. Raise either value if your hardware and the base model's context window allow it.

Check that the model is listed:

```bash
curl http://127.0.0.1:8000/v1/models
```

Send a test request (the prompt asks, in Korean, for the command that checks whether a Docker container is running):

```bash
curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "8bcustom-model",
    "messages": [
      {
        "role": "user",
        "content": "Docker 컨테이너가 실행 중인지 확인하는 명령어를 알려주세요."
      }
    ],
    "temperature": 0.2,
    "max_tokens": 300
  }'
```

## Open-WebUI connection

For Open-WebUI, add an OpenAI-compatible API connection.
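
Open-WebUI builds its model list from the connection's `/models` endpoint, so if `8bcustom-model` does not appear in the UI, first confirm that vLLM actually reports it. The following minimal sketch uses only the Python standard library and assumes the quick-start server is reachable at `http://127.0.0.1:8000/v1`; the helper names are illustrative and are not part of vLLM or Open-WebUI.

```python
import json
import urllib.request

# Assumptions: vLLM was started as in the quick start and is reachable here.
BASE_URL = "http://127.0.0.1:8000/v1"
EXPECTED_MODEL = "8bcustom-model"


def served_model_ids(payload: dict) -> list[str]:
    """Return the model ids listed in an OpenAI-style /v1/models response."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_models(base_url: str, timeout: float = 5.0) -> dict:
    """GET <base_url>/models and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return json.load(resp)


def check_served_model(base_url: str = BASE_URL, expected: str = EXPECTED_MODEL) -> bool:
    """Print whether `expected` is served at `base_url`; return True if it is."""
    try:
        ids = served_model_ids(fetch_models(base_url))
    except OSError as exc:  # URLError and socket timeouts both subclass OSError
        print(f"Could not reach {base_url}: {exc}")
        return False
    if expected in ids:
        print(f"OK: {expected} is served at {base_url}")
        return True
    print(f"{expected} not found; server lists: {ids}")
    return False


if __name__ == "__main__":
    check_served_model()
```

Run it on the vLLM host. When the server cannot be reached, `check_served_model()` prints the error and returns `False`, which usually means vLLM is not running or is bound to a different host or port.
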

If Open-WebUI runs in Docker:

```text
Base URL: http://host.docker.internal:8000/v1
API Key : dummy
Model   : 8bcustom-model
```

The API key can be any placeholder unless vLLM was started with `--api-key`.

If you are connecting to the original DGX deployment, which serves the model under its local name, use:

```text
Model: dgx-stable-current
```

On Docker Engine for Linux, `host.docker.internal` is not defined by default; it resolves only if the Open-WebUI container was started with `--add-host=host.docker.internal:host-gateway`. If it does not resolve, try the default Docker bridge gateway instead (this requires vLLM to listen on `0.0.0.0`, as in the quick-start command):

```text
Base URL: http://172.17.0.1:8000/v1
```

## Example prompts

Korean developer support (find and stop the process that is using port 8000 on Ubuntu):

```text
Ubuntu에서 8000 포트를 사용 중인 프로세스를 확인하고 종료하는 절차를 알려주세요.
```

vLLM troubleshooting (what to check, in order, when Open-WebUI does not show the vLLM model):

```text
vLLM 서버가 Open-WebUI에 모델을 표시하지 못할 때 확인해야 할 순서를 알려주세요.
```

LoRA workflow (which files to check after merging a LoRA adapter and before serving it with vLLM):

```text
LoRA adapter를 merge한 뒤 vLLM에서 서빙하기 전 확인해야 할 파일 목록을 알려주세요.
```

Dataset validation (a Python script that detects malformed JSON and duplicate instructions in JSONL training data):

```text
JSONL 학습 데이터에서 깨진 JSON과 중복 instruction을 검사하는 Python 스크립트를 만들어주세요.
```

## Intended use

This release is intended for:

- Local developer assistants
- On-premise coding assistant experiments
- vLLM/Open-WebUI deployment practice
- Korean-language coding support
- LoRA and dataset pipeline testing

## Out-of-scope use

This model should not be treated as a fully audited security, legal, medical, or financial advisor. Review operational outputs before applying them to production systems.

## Deployment notes

The original local deployment used:

```text
Local served model name : dgx-stable-current
Open-WebUI URL          : http://127.0.0.1:3000
vLLM URL                : http://127.0.0.1:8000/v1
Open-WebUI Base URL     : http://host.docker.internal:8000/v1
```

The public release name is:

```text
8bcustom-model
```

## Project highlights

This project demonstrates an end-to-end local LLM workflow:

1. Dataset filtering and repair
2. LoRA candidate testing
3. Regression rejection
4. Stable adapter preservation
5. Model merge for vLLM
6. Open-WebUI integration
7. systemd autostart
8. Private backup upload
9. Public Hugging Face release
10. Runtime route/template hardening

## Collaboration

This repository can be used as a portfolio reference for:

- Local LLM deployment
- vLLM serving
- Open-WebUI integration
- Korean coding assistant customization
- LoRA fine-tuning and repair workflows
- On-premise AI assistant setup

For collaboration, please get in touch through the Hugging Face profile associated with this repository.

## Disclaimer

This is an experimental local LLM deployment release. Validate outputs before using them in production environments.