---
license: other
pipeline_tag: text-generation
tags:
- text-generation
- coding
- korean
- vllm
- open-webui
- local-llm
- lora
- qwen
- 8b
language:
- ko
- en
---

# 8bcustom-model

**8bcustom-model** is an 8B-class local coding assistant release, covering both the model and its runtime setup, built for Korean developers who need practical help with Linux, Docker, vLLM, Open-WebUI, CUDA, JSONL datasets, and LoRA workflows.

This repository is part of a DGX AI Factory-style local LLM deployment project: data preparation, LoRA repair, model merge, vLLM serving, Open-WebUI integration, systemd autostart, benchmarking, and Hugging Face release packaging.

## What this model is for

This model is designed as a practical development assistant for:

- Linux command troubleshooting
- Docker and service deployment
- vLLM OpenAI-compatible serving
- Open-WebUI connection setup
- CUDA/PyTorch environment checks
- JSONL dataset validation
- LoRA training and repair workflows
- Korean step-by-step developer support

The target behavior is direct, procedural, and operational: diagnose the problem, provide exact commands, and explain the result clearly in polite (honorific) Korean.

## Validated local runtime

The model was validated in a local production-style runtime:

| Component | Status |
|---|---|
| vLLM OpenAI-compatible API | Working |
| Open-WebUI integration | Working |
| systemd autostart | Working |
| Local model name | `dgx-stable-current` |
| Public release name | `8bcustom-model` |
| Hugging Face public repo | `koreallmdev/8bcustom-model` |

## Benchmark summary

The final deployment benchmark was run with the router/template hardening layer enabled (see "Runtime policy"), so the scores describe the hardened runtime rather than raw model generation alone.

| Metric | Result |
|---|---:|
| Average score | 97.75 |
| Pass ≥ 70 | 20 / 20 |
| Strong ≥ 85 | 20 / 20 |
| Critical failures | 0 |
| Decision | DEPLOY_CANDIDATE |

The benchmark focused on practical developer operations such as Linux, Docker, CUDA checks, vLLM serving, JSONL validation, FastAPI, systemd troubleshooting, LoRA policy, and Korean response quality.
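
As a rough illustration of how the figures in the table relate to per-item scores, the sketch below computes the average, the pass count (≥ 70), and the strong count (≥ 85) from a list of scores. The two thresholds come from the table; the function name `summarize_scores` and the sample scores are made up for illustration and are not the actual benchmark data or the project's scoring code.

```python
from statistics import mean

PASS_THRESHOLD = 70    # "Pass ≥ 70" in the table above
STRONG_THRESHOLD = 85  # "Strong ≥ 85" in the table above


def summarize_scores(scores):
    """Return average, pass count, and strong count for per-item scores."""
    if not scores:
        raise ValueError("scores must not be empty")
    return {
        "average": round(mean(scores), 2),
        "passed": sum(s >= PASS_THRESHOLD for s in scores),
        "strong": sum(s >= STRONG_THRESHOLD for s in scores),
        "total": len(scores),
    }


# Hypothetical scores, NOT the real benchmark results.
sample = [100, 95, 90, 60]
summary = summarize_scores(sample)
print(summary)
```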

## Runtime policy

For production usage, the local deployment uses a hybrid approach:

- General coding questions: model generation
- Known operational routes (Linux, vLLM, CUDA, systemd): guarded templates
- LoRA, stable-model, and rejected-model policy questions: fixed policy templates
- CJK leakage and style regressions: post-check and route hardening

This approach keeps the model useful for open-ended coding while making high-risk operational answers more deterministic.
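
The hybrid policy can be pictured with a small sketch. This is not the deployment's actual router, which is not included in this README; the keyword lists, route names, and function names are hypothetical. It also assumes that "CJK leakage" means Han ideographs or Japanese kana appearing in a reply that should contain only Hangul and ASCII.

```python
import re

# Hypothetical keyword lists; the real deployment's routes are not shown here.
POLICY_KEYWORDS = ("lora", "adapter", "rejected model")
OPS_KEYWORDS = ("systemctl", "systemd", "vllm", "cuda", "nvidia-smi")

# Assumption: a "leak" is any hiragana, katakana, or Han character
# in a reply that is expected to be Korean (Hangul) plus ASCII.
LEAK_PATTERN = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")


def choose_route(question):
    """Pick a route name for a user question by simple keyword match."""
    text = question.lower()
    if any(k in text for k in POLICY_KEYWORDS):
        return "policy_template"
    if any(k in text for k in OPS_KEYWORDS):
        return "guarded_template"
    return "model_generation"


def has_cjk_leak(reply):
    """Return True if the reply contains Han or kana characters."""
    return LEAK_PATTERN.search(reply) is not None
```

In a setup like this, a reply that fails `has_cjk_leak` could be regenerated or replaced by a template; which of the two the original runtime does is not stated here.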

## Quick start with vLLM

After downloading the model files, serve them with vLLM from the directory that contains them (`--model ./` points at the current directory):

```bash
python -m vllm.entrypoints.openai.api_server \
  --model ./ \
  --served-model-name 8bcustom-model \
  --dtype float16 \
  --host 0.0.0.0 \
  --port 8000 \
  --max-model-len 1536 \
  --gpu-memory-utilization 0.50 \
  --max-num-seqs 8
```

Check the model endpoint:

```bash
curl http://127.0.0.1:8000/v1/models
```

Send a test request:

```bash
curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "8bcustom-model",
    "messages": [
      {
        "role": "user",
        "content": "Docker 컨테이너가 실행 중인지 확인하는 명령어를 알려주세요."
      }
    ],
    "temperature": 0.2,
    "max_tokens": 300
  }'
```
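
The same request can be sent from Python using only the standard library. The sketch below assumes the server from the quick start is listening on `127.0.0.1:8000` and that the response follows the OpenAI chat-completions shape (`choices[0].message.content`); the helper names `build_chat_request` and `ask` are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000/v1"  # matches --port 8000 in the quick start
MODEL_NAME = "8bcustom-model"          # must match --served-model-name


def build_chat_request(prompt, temperature=0.2, max_tokens=300):
    """Build a POST request for the /chat/completions endpoint."""
    payload = {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, timeout=60):
    """Send the prompt and return the text of the first choice."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Requires the vLLM server to be running:
# print(ask("Docker 컨테이너가 실행 중인지 확인하는 명령어를 알려주세요."))
```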

## Open-WebUI connection

For Open-WebUI, add an OpenAI-compatible API connection.

If Open-WebUI runs in Docker:

```text
Base URL: http://host.docker.internal:8000/v1
API Key : dummy
Model   : 8bcustom-model
```

If you serve the model under the local deployment name from the original DGX runtime (that is, with `--served-model-name dgx-stable-current`), use that name instead:

```text
Model: dgx-stable-current
```

If `host.docker.internal` does not resolve in your Docker environment, try the default Docker bridge gateway address on Linux:

```text
Base URL: http://172.17.0.1:8000/v1
```
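
To find out which base URL actually reaches vLLM, a small probe can try each candidate in order. This is a minimal sketch using the two addresses from this section; the helper names are hypothetical, and it only gives a meaningful answer when run from inside the Open-WebUI container, because that is where the address has to resolve.

```python
import urllib.error
import urllib.request

# Candidates taken from the examples in this section.
CANDIDATES = [
    "http://host.docker.internal:8000/v1",
    "http://172.17.0.1:8000/v1",
]


def http_probe(base_url, timeout=3):
    """Return True if GET <base_url>/models answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def first_reachable(candidates, probe=http_probe):
    """Return the first base URL for which probe() is True, else None."""
    for url in candidates:
        if probe(url):
            return url
    return None


# Requires network access to the vLLM server:
# print(first_reachable(CANDIDATES))
```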

## Example prompts

Korean developer support:

```text
Ubuntu에서 8000 포트를 사용 중인 프로세스를 확인하고 종료하는 절차를 알려주세요.
```

vLLM troubleshooting:

```text
vLLM 서버가 Open-WebUI에 모델을 표시하지 못할 때 확인해야 할 순서를 알려주세요.
```

LoRA workflow:

```text
LoRA adapter를 merge한 뒤 vLLM에서 서빙하기 전 확인해야 할 파일 목록을 알려주세요.
```

Dataset validation:

```text
JSONL 학습 데이터에서 깨진 JSON과 중복 instruction을 검사하는 Python 스크립트를 만들어주세요.
```
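
For reference, a script of the kind the dataset-validation prompt asks for might look like the sketch below. It is not output from the model. It assumes each record is a JSON object with an `instruction` field, as the prompt suggests; the function name `check_jsonl` and the sample records are made up.

```python
import json


def check_jsonl(lines, key="instruction"):
    """Return (broken, duplicates) as lists of 1-based line numbers.

    broken: lines that are not valid JSON objects.
    duplicates: lines whose `key` value was already seen earlier.
    """
    broken, duplicates = [], []
    seen = set()
    for lineno, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # skip blank lines
        try:
            record = json.loads(text)
        except json.JSONDecodeError:
            broken.append(lineno)
            continue
        if not isinstance(record, dict):
            broken.append(lineno)
            continue
        value = record.get(key)
        if isinstance(value, str):
            # Collapse whitespace so trivially different copies still match.
            normalized = " ".join(value.split())
            if normalized in seen:
                duplicates.append(lineno)
            seen.add(normalized)
    return broken, duplicates


# Made-up sample data for illustration.
sample = [
    '{"instruction": "포트 확인", "output": "ss -ltnp"}',
    '{"instruction": "포트 확인", "output": "lsof -i :8000"}',
    '{"instruction": "깨진 줄", "output": ',
    '["not", "an", "object"]',
]
print(check_jsonl(sample))
```

Because `check_jsonl` accepts any iterable of lines, an open file handle (opened with `encoding="utf-8"`) can be passed in directly.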

## Intended use

This release is intended for:

- Local developer assistants
- On-premise coding assistant experiments
- vLLM/Open-WebUI deployment practice
- Korean-language coding support
- LoRA and dataset pipeline testing

## Out-of-scope use

This model is not an audited source of security, legal, medical, or financial advice. Review operational outputs before applying them to production systems.

## Deployment notes

The original local deployment used:

```text
Local served model name: dgx-stable-current
Open-WebUI URL        : http://127.0.0.1:3000
vLLM URL              : http://127.0.0.1:8000/v1
Open-WebUI Base URL   : http://host.docker.internal:8000/v1
```

The public release name is:

```text
8bcustom-model
```

## Project highlights

This project demonstrates an end-to-end local LLM workflow:

1. Dataset filtering and repair
2. LoRA candidate testing
3. Regression rejection
4. Stable adapter preservation
5. Model merge for vLLM
6. Open-WebUI integration
7. systemd autostart
8. Private backup upload
9. Public Hugging Face release
10. Runtime route/template hardening

## Collaboration

This repository can be used as a portfolio reference for:

- Local LLM deployment
- vLLM serving
- Open-WebUI integration
- Korean coding assistant customization
- LoRA fine-tuning and repair workflows
- On-premise AI assistant setup

For collaboration, please get in touch through the Hugging Face profile associated with this repository.

## Disclaimer

This is an experimental local LLM deployment release. Validate outputs before use in production environments.