	---
	license: other
	pipeline_tag: text-generation
	tags:
	- text-generation
	- coding
	- korean
	- vllm
	- open-webui
	- local-llm
	- lora
	- qwen
	- 8b
	language:
	- ko
	- en
	---

	# 8bcustom-model

	8bcustom-model is an 8B-class local coding assistant model/runtime release built for Korean developers who need practical help with Linux, Docker, vLLM, Open-WebUI, CUDA, JSONL datasets, and LoRA workflows.

	This repository is part of a DGX AI Factory-style local LLM deployment project: data preparation, LoRA repair, model merge, vLLM serving, Open-WebUI integration, systemd autostart, benchmarking, and Hugging Face release packaging.

	## What this model is for

	This model is designed as a practical development assistant for:

	- Linux command troubleshooting
	- Docker and service deployment
	- vLLM OpenAI-compatible serving
	- Open-WebUI connection setup
	- CUDA/PyTorch environment checks
	- JSONL dataset validation
	- LoRA training and repair workflows
	- Korean step-by-step developer support

	The target behavior is direct, procedural, and operational: diagnose the problem, provide exact commands, and explain the result clearly in polite, honorific Korean.

	## Validated local runtime

	The model was validated in a local production-style runtime:

	| Component | Status |
	|---|---|
	| vLLM OpenAI-compatible API | Working |
	| Open-WebUI integration | Working |
	| systemd autostart | Working |
	| Local model name | `dgx-stable-current` |
	| Public release name | `8bcustom-model` |
	| Hugging Face public repo | `koreallmdev/8bcustom-model` |

	## Benchmark summary

	The final deployment benchmark was run with the router/template runtime hardening layer enabled, so the scores below reflect the full hybrid runtime rather than raw model generation alone.

	| Metric | Result |
	|---|---:|
	| Average score | 97.75 |
	| Pass ≥ 70 | 20 / 20 |
	| Strong ≥ 85 | 20 / 20 |
	| Critical failures | 0 |
	| Decision | DEPLOY_CANDIDATE |

	The benchmark focused on practical developer operations such as Linux, Docker, CUDA checks, vLLM serving, JSONL validation, FastAPI, systemd troubleshooting, LoRA policy, and Korean response quality.
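
To make the table concrete, here is a minimal sketch of how per-prompt scores could be aggregated into the average, pass, and strong counts. The thresholds come from the table; the `summarize` helper and the example scores are assumptions for illustration and are not the project's benchmark harness or results.

```python
from statistics import mean

PASS_THRESHOLD = 70    # "Pass ≥ 70" in the table
STRONG_THRESHOLD = 85  # "Strong ≥ 85" in the table


def summarize(scores):
    """Aggregate per-prompt scores into average, pass count, and strong count."""
    return {
        "average": round(mean(scores), 2),
        "pass": sum(s >= PASS_THRESHOLD for s in scores),
        "strong": sum(s >= STRONG_THRESHOLD for s in scores),
        "total": len(scores),
    }


# Made-up scores for illustration only; NOT the project's actual results.
example_scores = [100, 95, 90, 85, 60]
summary = summarize(example_scores)
print(summary)
```

How critical failures are detected and how the final decision is derived are not described in this README, so the sketch leaves both out.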

	## Runtime policy

	For production usage, the local deployment uses a hybrid approach:

	- General coding questions: model generation
	- Linux/vLLM/CUDA/systemd known operational routes: guarded templates
	- LoRA/stable/rejected model policy: fixed policy templates
	- CJK leakage and style regressions: post-check and route hardening

	This approach keeps the model useful for open-ended coding while making high-risk operational answers more deterministic.
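
The hybrid policy can be pictured with a small sketch. Everything here is an assumption made for illustration: the keyword table, the template texts, the `answer` function, and the reading of "CJK leakage" as Chinese or Japanese characters appearing in a Korean answer. The actual router in the local deployment is not published in this README.

```python
import re

# Hypothetical routing table: keyword -> pre-reviewed template answer.
GUARDED_TEMPLATES = {
    "nvidia-smi": "GPU 상태는 `nvidia-smi` 명령으로 확인하실 수 있습니다.",
    "systemctl": "서비스 상태는 `systemctl status <서비스명>` 명령으로 확인하실 수 있습니다.",
}

# Han ideographs plus Japanese hiragana/katakana. Hangul is not matched.
LEAK_PATTERN = re.compile(r"[\u4e00-\u9fff\u3040-\u30ff]")


def has_cjk_leak(text):
    """Return True if the text contains Han or kana characters."""
    return LEAK_PATTERN.search(text) is not None


def answer(prompt, generate):
    """Return (route, text). `generate` is any callable mapping prompt -> str."""
    lowered = prompt.lower()
    for keyword, template in GUARDED_TEMPLATES.items():
        if keyword in lowered:
            return "template", template  # deterministic, guarded route
    text = generate(prompt)              # open-ended model generation
    if has_cjk_leak(text):
        return "rejected", ""            # post-check failed; caller may retry
    return "model", text
```

Passing the generator in as a callable keeps the routing and post-check logic testable without a running model server.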

	## Quick start with vLLM

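	After downloading the model files, serve the model with vLLM. Run the command from the directory that contains the model files, because `--model ./` points at the current directory: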

	```bash
	python -m vllm.entrypoints.openai.api_server \
	  --model ./ \
	  --served-model-name 8bcustom-model \
	  --dtype float16 \
	  --host 0.0.0.0 \
	  --port 8000 \
	  --max-model-len 1536 \
	  --gpu-memory-utilization 0.50 \
	  --max-num-seqs 8
	```

	Check the model endpoint:

	```bash
	curl http://127.0.0.1:8000/v1/models
	```
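
The same check can be scripted. The sketch below assumes the server returns the usual OpenAI-style list body (`{"data": [{"id": ...}]}`); the helper names `served_model_ids` and `fetch_model_ids` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def served_model_ids(payload):
    """Extract model IDs from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url="http://127.0.0.1:8000/v1", timeout=5):
    """Query a running server; raises URLError if it is unreachable."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return served_model_ids(json.load(resp))


# Offline example with a trimmed-down, hand-written response body.
sample = {"object": "list", "data": [{"id": "8bcustom-model", "object": "model"}]}
print(served_model_ids(sample))
```

If the name passed to `--served-model-name` is missing from the returned list, clients such as Open-WebUI will not be able to select it.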

	Send a test request:

	```bash
	curl http://127.0.0.1:8000/v1/chat/completions \
	  -H "Content-Type: application/json" \
	  -d '{
	    "model": "8bcustom-model",
	    "messages": [
	      {
	        "role": "user",
	        "content": "Docker 컨테이너가 실행 중인지 확인하는 명령어를 알려주세요."
	      }
	    ],
	    "temperature": 0.2,
	    "max_tokens": 300
	  }'
	```
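
For scripted use, the same request can be sent from Python with only the standard library. This is a minimal sketch that mirrors the curl payload; `build_chat_request` and `ask` are hypothetical helper names, and the response parsing assumes the standard OpenAI-style `choices[0].message.content` layout.

```python
import json
import urllib.request


def build_chat_request(prompt, model="8bcustom-model",
                       base_url="http://127.0.0.1:8000/v1"):
    """Build a POST request equivalent to the curl example."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
        "max_tokens": 300,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request to a running server and return the reply text."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]


# Usage, with the vLLM server running:
#   print(ask("Docker 컨테이너가 실행 중인지 확인하는 명령어를 알려주세요."))
```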

	## Open-WebUI connection

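	In Open-WebUI, add an OpenAI-compatible API connection that points at the vLLM server. The quick-start command above does not set `--api-key`, so any placeholder key works.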

	If Open-WebUI runs in Docker:

	```text
	Base URL : http://host.docker.internal:8000/v1
	API Key  : dummy
	Model    : 8bcustom-model
	```

	If your server uses the served model name from the original DGX runtime, enter that name instead:

	```text
	Model: dgx-stable-current
	```

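	If `host.docker.internal` does not resolve in your Docker environment (common on Linux unless the container is started with `--add-host=host.docker.internal:host-gateway`), try the default Docker bridge gateway address: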

	```text
	Base URL: http://172.17.0.1:8000/v1
	```
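
To find out which base URL actually works, a small probe can be run from inside the Open-WebUI container. This is a sketch under assumptions: the candidate list simply repeats the two URLs above, and `http_probe` and `first_reachable` are illustrative helpers, not part of Open-WebUI or vLLM.

```python
import urllib.error
import urllib.request

CANDIDATE_BASE_URLS = [
    "http://host.docker.internal:8000/v1",
    "http://172.17.0.1:8000/v1",
]


def http_probe(base_url, timeout=3):
    """Return True if GET {base_url}/models answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def first_reachable(candidates, probe=http_probe):
    """Return the first base URL the probe accepts, or None if none respond."""
    for url in candidates:
        if probe(url):
            return url
    return None


# Usage, from inside the container:
#   print(first_reachable(CANDIDATE_BASE_URLS))
```

The probe is injectable so the selection logic can be exercised without network access.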

	## Example prompts

	Korean developer support:

	```text
	Ubuntu에서 8000 포트를 사용 중인 프로세스를 확인하고 종료하는 절차를 알려주세요.
	```

	vLLM troubleshooting:

	```text
	vLLM 서버가 Open-WebUI에 모델을 표시하지 못할 때 확인해야 할 순서를 알려주세요.
	```

	LoRA workflow:

	```text
	LoRA adapter를 merge한 뒤 vLLM에서 서빙하기 전 확인해야 할 파일 목록을 알려주세요.
	```

	Dataset validation:

	```text
	JSONL 학습 데이터에서 깨진 JSON과 중복 instruction을 검사하는 Python 스크립트를 만들어주세요.
	```
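
As a point of comparison when reviewing the model's answer to the dataset-validation prompt, here is a minimal sketch of such a checker. It assumes each record is a JSON object with an `instruction` field, as the prompt implies; the `check_jsonl` function is illustrative and not part of this release.

```python
import json


def check_jsonl(lines, key="instruction"):
    """Return (broken_line_numbers, duplicate_line_numbers), both 1-indexed."""
    broken, duplicates, seen = [], [], set()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            broken.append(number)
            continue
        value = record.get(key) if isinstance(record, dict) else None
        if value is None:
            continue  # no instruction field to compare
        normalized = " ".join(str(value).split())  # collapse whitespace
        if normalized in seen:
            duplicates.append(number)
        seen.add(normalized)
    return broken, duplicates


sample = [
    '{"instruction": "List running containers", "output": "docker ps"}',
    '{"instruction": "List running containers", "output": "docker ps -a"}',
    '{"instruction": "Check GPU", "output": "nvidia-smi"',  # missing closing brace
]
print(check_jsonl(sample))  # → ([3], [2])
```

To check a file, pass the open file object: `with open(path, encoding="utf-8") as f: check_jsonl(f)`.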

	## Intended use

	This release is intended for:

	- Local developer assistants
	- On-premise coding assistant experiments
	- vLLM/Open-WebUI deployment practice
	- Korean-language coding support
	- LoRA and dataset pipeline testing

	## Out-of-scope use

	This model is not a fully audited security, legal, medical, or financial advisor and should not be treated as one. Review operational outputs before applying them to production systems.

	## Deployment notes

	The original local deployment used:

	```text
	Local served model name : dgx-stable-current
	Open-WebUI URL          : http://127.0.0.1:3000
	vLLM URL                : http://127.0.0.1:8000/v1
	Open-WebUI Base URL     : http://host.docker.internal:8000/v1
	```

	The public release name is:

	```text
	8bcustom-model
	```

	## Project highlights

	This project demonstrates an end-to-end local LLM workflow:

	1. Dataset filtering and repair
	2. LoRA candidate testing
	3. Regression rejection
	4. Stable adapter preservation
	5. Model merge for vLLM
	6. Open-WebUI integration
	7. systemd autostart
	8. Private backup upload
	9. Public Hugging Face release
	10. Runtime route/template hardening

	## Collaboration

	This repository can be used as a portfolio reference for:

	- Local LLM deployment
	- vLLM serving
	- Open-WebUI integration
	- Korean coding assistant customization
	- LoRA fine-tuning and repair workflows
	- On-premise AI assistant setup

	For collaboration, please reach out through the Hugging Face profile associated with this repository.

	## Disclaimer

	This is an experimental local LLM deployment release. Validate outputs before use in production environments.