WONBKIM committed 3 days ago

Commit 94a210e (verified)

1 Parent(s): c19ecea

Update README.md

Files changed (1)

README.md +26 -26

@@ -33,8 +33,8 @@ tags:
   - composite-vision-language
 ---
-# DLM-LST-9B
-**DLM-LST-9B** is a `Qwen3.5-9B` derivative refined with our in-house **Language Selection Tuning (LST)** technique.
 The goal is to suppress unwanted Chinese-character generation when the model is used to serve non-Chinese (English / Korean / Japanese etc.) users.
 The adjustment is intentionally minimal in scope; most of the network — including vision and multimodal components — is preserved bit-identical to the base model.
 Vision and multimodal capabilities are preserved unchanged.
@@ -59,7 +59,7 @@ so the effect tends to **persist through downstream full-parameter SFT / RLHF st
 The recommended serving path is **vLLM**, which is also what we used in our evaluation pipeline.
 ```bash
-vllm serve dataslab/DLM-LST-9B \
     --port 8000 \
     --dtype bfloat16 \
     --gpu-memory-utilization 0.90 \
@@ -78,7 +78,7 @@ vllm serve dataslab/DLM-LST-9B \
 import torch
 from transformers import AutoTokenizer, AutoModelForImageTextToText
-REPO = "dataslab/DLM-LST-9B"
 tokenizer = AutoTokenizer.from_pretrained(REPO)
 model = AutoModelForImageTextToText.from_pretrained(
@@ -261,7 +261,7 @@ Korean prompts → Korean answers expected; any Chinese token leaked into the an
 ### Chinese Suppression (**Thinking mode**)
-Evaluated with `enable_thinking=True`. The DLM-LST-9B column is calibrated with thinking enabled.
 <table style="table-layout: fixed; width: 100%;">
   <colgroup>
@@ -277,7 +277,7 @@ Evaluated with `enable_thinking=True`. The DLM-LST-9B column is calibrated with
       <th>Qwen3.5-9B (base)</th>
       <th>LST-L1</th>
       <th>LST-L2</th>
-      <th style="color:#EAB308;"><b>DLM-LST-9B</b><br/></th>
     </tr>
   </thead>
   <tbody>
@@ -296,14 +296,14 @@ Evaluated with `enable_thinking=True`. The DLM-LST-9B column is calibrated with
   </tbody>
 </table>
-**DLM-LST-9B keeps `chin_refusal` at 0.065.** It preserves the ability to generate Chinese when the user explicitly asks for it,
 while still cutting unintended Chinese leakage to the level of `chin_total ≈ 0.99`.
 Downstream reasoning (`acc_*`, HumanEval, GSM8K) is comparable to, or in some cases even better than, the base model.
 ### Chinese Suppression (**Non-Thinking mode**)
-Evaluated with `enable_thinking=False`. The DLM-LST-9B column here is a **separate think-OFF-calibrated checkpoint** (not this release).
 <table style="table-layout: fixed; width: 100%;">
   <colgroup>
@@ -319,7 +319,7 @@ Evaluated with `enable_thinking=False`. The DLM-LST-9B column here is a **separa
       <th>Qwen3.5-9B (base)</th>
       <th>LST-L1</th>
       <th>LST-L2</th>
-      <th style="color:#EAB308;"><b>DLM-LST-9B</b><br/></th>
     </tr>
   </thead>
   <tbody>
@@ -341,7 +341,7 @@ Evaluated with `enable_thinking=False`. The DLM-LST-9B column here is a **separa
 ### Suppression Persistence after SFT-stage (**Non-Thinking mode**)
 Each pipeline was fine-tuned via full-parameter SFT (all weights trainable, no PEFT / LoRA) on the beomi/KoAlpaca-v1.1a dataset.
-After the SFT stage, DLM-LST-9B keeps both its Chinese-leak suppression (`SRR ≈ 1.000`) and its selectivity (`|Δ_selectivity| ≈ 0.08`) almost unchanged.
 <table style="table-layout: fixed; width: 100%;">
   <colgroup>
@@ -353,7 +353,7 @@ After the SFT stage, DLM-LST-9B keeps both its Chinese-leak suppression (`SRR
     <tr>
       <th>Metric</th>
       <th>Qwen3.5-9B → SFT</th>
-      <th style="color:#EAB308;"><b>DLM-LST-9B → SFT</b></th>
     </tr>
   </thead>
   <tbody>
@@ -380,7 +380,7 @@ After the SFT stage, DLM-LST-9B keeps both its Chinese-leak suppression (`SRR
       <th>Metric</th>
       <th>Qwen3.5-9B (base)</th>
       <th>Qwen3.5-9B → SFT</th>
-      <th style="color:#EAB308;"><b>DLM-LST-9B → SFT</b></th>
     </tr>
   </thead>
   <tbody>
@@ -400,12 +400,12 @@ After the SFT stage, DLM-LST-9B keeps both its Chinese-leak suppression (`SRR
 </table>
 The base model's selectivity shifts substantially after full-parameter SFT (`chin_refusal` 0.037 → 0.128),
-while DLM-LST-9B's suppression behavior remains nearly invariant before and after full-parameter SFT.
 This shows that LST does not act as a thin surface patch — its effect is encoded in a way that **survives downstream fine-tuning**.
 ### English Suppression (**Non-Thinking mode**) — generalization check
 To confirm LST is not tied to a specific language pair, we applied the same approach to `Llama-3.1-8B-Instruct` for *English* leakage suppression.
-The DLM-LST configuration is the only variant that keeps coding (HumanEval) and math (GSM8K) usable while still meaningfully reducing leakage.
 <table style="table-layout: fixed; width: 100%;">
   <colgroup>
@@ -417,7 +417,7 @@ The DLM-LST configuration is the only variant that keeps coding (HumanEval) and
     <tr>
       <th>Metric</th>
       <th>Llama-3.1-8B-Instruct (base)</th>
-      <th style="color:#EAB308;"><b>DLM-LST (Llama-3.1-8B)</b></th>
     </tr>
   </thead>
   <tbody>
@@ -440,12 +440,12 @@ The DLM-LST configuration is the only variant that keeps coding (HumanEval) and
 ## Example Outputs
 <p align="center">
-  <img src="assets/banner.png" alt="DLM-LST-9B vs Qwen3.5-9B on a Korean KMMLU prompt: base leaks 9 Chinese tokens (伊利石, кaо린, 的), DLM-LST-9B emits 0 Chinese tokens." width="640" />
 </p>
 Asked in Korean about the most common clay mineral on the Korean
 Peninsula, Qwen3.5-9B leaks 9 Chinese / mixed-script tokens (`伊利石`,
-`кao린`, `的`) into its answer. DLM-LST-9B answers the same prompt
 entirely in Korean (0 Chinese tokens).
@@ -464,7 +464,7 @@ entirely in Korean (0 Chinese tokens).
   <thead>
     <tr>
       <th>Qwen3.5-9B (leaks <code>才开始</code>)</th>
-      <th style="color:#EAB308;">DLM-LST-9B (clean Korean)</th>
     </tr>
   </thead>
   <tbody>
@@ -505,7 +505,7 @@ entirely in Korean (0 Chinese tokens).
   <thead>
     <tr>
       <th>Qwen3.5-9B (leaks <code>积压</code>)</th>
-      <th style="color:#EAB308;">DLM-LST-9B (clean Korean)</th>
     </tr>
   </thead>
   <tbody>
@@ -545,7 +545,7 @@ entirely in Korean (0 Chinese tokens).
   <thead>
     <tr>
       <th>Qwen3.5-9B (leaks <code>享有的</code>)</th>
-      <th style="color:#EAB308;">DLM-LST-9B (clean Korean)</th>
     </tr>
   </thead>
   <tbody>
@@ -570,14 +570,14 @@ entirely in Korean (0 Chinese tokens).
 ### Cross-lingual Selectivity
-When the user **explicitly asks for Chinese**, DLM-LST-9B readily
 produces it. The previous examples showed the model *avoiding* unwanted
 Chinese inside an otherwise-Korean answer; the example below shows it
 emitting Chinese fluently when the user's instruction calls for it.
 **Prompt:** 피보나치 수열의 n번째 항을 반환하는 파이썬 함수를 작성해주세요. 설명은 중국어로 해주세요.
-**DLM-LST-9B (code in Python, explanation in Chinese):**
 ```
 다음은 파이썬을 사용하여 피보나치 수열의 n 번째 항을 계산하는 함수입니다.
@@ -604,7 +604,7 @@ def fibonacci(n):
 Qwen3.5-9B's `<think>` block leaks Chinese even more severely than its
 final answer, often slipping into Chinese once the reasoning gets stuck.
-DLM-LST-9B suppresses that leakage inside the thinking block too.
 **Prompt:** 업무 협조 요청을 받은 기관이 협조 요청 문서에 흠이 있음을 발견한 때에는 접수한 날부터 몇 일 이내에 보완을 요구하여야 하는가? (사무관리규정 개정으로 제외된 문제입니다. 정답은 3번 입니다.)
@@ -620,7 +620,7 @@ DLM-LST-9B suppresses that leakage inside the thinking block too.
     <tr>
       <th>Metric</th>
       <th>Qwen3.5-9B</th>
-      <th style="color:#EAB308;">DLM-LST-9B</th>
     </tr>
   </thead>
   <tbody>
@@ -643,9 +643,9 @@ DLM-LST-9B suppresses that leakage inside the thinking block too.
 </table>
 In the base model's trace, every cycle ends with `(Wait, I need to write in Korean). Okay, I will write in Korean.` — yet the very next token is Chinese again, and the trace slides right back into the same fragment.
-This loop fires **484 times** before the token budget runs out. DLM-LST-9B targets exactly this failure:
 Chinese tokens being chosen even right after the model says they should not be.
-On the same prompt, DLM-LST-9B's `<think>` block contains **0 Chinese characters** and terminates naturally,
 and the final user-facing answer is in clean Korean.

   - composite-vision-language
 ---
+# DSLM-LST-9B
+**DSLM-LST-9B** is a `Qwen3.5-9B` derivative refined with our in-house **Language Selection Tuning (LST)** technique.
 The goal is to suppress unwanted Chinese-character generation when the model is used to serve non-Chinese (English / Korean / Japanese etc.) users.
 The adjustment is intentionally minimal in scope; most of the network — including vision and multimodal components — is preserved bit-identical to the base model.
 Vision and multimodal capabilities are preserved unchanged.
 The recommended serving path is **vLLM**, which is also what we used in our evaluation pipeline.
 ```bash
+vllm serve dataslab/DSLM-LST-9B \
     --port 8000 \
     --dtype bfloat16 \
     --gpu-memory-utilization 0.90 \
 import torch
 from transformers import AutoTokenizer, AutoModelForImageTextToText
+REPO = "dataslab/DSLM-LST-9B"
 tokenizer = AutoTokenizer.from_pretrained(REPO)
 model = AutoModelForImageTextToText.from_pretrained(
 ### Chinese Suppression (**Thinking mode**)
+Evaluated with `enable_thinking=True`. The DSLM-LST-9B column is calibrated with thinking enabled.
 <table style="table-layout: fixed; width: 100%;">
   <colgroup>
       <th>Qwen3.5-9B (base)</th>
       <th>LST-L1</th>
       <th>LST-L2</th>
+      <th style="color:#EAB308;"><b>DSLM-LST-9B</b><br/></th>
     </tr>
   </thead>
   <tbody>
   </tbody>
 </table>
+**DSLM-LST-9B keeps `chin_refusal` at 0.065.** It preserves the ability to generate Chinese when the user explicitly asks for it,
 while still cutting unintended Chinese leakage to the level of `chin_total ≈ 0.99`.
 Downstream reasoning (`acc_*`, HumanEval, GSM8K) is comparable to, or in some cases even better than, the base model.
 ### Chinese Suppression (**Non-Thinking mode**)
+Evaluated with `enable_thinking=False`. The DSLM-LST-9B column here is a **separate think-OFF-calibrated checkpoint** (not this release).
 <table style="table-layout: fixed; width: 100%;">
   <colgroup>
       <th>Qwen3.5-9B (base)</th>
       <th>LST-L1</th>
       <th>LST-L2</th>
+      <th style="color:#EAB308;"><b>DSLM-LST-9B</b><br/></th>
     </tr>
   </thead>
   <tbody>
 ### Suppression Persistence after SFT-stage (**Non-Thinking mode**)
 Each pipeline was fine-tuned via full-parameter SFT (all weights trainable, no PEFT / LoRA) on the beomi/KoAlpaca-v1.1a dataset.
+After the SFT stage, DSLM-LST-9B keeps both its Chinese-leak suppression (`SRR ≈ 1.000`) and its selectivity (`|Δ_selectivity| ≈ 0.08`) almost unchanged.
 <table style="table-layout: fixed; width: 100%;">
   <colgroup>
     <tr>
       <th>Metric</th>
       <th>Qwen3.5-9B → SFT</th>
+      <th style="color:#EAB308;"><b>DSLM-LST-9B → SFT</b></th>
     </tr>
   </thead>
   <tbody>
       <th>Metric</th>
       <th>Qwen3.5-9B (base)</th>
       <th>Qwen3.5-9B → SFT</th>
+      <th style="color:#EAB308;"><b>DSLM-LST-9B → SFT</b></th>
     </tr>
   </thead>
   <tbody>
 </table>
 The base model's selectivity shifts substantially after full-parameter SFT (`chin_refusal` 0.037 → 0.128),
+while DSLM-LST-9B's suppression behavior remains nearly invariant before and after full-parameter SFT.
 This shows that LST does not act as a thin surface patch — its effect is encoded in a way that **survives downstream fine-tuning**.
 ### English Suppression (**Non-Thinking mode**) — generalization check
 To confirm LST is not tied to a specific language pair, we applied the same approach to `Llama-3.1-8B-Instruct` for *English* leakage suppression.
+The DSLM-LST configuration is the only variant that keeps coding (HumanEval) and math (GSM8K) usable while still meaningfully reducing leakage.
 <table style="table-layout: fixed; width: 100%;">
   <colgroup>
     <tr>
       <th>Metric</th>
       <th>Llama-3.1-8B-Instruct (base)</th>
+      <th style="color:#EAB308;"><b>DSLM-LST (Llama-3.1-8B)</b></th>
     </tr>
   </thead>
   <tbody>
 ## Example Outputs
 <p align="center">
+  <img src="assets/banner.png" alt="DSLM-LST-9B vs Qwen3.5-9B on a Korean KMMLU prompt: base leaks 9 Chinese tokens (伊利石, кaо린, 的), DSLM-LST-9B emits 0 Chinese tokens." width="640" />
 </p>
 Asked in Korean about the most common clay mineral on the Korean
 Peninsula, Qwen3.5-9B leaks 9 Chinese / mixed-script tokens (`伊利石`,
+`кao린`, `的`) into its answer. DSLM-LST-9B answers the same prompt
 entirely in Korean (0 Chinese tokens).
   <thead>
     <tr>
       <th>Qwen3.5-9B (leaks <code>才开始</code>)</th>
+      <th style="color:#EAB308;">DSLM-LST-9B (clean Korean)</th>
     </tr>
   </thead>
   <tbody>
   <thead>
     <tr>
       <th>Qwen3.5-9B (leaks <code>积压</code>)</th>
+      <th style="color:#EAB308;">DSLM-LST-9B (clean Korean)</th>
     </tr>
   </thead>
   <tbody>
   <thead>
     <tr>
       <th>Qwen3.5-9B (leaks <code>享有的</code>)</th>
+      <th style="color:#EAB308;">DSLM-LST-9B (clean Korean)</th>
     </tr>
   </thead>
   <tbody>
 ### Cross-lingual Selectivity
+When the user **explicitly asks for Chinese**, DSLM-LST-9B readily
 produces it. The previous examples showed the model *avoiding* unwanted
 Chinese inside an otherwise-Korean answer; the example below shows it
 emitting Chinese fluently when the user's instruction calls for it.
 **Prompt:** 피보나치 수열의 n번째 항을 반환하는 파이썬 함수를 작성해주세요. 설명은 중국어로 해주세요.
+**DSLM-LST-9B (code in Python, explanation in Chinese):**
 ```
 다음은 파이썬을 사용하여 피보나치 수열의 n 번째 항을 계산하는 함수입니다.
 Qwen3.5-9B's `<think>` block leaks Chinese even more severely than its
 final answer, often slipping into Chinese once the reasoning gets stuck.
+DSLM-LST-9B suppresses that leakage inside the thinking block too.
 **Prompt:** 업무 협조 요청을 받은 기관이 협조 요청 문서에 흠이 있음을 발견한 때에는 접수한 날부터 몇 일 이내에 보완을 요구하여야 하는가? (사무관리규정 개정으로 제외된 문제입니다. 정답은 3번 입니다.)
     <tr>
       <th>Metric</th>
       <th>Qwen3.5-9B</th>
+      <th style="color:#EAB308;">DSLM-LST-9B</th>
     </tr>
   </thead>
   <tbody>
 </table>
 In the base model's trace, every cycle ends with `(Wait, I need to write in Korean). Okay, I will write in Korean.` — yet the very next token is Chinese again, and the trace slides right back into the same fragment.
+This loop fires **484 times** before the token budget runs out. DSLM-LST-9B targets exactly this failure:
 Chinese tokens being chosen even right after the model says they should not be.
+On the same prompt, DSLM-LST-9B's `<think>` block contains **0 Chinese characters** and terminates naturally,
 and the final user-facing answer is in clean Korean.
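
Note on the "0 Chinese characters" claims in the README text above: the README does not say how leaked Chinese was counted, so the following is only a minimal sketch of one way a reader could check an output string. It assumes that "Chinese character" can be approximated by the CJK Unified Ideographs block plus Extension A; the names `HAN_RE`, `count_han_chars`, and `leaks_chinese` are hypothetical and not part of the repository. It counts characters, not tokens, and would also flag Hanja appearing in legitimate Korean text.

```python
import re

# Assumption: a "Chinese character" is any code point in CJK Unified
# Ideographs (U+4E00-U+9FFF) or Extension A (U+3400-U+4DBF). This is an
# approximation, not the metric used in the README's evaluation.
HAN_RE = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")


def count_han_chars(text: str) -> int:
    """Return the number of Han ideographs found in `text`."""
    return len(HAN_RE.findall(text))


def leaks_chinese(text: str) -> bool:
    """Return True if `text` contains at least one Han ideograph."""
    return count_han_chars(text) > 0


if __name__ == "__main__":
    # Fragments quoted in the README's example tables.
    for fragment in ["伊利石", "才开始", "积压", "享有的", "피보나치 수열"]:
        print(fragment, count_han_chars(fragment), leaks_chinese(fragment))
```

For a prompt that explicitly requests Chinese, as in the Cross-lingual Selectivity example, a nonzero count is the expected behavior, so a check like this is only meaningful for prompts where no Chinese was requested.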