---
language:
- vi
- en
license: apache-2.0
base_model: Qwen/Qwen3-4B-Base
tags:
- qwen3
- causal-lm
- vietnamese
- continuous-pretraining
- unsloth
datasets:
- data-std/vi-text-corpus
pipeline_tag: text-generation
---

Qwen3-4B Vietnamese Continued Pre-trained Model

This model is a continued pre-training (CPT) version of Qwen/Qwen3-4B-Base, trained on a Vietnamese text corpus and optimized with Unsloth for efficient training.

Model Details

Model Description

  • Base Model: Qwen/Qwen3-4B-Base
  • Model Type: Causal Language Model (Decoder-only Transformer)
  • Language(s): Vietnamese (primary), English (inherited from base)
  • Training Method: Continued Pre-Training (CPT) with Unsloth optimization
  • Parameters: ~4 Billion
  • Context Length: 4096 tokens
  • License: Apache 2.0

Training Data

The model was trained on:

  • Dataset: data-std/vi-text-corpus
  • Subset: filter-by-ppl-and-length (filtered for quality by perplexity and length)
  • Language: Vietnamese text corpus
  • Processing: Automatic EOS token appending
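The EOS-appending step can be sketched in plain Python (a hypothetical illustration, not the actual preprocessing code; Qwen3 base models use `<|endoftext|>` as the end-of-text token):

```python
def append_eos(examples, eos_token):
    # Append the tokenizer's EOS token to each document so that
    # concatenated training sequences keep clear document boundaries.
    return {"text": [text + eos_token for text in examples["text"]]}

batch = {"text": ["Việt Nam là một quốc gia ở Đông Nam Á."]}
processed = append_eos(batch, "<|endoftext|>")
```

In practice this would be applied with `Dataset.map(..., batched=True)` before tokenization.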

Training Details

Training Configuration

| Parameter | Value |
|---|---|
| Base Model | unsloth/Qwen3-4B-Base |
| Max Sequence Length | 4096 tokens |
| Training Epochs | 1 |
| Batch Size (per device) | 2 |
| Gradient Accumulation Steps | 8 |
| Effective Batch Size | 16 |
| Learning Rate | 2e-5 |
| Optimizer | AdamW (torch) |
| Weight Decay | 0.01 |
| LR Scheduler | Cosine |
| Warmup Steps | 10 |
| Warmup Ratio | 0.03 |
| Precision | BF16 (if supported) / FP16 |
| Seed | 3407 |
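These values map directly onto Hugging Face `TrainingArguments` fields; a plain-Python sketch of the configuration (key names are assumptions mirroring the `TrainingArguments` API, values taken from the table above):

```python
# Hypothetical reconstruction of the run configuration as a plain dict;
# keys mirror Hugging Face TrainingArguments parameter names.
training_config = {
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 8,
    "num_train_epochs": 1,
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "lr_scheduler_type": "cosine",
    "warmup_steps": 10,
    "bf16": True,  # falls back to fp16 on GPUs without BF16 support
    "seed": 3407,
    "save_steps": 100,
    "save_total_limit": 1,
}

# Effective batch size = per-device batch size x gradient accumulation steps
effective_batch_size = (
    training_config["per_device_train_batch_size"]
    * training_config["gradient_accumulation_steps"]
)  # 16, matching the table
```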

Training Framework

  • Framework: Unsloth + Hugging Face Transformers
  • Optimization: Full fine-tuning (all parameters trainable)
  • Checkpointing: Every 100 steps, keeping 1 checkpoint
  • Hardware: CUDA-enabled GPU

Training Methodology

This model uses Continued Pre-Training (CPT) to adapt Qwen3-4B-Base to the Vietnamese language:

  • Trained on next-token prediction objective
  • Uses DataCollatorForLanguageModeling for causal LM
  • Maintains the original model architecture
  • Enhanced Vietnamese language understanding while preserving multilingual capabilities
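For causal LM, the collator essentially copies the input IDs into the labels, masking padded positions with -100 so the loss ignores them (the model shifts labels internally for next-token prediction). A simplified illustrative reimplementation of what `DataCollatorForLanguageModeling(mlm=False)` produces, not the library code:

```python
def causal_lm_collate(batch_input_ids, pad_token_id):
    # Right-pad every sequence to the batch maximum; labels copy input_ids,
    # with padded positions set to -100 so cross-entropy skips them.
    max_len = max(len(ids) for ids in batch_input_ids)
    input_ids, labels = [], []
    for ids in batch_input_ids:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_token_id] * n_pad)
        labels.append(ids + [-100] * n_pad)
    return {"input_ids": input_ids, "labels": labels}

batch = causal_lm_collate([[101, 7, 8, 102], [101, 9, 102]], pad_token_id=0)
# batch["labels"][1] -> [101, 9, 102, -100]
```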

Usage

Requirements

pip install transformers torch accelerate

Basic Text Generation

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "data-std/qwen3-4b-wiki-filter-28k"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,  # Use torch.float16 if BF16 not supported
)

# Generate text
prompt = "Việt Nam là một quốc gia"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.1,
)

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

Chat/Instruction Format

For instruction-following tasks, you may need additional fine-tuning. Here's a basic template:

def format_instruction(instruction, context=""):
    if context:
        prompt = f"### Instruction:\n{instruction}\n\n### Context:\n{context}\n\n### Response:\n"
    else:
        prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
    return prompt

instruction = "Giải thích về lịch sử Việt Nam"
prompt = format_instruction(instruction)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Using with Unsloth (for further fine-tuning)

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="data-std/qwen3-4b-wiki-filter-28k",
    max_seq_length=4096,
    dtype=None,  # Auto-detect
    load_in_4bit=True,  # Use 4-bit quantization for memory efficiency
)

# Continue training or perform inference

Quantization for Lower Memory Usage

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# 4-bit quantization (requires bitsandbytes: pip install bitsandbytes)
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

model = AutoModelForCausalLM.from_pretrained(
    "data-std/qwen3-4b-wiki-filter-28k",
    quantization_config=quantization_config,
    device_map="auto",
)

Performance

Hardware Requirements

| Precision | VRAM Required | Inference Speed |
|---|---|---|
| FP32 | ~16 GB | Baseline |
| FP16/BF16 | ~8 GB | 2x faster |
| 4-bit (NF4) | ~3-4 GB | Slightly slower, very memory efficient |
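The VRAM figures above can be sanity-checked with a back-of-the-envelope, weights-only estimate (activations, KV cache, and framework overhead push real usage higher):

```python
def estimate_weight_memory_gib(n_params, bytes_per_param):
    # Weights-only lower bound; excludes activations, KV cache, CUDA overhead.
    return n_params * bytes_per_param / 1024**3

n_params = 4e9                                        # ~4B parameters
fp32 = estimate_weight_memory_gib(n_params, 4)        # ~14.9 GiB
fp16 = estimate_weight_memory_gib(n_params, 2)        # ~7.5 GiB
four_bit = estimate_weight_memory_gib(n_params, 0.5)  # ~1.9 GiB + quantization overhead
```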

Recommended Use Cases

  • ✅ Vietnamese text generation
  • ✅ Vietnamese language understanding
  • ✅ Content creation in Vietnamese
  • ✅ Further fine-tuning for downstream tasks
  • ✅ Research on Vietnamese NLP
  • ⚠️ Instruction-following (may need additional fine-tuning)
  • ⚠️ Multi-turn conversation (may need additional fine-tuning)

Limitations

  • Training Data: The model's knowledge is limited to the Vietnamese corpus used during continued pre-training
  • Not Instruction-Tuned: This is a base model continued pre-trained on Vietnamese text. For instruction-following capabilities, additional supervised fine-tuning (SFT) is recommended
  • Potential Biases: May reflect biases present in the training data
  • Language: While enhanced for Vietnamese, performance may vary across different Vietnamese dialects and domains
  • Generation Quality: May produce repetitive or inconsistent outputs without proper generation parameters

Ethical Considerations

  • This model should not be used for generating harmful, misleading, or discriminatory content
  • Users should verify generated content for factual accuracy
  • The model may generate biased content reflecting biases in training data
  • Not suitable for high-stakes decision-making without human oversight

Acknowledgements

  • Base Model: Qwen Team for Qwen3-4B-Base
  • Training Framework: Unsloth AI for efficient training
  • Dataset: Vietnamese text corpus from data-std/vi-text-corpus
  • Infrastructure: Trained using CUDA-enabled GPUs

Contact

For questions, issues, or collaborations, please open an issue on the model repository or contact the maintainers.

Model Card Authors

Data Standard Team

Model Card Contact

[Your contact information or repository issues page]


License: Apache 2.0

Intended Use: Research and development of Vietnamese NLP applications

Out-of-Scope Use: Generating harmful content, impersonation, high-stakes decisions without human oversight
