Instructions for using wannaphong/numfalm-3b with libraries, inference servers, and local apps.

Libraries

How to use wannaphong/numfalm-3b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="wannaphong/numfalm-3b")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("wannaphong/numfalm-3b")
model = AutoModelForCausalLM.from_pretrained("wannaphong/numfalm-3b", device_map="auto")
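
Once the model and tokenizer are loaded, text can be generated with `model.generate`. The following is a minimal sketch, not taken from the model card: the helper names `build_generation_kwargs` and `generate` and the default values are illustrative assumptions, and the model is treated as a plain text-completion model (no chat template), which seems reasonable for a model pretrained from scratch.

```python
MODEL_ID = "wannaphong/numfalm-3b"


def build_generation_kwargs(max_new_tokens=64, temperature=0.5):
    """Return keyword arguments for model.generate (illustrative defaults)."""
    if temperature <= 0:
        # A non-positive temperature is treated here as a request for greedy decoding.
        return {"max_new_tokens": max_new_tokens, "do_sample": False}
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,  # sampling must be enabled for temperature to apply
        "temperature": temperature,
    }


def generate(prompt, **options):
    """Load the model and return the prompt followed by its continuation."""
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # device_map="auto" requires the accelerate package.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, **build_generation_kwargs(**options))
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Once upon a time,")` downloads the weights on first use, so it is best run on a machine with enough memory for a model of this size.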

Local Apps

vLLM

How to use wannaphong/numfalm-3b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "wannaphong/numfalm-3b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "wannaphong/numfalm-3b",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
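
The same request can be sent from Python. This is a minimal sketch using only the standard library; the function names `build_payload` and `complete` are illustrative, and the response is assumed to follow the OpenAI completions format, with the generated text in `choices[0].text`.

```python
import json
import urllib.request

MODEL_ID = "wannaphong/numfalm-3b"


def build_payload(prompt, max_tokens=512, temperature=0.5):
    """Build the JSON body used in the curl example above."""
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt, base_url="http://localhost:8000", **options):
    """POST a completion request and return the generated text."""
    body = json.dumps(build_payload(prompt, **options)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    # Assumes an OpenAI-style response body.
    return result["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` returns the continuation as a string. Passing a different `base_url` points the same helper at another OpenAI-compatible server.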

Use Docker

docker model run hf.co/wannaphong/numfalm-3b

SGLang

How to use wannaphong/numfalm-3b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "wannaphong/numfalm-3b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "wannaphong/numfalm-3b",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
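
For long generations it can be convenient to read tokens as they arrive. The sketch below assumes the server accepts `"stream": true` on `/v1/completions` and replies with server-sent events in the OpenAI style (`data: {...}` lines ending with `data: [DONE]`). The function names `parse_sse_line` and `stream_completion` are illustrative and not part of SGLang.

```python
import json
import urllib.request

MODEL_ID = "wannaphong/numfalm-3b"


def parse_sse_line(line):
    """Return the text fragment carried by one event line, or None if there is none."""
    line = line.strip()
    if not line.startswith("data:"):
        return None  # blank separator lines and comments carry no payload
    data = line[len("data:"):].strip()
    if data == "[DONE]":
        return None  # end-of-stream sentinel
    choices = json.loads(data).get("choices") or []
    return choices[0].get("text") if choices else None


def stream_completion(prompt, base_url="http://localhost:30000", max_tokens=512):
    """Yield text fragments from a streaming completion request."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.5,
        "stream": True,  # assumption: the server supports OpenAI-style streaming
    }
    request = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        for raw_line in response:  # the response object iterates line by line
            text = parse_sse_line(raw_line.decode("utf-8"))
            if text is not None:
                yield text
```

A typical use is `for piece in stream_completion("Once upon a time,"): print(piece, end="", flush=True)`.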

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "wannaphong/numfalm-3b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "wannaphong/numfalm-3b",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use wannaphong/numfalm-3b with Docker Model Runner:
```
docker model run hf.co/wannaphong/numfalm-3b
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

NumFaLM 3B

NumFaLM 3B is a bilingual language model trained on Thai and English. It uses the Llama architecture and was pretrained from scratch. It was built to support open-source AI, to enable research on bilingual language models, and to improve small language models. We have released the training script and the training datasets so that others can study both the training process and the data.

GitHub: https://github.com/wannaphong/NumFaLM
Training script: https://github.com/wannaphong/EasyLM/tree/numfa_pretraining
Training dataset: wannaphong/mark13

We forked EasyLM and added support for training from Hugging Face datasets. However, Hugging Face was down many times while we were training the model, so we were able to train for only one epoch.

Acknowledgements

This research was supported with Cloud TPUs from Google's TPU Research Cloud (TRC). We used a TPU v4-64 to train the model; one epoch took about 4 days.

Thank you to the TPU Research Cloud and the EasyLM project! We used EasyLM to pretrain the model.

Model size: 4B params (Safetensors)
Tensor type: F16
