The snippets below show how to use CallMeDaniel/Llama-2-7b-chat-hf_vn with Transformers, vLLM, SGLang, and Docker Model Runner.

Libraries

How to use CallMeDaniel/Llama-2-7b-chat-hf_vn with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="CallMeDaniel/Llama-2-7b-chat-hf_vn")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("CallMeDaniel/Llama-2-7b-chat-hf_vn")
model = AutoModelForCausalLM.from_pretrained("CallMeDaniel/Llama-2-7b-chat-hf_vn")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use CallMeDaniel/Llama-2-7b-chat-hf_vn with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "CallMeDaniel/Llama-2-7b-chat-hf_vn"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CallMeDaniel/Llama-2-7b-chat-hf_vn",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
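The same OpenAI-compatible endpoint can also be called from Python. The sketch below uses only the standard library. The helper names `build_chat_request`, `extract_reply` and `chat` are hypothetical, and the sketch assumes the server's reply follows the OpenAI chat-completions shape (`choices[0].message.content`). Use port 8000 for the vLLM server above, or port 30000 for the SGLang server described below.

```python
import json
import urllib.request

MODEL_ID = "CallMeDaniel/Llama-2-7b-chat-hf_vn"


def build_chat_request(base_url: str, prompt: str, model: str = MODEL_ID) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: dict) -> str:
    """Return the assistant text from an OpenAI-style chat-completions response."""
    return response_body["choices"][0]["message"]["content"]


def chat(base_url: str, prompt: str) -> str:
    """Send one user message to a running server and return its reply (needs a live server)."""
    request = build_chat_request(base_url, prompt)
    with urllib.request.urlopen(request) as response:
        return extract_reply(json.load(response))
```

For example, `chat("http://localhost:8000", "Xin chào!")` sends one message to a local vLLM server and returns the generated text. Keeping request construction separate from the network call makes the payload easy to inspect without a running server.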

Use Docker

docker model run hf.co/CallMeDaniel/Llama-2-7b-chat-hf_vn

SGLang

How to use CallMeDaniel/Llama-2-7b-chat-hf_vn with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "CallMeDaniel/Llama-2-7b-chat-hf_vn" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CallMeDaniel/Llama-2-7b-chat-hf_vn",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "CallMeDaniel/Llama-2-7b-chat-hf_vn" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CallMeDaniel/Llama-2-7b-chat-hf_vn",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use CallMeDaniel/Llama-2-7b-chat-hf_vn with Docker Model Runner:
```
docker model run hf.co/CallMeDaniel/Llama-2-7b-chat-hf_vn
```

Vietnamese Fine-tuned Llama-2-7b-chat-hf

This repository contains a version of Llama-2-7b-chat-hf fine-tuned on Vietnamese data using LoRA (Low-Rank Adaptation).

Model Details

This model is a fine-tuned version of the Llama-2-7b-chat-hf model, specifically adapted for improved performance on Vietnamese language tasks. It uses LoRA fine-tuning to efficiently adapt the large language model to Vietnamese data while maintaining much of the original model's general knowledge and capabilities.
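As a rough illustration of what LoRA changes, the sketch below applies a low-rank update to a single frozen weight matrix. The adapted layer computes W x + (alpha / r) · B A x, and only A and B are trained. The layer size, rank `r`, and scaling `alpha` are illustrative assumptions, not values from this model's training run, and NumPy stands in for the real training framework.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in = 64, 64   # illustrative size (q_proj/v_proj in Llama-2-7b are 4096 x 4096)
r, alpha = 8, 16       # assumed LoRA rank and scaling factor

W = rng.standard_normal((d_out, d_in))       # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, r))                     # trainable up-projection, zero-initialized
scale = alpha / r


def lora_forward(x: np.ndarray) -> np.ndarray:
    """Frozen path plus the scaled low-rank update: W x + scale * B (A x)."""
    return W @ x + scale * (B @ (A @ x))


def merged_weight() -> np.ndarray:
    """Fold the adapter into one matrix so inference needs no extra matmuls."""
    return W + scale * (B @ A)


x = rng.standard_normal(d_in)

# Because B starts at zero, the adapted layer initially reproduces the base layer exactly
assert np.allclose(lora_forward(x), W @ x)

# Simulate a training update to B, then check the merged form matches the adapter form
B = rng.standard_normal((d_out, r)) * 0.01
assert np.allclose(lora_forward(x), merged_weight() @ x)

print("trainable:", A.size + B.size, "frozen:", W.size)
```

The zero initialization of B is what lets fine-tuning start from the base model's behavior, which is why LoRA tends to preserve much of the original model's capabilities.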

Model Description

Developed by: Daniel Du
Model type: Large Language Model
Language(s) (NLP): Vietnamese
License: [More Information Needed]
Finetuned from model: meta-llama/Llama-2-7b-chat-hf

Direct Use

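You can load the LoRA adapter on top of the base model with the Hugging Face Transformers and PEFT libraries:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# The base model is gated on the Hub: accept Meta's license and log in
# (e.g. `huggingface-cli login`) before downloading it.
base_model_id = "meta-llama/Llama-2-7b-chat-hf"
peft_model_id = "CallMeDaniel/Llama-2-7b-chat-hf_vn"

# Load the base model in float16 (about half the memory of float32);
# device_map="auto" needs the accelerate package and places layers on available GPUs
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the Vietnamese LoRA adapter to the base model
model = PeftModel.from_pretrained(base_model, peft_model_id)

# Load the tokenizer of the base model
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Example usage ("Hello, what is the weather like today?")
input_text = "Xin chào, hôm nay thời tiết thế nào?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
# max_new_tokens limits only the generated text, not prompt + output
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))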

Downstream Use

[More Information Needed]

Out-of-Scope Use

[More Information Needed]

Bias, Risks, and Limitations

This model is specifically fine-tuned for Vietnamese and may not perform as well on other languages.
The model inherits limitations from the base Llama-2-7b-chat-hf model.
Performance may vary depending on the specific task and domain.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

How to Get Started with the Model

See the code under Direct Use above to get started with the model.

Training Details

Training Data

Dataset: alpaca_translate_GPT_35_10_20k.json (Vietnamese translation of the Alpaca dataset)
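Alpaca-style datasets store each example as an `instruction`, an optional `input`, and an `output`. The sketch below turns such a record into a training prompt using the original Stanford Alpaca template. It assumes the translated file keeps those field names and that the training script used this template; the card confirms neither. The sample record and the `build_prompt` and `load_records` helpers are hypothetical.

```python
import json

# Stanford Alpaca prompt templates (assumed; the finetune/lora.py template is not documented)
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_prompt(record: dict) -> str:
    """Format one Alpaca-style record (instruction plus optional input) into a prompt."""
    if record.get("input"):
        return PROMPT_WITH_INPUT.format(instruction=record["instruction"], input=record["input"])
    return PROMPT_NO_INPUT.format(instruction=record["instruction"])


def load_records(path: str) -> list:
    """Read a JSON file containing a list of Alpaca-style records."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical record ("Translate the following sentence into English.")
example = {
    "instruction": "Dịch câu sau sang tiếng Anh.",
    "input": "Xin chào",
    "output": "Hello",
}
# During training, the target output is appended after the prompt
print(build_prompt(example) + example["output"])
```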

Training Hyperparameters

Training regime: [More Information Needed]

Speeds, Sizes, Times

[More Information Needed]

Evaluation

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Factors

[More Information Needed]

Metrics

[More Information Needed]

Results

Summary

Model Examination

[More Information Needed]

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: [More Information Needed]
Hours used: [More Information Needed]
Cloud Provider: [More Information Needed]
Compute Region: [More Information Needed]
Carbon Emitted: [More Information Needed]
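The calculator's estimate comes down to multiplying hardware power draw, training time, and the carbon intensity of the local grid. The sketch below shows that arithmetic. The optional PUE (data-center overhead) factor is an assumption added here, and every number in the example is a placeholder, since this card does not report hardware, hours, or region.

```python
def estimate_co2_kg(
    power_watts: float,
    hours: float,
    grid_kg_per_kwh: float,
    pue: float = 1.0,
) -> float:
    """Estimate emissions as energy used (kWh) times grid carbon intensity (kg CO2eq/kWh).

    pue is an optional data-center overhead multiplier (1.0 means no overhead).
    """
    energy_kwh = power_watts * hours / 1000.0 * pue
    return energy_kwh * grid_kg_per_kwh


# Placeholder values only: one 300 W GPU for 10 hours on a 0.4 kg/kWh grid
print(f"{estimate_co2_kg(300, 10, 0.4):.2f} kg CO2eq")
```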

Model Architecture and Objective

[More Information Needed]

Citation

If you use this model in your research, please cite:

@misc{vietnamese_llama2_7b_chat,
  author = {Daniel Du},
  title = {Vietnamese Fine-tuned Llama-2-7b-chat-hf},
  year = {2023},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/CallMeDaniel/Llama-2-7b-chat-hf_vn}}
}

Training procedure

The following bitsandbytes quantization config was used during training:

quant_method: bitsandbytes
load_in_8bit: True
load_in_4bit: False
llm_int8_threshold: 6.0
llm_int8_skip_modules: None
llm_int8_enable_fp32_cpu_offload: False
llm_int8_has_fp16_weight: False
bnb_4bit_quant_type: fp4
bnb_4bit_use_double_quant: False
bnb_4bit_compute_dtype: float32

Framework versions

PEFT 0.6.3.dev0


Fine-tuning Details

Fine-tuning Method: LoRA (Low-Rank Adaptation)
LoRA Config:
- Target Modules: ["q_proj", "v_proj"]
- Precision: 8-bit
Dataset: alpaca_translate_GPT_35_10_20k.json (Vietnamese translation of the Alpaca dataset)

Training Procedure

The model was fine-tuned using the following command:

python finetune/lora.py \
--base_model meta-llama/Llama-2-7b-chat-hf \
--model_type llama \
--data_dir data/general/alpaca_translate_GPT_35_10_20k.json \
--output_dir finetuned/meta-llama/Llama-2-7b-chat-hf \
--lora_target_modules '["q_proj", "v_proj"]' \
--micro_batch_size 1

For multi-GPU training, a distributed training approach was used.

Evaluation Results

[More Information Needed]

Acknowledgements

This project is part of the TF07 Course offered by ProtonX.
We thank the creators of the original Llama-2-7b-chat-hf model and the Hugging Face team for their tools and resources.
Appreciation to VietnamAIHub/Vietnamese_LLMs for the translated dataset.


Model size: 7B params
Tensor types: F32, F16 (Safetensors)


References

Lacoste et al. (2019). Quantifying the Carbon Emissions of Machine Learning. arXiv:1910.09700.