---
base_model: unsloth/phi-4-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
license: apache-2.0
language:
- en
- vi
datasets:
- 5CD-AI/Vietnamese-meta-math-MetaMathQA-40K-gg-translated
---

# Uploaded model

- **Developed by:** vankha
- **License:** apache-2.0
- **Finetuned from model:** unsloth/phi-4-unsloth-bnb-4bit

This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

# How to use the model

## Set up the libraries

```bash
# Run in a notebook cell (Colab/Jupyter); drop the leading "!" in a regular shell.
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps xformers "trl<0.9.0" peft accelerate bitsandbytes
```

## 🚀 Load the fine-tuned 4-bit model with Unsloth

```python
from unsloth import FastLanguageModel

# Download the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "vankha/vietnamese-phi-4-reasoning",
    max_seq_length = 2048,  # maximum context length in tokens
    load_in_4bit = True,    # load 4-bit quantized weights (bitsandbytes)
)
```

## 🧠 Send a prompt for model inference

```python
from transformers import TextStreamer

# Switch Unsloth to its faster inference mode before generating.
FastLanguageModel.for_inference(model)

# Sample prompt in Vietnamese. In English: "Allen and Ben are painting a fence.
# The ratio of the amount of work Allen does to the amount of work Ben does is 3:5.
# If the fence requires a total of X square feet to be painted, then Ben paints
# 150 square feet. What is the value of the unknown variable X?"
messages = [
    {
        "role": "user",
        "content": "Allen và Ben đang sơn hàng rào. Tỷ lệ giữa số lượng công việc Allen làm và số lượng công việc Ben làm là $3:5$. Nếu hàng rào cần sơn tổng cộng X feet vuông thì Ben sơn 150 feet vuông. Giá trị của biến X chưa biết là bao nhiêu?",
    }
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # Required for the model to generate a response
    enable_thinking=True,  # Toggles the "think" feature, if the chat template supports it
)

_ = model.generate(
    **tokenizer(text, return_tensors="pt").to("cuda"),
    max_new_tokens=1024,
    do_sample=True,  # Needed for temperature/top_p/top_k to take effect
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    streamer=TextStreamer(tokenizer, skip_prompt=True),  # Print tokens as they are generated
)
```
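To sanity-check the model's answer to the sample prompt, the expected value of X can be computed directly from the 3:5 ratio. This is a minimal sketch and not part of the model's API; the helper name `total_from_share` is hypothetical and introduced here only for illustration.

```python
from fractions import Fraction


def total_from_share(part_a: int, part_b: int, b_amount: int) -> Fraction:
    """Return the total when work is split part_a:part_b and the second worker does b_amount.

    Hypothetical helper, used only to verify the sample problem by hand.
    """
    # The second worker's fraction of the whole job (5/8 for a 3:5 split).
    b_fraction = Fraction(part_b, part_a + part_b)
    # b_amount = b_fraction * total, so total = b_amount / b_fraction.
    return b_amount / b_fraction


# Allen:Ben = 3:5 and Ben paints 150 square feet.
expected_x = total_from_share(3, 5, 150)
print(expected_x)  # 240
```

A correct response from the model should therefore conclude that X = 240.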