---
base_model: unsloth/qwen2.5-7b-unsloth-bnb-4bit
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:unsloth/qwen2.5-7b-unsloth-bnb-4bit
- lora
- sft
- transformers
- trl
- unsloth
- intent-classification
- banking77
---

# Qwen2.5-7B Banking Intent Classification

This is a LoRA adapter fine-tuned on the **BANKING77** dataset for fine-grained intent classification in the banking domain. It is built on `unsloth/qwen2.5-7b-unsloth-bnb-4bit`, a 4-bit quantized Qwen2.5-7B checkpoint, and was trained with the [Unsloth](https://github.com/unslothai/unsloth) library for efficient training.

## Model Details

- **Model Type:** Causal Language Model with LoRA adapter
- **Developer:** ngbaoan
- **Base Model:** `unsloth/qwen2.5-7b-unsloth-bnb-4bit`
- **Language:** English
- **Task:** Intent Classification
- **Dataset:** [BANKING77](https://huggingface.co/datasets/banking77) (77 distinct banking-related intents)

## Performance

Evaluated on the test set, the model achieved the following results:

- **Accuracy:** **92.29%** (0.9229)
- **Macro F1-Score:** 0.85
- **Weighted F1-Score:** 0.92

*(Note: Some labels in the evaluated dataset subset may have zero support, which lowers the macro average. For intents with support, the per-class F1 score ranges from 0.80 to 1.00.)*

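
The gap between the macro and weighted scores can be illustrated with a small sketch. It uses toy data, not the model's actual predictions, and assumes that a zero-support label is still included in the macro average with an F1 of 0. Whether the reported numbers were computed this way is not stated in this card.

```python
def per_class_f1(y_true, y_pred, labels):
    """Return {label: (f1, support)} computed from raw label lists."""
    stats = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # Assumption: an undefined F1 (no true and no predicted samples) counts as 0.
        f1 = 2 * tp / denom if denom else 0.0
        stats[label] = (f1, tp + fn)
    return stats


def macro_f1(stats):
    """Unweighted mean over all labels, including zero-support ones."""
    return sum(f1 for f1, _ in stats.values()) / len(stats)


def weighted_f1(stats):
    """Mean weighted by support, so zero-support labels contribute nothing."""
    total = sum(support for _, support in stats.values())
    return sum(f1 * support for f1, support in stats.values()) / total


# Toy example: "top_up_failed" never occurs in y_true (zero support).
labels = ["card_arrival", "exchange_rate", "top_up_failed"]
y_true = ["card_arrival", "card_arrival", "exchange_rate", "exchange_rate"]
y_pred = ["card_arrival", "card_arrival", "exchange_rate", "exchange_rate"]

stats = per_class_f1(y_true, y_pred, labels)
print(f"macro F1:    {macro_f1(stats):.3f}")
print(f"weighted F1: {weighted_f1(stats):.3f}")
```

Every prediction in the toy data is correct, yet the macro average drops because the empty class is averaged in with a score of 0, while the weighted average ignores it.
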
## Intended Use

This model classifies user queries about banking operations (such as card activation, lost cards, top-up failures, and exchange rates) into one of 77 specific intents.

**Example Input:**

> "I tried to top up my account using a card but it failed, what should I do?"

**Example Output:**

> `top_up_failed`

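
Because the model generates free text rather than a class index, downstream code may want to check that the output is one of the expected labels. The following is a minimal sketch of such a check. The helper name `normalize_intent` is hypothetical, and `KNOWN_INTENTS` lists only a handful of BANKING77 label names for illustration, not the full set of 77.

```python
# A small illustrative subset of BANKING77 label names (the full set has 77).
KNOWN_INTENTS = {
    "activate_my_card",
    "card_arrival",
    "exchange_rate",
    "lost_or_stolen_card",
    "top_up_failed",
}


def normalize_intent(raw_output, known_intents=KNOWN_INTENTS):
    """Map raw generated text to a known intent label, or None if nothing matches."""
    text = raw_output.strip()
    if not text:
        return None
    # Keep only the first line in case the model keeps generating after the label.
    first_line = text.splitlines()[0]
    # Drop surrounding quotes, backticks, and trailing periods, then normalize case.
    candidate = first_line.strip().strip("`'\".").lower().replace(" ", "_")
    return candidate if candidate in known_intents else None


print(normalize_intent("top_up_failed"))
print(normalize_intent("something unrelated"))
```

Returning `None` for unknown outputs lets the caller decide how to handle them, for example by routing the query to a fallback flow.
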
## Training Details

The model was fine-tuned efficiently using Unsloth with 4-bit quantization and LoRA.

### Training Hyperparameters

- **LoRA Rank (r):** 64
- **LoRA Alpha:** 64
- **Batch Size:** 2 (per device)
- **Gradient Accumulation Steps:** 4
- **Learning Rate:** 5.0e-5
- **Optimizer:** `adamw_8bit`
- **LR Scheduler:** `cosine`
- **Warmup Steps:** 20
- **Weight Decay:** 0.01
- **Epochs:** 6
- **Max Sequence Length:** 512

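
A few quantities follow directly from the hyperparameters above, and the sketch below works them out. It assumes standard LoRA scaling (`alpha / r`) and the common shape for a `cosine` schedule: linear warmup followed by a half-cosine decay to zero. `TOTAL_STEPS` is a hypothetical placeholder, since the real number of optimizer steps depends on the training set size, which is not stated here.

```python
import math

PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 4
LORA_R = 64
LORA_ALPHA = 64
BASE_LR = 5.0e-5
WARMUP_STEPS = 20
TOTAL_STEPS = 1000  # hypothetical; not taken from the actual training run

# Examples consumed per optimizer step on each device.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS

# Scaling applied to the LoRA update under standard (non-rank-stabilized) LoRA.
lora_scale = LORA_ALPHA / LORA_R


def lr_at(step, total_steps=TOTAL_STEPS, base_lr=BASE_LR, warmup=WARMUP_STEPS):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay."""
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size per device:", effective_batch)
print("LoRA scale (alpha / r):", lora_scale)
for step in (0, 10, 20, 510, 1000):
    print(f"step {step:4d}: lr = {lr_at(step):.2e}")
```

With rank and alpha both set to 64, the adapter update is applied at a scale of 1.0, so the learning rate alone controls the step size of the LoRA weights.
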
## How to Get Started with the Model

Because this is a LoRA adapter, the base model must be loaded first and the PEFT weights applied on top of it. The simplest route is the `unsloth` library; standard `transformers` with `peft` also works.

```python
from unsloth import FastLanguageModel

max_seq_length = 512  # matches the training configuration

# 1. Load the model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "ngbaoan/intent-banking",  # LoRA adapter repo on the Hugging Face Hub
    max_seq_length = max_seq_length,
    dtype = None,         # let Unsloth pick the compute dtype
    load_in_4bit = True,  # the base checkpoint is 4-bit quantized
)
FastLanguageModel.for_inference(model)  # enable Unsloth's inference mode

# 2. Format your prompt
prompt = """Instruct: Classify the following banking query into the correct intent.
Query: I lost my card yesterday and I need a replacement.
Intent: """
inputs = tokenizer([prompt], return_tensors = "pt").to("cuda")

# 3. Generate the response (the decoded text contains the prompt followed by the predicted intent)
outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
print(tokenizer.batch_decode(outputs, skip_special_tokens = True)[0])
```

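
For the standard `transformers` route, a minimal sketch with `peft` might look like the following. It assumes `transformers`, `peft`, `accelerate`, and `bitsandbytes` are installed and a CUDA GPU is available. The helper names `build_prompt`, `load_model`, and `classify` are hypothetical, and the prompt template is copied from the Unsloth example with single newlines assumed between its lines.

```python
BASE_MODEL_ID = "unsloth/qwen2.5-7b-unsloth-bnb-4bit"
ADAPTER_ID = "ngbaoan/intent-banking"


def build_prompt(query):
    """Format a banking query with the prompt template used in the Unsloth example."""
    return (
        "Instruct: Classify the following banking query into the correct intent.\n"
        f"Query: {query}\n"
        "Intent: "
    )


def load_model(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the 4-bit base model, then attach the LoRA adapter on top of it."""
    # Heavy imports stay inside the function so build_prompt works without them.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return model, tokenizer


def classify(model, tokenizer, query, max_new_tokens=16):
    """Greedy-decode the intent and return only the newly generated text."""
    import torch

    inputs = tokenizer(build_prompt(query), return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    prompt_len = inputs["input_ids"].shape[1]
    # Slice off the prompt tokens so the result holds just the predicted label.
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True).strip()
```

Calling `model, tokenizer = load_model()` and then `classify(model, tokenizer, "I lost my card yesterday and I need a replacement.")` should return the generated intent text. Unlike the Unsloth example, this sketch strips the prompt from the decoded output.
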
## Framework Versions

- PEFT 0.18.1
- Transformers
- Unsloth
- TRL