Model Card for Insur_Qns_Responsew

Introduction

The objective is to train an LLM to answer insurance questions. When queried with a relevant question, the model generates a text answer to general questions about insurance. The motivation is that many people have insurance-related questions and rely on their agents for answers. Some of those answers are of low quality because the agent lacks sufficient knowledge or has a vested interest. A good LLM for answering insurance-related questions could serve consumers, agents, customer service representatives, and other players in the insurance industry. Currently there is no LLM that performs well at answering insurance questions.

Training Data

The model was trained on a 750 question-and-answer set from the Insurance QA corpus (https://github.com/shuzi/insuranceQA). This corpus contains questions and answers collected from the Insurance Library website (https://www.insurancelibrary.com/). The questions come from real-world users, and the high-quality answers were composed by professionals with deep domain knowledge. A training-to-validation split of 80% (600) to 20% (150) was used.
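
The 80/20 split can be sketched in a few lines of Python (an illustration only; the original split method and random seed are not documented in this card):

```python
import random

def train_val_split(pairs, val_fraction=0.2, seed=42):
    """Shuffle question-answer pairs and split into train/validation lists."""
    pairs = list(pairs)
    rng = random.Random(seed)
    rng.shuffle(pairs)
    n_val = int(len(pairs) * val_fraction)
    return pairs[n_val:], pairs[:n_val]

# With the 750-pair corpus this yields 600 training and 150 validation pairs.
pairs = [(f"q{i}", f"a{i}") for i in range(750)]
train, val = train_val_split(pairs)
print(len(train), len(val))  # 600 150
```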

Training Method

The training method I implemented is Low-Rank Adaptation (LoRA).
LoRA trains faster than full fine-tuning and many other approaches, uses far less memory, and tunes the model without making substantial changes to the pretrained weights. In some of the research reviewed, LoRA achieved better results than fine-tuning while requiring far fewer trainable parameters. I compared LoRA against several variants of prompt tuning, and LoRA produced better outputs. GPT-2, Llama-3.2-1B, and Gemma-2-2b-it were considered as starting base models; based on preliminary evaluation, Llama-3.2-1B was chosen.
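
For reference, the standard LoRA formulation (not stated in the original card) freezes the pretrained weight matrix and learns a low-rank additive update:

```latex
W' = W_0 + \frac{\alpha}{r} B A, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
```

Only $A$ and $B$ are trained, which is why LoRA needs far fewer trainable parameters than full fine-tuning. With the hyperparameters used here ($r = 64$, $\alpha = 64$), the scaling factor $\alpha / r$ equals 1.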

Data Sample
(Sample question-answer pair shown as an image in the original card.)

The hyperparameter values used:

  • lora_r = 64
  • lora_alpha = 64
  • lora_dropout = .05
  • bias="none"
  • task_type="CAUSAL_LM"
  • learning_rate=0.00001
  • epochs=2
  • eval_steps=50

Different values of epochs and eval_steps were tested; the values shown above gave the best results.
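
Assembled into a `peft` configuration, the hyperparameters above would look roughly like this (a sketch; the exact trainer setup and `output_dir` are assumptions, not shown in the card):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

lora_config = LoraConfig(
    r=64,                  # lora_r
    lora_alpha=64,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
model = get_peft_model(base, lora_config)  # wraps the frozen base with LoRA adapters

training_args = TrainingArguments(
    output_dir="outputs",          # assumed; not documented in the card
    learning_rate=1e-5,
    num_train_epochs=2,
    eval_steps=50,
    eval_strategy="steps",         # evaluation_strategy in older transformers versions
)
```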

Evaluation

The model was evaluated on four benchmarks: gsm8k, mmlu, race, and the percentage of correct responses on a small insurance sample.
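
The standard benchmarks can be reproduced with EleutherAI's lm-evaluation-harness, roughly as follows (a sketch; the exact flags, sample limits, and few-shot settings used for these numbers are not documented in the card):

```shell
pip install lm-eval
lm_eval --model hf \
  --model_args pretrained=fapasw/Insur_Qns_Responsew,dtype=bfloat16 \
  --tasks gsm8k_cot,mmlu,race \
  --batch_size 8
```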

| Benchmark | Metric | Base | Prompt Tuning | LoRA |
|---|---|---|---|---|
| gsm8k_cot | exact_match,strict-match | 0.0400 | 0.0000 | 0.0333 |
| gsm8k_cot | exact_match_stderr,strict-match | 0.0280 | 0.0000 | 0.0333 |
| gsm8k_cot | exact_match,flexible-extract | 0.0400 | 0.0200 | 0.0667 |
| gsm8k_cot | exact_match_stderr,flexible-extract | 0.0280 | 0.0200 | 0.0463 |
| mmlu | acc,none | 0.387018 | 0.247368 | 0.387719 |
| mmlu | acc_stderr,none | 0.008993 | 0.008047 | 0.011674 |
| race | acc,none | 0.380000 | 0.340000 | 0.366667 |
| race | acc_stderr,none | 0.069341 | 0.067673 | 0.089486 |
| percent correct | accuracy | 0.1500 | 0.2500 | 0.5000 |

Testing on the insurance set shows that the model provides better answers after LoRA.

Before LoRA
Q: When Can I Switch From Medicare Advantage To Medigap?

A: The decision to switch from Medicare Advantage to Medigap is not an easy one. It involves a series of steps and decisions that will have an impact on your financial situation. This guide will help you understand your options and provide you with some guidance on the decision process. The first step is to assess your financial situation. This includes assessing your income, expenses, and assets. It is important to understand

After LoRA
Q: When Can I Switch From Medicare Advantage To Medigap?

A: You can switch from Medicare Advantage to Medigap at any time. You can also switch from Medicare Advantage to Medicare Part D at any time. If you switch from Medicare Advantage to Medicare Part D, you will have to switch back to Medicare Advantage to get your Medicare Part D coverage again.

After training, the model gives better answers to both the training and validation sample questions tested. Before training, the answers sometimes ramble without addressing the question. Performance on the standard benchmarks barely changes.

Usage and Intended Uses

The model can be used to get better responses to basic insurance-related questions. It can also be used for answer selection.

To use the model with the Hugging Face transformers library:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "fapasw/Insur_Qns_Responsew"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.bfloat16)

Prompt Format

Example: user_prompt = "Q: Who Qualifies For Life Insurance?\n\nA: "

Generating a Response

inputs = tokenizer.encode(user_prompt, return_tensors="pt").to(model.device)
output = model.generate(inputs, max_new_tokens=100, pad_token_id=tokenizer.pad_token_id)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

The expected output is a text answer.
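
Because Llama-3.2-1B is a base model, the decoded output echoes the prompt and may keep generating follow-up questions. A small helper can trim it down to just the answer (`extract_answer` is a hypothetical utility, not part of the model):

```python
def extract_answer(generated_text: str, user_prompt: str) -> str:
    # The decoded output begins with the prompt itself; drop it.
    if generated_text.startswith(user_prompt):
        generated_text = generated_text[len(user_prompt):]
    # Base models often continue with a new "Q:" block; cut it off.
    return generated_text.split("\nQ:")[0].strip()

sample = ("Q: Who Qualifies For Life Insurance?\n\nA: "
          "Most applicants in good health qualify.\nQ: What")
print(extract_answer(sample, "Q: Who Qualifies For Life Insurance?\n\nA: "))
# Most applicants in good health qualify.
```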

Limitations

The corpus used to train the model was created in 2015, and much has changed in the insurance landscape since then. The accuracy after training is only average.


Model Card Authors

Ade Faparusi
