Model Card for Munnafaisal/walmart_entitty_extractor_for_DB_query

This model is fine-tuned on an artificially generated dataset based on Walmart data (available at https://drive.google.com/file/d/12_az-c5XyBStjJdrX7lRMceODJqUveXd/view?usp=drive_link). It returns a JSON schema containing date/time, keywords, and values.

Model Details

Base model: 'Qwen/Qwen2.5-0.5B-Instruct'

LoRA config:

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                    # rank of the adapter; lower rank means fewer trainable parameters
    lora_alpha=32,           # scaling multiplier, usually 2*r
    bias="none",             # BEWARE: training biases modifies the base model's behavior
    lora_dropout=0.05,
    task_type="QUESTION_ANS",
    target_modules="all-linear",
)
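For intuition, a LoRA adapter on a single linear layer adds r * (in_features + out_features) trainable parameters, and the update is scaled by lora_alpha / r. A minimal sketch of that arithmetic (the 896x896 layer shape below is illustrative, not taken from Qwen2.5-0.5B):

```python
def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    # LoRA factorizes the weight update as B @ A, where
    # A has shape (r, in_features) and B has shape (out_features, r)
    return r * in_features + out_features * r

# Hypothetical linear layer with this card's r=16
print(lora_param_count(896, 896, 16))   # 28672 trainable parameters
print(32 / 16)                          # effective scaling lora_alpha / r = 2.0
```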

Dataset link: https://drive.google.com/file/d/12_az-c5XyBStjJdrX7lRMceODJqUveXd/view?usp=sharing (also available on the web)

Number of artificially created data samples in the dataset: 22,000

Checkpoint shared: 10000

GPU: NVIDIA GeForce RTX 3070 Laptop GPU

Model Description

Training time: 2.3 hours

Epochs: 1


Uses

Example inference:

import ast
import traceback

from transformers import pipeline

# Load the fine-tuned adapter (requires the peft package)
pipe = pipeline("text-generation", model="Munnafaisal/walmart_entitty_extractor_for_DB_query")

system_prompt = "..."  # the system prompt used during fine-tuning (not reproduced here)

QS = "I need colors, sizes and customer reviews for product id 100986 for the date between 01/02/2020 to 11/11/2025 and make a comparative summary"

try:
    print("Query :\n", QS)

    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": QS},
    ]
    outputs = pipe(messages, max_new_tokens=2024, do_sample=True, temperature=0.1, top_k=50, top_p=0.95, num_beams=1)
    res = outputs[0]["generated_text"][-1]["content"]
    print("\nAssistant answer :: \n", res)

    actual_res = res.split("Response:")[-1].split("can be mapped to")
    for r in actual_res:
        print("\n reasoning map >> ", r.split("::")[0])

    # The model emits a Python-style dict (single quotes), so parse it with
    # ast.literal_eval instead of json.loads
    JS = res.split("JSON response will be ")[-1]
    new_json = ast.literal_eval(JS.strip())
    print("JSON RES :", new_json)
except Exception:
    traceback.print_exc()
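Because the model emits a Python-style dict (single quotes) embedded in free text, calling json.loads on the raw tail will fail. A standalone extraction helper, written under the assumption that the response always contains the phrase "JSON response will be " before the dict:

```python
import ast

def extract_response_dict(res: str):
    """Pull the trailing Python-style dict out of a model response."""
    tail = res.split("JSON response will be ")[-1].strip()
    end = tail.rfind("}")  # trim any trailing chatter after the dict
    if end == -1:
        return None
    try:
        return ast.literal_eval(tail[:end + 1])
    except (ValueError, SyntaxError):
        return None

res = ("corresponding JSON response will be "
       "{'params': [{'product_id': '100986'}], 'start_time': '01/02/2020', "
       "'end_time': '11/11/2025', 'variable': ['colors', 'sizes', 'customer reviews']}")
print(extract_response_dict(res)["params"])  # [{'product_id': '100986'}]
```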

LLM Response

Since user query includes some parameters as following product id:100986 can be mapped to product_id:100986:: user wants to know some values as following ['colors', 'sizes', 'customer reviews'].
User also mentioned the start time 01/02/2020 and end time 11/11/2025, so start_time and end_time will be included in the JSON response.
and corresponding JSON response will be {'params': [{'product_id': '100986'}], 'start_time': '01/02/2020', 'end_time': '11/11/2025', 'variable': ['colors', 'sizes', 'customer reviews']}

Filtered JSON

{'params': [{'product_id': '100986'}], 'start_time': '01/02/2020', 'end_time': '11/11/2025', 'variable': ['colors', 'sizes', 'customer reviews']}
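The filtered JSON is intended to drive a database lookup. One way to turn it into a parameterized SQL query; the table name and the mapping of the date range onto a 'timestamp' column are assumptions, not specified by this card:

```python
def json_to_query(schema: dict, table: str = "walmart_products"):
    """Build a parameterized SELECT from the model's JSON schema."""
    where, args = [], []
    for param in schema.get("params", []):
        for col, val in param.items():
            where.append(f"{col} = ?")
            args.append(val)
    if "start_time" in schema:          # assume date range filters the timestamp column
        where.append("timestamp >= ?")
        args.append(schema["start_time"])
    if "end_time" in schema:
        where.append("timestamp <= ?")
        args.append(schema["end_time"])
    # NOTE: variable names with spaces (e.g. 'customer reviews') would need
    # quoting or mapping to real column names before use.
    cols = ", ".join(schema.get("variable", [])) or "*"
    return f"SELECT {cols} FROM {table} WHERE " + " AND ".join(where), args

schema = {'params': [{'product_id': '100986'}],
          'start_time': '01/02/2020', 'end_time': '11/11/2025',
          'variable': ['colors']}
sql, args = json_to_query(schema)
print(sql)   # SELECT colors FROM walmart_products WHERE product_id = ? AND timestamp >= ? AND timestamp <= ?
print(args)  # ['100986', '01/02/2020', '11/11/2025']
```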

The model is trained for the parameters and values given below.

parameters: (used to filter the DB to get the desired variables/values)

- 'timestamp'
- 'brand' 
- 'product_id' 
- 'product_name' 
- 'category_name' 
- 'free_returns' 
- 'final_price' 
- 'sku'
- 'rating'
- 'discount' 
- 'unit_price' 
- 'initial_price'

Variables/values: (desired variables returned against the given parameters)

- 'specifications'
- 'currency'
- 'top reviews'
- 'rating stars'
- 'review count'
- 'description'
- 'customer reviews'
- 'final price'
- 'sku'
- 'rating'
- 'sizes'
- 'discount'
- 'unit price'
- 'colors'
- 'initial price'
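Since the model only knows the parameters and variables listed above, it can be worth validating a returned schema before querying. A small sketch (the sets below are copied from this card's two lists):

```python
SUPPORTED_PARAMS = {
    "timestamp", "brand", "product_id", "product_name", "category_name",
    "free_returns", "final_price", "sku", "rating", "discount",
    "unit_price", "initial_price",
}
SUPPORTED_VARIABLES = {
    "specifications", "currency", "top reviews", "rating stars",
    "review count", "description", "customer reviews", "final price",
    "sku", "rating", "sizes", "discount", "unit price", "colors",
    "initial price",
}

def validate_schema(schema: dict):
    """Return the unsupported parameter keys and variable names, if any."""
    bad_params = [k for p in schema.get("params", []) for k in p
                  if k not in SUPPORTED_PARAMS]
    bad_vars = [v for v in schema.get("variable", [])
                if v not in SUPPORTED_VARIABLES]
    return bad_params, bad_vars

schema = {"params": [{"product_id": "100986"}], "variable": ["colors", "weight"]}
print(validate_schema(schema))  # ([], ['weight'])
```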

Prompt guideline

Your prompt may have up to 4 parts in general:

1. Desired variables (Ex: I need colors, sizes and customer reviews)
2. Given parameters (Ex: for product id 100986 and brand ZARA)
3. Date (Ex: between 01/02/2020 to 11/11/2025)
4. Your extended prompt

You can shuffle the order as you wish. You can also give a partial prompt.
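As an illustration of the prompt shape, the parts can be assembled in any order (the strings below are only examples):

```python
import random

parts = [
    "I need colors, sizes and customer reviews",  # desired variables
    "for product id 100986 and brand ZARA",       # given parameters
    "between 01/02/2020 to 11/11/2025",           # date range
]
random.shuffle(parts)   # order does not matter to the model
prompt = " ".join(parts)
print(prompt)
```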

Example:

prompt = "provide available data for brand ZARA and rating 4.5"

LLM answer:

" Since user query includes some parameters as following brand:ZARA can be mapped to brand:ZARA::rating:4.5 can be mapped to rating:4.5::
Corresponding JSON response will be {'params': [{'brand': 'ZARA', 'rating': '4.5'}]}"

Filtered JSON:
{'params': [{'brand': 'ZARA', 'rating': '4.5'}]}

Out-of-Scope Use

The model is partially trained to recognize out-of-domain prompts.

Example

prompt = "can you tell me something about JAPAN?"

LLM answer:
" This conversation is out of scope for this LLM and corresponding JSON response will be {'topic': 'out of scope'}"

Bias, Risks, and Limitations

This model is fine-tuned for research purposes and is not guaranteed for production use.

If you want only a single value, add a comma (",") after it, as in the example below.

Example:

prompt = "get all unit price, for product id 002322827"
 
LLM answer:

"Since user query includes some parameters as following product id:002322827 can be mapped to product_id:002322827::
user wants to know some values as following ['unit price'].
Corresponding JSON response will be {'params': [{'product_id': '002322827'}], 'variable': ['unit price']}"

The model is somewhat biased toward the following starting words for querying values:

- "give me report on"
- "report on"
- "I need"
- "get"
- "I want"