# Model Card for walmart_entitty_extractor_for_DB_query

This model is fine-tuned on an artificially generated dataset derived from Walmart data (available at https://drive.google.com/file/d/12_az-c5XyBStjJdrX7lRMceODJqUveXd/view?usp=drive_link). Given a natural-language query, it returns a JSON schema containing the date/time range, keywords (parameters), and requested values.
## Model Details

Base model: `Qwen/Qwen2.5-0.5B-Instruct`

LoRA config:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,              # rank of the adapter; the lower the rank, the fewer parameters you'll need to train
    lora_alpha=32,     # scaling multiplier, usually 2*r
    bias="none",       # BEWARE: training biases *modifies* the base model's behavior
    lora_dropout=0.05,
    task_type="QUESTION_ANS",
    target_modules="all-linear",
)
```
- Dataset link: https://drive.google.com/file/d/12_az-c5XyBStjJdrX7lRMceODJqUveXd/view?usp=sharing (also available on the web)
- Number of artificially created samples in the dataset: 22,000
- Checkpoint shared: 10,000
- GPU: NVIDIA GeForce RTX 3070 Laptop GPU
## Model Description

- Training time: 2.3 hours
- Epochs: 1
- Developed by: Faisal Ahmed Siddiqi (email: ahmedfaisal.fa21@gmail.com, LinkedIn: https://www.linkedin.com/in/faisal-ahmed-siddiqi/)
- License: MIT
- Finetuned from model: Qwen/Qwen2.5-0.5B-Instruct

## Model Sources

- Repository: https://huggingface.co/Munnafaisal/walmart_entitty_extractor_for_DB_query
## Uses

Example inference:
```python
import ast
import traceback

from transformers import pipeline

# Assumption: the fine-tuned model is loaded as a text-generation pipeline.
pipe = pipeline("text-generation", model="Munnafaisal/walmart_entitty_extractor_for_DB_query")

# system_prompt must be the same system prompt used during training (not reproduced here).
QS = "I need colors,sizes and cutomer riviews for product id 100986 for the date between 01/02/2020 to 11/11/2025 and make a comparative summary"

try:
    print("Query:\n", QS)
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": QS},
    ]
    outputs = pipe(messages, max_new_tokens=2024, do_sample=True, temperature=0.1,
                   top_k=50, top_p=0.95, num_beams=1)
    res = outputs[0]["generated_text"][-1]["content"]
    print("\nAssistant answer:\n", res)
    actual_res = res.split("Response:")[-1].split("can be mapped to")
    for r in actual_res:
        print("\nreasoning map >> ", r.split("::")[0])
    # The model emits a single-quoted, Python-style dict, so parse it with
    # ast.literal_eval rather than json.loads.
    JS = res.split("JSON response will be ")[-1]
    new_json = ast.literal_eval(JS)
    print("JSON RES:", new_json)
except Exception:
    traceback.print_exc()
```
LLM response:

```
Since user query includes some parameters as following product id:100986 can be mapped to product_id:100986:: user wants to know some values as following ['colors','sizes', 'cutomer reviews'].
User also mentioned the start time 01/02/2020 and end time 11/11/2025 so start_time and end_time will be included in JSON response.
and corresponding JSON response will be {'params': [{'product_id': '100986'}],'start_time': '01/02/2020', 'end_time': '11/11/2025', 'variable': ['colors','sizes', 'cutomer reviews']}
```
Filtered JSON:

```
{'params': [{'product_id': '100986'}],'start_time': '01/02/2020', 'end_time': '11/11/2025', 'variable': ['colors','sizes', 'cutomer reviews']}
```
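Because the model ends its answer with a single-quoted, Python-style dict, `json.loads` will fail on the raw text; `ast.literal_eval` handles it. A minimal extraction sketch (the helper name `extract_json` is illustrative, not part of the model's API):

```python
import ast

def extract_json(llm_text: str) -> dict:
    """Pull the trailing dict out of the model's free-text answer.

    The model ends its reasoning with '... JSON response will be {...}',
    so split on that phrase and literal-eval the remainder.
    """
    tail = llm_text.split("JSON response will be ")[-1].strip()
    # ast.literal_eval safely parses the single-quoted dict that
    # json.loads would reject.
    return ast.literal_eval(tail)

answer = ("and corresponding JSON response will be "
          "{'params': [{'product_id': '100986'}],"
          "'start_time': '01/02/2020', 'end_time': '11/11/2025', "
          "'variable': ['colors','sizes', 'cutomer reviews']}")
result = extract_json(answer)
print(result["params"])  # [{'product_id': '100986'}]
```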
The model is trained on the parameters and values given below.

Parameters (used to filter the DB for the desired variables/values):
- 'timestamp'
- 'brand'
- 'product_id'
- 'product_name'
- 'category_name'
- 'free_returns'
- 'final_price'
- 'sku'
- 'rating'
- 'discount'
- 'unit_price'
- 'initial_price'

Variables/values (desired values to return for the given parameters):
- 'specifications'
- 'currency'
- 'top reviews'
- 'rating stars'
- 'review count'
- 'description'
- 'customer reviews'
- 'final price'
- 'sku'
- 'rating'
- 'sizes'
- 'discount'
- 'unit price'
- 'colors'
- 'initial price'
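The extracted schema can be sanity-checked against these two lists before a DB query is built. A minimal sketch (the function `validate_schema` and these set literals are assumptions derived from the lists above):

```python
# Supported DB filter columns and value fields, copied from the lists above.
PARAMETERS = {
    'timestamp', 'brand', 'product_id', 'product_name', 'category_name',
    'free_returns', 'final_price', 'sku', 'rating', 'discount',
    'unit_price', 'initial_price',
}
VARIABLES = {
    'specifications', 'currency', 'top reviews', 'rating stars',
    'review count', 'description', 'customer reviews', 'final price',
    'sku', 'rating', 'sizes', 'discount', 'unit price', 'colors',
    'initial price',
}

def validate_schema(schema: dict) -> list:
    """Return the keys/values in the schema that the model was not trained on."""
    unknown = []
    for group in schema.get('params', []):
        unknown += [k for k in group if k not in PARAMETERS]
    unknown += [v for v in schema.get('variable', []) if v not in VARIABLES]
    return unknown

schema = {'params': [{'brand': 'ZARA', 'rating': '4.5'}],
          'variable': ['colors', 'unit price']}
print(validate_schema(schema))  # [] -> everything is supported
```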
## Prompt guideline

Your prompt generally has four parts:

1. Desired variables (e.g., "I need colors, sizes and customer reviews")
2. Given parameters (e.g., "for product id 100986 and brand ZARA")
3. Date range (e.g., "between 01/02/2020 to 11/11/2025")
4. Your extended prompt

You can shuffle the order as you wish; you can also give a partial prompt.
Example:

```
prompt = "provide available data for brand ZARA and rating 4.5"
```

LLM answer:

```
Since user query includes some parameters as following brand:ZARA can be mapped to brand:ZARA::rating:4.5 can be mapped to rating:4.5::
Corresponding JSON response will be {'params': [{'brand': 'ZARA', 'rating': '4.5'}]}
```

Filtered JSON:

```
{'params': [{'brand': 'ZARA', 'rating': '4.5'}]}
```
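Once filtered, the JSON maps naturally onto a parameterized DB query. A hypothetical sketch (the table name `products` and the `build_query` helper are assumptions; the card does not specify a DB schema):

```python
def build_query(schema: dict, table: str = "products"):
    """Turn the filtered JSON into a parameterized SQL SELECT (sketch only)."""
    clauses, args = [], []
    # Each entry under 'params' becomes an equality filter.
    for group in schema.get("params", []):
        for column, value in group.items():
            clauses.append(f"{column} = ?")
            args.append(value)
    # A date range maps onto the 'timestamp' column.
    if "start_time" in schema and "end_time" in schema:
        clauses.append("timestamp BETWEEN ? AND ?")
        args += [schema["start_time"], schema["end_time"]]
    cols = ", ".join(schema.get("variable", [])) or "*"
    where = " AND ".join(clauses) or "1=1"
    return f"SELECT {cols} FROM {table} WHERE {where}", args

sql, args = build_query({'params': [{'brand': 'ZARA', 'rating': '4.5'}]})
print(sql)   # SELECT * FROM products WHERE brand = ? AND rating = ?
print(args)  # ['ZARA', '4.5']
```

Note that some value names (e.g. 'unit price') contain spaces and would need quoting or mapping onto real column names in an actual database.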
## Out-of-Scope Use

The model is partially trained to reject out-of-domain prompts.

Example:

```
prompt = "can you tell me something about JAPAN?"
```

LLM answer:

```
This conversation is out of scope for this LLM and corresponding JSON response will be {'topic': 'out of scope'}
```
## Bias, Risks, and Limitations

This model is fine-tuned for research purposes and is not guaranteed for production use.

If you want only a single value, add a comma (",") after it.

Example:

```
prompt = "get all unit price, for product id 002322827"
```

LLM answer:

```
Since user query includes some parameters as following product id:002322827 can be mapped to product_id:002322827::
user wants to know some values as following ['unit price'].
Corresponding JSON response will be {'params': [{'product_id': '002322827'}], 'variable': ['unit price']}
```
The model is somewhat biased toward the following starting words when querying values:
- "give me report on"
- "report on"
- "I need"
- "get"
- "I want"