---
datasets:
- zero-systems/ColumnMapping.8k.INSTRUCT
---

# StructuredLLM-7b.GGUF

**StructuredLLM** models map a target data object, consisting of a title and example values, to the matching entries in a set of input titles and their example values. A target can map to a single input key (one-to-one) or to several input keys (one-to-many).

## Model Details

### Model Description

- **Model type:** Instruction-finetuned large language model (LLM)
- **Finetuned from model:** [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)

## Inference

`.gguf` models can be run with [llama.cpp](https://github.com/ggerganov/llama.cpp) or its Python bindings, [llama-cpp-python](https://github.com/abetlen/llama-cpp-python). Please follow the instructions in these repositories to get started.

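As a minimal sketch of the llama-cpp-python route, the snippet below loads a GGUF file and runs a single completion. The helper names `load_model` and `complete`, the context size of 4096, and greedy decoding (`temperature=0.0`) are assumptions made for illustration, not settings documented for this model. Usage would look like `llm = load_model("path/to/model.gguf")` followed by `complete(llm, prompt)`, where the path is a placeholder for the file you downloaded.

```python
from typing import Any, Callable, Dict


def load_model(model_path: str, n_ctx: int = 4096):
    """Load a GGUF file with llama-cpp-python (imported lazily)."""
    # Requires: pip install llama-cpp-python
    from llama_cpp import Llama

    # n_ctx is an assumed context size; raise it for very long inputs.
    return Llama(model_path=model_path, n_ctx=n_ctx)


def complete(
    llm: Callable[..., Dict[str, Any]], prompt: str, max_tokens: int = 512
) -> str:
    """Run one completion and return only the generated text."""
    # A Llama instance is callable and returns an OpenAI-style completion dict.
    result = llm(prompt, max_tokens=max_tokens, temperature=0.0)
    return result["choices"][0]["text"].strip()
```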
### Inference Examples

This model is an instruction finetune that uses the Alpaca prompt format (introduced by [stanford-alpaca](https://github.com/tatsu-lab/stanford_alpaca)):

```python
"{system_prompt}\n\n### Instruction:\n{instruction}\n\n### Response: "
```

The model is finetuned to perform the following task:

#### Structured Data Mapping

Expected input:

```
You are a business assistant that specialized in normalizing JSON data into a consistent structure.
You will be given tasks such as assessing which keys in an Input-JSON object map to a given Target-JSON.
The usecase of the information you respond with will be to carry over and transform data in the direction Input-JSON -> Target-JSON.

Keys can be considered valid mappings when either their names indicate the same or a very similar concept, or when their values are a close match in what they represent (salary, dates, IDs, etc.) or have similar formats. Mappings do not need to be an exact match, only have sufficient overlap.
For some cases, multiple keys from the Input-JSON might be required to map to the Target-JSON. This is the case if the value in the Target-JSON can only be arrived at due to the information present under multiple Input-JSON keys.

### Instruction:
Map the following Input-JSON to the given Target-JSON:

Input-JSON:
{
"FormEntryDate_INT": [
"1615785600",
"1615910400",
"1616131200"
],
"SPECIAL INDEX 2": [
"#00010",
"#00321",
"#00543"
],
"RET_BEN_VEST_Y5": [
"Fifty thousand, one dollar and eleven cents",
"Ninety-nine thousand, nine hundred ninety-nine dollars and ninety-nine cents",
"One thousand, two hundred thirty-four dollars and fifty-six cents"
],
"COM_GOV_GRD": [
"R",
"S",
"T"
],
"FSA_Trns_Elct_Sts": [
"ELE",
"STAT",
"WAIV"
],
"Variable Pay Structure ID": [
"VP-018/str",
"VP-019/str",
"VP-020/str"
],
"ParkingSlotID": [
"P10018",
"P10019",
"P10020"
],
"Ds3.DdQr": [
"2,718",
"4,057",
"3,951"
],
"CARRIER DEFINED REPORTING 6": [
"CR6Definition",
"R6ByCarrier",
"CarrierDefinition6"
],
"EE MaritalStatus": [
"Married but Reserved",
"Living Life as a Bachelor",
"Can't Be Tamed"
],
"PredictionMatrix_2030": [
"1.025",
"1.242",
"0.832"
],
"PensionFund[9]": [
"PRM-CHC-FND",
"ELITE-PEN-SCH",
"PLAT-PEN-PLAN"
],
"TOTAL_AMT": [
"$125.75",
"$790.80",
"$975.65"
],
"TrendRate": [
"1.0",
"1.0",
"1.0"
],
"YearlyValueChange": [
"0.01",
"0.04",
"0.01"
],
"EeEducationLevel": [
"BS Diploma",
"MA",
"MS Degree"
],
"retnedEarnsPrcntg": [
"0.90 of Total Earnings",
"0.95 of Total Earnings",
"1.00 of Total Earnings"
]
}

Target-JSON:
{
"PredictionMatrix_2040": [
"0.325",
"0.918",
"0.752"
]
}

### Response:
```

Expected output:

```
{
"reasoning": "1. The target key 'PredictionMatrix_2040' likely refers to a prediction matrix for the year 2040. 2. The values of the target data are floating point numbers, indicating a certain rate or ratio. 3. The 'PredictionMatrix_2030' input key provides similar floating point values, which could be used to calculate or estimate the 'PredictionMatrix_2040' target key. The 'TrendRate' and 'YearlyValueChange' input keys could also potentially factor into calculating the target key's value.",
"mapped_input_keys": [
"PredictionMatrix_2030",
"TrendRate",
"YearlyValueChange"
]
}
```

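Because the response is a JSON object with `reasoning` and `mapped_input_keys` fields, it can be read with the standard library. The following is a sketch only: `parse_mapping` is a hypothetical helper, and discarding keys that are absent from the Input-JSON is an added safeguard, not behavior described by this model card.

```python
import json
from typing import List


def parse_mapping(response_text: str, input_keys: List[str]) -> List[str]:
    """Return the mapped input keys found in the model response."""
    # Use the span from the first '{' to the last '}' to tolerate stray text.
    start = response_text.find("{")
    end = response_text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model response")
    payload = json.loads(response_text[start : end + 1])
    mapped = payload.get("mapped_input_keys", [])
    # Keep only keys that actually exist in the Input-JSON.
    return [key for key in mapped if key in input_keys]
```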
### Training Data

[zero-systems/ColumnMapping.10k.INSTRUCT](https://huggingface.co/datasets/zero-systems/ColumnMapping.10k.INSTRUCT)

#### Training Methodology

StructuredLLM was trained using [QLoRA](https://github.com/artidoro/qlora). The resulting adapter was merged into the base model weights, converted to the GGUF format, and finally quantized to 4 bits.