---
base_model: unsloth/Llama-3.2-3B-Instruct
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
license: apache-2.0
language:
- en
pipeline_tag: text-generation
---

- **Developed by:** betterdataai
- **License:** apache-2.0
- **Finetuned from model:** unsloth/Llama-3.2-3B-Instruct

## Prerequisites

The following packages are required to run inference:

```
unsloth
transformers
pandas
datasets
trl
torch
accelerate
scipy
```

## Model Demonstration

This is a large tabular model that generates tabular data from a user-supplied description of the dataset's columns.

An example prompt looks like this:

```
instruction = """
You are tasked with generating a synthetic dataset based on the following description. The dataset represents network traffic information. The dataset should include the following columns:

- IPV4_SRC_ADDR (String): IPv4 source address, following the standard format (e.g., ""59.166.0.6"", ""149.171.126.0"",""175.45.176.2"").
- L4_SRC_PORT (Integer): IPv4 source port number, a value between 1024 and 65535 (e.g., 443).
- IPV4_DST_ADDR (String): IPv4 destination address, following the standard format (e.g., ""149.171.126.6"").
- L4_DST_PORT (Integer): IPv4 destination port number, a value between 1024 and 65535 (e.g., 80).
- PROTOCOL (Integer): IP protocol identifier byte, representing the protocol used (e.g., 6 for TCP or 17 for UDP).
- L7_PROTO (Integer): Layer 7 protocol (numeric), indicating the application protocol, ranging from 0 to 249 (e.g., 1 for HTTP, 2 for HTTPS).
- IN_BYTES (Integer): Incoming number of bytes, representing the data transferred into the network, ranging from 0 to 10,000,000 (e.g., 1500).
- OUT_BYTES (Integer): Outgoing number of bytes, representing the data transferred out of the network, ranging from 0 to 10,000,000 (e.g., 2000).
- IN_PKTS (Integer): Incoming number of packets, representing the count of packets entering the network (e.g., 120).
- OUT_PKTS (Integer): Outgoing number of packets, representing the count of packets leaving the network (e.g., 110).
- TCP_FLAGS (Integer): Cumulative of all TCP flags (e.g., 27, 0, 19, 18 ).
- FLOW_DURATION_MILLISECONDS (Integer): Flow duration in milliseconds, indicating how long the flow lasted (e.g., 15000).
- Label (Integer): Label for indicating malicious attack or not (e.g., 0 for benign traffic or 1 for attack)
"""
```
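
Because the prompt is a fixed preamble followed by one bullet per column, it can also be assembled programmatically. The sketch below shows one possible way to do that. The `build_instruction` helper and the `(name, type, description)` tuple layout are hypothetical conveniences introduced here for illustration, not part of the model or of any library, and how the model behaves on column descriptions that differ from the example prompt is an open assumption.

```python
def build_instruction(dataset_summary, columns):
    """Assemble a system prompt in the same layout as the example prompt.

    `columns` is a list of (name, type, description) tuples. This layout is
    a hypothetical convention chosen for this sketch.
    """
    header = (
        "You are tasked with generating a synthetic dataset based on the "
        "following description. "
        f"{dataset_summary} "
        "The dataset should include the following columns:"
    )
    # One "- NAME (Type): description" bullet per column
    bullets = [f"- {name} ({dtype}): {desc}" for name, dtype, desc in columns]
    return "\n" + header + "\n\n" + "\n".join(bullets) + "\n"


columns = [
    ("PROTOCOL", "Integer",
     "IP protocol identifier byte, representing the protocol used "
     "(e.g., 6 for TCP or 17 for UDP)."),
    ("Label", "Integer",
     "Label for indicating malicious attack or not "
     "(e.g., 0 for benign traffic or 1 for attack)"),
]
instruction = build_instruction(
    "The dataset represents network traffic information.", columns
)
print(instruction)
```

Keeping the column definitions in a list makes it easier to reuse the same names later, for example when checking the generated rows.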

The following code loads the model and generates tabular data from this prompt:

```
from unsloth import FastLanguageModel
from transformers import TextStreamer

max_seq_length = 2048
dtype = None  # None lets Unsloth auto-detect the dtype
load_in_4bit = False

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "betterdataai/large-tabular-model",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model)  # Enable Unsloth's inference mode

# `instruction` is the prompt string defined in the previous block
messages = [{"role": "system", "content": instruction},
            {"role": "user", "content": "Create 20 rows of data"}]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")


# Stream the generated tokens to stdout as they are produced
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 2048,
                   use_cache = True, temperature = 1.5, min_p = 0.1)
```

The output looks like this:

```
IPV4_SRC_ADDR,L4_SRC_PORT,IPV4_DST_ADDR,L4_DST_PORT,PROTOCOL,L7_PROTO,IN_BYTES,OUT_BYTES,IN_PKTS,OUT_PKTS,TCP_FLAGS,FLOW_DURATION_MILLISECONDS,Label
175.45.176.3,65502,149.171.126.11,80,6,7.0,800,1338,10,10,27,1429,0
59.166.0.2,51487,149.171.126.3,80,6,7.0,1580,10168,12,18,27,0,0
59.166.0.0,13943,149.171.126.0,11862,6,36.0,2334,16822,36,38,27,9,0
59.166.0.7,40294,149.171.126.7,21,6,1.0,2934,3740,52,54,27,844,0
59.166.0.9,63416,149.171.126.5,21,6,1.0,2934,3742,52,54,27,0,0
175.45.176.2,0,149.171.126.17,0,45,0.0,200,0,2,0,0,0,1
175.45.176.3,64403,149.171.126.14,179,6,13.0,472,336,10,8,19,538,0
59.166.0.8,39142,149.171.126.3,53,17,5.0,130,162,2,2,0,1,0
59.166.0.3,60342,149.171.126.4,25,6,3.0,37868,3380,54,42,27,35,0
59.166.0.3,40433,149.171.126.5,5190,6,0.0,2158,2464,24,24,27,6,0
59.166.0.0,21116,149.171.126.5,53,17,5.0,130,162,2,2,0,0,0
175.45.176.1,0,149.171.126.17,0,23,0.0,200,0,2,0,0,0,1
59.166.0.5,27940,149.171.126.2,21,6,1.0,2934,3738,52,54,27,4294952,0
59.166.0.2,14905,149.171.126.1,22,6,92.0,3728,5474,32,24,27,0,0
175.45.176.1,0,149.171.126.10,0,33,0.0,200,0,2,0,0,0,1
59.166.0.3,37986,149.171.126.0,5190,6,0.0,1470,1728,22,14,27,4,0
59.166.0.1,49949,149.171.126.7,80,6,7.0,1580,10168,12,18,27,4294952,0
59.166.0.2,51911,149.171.126.6,53,17,0.0,146,178,2,2,0,0,0
59.166.0.1,17727,149.171.126.9,5190,6,0.0,2158,2464,24,24,27,7,0
59.166.0.3,56144,149.171.126.0,5190,6,0.0,1470,1728,22,14,27,0,0<|eot_id|>
```
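
The generated text is plain CSV followed by the `<|eot_id|>` end-of-turn marker, so it needs light cleanup before it can be used as a table. The following is a minimal sketch of that step using only the Python standard library. The function names `parse_generated_csv` and `row_problems` are hypothetical, the sample string copies two rows from the output above, and the particular checks (valid IPv4 addresses, `Label` in {0, 1}, byte counts within the range stated in the prompt) are illustrative choices rather than a validation scheme prescribed by the model authors.

```python
import csv
import io
import ipaddress

EOT_TOKEN = "<|eot_id|>"  # end-of-turn marker seen at the end of the output


def parse_generated_csv(text):
    """Strip the end-of-turn marker and parse the CSV text into a list of dicts."""
    cleaned = text.replace(EOT_TOKEN, "").strip()
    return list(csv.DictReader(io.StringIO(cleaned)))


def row_problems(row):
    """Return a list of human-readable problems found in one parsed row."""
    problems = []
    for col in ("IPV4_SRC_ADDR", "IPV4_DST_ADDR"):
        value = row.get(col) or ""
        try:
            ipaddress.IPv4Address(value)
        except ValueError:
            problems.append(f"{col} is not a valid IPv4 address: {value!r}")
    label = row.get("Label") or ""
    if label not in ("0", "1"):
        problems.append(f"Label must be 0 or 1, got {label!r}")
    for col in ("IN_BYTES", "OUT_BYTES"):
        value = row.get(col) or ""
        # The prompt describes these columns as ranging from 0 to 10,000,000
        if not (value.isdigit() and int(value) <= 10_000_000):
            problems.append(f"{col} is out of range or not an integer: {value!r}")
    return problems


# Header plus two rows copied from the sample output
sample = (
    "IPV4_SRC_ADDR,L4_SRC_PORT,IPV4_DST_ADDR,L4_DST_PORT,PROTOCOL,L7_PROTO,"
    "IN_BYTES,OUT_BYTES,IN_PKTS,OUT_PKTS,TCP_FLAGS,FLOW_DURATION_MILLISECONDS,Label\n"
    "175.45.176.3,65502,149.171.126.11,80,6,7.0,800,1338,10,10,27,1429,0\n"
    "59.166.0.3,56144,149.171.126.0,5190,6,0.0,1470,1728,22,14,27,0,0<|eot_id|>\n"
)

rows = parse_generated_csv(sample)
for row in rows:
    print(row["IPV4_SRC_ADDR"], row_problems(row))
```

If pandas is preferred, the cleaned text can be passed to `pandas.read_csv` through `io.StringIO` in the same way. Note that the sample output renders `L7_PROTO` as `7.0` although the prompt describes it as an Integer, so a type cast may be needed for that column.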