Update README.md
Update description for the model usage
README.md
CHANGED
|
@@ -11,12 +11,97 @@ language:
|
|
| 11 |
- en
|
| 12 |
---
|
| 13 |
|
| 14 |
-
# Uploaded model
|
| 15 |
-
|
| 16 |
- **Developed by:** betterdataai
|
| 17 |
- **License:** apache-2.0
|
| 18 |
- **Finetuned from model :** unsloth/Llama-3.2-3B-Instruct
|
| 19 |
|
| 20 |
-
This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
|
| 21 |
|
| 22 |
-
| 11 |
- en
|
| 12 |
---
|
| 13 |
|
|
|
|
|
|
|
| 14 |
- **Developed by:** betterdataai
|
| 15 |
- **License:** apache-2.0
|
| 16 |
- **Finetuned from model :** unsloth/Llama-3.2-3B-Instruct
|
| 17 |
|
|
|
|
| 18 |
|
| 19 |
+
## Prerequisites
|
| 20 |
+
The following packages are required for inference:
|
| 21 |
+
```
|
| 22 |
+
unsloth
|
| 23 |
+
transformers
|
| 24 |
+
pandas
|
| 25 |
+
datasets
|
| 26 |
+
trl
|
| 27 |
+
torch
|
| 28 |
+
accelerate
|
| 29 |
+
scipy
|
| 30 |
+
```
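Before loading the model, it can help to confirm that these packages are present in the active environment. The snippet below is a minimal sketch using only the Python standard library; the helper name `missing_packages` is illustrative, and it assumes each package is importable under the same name as listed above.

```python
import importlib.util

# Package names copied from the list above; each one is assumed to be
# importable under the same name it is installed with.
REQUIRED = ["unsloth", "transformers", "pandas", "datasets",
            "trl", "torch", "accelerate", "scipy"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing packages:", missing_packages(REQUIRED) or "none")
```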
|
| 31 |
+
## Model Demonstration
|
| 32 |
+
|
| 33 |
+
This is a large tabular model that generates synthetic tabular data from a user-supplied description of the dataset and its columns.
|
| 34 |
+
|
| 35 |
+
The example prompt looks like this:
|
| 36 |
+
```
|
| 37 |
+
instruction = """ You are tasked with generating a synthetic dataset based on the following description. The dataset represents network traffic information. The dataset should include the following columns:
|
| 38 |
+
|
| 39 |
+
- IPV4_SRC_ADDR (String): IPv4 source address, following the standard format (e.g., "59.166.0.6", "149.171.126.0","175.45.176.2").
|
| 40 |
+
- L4_SRC_PORT (Integer): IPv4 source port number, a value between 1024 and 65535 (e.g., 443).
|
| 41 |
+
- IPV4_DST_ADDR (String): IPv4 destination address, following the standard format (e.g., "149.171.126.6").
|
| 42 |
+
- L4_DST_PORT (Integer): IPv4 destination port number, a value between 1024 and 65535 (e.g., 80).
|
| 43 |
+
- PROTOCOL (Integer): IP protocol identifier byte, representing the protocol used (e.g., 6 for TCP or 17 for UDP).
|
| 44 |
+
- L7_PROTO (Integer): Layer 7 protocol (numeric), indicating the application protocol, ranging from 0 to 249 (e.g., 1 for HTTP, 2 for HTTPS).
|
| 45 |
+
"""
|
| 46 |
+
```
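For other datasets, a prompt with the same layout could be assembled from a list of column specifications rather than written by hand. The sketch below is one possible way to do that; `build_instruction` is a hypothetical helper, and the assumption that the model accepts any prompt following this layout rests only on the single example above.

```python
def build_instruction(dataset_description, columns):
    """Assemble a prompt in the same layout as the example above.

    `columns` is a list of (name, type, description) tuples. This helper is
    hypothetical; the exact wording the model expects is an assumption
    based on the single example prompt shown in this README.
    """
    lines = [
        "You are tasked with generating a synthetic dataset based on the "
        f"following description. {dataset_description} "
        "The dataset should include the following columns:",
        "",
    ]
    for name, col_type, description in columns:
        lines.append(f"- {name} ({col_type}): {description}")
    return "\n".join(lines) + "\n"


instruction = build_instruction(
    "The dataset represents network traffic information.",
    [
        ("L4_SRC_PORT", "Integer",
         "IPv4 source port number, a value between 1024 and 65535 (e.g., 443)."),
        ("PROTOCOL", "Integer",
         "IP protocol identifier byte, representing the protocol used "
         "(e.g., 6 for TCP or 17 for UDP)."),
    ],
)
print(instruction)
```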
|
| 47 |
+
|
| 48 |
+
The following code generates tabular data from this prompt:
|
| 49 |
+
|
| 50 |
+
```
|
| 51 |
+
from unsloth import FastLanguageModel
|
| 52 |
+
from transformers import TextStreamer
|
| 53 |
+
|
| 54 |
+
max_seq_length = 2048  # maximum sequence length in tokens
|
| 55 |
+
dtype = None  # None lets Unsloth pick the dtype automatically
|
| 56 |
+
load_in_4bit = False  # set to True to load with 4-bit quantization and use less memory
|
| 57 |
+
|
| 58 |
+
model, tokenizer = FastLanguageModel.from_pretrained(
|
| 59 |
+
    model_name = "betterdataai/large-tabular-model",
|
| 60 |
+
    max_seq_length = max_seq_length,
|
| 61 |
+
    dtype = dtype,
|
| 62 |
+
    load_in_4bit = load_in_4bit,
|
| 63 |
+
)
|
| 64 |
+
FastLanguageModel.for_inference(model)  # switch the model to Unsloth's inference mode
|
| 65 |
+
|
| 66 |
+
messages = [{"role": "system", "content": instruction},
|
| 67 |
+
            {"role": "user", "content": "Create 20 rows data"}]
|
| 68 |
+
|
| 69 |
+
inputs = tokenizer.apply_chat_template(
|
| 70 |
+
    messages,
|
| 71 |
+
    tokenize = True,
|
| 72 |
+
    add_generation_prompt = True, # Must add for generation
|
| 73 |
+
    return_tensors = "pt",
|
| 74 |
+
).to("cuda")  # assumes a CUDA-capable GPU is available
|
| 75 |
+
|
| 76 |
+
|
| 77 |
+
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
|
| 78 |
+
_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 2048,
|
| 79 |
+
                   use_cache = True, temperature = 1.5, min_p = 0.1)
|
| 80 |
+
```
|
| 81 |
+
|
| 82 |
+
The output looks like this:
|
| 83 |
+
|
| 84 |
+
```
|
| 85 |
+
IPV4_SRC_ADDR,L4_SRC_PORT,IPV4_DST_ADDR,L4_DST_PORT,PROTOCOL,L7_PROTO,IN_BYTES,OUT_BYTES,IN_PKTS,OUT_PKTS,TCP_FLAGS,FLOW_DURATION_MILLISECONDS,Label
|
| 86 |
+
175.45.176.3,65502,149.171.126.11,80,6,7.0,800,1338,10,10,27,1429,0
|
| 87 |
+
59.166.0.2,51487,149.171.126.3,80,6,7.0,1580,10168,12,18,27,0,0
|
| 88 |
+
59.166.0.0,13943,149.171.126.0,11862,6,36.0,2334,16822,36,38,27,9,0
|
| 89 |
+
59.166.0.7,40294,149.171.126.7,21,6,1.0,2934,3740,52,54,27,844,0
|
| 90 |
+
59.166.0.9,63416,149.171.126.5,21,6,1.0,2934,3742,52,54,27,0,0
|
| 91 |
+
175.45.176.2,0,149.171.126.17,0,45,0.0,200,0,2,0,0,0,1
|
| 92 |
+
175.45.176.3,64403,149.171.126.14,179,6,13.0,472,336,10,8,19,538,0
|
| 93 |
+
59.166.0.8,39142,149.171.126.3,53,17,5.0,130,162,2,2,0,1,0
|
| 94 |
+
59.166.0.3,60342,149.171.126.4,25,6,3.0,37868,3380,54,42,27,35,0
|
| 95 |
+
59.166.0.3,40433,149.171.126.5,5190,6,0.0,2158,2464,24,24,27,6,0
|
| 96 |
+
59.166.0.0,21116,149.171.126.5,53,17,5.0,130,162,2,2,0,0,0
|
| 97 |
+
175.45.176.1,0,149.171.126.17,0,23,0.0,200,0,2,0,0,0,1
|
| 98 |
+
59.166.0.5,27940,149.171.126.2,21,6,1.0,2934,3738,52,54,27,4294952,0
|
| 99 |
+
59.166.0.2,14905,149.171.126.1,22,6,92.0,3728,5474,32,24,27,0,0
|
| 100 |
+
175.45.176.1,0,149.171.126.10,0,33,0.0,200,0,2,0,0,0,1
|
| 101 |
+
59.166.0.3,37986,149.171.126.0,5190,6,0.0,1470,1728,22,14,27,4,0
|
| 102 |
+
59.166.0.1,49949,149.171.126.7,80,6,7.0,1580,10168,12,18,27,4294952,0
|
| 103 |
+
59.166.0.2,51911,149.171.126.6,53,17,0.0,146,178,2,2,0,0,0
|
| 104 |
+
59.166.0.1,17727,149.171.126.9,5190,6,0.0,2158,2464,24,24,27,7,0
|
| 105 |
+
59.166.0.3,56144,149.171.126.0,5190,6,0.0,1470,1728,22,14,27,0,0<|eot_id|>
|
| 106 |
+
```
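Because the generated text is plain CSV followed by the `<|eot_id|>` end-of-turn token, it can be loaded into a table for further checks. The sketch below shows one possible way to do that with the Python standard library. The helper name `parse_generated_csv` is illustrative and not part of the model's API, and it assumes the generated text has already been captured as a string (the generation code above only streams it to stdout), for example by decoding the return value of `model.generate` with the tokenizer.

```python
import csv
import io

EOT_TOKEN = "<|eot_id|>"  # end-of-turn marker seen at the end of the output above


def parse_generated_csv(text):
    """Parse model output (CSV with a header row) into a list of dicts.

    Hypothetical helper: rows whose field count differs from the header
    are dropped, since generated text is not guaranteed to be well formed.
    """
    cleaned = text.replace(EOT_TOKEN, "").strip()
    reader = csv.reader(io.StringIO(cleaned))
    header = next(reader, None)
    if header is None:  # nothing was generated
        return []
    return [dict(zip(header, row)) for row in reader if len(row) == len(header)]


# Two rows copied from the example output above.
sample = (
    "IPV4_SRC_ADDR,L4_SRC_PORT,IPV4_DST_ADDR,L4_DST_PORT,PROTOCOL,L7_PROTO,"
    "IN_BYTES,OUT_BYTES,IN_PKTS,OUT_PKTS,TCP_FLAGS,FLOW_DURATION_MILLISECONDS,Label\n"
    "175.45.176.3,65502,149.171.126.11,80,6,7.0,800,1338,10,10,27,1429,0\n"
    "59.166.0.3,56144,149.171.126.0,5190,6,0.0,1470,1728,22,14,27,0,0<|eot_id|>"
)

rows = parse_generated_csv(sample)
print(len(rows), rows[0]["IPV4_SRC_ADDR"])
```

All values are returned as strings; converting columns such as `L4_SRC_PORT` to integers, or checking them against the ranges stated in the prompt, is left to the caller.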
|
| 107 |
+
|