betterdataai committed on
Commit 00df947 · verified · 1 Parent(s): a04cd57

Update README.md

Update description for the model usage

Files changed (1)
  1. README.md +89 -4
README.md CHANGED
@@ -11,12 +11,97 @@ language:
  - en
  ---

- # Uploaded model
-
  - **Developed by:** betterdataai
  - **License:** apache-2.0
  - **Finetuned from model :** unsloth/Llama-3.2-3B-Instruct

- This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+ ## Prerequisites
+ The following packages are required for inference:
+ ```
+ unsloth
+ transformers
+ pandas
+ datasets
+ trl
+ torch
+ accelerate
+ scipy
+ ```
+ ## Model Demonstration
+
+ This is a large tabular model that generates tabular data from a user-supplied description of the data columns.
+
+ An example prompt looks like this:
+ ```
+ instruction = """ You are tasked with generating a synthetic dataset based on the following description. The dataset represents network traffic information. The dataset should include the following columns:
+
+ - IPV4_SRC_ADDR (String): IPv4 source address, following the standard format (e.g., "59.166.0.6", "149.171.126.0","175.45.176.2").
+ - L4_SRC_PORT (Integer): IPv4 source port number, a value between 1024 and 65535 (e.g., 443).
+ - IPV4_DST_ADDR (String): IPv4 destination address, following the standard format (e.g., "149.171.126.6").
+ - L4_DST_PORT (Integer): IPv4 destination port number, a value between 1024 and 65535 (e.g., 80).
+ - PROTOCOL (Integer): IP protocol identifier byte, representing the protocol used (e.g., 6 for TCP or 17 for UDP).
+ - L7_PROTO (Integer): Layer 7 protocol (numeric), indicating the application protocol, ranging from 0 to 249 (e.g., 1 for HTTP, 2 for HTTPS).
+ """
+ ```
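
The prompt above follows a regular layout: a fixed preamble, a one-line topic, and one bullet per column in the form `- NAME (Type): description`. A minimal sketch of assembling such a prompt from a column list is shown below. `Column` and `build_instruction` are hypothetical helpers introduced only for illustration; they are not part of the model repository, and how well the model handles schemas other than the one shown here is an assumption to verify.

```python
from typing import NamedTuple


class Column(NamedTuple):
    """One column of the requested table (hypothetical helper type)."""
    name: str         # column header, e.g. "L4_SRC_PORT"
    dtype: str        # type label shown in parentheses, e.g. "Integer"
    description: str  # free-text description, including example values


def build_instruction(topic: str, columns: list[Column]) -> str:
    """Assemble a system prompt in the same layout as the example prompt."""
    header = (
        "You are tasked with generating a synthetic dataset based on the "
        f"following description. The dataset represents {topic}. "
        "The dataset should include the following columns:"
    )
    bullets = [f"- {c.name} ({c.dtype}): {c.description}" for c in columns]
    return header + "\n\n" + "\n".join(bullets) + "\n"


# Rebuild two of the column bullets from the example prompt.
instruction = build_instruction(
    "network traffic information",
    [
        Column("L4_SRC_PORT", "Integer",
               "IPv4 source port number, a value between 1024 and 65535 (e.g., 443)."),
        Column("PROTOCOL", "Integer",
               "IP protocol identifier byte, representing the protocol used "
               "(e.g., 6 for TCP or 17 for UDP)."),
    ],
)
print(instruction)
```

Keeping the column definitions in a list makes it easier to reuse the same schema later, for example when checking the generated rows.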
+
+ With the following code, we can generate tabular data:
+
+ ```
+ from unsloth import FastLanguageModel
+ from transformers import TextStreamer
+
+ max_seq_length = 2048  # maximum sequence length in tokens
+ dtype = None           # None lets Unsloth auto-detect the dtype
+ load_in_4bit = False   # set to True to load 4-bit quantized weights
+
+ model, tokenizer = FastLanguageModel.from_pretrained(
+     model_name = "betterdataai/large-tabular-model",
+     max_seq_length = max_seq_length,
+     dtype = dtype,
+     load_in_4bit = load_in_4bit,
+ )
+ FastLanguageModel.for_inference(model)  # switch the model to inference mode
+
+ # `instruction` is the prompt string defined in the example prompt above.
+ messages = [{"role": "system", "content": instruction},
+             {"role": "user", "content": "Create 20 rows data"}]
+
+ inputs = tokenizer.apply_chat_template(
+     messages,
+     tokenize = True,
+     add_generation_prompt = True, # Must add for generation
+     return_tensors = "pt",
+ ).to("cuda")
+
+ # Stream the generated tokens to stdout as they are produced.
+ text_streamer = TextStreamer(tokenizer, skip_prompt = True)
+ _ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 2048,
+                    use_cache = True, temperature = 1.5, min_p = 0.1)
+ ```
+
+ The output looks like this:
+
+ ```
+ IPV4_SRC_ADDR,L4_SRC_PORT,IPV4_DST_ADDR,L4_DST_PORT,PROTOCOL,L7_PROTO,IN_BYTES,OUT_BYTES,IN_PKTS,OUT_PKTS,TCP_FLAGS,FLOW_DURATION_MILLISECONDS,Label
+ 175.45.176.3,65502,149.171.126.11,80,6,7.0,800,1338,10,10,27,1429,0
+ 59.166.0.2,51487,149.171.126.3,80,6,7.0,1580,10168,12,18,27,0,0
+ 59.166.0.0,13943,149.171.126.0,11862,6,36.0,2334,16822,36,38,27,9,0
+ 59.166.0.7,40294,149.171.126.7,21,6,1.0,2934,3740,52,54,27,844,0
+ 59.166.0.9,63416,149.171.126.5,21,6,1.0,2934,3742,52,54,27,0,0
+ 175.45.176.2,0,149.171.126.17,0,45,0.0,200,0,2,0,0,0,1
+ 175.45.176.3,64403,149.171.126.14,179,6,13.0,472,336,10,8,19,538,0
+ 59.166.0.8,39142,149.171.126.3,53,17,5.0,130,162,2,2,0,1,0
+ 59.166.0.3,60342,149.171.126.4,25,6,3.0,37868,3380,54,42,27,35,0
+ 59.166.0.3,40433,149.171.126.5,5190,6,0.0,2158,2464,24,24,27,6,0
+ 59.166.0.0,21116,149.171.126.5,53,17,5.0,130,162,2,2,0,0,0
+ 175.45.176.1,0,149.171.126.17,0,23,0.0,200,0,2,0,0,0,1
+ 59.166.0.5,27940,149.171.126.2,21,6,1.0,2934,3738,52,54,27,4294952,0
+ 59.166.0.2,14905,149.171.126.1,22,6,92.0,3728,5474,32,24,27,0,0
+ 175.45.176.1,0,149.171.126.10,0,33,0.0,200,0,2,0,0,0,1
+ 59.166.0.3,37986,149.171.126.0,5190,6,0.0,1470,1728,22,14,27,4,0
+ 59.166.0.1,49949,149.171.126.7,80,6,7.0,1580,10168,12,18,27,4294952,0
+ 59.166.0.2,51911,149.171.126.6,53,17,0.0,146,178,2,2,0,0,0
+ 59.166.0.1,17727,149.171.126.9,5190,6,0.0,2158,2464,24,24,27,7,0
+ 59.166.0.3,56144,149.171.126.0,5190,6,0.0,1470,1728,22,14,27,0,0<|eot_id|>
+ ```
+
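
The sample output is comma-separated text ending in the `<|eot_id|>` end-of-turn token, so it needs light cleanup before it can be used as a table. The sketch below assumes the generated text has already been captured as a Python string (the streaming call above only prints it). `parse_generated_csv` and `is_plausible` are hypothetical helper names, and the checks are illustrative rather than a full validation of the schema. It uses only the standard library; passing the same cleaned string to `pandas.read_csv` via `io.StringIO` should work equally well.

```python
import csv
import io
import ipaddress

EOT_TOKEN = "<|eot_id|>"  # end-of-turn marker seen at the end of the sample output


def parse_generated_csv(generated_text):
    """Strip the end-of-turn token and parse the text into a list of row dicts."""
    cleaned = generated_text.replace(EOT_TOKEN, "").strip()
    return list(csv.DictReader(io.StringIO(cleaned)))


def is_plausible(row):
    """Check that both addresses are valid IPv4 and both ports fit in 16 bits."""
    try:
        ipaddress.IPv4Address(row["IPV4_SRC_ADDR"])
        ipaddress.IPv4Address(row["IPV4_DST_ADDR"])
        ports = (int(row["L4_SRC_PORT"]), int(row["L4_DST_PORT"]))
    except (KeyError, ValueError, TypeError):
        return False
    return all(0 <= port <= 65535 for port in ports)


# Header and two rows copied from the sample output.
sample = (
    "IPV4_SRC_ADDR,L4_SRC_PORT,IPV4_DST_ADDR,L4_DST_PORT,PROTOCOL,L7_PROTO,"
    "IN_BYTES,OUT_BYTES,IN_PKTS,OUT_PKTS,TCP_FLAGS,FLOW_DURATION_MILLISECONDS,Label\n"
    "175.45.176.3,65502,149.171.126.11,80,6,7.0,800,1338,10,10,27,1429,0\n"
    "59.166.0.2,51487,149.171.126.3,80,6,7.0,1580,10168,12,18,27,0,0<|eot_id|>"
)

rows = parse_generated_csv(sample)
valid_rows = [row for row in rows if is_plausible(row)]
print(len(rows), len(valid_rows))
```

Because the model samples at a high temperature, filtering rows with a check like `is_plausible` before downstream use is a cheap safeguard against occasional malformed lines.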