Enhance model card with pipeline tag, library, and paper link

#2
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +32 -15
README.md CHANGED
@@ -3,8 +3,16 @@ license: mit
  tags:
  - decompile
  - binary
  ---

  ### 1. Introduction of LLM4Decompile

  LLM4Decompile aims to decompile x86 assembly instructions into C. The newly released V1.5 series is trained with a larger dataset (15B tokens) and a maximum token length of 4,096, and delivers remarkable performance (up to a 100% improvement) compared to the previous model.
@@ -15,12 +23,12 @@ LLM4Decompile aims to decompile x86 assembly instructions into C. The newly rele
  ### 2. Evaluation Results

  | Model/Benchmark | HumanEval-Decompile | | | | | ExeBench | | | | |
- |:----------------------:|:-------------------:|:-------:|:-------:|:-------:|:-------:|:--------:|:-------:|:-------:|:-------:|:-------:|
- | Optimization Level | O0 | O1 | O2 | O3 | AVG | O0 | O1 | O2 | O3 | AVG |
- | DeepSeek-Coder-6.7B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0000 |
- | GPT-4o | 0.3049 | 0.1159 | 0.1037 | 0.1159 | 0.1601 | 0.0443 | 0.0328 | 0.0397 | 0.0343 | 0.0378 |
- | LLM4Decompile-End-1.3B | 0.4720 | 0.2061 | 0.2122 | 0.2024 | 0.2732 | 0.1786 | 0.1362 | 0.1320 | 0.1328 | 0.1449 |
- | LLM4Decompile-End-6.7B | 0.6805 | 0.3951 | 0.3671 | 0.3720 | 0.4537 | 0.2289 | 0.1660 | 0.1618 | 0.1625 | 0.1798 |
  | LLM4Decompile-End-33B | 0.5168 | 0.2956 | 0.2815 | 0.2675 | 0.3404 | 0.1886 | 0.1465 | 0.1396 | 0.1411 | 0.1540 |
  ### 3. How to Use

@@ -47,9 +55,12 @@ for opt_state in OPT:
      asm= f.read()
      if '<'+'func0'+'>:' not in asm: #IMPORTANT replace func0 with the function name
          raise ValueError("compile fails")
-     asm = '<'+'func0'+'>:' + asm.split('<'+'func0'+'>:')[-1].split('\n\n')[0] #IMPORTANT replace func0 with the function name
      asm_clean = ""
-     asm_sp = asm.split("\n")
      for tmp in asm_sp:
          if len(tmp.split("\t"))<3 and '00' in tmp:
              continue
@@ -58,10 +69,14 @@ for opt_state in OPT:
          )
      tmp_asm = "\t".join(tmp.split("\t")[idx:]) # remove the binary code
      tmp_asm = tmp_asm.split("#")[0].strip() # remove the comments
-     asm_clean += tmp_asm + "\n"
      input_asm = asm_clean.strip()
-     before = f"# This is the assembly code:\n"#prompt
-     after = "\n# What is the source code?\n"#prompt
      input_asm_prompt = before+input_asm.strip()+after
      with open(fileName +'_' + opt_state +'.asm','w',encoding='utf-8') as f:
          f.write(input_asm_prompt)
@@ -86,13 +101,15 @@ c_func_decompile = tokenizer.decode(outputs[0][len(inputs[0]):-1])
  with open(fileName +'.c','r') as f:#original file
      func = f.read()

- print(f'original function:\n{func}')# Note we only decompile one function, where the original file may contain multiple functions
- print(f'decompiled function:\n{c_func_decompile}')
  ```

  ### 4. License
- This code repository is licensed under the MIT License.

  ### 5. Contact

- If you have any questions, please raise an issue.
 
  tags:
  - decompile
  - binary
+ pipeline_tag: text-generation
+ library_name: transformers
  ---

+ # LLM4Decompile-6.7b-v1.5
+
+ This model, `LLM4Decompile-6.7b-v1.5`, is part of the LLM4Decompile family, introduced in the paper [Decompile-Bench: Million-Scale Binary-Source Function Pairs for Real-World Binary Decompilation](https://huggingface.co/papers/2505.12668).
+
+ **[📚 Paper](https://huggingface.co/papers/2505.12668)** | **[💻 Code](https://github.com/albertan017/LLM4Decompile)**
+
  ### 1. Introduction of LLM4Decompile

  LLM4Decompile aims to decompile x86 assembly instructions into C. The newly released V1.5 series is trained with a larger dataset (15B tokens) and a maximum token length of 4,096, and delivers remarkable performance (up to a 100% improvement) compared to the previous model.
 
  ### 2. Evaluation Results

  | Model/Benchmark | HumanEval-Decompile | | | | | ExeBench | | | | |
+ |:----------------------:|:-------------------:|:-------:|:-------:|:-------:|:-------:|:--------:|:-------:|:-------:|:-------:|:-------:|
+ | Optimization Level | O0 | O1 | O2 | O3 | AVG | O0 | O1 | O2 | O3 | AVG |
+ | DeepSeek-Coder-6.7B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0000 |
+ | GPT-4o | 0.3049 | 0.1159 | 0.1037 | 0.1159 | 0.1601 | 0.0443 | 0.0328 | 0.0397 | 0.0343 | 0.0378 |
+ | LLM4Decompile-End-1.3B | 0.4720 | 0.2061 | 0.2122 | 0.2024 | 0.2732 | 0.1786 | 0.1362 | 0.1320 | 0.1328 | 0.1449 |
+ | LLM4Decompile-End-6.7B | 0.6805 | 0.3951 | 0.3671 | 0.3720 | 0.4537 | 0.2289 | 0.1660 | 0.1618 | 0.1625 | 0.1798 |
  | LLM4Decompile-End-33B | 0.5168 | 0.2956 | 0.2815 | 0.2675 | 0.3404 | 0.1886 | 0.1465 | 0.1396 | 0.1411 | 0.1540 |
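
As a quick sanity check (not part of the card's own text), the AVG column appears to be the plain arithmetic mean of the four optimization-level re-executability rates; for example, the LLM4Decompile-End-6.7B row on HumanEval-Decompile:

```python
# Re-executability rates for LLM4Decompile-End-6.7B on HumanEval-Decompile (O0-O3)
rates = [0.6805, 0.3951, 0.3671, 0.3720]
avg = sum(rates) / len(rates)
print(round(avg, 4))  # 0.4537, matching the reported AVG
```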
  ### 3. How to Use

      asm= f.read()
      if '<'+'func0'+'>:' not in asm: #IMPORTANT replace func0 with the function name
          raise ValueError("compile fails")
+     asm = '<'+'func0'+'>:' + asm.split('<'+'func0'+'>:')[-1].split('\n\n')[0] #IMPORTANT replace func0 with the function name
      asm_clean = ""
+     asm_sp = asm.split("\n")
      for tmp in asm_sp:
          if len(tmp.split("\t"))<3 and '00' in tmp:
              continue

          )
      tmp_asm = "\t".join(tmp.split("\t")[idx:]) # remove the binary code
      tmp_asm = tmp_asm.split("#")[0].strip() # remove the comments
+     asm_clean += tmp_asm + "\n"
      input_asm = asm_clean.strip()
+     before = f"# This is the assembly code:\n"#prompt
+     after = "\n# What is the source code?\n"#prompt
      input_asm_prompt = before+input_asm.strip()+after
      with open(fileName +'_' + opt_state +'.asm','w',encoding='utf-8') as f:
          f.write(input_asm_prompt)
 
  with open(fileName +'.c','r') as f:#original file
      func = f.read()

+ print(f'original function:\n{func}')# Note we only decompile one function, where the original file may contain multiple functions
+ print(f'decompiled function:\n{c_func_decompile}')
  ```
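
The preprocessing shown in the hunks above (isolating one function from the objdump output, dropping raw byte lines, stripping the address/byte columns and comments, then wrapping the result in the prompt template) can be sketched as two standalone helpers. This is an illustrative refactor, not code from the card; the names `clean_asm` and `build_prompt` and the sample disassembly are hypothetical:

```python
def clean_asm(asm: str, func_name: str = "func0") -> str:
    """Isolate one function from objdump output and strip addresses, bytes, comments."""
    marker = "<" + func_name + ">:"
    if marker not in asm:
        raise ValueError("compile fails")
    # keep only the target function's block; objdump separates functions with blank lines
    asm = marker + asm.split(marker)[-1].split("\n\n")[0]
    cleaned = []
    for line in asm.split("\n"):
        fields = line.split("\t")
        if len(fields) < 3 and "00" in line:
            continue  # raw byte continuation line with no mnemonic column
        idx = min(len(fields) - 1, 2)
        instr = "\t".join(fields[idx:]).split("#")[0].strip()  # drop address/bytes, comments
        cleaned.append(instr)
    return "\n".join(cleaned).strip()

def build_prompt(asm_clean: str) -> str:
    """Wrap cleaned assembly in the prompt template used by the card's example."""
    before = "# This is the assembly code:\n"
    after = "\n# What is the source code?\n"
    return before + asm_clean.strip() + after

# Hypothetical objdump-style snippet for demonstration only
sample = (
    "0000000000001129 <func0>:\n"
    "    1129:\t55\tpush   %rbp\n"
    "    112a:\t48 89 e5\tmov    %rsp,%rbp\n"
    "\n"
    "0000000000001140 <main>:\n"
    "    1140:\tc3\tret\n"
)
prompt = build_prompt(clean_asm(sample))
```

The `len(...) < 3 and '00' in ...` heuristic mirrors the card's loop: a line with fewer than three tab-separated fields that still contains hex bytes is a byte-only continuation line, so it carries no instruction worth keeping.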

  ### 4. License
+ This code repository is licensed under the MIT and DeepSeek License.

  ### 5. Contact

+ If you have any questions, please raise an issue.