lkongam commited on
Commit
c747965
·
verified ·
1 Parent(s): 46c2f22

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +29 -0
README.md CHANGED
@@ -20,6 +20,35 @@ import re
20
  from typing import List, Tuple
21
  from string import Template
22
  PROMPT_TEMPLATE = Template('''
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
23
  ''')
24
 
25
  class KernelCoder:
 
20
  from typing import List, Tuple
21
  from string import Template
22
  PROMPT_TEMPLATE = Template('''
23
+ You are a Machine Learning Engineer trying to write custom cuda kernels to replace the pytorch operators in the given architecture to get speedups. You have complete freedom to choose the set of operators you want to replace. You may make the decision to replace some operators with custom cuda kernels and leave others unchanged. You may replace multiple operators with custom implementations, consider operator fusion opportunities (combining multiple operators into a single kernel, for example, combining matmul+relu), or algorithmic changes (such as online softmax). You are only limited by your imagination.
24
+
25
+ For [Imports], you will likely need but not limited to the following libraries:
26
+ ```python
27
+ import torch
28
+ import torch.nn as nn
29
+ import torch.nn.functional as F
30
+ import math
31
+ ```
32
+
33
+ Here’s an example to show you the syntax of inline embedding custom operators from the cuda kernel in torch:
34
+
35
+ The pytorch module needed to be optimize is:
36
+ ```python
37
+ $ref_arch_torch
38
+ ```
39
+
40
+ The example new arch with custom cuda kernels looks like this:
41
+ ```python
42
+ $ref_arch_kernel
43
+ ```
44
+
45
+ And the PyTorch code you need to optimize is:
46
+ ```python
47
+ $code
48
+ ```
49
+
50
+ Optimize the architecture named Model with custom cuda kernels! Optimize the architecture named Model with custom cuda kernels! Name your optimized output architecture ModelNew. Output the new code in codeblocks. Please generate real code, NOT pseudocode, make sure the code compiles and is fully functional. Just output the new model code, no other text, and NO testing code!
51
+
52
  ''')
53
 
54
  class KernelCoder: