devilyouwei committed on
Commit 0960430 · verified · 1 Parent(s): 6d3ed54

anonymous readme

Files changed (1)
  1. README.md +18 -25
README.md CHANGED
@@ -59,38 +59,31 @@ training_args = TrainingArguments(
 
 To train and deploy the SmartBERT V2 model for Web API services, please refer to our GitHub repository: [web3se-lab/SmartBERT](https://github.com/web3se-lab/SmartBERT).
 
-Or use pipline:
+Or use pipeline:
 
 ```python
-from transformers import RobertaTokenizer, RobertaForMaskedLM, pipeline
+import torch
+from transformers import RobertaTokenizer, RobertaModel
 
-model = RobertaForMaskedLM.from_pretrained('web3se/SmartBERT-v3')
-tokenizer = RobertaTokenizer.from_pretrained('web3se/SmartBERT-v3')
+tokenizer = RobertaTokenizer.from_pretrained("web3se/SmartBERT-v2")
+model = RobertaModel.from_pretrained("web3se/SmartBERT-v2")
 
-code_example = "function totalSupply() external view <mask> (uint256);"
-fill_mask = pipeline('fill-mask', model=model, tokenizer=tokenizer)
+code = "function totalSupply() external view returns (uint256);"
 
-outputs = fill_mask(code_example)
-print(outputs)
-```
-
-## Contributors
+inputs = tokenizer(
+    code,
+    return_tensors="pt",
+    truncation=True,
+    max_length=512
+)
 
-- [Youwei Huang](https://www.devil.ren)
-- [Sen Fang](https://github.com/TomasAndersonFang)
+with torch.no_grad():
+    outputs = model(**inputs)
 
-## Citations
+# Option 1: CLS embedding
+cls_embedding = outputs.last_hidden_state[:, 0, :]
 
-```tex
-@article{huang2025smart,
-  title={Smart Contract Intent Detection with Pre-trained Programming Language Model},
-  author={Huang, Youwei and Li, Jianwen and Fang, Sen and Li, Yao and Yang, Peng and Hu, Bin},
-  journal={arXiv preprint arXiv:2508.20086},
-  year={2025}
-}
+# Option 2: Mean pooling (often better for code)
+mean_embedding = outputs.last_hidden_state.mean(dim=1)
 ```
 
-## Sponsors
-
-- [Institute of Intelligent Computing Technology, Suzhou, CAS](http://iict.ac.cn/)
-- CAS Mino (中科劢诺)
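The two pooling options in the new snippet differ only in how they collapse the model's `(batch, seq_len, hidden)` output tensor into one vector per input. A minimal sketch with a random stand-in tensor (the 768 hidden size matches RoBERTa-base and is an assumption about this checkpoint's config):

```python
import torch

# Stand-in for outputs.last_hidden_state: batch of 1, 4 tokens, hidden size 768
last_hidden_state = torch.randn(1, 4, 768)

# Option 1: CLS pooling keeps only the first token's vector
cls_embedding = last_hidden_state[:, 0, :]

# Option 2: mean pooling averages over the token dimension
mean_embedding = last_hidden_state.mean(dim=1)

# Both reduce a variable-length sequence to one fixed-size embedding
print(cls_embedding.shape, mean_embedding.shape)  # torch.Size([1, 768]) torch.Size([1, 768])
```

Either vector can then feed a downstream classifier or a cosine-similarity comparison between two pieces of contract code; mean pooling uses every token's representation rather than relying on the single `<s>` position.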