# CursorCore: Assist Programming through Aligning Anything

<p align="center">
<a href="http://arxiv.org/abs/2410.07002">[📄arXiv]</a> |
<a href="https://hf.co/papers/2410.07002">[🤗HF Paper]</a> |
<a href="https://huggingface.co/collections/TechxGenus/cursorcore-series-6706618c38598468866b60e2">[🤖Models]</a> |
<a href="https://github.com/TechxGenus/CursorCore">[🛠️Code]</a> |
<a href="https://github.com/TechxGenus/CursorWeb">[<img src="https://github.com/TechxGenus/CursorCore/blob/main/pictures/cursorcore.png" width="12.5px">Web]</a> |
<a href="https://discord.gg/Z5Tev8fV">[<img src="https://github.com/TechxGenus/CursorCore/blob/main/pictures/discord.png" width="15x">Discord]</a>
</p>

<hr>

- [CursorCore: Assist Programming through Aligning Anything](#cursorcore-assist-programming-through-aligning-anything)
  - [Introduction](#introduction)
  - [Models](#models)
  - [Usage](#usage)
    - [1) Normal chat](#1-normal-chat)
    - [2) Assistant-Conversation](#2-assistant-conversation)
    - [3) Web Demo](#3-web-demo)
  - [Future Work](#future-work)
  - [Citation](#citation)
  - [Contribution](#contribution)

<hr>

## Introduction

CursorCore is a series of open-source models designed for AI-assisted programming. It aims to support features such as automated editing and inline chat, replicating the core abilities of closed-source AI-assisted programming tools like Cursor. This is achieved by aligning data generated through Programming-Instruct. Please read [our paper](http://arxiv.org/abs/2410.07002) to learn more.

<p align="center">
<img width="100%" alt="conversation" src="https://github.com/TechxGenus/CursorCore/blob/main/pictures/conversation.png">
</p>

![CursorWeb](https://github.com/TechxGenus/CursorCore/blob/main/pictures/CursorWeb.gif)

## Models

Our models have been open-sourced on Hugging Face. You can access them here: [CursorCore-Series](https://huggingface.co/collections/TechxGenus/cursorcore-series-6706618c38598468866b60e2). We also provide pre-quantized GPTQ and AWQ weights here: [CursorCore-Quantization](https://huggingface.co/collections/TechxGenus/cursorcore-quantization-67066431f29f252494ee8cf3).
## Usage

Here are some examples of how to use our models:

### 1) Normal chat

Script:

````python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("TechxGenus/CursorCore-Yi-9B")
model = AutoModelForCausalLM.from_pretrained(
    "TechxGenus/CursorCore-Yi-9B",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {"role": "user", "content": "Hi!"},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer.encode(prompt, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=512)
print(tokenizer.decode(outputs[0]))
````

Output:

````txt
<|im_start|>system
You are a helpful programming assistant.<|im_end|>
<|im_start|>user
Hi!<|im_end|>
<|im_start|>assistant
Hello! I'm an AI language model and I can help you with any programming questions you might have. What specific problem or task are you trying to solve?<|im_end|>
````

### 2) Assistant-Conversation

In our work, we introduce a new framework for AI-assisted programming tasks. It is designed to align anything that happens during the programming process, and is used to implement features such as Tab and Inline Chat.
88
+
89
+ Script 1:
90
+
91
+ ````python
92
+ import torch
93
+ from transformers import AutoTokenizer, AutoModelForCausalLM
94
+ from eval.utils import prepare_input_for_wf
95
+
96
+ tokenizer = AutoTokenizer.from_pretrained("TechxGenus/CursorCore-Yi-9B")
97
+ model = AutoModelForCausalLM.from_pretrained(
98
+ "TechxGenus/CursorCore-Yi-9B",
99
+ torch_dtype=torch.bfloat16,
100
+ device_map="auto"
101
+ )
102
+ sample = {
103
+ "history": [
104
+ {
105
+ "type": "code",
106
+ "lang": "python",
107
+ "code": """def quick_sort(arr):\n if len(arr) <= 1:\n return arr\n pivot = arr[len(arr) // 2]\n left = [x for x in arr if x < pivot]\n middle = [x for x in arr if x == pivot]\n right = [x for x in arr if x > pivot]\n return quick_sort(left) + middle + quick_sort(right)"""
108
+ }
109
+ ],
110
+ "current": {
111
+ "type": "code",
112
+ "lang": "python",
113
+ "code": """def quick_sort(array):\n if len(arr) <= 1:\n return arr\n pivot = arr[len(arr) // 2]\n left = [x for x in arr if x < pivot]\n middle = [x for x in arr if x == pivot]\n right = [x for x in arr if x > pivot]\n return quick_sort(left) + middle + quick_sort(right)"""
114
+ },
115
+ "user": ""
116
+ }
117
+
118
+ prompt = tokenizer.apply_chat_template(
119
+ prepare_input_for_wf(sample),
120
+ tokenize=False,
121
+ chat_template="assistant-conversation",
122
+ add_generation_prompt=True
123
+ )
124
+
125
+ inputs = tokenizer.encode(prompt, return_tensors="pt")
126
+ outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=512, do_sample=False)
127
+ print(tokenizer.decode(outputs[0]))
128
+ ````
129
+
130
+ Output 1:
131
+
132
+ ````txt
133
+ <|im_start|>system
134
+ You are a helpful programming assistant.<|im_end|>
135
+ <|im_start|>history
136
+ ```python
137
+ def quick_sort(arr):
138
+ if len(arr) <= 1:
139
+ return arr
140
+ pivot = arr[len(arr) // 2]
141
+ left = [x for x in arr if x < pivot]
142
+ middle = [x for x in arr if x == pivot]
143
+ right = [x for x in arr if x > pivot]
144
+ return quick_sort(left) + middle + quick_sort(right)
145
+ ```<|im_end|>
146
+ <|im_start|>current
147
+ ```python
148
+ def quick_sort(array):
149
+ if len(arr) <= 1:
150
+ return arr
151
+ pivot = arr[len(arr) // 2]
152
+ left = [x for x in arr if x < pivot]
153
+ middle = [x for x in arr if x == pivot]
154
+ right = [x for x in arr if x > pivot]
155
+ return quick_sort(left) + middle + quick_sort(right)
156
+ ```<|im_end|>
157
+ <|im_start|>assistant
158
+ <|next_start|>```python
159
+ def quick_sort(array):
160
+ if len(array) <= 1:
161
+ return array
162
+ pivot = array[len(array) // 2]
163
+ left = [x for x in array if x < pivot]
164
+ middle = [x for x in array if x == pivot]
165
+ right = [x for x in array if x > pivot]
166
+ return quick_sort(left) + middle + quick_sort(right)
167
+ ```<|next_end|>
168
+ The user has provided a revised code change that corrects the parameter name from `arr` to `array` in the `quick_sort` function. This change ensures consistency in the function definition and avoids potential confusion or errors.
169
+
170
+ To implement this, we will:
171
+ 1. Update the parameter name in the function definition from `arr` to `array`.
172
+ 2. Ensure that all references to `arr` within the function are updated to `array`.
173
+
174
+ This will make the function definition and internal references consistent, improving code readability and maintainability.<|im_end|>
175
+ ````
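As a quick sanity check, independent of the model, the corrected function emitted between `<|next_start|>` and `<|next_end|>` behaves as expected:

```python
# Corrected quick_sort exactly as produced in the assistant's output above.
def quick_sort(array):
    if len(array) <= 1:
        return array
    pivot = array[len(array) // 2]
    left = [x for x in array if x < pivot]
    middle = [x for x in array if x == pivot]
    right = [x for x in array if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)

print(quick_sort([3, 6, 1, 8, 2, 9, 4]))  # [1, 2, 3, 4, 6, 8, 9]
```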
Script 2:

````python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from eval.utils import prepare_input_for_wf

tokenizer = AutoTokenizer.from_pretrained("TechxGenus/CursorCore-Yi-9B")
model = AutoModelForCausalLM.from_pretrained(
    "TechxGenus/CursorCore-Yi-9B",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
sample = {
    "history": [],
    "current": {
        "type": "code",
        "lang": "python",
        "code": """def quick_sort(array):\n    if len(arr) <= 1:\n        return arr\n    pivot = arr[len(arr) // 2]\n    left = [x for x in arr if x < pivot]\n    middle = [x for x in arr if x == pivot]\n    right = [x for x in arr if x > pivot]\n    return quick_sort(left) + middle + quick_sort(right)"""
    },
    "user": "Add Docstring."
}

prompt = tokenizer.apply_chat_template(
    prepare_input_for_wf(sample),
    tokenize=False,
    chat_template="assistant-conversation",
    add_generation_prompt=True
)

inputs = tokenizer.encode(prompt, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0]))
````

Output 2:

````txt
<|im_start|>system
You are a helpful programming assistant.<|im_end|>
<|im_start|>current
```python
def quick_sort(array):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)
```<|im_end|>
<|im_start|>user
Add Docstring.<|im_end|>
<|im_start|>assistant
<|next_start|>```python
def quick_sort(array):
    """
    This is an implementation of the quick sort algorithm.
    """
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)
```<|next_end|><|im_end|>
````
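In both examples, the revised file is returned between `<|next_start|>` and `<|next_end|>` markers. Pulling that span out of the decoded generation can be sketched with a small helper (illustrative only; `extract_next_block` is not part of the repository):

```python
def extract_next_block(decoded: str) -> str:
    """Return the text between the <|next_start|> and <|next_end|> markers.

    Illustrative helper, not part of the CursorCore codebase.
    Raises ValueError if either marker is missing.
    """
    start_tag, end_tag = "<|next_start|>", "<|next_end|>"
    start = decoded.index(start_tag) + len(start_tag)
    end = decoded.index(end_tag, start)
    return decoded[start:end]

# Example: extract the revised file from a decoded assistant turn.
decoded = "<|im_start|>assistant\n<|next_start|>```python\nprint('hi')\n```<|next_end|><|im_end|>"
print(extract_next_block(decoded))
```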
For models in Locate-and-Change (LC) and Search-and-Replace (SR) formats, the output examples are as follows:

Script for LC:

````python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from eval.utils import prepare_input_for_lc

tokenizer = AutoTokenizer.from_pretrained("TechxGenus/CursorCore-Yi-1.5B-LC")
model = AutoModelForCausalLM.from_pretrained(
    "TechxGenus/CursorCore-Yi-1.5B-LC",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
sample = {
    "history": [],
    "current": {
        "type": "code",
        "lang": "python",
        "code": """def quick_sort(array):\n    if len(arr) <= 1:\n        return arr\n    pivot = arr[len(arr) // 2]\n    left = [x for x in arr if x < pivot]\n    middle = [x for x in arr if x == pivot]\n    right = [x for x in arr if x > pivot]\n    return quick_sort(left) + middle + quick_sort(right)"""
    },
    "user": "Add Docstring."
}

prompt = tokenizer.apply_chat_template(
    prepare_input_for_lc(sample),
    tokenize=False,
    chat_template="assistant-conversation",
    add_generation_prompt=True
)

inputs = tokenizer.encode(prompt, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0]))
````

Output for LC:

````txt
<|im_start|>system
You are a helpful programming assistant.<|im_end|>
<|im_start|>current
```python
1 def quick_sort(array):
2     if len(arr) <= 1:
3         return arr
4     pivot = arr[len(arr) // 2]
5     left = [x for x in arr if x < pivot]
6     middle = [x for x in arr if x == pivot]
7     right = [x for x in arr if x > pivot]
8     return quick_sort(left) + middle + quick_sort(right)
```<|im_end|>
<|im_start|>user
Add Docstring.<|im_end|>
<|im_start|>assistant
<|next_start|>1,1
```
    '''This function will sort an array using quick sort algorithm'''
```<|next_end|>
To enhance the readability and maintainability of the code, we should add a docstring to the `quick_sort` function. A docstring provides a clear description of what the function does, which is particularly useful for other developers who might use or modify this code in the future.

The docstring will be added immediately after the function definition, explaining that the function uses the quick sort algorithm to sort an array. This will make the code more self-explanatory and easier to understand.

Here's the plan:
1. Add a docstring at the beginning of the `quick_sort` function.
2. Ensure the docstring is clear and concise, describing the purpose of the function.

This modification will improve the code's documentation without altering its functionality.<|im_end|>
````
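An LC edit locates a change by line numbers and supplies only the new text. Applying it amounts to splicing the replacement into that line window; a sketch, assuming the `start,end` locator in the example means "insert after line `start`, replacing any lines through `end`" (this interpretation is inferred from the `1,1` example above, and `apply_lc_edit` is an illustrative helper, not the repository's applier):

```python
def apply_lc_edit(code: str, start: int, end: int, replacement: str) -> str:
    """Splice `replacement` into `code` over the 1-based line window (start, end].

    Assumed semantics, inferred from the LC example: when start == end the
    replacement is inserted after line `start`; otherwise lines start+1..end
    are replaced.
    """
    lines = code.splitlines()
    new_lines = lines[:start] + replacement.splitlines() + lines[end:]
    return "\n".join(new_lines)

code = "def quick_sort(array):\n    if len(array) <= 1:\n        return array"
# Replicate the 1,1 edit above: insert the docstring after line 1.
print(apply_lc_edit(code, 1, 1, "    '''This function will sort an array using quick sort algorithm'''"))
```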
Script for SR:

````python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from eval.utils import prepare_input_for_sr

tokenizer = AutoTokenizer.from_pretrained("TechxGenus/CursorCore-Yi-1.5B-SR")
model = AutoModelForCausalLM.from_pretrained(
    "TechxGenus/CursorCore-Yi-1.5B-SR",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
sample = {
    "history": [],
    "current": {
        "type": "code",
        "lang": "python",
        "code": """def quick_sort(array):\n    if len(arr) <= 1:\n        return arr\n    pivot = arr[len(arr) // 2]\n    left = [x for x in arr if x < pivot]\n    middle = [x for x in arr if x == pivot]\n    right = [x for x in arr if x > pivot]\n    return quick_sort(left) + middle + quick_sort(right)"""
    },
    "user": "Add Docstring."
}

prompt = tokenizer.apply_chat_template(
    prepare_input_for_sr(sample),
    tokenize=False,
    chat_template="assistant-conversation",
    add_generation_prompt=True
)

inputs = tokenizer.encode(prompt, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0]))
````

Output for SR:

````txt
<|im_start|>system
You are a helpful programming assistant.<|im_end|>
<|im_start|>current
```python
def quick_sort(array):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)
```<|im_end|>
<|im_start|>user
Add Docstring.<|im_end|>
<|im_start|>assistant
<|next_start|>```python
def quick_sort(array):
<|search_and_replace|>
def quick_sort(array):
    """
    This function implements quick sort algorithm
    """
```<|next_end|><|im_end|>
````
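An SR edit pairs a search block with a replacement block, separated by the `<|search_and_replace|>` marker, so applying it reduces to one substring replacement. A minimal sketch (`apply_sr_edit` is illustrative, not the repository's implementation):

```python
def apply_sr_edit(code: str, edit: str) -> str:
    """Apply an SR edit of the form 'search<|search_and_replace|>replacement'.

    Illustrative sketch of the SR format shown above, not the official applier.
    """
    search, replacement = edit.split("<|search_and_replace|>")
    return code.replace(search, replacement)

code = "def quick_sort(array):\n    return array"
edit = "def quick_sort(array):\n<|search_and_replace|>def quick_sort(array):\n    '''Quick sort'''\n"
print(apply_sr_edit(code, edit))
```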
### 3) Web Demo

We have created a web demo for CursorCore. Please visit [CursorWeb](https://github.com/TechxGenus/CursorWeb) for more details.

## Future Work

CursorCore is still at a very early stage, and much work remains to achieve a better user experience. For example:

- Repository-level editing support
- Better and faster editing formats
- Better user interface and presentation
- ...

## Citation

```bibtex
@article{jiang2024cursorcore,
  title   = {CursorCore: Assist Programming through Aligning Anything},
  author  = {Hao Jiang and Qi Liu and Rui Li and Shengyu Ye and Shijin Wang},
  year    = {2024},
  journal = {arXiv preprint arXiv:2410.07002}
}
```

## Contribution

Contributions are welcome! If you find any bugs or have suggestions for improvements, please open an issue or submit a pull request.