Add link to paper in model card

#1
by nielsr (HF Staff) · opened
Files changed (1)
  1. README.md +6 -387
README.md CHANGED
@@ -1,24 +1,21 @@
-
  ---
-
- tags:
- - code
  base_model:
  - Qwen/Qwen2.5-Coder-7B
  library_name: transformers
- pipeline_tag: text-generation
  license: apache-2.0
-
+ pipeline_tag: text-generation
+ tags:
+ - code
  ---

  [![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)

-
  # QuantFactory/CursorCore-QW2.5-7B-GGUF
  This is a quantized version of [TechxGenus/CursorCore-QW2.5-7B](https://huggingface.co/TechxGenus/CursorCore-QW2.5-7B) created using llama.cpp.

  # Original Model Card

+ This model is based on the work described in the paper: [CursorCore: Assist Programming through Aligning Anything](https://huggingface.co/papers/2410.07002).

  # CursorCore: Assist Programming through Aligning Anything

@@ -48,383 +45,5 @@ This is a quantized version of [TechxGenus/CursorCore-QW2.5-7B](https://huggingfac

  ## Introduction

- CursorCore is a series of open-source models designed for AI-assisted programming. It aims to support features such as automated editing and inline chat, replicating the core abilities of closed-source AI-assisted programming tools like Cursor. This is achieved by aligning data generated through Programming-Instruct. Please read [our paper](http://arxiv.org/abs/2410.07002) to learn more.
-
- <p align="center">
- <img width="100%" alt="conversation" src="https://raw.githubusercontent.com/TechxGenus/CursorCore/main/pictures/conversation.png">
- </p>
-
- ![CursorWeb](https://raw.githubusercontent.com/TechxGenus/CursorCore/main/pictures/CursorWeb.gif)
-
- ## Models
-
- Our models have been open-sourced on Hugging Face. You can access our models here: [CursorCore-Series](https://huggingface.co/collections/TechxGenus/cursorcore-series-6706618c38598468866b60e2). We also provide pre-quantized weights for GPTQ and AWQ here: [CursorCore-Quantization](https://huggingface.co/collections/TechxGenus/cursorcore-quantization-67066431f29f252494ee8cf3).
-
- ## Usage
-
- Here are some examples of how to use our model:
-
- ### 1) Normal chat
-
- Script:
-
- ````python
- import torch
- from transformers import AutoTokenizer, AutoModelForCausalLM
-
- tokenizer = AutoTokenizer.from_pretrained("TechxGenus/CursorCore-Yi-9B")
- model = AutoModelForCausalLM.from_pretrained(
-     "TechxGenus/CursorCore-Yi-9B",
-     torch_dtype=torch.bfloat16,
-     device_map="auto"
- )
-
- messages = [
-     {"role": "user", "content": "Hi!"},
- ]
- prompt = tokenizer.apply_chat_template(
-     messages,
-     tokenize=False,
-     add_generation_prompt=True
- )
-
- inputs = tokenizer.encode(prompt, return_tensors="pt")
- outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=512)
- print(tokenizer.decode(outputs[0]))
- ````
-
- Output:
-
- ````txt
- <|im_start|>system
- You are a helpful programming assistant.<|im_end|>
- <|im_start|>user
- Hi!<|im_end|>
- <|im_start|>assistant
- Hello! I'm an AI language model and I can help you with any programming questions you might have. What specific problem or task are you trying to solve?<|im_end|>
- ````
-
- ### 2) Assistant-Conversation
-
- In our work, we introduce a new framework for AI-assisted programming tasks. It is designed to align anything during the programming process and is used to implement features like Tab and Inline Chat.
-
- Script 1:
-
- ````python
- import torch
- from transformers import AutoTokenizer, AutoModelForCausalLM
- from eval.utils import prepare_input_for_wf
-
- tokenizer = AutoTokenizer.from_pretrained("TechxGenus/CursorCore-Yi-9B")
- model = AutoModelForCausalLM.from_pretrained(
-     "TechxGenus/CursorCore-Yi-9B",
-     torch_dtype=torch.bfloat16,
-     device_map="auto"
- )
- sample = {
-     "history": [
-         {
-             "type": "code",
-             "lang": "python",
-             "code": """def quick_sort(arr):\n    if len(arr) <= 1:\n        return arr\n    pivot = arr[len(arr) // 2]\n    left = [x for x in arr if x < pivot]\n    middle = [x for x in arr if x == pivot]\n    right = [x for x in arr if x > pivot]\n    return quick_sort(left) + middle + quick_sort(right)"""
-         }
-     ],
-     "current": {
-         "type": "code",
-         "lang": "python",
-         "code": """def quick_sort(array):\n    if len(arr) <= 1:\n        return arr\n    pivot = arr[len(arr) // 2]\n    left = [x for x in arr if x < pivot]\n    middle = [x for x in arr if x == pivot]\n    right = [x for x in arr if x > pivot]\n    return quick_sort(left) + middle + quick_sort(right)"""
-     },
-     "user": ""
- }
-
- prompt = tokenizer.apply_chat_template(
-     prepare_input_for_wf(sample),
-     tokenize=False,
-     chat_template="assistant-conversation",
-     add_generation_prompt=True
- )
-
- inputs = tokenizer.encode(prompt, return_tensors="pt")
- outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=512, do_sample=False)
- print(tokenizer.decode(outputs[0]))
- ````
-
- Output 1:
-
- ````txt
- <|im_start|>system
- You are a helpful programming assistant.<|im_end|>
- <|im_start|>history
- ```python
- def quick_sort(arr):
-     if len(arr) <= 1:
-         return arr
-     pivot = arr[len(arr) // 2]
-     left = [x for x in arr if x < pivot]
-     middle = [x for x in arr if x == pivot]
-     right = [x for x in arr if x > pivot]
-     return quick_sort(left) + middle + quick_sort(right)
- ```<|im_end|>
- <|im_start|>current
- ```python
- def quick_sort(array):
-     if len(arr) <= 1:
-         return arr
-     pivot = arr[len(arr) // 2]
-     left = [x for x in arr if x < pivot]
-     middle = [x for x in arr if x == pivot]
-     right = [x for x in arr if x > pivot]
-     return quick_sort(left) + middle + quick_sort(right)
- ```<|im_end|>
- <|im_start|>assistant
- <|next_start|>```python
- def quick_sort(array):
-     if len(array) <= 1:
-         return array
-     pivot = array[len(array) // 2]
-     left = [x for x in array if x < pivot]
-     middle = [x for x in array if x == pivot]
-     right = [x for x in array if x > pivot]
-     return quick_sort(left) + middle + quick_sort(right)
- ```<|next_end|>
- The user has provided a revised code change that corrects the parameter name from `arr` to `array` in the `quick_sort` function. This change ensures consistency in the function definition and avoids potential confusion or errors.
-
- To implement this, we will:
- 1. Update the parameter name in the function definition from `arr` to `array`.
- 2. Ensure that all references to `arr` within the function are updated to `array`.
-
- This will make the function definition and internal references consistent, improving code readability and maintainability.<|im_end|>
- ````
-
- Script 2:
-
- ````python
- import torch
- from transformers import AutoTokenizer, AutoModelForCausalLM
- from eval.utils import prepare_input_for_wf
-
- tokenizer = AutoTokenizer.from_pretrained("TechxGenus/CursorCore-Yi-9B")
- model = AutoModelForCausalLM.from_pretrained(
-     "TechxGenus/CursorCore-Yi-9B",
-     torch_dtype=torch.bfloat16,
-     device_map="auto"
- )
- sample = {
-     "history": [],
-     "current": {
-         "type": "code",
-         "lang": "python",
-         "code": """def quick_sort(array):\n    if len(arr) <= 1:\n        return arr\n    pivot = arr[len(arr) // 2]\n    left = [x for x in arr if x < pivot]\n    middle = [x for x in arr if x == pivot]\n    right = [x for x in arr if x > pivot]\n    return quick_sort(left) + middle + quick_sort(right)"""
-     },
-     "user": "Add Docstring."
- }
-
- prompt = tokenizer.apply_chat_template(
-     prepare_input_for_wf(sample),
-     tokenize=False,
-     chat_template="assistant-conversation",
-     add_generation_prompt=True
- )
-
- inputs = tokenizer.encode(prompt, return_tensors="pt")
- outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=512, do_sample=False)
- print(tokenizer.decode(outputs[0]))
- ````
-
- Output 2:
-
- ````txt
- <|im_start|>system
- You are a helpful programming assistant.<|im_end|>
- <|im_start|>current
- ```python
- def quick_sort(array):
-     if len(arr) <= 1:
-         return arr
-     pivot = arr[len(arr) // 2]
-     left = [x for x in arr if x < pivot]
-     middle = [x for x in arr if x == pivot]
-     right = [x for x in arr if x > pivot]
-     return quick_sort(left) + middle + quick_sort(right)
- ```<|im_end|>
- <|im_start|>user
- Add Docstring.<|im_end|>
- <|im_start|>assistant
- <|next_start|>```python
- def quick_sort(array):
-     """
-     This is an implementation of the quick sort algorithm.
-     """
-     if len(arr) <= 1:
-         return arr
-     pivot = arr[len(arr) // 2]
-     left = [x for x in arr if x < pivot]
-     middle = [x for x in arr if x == pivot]
-     right = [x for x in arr if x > pivot]
-     return quick_sort(left) + middle + quick_sort(right)
- ```<|next_end|><|im_end|>
- ````
-
- For models in Locate-and-Change (LC) and Search-and-Replace (SR) formats, the output examples are as follows:
-
- Script for LC:
-
- ````python
- import torch
- from transformers import AutoTokenizer, AutoModelForCausalLM
- from eval.utils import prepare_input_for_lc
-
- tokenizer = AutoTokenizer.from_pretrained("TechxGenus/CursorCore-Yi-1.5B-LC")
- model = AutoModelForCausalLM.from_pretrained(
-     "TechxGenus/CursorCore-Yi-1.5B-LC",
-     torch_dtype=torch.bfloat16,
-     device_map="auto"
- )
- sample = {
-     "history": [],
-     "current": {
-         "type": "code",
-         "lang": "python",
-         "code": """def quick_sort(array):\n    if len(arr) <= 1:\n        return arr\n    pivot = arr[len(arr) // 2]\n    left = [x for x in arr if x < pivot]\n    middle = [x for x in arr if x == pivot]\n    right = [x for x in arr if x > pivot]\n    return quick_sort(left) + middle + quick_sort(right)"""
-     },
-     "user": "Add Docstring."
- }
-
- prompt = tokenizer.apply_chat_template(
-     prepare_input_for_lc(sample),
-     tokenize=False,
-     chat_template="assistant-conversation",
-     add_generation_prompt=True
- )
-
- inputs = tokenizer.encode(prompt, return_tensors="pt")
- outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=512, do_sample=False)
- print(tokenizer.decode(outputs[0]))
- ````
-
- Output for LC:
-
- ````txt
- <|im_start|>system
- You are a helpful programming assistant.<|im_end|>
- <|im_start|>current
- ```python
- 1 def quick_sort(array):
- 2     if len(arr) <= 1:
- 3         return arr
- 4     pivot = arr[len(arr) // 2]
- 5     left = [x for x in arr if x < pivot]
- 6     middle = [x for x in arr if x == pivot]
- 7     right = [x for x in arr if x > pivot]
- 8     return quick_sort(left) + middle + quick_sort(right)
- ```<|im_end|>
- <|im_start|>user
- Add Docstring.<|im_end|>
- <|im_start|>assistant
- <|next_start|>1,1
- ```
-     '''This function will sort an array using quick sort algorithm'''
- ```<|next_end|>
- To enhance the readability and maintainability of the code, we should add a docstring to the `quick_sort` function. A docstring provides a clear description of what the function does, which is particularly useful for other developers who might use or modify this code in the future.
-
- The docstring will be added immediately after the function definition, explaining that the function uses the quick sort algorithm to sort an array. This will make the code more self-explanatory and easier to understand.
-
- Here's the plan:
- 1. Add a docstring at the beginning of the `quick_sort` function.
- 2. Ensure the docstring is clear and concise, describing the purpose of the function.
-
- This modification will improve the code's documentation without altering its functionality.<|im_end|>
- ````
-
- Script for SR:
-
- ````python
- import torch
- from transformers import AutoTokenizer, AutoModelForCausalLM
- from eval.utils import prepare_input_for_sr
-
- tokenizer = AutoTokenizer.from_pretrained("TechxGenus/CursorCore-Yi-1.5B-SR")
- model = AutoModelForCausalLM.from_pretrained(
-     "TechxGenus/CursorCore-Yi-1.5B-SR",
-     torch_dtype=torch.bfloat16,
-     device_map="auto"
- )
- sample = {
-     "history": [],
-     "current": {
-         "type": "code",
-         "lang": "python",
-         "code": """def quick_sort(array):\n    if len(arr) <= 1:\n        return arr\n    pivot = arr[len(arr) // 2]\n    left = [x for x in arr if x < pivot]\n    middle = [x for x in arr if x == pivot]\n    right = [x for x in arr if x > pivot]\n    return quick_sort(left) + middle + quick_sort(right)"""
-     },
-     "user": "Add Docstring."
- }
-
- prompt = tokenizer.apply_chat_template(
-     prepare_input_for_sr(sample),
-     tokenize=False,
-     chat_template="assistant-conversation",
-     add_generation_prompt=True
- )
-
- inputs = tokenizer.encode(prompt, return_tensors="pt")
- outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=512, do_sample=False)
- print(tokenizer.decode(outputs[0]))
- ````
-
- Output for SR:
-
- ````txt
- <|im_start|>system
- You are a helpful programming assistant.<|im_end|>
- <|im_start|>current
- ```python
- def quick_sort(array):
-     if len(arr) <= 1:
-         return arr
-     pivot = arr[len(arr) // 2]
-     left = [x for x in arr if x < pivot]
-     middle = [x for x in arr if x == pivot]
-     right = [x for x in arr if x > pivot]
-     return quick_sort(left) + middle + quick_sort(right)
- ```<|im_end|>
- <|im_start|>user
- Add Docstring.<|im_end|>
- <|im_start|>assistant
- <|next_start|>```python
- def quick_sort(array):
- <|search_and_replace|>
- def quick_sort(array):
-     """
-     This function implements quick sort algorithm
-     """
- ```<|next_end|><|im_end|>
- ````
-
- ### 3) Web Demo
-
- We have created a web demo for CursorCore. Please visit [CursorWeb](https://github.com/TechxGenus/CursorWeb) for more details.
-
- ## Future Work
-
- CursorCore is still in a very early stage, and lots of work is needed to achieve a better user experience. For example:
-
- - Repository-level editing support
- - Better and faster editing formats
- - Better user interface and presentation
- - ...
-
- ## Citation
-
- ```bibtex
- @article{jiang2024cursorcore,
-   title   = {CursorCore: Assist Programming through Aligning Anything},
-   author  = {Hao Jiang and Qi Liu and Rui Li and Shengyu Ye and Shijin Wang},
-   year    = {2024},
-   journal = {arXiv preprint arXiv: 2410.07002}
- }
- ```
-
- ## Contribution
-
- Contributions are welcome! If you find any bugs or have suggestions for improvements, please open an issue or submit a pull request.
-
+ CursorCore is a series of open-source models designed for AI-assisted programming. It aims to support features such as automated editing and inline chat, replicating the core abilities of closed-source AI-assisted programming tools like Cursor. This is achieved by aligning data generated through Programming-Instruct. Please read [our paper](http://arxiv.org/abs/2410.07002) to learn more.
+