TheBloke/starcoder-GPTQ

@@ -271,13 +271,20 @@ extra_gated_fields:
 These files are GPTQ 4bit model files for [Bigcode's Starcoder](https://huggingface.co/bigcode/starcoder).
-It is the result of quantising to 4bit using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).
 ## Repositories available
 * [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/starcoder-GPTQ)
-* [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/none)
-* [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/bigcode/starcoder)
 ## How to easily download and use this model in text-generation-webui
@@ -308,7 +315,6 @@ from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
 import argparse
 model_name_or_path = "TheBloke/starcoder-GPTQ"
-model_basename = "gptq_model-4bit--1g"
 use_triton = False
@@ -322,33 +328,9 @@ model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
         use_triton=use_triton,
         quantize_config=None)
-print("\n\n*** Generate:")
-input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
-output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
-print(tokenizer.decode(output[0]))
-# Inference can also be done using transformers' pipeline
-# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
-logging.set_verbosity(logging.CRITICAL)
-prompt = "Tell me about AI"
-prompt_template=f'''### Human: {prompt}
-### Assistant:'''
-print("*** Pipeline:")
-pipe = pipeline(
-    "text-generation",
-    model=model,
-    tokenizer=tokenizer,
-    max_new_tokens=512,
-    temperature=0.7,
-    top_p=0.95,
-    repetition_penalty=1.15
-)
-print(pipe(prompt_template)[0]['generated_text'])
 ```
 ## Provided files
@@ -361,7 +343,7 @@ It was created without group_size to lower VRAM requirements, and with --act-ord
 * `gptq_model-4bit--1g.safetensors`
   * Works with AutoGPTQ in CUDA or Triton modes.
-  * Works with GPTQ-for-LLaMa in CUDA mode.  May have issues with GPTQ-for-LLaMa Triton mode.
   * Works with text-generation-webui, including one-click-installers.
   * Parameters: Groupsize = -1. Act Order / desc_act = True.
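For readers unfamiliar with the two parameters listed above, here is a minimal, illustrative Python sketch. It is not AutoGPTQ's or GPTQ-for-LLaMa's actual code. It assumes the usual GPTQ layout, in which a linear layer stores one row of quantisation scales per group of `group_size` input columns. Under that layout, `group_size = -1` means a single row covers the whole layer, which is why quantising without a group size lowers VRAM requirements. The sketch also shows the idea behind act order (`desc_act=True`): columns are quantised in descending order of the Hessian diagonal, so the most influential weights are handled first. The helper names and the example layer size are hypothetical.

```python
import math
from typing import List


def scale_rows(in_features: int, group_size: int) -> int:
    """Number of rows of quantisation scales for one linear layer.

    group_size == -1 means "no grouping": one row covers every input column.
    """
    if group_size == -1:
        return 1
    return math.ceil(in_features / group_size)


def extra_scale_bytes(in_features: int, out_features: int, group_size: int) -> int:
    """Approximate bytes used by fp16 scales (2 bytes each), ignoring zero-points."""
    return scale_rows(in_features, group_size) * out_features * 2


def act_order(hessian_diag: List[float]) -> List[int]:
    """Column processing order for act order / desc_act=True.

    Columns with larger Hessian diagonal entries (larger summed squared
    activations) are quantised first.
    """
    return sorted(range(len(hessian_diag)), key=lambda i: hessian_diag[i], reverse=True)


if __name__ == "__main__":
    # Hypothetical 6144 x 6144 layer, chosen only for illustration.
    k = n = 6144
    print(scale_rows(k, -1), scale_rows(k, 128))  # 1 48
    print(extra_scale_bytes(k, n, -1), extra_scale_bytes(k, n, 128))  # 12288 589824
    print(act_order([0.2, 3.0, 1.5, 0.7]))  # [1, 2, 3, 0]
```

The per-layer saving is small on its own, but it is repeated across every quantised linear layer in the model.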

 These files are GPTQ 4bit model files for [Bigcode's Starcoder](https://huggingface.co/bigcode/starcoder).
+It is the result of quantising to 4bit using [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ).
 ## Repositories available
 * [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/starcoder-GPTQ)
+* [Bigcode's unquantised fp16 model in PyTorch format, for GPU inference and for further conversions](https://huggingface.co/bigcode/starcoder)
+## Prompting
+The model was trained on GitHub code.
+As such, it is _not_ an instruction model, and commands like "Write a function that computes the square root." do not work well.
+However, by using the [Tech Assistant prompt](https://huggingface.co/datasets/bigcode/ta-prompt) you can turn it into a capable technical assistant.
 ## How to easily download and use this model in text-generation-webui
 import argparse
 model_name_or_path = "TheBloke/starcoder-GPTQ"
 use_triton = False
         use_triton=use_triton,
         quantize_config=None)
+# Move the prompt tokens to the device the model was loaded on
+inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(model.device)
+outputs = model.generate(inputs=inputs)
+print(tokenizer.decode(outputs[0]))
 ```
 ## Provided files
 * `gptq_model-4bit--1g.safetensors`
   * Works with AutoGPTQ in CUDA or Triton modes.
+  * Does not work with GPTQ-for-LLaMa.
   * Works with text-generation-webui, including one-click-installers.
   * Parameters: Groupsize = -1. Act Order / desc_act = True.
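The Prompting notes above say StarCoder is not an instruction model. One common workaround is to phrase a request as code for the model to continue, much like the `"def print_hello_world():"` prompt in the example. The sketch below uses a hypothetical helper, `make_completion_prompt` (not part of transformers or AutoGPTQ), to turn a task description into a Python function signature plus docstring, leaving the body for the model to write.

```python
from typing import List


def make_completion_prompt(func_name: str, params: List[str], task: str) -> str:
    """Build a code-completion prompt: a signature and a docstring, body left open.

    A base code model tends to continue such a prefix with an implementation,
    whereas a bare imperative instruction may not work well.
    """
    signature = f"def {func_name}({', '.join(params)}):"
    docstring = f'    """{task}"""'
    # End with a newline so the model's continuation starts the function body.
    return f"{signature}\n{docstring}\n"


if __name__ == "__main__":
    prompt = make_completion_prompt("square_root", ["x: float"], "Return the square root of x.")
    print(prompt)
```

The returned string would be passed to `tokenizer.encode(..., return_tensors="pt")` in place of the `"def print_hello_world():"` prompt.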