---
license: bsd
datasets:
- ManthanKulakarni/Text2JQL_v2
language:
- en
pipeline_tag: text-generation
tags:
- LLaMa
- JQL
- Jira
- GGML
- GGML-q8_0
- GPU
- CPU
- 7B
- llama.cpp
- text-generation-webui
---

GGML files are for CPU + GPU inference using [llama.cpp](https://github.com/ggerganov/llama.cpp).

## How to run in `llama.cpp`

```
./main -t 10 -ngl 32 -m ggml-model-q8_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: Write JQL(Jira query Language) for give input ### Input: stories assigned to manthan which are created in last 10 days with highest priority and label is set to release ### Response:"
```
Change `-t 10` to the number of physical CPU cores you have. For example, if your system has 8 cores/16 threads, use `-t 8`.

Change `-ngl 32` to the number of layers to offload to the GPU. Remove it if you don't have GPU acceleration.

To have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`.

## How to run in `text-generation-webui`

Further instructions are here: [text-generation-webui/docs/llama.cpp-models.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md).

## How to run using `LangChain`

##### Installation on CPU
```
pip install llama-cpp-python
```
##### Installation on GPU
```
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
```
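
Once `llama-cpp-python` is installed, you can sanity-check that the GGML file loads by calling it through the library's `Llama` class directly, before wiring it into LangChain. A minimal sketch, assuming `ggml-model-q8_0.bin` sits in the working directory; the `max_tokens`, `stop`, and `n_gpu_layers=32` values are illustrative, not requirements:

```python
from llama_cpp import Llama

# Load the GGML model; set n_gpu_layers=0 if you installed the CPU-only build.
llm = Llama(model_path="./ggml-model-q8_0.bin", n_ctx=2048, n_gpu_layers=32)

output = llm(
    "### Instruction: Write JQL(Jira query Language) for give input "
    "### Input: stories assigned to manthan which are created in last 10 days "
    "with highest priority and label is set to release ### Response:",
    max_tokens=128,
    stop=["###"],
)
print(output["choices"][0]["text"])
```

With the model loading correctly, the same file can be used through LangChain's `LlamaCpp` wrapper, as shown below.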

```python
from langchain.llms import LlamaCpp
from langchain import PromptTemplate, LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

n_gpu_layers = 40  # Change this value based on your model and your GPU VRAM pool.
n_batch = 512  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.
n_ctx = 2048  # Context window size.

callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# Make sure the model path is correct for your system!
llm = LlamaCpp(
    model_path="./ggml-model-q8_0.bin",
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    callback_manager=callback_manager,
    verbose=True,
    n_ctx=n_ctx,
)

llm("""### Instruction:
Write JQL(Jira query Language) for give input

### Input:
stories assigned to manthan which are created in last 10 days with highest priority and label is set to release

### Response:""")
```
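The snippet above imports `PromptTemplate` and `LLMChain` without using them. If you want to avoid hard-coding the query, they can wrap the same instruction format; a minimal sketch that reuses the `llm` object from above (the template text and the example query are illustrative):

```python
from langchain import PromptTemplate, LLMChain

# Same instruction format as above, with the user query injected at call time.
template = """### Instruction:
Write JQL(Jira query Language) for give input

### Input:
{query}

### Response:"""

prompt = PromptTemplate(template=template, input_variables=["query"])
chain = LLMChain(prompt=prompt, llm=llm)

print(chain.run("bugs reported by manthan in the last 7 days"))
```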
For more information, refer to the [LangChain LlamaCpp documentation](https://python.langchain.com/docs/modules/model_io/models/llms/integrations/llamacpp).