Update README.md #7
by rafaelgeraldini · opened

README.md CHANGED
@@ -3,7 +3,7 @@ library_name: transformers
 base_model: codellama/CodeLlama-7b-Instruct-hf
 license: llama2
 datasets:
-- semantixai/
+- semantixai/LloroV3
 language:
 - pt
 tags:
@@ -11,41 +11,45 @@ tags:
 - analytics
 - analise-dados
 - portugues-BR
+
+co2_eq_emissions:
+  emissions: 1320
+  source: "Lacoste, Alexandre, et al. “Quantifying the Carbon Emissions of Machine Learning.” ArXiv (Cornell University), 21 Oct. 2019, https://doi.org/10.48550/arxiv.1910.09700."
+  training_type: "fine-tuning"
+  geographical_location: "Council Bluffs, Iowa, USA."
+  hardware_used: "1 A100 40GB GPU"
 ---

 **Lloro 7B**

 <img src="https://cdn-uploads.huggingface.co/production/uploads/653176dc69fffcfe1543860a/h0kNd9OTEu1QdGNjHKXoq.png" width="300" alt="Lloro-7b Logo"/>

-Lloro, developed by Semantix Research Labs, is a language model trained to effectively perform Portuguese data analysis in Python. It is a fine-tuned version of codellama/CodeLlama-7b-Instruct-hf, trained on synthetic datasets. The fine-tuning was performed using the QLoRA methodology on a V100 GPU with 16 GB of VRAM.
+Lloro, developed by Semantix Research Labs, is a language model trained to effectively perform Portuguese data analysis in Python. It is a fine-tuned version of codellama/CodeLlama-7b-Instruct-hf, trained on synthetic datasets. The fine-tuning was performed using the QLoRA methodology on an A100 GPU with 40 GB of VRAM.

 **Model description**

 Model type: A 7B-parameter model fine-tuned on synthetic datasets.

 Language(s) (NLP): Primarily Portuguese, but the model can understand English as well.

 Finetuned from model: codellama/CodeLlama-7b-Instruct-hf

 **What is Lloro's intended use(s)?**

 Lloro is built for data analysis in Portuguese contexts.

 Input: Text

 Output: Text (Code)

+**V3 Release**
+- Context length increased to 2048.
+- Fine-tuning dataset increased to 74,222 examples.

 **Usage**

 Using Transformers
+
 ```python
 #Import required libraries
 import torch
@@ -55,7 +59,7 @@ from transformers import (
 )

 #Load Model
-model_name = "semantixai/
+model_name = "semantixai/Lloro"
 base_model = AutoModelForCausalLM.from_pretrained(
     model_name,
     return_dict=True,
@@ -80,7 +84,7 @@ outputs = base_model.generate(
     input_ids,
     do_sample=True,
     top_p=0.95,
-    max_new_tokens=
+    max_new_tokens=2048,
     temperature=0.1,
 )

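Between this hunk and the next, the diff elides the decode step that produces the `output_text` referenced in the following hunk's context. A minimal sketch of what that step typically looks like, reusing the `tokenizer`, `input_ids`, and `outputs` names from the surrounding excerpts (the elided lines themselves are not visible in this diff):

```python
# Hypothetical reconstruction of the elided decode step: drop the prompt
# tokens, then decode only the generated continuation.
output_text = tokenizer.decode(
    outputs[0][input_ids.shape[-1]:],
    skip_special_tokens=True,
)
display(output_text)  # display() assumes an IPython/Jupyter environment
```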
@@ -90,6 +94,7 @@ display(output_text)
 ```

 Using an OpenAI compatible inference server (like [vLLM](https://docs.vllm.ai/en/latest/index.html))
+
 ```python
 from openai import OpenAI

@@ -98,65 +103,59 @@ client = OpenAI(
     base_url="http://localhost:8000/v1",
 )
 user_prompt = "Desenvolva um algoritmo em Python para calcular a média e a mediana dos preços de vendas por tipo de material do produto."
-completion = client.chat.completions.create(temperature=0.1,frequency_penalty=0.1,model="semantixai/
+completion = client.chat.completions.create(
+    temperature=0.1,
+    frequency_penalty=0.1,
+    model="semantixai/Lloro",
+    messages=[
+        {"role": "system", "content": "Provide answers in Python without explanations, only the code"},
+        {"role": "user", "content": user_prompt},
+    ],
+)
 ```

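This snippet assumes an OpenAI-compatible server is already listening on localhost:8000; with vLLM that would typically be started with something like `python -m vllm.entrypoints.openai.api_server --model semantixai/Lloro` (the exact launch command depends on the vLLM version, so check the vLLM docs). Reading the generated code back out of `completion` then uses the standard client accessor:

```python
# Standard openai-python v1 response access, reusing `completion` from above.
print(completion.choices[0].message.content)
```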
 **Params**
 Training Parameters
-| Params | Training Data
-| 7B | Pairs synthetic instructions/code |
+| Params | Training Data                        | Examples | Tokens    | LR   |
+|--------|--------------------------------------|----------|-----------|------|
+| 7B     | Pairs of synthetic instructions/code | 74,222   | 9,351,532 | 2e-4 |

 **Model Sources**

-Test Dataset Repository: https://huggingface.co/datasets/semantixai/
-
-Model Dates: Lloro was trained between November 2023 and January 2024.
+Test Dataset Repository: <https://huggingface.co/datasets/semantixai/LloroV3>
+
+Model Dates: Lloro was trained between February 2024 and April 2024.

 **Performance**
 | Model          | LLM as Judge | CodeBLEU Score | ROUGE-L | CodeBERT-Precision | CodeBERT-Recall | CodeBERT-F1 | CodeBERT-F3 |
 |----------------|--------------|----------------|---------|--------------------|-----------------|-------------|-------------|
-| GPT 3.5        |
-| Instruct -Base |
-| Instruct -FT   | 97.
+| GPT 3.5        | 94.29%       | 0.3538         | 0.3756  | 0.8099             | 0.8176          | 0.8128      | 0.8164      |
+| Instruct -Base | 88.77%       | 0.3666         | 0.3351  | 0.8244             | 0.8025          | 0.8121      | 0.8052      |
+| Instruct -FT   | 97.95%       | 0.5967         | 0.6717  | 0.9090             | 0.9182          | 0.9131      | 0.9171      |

 **Training info:**
 The following hyperparameters were used during training:

-| Parameter | Value
-| learning_rate |
-| weight_decay | 0.0001
-| train_batch_size |
-| eval_batch_size |
-| seed | 42
+| Parameter                 | Value                    |
+|---------------------------|--------------------------|
+| learning_rate             | 2e-4                     |
+| weight_decay              | 0.0001                   |
+| train_batch_size          | 7                        |
+| eval_batch_size           | 7                        |
+| seed                      | 42                       |
 | optimizer                 | Adam - paged_adamw_32bit |
-| lr_scheduler_type | cosine
-| lr_scheduler_warmup_ratio | 0.
-| num_epochs |
+| lr_scheduler_type         | cosine                   |
+| lr_scheduler_warmup_ratio | 0.06                     |
+| num_epochs                | 4.0                      |

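For readers who want to reproduce this configuration, a minimal sketch of how the table above maps onto `transformers.TrainingArguments`, assuming the Hugging Face `Trainer` was used (the training script itself is not part of this diff):

```python
from transformers import TrainingArguments

# Hypothetical mapping of the reported hyperparameters; output_dir and
# anything not listed in the table are placeholder assumptions.
training_args = TrainingArguments(
    output_dir="lloro-qlora",   # placeholder
    learning_rate=2e-4,
    weight_decay=0.0001,
    per_device_train_batch_size=7,
    per_device_eval_batch_size=7,
    seed=42,
    optim="paged_adamw_32bit",  # paged AdamW from bitsandbytes
    lr_scheduler_type="cosine",
    warmup_ratio=0.06,
    num_train_epochs=4.0,
)
```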
 **QLoRA hyperparameters**
 The following parameters, related to Quantized Low-Rank Adaptation and quantization, were used during training:

-| Parameter
-| lora_r |
-| lora_alpha |
-| lora_dropout | 0.1
-| storage_dtype | "nf4"
-| compute_dtype | "
+| Parameter     | Value      |
+|---------------|------------|
+| lora_r        | 64         |
+| lora_alpha    | 256        |
+| lora_dropout  | 0.1        |
+| storage_dtype | "nf4"      |
+| compute_dtype | "bfloat16" |

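These values map directly onto the standard QLoRA stack (bitsandbytes 4-bit quantization plus a PEFT `LoraConfig`). A minimal sketch under that assumption; the card does not report target modules, so the ones below are illustrative only:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 storage with bfloat16 compute, as reported in the table.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter settings from the table; target_modules is an assumption.
lora_config = LoraConfig(
    r=64,
    lora_alpha=256,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # illustrative
)
```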
 **Experiments**
-| Model | Epochs | Overfitting | Final Epochs | Training Hours
-| Code Llama Instruct | 1 | No | 1 |
-| Code Llama Instruct |
+| Model               | Epochs | Overfitting | Final Epochs | Training Hours | CO2 Emission (kg) |
+|---------------------|--------|-------------|--------------|----------------|-------------------|
+| Code Llama Instruct | 1      | No          | 1            | 3.01           | 0.43              |
+| Code Llama Instruct | 4      | Yes         | 3            | 9.25           | 1.32              |

 **Framework versions**

@@ -166,4 +165,4 @@
 | Datasets     | 2.14.3 |
 | Pytorch      | 2.0.1  |
 | Tokenizers   | 0.14.1 |
-| Transformers | 4.34.0 |
+| Transformers | 4.34.0 |
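Since the hunks above only excerpt the Transformers usage code, here is a self-contained sketch assembled from the visible fragments. The tokenizer setup, prompt format, dtype, and device placement are assumptions (CodeLlama-Instruct conventions), not lines from this README:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "semantixai/Lloro"

# Load tokenizer and model (assumed setup; the diff elides these lines).
tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    return_dict=True,
    torch_dtype=torch.float16,  # assumption; pick what fits your GPU
    device_map="auto",
)

# CodeLlama-Instruct style prompt (assumed format).
prompt = "[INST] Desenvolva um algoritmo em Python para calcular a média e a mediana dos preços de vendas por tipo de material do produto. [/INST]"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(base_model.device)

# Generation parameters as shown in the diff.
outputs = base_model.generate(
    input_ids,
    do_sample=True,
    top_p=0.95,
    max_new_tokens=2048,
    temperature=0.1,
)
output_text = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(output_text)
```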