Instructions to use incedo/codegen25-7b-ft-v1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use incedo/codegen25-7b-ft-v1 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="incedo/codegen25-7b-ft-v1")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("incedo/codegen25-7b-ft-v1")
model = AutoModelForCausalLM.from_pretrained("incedo/codegen25-7b-ft-v1")


vLLM

How to use incedo/codegen25-7b-ft-v1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "incedo/codegen25-7b-ft-v1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "incedo/codegen25-7b-ft-v1",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
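
The same request can be sent from Python. The sketch below is a minimal example that uses only the standard library and mirrors the curl payload above; the helper names `build_completion_request` and `complete` are hypothetical, and the response parsing assumes the standard OpenAI completions shape (`choices[0].text`). Pass `base_url="http://localhost:30000"` to target an SGLang server instead.

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://localhost:8000",
                             model="incedo/codegen25-7b-ft-v1",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion.

    Needs a running server; nothing is sent until this function is called.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `print(complete("Once upon a time,"))` prints the generated continuation.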

Use Docker

docker model run hf.co/incedo/codegen25-7b-ft-v1

SGLang

How to use incedo/codegen25-7b-ft-v1 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "incedo/codegen25-7b-ft-v1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "incedo/codegen25-7b-ft-v1",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "incedo/codegen25-7b-ft-v1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "incedo/codegen25-7b-ft-v1",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use incedo/codegen25-7b-ft-v1 with Docker Model Runner:
```
docker model run hf.co/incedo/codegen25-7b-ft-v1
```


This model is fine-tuned on 400+ test scripts written in Java with the `Cucumber` and `Selenium` frameworks.

The base model is Salesforce/Codegen25-7b-multi. The fine-tuning dataset is available at shyam-incedoinc/qa-finetune-dataset.

Training metrics are available in the metrics section.

Training Parameters

TrainingArguments(
    num_train_epochs=25,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    optim="paged_adamw_32bit",
    #save_steps=save_steps,
    logging_steps=25,
    save_strategy="epoch",
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=True,
    bf16=False,
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    #max_steps=max_steps,
    group_by_length=False,
    lr_scheduler_type="cosine",
    disable_tqdm=False,
    report_to="tensorboard",
    seed=42
)

LoraConfig(
        lora_alpha=16,
        lora_dropout=0.1,
        r=64,
        bias="none",
        task_type="CAUSAL_LM",
)
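
A few quantities follow directly from the settings above. The sketch below works them out: the LoRA scaling factor (`lora_alpha / r`, assuming the standard, non-rank-stabilized LoRA scaling) and a rough step count with cosine-schedule warmup. The dataset size of 400 examples and the single-device setup are assumptions for illustration only; the card states "400+" scripts and does not give the split sizes or hardware.

```python
import math

# Values copied from the training arguments and LoraConfig above.
NUM_TRAIN_EPOCHS = 25
PER_DEVICE_TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 1
WARMUP_RATIO = 0.03
LORA_ALPHA = 16
LORA_R = 64


def lora_scaling(alpha, r):
    """Factor applied to the low-rank update in standard LoRA."""
    return alpha / r


def effective_batch_size(per_device, grad_accum, num_devices=1):
    """Examples consumed per optimizer step."""
    return per_device * grad_accum * num_devices


def schedule(num_examples, num_devices=1):
    """Approximate (steps_per_epoch, total_steps, warmup_steps)."""
    batch = effective_batch_size(PER_DEVICE_TRAIN_BATCH_SIZE,
                                 GRADIENT_ACCUMULATION_STEPS, num_devices)
    steps_per_epoch = math.ceil(num_examples / batch)
    total_steps = steps_per_epoch * NUM_TRAIN_EPOCHS
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    return steps_per_epoch, total_steps, warmup_steps


print("LoRA scaling:", lora_scaling(LORA_ALPHA, LORA_R))  # 16 / 64 = 0.25
# 400 examples and 1 device are hypothetical, not taken from the card.
print("steps/epoch, total, warmup:", schedule(num_examples=400))
```

With `r=64` and `lora_alpha=16`, the adapter update is scaled by 0.25, and each optimizer step on one device sees only 2 examples because gradient accumulation is set to 1.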

Run the code block below to get inferences from this model.

from random import randrange

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

hf_model_repo = "shyam-incedoinc/codegen25-7b-multi-peft-qlora-finetuned-qa"

# Get the tokenizer
tokenizer = AutoTokenizer.from_pretrained(hf_model_repo)

# Load the model
model = AutoModelForCausalLM.from_pretrained(hf_model_repo, load_in_4bit=True,
                                             torch_dtype=torch.float16,
                                             device_map="auto")

# Load dataset from the hub
hf_data_repo = "shyam-incedoinc/qa-finetune-dataset"
train_dataset = load_dataset(hf_data_repo, split="train")
valid_dataset = load_dataset(hf_data_repo, split="validation")

# Load the sample
sample = valid_dataset[randrange(len(valid_dataset))]['text']
groundtruth = sample.split("### Output:\n")[1]
prompt = sample.split("### Output:\n")[0]+"### Output:\n"

# Generate response
input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.cuda()
outputs = model.generate(input_ids=input_ids, max_new_tokens=1024,
                                do_sample=True, top_p=0.9, temperature=0.6)

# Print the result
print(f"Generated response:\n{tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0]}")
print(f"Ground Truth:\n{groundtruth}")
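
The prompt and ground-truth split used above can be wrapped in small helpers that also handle two edge cases: a sample with no `### Output:\n` marker, and a decoded generation that echoes the prompt. This is a sketch, not part of the original script; `split_sample` and `strip_prompt` are hypothetical names, and the `### Input:` header in the example string is an assumed format, since the card only shows the `### Output:` marker.

```python
OUTPUT_MARKER = "### Output:\n"


def split_sample(text, marker=OUTPUT_MARKER):
    """Return (prompt, groundtruth); the prompt keeps the marker at its end.

    Splits on the first marker only, so everything after it is ground truth.
    """
    head, found, tail = text.partition(marker)
    if not found:
        raise ValueError("sample does not contain the output marker")
    return head + marker, tail


def strip_prompt(generated, prompt):
    """Drop the echoed prompt from a decoded generation, if it is present."""
    if generated.startswith(prompt):
        return generated[len(prompt):]
    # Decoding may not reproduce the prompt exactly; return the text as-is.
    return generated


# Hypothetical sample; real rows come from the dataset's 'text' field.
sample = "### Input:\nVerify login page\n### Output:\npublic class LoginSteps {}"
prompt, groundtruth = split_sample(sample)
print(repr(prompt))
print(repr(groundtruth))
```

Applying `strip_prompt` to the decoded output before printing makes the generated response directly comparable with `groundtruth`.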
