This model is fine-tuned on 400+ test scripts written in Java with the Cucumber and Selenium frameworks.
The base model is Salesforce/Codegen25-7b-multi. The fine-tuning dataset is available at shyam-incedoinc/qa-finetune-dataset.
Training metrics are available in the metrics section.
Training Parameters
TrainingArguments(
num_train_epochs=25,
per_device_train_batch_size=2,
gradient_accumulation_steps=1,
gradient_checkpointing=True,
optim="paged_adamw_32bit",
#save_steps=save_steps,
logging_steps=25,
save_strategy="epoch",
learning_rate=2e-4,
weight_decay=0.001,
fp16=True,
bf16=False,
max_grad_norm=0.3,
warmup_ratio=0.03,
#max_steps=max_steps,
group_by_length=False,
lr_scheduler_type="cosine",
disable_tqdm=False,
report_to="tensorboard",
seed=42
)
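To make the schedule implied by these arguments more concrete, the following sketch computes the number of optimizer steps and the learning rate at a given step under linear warmup followed by cosine decay. The dataset size (400 examples) and the single-device setup are assumptions for illustration only, and the helper names are hypothetical. The formula follows the usual warmup-plus-cosine shape and is not a confirmed copy of the trainer's internals.

```python
import math

# Values taken from the training arguments above.
NUM_TRAIN_EPOCHS = 25
PER_DEVICE_TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 1
LEARNING_RATE = 2e-4
WARMUP_RATIO = 0.03

# Assumptions for illustration (not stated in the model card).
NUM_EXAMPLES = 400
NUM_DEVICES = 1


def total_optimizer_steps(num_examples, num_devices=NUM_DEVICES):
    """Optimizer steps over the whole run for a given dataset size."""
    effective_batch = (
        PER_DEVICE_TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * num_devices
    )
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return steps_per_epoch * NUM_TRAIN_EPOCHS


def lr_at_step(step, total_steps):
    """Learning rate with linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = total_optimizer_steps(NUM_EXAMPLES)
print(f"total optimizer steps: {total_steps}")
print(f"lr at the midpoint: {lr_at_step(total_steps // 2, total_steps):.2e}")
```

With a per-device batch size of 2 and no gradient accumulation, the effective batch size on one device is 2, so the assumed 400 examples give 200 steps per epoch and 5000 steps over 25 epochs.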
LoraConfig(
lora_alpha=16,
lora_dropout=0.1,
r=64,
bias="none",
task_type="CAUSAL_LM",
)
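As a rough illustration of what r=64 and lora_alpha=16 mean, the sketch below computes the standard LoRA scaling factor (lora_alpha / r) and the number of trainable adapter parameters for a single weight matrix. The 4096 x 4096 matrix shape is an assumption chosen for illustration, not a value from this card, and the helper names are hypothetical.

```python
# Values taken from the LoraConfig above.
LORA_R = 64
LORA_ALPHA = 16

# Assumption for illustration: one square 4096 x 4096 projection matrix.
D_IN = 4096
D_OUT = 4096


def lora_scaling(alpha, r):
    """Standard LoRA scaling applied to the low-rank update B @ A."""
    return alpha / r


def lora_param_count(d_in, d_out, r):
    """Trainable parameters for one adapted matrix: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


scaling = lora_scaling(LORA_ALPHA, LORA_R)
adapter_params = lora_param_count(D_IN, D_OUT, LORA_R)
full_params = D_IN * D_OUT

print(f"scaling factor: {scaling}")
print(f"adapter parameters per matrix: {adapter_params}")
print(f"fraction of the full matrix: {adapter_params / full_params:.4f}")
```

Under these assumptions the update is scaled by 0.25, and each adapted matrix adds 524,288 trainable parameters, about 3% of the size of the frozen matrix it modifies.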
Run the code block below to generate inferences from this model.
from random import randrange

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

hf_model_repo = "shyam-incedoinc/codegen25-7b-multi-peft-qlora-finetuned-qa"

# Get the tokenizer
tokenizer = AutoTokenizer.from_pretrained(hf_model_repo)

# Load the model in 4-bit precision (needs bitsandbytes and a CUDA GPU)
model = AutoModelForCausalLM.from_pretrained(hf_model_repo, load_in_4bit=True,
                                             torch_dtype=torch.float16,
                                             device_map="auto")

# Load dataset from the hub
hf_data_repo = "shyam-incedoinc/qa-finetune-dataset"
train_dataset = load_dataset(hf_data_repo, split="train")
valid_dataset = load_dataset(hf_data_repo, split="validation")

# Pick a random validation sample and split it into prompt and ground truth
sample = valid_dataset[randrange(len(valid_dataset))]['text']
groundtruth = sample.split("### Output:\n")[1]
prompt = sample.split("### Output:\n")[0] + "### Output:\n"

# Generate response
input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.cuda()
outputs = model.generate(input_ids=input_ids, max_new_tokens=1024,
                         do_sample=True, top_p=0.9, temperature=0.6)

# Print the result (the decoded text includes the prompt)
print(f"Generated response:\n{tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0]}")
print(f"Ground Truth:\n{groundtruth}")
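The prompt handling above can be isolated into small helpers that run without the model. This is a minimal sketch, assuming each sample contains the "### Output:\n" marker used in the inference code; the helper names and the example sample (including its "### Input:" label) are hypothetical and not taken from the dataset.

```python
OUTPUT_MARKER = "### Output:\n"


def split_sample(text, marker=OUTPUT_MARKER):
    """Split a sample into (prompt ending with the marker, ground truth)."""
    head, sep, tail = text.partition(marker)
    if not sep:
        raise ValueError("sample does not contain the output marker")
    return head + marker, tail


def strip_prompt(generated, prompt):
    """Remove the echoed prompt from decoded output, if it is present."""
    if generated.startswith(prompt):
        return generated[len(prompt):]
    return generated


# Hypothetical sample, for illustration only.
example = (
    "### Input:\nVerify login with valid credentials\n"
    "### Output:\nGiven the user is on the login page"
)
prompt, groundtruth = split_sample(example)
completion = strip_prompt(prompt + "When the user signs in", prompt)
print(repr(prompt))
print(repr(groundtruth))
print(repr(completion))
```

Unlike indexing the result of `split`, `partition` keeps everything after the first marker together, so a ground truth that happens to contain the marker text is not truncated. Stripping the prompt makes it easier to compare only the generated continuation against the ground truth.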