Dataset: SebastianBodza/Ger_WizardLM_evol_instruct_70k_V0
How to use SebastianBodza/DElefant with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="SebastianBodza/DElefant")
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("SebastianBodza/DElefant")
model = AutoModelForCausalLM.from_pretrained("SebastianBodza/DElefant")
How to use SebastianBodza/DElefant with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "SebastianBodza/DElefant"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "SebastianBodza/DElefant",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
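The same request can also be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are illustrative, and it assumes the server is reachable at `http://localhost:8000` and answers with the usual OpenAI-style `choices[0].text` field.

```python
import json
import urllib.request

MODEL_ID = "SebastianBodza/DElefant"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumption: the response follows the OpenAI completions schema.
    return body["choices"][0]["text"]

# Example call, only meaningful while the server is running:
# print(complete("Once upon a time,"))
```

Passing `base_url="http://localhost:30000"` would target a server listening on port 30000 instead, such as the SGLang server described in this document.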
How to use SebastianBodza/DElefant with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "SebastianBodza/DElefant" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "SebastianBodza/DElefant",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "SebastianBodza/DElefant" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "SebastianBodza/DElefant",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
How to use SebastianBodza/DElefant with Docker Model Runner:
docker model run hf.co/SebastianBodza/DElefant
DElefant is an LLM developed for instruction-tuned German interactions. This version is built on top of Malte Ostendorff's German-adapted BLOOM model and was trained on a WizardLM dataset that was translated with opus-mt and filtered afterwards. The evolved dataset led to SOTA English LLMs, and we hope that by applying it to a German base model we can leverage these capabilities for various tasks, including code generation.
Due to limitations in the translation, the comments inside code blocks remained in English; the code itself, however, was kept in working condition.
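The exact translation pipeline is not described here. Purely as an illustration, one way to translate instruction text while leaving code untouched is to split out fenced code blocks and translate only the surrounding prose. In the sketch below, `translate_preserving_code` and the `translate` callback are hypothetical names; in practice the callback would wrap a translation model such as opus-mt.

```python
import re

FENCE = "`" * 3  # Markdown code fence, built indirectly to keep this example readable

# The capturing group makes re.split() return the code blocks as well.
CODE_BLOCK = re.compile(f"({FENCE}.*?{FENCE})", re.DOTALL)


def translate_preserving_code(text, translate):
    """Translate prose segments and copy fenced code blocks through unchanged.

    `translate` is a hypothetical callable mapping a source string to its
    translation; it is an assumption, not part of the original pipeline.
    """
    parts = CODE_BLOCK.split(text)
    result = []
    for index, part in enumerate(parts):
        is_code = index % 2 == 1  # captured groups land at odd indices
        if is_code or not part.strip():
            result.append(part)
        else:
            result.append(translate(part))
    return "".join(result)
```

With this approach, comments inside the code blocks stay in the source language, which matches the behaviour described above.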
Full fine-tuning of the German BLOOM model on an RTX 3090 with the translated WizardLM dataset.
If there is sufficient demand, additional adjustments can be made:
Prompt-Template:
{instruction}\n\n### Response:
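As a small convenience, the template can be wrapped in helper functions. This is a sketch only: `build_prompt` and `extract_response` are illustrative names, and it assumes that `\n` in the template stands for a real newline and that the decoded generation still contains the prompt followed by the answer.

```python
PROMPT_TEMPLATE = "{instruction}\n\n### Response:"
RESPONSE_MARKER = "### Response:"


def build_prompt(instruction):
    """Fill the prompt template with a single user instruction."""
    return PROMPT_TEMPLATE.format(instruction=instruction.strip())


def extract_response(generated_text):
    """Return only the text after the response marker of a decoded generation."""
    _, marker, answer = generated_text.partition(RESPONSE_MARKER)
    # Fall back to the full text if the marker is missing.
    return answer.strip() if marker else generated_text.strip()
```

`extract_response` is useful because causal language models typically return the prompt together with the continuation.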
Code example for inference:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("SebastianBodza/DElefant")
model = AutoModelForCausalLM.from_pretrained("SebastianBodza/DElefant", device_map="auto")

# German for "What is the name of the Federal Chancellor?"
frage = "Wie heißt der Bundeskanzler?"
prompt = f"{frage}\n\n### Response:"

# Move the inputs to the device chosen by device_map="auto"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=256,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
Training was based on Llama-X, with the adaptations from WizardLM's training script.
deepspeed Llama-X/src/train_freeform.py \
--model_name_or_path malteos/bloom-6b4-clp-german \
--data_path ger_alpaca_evol_instruct_70k_e.json \
--output_dir ./full_finetune \
--num_train_epochs 2 \
--model_max_length 2048 \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 8 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 400 \
--save_total_limit 3 \
--learning_rate 2e-5 \
--warmup_steps 2 \
--logging_steps 2 \
--lr_scheduler_type "cosine" \
--report_to "tensorboard" \
--gradient_checkpointing True \
--deepspeed deepspeed.json \
--bf16 True
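For orientation, the flags above imply an effective batch size of per-device batch size times gradient accumulation steps times the number of GPUs. The sketch below works through that arithmetic; the function names are illustrative, the single-GPU default is an assumption based on the RTX 3090 mentioned earlier, and the dataset size of 70,000 is a placeholder taken from the dataset name, not a figure reported for the training run.

```python
import math


def effective_batch_size(per_device_batch_size, gradient_accumulation_steps, num_gpus=1):
    """Number of examples contributing to one optimizer step."""
    return per_device_batch_size * gradient_accumulation_steps * num_gpus


def approx_optimizer_steps(num_examples, num_epochs, batch_size):
    """Rough count of optimizer steps; trainers may round slightly differently."""
    return math.ceil(num_examples / batch_size) * num_epochs


batch = effective_batch_size(per_device_batch_size=2, gradient_accumulation_steps=8)
print(batch)  # 16

# Placeholder dataset size (assumption), 2 epochs as in the command above.
print(approx_optimizer_steps(num_examples=70_000, num_epochs=2, batch_size=batch))  # 8750
```

Under these assumptions, `--save_steps 400` would write a checkpoint roughly every 6,400 training examples.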