Instructions for using DaertML/LLaMA-Turrera-7B with libraries, inference providers, notebooks, and local apps. Follow the sections below to get started.
- Libraries
- Transformers
How to use DaertML/LLaMA-Turrera-7B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="DaertML/LLaMA-Turrera-7B")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("DaertML/LLaMA-Turrera-7B")
model = AutoModelForCausalLM.from_pretrained("DaertML/LLaMA-Turrera-7B")
```
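For a quick check that the model loads and generates, the pipeline above can be called directly. This is a minimal sketch: the prompt and sampling parameters (`max_new_tokens`, `temperature`, `do_sample`) are illustrative choices, not values published with the model.

```python
# Minimal sketch: generate a short continuation with the text-generation pipeline.
# Prompt and sampling parameters are illustrative assumptions, not official settings.
from transformers import pipeline

pipe = pipeline("text-generation", model="DaertML/LLaMA-Turrera-7B")

outputs = pipe(
    "Hoy toca turra sobre",  # example prompt; the model was trained on Spanish "turras"
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
)
print(outputs[0]["generated_text"])
```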
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use DaertML/LLaMA-Turrera-7B with vLLM:
Install from pip and serve the model:
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "DaertML/LLaMA-Turrera-7B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "DaertML/LLaMA-Turrera-7B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
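The same endpoint can also be called from Python. The sketch below assumes the `openai` client package (v1 or later) and the default vLLM port from the command above; the `api_key` value is a dummy placeholder because vLLM does not require one by default.

```python
# Sketch of calling the vLLM OpenAI-compatible server started above.
# Assumes `pip install openai` (v1+); the api_key is a placeholder.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="DaertML/LLaMA-Turrera-7B",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(completion.choices[0].text)
```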
- SGLang
How to use DaertML/LLaMA-Turrera-7B with SGLang:
Install from pip and serve the model:
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "DaertML/LLaMA-Turrera-7B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "DaertML/LLaMA-Turrera-7B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
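The curl call above can be reproduced from Python with the standard `requests` library; the sketch below simply mirrors the same OpenAI-compatible request against the SGLang port.

```python
# Sketch: same /v1/completions request as the curl example, sent with `requests`.
import requests

response = requests.post(
    "http://localhost:30000/v1/completions",
    json={
        "model": "DaertML/LLaMA-Turrera-7B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5,
    },
)
response.raise_for_status()
print(response.json()["choices"][0]["text"])
```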
Use Docker images

```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "DaertML/LLaMA-Turrera-7B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "DaertML/LLaMA-Turrera-7B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use DaertML/LLaMA-Turrera-7B with Docker Model Runner:
```bash
docker model run hf.co/DaertML/LLaMA-Turrera-7B
```
This model has been trained on the "turras" of "El Turrero", which can be found at: https://turrero.vercel.app/
Credits for the "turras": https://twitter.com/Recuenco
Credits for the sysadmin of the site: https://twitter.com/k4rliky
This model was trained with the objective of providing a true fine-tuning of an LLM and going beyond the capabilities of the GPTs. You can also find a GPT that uses the content of the "turras" to chat with the user at: https://chat.openai.com/g/g-nam1wBUJm-turrero
"La LLaMA Turrera" can produce "turras" in the same manner as "El Turrero" and provide further feedback from other "turras". The model has been trained with a non-profit intent, for fun and to serve as a base for further development.
With the objective of learning about AI alignment, further versions of this model will be trained that attempt to prevent malicious usage by users.
The model is meant to be used with the Hugging Face Transformers API in Python; a version for LLaMA.cpp is under evaluation.
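As a sketch of that Transformers usage, the tokenizer and model can be loaded and used directly with `generate`. The `torch_dtype`/`device_map` settings and sampling values below are illustrative assumptions, not configuration published with the model.

```python
# Sketch: direct generation with the Transformers API.
# torch_dtype/device_map and the sampling values are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("DaertML/LLaMA-Turrera-7B")
model = AutoModelForCausalLM.from_pretrained(
    "DaertML/LLaMA-Turrera-7B",
    torch_dtype=torch.float16,
    device_map="auto",
)

inputs = tokenizer("Hoy toca turra sobre", return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```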
