Spydaz WEB AI
Model Architecture
Mistral Nemo is a transformer model, with the following architecture choices:
- Layers: 40
- Dim: 5,120
- Head dim: 128
- Hidden dim: 14,336
- Activation Function: SwiGLU
- Number of heads: 32
- Number of kv-heads: 8 (GQA)
- Vocabulary size: 2**17 ~= 128k
- Rotary embeddings (theta = 1M)
- Developed by: LeroyDyer
- License: apache-2.0
- Finetuned from model: unsloth/mistral-nemo-instruct-2407-bnb-4bit
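As a rough check on the architecture list, the parameter count can be estimated from these dimensions. This is a sketch, and it relies on assumptions that the card does not state: no bias terms, untied input and output embeddings, one RMSNorm weight vector per norm, a three-matrix SwiGLU MLP, and a hidden (intermediate) size of 14,336 as in the published Mistral NeMo config. The sketch also shows that 32 query heads of size 128 give a 4,096-wide attention projection, which is narrower than the 5,120 model dimension. Each of the 8 KV heads is shared by 4 query heads.

```python
from dataclasses import dataclass

# Assumptions (not stated in the card): no biases, untied embeddings,
# RMSNorm with one weight vector per norm, gated SwiGLU MLP (3 matrices).


@dataclass(frozen=True)
class NemoConfig:
    layers: int = 40
    dim: int = 5120            # residual-stream width
    head_dim: int = 128
    hidden_dim: int = 14336    # SwiGLU inner width (intermediate_size)
    n_heads: int = 32
    n_kv_heads: int = 8        # grouped-query attention
    vocab_size: int = 2 ** 17  # 131,072 tokens


def count_params(cfg: NemoConfig) -> int:
    q_width = cfg.n_heads * cfg.head_dim      # 4096, narrower than dim
    kv_width = cfg.n_kv_heads * cfg.head_dim  # 1024, shared by grouped heads
    attention = cfg.dim * q_width + 2 * cfg.dim * kv_width + q_width * cfg.dim
    mlp = 3 * cfg.dim * cfg.hidden_dim        # gate, up and down projections
    norms = 2 * cfg.dim                       # pre-attention and pre-MLP norms
    per_layer = attention + mlp + norms
    embeddings = 2 * cfg.vocab_size * cfg.dim  # input embedding + LM head
    return cfg.layers * per_layer + embeddings + cfg.dim  # + final norm


cfg = NemoConfig()
total = count_params(cfg)
print(f"query heads per KV head: {cfg.n_heads // cfg.n_kv_heads}")  # 4
print(f"approx. parameters: {total:,} (~{total / 1e9:.1f}B)")  # 12,247,782,400 (~12.2B)
```

The estimate is about 12.2B, which matches NeMo's advertised 12B size. The 4-bit and 16-bit variants differ only in how these weights are stored.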
Introduction
STAR REASONERS!
This provides a platform for the model to communicate with itself before responding. An internal objective can therefore be set. In effect, this adds an extra planning stage that improves the model's focus and output. A thought head can be charged with a thought or methodology. Examples include a statement to take a step-by-step approach to the problem, or an instruction to build an object-oriented model first and consider the use cases before creating an output. Each thought head can be dedicated to a specific purpose, such as planning, artifact generation or use-case design. A head could even decide which methodology to apply before planning a potential solution route for the response. Another head could be dedicated to retrieving content relevant to the query from the model's own knowledge, which can also be used in the pre-generation stages.
All pre-reasoners can be seen as self-guiding. They essentially remove the need to give the model a system prompt and align the heads to thought pathways instead. These chains produce data that can be considered thoughts. The thoughts can be displayed by framing them with thought tokens, which even allows for editor's comments that give the model key guidance during training. The thoughts are then used in later generations, where they assist the model and also display explanatory information in the output.
These tokens can be displayed or withheld, and this is also a setting in the model.
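The thought-head idea can be illustrated with a toy sketch. Each "head" produces a pre-response thought, each thought is framed with thought tokens, and a display setting either shows or withholds those thoughts. The `<thought>` delimiters, the `THOUGHT_HEADS` table and the `pre_response`/`render` helpers are all hypothetical. The card does not name its actual thought tokens, and in the model the heads are learned behaviours, not Python functions.

```python
import re
from typing import Callable, Dict

# Hypothetical thought delimiters: the card does not name the real tokens.
THOUGHT_START, THOUGHT_END = "<thought>", "</thought>"
THOUGHT_RE = re.compile(
    re.escape(THOUGHT_START) + r".*?" + re.escape(THOUGHT_END), re.DOTALL
)

# Toy stand-ins for dedicated thought heads (hypothetical).
THOUGHT_HEADS: Dict[str, Callable[[str], str]] = {
    "methodology": lambda q: "Choose a step-by-step approach.",
    "planning": lambda q: f"Break '{q}' into sub-problems before answering.",
}


def pre_response(query: str) -> str:
    """Run every head on the query and frame each result as a thought."""
    return "".join(
        f"{THOUGHT_START}{name}: {head(query)}{THOUGHT_END}"
        for name, head in THOUGHT_HEADS.items()
    )


def render(output: str, show_thoughts: bool) -> str:
    """Display or withhold the framed thoughts, per a display setting."""
    return output if show_thoughts else THOUGHT_RE.sub("", output).strip()


raw = pre_response("What is 2 + 2?") + " The answer is 4."
print(render(raw, show_thoughts=False))  # The answer is 4.
print(render(raw, show_thoughts=True))   # thoughts followed by the answer
```

The sketch keeps the thoughts inside the output string and hides them only at render time. As a result, the same generation can serve both the "displayed" and "withheld" settings.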
Can this be applied in other areas?
Yes! We can use this type of method to let the model generate code in another channel or head. We could potentially create a head that produces artifacts for every output, or entity lists for every output, and frame these outputs in their respective code tags or function-call tags. These can also be displayed or hidden in the response. They can also be used internally for problem-solving tasks, which again enables the model to simulate the inputs and outputs of an interpreter. It may even be prudent to include function execution inside the model, so that the model can execute functions in the background before responding. This would also have to be specified in the config, as autoexecute or not.
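The function-call channel with an autoexecute switch can be sketched as a post-processing step over the model's output. The `<function_call>` tag format, the JSON payload shape, the `REGISTRY` of allowed functions and the `process_output` helper are hypothetical. The card does not define how calls are framed or configured, and in the proposal the execution would happen inside the model rather than in external Python code.

```python
import json
import re
from typing import Callable, Dict

# Hypothetical framing for function calls emitted by the model.
CALL_RE = re.compile(r"<function_call>(.*?)</function_call>", re.DOTALL)

# Hypothetical registry: only functions listed here may be executed.
REGISTRY: Dict[str, Callable[..., object]] = {
    "add": lambda a, b: a + b,
}


def process_output(output: str, autoexecute: bool) -> str:
    """Replace each framed call with its result when autoexecute is enabled."""
    if not autoexecute:
        return output  # leave the calls for the calling interface to handle

    def run(match: re.Match) -> str:
        call = json.loads(match.group(1))
        func = REGISTRY[call["name"]]  # KeyError for unregistered functions
        return str(func(**call["arguments"]))

    return CALL_RE.sub(run, output)


text = 'The sum is <function_call>{"name": "add", "arguments": {"a": 2, "b": 3}}</function_call>.'
print(process_output(text, autoexecute=True))   # The sum is 5.
print(process_output(text, autoexecute=False))  # unchanged text
```

Restricting execution to an explicit registry is one simple way to apply the "with constraints" idea from the discussion below.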
AI AGI?
So yes, we are not far from an AI that can evolve: an advanced general intelligent system (still non-sentient, by the way).
Conclusion
The reasoner methodology might be seen as the way forward. Adding internal functionality to the models instead of external connectivity enables faster and more seamless model usage. It also enables enriched and informed responses, because outputs could even be cleansed and formatted internally by the model before being presented to the calling interface. The takeaway is a question: are we treating the decoder/encoder model as simply one function of the intelligence, which in truth needs to be autonomous? That is, the intelligence needs internal functions and tools as well as disk interaction. An agent must have awareness and control over its environment, with sensors and actuators. As a function-calling model it has actuators, and because it can read directories it has sensors. It's a start, since we can already get media in and out, but the model also needs to gain its own control over input and output!
Fine tuning: again, there is the issue of fine-tuning. The discussion above explains the requirement to control the environment from within the model (with constraints). Does this eliminate the need to fine-tune a model? In fact it should, because it gives transparency to the growth of the model. If the model fine-tuned itself, we would be in danger of a model evolving, and hence an AGI!
LOAD MODEL
!git clone https://github.com/huggingface/transformers.git
## First, copy the modified modeling_mistral.py and configuration_mistral.py into transformers/src/transformers/models/mistral, overwriting the existing files.
## THEN (from the directory that contains the clone; a "!cd" in a notebook does not persist between commands):
!pip install ./transformers
Then restart the environment. The model can then be loaded without trust_remote_code and will work fine. It can even be trained, hence the 4-bit optimised version:
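# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code=True pulls the custom modeling code from the Hub repo;
# with the patched transformers install above it should not be required.
tokenizer = AutoTokenizer.from_pretrained("LeroyDyer/_Spydaz_Web_AI_MistralStar_V2", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("LeroyDyer/_Spydaz_Web_AI_MistralStar_V2", trust_remote_code=True)
# Attach the tokenizer to the model (kept from the original snippet; the custom modeling code may rely on it).
model.tokenizer = tokenizer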
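The file-copy step in the install notes can also be scripted. It runs after cloning transformers and before `pip install ./transformers`. This is a minimal sketch that assumes the patched files sit together in one local folder. The folder names and the `copy_patched_files` helper are hypothetical. The sketch also assumes that `configuration_mistral.py` is the configuration file the notes refer to, since that is the name of the Mistral config module in transformers.

```python
import shutil
from pathlib import Path
from typing import List

# Files named in the install notes; "configuration_mistral.py" is an
# assumption (it is the Mistral config module's name in transformers).
PATCHED_FILES = ("modeling_mistral.py", "configuration_mistral.py")


def copy_patched_files(patch_dir: str, transformers_repo: str) -> List[Path]:
    """Copy the patched Mistral files into a cloned transformers checkout,
    overwriting the originals, and return the destination paths."""
    target = Path(transformers_repo) / "src" / "transformers" / "models" / "mistral"
    if not target.is_dir():
        raise FileNotFoundError(f"not a transformers checkout: {target}")
    copied = []
    for name in PATCHED_FILES:
        destination = target / name
        shutil.copy2(Path(patch_dir) / name, destination)  # overwrites if present
        copied.append(destination)
    return copied


# Hypothetical usage, with patched files in ./patched and the clone in ./transformers:
# copy_patched_files("patched", "transformers")
```

The helper fails early if the target folder is missing rather than creating it. A typo in the checkout path therefore surfaces as an error instead of silently installing an unpatched transformers.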