Instructions to use dinalt/walsh-1-7b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use dinalt/walsh-1-7b with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="dinalt/walsh-1-7b", trust_remote_code=True) messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoModelForCausalLM model = AutoModelForCausalLM.from_pretrained("dinalt/walsh-1-7b", trust_remote_code=True, dtype="auto") - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use dinalt/walsh-1-7b with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "dinalt/walsh-1-7b" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "dinalt/walsh-1-7b", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/dinalt/walsh-1-7b
- SGLang
How to use dinalt/walsh-1-7b with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "dinalt/walsh-1-7b" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "dinalt/walsh-1-7b", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "dinalt/walsh-1-7b" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "dinalt/walsh-1-7b", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use dinalt/walsh-1-7b with Docker Model Runner:
docker model run hf.co/dinalt/walsh-1-7b
Hadamard-Walsh 1.7B is an experimental model using a new positional encoder. The encoder represents absolute positions by using a combination of rows from the Hadamard-Walsh matrix (https://en.wikipedia.org/wiki/Hadamard_code). Each row corresponds to a binary digit is the positional code, where the presence of a row codes for a 1 and the absence, a zero. While training, the base offset in the sequence is randomly chosen for each batch. The result is that the model is very proficient at sequences much longer than those seen in training.
The encoding scheme was devised when I was performing experiments to determine the degree to which various positional encodings schemes interfere with the information carrying capacity of embeddings. This particular scheme did exceptionally well. It was both highly resilient to interference as well as minimally interfering with the embeddings. As a follow-on experiment, I adapted the encoder to work with my transformer implementation and found that it performs exceptionally well when directly compared with other popular positional encoding schemes.
The model has had approximately three weeks of pretraining on six RTX4090s. It seems to be doing remarkably well when compared with my other model of similar size and training, but using ALiBi positional encodings and SwiGLU. I have also noted an unusual loss pattern, where evaluation loss shows large punctuated drops. I can only speculate, but my suspicion is that the random offset of the positional encoder may have a regularizing effect on training. The attention patterns are also quite "interesting." [TODO: add images of attention patterns].
Model Details:
- Model Dimension: 2048
- Hidden Layers: 32
- Attention Heads: 32
- Feedforward Dimension: 8192
- Feedforward Network Type: Conventional MLP with GeLU activation
- Vocabulary Size: 32000
- Max Sequence Length: 16K (14-bit absolute positional encoding via Walsh matrix)
- Weight Initialization: DeepNet, https://arxiv.org/abs/2203.00555
- Pretraining Datasets: RedPajama-Data-1T, mostly "books" and some Wikipedia.
Loading:
The model implementation is all my own, so you will need to use "trust_remote_code" to load the model.
from transformers import (
AutoTokenizer,
AutoModelForCausalLM,
)
model_id = "dinalt/walsh-1-7b"
model = AutoModelForCausalLM.from_pretrained(
model_id,
trust_remote_code=True,
# flash_attention_2 requires bfloat16 or float16
torch_dtype=torch.bfloat16,
# One of ["flash_attention_2", "sdpa", "eager"]
attn_implementation="flash_attention_2",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
The model has been tested with text-generation-webui, which needs to be started with the "--trust-remote-code" flag.
- Downloads last month
- 4