VaidikML0508 committed on Jan 7, 2025

Commit 582a000 (verified) · 1 parent: 1aed14d

Delete handler.py

Files changed (1)

handler.py +0 -102

handler.py DELETED

@@ -1,102 +0,0 @@
-from typing import Dict, List, Any
-import torch
-from unsloth import FastLanguageModel
-import os
-class EndpointHandler:
-    def __init__(self, path=""):
-        # Model configuration
-        self.max_seq_length = 8192
-        self.load_in_4bit = True
-        self.dtype = None  # Auto detection
-        # Print the CUDA version
-        print(f"CUDA version: {torch.version.cuda}")
-        # Load model and tokenizer
-        self.model_id = "VaidikML0508/Access-Me-Instruct-V2"
-        self.model, self.tokenizer = FastLanguageModel.from_pretrained(
-            model_name=self.model_id,
-            max_seq_length=self.max_seq_length,
-            dtype=self.dtype,
-            load_in_4bit=self.load_in_4bit,
-            token=os.environ['HF_KEY']  # Replace with actual token if needed
-        )
-        # Prepare model for inference
-        FastLanguageModel.for_inference(self.model)
-        # Define prompt template
-        self.prompt_template = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
-{}<|eot_id|><|start_header_id|>user<|end_header_id|>
-{}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
-{}<|eot_id|>"""
-    def __call__(self, data: Dict[str, Any]) -> Dict[str, str]:
-        """
-        Handle inference request
-        :param data: Dictionary containing 'system_instruction', 'question', and optional parameters
-        :return: Dictionary containing generated response
-        """
-        # Extract inputs
-        system_instruction = data.pop("system_instruction", "You are a helpful AI assistant.")
-        question = data.pop("question", None)
-        # Check if question is provided
-        if question is None:
-            return {"error": "Please provide a question."}
-        # Extract generation parameters
-        max_new_tokens = data.pop("max_new_tokens", 200)
-        use_cache = data.pop("use_cache", True)
-        try:
-            # Prepare input prompt
-            formatted_prompt = self.prompt_template.format(
-                system_instruction,
-                question,
-                ""  # Empty output for generation
-            )
-            # Tokenize input
-            inputs = self.tokenizer(
-                [formatted_prompt],
-                return_tensors="pt"
-            ).to("cuda")
-            # Generate response
-            outputs = self.model.generate(
-                **inputs,
-                max_new_tokens=max_new_tokens,
-                use_cache=use_cache
-            )
-            # Decode output
-            generated_text = self.tokenizer.batch_decode(outputs)[0]
-            # print(generated_text)
-            # Extract the assistant's response
-            # Find the last assistant section in the generated text
-            assistant_parts = generated_text.split("<|start_header_id|>assistant<|end_header_id|>")
-            if len(assistant_parts) > 1:
-                response = assistant_parts[-1].replace('<|eot_id|>', "").strip(" \n")
-            else:
-                response = generated_text
-            return {
-                "generated_text": response,
-                "full_prompt": formatted_prompt
-            }
-        except Exception as e:
-            return {
-                "error": f"Generation failed: {str(e)}",
-                "full_prompt": formatted_prompt if 'formatted_prompt' in locals() else None
-            }
-    @staticmethod
-    def check_cuda():
-        """
-        Verify CUDA availability
-        """
-        if not torch.cuda.is_available():
-            raise ValueError("CUDA is required for this model")
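
For reviewers who want to check what the deleted handler did to its inputs and outputs without loading the model, the string handling can be reproduced in isolation. The following is a minimal sketch, assuming only the prompt template and the split-on-assistant-header logic visible in the diff above. The function names `build_prompt` and `extract_response` are hypothetical and do not appear in the original file, and the "decoded" text is simulated rather than produced by the model.

```python
# Sketch only: mirrors the string handling of the deleted handler.py.
# build_prompt / extract_response are hypothetical names, not from the commit.

ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>"
EOT = "<|eot_id|>"

# Same template text as in the deleted file, written as one string.
PROMPT_TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n"
    "{}<|eot_id|><|start_header_id|>user<|end_header_id|>\n"
    "{}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n"
    "{}<|eot_id|>"
)


def build_prompt(system_instruction: str, question: str) -> str:
    """Fill the template the way the handler did: empty assistant slot."""
    return PROMPT_TEMPLATE.format(system_instruction, question, "")


def extract_response(generated_text: str) -> str:
    """Keep the text after the last assistant header, minus <|eot_id|> markers."""
    parts = generated_text.split(ASSISTANT_HEADER)
    if len(parts) > 1:
        return parts[-1].replace(EOT, "").strip(" \n")
    # No assistant header found: fall back to the raw text, as the handler did.
    return generated_text


if __name__ == "__main__":
    prompt = build_prompt("You are a helpful AI assistant.", "What is 2 + 2?")
    # Simulated decoder output: the prompt followed by a short completion.
    decoded = prompt + "4" + EOT
    print(extract_response(decoded))  # prints: 4
```

The sketch also makes one detail of the deleted code visible: because the third template slot is filled with an empty string, the prompt ends with `<|eot_id|>` directly after the assistant header, so generation starts after an already closed assistant turn. The commit does not say whether that was intended.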