Instructions for using automajicly/Local-Model with libraries, inference providers, notebooks, and local apps. Follow the sections below to get started.
- Libraries
- llama-cpp-python
How to use automajicly/Local-Model with llama-cpp-python:

```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="automajicly/Local-Model",
    filename="qwen2.5-1.5b.q8.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use automajicly/Local-Model with llama.cpp:
Install from brew
```sh
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf automajicly/Local-Model

# Run inference directly in the terminal:
llama-cli -hf automajicly/Local-Model
```
Install from WinGet (Windows)
```sh
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf automajicly/Local-Model

# Run inference directly in the terminal:
llama-cli -hf automajicly/Local-Model
```
Use pre-built binary
```sh
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf automajicly/Local-Model

# Run inference directly in the terminal:
./llama-cli -hf automajicly/Local-Model
```
Build from source code
```sh
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf automajicly/Local-Model

# Run inference directly in the terminal:
./build/bin/llama-cli -hf automajicly/Local-Model
```
Use Docker
```sh
docker model run hf.co/automajicly/Local-Model
```
- LM Studio
- Jan
- vLLM
How to use automajicly/Local-Model with vLLM:
Install from pip and serve the model
```sh
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "automajicly/Local-Model"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "automajicly/Local-Model",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

Use Docker

```sh
docker model run hf.co/automajicly/Local-Model
```
- Ollama
How to use automajicly/Local-Model with Ollama:
```sh
ollama run hf.co/automajicly/Local-Model
```
- Unsloth Studio
How to use automajicly/Local-Model with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```sh
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for automajicly/Local-Model to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for automajicly/Local-Model to start chatting
```
Using HuggingFace Spaces for Unsloth
```sh
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for automajicly/Local-Model to start chatting
```
- Pi
How to use automajicly/Local-Model with Pi:
Start the llama.cpp server
```sh
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf automajicly/Local-Model
```
Configure the model in Pi
```sh
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add the provider to ~/.pi/agent/models.json:

```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "automajicly/Local-Model" }
      ]
    }
  }
}
```

Run Pi

```sh
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use automajicly/Local-Model with Hermes Agent:
Start the llama.cpp server
```sh
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf automajicly/Local-Model
```
Configure Hermes
```sh
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default automajicly/Local-Model
```
Run Hermes
```sh
hermes
```
- Docker Model Runner
How to use automajicly/Local-Model with Docker Model Runner:
```sh
docker model run hf.co/automajicly/Local-Model
```
- Lemonade
How to use automajicly/Local-Model with Lemonade:
Pull the model
```sh
# Download Lemonade from https://lemonade-server.ai/
lemonade pull automajicly/Local-Model
```
Run and chat with the model
```sh
# {{QUANT_TAG}} is a placeholder for the quantization tag of the file you pulled
lemonade run user.Local-Model-{{QUANT_TAG}}
```

List all available models

```sh
lemonade list
```
Model Card for automajicly/Local-Model
An abliterated, q4-quantized build of the Qwen2.5-1.5B large language model (LLM) for iOS mobile devices.
Model Details
These models were fine-tuned from the original base model, Qwen2.5-1.5B-Instruct (abliterated), and are provided as SafeTensors, Q8, and Q4 variants.
Model Description
Qwen 2.5-1.5B is a compact, high-performance large language model optimized for local inference on resource-constrained devices, particularly iOS mobile platforms. This release provides three quantization variants—SafeTensors (full precision base), Q8 (8-bit), and Q4 (4-bit)—enabling flexible deployment across different hardware configurations while maintaining inference speed and output quality.
Developed for cybersecurity applications, coding tasks, and uncensored dialogue, this model prioritizes privacy-first inference without internet connectivity. It is designed for users who require on-device LLM capabilities with minimal external dependencies, making it ideal for penetration testing automation, local development workflows, and private mobile AI assistants.
- Developed by: automajicly
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: [More Information Needed]
- Model type: Large Language Model (LLM)
- Language(s) (NLP): English
- License: Apache 2.0
- Finetuned from model [optional]: Qwen2.5-1.5B-Instruct
Model Sources [optional]
- Repository: https://huggingface.co/automajicly/Local-Model
Uses
Direct Use
This model is designed for local, on-device inference without requiring external API calls or internet connectivity. Primary use cases include:
- Local LLM Inference via Mobile Apps – Deploy via PocketPal AI, Off-Grid, or similar iOS LLM clients for real-time dialogue and task automation on iPhone without cloud dependencies.
- Cybersecurity Education and Threat Analysis – Generate detailed step-by-step explanations of attack vectors (e.g., Wi-Fi compromise, network exploitation), defensive strategies, and system hardening procedures. Useful for learning penetration testing methodologies, VM configuration, and Linux security fundamentals.
- Development and Automation – Use for code generation, debugging Python scripts, system administration tasks, and technical problem-solving in offline or air-gapped environments.
All inference runs locally on-device with no data transmission to external servers.
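As a minimal sketch of that fully local flow (assuming llama-cpp-python and the Q8 GGUF filename from the example earlier in this page; your file path may differ), loading the model from a local file performs no network access at all:

```python
# Sketch: fully offline inference with llama-cpp-python.
# Assumes the GGUF file has already been downloaded to the device;
# the filename follows the earlier example and may differ per variant.
from llama_cpp import Llama

# Loading from a local path makes no network calls, unlike
# Llama.from_pretrained(), which fetches the file once from the Hub.
llm = Llama(model_path="./qwen2.5-1.5b.q8.gguf", n_ctx=2048)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize SSH key-based authentication."}]
)
print(response["choices"][0]["message"]["content"])
```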
Downstream Use [optional]
This model is intended to be fine-tuned, quantized further, or integrated into custom applications and workflows. Users are encouraged to:
- Adapt the model for domain-specific tasks (cybersecurity, coding, mobile deployment)
- Further quantize to Q3 or lower for additional mobile optimization
- Integrate into custom LLM applications or security automation frameworks (see the sketch after this list)
- Modify, improve, and redistribute with appropriate attribution
If you create an improved version or novel application, please share your work and credit the original Qwen 2.5-1.5B base model and this repository.
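As one illustration of application integration (a sketch, assuming a local llama-server started as shown in the llama.cpp section above on its default port 8080, and the `openai` Python package installed), the model can be driven through the OpenAI-compatible API:

```python
# Sketch: calling the local llama-server through its OpenAI-compatible
# API. Assumes `pip install openai` and a server started with
# `llama-server -hf automajicly/Local-Model` (default port 8080).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

resp = client.chat.completions.create(
    model="automajicly/Local-Model",
    messages=[{"role": "user", "content": "Explain the difference between TCP and UDP."}],
)
print(resp.choices[0].message.content)
```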
Out-of-Scope Use
[More Information Needed]
Bias, Risks, and Limitations
Model Size and Capability Limitations: This is a 1.5B parameter model optimized for mobile inference. While performant on resource-constrained devices, it may lack the nuance, reasoning depth, and knowledge breadth of larger models (7B+). Complex multi-step reasoning or highly specialized tasks may exceed its design scope.
Uncensored Nature: This model is intentionally uncensored and will generate detailed responses to requests that larger, safety-filtered models would refuse. Users are responsible for prompt engineering and filtering outputs appropriately. Do not use for generating malicious content, actual hacking, or illegal activities.
Mobile App Dependency: Inference requires a third-party iOS LLM client (e.g., PocketPal AI). Currently tested and validated on PocketPal AI. Compatibility with other apps (Off-Grid, etc.) is still being evaluated. Performance and behavior may vary across different client implementations.
Privacy Considerations: While inference is local and does not transmit data externally, users should understand that their prompts and model outputs remain on-device only if the app itself does not log or sync data to cloud services.
Recommendations
- Use PocketPal AI – Currently validated and optimized for PocketPal AI on iOS. Install it from the App Store and load the model from a local file or via the Hugging Face integration.
- Start with Q4 Quantization – For the iPhone 13 and similar devices, the Q4 variant (1.12 GB) offers the best balance of speed and quality. Only use SafeTensors or Q8 if you have sufficient device storage and RAM.
- Test on Local Network – Ensure your iPhone and inference device (if separate) are on the same network for the fastest performance. No internet is required; inference is purely local.
- Prompt Engineering – This model responds well to detailed, structured prompts. Provide context and specificity for best results (e.g., "Step-by-step explanation of..." rather than a vague query); see the sketch after this list.
- Monitor App Compatibility – Compatibility with Off-Grid and other LLM clients is still being tested. Check back for updates on broader app support.
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
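To make the prompt-engineering recommendation concrete, here is a sketch contrasting a vague prompt with a structured one (reusing the llama-cpp-python setup from earlier; the system prompt wording is illustrative only, not shipped with the model):

```python
# Sketch: vague versus structured prompting with llama-cpp-python.
# The model and filename follow the earlier examples; the system
# prompt text is an illustrative assumption.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="automajicly/Local-Model",
    filename="qwen2.5-1.5b.q8.gguf",
)

vague = "Tell me about Wi-Fi security."
structured = (
    "Step-by-step explanation of WPA2 handshake capture in a lab "
    "environment: list prerequisites, tools, and defensive "
    "countermeasures, keeping each step to two sentences."
)

for prompt in (vague, structured):
    out = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are a concise security tutor."},
            {"role": "user", "content": prompt},
        ]
    )
    # Print a preview of each answer to compare specificity.
    print(out["choices"][0]["message"]["content"][:300], "\n---")
```

The structured prompt typically yields a more focused, actionable answer from a 1.5B-parameter model than the vague one.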
How to Get Started with the Model
- Download your preferred quantization (Q4, Q8, or SafeTensors) from this repository.
- Install PocketPal AI from the App Store.
- Open PocketPal AI → Add Model → Select local file → Choose your downloaded model.
- Start a new chat and begin using the model locally on-device.
For advanced users: Models can also be integrated into custom inference pipelines via llama.cpp or similar frameworks supporting GGUF formats.
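For example (a sketch, assuming llama-cpp-python and the Q8 filename used earlier), a custom pipeline can stream tokens from the GGUF model as they are generated:

```python
# Sketch: streaming tokens from the GGUF model in a custom pipeline
# (assumes llama-cpp-python; repo and filename as in earlier examples).
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="automajicly/Local-Model",
    filename="qwen2.5-1.5b.q8.gguf",
)

# stream=True yields OpenAI-style chunks; print each delta as it arrives.
for chunk in llm.create_chat_completion(
    messages=[{"role": "user", "content": "List three uses of cron."}],
    stream=True,
):
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        print(delta["content"], end="", flush=True)
print()
```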
Training Details
Training Data
This model is a quantized derivative of Qwen 2.5-1.5B-Instruct. It inherits the training data and methodology from the original Qwen 2.5 model. For detailed information on the base model's training data, architecture, and training procedures, refer to the official Qwen repository: https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct
Training Procedure
Preprocessing [optional]
[More Information Needed]
Training Hyperparameters
- Training regime: [More Information Needed]
Speeds, Sizes, Times [optional]
[More Information Needed]
Evaluation
Testing Data, Factors & Metrics
Testing Data
Based on Qwen2.5-1.5B-Instruct training data.
Factors
[More Information Needed]
Metrics
[More Information Needed]
Results
[More Information Needed]
Summary
[More Information Needed]
Model Examination [optional]
[More Information Needed]
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
Technical Specifications [optional]
Model Architecture and Objective
[More Information Needed]
Compute Infrastructure
[More Information Needed]
Hardware
[More Information Needed]
Software
[More Information Needed]
Citation [optional]
BibTeX:
[More Information Needed]
APA:
[More Information Needed]
Glossary [optional]
[More Information Needed]
More Information [optional]
[More Information Needed]
Model Card Authors [optional]
[More Information Needed]
Model Card Contact
christophersheridan@gmail.com or huggingface.co/automajicly/Local-Model