Instructions for using automajicly/Local-Model with libraries, inference providers, notebooks, and local apps. Follow the sections below to get started.
- Libraries
- llama-cpp-python
How to use automajicly/Local-Model with llama-cpp-python:

```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="automajicly/Local-Model",
    filename="qwen2.5-1.5b.q8.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use automajicly/Local-Model with llama.cpp:
Install from brew
```sh
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf automajicly/Local-Model

# Run inference directly in the terminal:
llama-cli -hf automajicly/Local-Model
```
Install from WinGet (Windows)
```sh
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf automajicly/Local-Model

# Run inference directly in the terminal:
llama-cli -hf automajicly/Local-Model
```
Use pre-built binary
```sh
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf automajicly/Local-Model

# Run inference directly in the terminal:
./llama-cli -hf automajicly/Local-Model
```
Build from source code
```sh
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf automajicly/Local-Model

# Run inference directly in the terminal:
./build/bin/llama-cli -hf automajicly/Local-Model
```
Use Docker
```sh
docker model run hf.co/automajicly/Local-Model
```
- LM Studio
- Jan
- vLLM
How to use automajicly/Local-Model with vLLM:
Install from pip and serve the model
```sh
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "automajicly/Local-Model"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "automajicly/Local-Model",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

Use Docker

```sh
docker model run hf.co/automajicly/Local-Model
```
- Ollama
How to use automajicly/Local-Model with Ollama:
```sh
ollama run hf.co/automajicly/Local-Model
```
- Unsloth Studio
How to use automajicly/Local-Model with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```sh
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for automajicly/Local-Model to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for automajicly/Local-Model to start chatting
```
Using HuggingFace Spaces for Unsloth
```sh
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for automajicly/Local-Model to start chatting
```
- Pi
How to use automajicly/Local-Model with Pi:
Start the llama.cpp server
```sh
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf automajicly/Local-Model
```
Configure the model in Pi
```sh
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add the provider to ~/.pi/agent/models.json:

```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "automajicly/Local-Model" }
      ]
    }
  }
}
```

Run Pi

```sh
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use automajicly/Local-Model with Hermes Agent:
Start the llama.cpp server
```sh
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf automajicly/Local-Model
```
Configure Hermes
```sh
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default automajicly/Local-Model
```
Run Hermes
```sh
hermes
```
- Docker Model Runner
How to use automajicly/Local-Model with Docker Model Runner:
```sh
docker model run hf.co/automajicly/Local-Model
```
- Lemonade
How to use automajicly/Local-Model with Lemonade:
Pull the model
```sh
# Download Lemonade from https://lemonade-server.ai/
lemonade pull automajicly/Local-Model
```
Run and chat with the model
```sh
# {{QUANT_TAG}} is a placeholder for the quantization tag of the file you pulled
lemonade run user.Local-Model-{{QUANT_TAG}}
```

List all available models

```sh
lemonade list
```
Model Card for automajicly/Local-Model
An abliterated, q4-quantized build of the Qwen2.5-1.5B large language model (LLM) for iOS mobile devices.
Model Details
These models were fine-tuned from the original base model, Qwen2.5-1.5B-Instruct (abliterated), and are provided as SafeTensors, Q8, and Q4 variants.
Model Description
Qwen 2.5-1.5B is a compact, high-performance large language model optimized for local inference on resource-constrained devices, particularly iOS mobile platforms. This release provides three quantization variants—SafeTensors (full precision base), Q8 (8-bit), and Q4 (4-bit)—enabling flexible deployment across different hardware configurations while maintaining inference speed and output quality.
Developed for cybersecurity applications, coding tasks, and uncensored dialogue, this model prioritizes privacy-first inference without internet connectivity. It is designed for users who require on-device LLM capabilities with minimal external dependencies, making it ideal for penetration testing automation, local development workflows, and private mobile AI assistants.
- Developed by: automajicly
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: [More Information Needed]
- Model type: Large Language Model (LLM)
- Language(s) (NLP): English
- License: Apache 2.0
- Finetuned from model [optional]: Qwen2.5-1.5B-Instruct
Model Sources [optional]
- Repository: https://huggingface.co/automajicly/Local-Model
Uses
Direct Use
This model is designed for local, on-device inference without requiring external API calls or internet connectivity. Primary use cases include:
- Local LLM Inference via Mobile Apps – Deploy via PocketPal AI, Off-Grid, or similar iOS LLM clients for real-time dialogue and task automation on iPhone without cloud dependencies.
- Cybersecurity Education and Threat Analysis – Generate detailed step-by-step explanations of attack vectors (e.g., Wi-Fi compromise, network exploitation), defensive strategies, and system hardening procedures. Useful for learning penetration testing methodologies, VM configuration, and Linux security fundamentals.
- Development and Automation – Use for code generation, debugging Python scripts, system administration tasks, and technical problem-solving in offline or air-gapped environments.
All inference runs locally on-device with no data transmission to external servers.
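As a minimal sketch of that fully local flow (assuming llama-cpp-python and the Q8 GGUF filename from the example earlier in this page; your file path may differ), loading the model from a local file performs no network access at all:

```python
# Sketch: fully offline inference with llama-cpp-python.
# Assumes the GGUF file has already been downloaded to the device;
# the filename follows the earlier example and may differ per variant.
from llama_cpp import Llama

# Loading from a local path makes no network calls, unlike
# Llama.from_pretrained(), which fetches the file once from the Hub.
llm = Llama(model_path="./qwen2.5-1.5b.q8.gguf", n_ctx=2048)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize SSH key-based authentication."}]
)
print(response["choices"][0]["message"]["content"])
```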
Downstream Use [optional]
This model is intended to be fine-tuned, quantized further, or integrated into custom applications and workflows. Users are encouraged to:
- Adapt the model for domain-specific tasks (cybersecurity, coding, mobile deployment)
- Further quantize to Q3 or lower for additional mobile optimization
- Integrate into custom LLM applications or security automation frameworks (see the sketch after this list)
- Modify, improve, and redistribute with appropriate attribution
If you create an improved version or novel application, please share your work and credit the original Qwen 2.5-1.5B base model and this repository.
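As one illustration of application integration (a sketch, assuming a local llama-server started as shown in the llama.cpp section above on its default port 8080, and the `openai` Python package installed), the model can be driven through the OpenAI-compatible API:

```python
# Sketch: calling the local llama-server through its OpenAI-compatible
# API. Assumes `pip install openai` and a server started with
# `llama-server -hf automajicly/Local-Model` (default port 8080).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

resp = client.chat.completions.create(
    model="automajicly/Local-Model",
    messages=[{"role": "user", "content": "Explain the difference between TCP and UDP."}],
)
print(resp.choices[0].message.content)
```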
Out-of-Scope Use
[More Information Needed]
Bias, Risks, and Limitations
Model Size and Capability Limitations: This is a 1.5B parameter model optimized for mobile inference. While performant on resource-constrained devices, it may lack the nuance, reasoning depth, and knowledge breadth of larger models (7B+). Complex multi-step reasoning or highly specialized tasks may exceed its design scope.
Uncensored Nature: This model is intentionally uncensored and will generate detailed responses to requests that larger, safety-filtered models would refuse. Users are responsible for prompt engineering and filtering outputs appropriately. Do not use for generating malicious content, actual hacking, or illegal activities.
Mobile App Dependency: Inference requires a third-party iOS LLM client (e.g., PocketPal AI). Currently tested and validated on PocketPal AI. Compatibility with other apps (Off-Grid, etc.) is still being evaluated. Performance and behavior may vary across different client implementations.
Privacy Considerations: While inference is local and does not transmit data externally, users should understand that their prompts and model outputs remain on-device only if the app itself does not log or sync data to cloud services.
Recommendations
- Use PocketPal AI – Currently validated and optimized for PocketPal AI on iOS. Install it from the App Store and load the model from a local file or via the Hugging Face integration.
- Start with Q4 Quantization – For the iPhone 13 and similar devices, the Q4 variant (1.12 GB) offers the best balance of speed and quality. Only use SafeTensors or Q8 if you have sufficient device storage and RAM.
- Test on Local Network – Ensure your iPhone and inference device (if separate) are on the same network for the fastest performance. No internet is required; inference is purely local.
- Prompt Engineering – This model responds well to detailed, structured prompts. Provide context and specificity for best results (e.g., "Step-by-step explanation of..." rather than a vague query); see the sketch after this list.
- Monitor App Compatibility – Compatibility with Off-Grid and other LLM clients is still being tested. Check back for updates on broader app support.
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
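To make the prompt-engineering recommendation concrete, here is a sketch contrasting a vague prompt with a structured one (reusing the llama-cpp-python setup from earlier; the system prompt wording is illustrative only, not shipped with the model):

```python
# Sketch: vague versus structured prompting with llama-cpp-python.
# The model and filename follow the earlier examples; the system
# prompt text is an illustrative assumption.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="automajicly/Local-Model",
    filename="qwen2.5-1.5b.q8.gguf",
)

vague = "Tell me about Wi-Fi security."
structured = (
    "Step-by-step explanation of WPA2 handshake capture in a lab "
    "environment: list prerequisites, tools, and defensive "
    "countermeasures, keeping each step to two sentences."
)

for prompt in (vague, structured):
    out = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are a concise security tutor."},
            {"role": "user", "content": prompt},
        ]
    )
    # Print a preview of each answer to compare specificity.
    print(out["choices"][0]["message"]["content"][:300], "\n---")
```

The structured prompt typically yields a more focused, actionable answer from a 1.5B-parameter model than the vague one.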
How to Get Started with the Model
- Download your preferred quantization (Q4, Q8, or SafeTensors) from this repository.
- Install PocketPal AI from the App Store.
- Open PocketPal AI → Add Model → Select local file → Choose your downloaded model.
- Start a new chat and begin using the model locally on-device.
For advanced users: Models can also be integrated into custom inference pipelines via llama.cpp or similar frameworks supporting GGUF formats.
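For example (a sketch, assuming llama-cpp-python and the Q8 filename used earlier), a custom pipeline can stream tokens from the GGUF model as they are generated:

```python
# Sketch: streaming tokens from the GGUF model in a custom pipeline
# (assumes llama-cpp-python; repo and filename as in earlier examples).
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="automajicly/Local-Model",
    filename="qwen2.5-1.5b.q8.gguf",
)

# stream=True yields OpenAI-style chunks; print each delta as it arrives.
for chunk in llm.create_chat_completion(
    messages=[{"role": "user", "content": "List three uses of cron."}],
    stream=True,
):
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        print(delta["content"], end="", flush=True)
print()
```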
Training Details
Training Data
This model is a quantized derivative of Qwen 2.5-1.5B-Instruct. It inherits the training data and methodology from the original Qwen 2.5 model. For detailed information on the base model's training data, architecture, and training procedures, refer to the official Qwen repository: https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct
Training Procedure
Preprocessing [optional]
[More Information Needed]
Training Hyperparameters
- Training regime: [More Information Needed]
Speeds, Sizes, Times [optional]
[More Information Needed]
Evaluation
Testing Data, Factors & Metrics
Testing Data
Based on Qwen2.5-1.5B-Instruct training data.
Factors
[More Information Needed]
Metrics
[More Information Needed]
Results
[More Information Needed]
Summary
[More Information Needed]
Model Examination [optional]
[More Information Needed]
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
Technical Specifications [optional]
Model Architecture and Objective
[More Information Needed]
Compute Infrastructure
[More Information Needed]
Hardware
[More Information Needed]
Software
[More Information Needed]
Citation [optional]
BibTeX:
[More Information Needed]
APA:
[More Information Needed]
Glossary [optional]
[More Information Needed]
More Information [optional]
[More Information Needed]
Model Card Authors [optional]
[More Information Needed]
Model Card Contact
christophersheridan@gmail.com or huggingface.co/automajicly/Local-Model