Instructions for using LSXPrime/ProseFlow-v1-360M-Instruct-GGUF with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- llama-cpp-python
How to use LSXPrime/ProseFlow-v1-360M-Instruct-GGUF with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="LSXPrime/ProseFlow-v1-360M-Instruct-GGUF",
    filename="ProseFlow-v1-360M-Instruct-Q4_K_M.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```
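The call returns an OpenAI-style completion dict. A minimal sketch of pulling the reply text out of it (field names follow llama-cpp-python's standard chat-completion response):

```python
# create_chat_completion returns an OpenAI-style dict; the assistant
# reply lives under choices[0]["message"]["content"].
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)
print(response["choices"][0]["message"]["content"])
```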
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use LSXPrime/ProseFlow-v1-360M-Instruct-GGUF with llama.cpp:
Install from brew
```sh
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf LSXPrime/ProseFlow-v1-360M-Instruct-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf LSXPrime/ProseFlow-v1-360M-Instruct-GGUF:Q4_K_M
```
Install from WinGet (Windows)
```sh
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf LSXPrime/ProseFlow-v1-360M-Instruct-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf LSXPrime/ProseFlow-v1-360M-Instruct-GGUF:Q4_K_M
```
Use pre-built binary
```sh
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf LSXPrime/ProseFlow-v1-360M-Instruct-GGUF:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf LSXPrime/ProseFlow-v1-360M-Instruct-GGUF:Q4_K_M
```
Build from source code
```sh
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf LSXPrime/ProseFlow-v1-360M-Instruct-GGUF:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf LSXPrime/ProseFlow-v1-360M-Instruct-GGUF:Q4_K_M
```
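However you install it, llama-server speaks the OpenAI chat-completions protocol (on port 8080 by default), so any OpenAI-style client can call it. A minimal sketch using Python's requests; the port and payload shape are llama.cpp defaults, not specific to this model:

```python
import requests

# llama-server listens on http://localhost:8080 by default and exposes
# an OpenAI-compatible endpoint at /v1/chat/completions.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```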
Use Docker
```sh
docker model run hf.co/LSXPrime/ProseFlow-v1-360M-Instruct-GGUF:Q4_K_M
```
- LM Studio
- Jan
- vLLM
How to use LSXPrime/ProseFlow-v1-360M-Instruct-GGUF with vLLM:
Install from pip and serve model
```sh
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "LSXPrime/ProseFlow-v1-360M-Instruct-GGUF"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "LSXPrime/ProseFlow-v1-360M-Instruct-GGUF",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```
Use Docker
```sh
docker model run hf.co/LSXPrime/ProseFlow-v1-360M-Instruct-GGUF:Q4_K_M
```
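Because the vLLM server is OpenAI-compatible, the official openai Python client works against it too. A minimal sketch, assuming pip install openai and the default port 8000:

```python
from openai import OpenAI

# Point the client at the local vLLM server; the api_key value is
# arbitrary unless the server was started with one.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

completion = client.chat.completions.create(
    model="LSXPrime/ProseFlow-v1-360M-Instruct-GGUF",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(completion.choices[0].message.content)
```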
- Ollama
How to use LSXPrime/ProseFlow-v1-360M-Instruct-GGUF with Ollama:
```sh
ollama run hf.co/LSXPrime/ProseFlow-v1-360M-Instruct-GGUF:Q4_K_M
```
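Once pulled, the model is also reachable through Ollama's local REST API (port 11434 by default). A minimal sketch in Python; the model name is the same tag used with ollama run:

```python
import requests

# Ollama's /api/chat endpoint; stream=False returns a single JSON
# object with the full reply under "message" -> "content".
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "hf.co/LSXPrime/ProseFlow-v1-360M-Instruct-GGUF:Q4_K_M",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "stream": False,
    },
)
print(resp.json()["message"]["content"])
```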
- Unsloth Studio
How to use LSXPrime/ProseFlow-v1-360M-Instruct-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```sh
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser and search for
# LSXPrime/ProseFlow-v1-360M-Instruct-GGUF to start chatting
```
Install Unsloth Studio (Windows)
```sh
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser and search for
# LSXPrime/ProseFlow-v1-360M-Instruct-GGUF to start chatting
```
Using HuggingFace Spaces for Unsloth
```sh
# No setup required.
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# and search for LSXPrime/ProseFlow-v1-360M-Instruct-GGUF to start chatting
```
- Docker Model Runner
How to use LSXPrime/ProseFlow-v1-360M-Instruct-GGUF with Docker Model Runner:
```sh
docker model run hf.co/LSXPrime/ProseFlow-v1-360M-Instruct-GGUF:Q4_K_M
```
- Lemonade
How to use LSXPrime/ProseFlow-v1-360M-Instruct-GGUF with Lemonade:
Pull the model
```sh
# Download Lemonade from https://lemonade-server.ai/
lemonade pull LSXPrime/ProseFlow-v1-360M-Instruct-GGUF:Q4_K_M
```
Run and chat with the model
```sh
lemonade run user.ProseFlow-v1-360M-Instruct-GGUF-Q4_K_M
```
List all available models
```sh
lemonade list
```
ProseFlow-v1-360M-Instruct
ProseFlow-v1-360M-Instruct is a lightweight, experimental instruction-tuned model built for the ProseFlow desktop application. It is a fine-tune of HuggingFace's SmolLM-360M-Instruct, created to explore how well smaller language models handle a diverse set of text-processing tasks.
The model was fine-tuned on the ProseFlow-Actions-v1 dataset.
Note: This model is provided for research, experimentation, and low-resource devices. For the best user experience in the ProseFlow application, the larger and more capable ProseFlow-v1-1.5B-Instruct model is strongly recommended.
Model Description
ProseFlow is a universal AI text processor that allows users to create and execute custom AI "Actions" on text in any application. This model was an experiment to see if a ~360M parameter model could reliably perform the wide range of tasks defined in the training dataset.
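To make the task format concrete: an Action pairs an instruction with input text and an expected output. A hypothetical triplet in the spirit of the training data (field names and content are invented for illustration, not an actual dataset row):

```python
# A hypothetical Action triplet illustrating the instruction-input-output
# structure described under Training Details below.
example_action = {
    "instruction": "Convert the input into a bulleted list.",
    "input": "Milk, eggs, bread",
    "output": "- Milk\n- Eggs\n- Bread",
}
```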
Performance and Capabilities
Evaluations show that while this model is extremely fast and has very low resource requirements, its capabilities are limited.
Strengths:
- Extremely Lightweight: Can run on devices with very limited RAM and computational power.
- Strict Formatting Adherence (sometimes): In some cases where it understands the task, it can follow rigid formatting instructions (like creating a bulleted list) more strictly than its larger counterpart.
- Simple Data Extraction: It shows some capability in basic data extraction and formatting tasks, such as creating Markdown tables or extracting contact information.
Weaknesses & Limitations:
- Poor Reasoning: The model struggles significantly with tasks that require logical reasoning, inference, or multi-step problem-solving. It often fails on word problems and logical puzzles.
- Limited Creativity: It is not effective at creative writing tasks like continuing a story or generating novel content. Its outputs are often repetitive or nonsensical.
- Instructional Failures: The model frequently violates the "no extra text" rule by adding conversational chatter. In many cases, it fails the task entirely and repeats the input verbatim.
- Hallucination: On some tasks (e.g., To Paragraph), the model hallucinates content completely unrelated to the input.
- Unreliable for Complex Tasks: It is not suitable for complex tasks like code refactoring, bug finding, or drafting professional business correspondence.
Provided Files & Quantization Details
This repository provides multiple versions of the model, allowing users to choose the best balance of performance and resource usage for their specific hardware. All quantized versions are provided in the GGUF format for broad compatibility.
| File Name (Quantization) | VRAM Usage (Approx.) | Performance | Recommended Use Case |
|---|---|---|---|
| Q8_0 | ~1 GB | Best overall. Nearly identical to FP16. | The recommended default for most users. |
| Q4_K_M | ~900 MB | Low quality. Noticeable degradation in nuance. | For maximum speed on low-power devices. |
Note on Quantization: To maintain the highest possible quality, the token embeddings and the final output layer were kept at F16 precision, and an importance matrix was used for calibration during quantization. This is why the quantized files are larger than might typically be expected; the method significantly improves their performance and coherence.
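For reference, a quant with these properties could be produced with llama.cpp's llama-quantize tool. A sketch, assuming an importance matrix was computed beforehand with llama-imatrix; file names are illustrative:

```sh
# Quantize to Q4_K_M while keeping token embeddings and the output
# layer at F16, with an importance matrix for calibration.
./llama-quantize --imatrix imatrix.dat \
  --token-embedding-type f16 --output-tensor-type f16 \
  ProseFlow-v1-360M-Instruct-F16.gguf \
  ProseFlow-v1-360M-Instruct-Q4_K_M.gguf Q4_K_M
```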
Intended Use
This model is intended for experimental use and for users on extremely resource-constrained systems who are willing to accept a significant trade-off in performance and reliability. It may be suitable for a very limited subset of simple, repetitive text-formatting tasks.
It is designed to be used within the ProseFlow desktop application, but it is not the recommended model for general use.
How to Use in ProseFlow
- Download and install the ProseFlow application.
- Navigate to the Providers -> Local Provider tab.
- Click "Manage Models..." and select the desired version of
ProseFlow-v1-360M-Instructfrom the "Available for Download" list. We recommend starting withQ8_0. - Once downloaded, select it from the "My Models" list.
- Set your "Primary Service Type" in ProseFlow to Local.
- Be aware of the limitations described above when executing actions.
Training Details
- Base Model: HuggingFaceTB/SmolLM-360M-Instruct
- Dataset: LSXPrime/ProseFlow-Actions-v1
- Fine-tuning Library: Unsloth
- Fine-tuning Method: Supervised fine-tuning using LoRA on a dataset of structured instruction-input-output triplets.
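For readers curious what this recipe looks like in code, here is a minimal sketch of a comparable Unsloth LoRA fine-tune. The hyperparameters, target modules, and the assumption that each triplet is rendered into a single "text" column are illustrative, not the values used for this model; the arguments match trl's classic SFTTrainer interface (newer trl versions move them into SFTConfig):

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load the base model through Unsloth's optimized loader.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="HuggingFaceTB/SmolLM-360M-Instruct",
    max_seq_length=2048,
)

# Attach LoRA adapters; rank and target modules are illustrative.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Assumes the instruction-input-output triplets have been rendered
# into a single "text" column using the model's chat template.
dataset = load_dataset("LSXPrime/ProseFlow-Actions-v1", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        num_train_epochs=1,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```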
License
This model is licensed under the Apache License, Version 2.0.