Instructions for using LSXPrime/ProseFlow-v1-360M-Instruct-GGUF with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- llama-cpp-python
How to use LSXPrime/ProseFlow-v1-360M-Instruct-GGUF with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="LSXPrime/ProseFlow-v1-360M-Instruct-GGUF",
    filename="ProseFlow-v1-360M-Instruct-Q4_K_M.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```
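The call returns an OpenAI-style completion dict. A minimal sketch of pulling the reply text out of it (field names follow llama-cpp-python's standard chat-completion response):

```python
# create_chat_completion returns an OpenAI-style dict; the assistant
# reply lives under choices[0]["message"]["content"].
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)
print(response["choices"][0]["message"]["content"])
```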
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use LSXPrime/ProseFlow-v1-360M-Instruct-GGUF with llama.cpp:
Install from brew
```sh
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf LSXPrime/ProseFlow-v1-360M-Instruct-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf LSXPrime/ProseFlow-v1-360M-Instruct-GGUF:Q4_K_M
```
Install from WinGet (Windows)
```sh
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf LSXPrime/ProseFlow-v1-360M-Instruct-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf LSXPrime/ProseFlow-v1-360M-Instruct-GGUF:Q4_K_M
```
Use pre-built binary
```sh
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf LSXPrime/ProseFlow-v1-360M-Instruct-GGUF:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf LSXPrime/ProseFlow-v1-360M-Instruct-GGUF:Q4_K_M
```
Build from source code
```sh
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf LSXPrime/ProseFlow-v1-360M-Instruct-GGUF:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf LSXPrime/ProseFlow-v1-360M-Instruct-GGUF:Q4_K_M
```
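However you install it, llama-server speaks the OpenAI chat-completions protocol (on port 8080 by default), so any OpenAI-style client can call it. A minimal sketch using Python's requests; the port and payload shape are llama.cpp defaults, not specific to this model:

```python
import requests

# llama-server listens on http://localhost:8080 by default and exposes
# an OpenAI-compatible endpoint at /v1/chat/completions.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```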
Use Docker
```sh
docker model run hf.co/LSXPrime/ProseFlow-v1-360M-Instruct-GGUF:Q4_K_M
```
- LM Studio
- Jan
- vLLM
How to use LSXPrime/ProseFlow-v1-360M-Instruct-GGUF with vLLM:
Install from pip and serve model
```sh
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "LSXPrime/ProseFlow-v1-360M-Instruct-GGUF"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "LSXPrime/ProseFlow-v1-360M-Instruct-GGUF",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```
Use Docker
```sh
docker model run hf.co/LSXPrime/ProseFlow-v1-360M-Instruct-GGUF:Q4_K_M
```
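Because the vLLM server is OpenAI-compatible, the official openai Python client works against it too. A minimal sketch, assuming pip install openai and the default port 8000:

```python
from openai import OpenAI

# Point the client at the local vLLM server; the api_key value is
# arbitrary unless the server was started with one.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

completion = client.chat.completions.create(
    model="LSXPrime/ProseFlow-v1-360M-Instruct-GGUF",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(completion.choices[0].message.content)
```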
- Ollama
How to use LSXPrime/ProseFlow-v1-360M-Instruct-GGUF with Ollama:
```sh
ollama run hf.co/LSXPrime/ProseFlow-v1-360M-Instruct-GGUF:Q4_K_M
```
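Once pulled, the model is also reachable through Ollama's local REST API (port 11434 by default). A minimal sketch in Python; the model name is the same tag used with ollama run:

```python
import requests

# Ollama's /api/chat endpoint; stream=False returns a single JSON
# object with the full reply under "message" -> "content".
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "hf.co/LSXPrime/ProseFlow-v1-360M-Instruct-GGUF:Q4_K_M",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "stream": False,
    },
)
print(resp.json()["message"]["content"])
```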
- Unsloth Studio
How to use LSXPrime/ProseFlow-v1-360M-Instruct-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```sh
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser and search for
# LSXPrime/ProseFlow-v1-360M-Instruct-GGUF to start chatting
```
Install Unsloth Studio (Windows)
```sh
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser and search for
# LSXPrime/ProseFlow-v1-360M-Instruct-GGUF to start chatting
```
Using HuggingFace Spaces for Unsloth
```sh
# No setup required.
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# and search for LSXPrime/ProseFlow-v1-360M-Instruct-GGUF to start chatting
```
- Docker Model Runner
How to use LSXPrime/ProseFlow-v1-360M-Instruct-GGUF with Docker Model Runner:
```sh
docker model run hf.co/LSXPrime/ProseFlow-v1-360M-Instruct-GGUF:Q4_K_M
```
- Lemonade
How to use LSXPrime/ProseFlow-v1-360M-Instruct-GGUF with Lemonade:
Pull the model
```sh
# Download Lemonade from https://lemonade-server.ai/
lemonade pull LSXPrime/ProseFlow-v1-360M-Instruct-GGUF:Q4_K_M
```
Run and chat with the model
```sh
lemonade run user.ProseFlow-v1-360M-Instruct-GGUF-Q4_K_M
```
List all available models
```sh
lemonade list
```
ProseFlow-v1-360M-Instruct
ProseFlow-v1-360M-Instruct is a lightweight, experimental instruction-tuned model built for the ProseFlow desktop application. It is a fine-tune of HuggingFace's SmolLM-360M-Instruct, created to explore how well smaller language models handle a diverse set of text-processing tasks.
The model was fine-tuned on the ProseFlow-Actions-v1 dataset.
Note: This model is provided for research, experimentation, and low-resource devices. For the best user experience in the ProseFlow application, the larger and more capable ProseFlow-v1-1.5B-Instruct model is strongly recommended.
Model Description
ProseFlow is a universal AI text processor that allows users to create and execute custom AI "Actions" on text in any application. This model was an experiment to see if a ~360M parameter model could reliably perform the wide range of tasks defined in the training dataset.
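To make the task format concrete: an Action pairs an instruction with input text and an expected output. A hypothetical triplet in the spirit of the training data (field names and content are invented for illustration, not an actual dataset row):

```python
# A hypothetical Action triplet illustrating the instruction-input-output
# structure described under Training Details below.
example_action = {
    "instruction": "Convert the input into a bulleted list.",
    "input": "Milk, eggs, bread",
    "output": "- Milk\n- Eggs\n- Bread",
}
```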
Performance and Capabilities
Evaluations show that while this model is extremely fast and has very low resource requirements, its capabilities are limited.
Strengths:
- Extremely Lightweight: Can run on devices with very limited RAM and computational power.
- Strict Formatting Adherence (sometimes): In some cases where it understands the task, it can follow rigid formatting instructions (like creating a bulleted list) more strictly than its larger counterpart.
- Simple Data Extraction: It shows some capability in basic data extraction and formatting tasks, such as creating Markdown tables or extracting contact information.
Weaknesses & Limitations:
- Poor Reasoning: The model struggles significantly with tasks that require logical reasoning, inference, or multi-step problem-solving. It often fails on word problems and logical puzzles.
- Limited Creativity: It is not effective at creative writing tasks like continuing a story or generating novel content. Its outputs are often repetitive or nonsensical.
- Instructional Failures: The model frequently violates the "no extra text" rule by adding conversational chatter. In many cases, it fails the task entirely and repeats the input verbatim.
- Hallucination: On some tasks (e.g., To Paragraph), the model hallucinates content completely unrelated to the input.
- Unreliable for Complex Tasks: It is not suitable for complex tasks like code refactoring, bug finding, or drafting professional business correspondence.
Provided Files & Quantization Details
This repository provides multiple versions of the model, allowing users to choose the best balance of performance and resource usage for their specific hardware. All quantized versions are provided in the GGUF format for broad compatibility.
| File Name (Quantization) | VRAM Usage (Approx.) | Performance | Recommended Use Case |
|---|---|---|---|
| Q8_0 | ~1 GB | Best overall. Nearly identical to FP16. | The recommended default for most users. |
| Q4_K_M | ~900 MB | Low quality. Noticeable degradation in nuance. | For maximum speed on low-power devices. |
Note on Quantization: To maintain the highest possible quality, the token embeddings and the final output layer were kept at F16 precision, and an importance matrix was used for calibration during quantization. This is why the quantized files are larger than might typically be expected; the method significantly improves their performance and coherence.
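For reference, a quant with these properties could be produced with llama.cpp's llama-quantize tool. A sketch, assuming an importance matrix was computed beforehand with llama-imatrix; file names are illustrative:

```sh
# Quantize to Q4_K_M while keeping token embeddings and the output
# layer at F16, with an importance matrix for calibration.
./llama-quantize --imatrix imatrix.dat \
  --token-embedding-type f16 --output-tensor-type f16 \
  ProseFlow-v1-360M-Instruct-F16.gguf \
  ProseFlow-v1-360M-Instruct-Q4_K_M.gguf Q4_K_M
```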
Intended Use
This model is intended for experimental use and for users on extremely resource-constrained systems who are willing to accept a significant trade-off in performance and reliability. It may be suitable for a very limited subset of simple, repetitive text-formatting tasks.
It is designed to be used within the ProseFlow desktop application, but it is not the recommended model for general use.
How to Use in ProseFlow
- Download and install the ProseFlow application.
- Navigate to the Providers -> Local Provider tab.
- Click "Manage Models..." and select the desired version of
ProseFlow-v1-360M-Instructfrom the "Available for Download" list. We recommend starting withQ8_0. - Once downloaded, select it from the "My Models" list.
- Set your "Primary Service Type" in ProseFlow to Local.
- Be aware of the limitations described above when executing actions.
Training Details
- Base Model: HuggingFaceTB/SmolLM-360M-Instruct
- Dataset: LSXPrime/ProseFlow-Actions-v1
- Fine-tuning Library: Unsloth
- Fine-tuning Method: Supervised fine-tuning using LoRA on a dataset of structured instruction-input-output triplets.
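For readers curious what this recipe looks like in code, here is a minimal sketch of a comparable Unsloth LoRA fine-tune. The hyperparameters, target modules, and the assumption that each triplet is rendered into a single "text" column are illustrative, not the values used for this model; the arguments match trl's classic SFTTrainer interface (newer trl versions move them into SFTConfig):

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load the base model through Unsloth's optimized loader.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="HuggingFaceTB/SmolLM-360M-Instruct",
    max_seq_length=2048,
)

# Attach LoRA adapters; rank and target modules are illustrative.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Assumes the instruction-input-output triplets have been rendered
# into a single "text" column using the model's chat template.
dataset = load_dataset("LSXPrime/ProseFlow-Actions-v1", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        num_train_epochs=1,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```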
License
This model is licensed under the Apache License, Version 2.0.