Upload README.md
README.md
CHANGED
@@ -1,3 +1,241 @@
---
language:
- en
- it
license: cc-by-sa-4.0
library_name: gguf
pipeline_tag: text-generation
base_model: Qwen/Qwen3.5-4B
base_model_relation: quantized
tags:
- recommendation-system
- question-answering
- books
- gguf
- qwen3.5
- cpu-compatible
- local-inference
- faiss
- qdrant
- conversational
- knowledge-base
doi: 10.57967/hf/8903
---

# Articles Q&A and Recommendation Model

**DOI:** [10.57967/hf/8903](https://doi.org/10.57967/hf/8903)
**Demo Space:** [robertolofaro/articles](https://huggingface.co/spaces/robertolofaro/articles) *(CPU-only, currently private / testing)*
**Author:** [Roberto Lofaro](https://huggingface.co/robertolofaro)
**License:** [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/)

---

## Model Overview

This is a **GGUF quantisation** of [Qwen/Qwen3.5-4B](https://huggingface.co/Qwen/Qwen3.5-4B), fine-tuned via a structured system prompt and optional retrieval layer to serve as a **Q&A and recommendation assistant** over a corpus of 350+ articles extracted from robertolofaro.com.

The corpus is updated up to 2026-05-05.

The model is designed to answer questions about the articles and, primarily, to act as an **argument-outlining and guided-brainstorming** system.

Its answers should be considered only a representation of the material within the source articles, combined with the capabilities of the underlying Qwen3.5-4B model.

No answer represents, or should be considered, advice: the training material includes neither your specific context nor a professional assessment of it.

Hence, seek professional advice before acting on the answers.

---

## Intended Use

| Use | Supported |
|-----|-----------|
| Interactive Q&A on the 350+ articles | ✅ |
| Offline / local inference (CPU) | ✅ |
| General-purpose assistant | ⚠️ Not the primary intent |
| Commercial deployment without attribution | ❌ (see license) |

### Primary Task

Given a natural-language request such as:

> *"what is the potential of Italy? /nothink"*

the model returns a short answer structured as an executive summary followed by guidelines and hints.

For example, the request above could produce the following answer in the [demo space](https://huggingface.co/spaces/robertolofaro/articles):

<code>
1. **Executive Summary:**
Italy is described as an "arrested development" where institutional and political structures are too weak to support change or innovation, often resulting in inefficiencies and a lack of coordination between levels of government. The potential lies not just in economic growth but also in cultural and social transformation through education, transparency, and shared leadership.

2. **Guidelines & Hints:**
- Italy has a "culture of failure" due to systemic corruption and weak institutional frameworks that discourage innovation at the local level.
- Success requires breaking down barriers between levels of government (national, regional, local) to foster coordinated development.
- Education is critical—Italy needs a shift from "exam-centric" learning to competency-based systems focused on real-world application.
- Transparency and accountability in public administration are prerequisites for sustainable change, as secrecy undermines trust and efficiency.
- Local governance must be empowered with clear mandates and resources; otherwise, initiatives remain fragmented and unscalable.
- Collaboration between civil society, business, and government is necessary to create a "culture of success" rather than relying on top-down directives.
- Historical context (e.g., the Risorgimento) shows that long-term national vision requires continuous adaptation—not just one-off reforms.
- Technology adoption must be aligned with human development; digital tools alone cannot replace systemic structural changes.
- Regional diversity demands tailored solutions, not a "one-size-fits-all" approach to policy implementation.
- The role of the "middle class" as both driver and beneficiary of change is often overlooked in Italy’s political landscape.
</code>
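
Because the answer follows a fixed two-part layout, downstream code can split it into its parts. The sketch below shows one possible way to do this in Python. It assumes the headings `Executive Summary` and `Guidelines & Hints` appear as in the sample above and that hints are `- ` bullets; the function name `parse_answer` is illustrative and is not part of the model or the demo Space.

```python
import re


def parse_answer(text: str) -> dict:
    """Split a model answer into its summary and its list of hints.

    Assumes the two headings shown in the sample answer; returns empty
    values when a part is missing instead of raising.
    """
    # Capture everything between the two headings as the summary.
    summary_match = re.search(
        r"Executive Summary:\*{0,2}\s*(.*?)\s*(?=\d+\.\s*\*{0,2}Guidelines & Hints|\Z)",
        text,
        flags=re.DOTALL,
    )
    summary = summary_match.group(1).strip() if summary_match else ""

    # Everything after the second heading holds the bullet list.
    parts = re.split(r"Guidelines & Hints:\*{0,2}", text, maxsplit=1)
    hints_block = parts[1] if len(parts) == 2 else ""
    hints = [
        line.strip()[2:].strip()
        for line in hints_block.splitlines()
        if line.strip().startswith("- ")
    ]
    return {"summary": summary, "hints": hints}


sample = """1. **Executive Summary:**
Italy is described as an "arrested development".

2. **Guidelines & Hints:**
- Education is critical.
- Regional diversity demands tailored solutions.
"""

parsed = parse_answer(sample)
print(parsed["summary"])
print(len(parsed["hints"]))
```

Returning empty values for a missing part keeps the parser usable as a quick layout check: an empty `summary` or `hints` signals that the answer did not follow the expected structure.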

---

## About the articles

The 350+ articles cover topics spanning **organizational change**, **business transformation**, **knowledge management**, **AI adoption**, and **programme management**, drawing on the author's 35+ years of experience in consulting and C-level advisory roles on industrial missions across Europe and, in Italy, on public-sector missions as well.

The abstract and content of each article (including those published after the update date of the model) are on [GitHub](https://github.com/robertolofaro/supportmaterial/tree/master/kagglemetadata_content).

The metadata of the articles is on [Kaggle](https://www.kaggle.com/datasets/robertolofaro/articles-publication-metadata-and-ai-access).

You can search the articles on [robertolofaro.com](https://robertolofaro.com) either by [cluster](https://robertolofaro.com/searchcluster.php) or by ["tag cloud"](https://robertolofaro.com/search.php). You can also open each article from the available sections, or directly from the [latest released](https://robertolofaro.com/), [most read](https://robertolofaro.com/mostread), or [latest read](https://robertolofaro.com/latestread) lists.

As some articles span multiple releases, and even multiple sections (i.e. they are mini-book drafts in disguise), there is also a list of [multi-part articles](https://robertolofaro.com/multipart) where you can navigate across the sections of an article.

Access to each article is free under CC-BY-SA-4.0; this model simply eases access further, compared with the existing search facilities on the website, and ensures permanent availability online.

---

## Available Quantisations

| Quantisation | File | Size | Recommended For |
|---|---|---|---|
| Q4\_K\_M | `articles-Q4_K_M.gguf` | ~2.71 GB | CPU inference, everyday use |

The **Q4\_K\_M** variant is recommended for CPU-only environments and is the one used in the companion Space.

---

## Usage

### Sample results from a custom Python script

To test the potential use of the model, I developed a Python script that can also selectively integrate:
* the AI-generated MorningNews, released daily on [GitHub](https://github.com/robertolofaro/supportmaterial/tree/master/MorningNewsAgentTest) since March 2026
* DuckDuckGo search, to allow further integration

The concept is to layer the sources: first the answers from the articles, which then "steer" the exploration of MorningNews for recent, certified information related to the answer, and finally web searches.

As that experimental script depends strictly on the context, I did not release it online, to avoid having its answers taken as advice because of their structured approach.

This is, in any case, an example of what you can obtain by running the model locally and integrating it with your own sources plus focused web search.

![Sample of results obtained from a demo custom pipeline integrating further informations sources](https://huggingface.co/robertolofaro/articles-model/resolve/main/samples_hf/sample_articles-model.jpg)

You are free to download the model and to give this infographic to an AI, with instructions to use it as a template for what an agent, Claude project, etc. should deliver based on the model and your own additional content.
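
The layering described in this section (articles first, then your own sources, then web search) can be sketched as a small pipeline. This is not the unreleased script: every name below (`Layer`, `run_layers`, the stub functions) is hypothetical, and the stubs stand in for calls to the model, to your own content, and to a search API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Layer:
    """One step of the layered exploration (hypothetical structure)."""
    name: str
    # Receives the question and the notes gathered so far; returns new notes.
    fetch: Callable[[str, List[str]], List[str]]
    enabled: bool = True


def run_layers(question: str, layers: List[Layer]) -> List[str]:
    """Run the enabled layers in order, letting earlier notes steer later ones."""
    notes: List[str] = []
    for layer in layers:
        if not layer.enabled:
            continue
        for note in layer.fetch(question, notes):
            # Tag each note with its source layer so the origin stays visible.
            notes.append(f"[{layer.name}] {note}")
    return notes


# Stub layers: replace the bodies with real calls in your own application.
def from_articles(question, notes):
    return [f"model answer to: {question}"]


def from_own_sources(question, notes):
    # A real layer would use the first answer to select recent material.
    return [f"recent items related to {len(notes)} earlier note(s)"]


def from_web(question, notes):
    return ["web search results"]


layers = [
    Layer("articles", from_articles),
    Layer("own-sources", from_own_sources),
    Layer("web", from_web, enabled=False),  # layers can be switched off
]
result = run_layers("what is the potential of Italy?", layers)
print("\n".join(result))
```

Tagging each note with its layer keeps the provenance of every statement visible, which matters when the final answer mixes article content with newer external material.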

Take care to design a series of tests to assess the "reasoning boundaries" and to limit both hallucinations and the extension of information by similarity or association.
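
One way to start on such tests is a small set of automated checks on each answer. The sketch below is a starting point under stated assumptions, not a validated method: it checks that the documented two-part layout is present, and it flags capitalised terms in the answer that never occur in the context you supplied, as a crude signal of information added by association. The function names and the capitalised-word heuristic are illustrative.

```python
import re
from typing import List, Set

# Headings of the documented answer layout; excluded from the term scan.
HEADINGS = ("Executive Summary", "Guidelines & Hints")


def has_expected_layout(answer: str) -> bool:
    """Check that both sections of the documented answer layout are present."""
    return all(heading in answer for heading in HEADINGS)


def capitalised_terms(text: str) -> Set[str]:
    """Collect capitalised words that do not start a sentence.

    Heuristic: a capitalised word preceded by a lowercase letter or a comma,
    followed by one space.
    """
    return set(re.findall(r"(?<=[a-z,] )[A-Z][a-zA-Z]+", text))


def unsupported_terms(answer: str, context: str) -> List[str]:
    """Return capitalised terms in the answer that the context never mentions."""
    body = answer
    for heading in HEADINGS:
        body = body.replace(heading, "")
    context_lower = context.lower()
    return sorted(
        term for term in capitalised_terms(body)
        if term.lower() not in context_lower
    )


context = "Italy needs coordination between national, regional and local government."
answer = (
    "Executive Summary: the potential of Italy depends on coordination. "
    "Guidelines & Hints: follow the example of Germany."
)
print(has_expected_layout(answer))
print(unsupported_terms(answer, context))
```

A flagged term is not necessarily a hallucination; it only marks a place where the answer goes beyond the supplied context and deserves a manual look.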

### Quick Start with Ollama

```bash
ollama run hf.co/robertolofaro/articles-model:Q4_K_M
```

The `faiss_hnsw` and `qdrant` files are provided for RAG use.
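
For orientation, this is the general shape of a retrieval step that feeds the model: embed the question, take the closest passages, and place them in the prompt. The sketch uses a brute-force cosine search over toy vectors in plain Python as a stand-in for the FAISS-HNSW or Qdrant index. The passages, vectors, and function names are made up for illustration; a real setup would load the provided index files and use the embedding model they were built with.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float],
          index: List[Tuple[str, Sequence[float]]],
          k: int = 2) -> List[str]:
    """Return the k passages whose vectors are closest to the query vector."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [passage for passage, _ in ranked[:k]]


def build_prompt(question: str, passages: List[str]) -> str:
    """Place the retrieved passages ahead of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Toy index: (passage, embedding) pairs with made-up 3-dimensional vectors.
toy_index = [
    ("Change needs coordination across levels of government.", [0.9, 0.1, 0.0]),
    ("Knowledge management supports programme delivery.", [0.1, 0.9, 0.0]),
    ("AI adoption must be aligned with human development.", [0.0, 0.2, 0.9]),
]
query_vector = [0.8, 0.3, 0.1]  # would come from the same embedding model

passages = top_k(query_vector, toy_index, k=2)
print(build_prompt("what is the potential of Italy?", passages))
```

The resulting prompt string can be passed as the user message to any of the quick starts in this section.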

### Quick Start with llama.cpp

A pre-compiled llama.cpp build that supports Qwen3.5 is shared [on Hugging Face](https://huggingface.co/robertolofaro/libraries_prebuilt). It was built offline, tested offline with Python 3.12.3 under Ubuntu 24.04, and tested online with Python 3.13 within a Hugging Face Space.

```bash
# macOS / Linux
brew install llama.cpp
llama-server -hf robertolofaro/articles-model:Q4_K_M

# Windows (WinGet)
winget install llama.cpp
llama-server -hf robertolofaro/articles-model:Q4_K_M
```
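
Once `llama-server` is running, it exposes an OpenAI-compatible HTTP API, by default on port 8080. The following sketch queries it from Python using only the standard library. It assumes the default address `http://localhost:8080`; the helper names are illustrative, and the `model` field is included for compatibility with OpenAI-style clients (whether the server uses it depends on your setup).

```python
import json
import urllib.request

# Assumption: llama-server runs locally on its default port.
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_payload(question: str) -> dict:
    """Build a chat-completion request body for a single user question."""
    return {
        "model": "robertolofaro/articles-model:Q4_K_M",
        "messages": [{"role": "user", "content": question}],
    }


def ask(question: str, url: str = SERVER_URL, timeout: float = 300.0) -> str:
    """Send the question to the server and return the answer text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # CPU inference can take a while, hence the generous timeout.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example call (requires a running server):
# print(ask("what is the potential of Italy? /nothink"))
```

Keeping the payload construction in its own function makes it easy to add a system message or retrieved context later without touching the HTTP code.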

### Quick Start with llama-cpp-python

```python
# Requires: pip install llama-cpp-python huggingface-hub
from llama_cpp import Llama

# Download the GGUF file from the Hub (cached after the first run) and load it.
llm = Llama.from_pretrained(
    repo_id="robertolofaro/articles-model",
    filename="articles-Q4_K_M.gguf",
    n_ctx=8192,  # context window, in tokens
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "what is the potential of Italy? /nothink"
        }
    ]
)
# Print only the text of the first completion.
print(response["choices"][0]["message"]["content"])
```

---

### Sample Execution Output

`samples_hf/` also contains a pre-run **execution results example** showing expected model output for a representative set of queries, useful for calibrating expectations before running inference locally.

---

## Companion Space

A Gradio-based interactive demo is available at:

🔗 **[robertolofaro/articles](https://huggingface.co/spaces/robertolofaro/articles)**

The Space runs the **Q4\_K\_M** quantisation on CPU hardware (no GPU required).

---

## Limitations

- The model is designed to support argument outlining and guided brainstorming using the articles within the training corpus.
- Recommendations are bounded by the 350+ articles in the corpus; the model will not recommend external works.
- Application variants that integrate e.g. an AI-generated [MorningNews](https://github.com/robertolofaro/supportmaterial/tree/master/MorningNewsAgentTest) and web search with DuckDuckGo have already been tested.
- The model does not have live internet access; content reflects the corpus as indexed at build time. If you want such access, you have to build the application yourself.
- CPU inference with Q4\_K\_M typically yields response times of 15–60 seconds depending on hardware; within the Hugging Face Space, a response could take a few minutes.

---

## Ethical Considerations

- The corpus consists entirely of original works by the author; no third-party copyrighted content is embedded.
- The system is informational; it does not collect user data.
- The model inherits any biases present in the Qwen3.5-4B base model; users should apply standard critical judgement to outputs.

---

## Citation

If you use this model or the associated scripts in research or derivative work, please cite:

```bibtex
@misc{roberto_lofaro_2026,
  author    = { Roberto Lofaro },
  title     = { articles-model (Revision 7caa2c6) },
  year      = 2026,
  url       = { https://huggingface.co/robertolofaro/articles-model },
  doi       = { 10.57967/hf/8903 },
  publisher = { Hugging Face },
  note      = { GGUF quantisation of Qwen3.5-4B, fine-tuned for argument outlining, guided brainstorming, and retrieval (FAISS-HNSW / Qdrant) }
}
```

---

## License

This model card and associated scripts are released under **[CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/)**.
The base model weights are subject to the [Qwen3 License](https://huggingface.co/Qwen/Qwen3.5-4B/blob/main/LICENSE).

---

*Published openly as part of Roberto Lofaro's AI-assisted knowledge production initiative.
GitHub · Patreon · [robertolofaro.com](https://robertolofaro.com)*