robertolofaro committed 12 days ago

Commit 0a65ae5 (verified) · 1 parent: e90a9cd

Upload 5 files

Files changed (5)

qa_common.py +66 -0
qa_markdown_chroma_externalized.py +55 -0
qa_markdown_faiss_hnsw_externalized.py +49 -0
qa_markdown_fast.py +28 -0
qa_markdown_qdrant_externalized.py +60 -0

qa_common.py ADDED

	@@ -0,0 +1,66 @@

+#!/usr/bin/env python3
+import argparse
+import datetime
+from llama_cpp import Llama
+# ====================== COMMON CONFIG & PROMPT ======================
+SYSTEM_PROMPT = """You are the reference expert for the articles contained in this database, all extracted from the website robertolofaro.com, and all focused on change.
+#Your Mission:
+When a user asks a question, your goal is to provide a structured response based ONLY on the articles provided in your training. Do not provide general advice from outside these sources.
+# Response Format:
+1. Executive Summary: A 2-3 sentence overview answering the core query.
+2. Guidelines & Hints: A markdown list of specific "answers/guidelines/hints" found in the source material.
+"""
+def build_prompt(query: str, context: str = "") -> str:
+    prompt = f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
+    if context:
+        prompt += f"<|im_start|>user\nContext:\n{context}\n\nQuestion: {query}<|im_end|>\n"
+    else:
+        prompt += f"<|im_start|>user\n{query}<|im_end|>\n"
+    prompt += "<|im_start|>assistant\n"
+    return prompt
+def generate_answer(llm, prompt: str, max_tokens=1200):
+    output = llm(
+        prompt,
+        max_tokens=max_tokens,
+        temperature=0.65,
+        top_p=0.9,
+        stop=["<|im_end|>", "<|im_start|>"],
+        echo=False,
+    )
+    return output["choices"][0]["text"].strip()
+def save_result(query: str, answer: str, output_file="answer.md"):
+    now = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
+    markdown = f"""# Q&A Result
+## Timestamp
+{now}
+## Question
+{query}
+## Answer
+{answer}
+"""
+    with open(output_file, "w", encoding="utf-8") as f:
+        f.write(markdown)
+    print(f"✅ Saved to: {output_file}")
+    print("="*80)
+    print(answer)
+    print("="*80)
+def parse_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--prompt", type=str, help="Question to ask")
+    parser.add_argument("--output", type=str, default="answer.md")
+    return parser.parse_args()
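
The prompt assembled by build_prompt follows the ChatML turn layout: a system turn, a user turn, and an assistant turn left open for the model to complete. The sketch below mirrors that layout so it can be inspected without loading llama_cpp. `DEMO_SYSTEM_PROMPT` and `build_prompt_demo` are hypothetical stand-ins for illustration, not names from this commit.

```python
# Stand-in system prompt (assumption); the real text is SYSTEM_PROMPT in qa_common.py.
DEMO_SYSTEM_PROMPT = "You answer only from the supplied articles."


def build_prompt_demo(query: str, context: str = "",
                      system: str = DEMO_SYSTEM_PROMPT) -> str:
    """Return a ChatML prompt: system turn, user turn, open assistant turn."""
    # With retrieved context, the user turn carries both the context and the question.
    user = f"Context:\n{context}\n\nQuestion: {query}" if context else query
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


# Print one prompt with context to see the three turns.
print(build_prompt_demo("What drives change?", context="[Article: Demo] Sample text."))
```

Because the assistant turn is left open, generation has to be cut at the next turn marker, which is presumably why generate_answer passes both `<|im_end|>` and `<|im_start|>` as stop strings.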

qa_markdown_chroma_externalized.py ADDED

	@@ -0,0 +1,55 @@

+#!/usr/bin/env python3
+from qa_common import parse_args, build_prompt, generate_answer, save_result
+# REVISED: Imported from the dedicated langchain_chroma package
+from langchain_chroma import Chroma
+from langchain_huggingface import HuggingFaceEmbeddings
+from llama_cpp import Llama
+# ====================== CHROMA SPECIFIC ======================
+VECTORSTORE_PATH = "chroma_db"
+MODEL_PATH = "articles-Q4_K_M.gguf"
+print("Loading embedding model...")
+embeddings = HuggingFaceEmbeddings(
+    model_name="BAAI/bge-small-en-v1.5",
+    encode_kwargs={'normalize_embeddings': True}
+)
+print("Loading Chroma vector store...")
+vectorstore = Chroma(
+    persist_directory=VECTORSTORE_PATH,
+    embedding_function=embeddings
+)
+retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
+print("Loading LLM...")
+llm = Llama(
+    model_path=MODEL_PATH,
+    n_ctx=65000,
+    n_threads=8,
+    verbose=False,
+)
+def get_context(query: str) -> str:
+    """Retrieve context using Chroma"""
+    docs = retriever.invoke(query)
+    return "\n\n".join([
+        f"[Article: {doc.metadata.get('article_title', 'N/A')}] "
+        f"{doc.page_content}"
+        for doc in docs
+    ])
+if __name__ == "__main__":
+    args = parse_args()
+    query = args.prompt if args.prompt else input("\nQuestion: ")
+    print("Retrieving context and generating answer...\n")
+    context = get_context(query)
+    prompt = build_prompt(query, context)
+    answer = generate_answer(llm, prompt)
+    save_result(query, answer, args.output)
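
get_context labels each retrieved chunk with its article title and falls back to 'N/A' when the metadata has no `article_title` key. A minimal sketch of that formatting, assuming only that each document exposes `page_content` and `metadata`; `FakeDoc` and `join_context` are hypothetical names used so the example runs without Chroma or LangChain.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a retrieved document (not a LangChain class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def join_context(docs) -> str:
    """Prefix each chunk with its article title and separate chunks by a blank line."""
    return "\n\n".join(
        f"[Article: {doc.metadata.get('article_title', 'N/A')}] {doc.page_content}"
        for doc in docs
    )


docs = [FakeDoc("First chunk.", {"article_title": "Demo title"}), FakeDoc("Second chunk.")]
print(join_context(docs))
```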

qa_markdown_faiss_hnsw_externalized.py ADDED

	@@ -0,0 +1,49 @@

+#!/usr/bin/env python3
+from qa_common import parse_args, build_prompt, generate_answer, save_result
+import faiss
+import pickle
+from sentence_transformers import SentenceTransformer
+from llama_cpp import Llama
+# ====================== FAISS HNSW SPECIFIC ======================
+INDEX_PATH = "faiss_hnsw/vector_search.index"
+METADATA_PATH = "faiss_hnsw/metadata.pkl"
+MODEL_PATH = "articles-Q4_K_M.gguf"
+print("Loading embedding model...")
+embed_model = SentenceTransformer("BAAI/bge-small-en-v1.5")
+print("Loading FAISS HNSW index...")
+index = faiss.read_index(INDEX_PATH)
+print("Loading metadata...")
+with open(METADATA_PATH, "rb") as f:
+    metadata = pickle.load(f)
+print("Loading LLM...")
+llm = Llama(model_path=MODEL_PATH, n_ctx=25000, n_threads=8, verbose=False)
+def get_context(query: str, k=5) -> str:
+    query_vec = embed_model.encode([query], normalize_embeddings=True).astype('float32')
+    _, indices = index.search(query_vec, k)
+    chunks = []
+    for idx in indices[0]:
+        row = metadata.iloc[idx]
+        chunk = f"[Article: {row['article_title']}] \n{row['article_content']}"
+        chunks.append(chunk)
+    return "\n\n".join(chunks)
+if __name__ == "__main__":
+    args = parse_args()
+    query = args.prompt if args.prompt else input("\nQuestion: ")
+    print("Retrieving context and generating answer...\n")
+    context = get_context(query, k=5)
+    prompt = build_prompt(query, context)
+    answer = generate_answer(llm, prompt)
+    save_result(query, answer, args.output)
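
FAISS fills the result with -1 when it finds fewer than k neighbours, and a negative position passed to `.iloc` selects a row counted from the end instead of failing. The sketch below shows one possible guard, under the assumption that such hits should simply be skipped. It uses a plain list of dicts in place of the pickled metadata, and `format_hits` is a hypothetical helper, not part of this commit.

```python
from typing import Sequence


def format_hits(indices: Sequence[int], rows: Sequence[dict]) -> str:
    """Build the context string, skipping padded (-1) or out-of-range hits."""
    chunks = []
    for idx in indices:
        # Skip -1 padding and any position outside the metadata table.
        if idx < 0 or idx >= len(rows):
            continue
        row = rows[idx]
        chunks.append(f"[Article: {row['article_title']}] \n{row['article_content']}")
    return "\n\n".join(chunks)


rows = [
    {"article_title": "A", "article_content": "alpha"},
    {"article_title": "B", "article_content": "beta"},
]
# Only position 1 is a real hit; the two -1 entries are padding.
print(format_hits([1, -1, -1], rows))
```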

qa_markdown_fast.py ADDED

	@@ -0,0 +1,28 @@

+#!/usr/bin/env python3
+from qa_common import parse_args, build_prompt, generate_answer, save_result
+from llama_cpp import Llama
+# ====================== CONFIG ======================
+MODEL_PATH = "articles-Q4_K_M.gguf"
+print("Loading GGUF model...")
+llm = Llama(
+    model_path=MODEL_PATH,
+    n_ctx=8192,
+    n_threads=8,
+    verbose=False,
+)
+def answer(query: str):
+    prompt = build_prompt(query, context="")   # No context = pure model
+    return generate_answer(llm, prompt, max_tokens=1100)
+if __name__ == "__main__":
+    args = parse_args()
+    query = args.prompt if args.prompt else input("\nQuestion: ")
+    print("Generating answer using fine-tuned model (Fast Mode)...\n")
+    answer_text = answer(query)
+    save_result(query, answer_text, args.output)
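
All four scripts resolve the question the same way: `--prompt` if it was given, otherwise an interactive prompt. A small sketch of that fallback with the argument list and the input function injected, so it can be exercised without a terminal; `resolve_query` and its `ask` parameter are hypothetical names introduced here.

```python
import argparse
from typing import Callable, Optional, Sequence, Tuple


def resolve_query(argv: Optional[Sequence[str]] = None,
                  ask: Callable[[str], str] = input) -> Tuple[str, str]:
    """Return (question, output_file), asking interactively if --prompt is absent."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--prompt", type=str, help="Question to ask")
    parser.add_argument("--output", type=str, default="answer.md")
    args = parser.parse_args(argv)
    # Same fallback as in the scripts: an empty or missing --prompt triggers the prompt.
    query = args.prompt if args.prompt else ask("\nQuestion: ")
    return query, args.output


# Non-interactive call: the question comes from the argument list.
print(resolve_query(["--prompt", "What drives change?", "--output", "out.md"]))
```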

qa_markdown_qdrant_externalized.py ADDED

	@@ -0,0 +1,60 @@

+#!/usr/bin/env python3
+from qa_common import parse_args, build_prompt, generate_answer, save_result
+from langchain_qdrant import QdrantVectorStore
+from langchain_huggingface import HuggingFaceEmbeddings
+from llama_cpp import Llama
+# FIX 1: Import the native client to manage its lifecycle
+from qdrant_client import QdrantClient
+# ====================== QDRANT SPECIFIC ======================
+QDRANT_PATH = "qdrant_db"
+COLLECTION_NAME = "articles"
+MODEL_PATH = "articles-Q4_K_M.gguf"
+print("Loading embedding model...")
+embeddings = HuggingFaceEmbeddings(
+    model_name="BAAI/bge-small-en-v1.5",
+    encode_kwargs={'normalize_embeddings': True}
+)
+print("Loading Qdrant vector store...")
+# FIX 2: Create the client explicitly
+client = QdrantClient(path=QDRANT_PATH)
+# Pass the client directly to the vector store
+vectorstore = QdrantVectorStore(
+    client=client,
+    collection_name=COLLECTION_NAME,
+    embedding=embeddings
+)
+retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
+print("Loading LLM...")
+llm = Llama(model_path=MODEL_PATH, n_ctx=25000, n_threads=8, verbose=False)
+def get_context(query: str) -> str:
+    docs = retriever.invoke(query)
+    return "\n\n".join([
+        f"[Article: {doc.metadata.get('article_title', 'N/A')}] "
+        f"{doc.page_content}"
+        for doc in docs
+    ])
+if __name__ == "__main__":
+    args = parse_args()
+    query = args.prompt if args.prompt else input("\nQuestion: ")
+    print("Retrieving context and generating answer...\n")
+    context = get_context(query)
+    prompt = build_prompt(query, context)
+    answer = generate_answer(llm, prompt)
+    save_result(query, answer, args.output)
+    # FIX 3: Close connection explicitly while Python is still fully intact
+    print("Closing vector store connection...")
+    client.close()
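
In the script above, `client.close()` runs only if retrieval, generation, and saving all succeed. One way to close the client on failure as well is `contextlib.closing`, sketched below. `FakeClient` is a hypothetical stand-in for `QdrantClient` and `run_with_client` is an illustrative helper; the only assumption about the real client is that it has a `close()` method, as the script shows.

```python
from contextlib import closing


class FakeClient:
    """Hypothetical stand-in for QdrantClient; only records whether close() ran."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def run_with_client(client, work):
    """Run work() and call client.close() afterwards, even if work() raises."""
    with closing(client):
        return work()


client = FakeClient()
try:
    run_with_client(client, lambda: 1 / 0)  # simulate a failure mid-pipeline
except ZeroDivisionError:
    pass
print("closed after failure:", client.closed)
```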