Instructions to use PyThaGo/LLMLit with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use PyThaGo/LLMLit with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="PyThaGo/LLMLit",
	filename="LLMLit-0.2-8B-Instruct.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use PyThaGo/LLMLit with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf PyThaGo/LLMLit
# Run inference directly in the terminal:
llama-cli -hf PyThaGo/LLMLit

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf PyThaGo/LLMLit
# Run inference directly in the terminal:
llama-cli -hf PyThaGo/LLMLit

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf PyThaGo/LLMLit
# Run inference directly in the terminal:
./llama-cli -hf PyThaGo/LLMLit

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf PyThaGo/LLMLit
# Run inference directly in the terminal:
./build/bin/llama-cli -hf PyThaGo/LLMLit

Use Docker

docker model run hf.co/PyThaGo/LLMLit

LM Studio
Jan
Ollama
How to use PyThaGo/LLMLit with Ollama:
```
ollama run hf.co/PyThaGo/LLMLit
```

Unsloth Studio

How to use PyThaGo/LLMLit with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for PyThaGo/LLMLit to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for PyThaGo/LLMLit to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for PyThaGo/LLMLit to start chatting

How to use PyThaGo/LLMLit with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf PyThaGo/LLMLit

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "PyThaGo/LLMLit"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use PyThaGo/LLMLit with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf PyThaGo/LLMLit

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default PyThaGo/LLMLit

Run Hermes

hermes

Docker Model Runner
How to use PyThaGo/LLMLit with Docker Model Runner:
```
docker model run hf.co/PyThaGo/LLMLit
```

Lemonade

How to use PyThaGo/LLMLit with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull PyThaGo/LLMLit

Run and chat with the model

lemonade run user.LLMLit-{{QUANT_TAG}}

List all available models

lemonade list

LLMLit / RAG LLMLit

Cristian Sas

Create RAG LLMLit

7e0e33b verified over 1 year ago

raw

history blame

5.17 kB

	### Pași pentru implementarea RAG cu LLMLit pe Hugging Face 🚀
	---

	Retrieval-Augmented Generation (RAG) folosind modelul LLMLit disponibil pe Hugging Face. RAG combină căutarea informațiilor relevante cu generarea de texte pentru a produce răspunsuri mai precise. Vom utiliza Hugging Face pentru a integra acest model și vom folosi o bază de date externă pentru a face interogări și a îmbogăți răspunsurile generate de modelul LLMLit.

	#### 1. Instalarea pachetelor necesare 🛠️

	În primul rând, trebuie să instalezi librăriile necesare pentru a lucra cu Hugging Face și LLMLit. Poți face acest lucru folosind pip:

	```bash
	pip install transformers datasets faiss-cpu
	```

	- `transformers` este pachetul care ne permite să interacționăm cu modelele de la Hugging Face.
	- `datasets` ne ajută să gestionăm datele externe pentru căutare.
	- `faiss-cpu` este opțional, dar îl recomandăm pentru căutarea vectorială rapidă a documentelor.

	#### 2. Încărcarea modelului LLMLit de pe Hugging Face 🔄

	Acum, putem încarcă modelul LLMLit folosind Hugging Face:

	```python
	from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

	# Încărcăm modelul LLMLit și tokenizer-ul
	tokenizer = AutoTokenizer.from_pretrained("LLMLit/LLMLit")
	model = AutoModelForSeq2SeqLM.from_pretrained("LLMLit/LLMLit")
	```

	#### 3. Configurarea bazei de date de documente 🔍

	Pentru a folosi RAG, avem nevoie de o sursă externă de documente pentru a recupera informațiile relevante. În exemplul de față, vom folosi FAISS pentru căutarea rapidă a documentelor. Începe prin a crea un index FAISS:

	```python
	import faiss
	import numpy as np

	# Crearea unui set de documente fictive
	documents = [
	"LLMLit este un model puternic de procesare a limbajului natural.",
	"RAG combină generarea de texte cu căutarea de informații externe.",
	"Hugging Face oferă o platformă excelentă pentru modelele AI.",
	"FAISS este un tool de căutare vectorială rapidă pentru baze de date mari."
	]

	# Tokenizare și crearea vectorilor pentru documente
	embedding_model = AutoModelForSeq2SeqLM.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
	tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

	def encode_documents(documents):
	embeddings = []
	for doc in documents:
	inputs = tokenizer(doc, return_tensors="pt", padding=True, truncation=True)
	with torch.no_grad():
	embeddings.append(embedding_model(**inputs).last_hidden_state.mean(dim=1).numpy())
	return np.vstack(embeddings)

	document_vectors = encode_documents(documents)

	# Crearea indexului FAISS
	index = faiss.IndexFlatL2(document_vectors.shape[1]) # Distanta L2
	index.add(document_vectors)
	```

	#### 4. Căutarea celor mai relevante documente 🔍

	Acum, putem folosi FAISS pentru a căuta documentele cele mai relevante pe baza întrebării utilizatorului:

	```python
	def retrieve_documents(query, top_k=3):
	query_vector = encode_documents([query]) # Încodifică întrebarea
	distances, indices = index.search(query_vector, top_k) # Căutăm cele mai apropiate documente
	return [documents[i] for i in indices[0]]

	# Exemplu de interogare
	query = "Cum se folosește RAG în aplicațiile AI?"
	relevant_documents = retrieve_documents(query)
	print(relevant_documents)
	```

	#### 5. Generarea răspunsului folosind LLMLit 📝

	Acum că avem documentele relevante, le putem utiliza pentru a genera un răspuns contextului întrebării. Vom adăuga aceste documente la promptul nostru pentru LLMLit:

	```python
	def generate_answer(query, documents):
	context = " ".join(documents) # Adăugăm documentele relevante ca și context
	prompt = f"Întrebare: {query}\nContext: {context}\nRăspuns:"

	# Tokenizarea promptului
	inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True)

	# Generarea răspunsului
	outputs = model.generate(inputs['input_ids'], max_length=200, num_beams=5, early_stopping=True)
	answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
	return answer

	# Generarea răspunsului final
	answer = generate_answer(query, relevant_documents)
	print(answer)
	```

	#### 6. Rezultatul final 🎯

	În acest moment, ai un sistem complet RAG care combină căutarea de documente externe cu generarea de text utilizând LLMLit. Modelul va căuta informațiile relevante în documentele tale și va genera un răspuns informativ și precis.

	---

	### Concluzie 🌟

	Implementarea RAG folosind LLMLit îmbunătățește semnificativ calitatea răspunsurilor oferite de modele de limbaj, deoarece acestea pot accesa o bază de date externă pentru a obține informații mai precise și mai detaliate. Utilizând Hugging Face și librăriile precum FAISS, poți construi un sistem puternic de întrebări și răspunsuri bazat pe RAG.

	🔗 Pentru a experimenta cu LLMLit și pentru mai multe informații, vizitează [pagina oficială Hugging Face a modelului LLMLit](https://huggingface.co/LLMLit/LLMLit).

	Sper că acest ghid îți va fi de ajutor! 😊