# LitLex-Llama: Lithuanian Legal AI
LitLex is a specialized Large Language Model (LLM) fine-tuned to understand and interpret the Administrative Code of the Republic of Lithuania (ANK).
Built with Meta Llama 3 and optimized using Unsloth, this model demonstrates high capability in citing legal articles, calculating fines, and explaining regulations in the Lithuanian language.
## Model Details

- Model Type: Causal Language Model (LLM)
- Base Model: unsloth/llama-3-8b-bnb-4bit (quantized)
- Language: Lithuanian
- Architecture: LoRA (Low-Rank Adaptation)
- Developer: Lukash
- License: MIT
## How to Run (Inference)

You can run this model with the unsloth library for faster inference, or with standard transformers.

### Installation

```bash
pip install unsloth torch transformers
```

### Python Code
```python
from unsloth import FastLanguageModel
import torch

# Load the model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "lukashm/LitLex-Llama-LT-v1",
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)

# Define the prompt template
alpaca_prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{}

### Response:
"""

# Ask a question (Lithuanian: "What fine applies for exceeding the speed limit by more than 50 km/h?")
question = "Kokia bauda gresia už greičio viršijimą daugiau kaip 50 km/h?"
inputs = tokenizer([alpaca_prompt.format(question)], return_tensors="pt").to("cuda")

# Generate and decode the answer
outputs = model.generate(**inputs, max_new_tokens=256, use_cache=True)
response = tokenizer.batch_decode(outputs)[0]
print(response.split("### Response:\n")[1].replace("<|end_of_text|>", ""))
```
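The final parsing step above can be factored into a small helper, which also makes it easy to sanity-check without a GPU. The sample string below is purely illustrative, not real model output:

```python
def extract_response(decoded: str, eos_token: str = "<|end_of_text|>") -> str:
    """Strip the prompt prefix and special tokens from a decoded generation."""
    answer = decoded.split("### Response:\n", 1)[1]
    return answer.replace(eos_token, "").strip()

# Illustrative decoded string (not a real model generation):
sample = (
    "Below is an instruction that describes a task...\n\n"
    "### Instruction:\nKokia bauda gresia?\n\n"
    "### Response:\nIliustracinis atsakymas.<|end_of_text|>"
)
print(extract_response(sample))  # -> Iliustracinis atsakymas.
```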
## Training Details

### Dataset

The model was fine-tuned on a custom dataset (`ank_dataset.json`) derived from the official Administrative Code of the Republic of Lithuania (ANK), as published on e-seimas.lrs.lt.

- Size: ~500 high-quality instruction/output pairs.
- Content: specific focus on traffic violations, public-order offenses, and administrative fines.
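The exact schema of `ank_dataset.json` is not published. A plausible minimal record, and the Alpaca-style formatting typically used to turn such pairs into training text, might look like this (the `instruction`/`output` field names and the sample answer text are assumptions for illustration):

```python
# Assumed record shape -- the real ank_dataset.json schema is not published.
record = {
    "instruction": "Kokia bauda gresia už greičio viršijimą daugiau kaip 50 km/h?",
    "output": "Pagal ANK numatoma administracinė bauda.",  # illustrative answer text
}

ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{}\n\n### Response:\n{}"
)

def to_training_text(rec: dict) -> str:
    """Render one instruction/output pair as a single training string."""
    return ALPACA_TEMPLATE.format(rec["instruction"], rec["output"])

text = to_training_text(record)
print(text.endswith(record["output"]))  # -> True
```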
### Hyperparameters

- Optimization: Unsloth (QLoRA)
- Steps: 500
- Batch Size: 2 (gradient accumulation: 4)
- Learning Rate: 2e-4
- LoRA Rank (r): 64 (a high rank, chosen for better fact retention)
- Final Loss: ~0.08
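For reference, the hyperparameters listed above can be collected into a single config; the key names below follow common `transformers`/LoRA conventions and are an assumption, since the actual training script is not published:

```python
# Sketch only: key names follow common TrainingArguments/LoRA conventions;
# the actual LitLex training script is not published.
train_config = {
    "max_steps": 500,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,
    "learning_rate": 2e-4,
    "lora_r": 64,
}

# Effective batch size seen by the optimizer per update step:
effective_batch = (train_config["per_device_train_batch_size"]
                   * train_config["gradient_accumulation_steps"])
print(effective_batch)  # -> 8
```

With gradient accumulation, each optimizer step averages gradients over 2 × 4 = 8 examples, which is what the learning rate of 2e-4 is tuned against.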
## Limitations & Disclaimer

This model is a proof of concept (MVP) intended for educational and research purposes.

- Hallucinations: while stylistically fluent, the model may occasionally cite incorrect article numbers (e.g., confusing Art. 348 with Art. 416).
- Scope: the model specializes in administrative law (ANK) and is not aware of Criminal Code (BK) or Civil Code (CK) nuances unless specifically trained on them.
- Legal Advice: this is an AI assistant, not a lawyer. Always consult official sources or a qualified attorney for legal matters.