kssrikar4 committed on Jan 17, 2025

Commit

d17479b

verified ·

1 Parent(s): b970670

Create README.md

Files changed (1)

README.md +153 -0

README.md ADDED

	@@ -0,0 +1,153 @@

+---
+library_name: transformers
+license: llama3.2
+base_model: meta-llama/Llama-3.2-1B
+tags:
+- generated_from_trainer
+model-index:
+- name: Intellecta
+  results: []
+datasets:
+- fka/awesome-chatgpt-prompts
+- BAAI/Infinity-Instruct
+- allenai/WildChat-1M
+- lavita/ChatDoctor-HealthCareMagic-100k
+- zjunlp/Mol-Instructions
+- garage-bAInd/Open-Platypus
+language:
+- en
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# Intellecta
+This model is a fine-tuned version of [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) on the instruction and conversational datasets listed under "Training and evaluation data".
+## Model description
+The model is based on Llama (originally "LLaMA", Large Language Model Meta AI), Meta's family of language models for natural language understanding and generation. This implementation fine-tunes the Llama 3.2 1B model for general-purpose conversational AI tasks.
+- Architecture: Transformer-based causal language model.
+- Tokenization: Uses the `AutoTokenizer` for the Llama base model, with the pad token set explicitly (falling back to `eos_token`) so that padding works.
+- Pre-trained foundation: Builds on the pre-trained Llama weights, with fine-tuning aimed at conversational and instruction-based tasks.
+- Implementation: Developed with Hugging Face’s Transformers library for extensibility and ease of use.
+## Intended uses & limitations
+Intended uses:
+- Instruction-following tasks: answering questions, summarizing, and generating text.
+- Conversational agents: chatbots and virtual assistants, including those in specialized domains like healthcare or education.
+- Research and development: fine-tuning and benchmarking against datasets for downstream tasks.
+## Training and evaluation data
+Datasets used:
+- fka/awesome-chatgpt-prompts: General-purpose instruction-following and conversational dataset based on GPT-like interactions.
+- BAAI/Infinity-Instruct (3M): A large instruction dataset containing a wide variety of tasks and instructions.
+- allenai/WildChat-1M: Focused on open-ended conversational data.
+- lavita/ChatDoctor-HealthCareMagic-100k: Healthcare-specific dataset for medical conversational agents.
+- zjunlp/Mol-Instructions: Molecular biology-related instructions.
+- garage-bAInd/Open-Platypus: Dataset aimed at general-purpose, open-domain reasoning.
+
+Data preprocessing:
+- Text prompts and responses are tokenized with padding and truncation.
+- Labels are derived from input tokens, masking padding tokens with -100 to exclude them from loss computation.
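
The label derivation described above can be sketched in plain Python. This is a minimal illustration, not the training code from this commit: `build_labels` is a hypothetical helper, and the token IDs (including `0` as the pad ID) are made up for the example.

```python
IGNORE_INDEX = -100  # label value that the loss computation skips


def build_labels(input_ids, pad_token_id):
    """Copy input_ids into labels, replacing padding positions with IGNORE_INDEX."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]


# Hypothetical token IDs; 0 plays the role of the pad token here.
input_ids = [101, 7, 42, 9, 0, 0]
labels = build_labels(input_ids, pad_token_id=0)
print(labels)  # [101, 7, 42, 9, -100, -100]
```

One design point worth noting: when the pad token is the same as the end-of-sequence token, masking by token ID also masks genuine end-of-sequence tokens. Masking by attention mask instead would avoid that.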
+## Training procedure
+Training fine-tunes the pre-trained Llama 3.2 1B model on the datasets above, with a focus on instruction-following and conversational tasks. The key aspects of the process are:
+### 1. Preprocessing
+Tokenization:
+- The input prompts and their responses are tokenized using the `AutoTokenizer` configured for Llama.
+- Padding uses the tokenizer's `pad_token`, which is set to the `eos_token` if undefined.
+- Inputs are truncated to a maximum length of 512 tokens.
+
+Label preparation:
+- Input IDs are cloned to create labels for supervised learning.
+- Padding tokens in labels are masked with -100 so that they are ignored during loss computation.
+
+Dataset mapping:
+- Each dataset's `prompt` field is tokenized and reformatted into the model’s required input-output structure.
+- Datasets without a `prompt` column are skipped to avoid errors.
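
The truncation and dataset-skipping steps above could look roughly like the following. This is a sketch under stated assumptions: `toy_tokenize` is a stand-in for the real tokenizer, datasets are modeled as plain dicts of columns, and the dataset names are hypothetical.

```python
MAX_LENGTH = 512  # truncation limit stated in the model card


def toy_tokenize(text):
    """Stand-in tokenizer (hypothetical): one integer ID per whitespace-separated word."""
    return [len(word) for word in text.split()]


def preprocess(dataset, max_length=MAX_LENGTH):
    """Tokenize the `prompt` column of a dataset given as a dict of columns.

    Returns None when there is no `prompt` column, so the caller can skip the dataset.
    """
    if "prompt" not in dataset:
        return None
    examples = []
    for text in dataset["prompt"]:
        input_ids = toy_tokenize(text)[:max_length]  # truncate to max_length tokens
        examples.append({"input_ids": input_ids, "labels": list(input_ids)})
    return examples


# Hypothetical datasets: one has a `prompt` column, the other does not.
datasets = {
    "with_prompt": {"prompt": ["act as a translator", "word " * 600]},
    "without_prompt": {"instruction": ["summarize this"]},
}

processed = {}
for name, data in datasets.items():
    result = preprocess(data)
    if result is None:
        print(f"skipping {name}: no prompt column")
        continue
    processed[name] = result

lengths = {name: [len(ex["input_ids"]) for ex in exs] for name, exs in processed.items()}
print(lengths)  # {'with_prompt': [4, 512]}
```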
+### 2. Model Setup
+Pre-trained model:
+- The base model, meta-llama/Llama-3.2-1B, is loaded with pre-trained weights.
+- It is fine-tuned for causal language modeling, focusing on instruction-based outputs.
+
+Tokenizer setup:
+- The same tokenizer is used for encoding and decoding, keeping both consistent with the model.
+- The pad token falls back to `eos_token` when the tokenizer defines none.
+### 3. Training Configuration
+The Hugging Face `TrainingArguments` object configures the training process:
+- Output directory: `llama_output` stores the model checkpoints and logs.
+- Epochs: 4, for a balance between training time and generalization.
+- Batch size: 4 examples per device, to handle memory constraints.
+- Gradient accumulation: 4 steps, giving an effective batch size of 16.
+- Learning rate: 1e-4, with 500 warmup steps for stable optimization.
+- Weight decay: 0.01, to mitigate overfitting.
+- Mixed precision: FP16 (half precision), for faster training and reduced memory usage.
+- Logging: every 10 steps, to monitor training progress.
+- Checkpointing: model checkpoints are saved at the end of each epoch.
+- Push to Hub: the fine-tuned model is uploaded to the Hugging Face Hub (kssrikar4/Intellecta).
+
+Data collator: `DataCollatorForSeq2Seq` pads each batch dynamically for efficiency during training.
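
Dynamic padding, as performed by the data collator, means padding each batch only to the length of its own longest example rather than to a global maximum. The sketch below imitates that behavior in plain Python. It is not the library's implementation: `collate` is a hypothetical helper, right-side padding is assumed, and the token IDs are invented.

```python
IGNORE_INDEX = -100  # padding value for labels, ignored by the loss


def collate(batch, pad_token_id):
    """Pad every example to the longest sequence in this batch only."""
    longest = max(len(ex["input_ids"]) for ex in batch)
    out = {"input_ids": [], "attention_mask": [], "labels": []}
    for ex in batch:
        n_real = len(ex["input_ids"])
        n_pad = longest - n_real
        out["input_ids"].append(ex["input_ids"] + [pad_token_id] * n_pad)
        out["attention_mask"].append([1] * n_real + [0] * n_pad)
        out["labels"].append(ex["labels"] + [IGNORE_INDEX] * n_pad)
    return out


# Hypothetical batch of two examples with different lengths; 0 is the pad ID.
batch = [
    {"input_ids": [5, 6, 7], "labels": [5, 6, 7]},
    {"input_ids": [8], "labels": [8]},
]
padded = collate(batch, pad_token_id=0)
print(padded["input_ids"])       # [[5, 6, 7], [8, 0, 0]]
print(padded["attention_mask"])  # [[1, 1, 1], [1, 0, 0]]
print(padded["labels"])          # [[5, 6, 7], [8, -100, -100]]
```

A batch of short prompts therefore costs far less compute than one padded to the 512-token limit.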
+### 4. Fine-Tuning Process
+Trainer:
+- The Hugging Face `Trainer` class orchestrates training, combining the model, data, and training configuration.
+- Loss is computed for each batch from the model's outputs (the logits) and the prepared labels.
+- The optimizer and learning rate scheduler are managed internally by the `Trainer`.
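
To make the loss computation concrete, here is a small pure-Python sketch of next-token cross-entropy that skips labels equal to -100. It only illustrates the idea: in Transformers the shift between logits and labels happens inside the causal language model, and `clm_loss` below is a hypothetical helper operating on one toy sequence.

```python
import math

IGNORE_INDEX = -100


def clm_loss(logits, labels):
    """Mean next-token cross-entropy for one sequence.

    logits[t] holds the vocabulary scores produced at position t and is
    scored against labels[t + 1], the token that follows it.
    """
    total, count = 0.0, 0
    for scores, target in zip(logits[:-1], labels[1:]):
        if target == IGNORE_INDEX:
            continue  # masked (padding) position contributes nothing
        peak = max(scores)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_norm - scores[target]  # negative log-probability of the target
        count += 1
    return total / count if count else 0.0


# Toy example: vocabulary of 3 tokens, uniform scores, last two labels masked.
logits = [[0.0, 0.0, 0.0] for _ in range(4)]
labels = [1, 2, IGNORE_INDEX, IGNORE_INDEX]
loss = clm_loss(logits, labels)
print(round(loss, 4))  # 1.0986, i.e. ln(3) for a uniform guess over 3 tokens
```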
+
+Training loop (each epoch):
+- The model processes batches of tokenized prompts and computes the causal language modeling (CLM) loss.
+- Gradients are accumulated over 4 steps to simulate a larger batch size.
+- An optimizer update is applied after each accumulation cycle.
+
+Validation:
+- The training code defines no validation data, although the `Trainer` supports evaluation when an `eval_dataset` is provided.
+- Because a checkpoint is saved at each epoch, each one can be evaluated after training.
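
The gradient accumulation mentioned in the training loop can be checked on a toy problem. The sketch below uses a one-parameter model and a plain gradient-descent step, both simplifications chosen for illustration (the card reports AdamW as the actual optimizer). It shows that averaging the gradients of 4 micro-batches of 4 examples reproduces the gradient of a single batch of 16.

```python
ACCUM_STEPS = 4  # gradient_accumulation_steps from the card
MICRO_BATCH = 4  # per-device batch size from the card


def grad_mse(w, batch):
    """Gradient of the mean squared error for the toy model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


# 16 toy (x, y) pairs generated from y = 3x.
data = [(float(i), 3.0 * i) for i in range(1, 17)]
w, lr = 0.0, 1e-4

accumulated = 0.0
for step in range(ACCUM_STEPS):
    micro = data[step * MICRO_BATCH:(step + 1) * MICRO_BATCH]
    # Scale each micro-batch gradient so that the running sum is a mean.
    accumulated += grad_mse(w, micro) / ACCUM_STEPS

full_batch = grad_mse(w, data)   # gradient of one batch of 16
w_new = w - lr * accumulated     # a single update after 4 micro-batches

print(accumulated, full_batch)
```

The two printed gradients agree up to floating-point rounding, which is why accumulation lets a memory-limited device behave as if it trained with the larger batch.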
+### 5. Post-Training
+- Push to Hub: the trained model, with its tokenizer and configuration, is pushed to the Hugging Face Hub under the ID kssrikar4/Intellecta.
+- Usage: the fine-tuned model can be downloaded and used directly for inference or further fine-tuning.
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.0001
+- train_batch_size: 4
+- eval_batch_size: 8
+- seed: 42
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 16
+- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_steps: 500
+- num_epochs: 4
+- mixed_precision_training: Native AMP
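
The learning-rate settings above (linear schedule, 500 warmup steps, peak 1e-4) imply a rate that rises linearly from zero and then decays linearly to zero. The sketch below reproduces that shape in plain Python. The total number of training steps is not stated in the card, so `TOTAL_STEPS` is a purely hypothetical value, and `linear_warmup_lr` is an illustrative helper rather than the scheduler the `Trainer` used.

```python
BASE_LR = 1e-4        # learning_rate from the card
WARMUP_STEPS = 500    # lr_scheduler_warmup_steps from the card
TOTAL_STEPS = 10_000  # hypothetical: the real step count is not reported


def linear_warmup_lr(step, total_steps=TOTAL_STEPS, base_lr=BASE_LR, warmup_steps=WARMUP_STEPS):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Effective batch size: per-device batch size times accumulation steps.
effective_batch_size = 4 * 4

for step in (0, 250, 500, 5_250, 10_000):
    print(step, linear_warmup_lr(step))
```

The effective batch size of 16 computed here matches the `total_train_batch_size` reported in the list above.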
+### Training results
+### Framework versions
+- Transformers 4.48.0
+- Pytorch 2.5.1+cpu
+- Datasets 3.2.0
+- Tokenizers 0.21.0