
license: apache-2.0
tags:
  - text-generation
  - text
  - chat
pipeline_tag: text-generation

Continue-1-OSS

Advanced Text Generation Model

Introduction

We are thrilled to introduce Continue-1-OSS, an advanced text generation model developed by SVECTOR. It is built on the Continue-1 architecture, which is optimized for high-quality text generation, instruction following, and long-context understanding.

Continue-1-OSS is engineered to provide:

Superior Instruction Following: Accurately follows complex, multi-step instructions
Long Context: Accepts inputs of up to 128K (131,072) tokens
Natural Conversations: Human-like dialogue with strong reasoning capabilities
Tool Integration: Built-in support for function calling and external tool use
Open Source: Released under the Apache 2.0 license for research and commercial use

The model pairs a transformer decoder architecture with training on instruction, dialogue, reasoning, and code data, aiming for strong performance across a wide range of natural language tasks.

Model Specifications

Base Architecture: Continue1ForCausalLM (transformer decoder)
Model Type: continue_oss
Parameters: ~3 billion
Context Length: 131,072 tokens
Vocabulary Size: 128,256 tokens
Hidden Size: 3072
Number of Layers: 28
Attention Heads: 24
License: Apache 2.0
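Context length drives GPU memory as much as parameter count does. The sketch below estimates the per-sequence key/value (KV) cache implied by the figures above. It assumes plain multi-head attention with one KV head per attention head (24) and BF16 storage; the card does not list a KV-head count, so if the model uses grouped-query attention, pass the smaller number and the cache shrinks proportionally.

```python
def kv_cache_bytes(num_tokens, num_layers=28, num_kv_heads=24, head_dim=128, bytes_per_value=2):
    """Estimate the key/value cache for one sequence.

    Every layer stores one key and one value vector per token, each of size
    num_kv_heads * head_dim. Defaults follow the specification table, with
    num_kv_heads = 24 as an assumption (standard multi-head attention).
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * num_tokens


HIDDEN_SIZE = 3072
NUM_ATTENTION_HEADS = 24
HEAD_DIM = HIDDEN_SIZE // NUM_ATTENTION_HEADS  # 3072 / 24 = 128 dimensions per head

for tokens in (8_192, 131_072):
    gib = kv_cache_bytes(tokens, head_dim=HEAD_DIM) / 2**30
    print(f"{tokens:>7} tokens -> {gib:.1f} GiB of BF16 KV cache")
```

Under these assumptions, a single 8,192-token sequence needs about 2.6 GiB of cache and a full 131,072-token sequence about 42 GiB. This is one reason the vLLM example below sets max_model_len=8192.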

Requirements

To use Continue-1-OSS, install the required dependencies:

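pip install transformers torch accelerate  # accelerate is needed for device_map="auto"
pip install vllm  # For fast inference (optional but recommended)
pip install bitsandbytes  # For 4-bit quantization (optional)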

Quickstart

Basic Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "SVECTOR-CORPORATION/Continue-1-OSS"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Prepare conversation
messages = [
    {"role": "user", "content": "What is machine learning?"}
]

# Apply chat template and generate
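input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
# The chat template typically inserts the BOS token already, so don't add special tokens a second time
inputs = tokenizer(input_text, return_tensors="pt", add_special_tokens=False).to(model.device)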

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

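# outputs[0] contains the prompt followed by the reply; decode only the newly generated tokens
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)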
print(response)
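The generate() call above samples with temperature=0.7 and top_p=0.9. The following pure-Python sketch shows what those two settings do to a single next-token distribution: it applies temperature scaling first and then nucleus (top-p) filtering. The toy logits are made up for illustration, and library implementations differ in details such as tie handling.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax of logits / temperature; lower temperatures sharpen the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; use greedy decoding (do_sample=False) instead")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability reaches top_p,
    then renormalize. Returns {token_index: probability}."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


toy_logits = [4.0, 3.0, 2.0, 0.5]  # hypothetical scores for a 4-token vocabulary
for t in (0.3, 0.7, 1.0):
    probs = apply_temperature(toy_logits, t)
    candidates = top_p_filter(probs, 0.9)
    print(f"T={t}: probs={[round(p, 3) for p in probs]}, kept by top_p=0.9: {sorted(candidates)}")
```

A lower temperature concentrates probability on the top token, so fewer tokens survive the top-p cut. This is why the Performance Tips below suggest low temperatures for factual tasks.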

Using vLLM (Recommended for Production)

For higher-throughput inference, serve the model with vLLM:

pip install vllm

from vllm import LLM, SamplingParams

# Initialize model
llm = LLM(
    model="SVECTOR-CORPORATION/Continue-1-OSS",
    trust_remote_code=True,
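    max_model_len=8192  # Max context for this engine; the model supports up to 131072, but longer limits need more KV-cache memory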
)

# Set sampling parameters
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=512
)

# Generate
messages = [
    {"role": "user", "content": "Explain quantum computing in simple terms."}
]

outputs = llm.chat(messages, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)

Default System Prompt: "You are Continue-1-OSS, an advanced AI assistant developed by SVECTOR. You are designed to be helpful, harmless, and honest."

Advanced Features

Multi-Turn Conversations

messages = [
    {"role": "system", "content": "You are Continue-1-OSS, a helpful AI assistant."},
    {"role": "user", "content": "What is quantum computing?"},
    {"role": "assistant", "content": "Quantum computing is a type of computing that uses quantum mechanics principles..."},
    {"role": "user", "content": "Can you explain that more simply?"}
]
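To continue the conversation, append the model's reply and the next user message, and drop the oldest turns once the history grows too long. The sketch below uses hypothetical helper names (add_exchange, trim_history). For simplicity it counts messages rather than tokens, and it assumes, as many chat templates do, that the first non-system message must come from the user.

```python
def add_exchange(messages, assistant_reply, next_user_message):
    """Return a new history with the model's reply and the user's follow-up appended."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_message},
    ]


def trim_history(messages, max_messages):
    """Keep a leading system message (if any) plus at most `max_messages` recent messages."""
    system = [m for m in messages[:1] if m["role"] == "system"]
    rest = messages[len(system):]
    kept = rest[-max_messages:] if max_messages > 0 else []  # rest[-0:] would keep everything
    # Drop leading assistant messages so the trimmed history still starts with a user turn.
    while kept and kept[0]["role"] != "user":
        kept = kept[1:]
    return system + kept


messages = [
    {"role": "system", "content": "You are Continue-1-OSS, a helpful AI assistant."},
    {"role": "user", "content": "What is quantum computing?"},
]
messages = add_exchange(messages, "Quantum computing uses qubits...", "Can you explain that more simply?")
messages = trim_history(messages, max_messages=4)
```

Pass the resulting list to tokenizer.apply_chat_template in the same way as in the Quickstart.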

Tool Calling Support

Continue-1-OSS supports function calling for tool integration:

messages = [
    {"role": "user", "content": "What's the weather in San Francisco?"}
]

# The model can respond with a JSON function call instead of plain text
# Example output: {"name": "get_weather", "parameters": {"location": "San Francisco"}}
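When the model answers with a JSON call in the format shown above, the application must parse it and run the matching function itself. The sketch below uses get_weather and its return value as hypothetical placeholders. The card does not document how tool schemas are supplied or how results are returned to the model. Transformers' apply_chat_template accepts a tools= argument, and many templates use a "tool" role for results, but check this model's chat template before relying on either.

```python
import json


def get_weather(location):
    # Hypothetical tool: replace with a call to a real weather service.
    return {"location": location, "forecast": "sunny", "temperature_c": 18}


TOOLS = {"get_weather": get_weather}  # maps tool names the model may emit to Python callables


def run_tool_call(model_output):
    """Parse {"name": ..., "parameters": {...}} and call the matching tool.

    Returns None if the output is plain text, not a JSON object, or names an unknown tool.
    """
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict):
        return None
    name, params = call.get("name"), call.get("parameters", {})
    if not isinstance(name, str) or name not in TOOLS or not isinstance(params, dict):
        return None
    return TOOLS[name](**params)


result = run_tool_call('{"name": "get_weather", "parameters": {"location": "San Francisco"}}')
print(result)
```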

Use Cases

Continue-1-OSS excels at:

Conversational AI: Build chatbots and virtual assistants with natural dialogue
Content Generation: Generate articles, stories, and creative content
Code Assistance: Help with coding tasks, debugging, and code explanations
Question Answering: Answer questions based on context with high accuracy
Summarization: Condense long documents into concise summaries
Data Extraction: Extract structured data from unstructured text
Tool Integration: Call functions and use external tools intelligently
Education: Create educational content and provide tutoring assistance
Customer Service: Automated support with natural language understanding

Performance

Quality: State-of-the-art instruction following and text generation
Speed: Fast inference with vLLM optimization
Memory: ~7 GB GPU memory (BF16), ~14 GB (FP32)
Context: Accepts up to 128K tokens; quality may degrade beyond 64K (see Limitations)
Efficiency: Competitive with much larger models on many tasks
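The memory figures follow from the parameter count: the weights take roughly the number of parameters multiplied by the bytes per parameter. The quick sketch below applies this to ~3 billion parameters. It covers the weights only, so the ~7 GB and ~14 GB figures above presumably include runtime overhead such as the CUDA context, activations, and KV cache. The 4-bit number is a lower bound, because some tensors and per-block scaling constants stay in higher precision.

```python
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1, "4bit": 0.5}


def weight_memory_gb(num_params, dtype):
    """Memory needed for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


NUM_PARAMS = 3e9  # "3 Billion" from the specification table (rounded)
for dtype in ("fp32", "bf16", "4bit"):
    print(f"{dtype:>5}: ~{weight_memory_gb(NUM_PARAMS, dtype):.1f} GB of weights")
```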

Model Architecture

Continue-1-OSS uses a custom architecture based on the transformer decoder:

Architecture Class: Continue1ForCausalLM
Config Class: Continue1Config
Hidden Size: 3072
Num Layers: 28
Num Attention Heads: 24
Intermediate Size: 8192
Vocab Size: 128,256
Max Position Embeddings: 131,072

The model uses RoPE (Rotary Position Embeddings) for positional encoding and supports extended context through position interpolation.
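Below is a minimal plain-Python sketch of rotary position embeddings with optional linear position interpolation, applied to one vector. This card does not state the following, so they are assumptions:

- the rotation base (10000 here, the value used in the original RoPE paper);
- the interpolation scheme (linear scaling of positions, though models use several variants);
- the pairing of dimensions (adjacent pairs here, whereas many implementations pair dimension i with i + d/2).

Check the model's config.json for its actual rope settings.

```python
import math


def rope_rotate(vec, position, base=10000.0, scale=1.0):
    """Apply a rotary position embedding to one query/key vector (even length).

    Dimension pair (2i, 2i+1) is rotated by angle (position * scale) * base**(-2i/d).
    scale < 1 gives linear position interpolation: positions are compressed so that
    longer sequences reuse the angle range seen at the original training length.
    """
    d = len(vec)
    out = list(vec)
    for i in range(d // 2):
        angle = position * scale * base ** (-2 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = vec[2 * i], vec[2 * i + 1]
        out[2 * i] = x0 * c - x1 * s
        out[2 * i + 1] = x0 * s + x1 * c
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.8, 0.5, -0.1, 0.9, 0.4, -0.7]
k = [1.0, 0.2, -0.5, 0.3, 0.6, -0.4, 0.1, 0.8]

# The attention score depends only on the offset between the two positions (here 5).
s1 = dot(rope_rotate(q, 10), rope_rotate(k, 5))
s2 = dot(rope_rotate(q, 1010), rope_rotate(k, 1005))
print(s1, s2)
```

With scale=0.5, position 2000 receives exactly the rotation that position 1000 gets at scale=1.0. This is how interpolation lets a model handle longer inputs using angles it saw during training.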

Training

Continue-1-OSS was developed using:

High-quality instruction datasets covering diverse tasks
Conversational and reasoning data for improved dialogue
Code and technical content for developer assistance
Multi-turn dialogue for contextual understanding

Training utilized:

Advanced optimization techniques
Careful hyperparameter tuning
Quality filtering and data curation
Evaluation on diverse benchmarks

Limitations

As with any language model, Continue-1-OSS has certain limitations:

Knowledge Cutoff: Training data is limited to information available up to December 2023
Factual Accuracy: May occasionally generate incorrect or outdated information
Specialized Domains: Performance may vary on highly specialized technical knowledge
Long Context: Very long contexts (>64K tokens) may impact generation quality
Languages: Primarily optimized for English; other languages have limited support
Reasoning: Complex multi-step reasoning may require careful prompting
Compute: A GPU is recommended for optimal performance; CPU inference is significantly slower

Ethical Considerations

SVECTOR is committed to responsible AI development. Users should follow these guidelines:

Transparency: Disclose when content is AI-generated
Verification: Always fact-check important information generated by the model
Bias Awareness: Be aware that the model may reflect biases present in its training data
Privacy: Do not input personal or sensitive information without proper safeguards
Safety: Implement content filtering and guardrails for production applications
Responsible Use: Do not use for illegal purposes, misinformation, or harmful content
Attribution: Credit the model when used in public projects or research

Performance Tips

Temperature Settings:
- 0.0-0.3 for factual/deterministic tasks (for fully greedy decoding in transformers, set do_sample=False rather than temperature=0)
- 0.7-0.9 for creative tasks
Context Management:
- The model accepts up to 128K tokens, but truncating long inputs speeds up inference and reduces memory use
- Use a sliding window over very long documents
Batch Processing:
- Use vLLM for efficient batched inference in production
- Group prompts of similar length into the same batch to reduce padding
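The context-management and batching tips can be sketched as two small helpers that operate on token IDs and prompt lists. The function names, window sizes, and stride are hypothetical choices. In practice you would get token IDs from the tokenizer (for example tokenizer(text, add_special_tokens=False)["input_ids"]) and pick a window that fits your memory budget. Length grouping matters most for padded batches, such as model.generate() in transformers; vLLM schedules requests with continuous batching on its own.

```python
def sliding_windows(token_ids, window, stride):
    """Split a token sequence into windows of at most `window` tokens.

    Consecutive windows overlap by `window - stride` tokens, so text near a
    boundary appears with some surrounding context in at least one window.
    """
    if not 0 < stride <= window:
        raise ValueError("need 0 < stride <= window")
    windows, start = [], 0
    while True:
        windows.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            return windows
        start += stride


def length_grouped_batches(prompts, batch_size, length=len):
    """Sort prompt indices by length and slice them into batches of similar length.

    Returning indices (not prompts) makes it easy to restore the original order.
    """
    order = sorted(range(len(prompts)), key=lambda i: length(prompts[i]))
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


chunks = sliding_windows(list(range(10)), window=4, stride=3)
batches = length_grouped_batches(["a long prompt here", "hi", "medium prompt", "hey"], batch_size=2)
print(chunks, batches)
```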

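Quantization:
- Load the model in 4-bit with bitsandbytes (pip install bitsandbytes) to reduce GPU memory further

from transformers import AutoModelForCausalLM, BitsAndBytesConfig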
import torch

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    "SVECTOR-CORPORATION/Continue-1-OSS",
    trust_remote_code=True,
    quantization_config=quantization_config,
    device_map="auto"
)

License

This model is released under the Apache License 2.0. You are free to use, modify, and distribute this model for both commercial and non-commercial purposes. See the LICENSE file for complete details.

Developed by SVECTOR