---
base_model: Qwen/Qwen3.5-2B
tags:
- text-generation-inference
- transformers
- unsloth
- qwen3_5
license: apache-2.0
language:
- hi
- en
- ta
- te
- kn
- bn
- mr
---
The model was finetuned on ~128,000 curated transcripts across different domains and language preferences.
- Expanded Training: Now optimized for CX Support, Healthcare, Loan Collection, Insurance, Ecommerce, and Concierge services.
- Feature Improvement: Significantly enhanced relative date-time extraction for more precise data processing.
Training Overview

You can plug it into your calling or voice AI stack to automatically extract:
- Enum-based classifications (e.g., call outcome, intent, disposition)
- Conversation summaries
- Action items / follow-ups
- Relative date-time artifacts

It is built to handle noisy real-world Hindi, English, and other Indic-language transcripts. Test out our even smaller SLM.
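Relative date-time extraction can be sanity-checked downstream by resolving a phrase against the call's start time. A minimal stdlib sketch, where the phrase-to-offset table and the call time are purely illustrative (the model does this resolution itself):

```python
from datetime import datetime, timedelta

# Hypothetical helper: resolve a relative phrase against the call's start time,
# producing the ISO 8601 format (YYYY-MM-DDTHH:MM:SS) the model is trained to emit.
def resolve_relative(phrase: str, call_time: datetime) -> str:
    offsets = {                        # illustrative phrase -> offset table
        "kal": timedelta(days=1),      # Hindi: "tomorrow"
        "tomorrow": timedelta(days=1),
        "in two hours": timedelta(hours=2),
    }
    resolved = call_time + offsets[phrase]
    return resolved.strftime("%Y-%m-%dT%H:%M:%S")

call_time = datetime(2025, 3, 10, 14, 30, 0)    # call started at 2:30 PM local time
print(resolve_relative("tomorrow", call_time))  # 2025-03-11T14:30:00
```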
Finetuning Parameters:

```python
rank = 64  # kept small to avoid changing the model's inherent intelligence while ensuring structured extraction is followed
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_dataset,
    eval_dataset = test_dataset,
    args = SFTConfig(
        dataset_text_field = "prompt",
        max_seq_length = max_seq_length,
        per_device_train_batch_size = 5,
        gradient_accumulation_steps = 5,
        warmup_steps = 10,
        num_train_epochs = 2,
        learning_rate = 2e-4,
        lr_scheduler_type = "linear",
        optim = "adamw_8bit",
        weight_decay = 0.01,  # Unsloth default (was 0.001)
        seed = SEED,
        logging_steps = 50,
        report_to = "wandb",
        eval_strategy = "steps",
        eval_steps = 5000,
        save_strategy = "steps",
        save_steps = 5000,
        load_best_model_at_end = True,
        metric_for_best_model = "eval_loss",
        output_dir = "outputs_qwen35_2b",
        dataset_num_proc = 8,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
    ),
)
```
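With these settings the effective batch size is `per_device_train_batch_size × gradient_accumulation_steps = 25`, so on the ~128,000-transcript dataset (assuming a single device and the full set in `train_dataset`) two epochs come to roughly 10,240 optimizer steps. A back-of-envelope check:

```python
# Step-count arithmetic for the SFT config above
# (assumes one GPU and all ~128k examples in the training split).
examples = 128_000
per_device_batch = 5
grad_accum = 5
epochs = 2

effective_batch = per_device_batch * grad_accum       # 25
steps_per_epoch = examples // effective_batch         # 5120
total_steps = steps_per_epoch * epochs                # 10240
print(effective_batch, steps_per_epoch, total_steps)  # 25 5120 10240
```

This is why `eval_steps = 5000` and `save_steps = 5000` yield only a couple of checkpoints over the full run.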
Provide the schema below for best output:

```python
response_schema = {
    "type": "object",
    "properties": {
        "key_points": {
            "type": "array",
            "items": {"type": "string"},
            "nullable": True,
        },
        "action_items": {
            "type": "array",
            "items": {"type": "string"},
            "nullable": True,
        },
        "summary": {"type": "string"},
        "classification": classification_schema,
        "callback_requested": {
            "type": "boolean",
            "nullable": False,
            "description": "True if the user requested a callback or mentions they are currently busy; otherwise false",
        },
        "callback_requested_time": {
            "type": "string",
            "nullable": True,
            "description": "ISO 8601 datetime string (YYYY-MM-DDTHH:MM:SS) in the call's timezone, if the user requested a callback",
        },
    },
    "required": ["summary", "classification"],
}
```
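Downstream, the model's JSON output can be parsed and lightly checked against the schema's required fields before use. A minimal stdlib sketch, where the sample response is illustrative rather than real model output:

```python
import json
from datetime import datetime

# Illustrative raw model output; real responses come from your inference stack.
raw = '''{
  "summary": "User asked to be called back tomorrow afternoon.",
  "classification": {"call_outcome": "callback"},
  "callback_requested": true,
  "callback_requested_time": "2025-03-11T15:00:00"
}'''

result = json.loads(raw)

# Enforce the schema's "required" fields.
for field in ("summary", "classification"):
    assert field in result, f"missing required field: {field}"

# callback_requested_time, when present, must parse as YYYY-MM-DDTHH:MM:SS.
if result.get("callback_requested") and result.get("callback_requested_time"):
    when = datetime.strptime(result["callback_requested_time"], "%Y-%m-%dT%H:%M:%S")
    print(when.isoformat())  # 2025-03-11T15:00:00
```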
Uploaded finetuned model
- Developed by: RinggAI
- License: apache-2.0
- Finetuned from model : Qwen/Qwen3.5-2B
This qwen3_5 model was trained 2x faster with Unsloth and Hugging Face's TRL library.


