Qwen3-4B Financial Summarizer - Fine-tuned on EDGAR Corpus

This is the unsloth/Qwen3-4B-Base model, fine-tuned to summarize financial reports. It was trained on a subset of the kritsadaK/EDGAR-CORPUS-Financial-Summarization dataset, focusing on data from 2018 through 2020.

The model was optimized with Unsloth for faster training and more efficient memory use, and fine-tuned with LoRA (Low-Rank Adaptation).
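
For readers new to LoRA, the standard formulation is sketched below. The rank r and scaling factor alpha used for this model are not stated in this card, so the symbols are generic rather than the actual training settings.

```latex
h = W_0 x + \frac{\alpha}{r}\, B A x,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
```

Here W_0 is the frozen pretrained weight matrix, and only the low-rank factors A and B are updated during fine-tuning, which is what keeps the number of trainable parameters small.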

Model Description

Base Model: unsloth/Qwen3-4B-Base
Task: Financial Text Summarization
Language: English
Dataset: 500 samples from kritsadaK/EDGAR-CORPUS-Financial-Summarization (2018-2020).
Frameworks: Transformers, Unsloth, PyTorch.

How to Use

Make sure transformers and unsloth are installed. You can then load the model and tokenizer and use the appropriate chat template to generate a summary.

from unsloth import FastLanguageModel
import torch

# Replace with your repo name
model_name = "your-username/your-model-name"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    max_seq_length = 8192,
    load_in_4bit = True,  # or False if you have more VRAM available
)

# Prompt template used during training
system_prompt = """You are a professional financial assistant.
Your objective is to generate a clear, concise, and professional summary of the provided data.
Ensure the summary accurately reflects the key information and main conclusions of the original text.

###Instructions for answering:
No wordiness, 350 words limit
No further explanation, just summary"""

user_template = """Summarize the text below:

{}"""

# Example input text (replace with your financial report)
financial_text = "..."  # Insert the financial report text here

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_template.format(financial_text)},
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to("cuda")

# Generate output
outputs = model.generate(input_ids=inputs, max_new_tokens=350, use_cache=True)
summary = tokenizer.batch_decode(outputs[:, inputs.shape[1]:], skip_special_tokens=True)[0]

print(summary)
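
Note that max_new_tokens=350 caps the output in tokens, while the system prompt asks for a 350-word limit, so a summary may be cut off before it reaches that many words. One possible way to handle this is to allow a larger token budget and trim by words afterwards. The sketch below shows that idea together with a reusable message builder; build_messages and truncate_to_words are hypothetical helper names, not part of Unsloth or Transformers.

```python
def build_messages(system_prompt: str, report_text: str) -> list[dict]:
    """Wrap a report in the system/user chat structure shown above."""
    # Same user template as in the usage snippet.
    user_content = "Summarize the text below:\n\n{}".format(report_text)
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]


def truncate_to_words(text: str, limit: int = 350) -> str:
    """Keep at most `limit` whitespace-separated words (hypothetical helper)."""
    words = text.split()
    if len(words) <= limit:
        return text.strip()
    # Trimmed output is re-joined with single spaces, so line breaks are lost.
    return " ".join(words[:limit])
```

With these helpers, the messages list can be built with build_messages(system_prompt, financial_text), and the decoded output passed through truncate_to_words(summary) before printing.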
