# 📝 Multilingual Text Summarization (French + English)

## 📘 Context

Text summarization is a crucial NLP task used to extract key insights from long documents. With the advancement of transformer-based architectures like BART and T5, we can now generate high-quality summaries in different languages.

This notebook demonstrates how to perform automatic summarization in:
- 🇬🇧 **English**, using `facebook/bart-large-cnn`
- 🇫🇷 **French**, using `plguillou/t5-base-fr-sum-cnndm`

## 🎯 Objectives

- Load and compare language-specific summarization models
- Generate and display summaries for both English and French input texts
- Test edge cases and observe model behavior

## Packages

In [1]:
# !pip install transformers sentencepiece
! pip install -r requirements.txt

Collecting transformers (from -r requirements.txt (line 1))
  Using cached transformers-4.51.3-py3-none-any.whl.metadata (38 kB)
Collecting torch (from -r requirements.txt (line 2))
  Using cached torch-2.7.0-cp312-cp312-win_amd64.whl.metadata (29 kB)
Collecting langdetect (from -r requirements.txt (line 3))
  Using cached langdetect-1.0.9.tar.gz (981 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting gradio (from -r requirements.txt (line 4))
  Using cached gradio-5.29.0-py3-none-any.whl.metadata (16 kB)
Collecting huggingface-hub<1.0,>=0.30.0 (from transformers->-r requirements.txt (line 1))
  Using cached huggingface_hub-0.30.2-py3-none-any.whl.metadata (13 kB)
Collecting tokenizers<0.22,>=0.21 (from transformers->-r requirements.txt (line 1))
  Using cached tokenizers-0.21.1-cp39-abi3-win_amd64.whl.metadata (6.9 kB)
Collecting safetensors>=0.4.3 (from transformers->-r requirements.txt (line 1))
  Using cached safeten

In [2]:
# import loguru

from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM
import textwrap  # Text wrapping and filling

import gradio as gr
from langdetect import detect
import sys

In [4]:
print(sys.version)

3.12.7 | packaged by Anaconda, Inc. | (main, Oct  4 2024, 13:17:27) [MSC v.1929 64 bit (AMD64)]


## 🧠 Model Descriptions

### 🇬🇧 `facebook/bart-large-cnn`: English Text Summarization

**BART (Bidirectional and Auto-Regressive Transformers)** is a sequence-to-sequence model developed by Facebook AI that pairs a **bidirectional encoder** (as in BERT) with an **auto-regressive decoder** (as in GPT). This checkpoint is **fine-tuned** on the **CNN/DailyMail dataset** of news articles and their reference summaries.

- **Use Case**: Excellent for **journalistic**, **informal**, or **structured opinion texts**.
- **Type of Summary**: **Abstractive** (paraphrasing, not just extraction).

**Architecture**:
- 12 layers of encoder + 12 layers of decoder
- Bidirectional attention for encoding, causal attention for decoding
- Around **406M parameters**

---

### 🇫🇷 `plguillou/t5-base-fr-sum-cnndm`: French Text Summarization

Based on **T5 (Text-to-Text Transfer Transformer)**, developed by Google. This model is **fine-tuned** for **French text summarization** on a dataset inspired by CNN/DailyMail.

- **Use Case**: Best for **formal** or **structured** texts: **news articles**, **reports**, or **official documents**.
- **Type of Summary**: **Abstractive** (rephrasing the input text in its own words).

**Architecture**:
- **T5-base**: Around **220M parameters**
- Multilingual, but fine-tuned specifically for **French**.

---

### 🌍 `facebook/mbart-large-50-one-to-many-mmt`: Multilingual Text Summarization

**mBART (Multilingual BART)** is a variant of BART pretrained on **multiple languages**. This checkpoint is fine-tuned for **translation** from English into the other languages it covers, so using it for **summarization** would require further fine-tuning.

- **Use Case**: Suitable for summarizing text in multiple languages, making it a versatile tool for multilingual applications.
- **Type of Summary**: **Abstractive**.

**Architecture**:
- 12 layers of encoder + 12 layers of decoder
- Multilingual model trained on 50 languages
- Around **680M parameters**

---

### 🔄 `google/t5-base-xxl-tlm`: T5 for Multilingual Tasks

**T5** (Text-to-Text Transfer Transformer) is a model that frames all NLP tasks as a text-to-text problem, making it highly adaptable. It has been fine-tuned for multiple tasks including **summarization**.

- **Use Case**: Works well for **multilingual summarization**, but can also be used for translation, question-answering, etc.
- **Type of Summary**: **Abstractive** (like all T5-based models).

**Architecture**:
- **T5-base**: Around **220M parameters**
- **T5-XXL**: Much larger, up to **11B parameters**
- Fine-tuned for many multilingual tasks

---

### 🚀 `google/flan-t5-xl`: Fine-tuned T5 for Better Generalization

**FLAN-T5** is a version of T5 that is **instruction fine-tuned on a large collection of tasks** to improve generalization. It aims to perform better than the original T5 on a wide range of NLP tasks, including summarization.

- **Use Case**: Ideal for **high-quality summarization** tasks in multiple languages, with improved robustness.
- **Type of Summary**: **Abstractive**.

**Architecture**:
- **T5-XL**: Large model with around **3B parameters** (the 11B figure applies to the XXL size).
- Fine-tuned on a wide variety of tasks, improving the model's ability to generalize across domains.

---

### 📊 Quick Comparison

| Model                                      | Language     | Architecture     | Fine-Tuning                          | Type of Summary |
|--------------------------------------------|--------------|------------------|--------------------------------------|-----------------|
| `facebook/bart-large-cnn`                  | English      | BART             | CNN/DailyMail                        | Abstractive     |
| `plguillou/t5-base-fr-sum-cnndm`           | French       | T5 (Base)        | CNN/DailyMail FR                     | Abstractive     |
| `facebook/mbart-large-50-one-to-many-mmt`  | Multilingual | mBART            | Multilingual (50 languages)          | Abstractive     |
| `google/t5-base-xxl-tlm`                   | Multilingual | T5 (Base or XXL) | Multilingual                         | Abstractive     |
| `google/flan-t5-xl`                        | Multilingual | T5 (Fine-tuned)  | Fine-tuned for better generalization | Abstractive     |


## Model selection

In [None]:
# English summarization model (BART)
summarizer_en = pipeline("summarization", model="facebook/bart-large-cnn")

# French summarization model (T5 fine-tuned for summarization)
summarizer_fr = pipeline("summarization", model="plguillou/t5-base-fr-sum-cnndm")

# fr_model_name = "plguillou/t5-base-fr-sum-cnndm"
# tokenizer_fr = AutoTokenizer.from_pretrained(fr_model_name)
# model_fr = AutoModelForSeq2SeqLM.from_pretrained(fr_model_name)
# summarizer_fr = pipeline("summarization", model=model_fr, tokenizer=tokenizer_fr)

Device set to use cpu
Device set to use cpu


## 🧪 Application: Testing models

In [9]:
text_en = """
Artificial Intelligence is revolutionizing many industries such as healthcare, finance, and transportation.
Machine learning techniques now enable systems to analyze vast amounts of data and make decisions with minimal human input.
However, these advances raise concerns over data privacy, algorithmic transparency, and job displacement.
"""

text_fr = """
L'intelligence artificielle transforme profondément des secteurs comme la santé, les transports et l'éducation.
Grâce à l'apprentissage automatique, les systèmes peuvent analyser de grandes quantités de données et prendre des décisions complexes.
Cependant, cela soulève des enjeux éthiques majeurs sur la transparence, l'emploi et la confidentialité.
"""

In [10]:
print("🔹 Original English Text:\n")
print(textwrap.fill(text_en, width=100))

summary_en = summarizer_en(text_en, max_length=100, min_length=30, do_sample=False)

print("\n✅ English Summary:\n")
print(textwrap.fill(summary_en[0]["summary_text"], width=100))

Your max_length is set to 100, but your input_length is only 62. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=31)


🔹 Original English Text:

 Artificial Intelligence is revolutionizing many industries such as healthcare, finance, and
transportation. Machine learning techniques now enable systems to analyze vast amounts of data and
make decisions with minimal human input. However, these advances raise concerns over data privacy,
algorithmic transparency, and job displacement.

✅ English Summary:

Machine learning techniques now enable systems to analyze vast amounts of data and make decisions
with minimal human input. These advances raise concerns over data privacy, algorithmic transparency,
and job displacement.
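
The warning above appears because `max_length=100` exceeds the 62-token input. One way to avoid it is to derive both bounds from the input's token count. The following is a minimal sketch: `pick_lengths`, its ratios, and its floor are hypothetical choices made for illustration and are not part of `transformers`; the token count itself would come from the model's tokenizer.

```python
def pick_lengths(n_input_tokens: int, max_ratio: float = 0.5,
                 min_ratio: float = 0.25, floor: int = 5) -> tuple[int, int]:
    """Return (min_length, max_length) scaled to the size of the input.

    All defaults are illustrative assumptions, not library recommendations.
    """
    # Cap the summary at a fraction of the input, but never below `floor`.
    max_length = max(floor, int(n_input_tokens * max_ratio))
    # Keep min_length at least 1 and strictly below max_length.
    min_length = max(1, min(int(n_input_tokens * min_ratio), max_length - 1))
    return min_length, max_length


# 62 is the input length reported in the warning above.
print(pick_lengths(62))
```

With a 62-token input the upper bound comes out at 31, which matches the value suggested in the warning message.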


In [11]:
print("🔹 Texte original en français:\n")
print(textwrap.fill(text_fr, width=100))

summary_fr = summarizer_fr(text_fr, max_length=100, min_length=30, do_sample=False)

print("\n✅ Résumé en français:\n")
print(textwrap.fill(summary_fr[0]["summary_text"], width=100))

🔹 Texte original en français:

 L'intelligence artificielle transforme profondément des secteurs comme la santé, les transports et
l'éducation. Grâce à l'apprentissage automatique, les systèmes peuvent analyser de grandes quantités
de données et prendre des décisions complexes. Cependant, cela soulève des enjeux éthiques majeurs
sur la transparence, l'emploi et la confidentialité.

✅ Résumé en français:

L'intelligence artificielle transforme profondément des secteurs comme la santé, les transports et
l'éducation. Cependant, cela soulève des enjeux éthiques majeurs sur la transparence.


In [18]:
print("🔹 Texte original en français:\n")
print(textwrap.fill(text_fr, width=100))

summary_fr = summarizer_fr(text_fr, max_length=100, min_length=30, do_sample=True)

print("\n✅ Résumé en français:\n")
print(textwrap.fill(summary_fr[0]["summary_text"], width=100))

🔹 Texte original en français:

 L'intelligence artificielle transforme profondément des secteurs comme la santé, les transports et
l'éducation. Grâce à l'apprentissage automatique, les systèmes peuvent analyser de grandes quantités
de données et prendre des décisions complexes. Cependant, cela soulève des enjeux éthiques majeurs
sur la transparence, l'emploi et la confidentialité.

✅ Résumé en français:

L'intelligence artificielle transforme profondément des secteurs tels que la santé, les transports
et l'éducation. Cependant, cela soulève des enjeux éthiques majeurs sur la transparence.


In [14]:
empty_text = ""

summary_fr = summarizer_fr(empty_text, max_length=100, min_length=30, do_sample=False)
summary_fr

Your max_length is set to 100, but your input_length is only 3. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=1)


[{'summary_text': "Selon les autorités, il s'agit d'un événement qui n'a pas eu lieu à l'époque."}]

In [19]:
texts_fr = [
    "",
    """Machine learning techniques now enable systems to analyze vast amounts of data and make decisions
    with minimal human input. These advances raise concerns over data privacy, algorithmic transparency,
    and job displacement.""",
    "Le réchauffement climatique provoque des événements météorologiques extrêmes dans le monde entier.",
    "La France accueille chaque année des millions de touristes attirés par sa culture et sa gastronomie.",
    "Les véhicules autonomes utilisent des capteurs et de l'IA pour se déplacer sans conducteur humain."
]

print("🔁 Résumés français (batch):\n")
for t in texts_fr:
    text = t.strip()
    if not text:
        print("⚠️ Input text is empty. Please provide valid content to summarize.")
    elif len(text.split()) < 5:
        # elif, so that an empty input triggers only the "empty" warning above
        print("⚠️ Input text is too short to summarize meaningfully.")
    else:
        s = summarizer_fr(t, max_length=60, min_length=20, do_sample=True)
        print(f"📌 Texte: {t}\n➡️ Résumé: {s[0]['summary_text']}\n")



Your max_length is set to 60, but your input_length is only 39. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=19)


🔁 Résumés français (batch):

⚠️ Input text is empty. Please provide valid content to summarize.


Your max_length is set to 60, but your input_length is only 24. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=12)


📌 Texte: Machine learning techniques now enable systems to analyze vast amounts of data and make decisions
    with minimal human input. These advances raise concerns over data privacy, algorithmic transparency,
    and job displacement.
➡️ Résumé: Les nouvelles techniques de machine-learning permettent aux systèmes d'analyser de vastes quantités de données. Les avancées entraînent des problèmes de protection de la vie privée, de transparence et de licenciement.



Your max_length is set to 60, but your input_length is only 32. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=16)


📌 Texte: Le réchauffement climatique provoque des événements météorologiques extrêmes dans le monde entier.
➡️ Résumé: Le réchauffement climatique provoque des événements météorologiques extrêmes dans le monde entier.



Your max_length is set to 60, but your input_length is only 33. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=16)


📌 Texte: La France accueille chaque année des millions de touristes attirés par sa culture et sa gastronomie.
➡️ Résumé: La France accueille chaque année des millions de touristes attirés par la culture et la gastronomie.

📌 Texte: Les véhicules autonomes utilisent des capteurs et de l'IA pour se déplacer sans conducteur humain.
➡️ Résumé: Les véhicules autonomes utilisent des capteurs et de l'IA pour se déplacer sans conducteur humain.
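
In the batch above, the single-sentence inputs come back unchanged or nearly unchanged. When observing model behavior, a small check can flag such cases automatically. This is a minimal sketch built on the standard library's `difflib`; the helper names and the 0.9 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(text: str, summary: str) -> float:
    """Character-level similarity between 0.0 (disjoint) and 1.0 (identical)."""
    return SequenceMatcher(None, text.strip(), summary.strip()).ratio()


def is_near_copy(text: str, summary: str, threshold: float = 0.9) -> bool:
    """Flag summaries that mostly repeat their input (threshold is an assumption)."""
    return similarity(text, summary) >= threshold


source = "Le réchauffement climatique provoque des événements météorologiques extrêmes dans le monde entier."
print(is_near_copy(source, source))                  # identical strings
print(is_near_copy(source, "Le climat se dérègle."))  # much shorter rewrite
```

A flagged summary suggests the input was already too short to condense, which is the same situation the `max_length` warnings point to.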



## 🔍 Auto-detect text language

In [20]:

texts = [
    "Bonjour, comment allez-vous ?",           # French
    "Hello, how are you doing?",               # English
    "Hola, ¿cómo estás?",                      # Spanish
    "Guten Tag, wie geht's Ihnen?",            # German
    "",                                        # Empty
    "こんにちは、お元気ですか？",                # Japanese
    "1234567890 $$$ ???",                      # Gibberish
]

for text in texts:
    try:
        lang = detect(text)
        print(f"📝 Text: {text}\n➡️ Language detected: {lang}\n")
    except Exception:  # detection fails on empty or non-linguistic input
        print(f"📝 Text: {text}\n❌ Could not detect language\n")

📝 Text: Bonjour, comment allez-vous ?
➡️ Language detected: fr

📝 Text: Hello, how are you doing?
➡️ Language detected: en

📝 Text: Hola, ¿cómo estás?
➡️ Language detected: es

📝 Text: Guten Tag, wie geht's Ihnen?
➡️ Language detected: de

📝 Text: 
❌ Could not detect language

📝 Text: こんにちは、お元気ですか？
➡️ Language detected: ja

📝 Text: 1234567890 $$$ ???
❌ Could not detect language



## Applications: Scripts

### Summarizer

In [None]:
from summarizer.utils import detect_language, read_file
from summarizer.summarize import generate_summary

import PyPDF2

In [None]:

texts = [
    "Bonjour, comment allez-vous ?",           # French
    "Hello, how are you doing?",               # English
    "Hola, ¿cómo estás?",                      # Spanish
    "Guten Tag, wie geht's Ihnen?",            # German
    "",                                        # Empty
    "こんにちは、お元気ですか？",                # Japanese
    "1234567890 $$$ ???",                      # Gibberish
]

for text in texts:
    try:
        lang = detect_language(text)
        print(f"📝 Text: {text}\n➡️ Language detected: {lang}\n")
    except Exception:  # kept as a safety net around the helper
        print(f"📝 Text: {text}\n❌ Could not detect language\n")

📝 Text: Bonjour, comment allez-vous ?
➡️ Language detected: fr

📝 Text: Hello, how are you doing?
➡️ Language detected: en

📝 Text: Hola, ¿cómo estás?
➡️ Language detected: es

📝 Text: Guten Tag, wie geht's Ihnen?
➡️ Language detected: de

📝 Text: 
➡️ Language detected: unknown

📝 Text: こんにちは、お元気ですか？
➡️ Language detected: ja

📝 Text: 1234567890 $$$ ???
➡️ Language detected: unknown
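
The detected code still has to be mapped to one of the two summarizers, and codes such as `es`, `de`, `ja`, or `unknown` have no matching model in this notebook. The sketch below shows one possible routing table. `SUMMARIZER_BY_LANG` and `pick_model` are hypothetical names, and the source does not show how `generate_summary` handles this internally.

```python
# Hypothetical routing table: language code -> model name used in this notebook.
SUMMARIZER_BY_LANG = {
    "en": "facebook/bart-large-cnn",
    "fr": "plguillou/t5-base-fr-sum-cnndm",
}


def pick_model(lang_code: str) -> str:
    """Return the model configured for a language code, or raise ValueError."""
    try:
        return SUMMARIZER_BY_LANG[lang_code]
    except KeyError:
        raise ValueError(
            f"No summarization model configured for language {lang_code!r}"
        ) from None


for code in ["fr", "en", "es", "unknown"]:
    try:
        print(code, "->", pick_model(code))
    except ValueError as err:
        print(code, "->", err)
```

Raising on unsupported codes, rather than silently falling back to one model, avoids the situation seen earlier where English text passed to the French model was translated instead of summarized in place.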



In [42]:
def read_txt_file(filepath: str) -> str:
    """Read content from a .txt file. """
    try:
        with open(filepath, "r", encoding="utf-8") as f:
            content = f.read()
        return content
    except Exception as e:
        print(f"❌ Error reading TXT file: {e}")
        return ""

text = read_txt_file("assets/sample_fr.txt")
print(f"Original French Text:\n {text}")


Original French Text:
 Le changement climatique est une menace majeure pour la planète.
Les experts estiment que si les émissions de gaz à effet de serre ne sont pas réduites de manière significative,
 les températures mondiales continueront d'augmenter, provoquant des phénomènes météorologiques extrêmes, des inondations,
  des sécheresses prolongées et la montée du niveau des mers.
   Pour y faire face, il est nécessaire de transformer nos modes de production et de consommation, de développer les énergies renouvelables,
    et de mettre en œuvre des politiques publiques ambitieuses.
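
The sample file carries ragged indentation and hard line breaks, and `textwrap.fill` on its own keeps leading whitespace at the start of a paragraph (hence the leading space in the wrapped outputs earlier in this notebook). A small normalization step before display can tidy this up. The helper below is a sketch; the name `tidy` is an assumption and it only affects how the text is shown.

```python
import textwrap


def tidy(text: str, width: int = 100) -> str:
    """Collapse every run of whitespace to one space, then wrap to `width` columns."""
    collapsed = " ".join(text.split())
    return textwrap.fill(collapsed, width=width)


raw = """
 Le changement climatique est une menace majeure.
   Pour y faire face, il est nécessaire
    de développer les énergies renouvelables.
"""
print(tidy(raw, width=60))
```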



In [41]:

summary = generate_summary(text=text)
print(f"Résumé FR:\n {textwrap.fill(summary, width=100)}")

Résumé FR:
 Les experts estiment que si les émissions de gaz à effet de serre ne sont pas réduites de façon
significative, les températures mondiales continueront d'augmenter. Il est nécessaire de transformer
nos modes de production et de consommation, de développer les énergies renouvelables, d’adopter des
politiques publiques ambitieuses.


In [45]:
def read_pdf_file(filepath: str) -> str:
    """Extract text from a PDF file using PyPDF2."""
    try:
        with open(filepath, "rb") as file:
            pdf_reader = PyPDF2.PdfReader(file)
            # Concatenate the text extracted from each page
            text = "".join(page.extract_text() or "" for page in pdf_reader.pages)
        return text.strip()
    except Exception as e:
        print(f"❌ Error reading PDF file: {e}")
        return ""
    

text = read_pdf_file("assets/sample_en.pdf")
print(f"Original EN Text:\n {text}")

Original EN Text:
 Climate change is undeniably one of the most significant global challenges of our time. Its effects
are being felt across the globe, from rising sea levels to more frequent and intense natural disasters.
Governments, businesses, and individuals must take immediate and sustained action to reduce
greenhouse gas emissions. Investing in renewable energy, promoting sustainable transportation,
and encouraging conservation are essential steps. The science is clear: if we do not act now, the
consequences will be irreversible and catastrophic.


In [44]:

summary = generate_summary(text=text)
print(f"Résumé EN:\n {textwrap.fill(summary, width=100)}")

Résumé EN:
 Governments, businesses, and individuals must take immediate and sustained action to reduce
greenhouse gas emissions. Investing in renewable energy, promoting sustainable transportation, and
encouraging conservation are essential steps.
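
The sample PDF is short, but real documents can exceed a model's input window (`facebook/bart-large-cnn` accepts at most 1,024 input tokens), so longer texts have to be truncated or split before summarization. The following is a rough sketch of splitting into overlapping windows. It counts whitespace-separated words as a stand-in for tokens, which is only an approximation, and `chunk_words` with its default sizes is a hypothetical helper.

```python
def chunk_words(text: str, max_words: int = 400, overlap: int = 40) -> list[str]:
    """Split text into word windows of at most `max_words`, sharing `overlap` words."""
    if max_words <= overlap:
        raise ValueError("max_words must be larger than overlap")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window already reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


long_text = " ".join(f"w{i}" for i in range(1000))
parts = chunk_words(long_text)
print(len(parts), [len(p.split()) for p in parts])
```

Each chunk could then be summarized separately and the partial summaries joined; whether that preserves enough context depends on the document.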


In [48]:
import os


def read_txt_file(filepath: str) -> str:
    """Read content from a .txt file."""
    try:
        with open(filepath, "r", encoding="utf-8") as f:
            return f.read()
    except Exception as e:
        print(f"❌ Error reading TXT file: {e}")
        return ""


def read_pdf_file(filepath: str) -> str:
    """Extract text from a PDF file using PyPDF2."""
    try:
        with open(filepath, "rb") as file:
            pdf_reader = PyPDF2.PdfReader(file)
            # Concatenate the text extracted from each page
            text = "".join(page.extract_text() or "" for page in pdf_reader.pages)
        return text.strip()
    except Exception as e:
        print(f"❌ Error reading PDF file: {e}")
        return ""


def read_file(filepath: str) -> str:
    """Read a file (txt or pdf) and return its content as text."""
    # Compute the extension once and dispatch on it
    extension = os.path.splitext(filepath)[1].lower()
    if extension == ".txt":
        return read_txt_file(filepath)
    if extension == ".pdf":
        return read_pdf_file(filepath)
    print(f"❌ Unsupported file type: {filepath}")
    return ""



content = read_file(filepath="assets/sample_fr.txt")
print(f"File content:\n{content}\n")

File content:
Le changement climatique est une menace majeure pour la planète.
Les experts estiment que si les émissions de gaz à effet de serre ne sont pas réduites de manière significative,
 les températures mondiales continueront d'augmenter, provoquant des phénomènes météorologiques extrêmes, des inondations,
  des sécheresses prolongées et la montée du niveau des mers.
   Pour y faire face, il est nécessaire de transformer nos modes de production et de consommation, de développer les énergies renouvelables,
    et de mettre en œuvre des politiques publiques ambitieuses.




In [27]:
from summarizer.summarize import generate_summary

# Short test in French
text_fr = """La pollution de l'air est un enjeu majeur. Il faut agir rapidement pour limiter les émissions. Le changement climatique est une menace majeure pour la planète.
Les experts estiment que si les émissions de gaz à effet de serre ne sont pas réduites de manière significative,
 les températures mondiales continueront d'augmenter, provoquant des phénomènes météorologiques extrêmes, des inondations,
  des sécheresses prolongées et la montée du niveau des mers."""

summary = generate_summary(text=text_fr)
print("Résumé FR:", summary)



Résumé FR: La pollution de l'air est un enjeu majeur. Il faut agir rapidement pour limiter les émissions. Le changement climatique est une menace majeure pour la planète.


In [28]:
summary = generate_summary(text="La pollution de l'air est un enjeu majeur.")
print("Résumé FR:", summary)

Your max_length is set to 100, but your input_length is only 18. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=9)


Résumé FR: L'air pollué est un enjeu majeur. La pollution de l'air est un facteur contributif majeur.


### Web interface

In [49]:
import gradio as gr
from summarizer.summarize import generate_summary



In [51]:
help(gr)

Help on package gradio:

NAME
    gradio

PACKAGE CONTENTS
    _simple_templates (package)
    analytics
    blocks
    blocks_events
    chat_interface
    cli (package)
    component_meta
    components (package)
    context
    data_classes
    events
    exceptions
    external
    external_utils
    flagging
    helpers
    http_server
    image_utils
    interface
    ipython_ext
    layouts (package)
    mcp
    monitoring_dashboard
    networking
    node_server
    oauth
    pipelines
    pipelines_utils
    processing_utils
    queueing
    ranged_response
    renderable
    route_utils
    routes
    server_messages
    sketch (package)
    state_holder
    templates
    test_data (package)
    themes (package)
    tunneling
    utils
    wasm_utils

CLASSES
    abc.ABC(builtins.object)
        gradio.flagging.FlaggingCallback
            gradio.flagging.CSVLogger
            gradio.flagging.SimpleCSVLogger
    builtins.dict(builtins.object)
        gradio.components.chatbot

In [52]:
help(gr.Interface)

Help on class Interface in module gradio.interface:

class Interface(gradio.blocks.Blocks)
 |  Interface(fn: 'Callable', inputs: 'str | Component | Sequence[str | Component] | None', outputs: 'str | Component | Sequence[str | Component] | None', examples: 'list[Any] | list[list[Any]] | str | None' = None, *, cache_examples: 'bool | None' = None, cache_mode: "Literal['eager', 'lazy'] | None" = None, examples_per_page: 'int' = 10, example_labels: 'list[str] | None' = None, live: 'bool' = False, title: 'str | None' = None, description: 'str | None' = None, article: 'str | None' = None, theme: 'Theme | str | None' = None, flagging_mode: "Literal['never'] | Literal['auto'] | Literal['manual'] | None" = None, flagging_options: 'list[str] | list[tuple[str, str]] | None' = None, flagging_dir: 'str' = '.gradio/flagged', flagging_callback: 'FlaggingCallback | None' = None, analytics_enabled: 'bool | None' = None, batch: 'bool' = False, max_batch_size: 'int' = 4, api_name: 'str | Literal[False] |

In [None]:
iface = gr.Interface(
    fn=generate_summary,
    inputs=[
        gr.Textbox(label="Enter text manually", lines=8, placeholder="Write or paste text here..."),
        gr.File(label="Or upload a .txt or .pdf file", file_types=[".txt", ".pdf"]),
        gr.Slider(10, 200, value=30, step=10, label="Min Summary Length"),
        gr.Slider(30, 300, value=100, step=10, label="Max Summary Length"),
        gr.Checkbox(label="Use sampling (do_sample)", value=False),
    ],
    outputs=gr.Textbox(label="Generated Summary"),
    title="📝 Multilingual Text Summarizer with LLMs",
    description="Summarize English or French text using transformers. Supports text, PDF and TXT.",
    examples=[
        ["Bonjour, ceci est un exemple de mail professionnel à résumer pour un usage interne."],
        ["This is a long English article that explains how machine learning models are trained using large datasets."]
    ]
)
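
The interface above passes five values to its callback: the text, the uploaded file, both length sliders, and the sampling flag. Because the two slider ranges overlap, a user can also pick a minimum above the maximum. The sketch below shows one way an adapter could choose the input source and validate the bounds before delegating. It is a hypothetical wrapper: the real signature of `generate_summary` is not shown in this notebook, the keyword names passed to `summarize` are assumptions, and it assumes the file component hands over a file path.

```python
from typing import Callable, Optional


def summarize_request(
    text: str,
    file_path: Optional[str],
    min_length: int,
    max_length: int,
    do_sample: bool,
    read_file: Callable[[str], str],
    summarize: Callable[..., str],
) -> str:
    """Pick the input source, validate the length bounds, then delegate."""
    # An uploaded file takes precedence over the text box (a design assumption).
    content = read_file(file_path) if file_path else (text or "")
    content = content.strip()
    if not content:
        return "Please enter some text or upload a .txt/.pdf file."
    if min_length >= max_length:
        return "Min Summary Length must be smaller than Max Summary Length."
    return summarize(content, min_length=int(min_length),
                     max_length=int(max_length), do_sample=do_sample)


# Stand-ins so the sketch runs without loading any model.
def fake_read(path: str) -> str:
    return "text from " + path


def fake_summarize(text: str, min_length: int, max_length: int, do_sample: bool) -> str:
    return f"summary of {text!r} ({min_length}-{max_length}, sample={do_sample})"


print(summarize_request("Hello world", None, 30, 100, False, fake_read, fake_summarize))
print(summarize_request("", None, 30, 100, False, fake_read, fake_summarize))
print(summarize_request("Hello world", None, 120, 100, False, fake_read, fake_summarize))
```

In a Gradio app, the two callables could be bound with `functools.partial` so that the resulting function takes exactly the five UI inputs.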




In [62]:
# iface.launch() 

In [74]:
iface.close()

Closing server running on port: 7864


In [75]:
!pip show gradio

Name: gradio
Version: 5.29.0
Summary: Python library for easily interacting with trained machine learning models
Home-page: https://github.com/gradio-app/gradio
Author: 
Author-email: Abubakar Abid <gradio-team@huggingface.co>, Ali Abid <gradio-team@huggingface.co>, Ali Abdalla <gradio-team@huggingface.co>, Dawood Khan <gradio-team@huggingface.co>, Ahsen Khaliq <gradio-team@huggingface.co>, Pete Allen <gradio-team@huggingface.co>, Ömer Faruk Özdemir <gradio-team@huggingface.co>, Freddy A Boulton <gradio-team@huggingface.co>, Hannah Blair <gradio-team@huggingface.co>
License: 
Location: C:\Projets\anaconda3\Lib\site-packages
Requires: aiofiles, anyio, fastapi, ffmpy, gradio-client, groovy, httpx, huggingface-hub, jinja2, markupsafe, numpy, orjson, packaging, pandas, pillow, pydantic, pydub, python-multipart, pyyaml, ruff, safehttpx, semantic-version, starlette, tomlkit, typer, typing-extensions, uvicorn
Required-by: 


In [78]:
help(gr.themes)

Help on package gradio.themes in gradio:

NAME
    gradio.themes

PACKAGE CONTENTS
    app
    base
    builder_app
    citrus
    default
    glass
    monochrome
    ocean
    origin
    soft
    upload_theme
    utils (package)

SUBMODULES
    colors
    sizes

CLASSES
    builtins.object
        gradio.themes.base.ThemeClass
            gradio.themes.base.Base
                gradio.themes.citrus.Citrus
                gradio.themes.default.Default
                gradio.themes.glass.Glass
                gradio.themes.monochrome.Monochrome
                gradio.themes.ocean.Ocean
                gradio.themes.origin.Origin
                gradio.themes.soft.Soft
        gradio.themes.utils.colors.Color
        gradio.themes.utils.fonts.Font
            gradio.themes.utils.fonts.GoogleFont
        gradio.themes.utils.sizes.Size

    class Base(ThemeClass)
     |  Base(*, primary_hue: 'colors.Color | str' = <gradio.themes.utils.colors.Color object at 0x000001AC81D71310>, secondary_

In [None]:
# Simplified code to test themes
def dummy_function(text):
    return text

demo = gr.Interface(
    fn=dummy_function,
    inputs=[gr.Textbox(label="Enter text manually")],
    outputs=gr.Textbox(label="Generated Summary"),
    theme=gr.themes.Soft(),
)

demo.launch()

In [80]:
demo.close()

Closing server running on port: 7865


In [None]:
iface = gr.Interface(
    fn=generate_summary,
    inputs=[
        gr.Textbox(label="Enter text manually", lines=8, placeholder="Write or paste text here...", elem_id="text-input"),
        gr.File(label="Or upload a .txt or .pdf file", file_types=[".txt", ".pdf"], elem_id="file-input"),
        gr.Slider(10, 200, value=30, step=10, label="Min Summary Length", elem_id="min-length-slider"),
        gr.Slider(30, 300, value=100, step=10, label="Max Summary Length", elem_id="max-length-slider"),
        gr.Checkbox(label="Use sampling (do_sample)", value=False),
    ],
    outputs=gr.Textbox(label="Generated Summary", elem_id="output-summary"),
    title="📝 Multilingual Text Summarizer with LLMs",
    description="Summarize English or French text using transformers. Supports text, PDF and TXT.",
    theme=gr.themes.Monochrome(),  # "dark" is not among the themes listed by help(gr.themes): citrus, default, glass, monochrome, ocean, origin, soft
    
    live=True,  # Allow real-time interaction with the summarizer
    examples=[
        [
            """Bonjour, ceci est un exemple d'email professionnel très long. Nous avons plusieurs documents importants à examiner. Le premier document concerne la gestion des ressources humaines, et le second porte sur l'optimisation des processus logistiques pour améliorer l'efficacité des opérations de transport. Les deux documents contiennent des informations clés sur les changements organisationnels que nous avons mis en place récemment. La réunion d'aujourd'hui permettra de discuter de ces points et de prendre des décisions éclairées sur la direction future de notre entreprise. Nous avons besoin d'un résumé clair et précis des principaux changements et recommandations. Merci de prêter attention aux détails les plus importants.""",
        ],
        [
            """This is a long English article that explains how machine learning models are trained using large datasets. Machine learning involves the development of algorithms that can process and analyze data to make predictions or decisions without being explicitly programmed. In this article, we will explore the different stages involved in training a machine learning model, starting with data collection, followed by data preprocessing, feature engineering, model selection, and finally, model training and evaluation. Each of these steps is crucial for building a high-performance machine learning model. We will also discuss some of the challenges faced during the training process, such as overfitting and underfitting, and how to mitigate them using techniques like cross-validation, hyperparameter tuning, and regularization. The goal is to help readers understand the entire process of model training and to provide insights into the best practices used in the industry.""",
        ]
    ],
    css="""
        #text-input, #file-input {
            font-size: 16px;
            border-radius: 8px;
        }
        #min-length-slider, #max-length-slider {
            background-color: #e0e0e0;
        }
        #output-summary {
            font-size: 14px;
            font-family: Arial, sans-serif;
            color: #333;
            border: 1px solid #ddd;
            border-radius: 8px;
            padding: 12px;
        }
        .footer { 
            display: none;  /* Remove footer */
        }
    """
)

iface.launch() 
