---

title: Controlled Text Summarization
emoji: 📈
colorFrom: yellow
colorTo: purple
sdk: gradio # VERY IMPORTANT: Specifies the framework (gradio, streamlit, docker, static)
sdk_version: 5.20.1 # Optional but recommended: Specify the SDK version
app_file: main.py # VERY IMPORTANT: The main Python file to run
pinned: false # Optional: Whether to pin the Space in your profile
license: mit # Optional: The license of your project (e.g., mit, apache-2.0, agpl-3.0)
---


# Creative Text Summarization with Style Control

A machine learning system that summarizes text in different stylistic variations (formal, informal, humorous, poetic) while preserving the content.

## Overview

This project builds a text summarization system that not only condenses text but also adapts its output to a requested style. It fine-tunes transformer models on style-specific summaries so that generated summaries match the requested style while staying faithful to the source text.

## Features

- **Multiple Summary Styles**: Generate summaries in formal, informal, humorous, or poetic styles
- **Pre-trained Models**: Based on BART and other transformer architectures
- **User-friendly Interface**: Simple Gradio UI for interactive summary generation
- **Evaluation Metrics**: ROUGE and BLEU scores to evaluate summary quality

## Installation

1. Clone this repository:

```bash
git clone https://github.com/AriachAmine/controlled-text-summarization.git
cd controlled-text-summarization
```

2. Install dependencies:

```bash
pip install -r requirements.txt
```

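3. Set up the Gemini API key (used to generate style-specific training data):

```bash
# Linux/MacOS
export GEMINI_API_KEY="your-api-key-here"

# Windows (Command Prompt): omit the quotes, otherwise they become part of the value
set GEMINI_API_KEY=your-api-key-here
```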
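As a rough illustration of how the key might be consumed on the Python side, the sketch below reads `GEMINI_API_KEY` from the environment and fails early with a readable message if it is missing. The helper name `get_gemini_api_key` is hypothetical and not taken from the project's code.

```python
import os


def get_gemini_api_key() -> str:
    """Return the Gemini API key from the environment, or raise a clear error.

    Hypothetical helper: the project may read the key differently.
    """
    key = os.environ.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; export it before running main.py."
        )
    return key
```

Checking the variable at startup surfaces a missing key immediately, before any dataset preparation begins.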

## Usage

### Running the Application

```bash
python main.py
```

This will:

1. Load the base model or a fine-tuned model (if available)
2. Prepare a dataset with stylized summaries (if needed)
3. Fine-tune the model on the prepared dataset (if no fine-tuned model exists)
4. Launch the Gradio interface for interactive summarization
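
The startup steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual `main.py`: the checkpoint name `facebook/bart-large-cnn` and the helper `resolve_model_source` are assumptions, while the `summarization_model/` directory comes from the project structure.

```python
from pathlib import Path

# Assumption: the README only says "BART"; this checkpoint name is illustrative.
BASE_MODEL = "facebook/bart-large-cnn"
# Directory for fine-tuned models, as listed under Project Structure.
FINE_TUNED_DIR = Path("summarization_model")


def resolve_model_source(fine_tuned_dir: Path = FINE_TUNED_DIR) -> tuple[str, bool]:
    """Return (model_source, needs_fine_tuning).

    If a non-empty fine-tuned model directory exists, load from it and skip
    training; otherwise fall back to the base model and request fine-tuning.
    """
    if fine_tuned_dir.is_dir() and any(fine_tuned_dir.iterdir()):
        return str(fine_tuned_dir), False
    return BASE_MODEL, True
```

With a decision like this in one place, dataset preparation and fine-tuning only need to run when the second return value is `True`.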

### Using the Gradio Interface

1. Enter the text you want to summarize in the text box
2. Select your desired summary style from the dropdown (formal, informal, humorous, poetic)
3. Click "Submit" to generate the stylized summary
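
For orientation, an interface with these three elements could be wired up in Gradio along the following lines. This is a sketch under assumptions: `summarize_with_style` is a stand-in that does not call a model, and `build_interface` is a hypothetical name; the project's real `ui.py` may be organized differently.

```python
STYLES = ["formal", "informal", "humorous", "poetic"]


def summarize_with_style(text: str, style: str) -> str:
    """Placeholder for the project's real summarization function."""
    if style not in STYLES:
        raise ValueError(f"Unknown style: {style!r}")
    # Stand-in logic so the sketch runs without a model: echo the first sentence.
    first_sentence = text.strip().split(". ")[0].rstrip(".")
    return f"[{style}] {first_sentence}."


def build_interface():
    """Build a Gradio interface with a text box and a style dropdown."""
    import gradio as gr  # imported lazily so the function above works without Gradio

    return gr.Interface(
        fn=summarize_with_style,
        inputs=[
            gr.Textbox(lines=10, label="Text to summarize"),
            gr.Dropdown(choices=STYLES, value="formal", label="Summary style"),
        ],
        outputs=gr.Textbox(label="Summary"),
        title="Controlled Text Summarization",
    )
```

Calling `build_interface().launch()` would start the local web UI; `gr.Interface` supplies the "Submit" button automatically.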

## How It Works

1. **Base Model**: Starts with a pre-trained text summarization model (BART)
2. **Style Training**: Fine-tunes the model on summaries with specific styles
3. **Style Control**: Uses style tokens to control output style during generation
4. **Evaluation**: Measures quality using ROUGE and BLEU metrics
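
To make the style-control step concrete, one common way to use style tokens is to prepend a marker to the input text before tokenization. The token strings (`<formal>` and so on) and the helper `add_style_token` below are assumptions for illustration; the README does not specify the exact format the project uses.

```python
# Assumption: one marker token per style; the project's real tokens may differ.
STYLE_TOKENS = {
    "formal": "<formal>",
    "informal": "<informal>",
    "humorous": "<humorous>",
    "poetic": "<poetic>",
}


def add_style_token(text: str, style: str) -> str:
    """Prefix the input text with the token for the requested style."""
    try:
        token = STYLE_TOKENS[style]
    except KeyError:
        raise ValueError(f"Unknown style: {style!r}") from None
    return f"{token} {text.strip()}"
```

With Hugging Face `transformers`, such tokens would typically be registered through `tokenizer.add_special_tokens` and followed by `model.resize_token_embeddings(len(tokenizer))`, so that each marker maps to a single trainable embedding rather than being split into subwords.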

## Project Structure

```
controlled-text-summarization/
├── main.py               # Main script to run the application
├── model.py              # Model loading and summarization functions
├── data.py               # Data preparation and processing
├── evaluation.py         # Metrics for evaluating summaries
├── ui.py                 # Gradio interface
├── requirements.txt      # Project dependencies
└── summarization_model/  # Directory for fine-tuned models (created after training)
```

## Dependencies

- torch, transformers: For model loading and fine-tuning
- gradio: For the user interface
- google-generativeai: For generating style-specific training data
- datasets, rouge_score, nltk: For data handling and evaluation
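
As a dependency-free illustration of what the ROUGE evaluation measures, the sketch below computes a simplified ROUGE-1 F1 score (unigram overlap) with whitespace tokenization. It is not the project's `evaluation.py`, and it omits the stemming and tokenization rules that the `rouge_score` package applies, so its numbers may differ from that library's.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between two texts."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count per token (clipped matches).
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

Scores range from 0.0 (no shared words) to 1.0 (identical bags of words). Stylized summaries will often score lower than plain ones against the same reference, because a style change also changes word choice.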



## Example



Input:



```
Scientists have discovered a new species of deep-sea fish that can withstand extreme pressure. The fish, found at depths of over 8,000 meters, has unique adaptations including specialized cell membranes and pressure-resistant proteins. This discovery may lead to new applications in biotechnology and materials science.
```



Output (Formal Style):



```
Researchers have identified a novel deep-sea fish species capable of surviving extreme pressures at depths exceeding 8,000 meters. The species exhibits specialized adaptations in cell membrane structure and pressure-resistant proteins, potentially offering valuable insights for biotechnology and materials science applications.
```



Output (Humorous Style):



```
Talk about a fish out of water... or rather, a fish VERY deep IN water! Scientists just found a super fish that laughs in the face of crushing ocean pressure. This deep-sea champion, chilling at 8,000 meters down, has fancy cell membranes and proteins that basically say "pressure, what pressure?" Scientists are already dreaming up ways to copy these deep-sea survival tricks for cool new tech!
```



## Future Improvements



- Add more styles (technical, narrative, etc.)
- Implement user feedback collection to improve models
- Add style strength control (slightly humorous vs. very humorous)
- Create a web API for integration with other applications



## License



[MIT License](LICENSE)