Omartificial-Intelligence-Space committed ce835a1 (verified · parent: df60e39)

Update README.md

Changed file: README.md (+220 -3)

---
license: apache-2.0
language:
- ar
base_model:
- UBC-NLP/AraT5v2-base-1024
library_name: transformers
tags:
- TST
- Arabic
- Author_Style
- AraGenEval
---

# Arabic Author Text Transfer

🏆 **First Place Winner at AraGenEval 2025 Competition**

An Arabic text style transfer model that rewrites input text in the writing style of any of 21 authors, using descriptive author tokens and prompt engineering. Most of the authors are Arabic writers; a few (such as William Shakespeare and Gustave Le Bon) are foreign writers represented through Arabic translations of their work.

## 🎯 Model Performance

- **BLEU Score:** 24.58
- **chrF Score:** 59.01
- **Competition:** First place in AraGenEval 2025
- **Supported Authors:** 21 (listed below)

## 🚀 Quick Start

### Installation

```bash
pip install -r requirements.txt
```

### Basic Usage

```python
from inference_arabic_author_transfer import ArabicAuthorTextTransfer

# Initialize the model
model = ArabicAuthorTextTransfer()

# Rewrite a sentence in the target author's style
text = "التعليم مهم جداً في حياتنا اليومية"  # "Education is very important in our daily lives"
target_author = "يوسف إدريس"  # Yusuf Idris

result = model.transfer_style(text, target_author)
print(f"Original: {text}")
print(f"Transferred: {result}")
```

## 📚 Supported Authors

1. يوسف إدريس (Yusuf Idris)
2. نجيب محفوظ (Naguib Mahfouz)
3. طه حسين (Taha Hussein)
4. حسن حنفي (Hassan Hanafi)
5. عبد الغفار مكاوي (Abdel Ghaffar Makkawi)
6. سلامة موسى (Salama Moussa)
7. أحمد شوقي (Ahmed Shawqi)
8. أحمد تيمور باشا (Ahmed Taymour Pasha)
9. ثروت أباظة (Tharwat Abaza)
10. جبران خليل جبران (Gibran Khalil Gibran)
11. روبرت بار (Robert Barr)
12. ويليام شيكسبير (William Shakespeare)
13. أمين الريحاني (Ameen Rihani)
14. غوستاف لوبون (Gustave Le Bon)
15. أحمد أمين (Ahmed Amin)
16. محمد حسين هيكل (Mohammed Hussein Heikal)
17. جُرجي زيدان (Jurji Zaydan)
18. عباس محمود العقاد (Abbas Mahmoud al-Aqqad)
19. فؤاد زكريا (Fouad Zakariyya)
20. كامل كيلاني (Kamel Kilani)
21. نوال السعداوي (Nawal El Saadawi)

## 🔧 Usage Examples

### 1. Command Line Interface

#### Interactive Mode
```bash
python inference_arabic_author_transfer.py --interactive
```

#### Single Text Transfer
```bash
python inference_arabic_author_transfer.py \
  --text "التعليم مهم جداً في حياتنا اليومية" \
  --author "يوسف إدريس"
```

#### Batch Processing from File
```bash
python inference_arabic_author_transfer.py \
  --input_file input.csv \
  --output_file results.csv
```

### 2. Python API

#### Single Text Transfer
```python
from inference_arabic_author_transfer import ArabicAuthorTextTransfer

model = ArabicAuthorTextTransfer()

# Transfer to a single author
result = model.transfer_style(
    text="العلم نور والجهل ظلام",  # "Knowledge is light and ignorance is darkness"
    target_author="نجيب محفوظ",  # Naguib Mahfouz
)
```

#### Batch Processing
```python
# Reuses `model` from the previous example
texts = [
    "الحياة جميلة",  # "Life is beautiful"
    "العلم أساس التقدم",  # "Knowledge is the foundation of progress"
    "الأدب غذاء الروح",  # "Literature is food for the soul"
]

results = model.batch_transfer_style(
    texts=texts,
    target_author="أحمد شوقي",  # Ahmed Shawqi
    batch_size=4,
)
```
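
The `batch_size` argument sets how many texts are processed together. As a rough illustration only, assuming `batch_transfer_style` walks the list in consecutive chunks of `batch_size` (the helper `chunk_texts` below is hypothetical and not part of this repository), the three example texts fit into a single batch when `batch_size=4`:

```python
from typing import List


def chunk_texts(texts: List[str], batch_size: int) -> List[List[str]]:
    """Split texts into consecutive batches of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]


texts = ["الحياة جميلة", "العلم أساس التقدم", "الأدب غذاء الروح"]
print(len(chunk_texts(texts, batch_size=4)))  # one batch holding all three texts
print(len(chunk_texts(texts, batch_size=2)))  # two batches: two texts, then one
```

Larger batches generally use more GPU memory per step, so `batch_size` is usually tuned to the available hardware.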

#### Get Supported Authors
```python
# Reuses `model` from the previous examples
authors = model.get_supported_authors()
print(f"Supported authors: {authors}")
```

### 3. Advanced Parameters

```python
result = model.transfer_style(
    text="النص الأصلي",  # "the original text"
    target_author="يوسف إدريس",
    max_length=512,   # Maximum generation length
    num_beams=5,      # Number of beams for beam search
    temperature=1.0,  # Sampling temperature; only used when do_sample=True
    do_sample=False,  # Deterministic generation (no sampling)
)
```
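
In Hugging Face `generate()`, `temperature` only takes effect when `do_sample=True`: it rescales the next-token scores before sampling. The sketch below shows the standard temperature-scaled softmax in plain Python to illustrate that effect; it is not code from this repository, and the logit values are made up.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert raw scores to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the highest-scoring token, while higher temperatures flatten the distribution and make sampled output more varied.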

## Input File Format

For batch processing, the input file should have two columns, `text` (the text to rewrite) and `author` (the target author's name), with one row per text:

### CSV Format
```csv
text,author
"التعليم مهم جداً في حياتنا اليومية","يوسف إدريس"
"العلم نور والجهل ظلام","نجيب محفوظ"
"الحياة جميلة","طه حسين"
```

### Excel Format
An Excel file with the same two columns (`text`, `author`) is also accepted.
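
A minimal sketch of reading a CSV in this format and grouping its rows by target author, for example before calling `batch_transfer_style` once per author. It uses only Python's `csv` module; the helper name `load_requests` and the grouping strategy are assumptions for illustration, and the repository's script may handle rows differently.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def load_requests(f: TextIO) -> Dict[str, List[str]]:
    """Read a CSV with `text` and `author` columns and group texts by author."""
    reader = csv.DictReader(f)
    missing = {"text", "author"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in reader:
        groups[row["author"].strip()].append(row["text"].strip())
    return dict(groups)


sample = io.StringIO(
    "text,author\n"
    '"التعليم مهم جداً في حياتنا اليومية","يوسف إدريس"\n'
    '"العلم نور والجهل ظلام","نجيب محفوظ"\n'
    '"الحياة جميلة","طه حسين"\n'
)
requests = load_requests(sample)
for author, author_texts in requests.items():
    # Each group could then be passed to, e.g.,
    # model.batch_transfer_style(texts=author_texts, target_author=author)
    print(author, len(author_texts))
```

To read a real file, open it as `open("input.csv", newline="", encoding="utf-8-sig")`: `newline=""` is what the `csv` module expects, and `utf-8-sig` also strips the byte-order mark that spreadsheet tools often add.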

## 🏗️ Model Architecture

- **Base Model:** UBC-NLP/AraT5v2-base-1024
- **Approach:** Descriptive Author Tokens + Prompt Engineering
- **Input Format:** `"اكتب النص التالي بأسلوب <author:name>: [text]"` ("Write the following text in the style of <author:name>: [text]")
- **Training:** Fine-tuned with author-specific tokens

## 🔬 Technical Details

### Stylometric Analysis
The model incorporates stylometric analysis of the following features:
- **Lexical Features:** Sentence length, word length, vocabulary richness
- **Syntactic Patterns:** Use of definite articles, conjunctions, and prepositions
- **Author-Specific Vocabulary:** Characteristic words selected by TF-IDF
- **Style Classification:** Formality, complexity, emotional intensity
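
To make the lexical, syntactic, and TF-IDF features above concrete, here is a minimal sketch of how such statistics could be computed. Everything in it is a simplifying assumption rather than the feature extraction used for this model: tokens are runs of non-space, non-punctuation characters; a word counts as definite if it starts with `ال`; and conjunctions and prepositions come from small illustrative lists that only match standalone words (so the `و` attached in `والجهل` is not counted).

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Illustrative word lists only (assumptions, far from exhaustive)
CONJUNCTIONS = {"و", "ف", "ثم", "أو", "لكن", "بل"}
PREPOSITIONS = {"في", "من", "إلى", "على", "عن", "مع"}

# A token is a run of characters that are neither whitespace nor punctuation,
# so Arabic diacritics stay attached to their words
TOKEN_RE = re.compile(r"[^\s.,!?؟،؛:;()«»]+")
SENTENCE_END_RE = re.compile(r"[.!?؟]+")


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text)


def stylometric_features(text: str) -> Dict[str, float]:
    """Compute simple lexical and syntactic statistics for one passage."""
    sentences = [s for s in SENTENCE_END_RE.split(text) if s.strip()]
    words = tokenize(text)
    n = len(words) or 1  # avoid division by zero on empty input
    return {
        "avg_sentence_len": len(words) / (len(sentences) or 1),  # words per sentence
        "avg_word_len": sum(len(w) for w in words) / n,  # characters per word
        "type_token_ratio": len(set(words)) / n,  # vocabulary richness
        "definite_ratio": sum(w.startswith("ال") for w in words) / n,
        "conjunction_ratio": sum(w in CONJUNCTIONS for w in words) / n,
        "preposition_ratio": sum(w in PREPOSITIONS for w in words) / n,
    }


def characteristic_words(corpus: Dict[str, str], top_k: int = 3) -> Dict[str, List[str]]:
    """Rank each author's words by TF-IDF, treating all of an author's text as one document."""
    counts = {author: Counter(tokenize(text)) for author, text in corpus.items()}
    n_docs = len(counts)
    # Document frequency: number of authors whose text contains the word
    df = Counter(word for c in counts.values() for word in c)
    result = {}
    for author, c in counts.items():
        total = sum(c.values()) or 1
        scores = {w: (k / total) * math.log(n_docs / df[w]) for w, k in c.items()}
        # Keep only words with a positive score, highest first (ties broken alphabetically)
        ranked = sorted((w for w in scores if scores[w] > 0), key=lambda w: (-scores[w], w))
        result[author] = ranked[:top_k]
    return result


print(stylometric_features("العلم نور والجهل ظلام. الحياة جميلة!"))
print(characteristic_words({"author_a": "الحياة جميلة الحياة", "author_b": "الحياة صعبة"}))
```

Words shared by every author get an IDF of zero and drop out, which is why TF-IDF tends to surface vocabulary that distinguishes one author from the rest.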

### Prompt Engineering
- **Format:** `"اكتب النص التالي بأسلوب <author:يوسف_إدريس>: [original_text]"`
- **Author Tokens:** Descriptive tokens such as `<author:يوسف_إدريس>`, with the spaces in the author's name replaced by underscores
- **Target:** The input text rewritten in the target author's style
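
A minimal sketch of assembling a model input in this format. The helper names `author_token` and `build_prompt` are hypothetical, `[original_text]` is treated as a placeholder for the input text, and the space-to-underscore rule is inferred from the `<author:يوسف_إدريس>` example rather than taken from the repository's code. The fixed prefix `اكتب النص التالي بأسلوب` means "Write the following text in the style of".

```python
PROMPT_PREFIX = "اكتب النص التالي بأسلوب"  # "Write the following text in the style of"


def author_token(author: str) -> str:
    """Turn an author name into a descriptive token, joining its words with underscores."""
    return "<author:" + "_".join(author.split()) + ">"


def build_prompt(text: str, author: str) -> str:
    """Assemble '<prefix> <author:Name>: <text>' following the documented format."""
    return f"{PROMPT_PREFIX} {author_token(author)}: {text.strip()}"


print(build_prompt("العلم نور والجهل ظلام", "نجيب محفوظ"))
```

For such tokens to be encoded as single units rather than split into subwords, they would normally be registered with the tokenizer (in `transformers`, `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`); this card does not state exactly how the repository handles them.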

## 📊 Performance Metrics

| Metric | Score |
|--------|-------|
| BLEU   | 24.58 |
| chrF   | 59.01 |
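
chrF is a character n-gram F-score, which suits morphologically rich languages such as Arabic. The sketch below is a simplified, sentence-level version meant only to show the idea: whitespace is removed, character n-grams of order 1 to 6 are matched, precision and recall are averaged over the orders, and the two are combined with β = 2 so recall counts more. It is not guaranteed to reproduce the corpus-level score in the table; reported numbers should come from a standard implementation such as sacreBLEU.

```python
from __future__ import annotations

from collections import Counter


def char_ngrams(text: str, n: int) -> Counter[str]:
    """Count character n-grams after removing whitespace."""
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypothesis: str, reference: str, max_order: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale."""
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # order longer than one of the strings: skip it
        matches = sum((hyp & ref).values())  # clipped matches (minimum of the counts)
        precisions.append(matches / sum(hyp.values()))
        recalls.append(matches / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2  # beta = 2 weights recall more heavily than precision
    return 100 * (1 + b2) * p * r / (b2 * p + r)


print(chrf("الحياة جميلة", "الحياة جميلة"))  # 100.0 for an exact match
print(chrf("الحياة جميلة", "الحياة صعبة"))
```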

## 🎯 Use Cases

- **Content Creation:** Generate text in specific author styles
- **Educational Tools:** Demonstrate different writing styles
- **Research:** Study Arabic literary styles and patterns
- **Creative Writing:** Inspire new content in classic styles

## 🤝 Contributing

This model was developed for the AraGenEval 2025 competition. For questions or contributions, please refer to the competition guidelines.

## 📄 License

This model is released under the same license as the base AraT5v2 model.

## 🙏 Acknowledgments

- **Competition:** AraGenEval 2025
- **Base Model:** UBC-NLP/AraT5v2-base-1024
- **Dataset:** Arabic Authorship Style Transfer Task 1
- **Results:** First Place Winner

## 📞 Contact

For questions about the model or its usage, please refer to the competition documentation or the model repository.

---

**🏆 First Place Winner at AraGenEval 2025 - Arabic Text Style Transfer Competition**