Omartificial-Intelligence-Space committed on
Commit 3942106 · verified · 1 Parent(s): ce835a1

Update README.md

Files changed (1)
  1. README.md +42 -137
README.md CHANGED
@@ -18,133 +18,46 @@ tags:
 
  A state-of-the-art Arabic text style transfer model that transforms text into the writing style of 21 different Arabic authors using descriptive author tokens and prompt engineering.
 
- ## 🎯 Model Performance
-
- - **BLEU Score:** 24.58
- - **chrF Score:** 59.01
- - **Competition:** First Place in AraGenEval 2024
- - **Supported Authors:** 21 Arabic authors
-
- ## 🚀 Quick Start
-
- ### Installation
-
- ```bash
- pip install -r requirements.txt
- ```
-
- ### Basic Usage
-
- ```python
- from inference_arabic_author_transfer import ArabicAuthorTextTransfer
 
- # Initialize model
- model = ArabicAuthorTextTransfer()
 
- # Transfer text to author's style
- text = "التعليم مهم جداً في حياتنا اليومية"
- target_author = "يوسف إدريس"
-
- result = model.transfer_style(text, target_author)
- print(f"Original: {text}")
- print(f"Transferred: {result}")
- ```
 
- ## 📚 Supported Authors
 
- 1. يوسف إدريس
- 2. نجيب محفوظ
- 3. طه حسين
- 4. حسن حنفي
- 5. عبد الغفار مكاوي
- 6. سلامة موسى
- 7. أحمد شوقي
- 8. أحمد تيمور باشا
- 9. ثروت أباظة
- 10. جبران خليل جبران
- 11. روبرت بار
- 12. ويليام شيكسبير
- 13. أمين الريحاني
- 14. غوستاف لوبون
- 15. أحمد أمين
- 16. محمد حسين هيكل
- 17. جُرجي زيدان
- 18. عباس محمود العقاد
- 19. فؤاد زكريا
- 20. كامل كيلاني
- 21. نوال السعداوي
-
- ## 🔧 Usage Examples
-
- ### 1. Command Line Interface
-
- #### Interactive Mode
- ```bash
- python inference_arabic_author_transfer.py --interactive
- ```
 
- #### Single Text Transfer
- ```bash
- python inference_arabic_author_transfer.py \
-     --text "التعليم مهم جداً في حياتنا اليومية" \
-     --author "يوسف إدريس"
- ```
 
- #### Batch Processing from File
- ```bash
- python inference_arabic_author_transfer.py \
-     --input_file input.csv \
-     --output_file results.csv
- ```
 
- ### 2. Python API
 
- #### Single Text Transfer
- ```python
- from inference_arabic_author_transfer import ArabicAuthorTextTransfer
 
- model = ArabicAuthorTextTransfer()
 
- # Transfer to single author
- result = model.transfer_style(
-     text="العلم نور والجهل ظلام",
-     target_author="نجيب محفوظ"
- )
- ```
 
- #### Batch Processing
- ```python
- texts = [
-     "الحياة جميلة",
-     "العلم أساس التقدم",
-     "الأدب غذاء الروح"
- ]
-
- results = model.batch_transfer_style(
-     texts=texts,
-     target_author="أحمد شوقي",
-     batch_size=4
- )
- ```
 
- #### Get Supported Authors
- ```python
- authors = model.get_supported_authors()
- print(f"Supported authors: {authors}")
- ```
 
- ### 3. Advanced Parameters
-
- ```python
- result = model.transfer_style(
-     text="النص الأصلي",
-     target_author="يوسف إدريس",
-     max_length=512,  # Maximum generation length
-     num_beams=5,  # Number of beams for beam search
-     temperature=1.0,  # Sampling temperature
-     do_sample=False  # Use deterministic generation
- )
- ```
 
  ## Input File Format
 
@@ -153,34 +66,14 @@ For batch processing, your input file should have the following format:
  ### CSV Format
  ```csv
  text,author
- "التعليم مهم جداً في حياتنا اليومية","يوسف إدريس"
- "العلم نور والجهل ظلام","نجيب محفوظ"
- "الحياة جميلة","طه حسين"
- ```
 
- ### Excel Format
- Same structure as CSV but in Excel format.
 
- ## 🏗️ Model Architecture
 
- - **Base Model:** UBC-NLP/AraT5v2-base-1024
- - **Approach:** Descriptive Author Tokens + Prompt Engineering
- - **Input Format:** `"اكتب النص التالي بأسلوب <author:name>: [text]"`
- - **Training:** Fine-tuned with author-specific tokens
 
- ## 🔬 Technical Details
-
- ### Stylometric Analysis
- The model incorporates comprehensive stylometric analysis including:
- - **Lexical Features:** Sentence length, word length, vocabulary richness
- - **Syntactic Patterns:** Definite articles, conjunctions, prepositions
- - **Author-Specific Vocabulary:** TF-IDF based characteristic words
- - **Style Classification:** Formality, complexity, emotional intensity
 
- ### Prompt Engineering
- - **Format:** `"اكتب النص التالي بأسلوب <author:يوسف_إدريس>: [original_text]"`
- - **Author Tokens:** Descriptive tokens like `<author:يوسف_إدريس>`
- - **Target:** Generated text in author's style
 
  ## 📊 Performance Metrics
 
@@ -215,6 +108,18 @@ This model is released under the same license as the base AraT5v2 model.
 
  For questions about the model or usage, please refer to the competition documentation or model repository.
 
  ---
 
  **🏆 First Place Winner at AraGenEval 2025 - Arabic Text Style Transfer Competition**
 
@@ -18,133 +18,46 @@ tags:
 
  A state-of-the-art Arabic text style transfer model that transforms text into the writing style of 21 different Arabic authors using descriptive author tokens and prompt engineering.
 
+ ## 🔗 Paper Link (ACL Anthology)
 
+ 📘 **ANLPers at AraGenEval Shared Task: Descriptive Author Tokens for Transparent Arabic Authorship Style Transfer** [https://aclanthology.org/2025.arabicnlp-sharedtasks.8.pdf]
 
+ ## 🏗️ Model Architecture
 
+ - **Base Model:** UBC-NLP/AraT5v2-base-1024
+ - **Approach:** Descriptive Author Tokens + Prompt Engineering
+ - **Input Format:** `"اكتب النص التالي بأسلوب <author:name>: [text]"`
+ - **Training:** Fine-tuned with author-specific tokens
 
+ ## 🔬 Technical Details
 
+ ### Stylometric Analysis
+ The model incorporates comprehensive stylometric analysis including:
+ - **Lexical Features:** Sentence length, word length, vocabulary richness
+ - **Syntactic Patterns:** Definite articles, conjunctions, prepositions
+ - **Author-Specific Vocabulary:** TF-IDF based characteristic words
+ - **Style Classification:** Formality, complexity, emotional intensity
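
The lexical features named in the Stylometric Analysis list can be illustrated with a rough sketch. The README names the features but does not say how they are computed, so the helper `lexical_features`, the punctuation-based sentence split, and the use of type-token ratio as a stand-in for vocabulary richness are all assumptions, not the authors' method.

```python
import re

# Assumption: sentences end at Latin or Arabic terminal punctuation.
SENTENCE_END = re.compile(r"[.!?؟]+")


def lexical_features(text: str) -> dict:
    """Compute three simple lexical statistics for a piece of text.

    Hypothetical helper for illustration only. Words are runs of
    alphanumeric characters, so fully diacritized text would need
    its diacritics stripped before calling this.
    """
    sentences = [s for s in SENTENCE_END.split(text) if s.strip()]
    words = re.findall(r"\w+", text)
    if not sentences or not words:
        return {"avg_sentence_len": 0.0, "avg_word_len": 0.0, "type_token_ratio": 0.0}
    return {
        # Average number of words per sentence
        "avg_sentence_len": len(words) / len(sentences),
        # Average number of characters per word
        "avg_word_len": sum(len(w) for w in words) / len(words),
        # Distinct words divided by total words (vocabulary richness proxy)
        "type_token_ratio": len(set(words)) / len(words),
    }


features = lexical_features("العلم نور. العلم أساس التقدم.")
print(features)
```

Comparing these values across samples from different authors gives a first, coarse picture of how their styles differ.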
 
+ ### Prompt Engineering
+ - **Format:** `"اكتب النص التالي بأسلوب <author:يوسف_إدريس>: [original_text]"`
+ - **Author Tokens:** Descriptive tokens like `<author:يوسف_إدريس>`
+ - **Target:** Generated text in author's style
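
The prompt format in the Prompt Engineering list can be assembled with a few lines of string handling. This is a minimal sketch: the helper names `author_token` and `build_prompt` are hypothetical, and the rule that spaces in an author name become underscores is inferred from the single example token `<author:يوسف_إدريس>` rather than stated in the README.

```python
# Instruction prefix copied from the documented format string.
PROMPT_PREFIX = "اكتب النص التالي بأسلوب"


def author_token(author: str) -> str:
    """Turn an author name into a descriptive token.

    Assumption: whitespace is replaced by underscores, as in the
    README's example token.
    """
    name = "_".join(author.split())
    return f"<author:{name}>"


def build_prompt(text: str, author: str) -> str:
    """Build the model input: prefix, author token, colon, source text."""
    return f"{PROMPT_PREFIX} {author_token(author)}: {text}"


prompt = build_prompt("العلم نور والجهل ظلام", "يوسف إدريس")
print(prompt)
```

A string built this way would be passed to the tokenizer of the fine-tuned model. Whether the released tokenizer treats each author token as a single added token is not stated in this diff.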
 
 
 
 
+ ## 🎯 Model Performance
 
+ - **BLEU Score:** 24.58
+ - **chrF Score:** 59.01
+ - **Competition:** First Place in AraGenEval 2024
+ - **Supported Authors:** 21 Arabic authors
 
 
+ ## 📚 Supported Authors
 
+ <p align="center">
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/628f7a71dd993507cfcbe587/qDHUSa6ZvD1LjN9uJs-jp.png" width="600"/>
+ </p>
 
 
  ## Input File Format
 
 
@@ -153,34 +66,14 @@ For batch processing, your input file should have the following format:
  ### CSV Format
  ```csv
  text,author
 
 
 
 
+ ```
 
+ ### Excel Format
+ Same structure as CSV but in Excel format.
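
A batch input file with the `text,author` header can be read with Python's standard `csv` module. The sketch below is illustrative only: `read_batch_rows` is a hypothetical helper, not part of the repository's inference script, and the sample row is taken from the example rows that this commit removes from the README.

```python
import csv
import io

# Column names from the README's CSV header line.
REQUIRED_COLUMNS = ("text", "author")


def read_batch_rows(handle):
    """Return (text, author) pairs from an open CSV file object.

    Raises ValueError when a required column is missing from the header.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [(row["text"], row["author"]) for row in reader]


# In-memory stand-in for a file opened with open(path, encoding="utf-8", newline="").
sample = io.StringIO('text,author\n"الحياة جميلة","طه حسين"\n')
rows = read_batch_rows(sample)
print(rows)
```

Opening real files with an explicit `encoding="utf-8"` avoids platform-dependent decoding of the Arabic text.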
 
 
 
  ## 📊 Performance Metrics
 
 
@@ -215,6 +108,18 @@ This model is released under the same license as the base AraT5v2 model.
 
  For questions about the model or usage, please refer to the competition documentation or model repository.
 
+
+ ## BibTeX Citation
+
+ ```bibtex
+ @inproceedings{nacar2025anlpers,
+   title={ANLPers at AraGenEval Shared Task: Descriptive Author Tokens for Transparent Arabic Authorship Style Transfer},
+   author={Nacar, Omer and Reda, Mahmoud and Sibaee, Serry and Alhabashi, Yasser and Ammar, Adel and Boulila, Wadii},
+   booktitle={Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks},
+   pages={49--53},
+   year={2025}
+ }
+ ```
  ---
 
  **🏆 First Place Winner at AraGenEval 2025 - Arabic Text Style Transfer Competition**