seungbo7747 committed · verified · Commit 830b3de · Parent(s): fdcf752

Update README.md

Files changed (1): README.md +81 -0
The following hyperparameters were used during training:

| 0.6402 | 0.96 | 1200 | 0.5970 | 0.0855 | 0.0213 | 0.0854 | 0.0855 |
### How to use

```python
import torch
from transformers import T5TokenizerFast, T5ForConditionalGeneration

# 1. Load the model and tokenizer
model_id = "username/my_awesome_summarization_model"  # replace with the actual Hub model ID
tokenizer = T5TokenizerFast.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

# 2. Use a GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()
print(f"Using device: {device}")
if torch.cuda.is_available():
    print(f"GPU name: {torch.cuda.get_device_name(0)}")

# 3. Define the summarization function
def summarize_text(texts, max_input_length=512, max_output_length=150, num_beams=4):
    """
    Summarize a list of texts.

    Args:
        texts (list[str]): Texts to summarize (each may already carry the 'summarize: ' prefix).
        max_input_length (int): Maximum input length in tokens.
        max_output_length (int): Maximum summary length in tokens.
        num_beams (int): Number of beams for beam search.

    Returns:
        list[str]: The generated summaries.
    """
    # Prepend the 'summarize: ' task prefix where it is missing
    inputs = [text if text.startswith("summarize: ") else f"summarize: {text}" for text in texts]

    # Tokenize
    tokenized_inputs = tokenizer(
        inputs,
        max_length=max_input_length,
        truncation=True,
        padding=True,
        return_tensors="pt",
    )

    # Move the inputs to the same device as the model
    tokenized_inputs = {k: v.to(device) for k, v in tokenized_inputs.items()}

    # Generate the summaries
    with torch.no_grad():
        summary_ids = model.generate(
            tokenized_inputs["input_ids"],
            attention_mask=tokenized_inputs["attention_mask"],
            max_length=max_output_length,
            num_beams=num_beams,
            early_stopping=True,
        )

    # Decode
    return tokenizer.batch_decode(summary_ids, skip_special_tokens=True)

# 4. Example batch input (Korean sample texts)
test_texts = [
    "summarize: 한국의 수도는 서울입니다. 서울은 한반도 중부에 위치하며, 인구는 약 970만 명입니다. 서울은 경제, 문화, 정치의 중심지로, 한강이 도시를 가로지르며 많은 역사적 유산과 현대적 건축물이 공존합니다.",
    "summarize: 인공지능(AI)은 컴퓨터 시스템이 인간의 지능을 모방하거나 초월하도록 만드는 기술입니다. AI는 머신러닝, 딥러닝, 자연어 처리 등의 분야로 나뉘며, 의료, 금융, 제조 등 다양한 산업에서 활용되고 있습니다. 그러나 AI의 윤리적 문제와 일자리 대체 우려도 제기되고 있습니다.",
    "summarize: 기후 변화는 지구 온난화, 해수면 상승, 극단적 기상 현상을 초래하는 글로벌 문제입니다. 이산화탄소 배출 감소와 재생 가능 에너지 사용이 해결책으로 제시되지만, 국제적 협력이 부족한 상황입니다."
]

# 5. Run the summarization and print the results
summaries = summarize_text(test_texts)
for i, (input_text, summary) in enumerate(zip(test_texts, summaries)):
    print(f"\nInput {i+1}: {input_text}")
    print(f"Summary {i+1}: {summary}")

# 6. Single-text example (simple usage)
single_text = "summarize: 블록체인은 분산된 디지털 장부로, 거래 데이터를 암호화하여 보안성과 투명성을 제공합니다. 비트코인과 같은 암호화폐뿐만 아니라 공급망 관리, 의료 기록 등 다양한 분야에서 활용되고 있습니다."
summary = summarize_text([single_text])[0]
print(f"\nSingle Input: {single_text}")
print(f"Single Summary: {summary}")
```
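The `summarize: ` prefix is the task instruction a T5-style checkpoint expects on every input, which is why `summarize_text` normalizes it before tokenizing. That step can be checked in isolation, without loading the model (a minimal sketch; `add_task_prefix` is a hypothetical helper name, not part of this model card's code):

```python
def add_task_prefix(texts, prefix="summarize: "):
    """Prepend the task prefix to each text unless it is already present."""
    return [t if t.startswith(prefix) else f"{prefix}{t}" for t in texts]

# Already-prefixed inputs are left untouched; bare inputs gain the prefix.
print(add_task_prefix(["summarize: already prefixed", "plain text"]))
# ['summarize: already prefixed', 'summarize: plain text']
```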
### Framework versions

- Transformers 4.51.3
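To reproduce the environment, the listed version can be pinned at install time (a sketch; only the Transformers version appears in this excerpt, so other dependencies are left unpinned):

```shell
# Pin the Transformers version listed above; torch is needed by the example,
# but its exact version is not given in this model card excerpt.
pip install "transformers==4.51.3" torch
```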