Spaces:
Sleeping
Sleeping
File size: 10,694 Bytes
d1221ff 999c3ec d1221ff 999c3ec d1221ff 999c3ec d1221ff | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 | """Medium task: SQuAD v1 longer passages with multiple-hop reasoning.
Context: 800–2000 characters, truncated to 65%.
The answer is always within the truncated portion.
Episode: 2 steps (summarize → answer).
Grading: token-level F1 score (same as SQuAD official eval).
"""
import random
import logging
from typing import Dict, Any, List, Optional
from .base import BaseSummarizationTask
logger = logging.getLogger(__name__)
FALLBACK_SAMPLES: List[Dict[str, Any]] = [
{
"context": (
"The Byzantine Empire, also referred to as the Eastern Roman Empire or Byzantium, "
"was the continuation of the Roman Empire primarily in its eastern provinces during "
"Late Antiquity and the Middle Ages, when its capital city was Constantinople. It "
"survived the fragmentation and fall of the Western Roman Empire in the 5th century "
"AD and continued to exist for an additional thousand years until the fall of "
"Constantinople to the Ottoman Empire in 1453. During most of its existence, the "
"empire was the most powerful economic, cultural, and military force in Europe. "
"Both the See of Constantinople and the Ecumenical Patriarchate, which are Christian "
"institutions, trace their origins to the foundation of Constantinople by Constantine "
"the Great in 330 AD. The empire's rich history, blending Greek, Roman, and Christian "
"traditions, produced important developments in art, architecture, and philosophy "
"that continue to influence Eastern Europe to this day."
),
"question": "In what year did the Byzantine Empire fall to the Ottoman Empire?",
"answer_list": ["1453"],
},
{
"context": (
"The Industrial Revolution was the transition to new manufacturing processes in Great "
"Britain, continental Europe, and the United States, from about 1760 to sometime between "
"1820 and 1840. This transition included going from hand production methods to machines; "
"new chemical manufacturing and iron production processes; the increasing use of steam "
"power and water power; the development of machine tools; and the rise of the mechanised "
"factory system. Output greatly increased, and a result was an unprecedented rise in "
"population and the rate of population growth. The textile industry was the first to use "
"modern production methods, and textiles became the dominant industry in terms of "
"employment, value of output, and capital invested. Cotton was the leading textile of "
"the Industrial Revolution and assumed its dominant role because cotton could be cultivated "
"at scale in warm climates outside Europe, especially in what became the southern United States."
),
"question": "Which industry was the first to use modern production methods during the Industrial Revolution?",
"answer_list": ["textile industry", "The textile industry", "textiles"],
},
{
"context": (
"Artificial intelligence (AI) is intelligence demonstrated by machines, as opposed to "
"intelligence of humans and other animals. Example tasks in which this is done include "
"speech recognition, computer vision, translation between (natural) languages, as well as "
"other mappings of inputs. AI applications include advanced web search engines (e.g., "
"Google Search), recommendation systems (used by YouTube, Amazon, and Netflix), "
"understanding human speech (such as Siri and Alexa), self-driving cars (e.g., Waymo), "
"generative or creative tools (ChatGPT and AI art), automated decision-making, and "
"competing at the highest level in strategic game systems (such as chess and Go). As "
"machines become increasingly capable, tasks considered to require 'intelligence' are "
"often removed from the definition of AI, a phenomenon known as the AI effect. For "
"instance, optical character recognition is frequently excluded from things considered "
"to be AI, having become a routine technology. Artificial intelligence was founded as "
"an academic discipline in 1956, and in the years since it has experienced several waves "
"of optimism, followed by disappointment and the loss of funding (known as an 'AI winter'), "
"followed by new approaches, success, and renewed funding."
),
"question": "In what year was artificial intelligence founded as an academic discipline?",
"answer_list": ["1956"],
},
{
"context": (
"The human brain is the command center for the human nervous system. It receives signals "
"from the body's sensory organs and outputs information to the muscles. The human brain "
"has the same basic structure as other mammal brains, but is larger in relation to body "
"size than any other brains. The cerebral cortex is the outer layer of the brain and is "
"responsible for most higher-order functions, including consciousness, memory, reasoning, "
"and language. It is divided into four lobes: the frontal lobe, parietal lobe, temporal "
"lobe, and occipital lobe. The brain communicates with the rest of the body through the "
"spinal cord and a network of nerves, which together form the peripheral nervous system. "
"The average adult human brain weighs about 1.4 kilograms (3 lb) and contains approximately "
"86 billion neurons. These neurons are connected by trillions of synaptic connections, "
"making the brain one of the most complex structures in the known universe."
),
"question": "What is the outer layer of the brain called?",
"answer_list": ["cerebral cortex", "The cerebral cortex"],
},
{
"context": (
"Climate change refers to long-term shifts in temperatures and weather patterns. Such "
"shifts can be natural, due to changes in the sun's activity or large volcanic eruptions. "
"But since the 1800s, human activities have been the main driver of climate change, "
"primarily due to the burning of fossil fuels like coal, oil and gas. Burning fossil fuels "
"generates greenhouse gas emissions that act like a blanket wrapped around the Earth, "
"trapping the sun's heat and raising temperatures. The main greenhouse gases that are "
"causing climate change include carbon dioxide and methane. These come from using gasoline "
"for driving a car or coal for heating a building, for example. Clearing land and forests "
"can also release carbon dioxide. Agriculture, oil and gas operations are major sources of "
"methane emissions. Energy, industry, transport, buildings, agriculture and land use are "
"among the main sectors causing greenhouse gas emissions. Between 1880 and 2012, the "
"average global temperature increased by 0.85 degrees Celsius."
),
"question": "What has been the main driver of climate change since the 1800s?",
"answer_list": ["human activities", "burning of fossil fuels", "fossil fuels"],
},
]
TRUNCATION_RATIO = 0.65 # Show 65% of context (harder than easy)
class MediumTask(BaseSummarizationTask):
"""Medium-length context factual QA task using longer SQuAD passages."""
name = "medium"
max_steps = 2
def __init__(self):
self._samples: List[Dict[str, Any]] = []
self._load_dataset()
def _load_dataset(self):
"""Load SQuAD samples with longer passages."""
try:
from datasets import load_dataset
logger.info("Loading SQuAD dataset for medium task...")
ds = load_dataset("rajpurkar/squad", split="validation", trust_remote_code=False)
target_min, target_max = 900, 2500 # chars
seen_contexts = set()
for item in ds:
context: str = item["context"]
if not (target_min <= len(context) <= target_max):
continue
if context in seen_contexts:
continue
seen_contexts.add(context)
answers = item["answers"]["text"]
answer_starts = item["answers"]["answer_start"]
cutoff = int(len(context) * TRUNCATION_RATIO)
if not answers or not all(s < cutoff for s in answer_starts):
continue
self._samples.append(
{
"context": context,
"question": item["question"],
"answer_list": list(set(answers)),
}
)
if len(self._samples) >= 500:
break
logger.info(f"Medium task: loaded {len(self._samples)} SQuAD samples")
except Exception as e:
logger.warning(f"Could not load SQuAD dataset for medium: {e}. Using fallback.")
if not self._samples:
self._samples = FALLBACK_SAMPLES
def get_sample(self, seed: Optional[int] = None) -> Dict[str, Any]:
rng = random.Random(seed)
item = rng.choice(self._samples)
context = item["context"]
cutoff = int(len(context) * TRUNCATION_RATIO)
category = self.infer_category(item["question"])
return {
"context": context,
"truncated_context": context[:cutoff],
"truncation_ratio": TRUNCATION_RATIO,
"category": category,
"source_type": "long_form_reference",
"question": item["question"],
"answer": item["answer_list"][0],
"answer_list": item["answer_list"],
}
def get_summarize_prompt(self, truncated_context: str, truncation_ratio: float) -> str:
pct = int(truncation_ratio * 100)
return (
f"Here is a document excerpt ({pct}% of the full text):\n\n"
f"{truncated_context}\n\n"
"Produce a retrieval-safe summary for a downstream assistant. Preserve "
"specific names, dates, numbers, causal relationships, and claims that "
"are likely to be queried later. Keep the summary under 200 words while "
"retaining the details needed for factual QA."
)
|