<!-- manual page source format generated by PolyglotMan v3.0.3a12, -->
<!-- available via anonymous ftp from ftp.cs.berkeley.edu:/ucb/people/phelps/tcltk/rman.tar.Z -->
<HTML>
<HEAD>
<TITLE>PROLOGDB(5WN) manual page</TITLE>
</HEAD>
<BODY>
<A HREF="#toc">Table of Contents</A><P>
<H2><A NAME="sect0" HREF="#toc0">NAME </A></H2>
wn_pl - description of Prolog database files
<H2><A NAME="sect1" HREF="#toc1">DESCRIPTION </A></H2>
The files
<B>wn_ </B><I>* </I><B>.pl </B> contain the WordNet database in a Prolog-readable format. A Prolog
interface to WordNet is not implemented. <P>
The Prolog database is very large
and may take many minutes to load into the Prolog workspace. A separate
file has been created for each WordNet relation, giving users the ability
to load only those parts of the database that they are interested in. <P>
See
<B>FILES </B>, below, for a list of the database files and <A HREF="wndb.5WN.html"><B>wndb</B>(5WN)</A>
and <A HREF="wninput.5WN.html"><B>wninput</B>(5WN)</A>
for detailed descriptions of the various WordNet relations (referred to
as <I>operators </I> in this manual page).
<H3><A NAME="sect2" HREF="#toc2">File Format </A></H3>
Each Prolog database file
contains information corresponding to the synsets and word senses contained
in the WordNet database. In the Prolog version of the database, the <I>synset_id
</I>s (defined below) are used as unique synset identifiers. <P>
Each line of
a file contains an operator that corresponds to a WordNet relation. All
lines with the same <I>operator </I> value are stored in the file <B>wn_ </B><I>operator
</I><B>.pl </B>. <P>
The general format of a line in a Prolog database file is as follows:
<P>
<blockquote><I>operator<B>(<I>field1<B>,<I> ... <B>,<I>fieldn<B>). </B></I></B></I></B></I></B></I> <BR>
</blockquote>
<P>
Each line contains the name of the
operator, followed by a left parenthesis, a comma-separated list of fields,
a right parenthesis, and a period. Note there are no spaces, and each
line is terminated with a newline character.
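<P>
The line format above can be read with a small hand-written parser. The following is a minimal Python sketch, not part of the WordNet distribution: the function name <I>parse_fact </I> is hypothetical, and it assumes that quoted fields follow the quoting rule given under NOTES (a single quote inside a field is written as two adjacent single quotes).

```python
def parse_fact(line):
    """Split one database line into (operator, [fields]).

    Quoted fields may contain commas; a doubled single quote inside a
    quoted field stands for one literal single quote.
    """
    text = line.strip()
    if not text.endswith(")."):
        raise ValueError("not a database line: " + repr(line))
    # Drop the trailing ")." and split off the operator name.
    operator, _, body = text[:-2].partition("(")

    fields, current = [], []
    in_quote = False
    skip_next = False
    for i, ch in enumerate(body):
        if skip_next:            # second half of a doubled quote
            skip_next = False
            continue
        if in_quote:
            if ch != "'":
                current.append(ch)
            elif body[i + 1:i + 2] == "'":
                current.append("'")
                skip_next = True
            else:
                in_quote = False
        elif ch == "'":
            in_quote = True
        elif ch == ",":
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return operator, fields
```

All fields are returned as strings with the enclosing quotes removed; converting numeric fields to integers is left to the caller.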
<H3><A NAME="sect3" HREF="#toc3">Operators </A></H3>
Each WordNet relation
is represented in a separate file by <I>operator </I> name. Some operators are
reflexive (i.e. the "reverse" relation is implicit). So, for example, if
<B>x </B> is a hypernym of <B>y </B>, <B>y </B> is necessarily a hyponym of <B>x </B>. In the Prolog
database, reflected pointers are usually implied for semantic relations.
<P>
Semantic relations are represented by a pair of <I>synset_id </I>s, in which
the first <I>synset_id </I> is generally the source of the relation and the second
is the target. If two pairs <I>synset_id </I><B>, </B><I>w_num </I> are present, the operator
represents a lexical relation between word forms. <P>
<B>s(<I>synset_id<B>,<I>w_num<B>,'<I>word<B>',<I>ss_type<B>,<I>sense_number<B>,<I>tag_count<B>).
</B></I></B></I></B></I></B></I></B></I></B></I></B><BR>
<blockquote>A <B>s </B> operator is present for every word sense in WordNet. In <B>wn_s.pl
</B>, <I>w_num </I> specifies the word number for <I>word </I> in the synset. </blockquote>
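Because every word sense has its own <B>s </B> entry, a word-to-sense lookup table can be built in one pass over the file. The Python sketch below is illustrative only: <I>index_by_word </I> is a hypothetical name, and the input is assumed to be an iterable of six-element tuples in the field order shown above, already parsed from <B>wn_s.pl </B>.

```python
from collections import defaultdict


def index_by_word(s_facts):
    """Map each word to a sorted list of (ss_type, sense_number, synset_id)."""
    index = defaultdict(list)
    for synset_id, w_num, word, ss_type, sense_number, tag_count in s_facts:
        index[word].append((ss_type, sense_number, synset_id))
    # Sorting groups the senses by synset type, then by sense number.
    return {word: sorted(senses) for word, senses in index.items()}
```

The synset identifiers used to exercise such a function can be any nine-digit values; no particular identifier is implied by this sketch.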
<P>
<B>g(<I>synset_id<B>,'(<I>gloss<B>)').
</B></I></B></I></B><BR>
<blockquote>The <B>g </B> operator specifies the gloss for a synset. </blockquote>
<P>
<B>hyp(<I>synset_id<B>,<I>synset_id<B>).
</B></I></B></I></B><BR>
<blockquote>The <B>hyp </B> operator specifies that the second synset is a hypernym of
the first synset. This relation holds for nouns and verbs. The reflexive
operator, hyponym, implies that the first synset is a hyponym of the second
synset. </blockquote>
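Since the hyponym direction is implied rather than stored, a program that needs it has to invert the <B>hyp </B> pairs itself. A minimal Python sketch, assuming the pairs have already been read from <B>wn_hyp.pl </B> as (first, second) tuples; the name <I>build_hyponym_index </I> is hypothetical.

```python
from collections import defaultdict


def build_hyponym_index(hyp_pairs):
    """Map each hypernym synset_id to the list of its hyponym synset_ids.

    In hyp(A,B) the second synset B is a hypernym of the first synset A,
    so A is recorded as a hyponym of B.
    """
    hyponyms = defaultdict(list)
    for hyponym_id, hypernym_id in hyp_pairs:
        hyponyms[hypernym_id].append(hyponym_id)
    return dict(hyponyms)
```

The same inversion applies to the meronym operators, whose holonym counterparts are likewise implied.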
<P>
<B>ent(<I>synset_id<B>,<I>synset_id<B>). </B></I></B></I></B><BR>
<blockquote>The <B>ent </B> operator specifies that the
second synset is an entailment of the first synset. This relation only holds
for verbs. </blockquote>
<P>
<B>sim(<I>synset_id<B>,<I>synset_id<B>). </B></I></B></I></B><BR>
<blockquote>The <B>sim </B> operator specifies that
the second synset is similar in meaning to the first synset. This means
that the second synset is a satellite of the first synset, which is the cluster
head. This relation only holds for adjective synsets contained in adjective
clusters. </blockquote>
<P>
<B>mm(<I>synset_id<B>,<I>synset_id<B>). </B></I></B></I></B><BR>
<blockquote>The <B>mm </B> operator specifies that the
second synset is a member meronym of the first synset. This relation only
holds for nouns. The reflexive operator, member holonym, can be implied.
</blockquote>
<P>
<B>ms(<I>synset_id<B>,<I>synset_id<B>). </B></I></B></I></B><BR>
<blockquote>The <B>ms </B> operator specifies that the second
synset is a substance meronym of the first synset. This relation only
holds for nouns. The reflexive operator, substance holonym, can be implied.
</blockquote>
<P>
<B>mp(<I>synset_id<B>,<I>synset_id<B>). </B></I></B></I></B><BR>
<blockquote>The <B>mp </B> operator specifies that the second
synset is a part meronym of the first synset. This relation only holds
for nouns. The reflexive operator, part holonym, can be implied. </blockquote>
<P>
<B>cs(<I>synset_id<B>,<I>synset_id<B>).
</B></I></B></I></B><BR>
<blockquote>The <B>cs </B> operator specifies that the second synset is a cause of the
first synset. This relation only holds for verbs. </blockquote>
<P>
<B>vgp(<I>synset_id<B>,<I>synset_id<B>).
</B></I></B></I></B><BR>
<blockquote>The <B>vgp </B> operator specifies verb synsets that are similar in meaning
and should be grouped together when displayed in response to a grouped
synset search. </blockquote>
<P>
<B>at(<I>synset_id<B>,<I>synset_id<B>). </B></I></B></I></B><BR>
<blockquote>The <B>at </B> operator defines the
attribute relation between noun and adjective synset pairs in which the
adjective is a value of the noun. For each pair, both relations are listed
(i.e. each <I>synset_id </I> is both a source and target). </blockquote>
<P>
<B>ant(<I>synset_id<B>,<I>w_num<B>,<I>synset_id<B>,<I>w_num<B>).
</B></I></B></I></B></I></B></I></B><BR>
<blockquote>The <B>ant </B> operator specifies antonymous <I>word </I>s. This is a lexical relation
that holds for all syntactic categories. For each antonymous pair, both
relations are listed (i.e. each <I>synset_id,w_num </I> pair is both a source and
target word). </blockquote>
<P>
<B>sa(<I>synset_id<B>,<I>w_num<B>,<I>synset_id<B>,<I>w_num<B>). </B></I></B></I></B></I></B></I></B><BR>
<blockquote>The <B>sa </B> operator
specifies that additional information about the first word can be obtained
by seeing the second word. This operator is only defined for verbs and
adjectives. There is no reflexive relation (i.e. it cannot be inferred that
the additional information about the second word can be obtained from
the first word). </blockquote>
<P>
<B>ppl(<I>synset_id<B>,<I>w_num<B>,<I>synset_id<B>,<I>w_num<B>). </B></I></B></I></B></I></B></I></B><BR>
<blockquote>The <B>ppl </B> operator
specifies that the adjective first word is a participle of the verb second
word. The reflexive operator can be implied. </blockquote>
<P>
<B>per(<I>synset_id<B>,<I>w_num<B>,<I>synset_id<B>,<I>w_num<B>).
</B></I></B></I></B></I></B></I></B><BR>
<blockquote>The <B>per </B> operator specifies two different relations based on the parts
of speech involved. If the first word is in an adjective synset, that
word pertains to either the noun or adjective second word. If the first
word is in an adverb synset, that word is derived from the adjective second
word. </blockquote>
<P>
<B>fr(<I>synset_id<B>,<I>f_num<B>,<I>w_num<B>). </B></I></B></I></B></I></B><BR>
<blockquote>The <B>fr </B> operator specifies a generic
sentence frame for one or all words in a synset. The operator is defined
only for verbs. </blockquote>
<H3><A NAME="sect4" HREF="#toc4">Field Definitions </A></H3>
A <I>synset_id </I> is a nine-byte field in
which the first byte defines the syntactic category of the synset and
the remaining eight bytes are a <I>synset_offset </I>, as defined in <A HREF="wndb.5WN.html"><B>wndb</B>(5WN)</A>
,
indicating the byte offset in the <B>data. </B><I>pos </I> file that corresponds to the
syntactic category. <P>
The syntactic category is encoded as: <P>
<blockquote><B>1 </B><tt> </tt> <tt> </tt> NOUN <BR>
<B>2 </B><tt> </tt> <tt> </tt> VERB <BR>
<B>3 </B><tt> </tt> <tt> </tt> ADJECTIVE <BR>
<B>4 </B><tt> </tt> <tt> </tt> ADVERB <BR>
</blockquote>
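The <I>synset_id </I> layout described above can be decoded by splitting off the first digit. This Python sketch is an illustration only; <I>decode_synset_id </I> and <I>CATEGORY_NAMES </I> are hypothetical names, and the offset is returned as an integer rather than as the eight-character string.

```python
CATEGORY_NAMES = {"1": "NOUN", "2": "VERB", "3": "ADJECTIVE", "4": "ADVERB"}


def decode_synset_id(synset_id):
    """Return (category_name, synset_offset) for a nine-digit synset_id."""
    text = str(synset_id)
    if len(text) != 9 or not text.isdigit():
        raise ValueError("synset_id must be nine digits: " + repr(synset_id))
    category = CATEGORY_NAMES.get(text[0])
    if category is None:
        raise ValueError("unknown syntactic category: " + text[0])
    # The remaining eight digits are the synset_offset.
    return category, int(text[1:])
```

For example, an identifier of 100001740 decodes to the noun category with offset 1740.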
<P>
<I>w_num </I>, if present, indicates which word
in the synset is being referred to. Word numbers are assigned to the <I>word
</I> fields in a synset, from left to right, beginning with 1. When used to
represent lexical WordNet relations <I>w_num </I> may be 0, indicating that the
relation holds for all words in the synset indicated by the preceding
<I>synset_id </I>. See <A HREF="wninput.5WN.html"><B>wninput</B>(5WN)</A>
for a discussion of semantic and lexical
relations. <P>
<I>ss_type </I> is a one-character code indicating the synset type:
<P>
<blockquote><B>n </B><tt> </tt> <tt> </tt> NOUN <BR>
<B>v </B><tt> </tt> <tt> </tt> VERB <BR>
<B>a </B><tt> </tt> <tt> </tt> ADJECTIVE <BR>
<B>s </B><tt> </tt> <tt> </tt> ADJECTIVE SATELLITE <BR>
<B>r </B><tt> </tt> <tt> </tt> ADVERB <BR>
</blockquote>
<P>
<I>sense_number
</I> specifies the sense number of the word, within the part of speech encoded
in the <I>synset_id </I>, in the WordNet database. <P>
<I>word </I> is the ASCII text of
the word as entered in the synset by the lexicographer, with spaces replaced
by underscore characters (<B>_ </B>). The text of the word is case sensitive.
An adjective <I>word </I> is immediately followed by a syntactic marker if one
was specified in the lexicographer file. A syntactic marker is appended,
in parentheses, onto <I>word </I> without any intervening spaces. See <A HREF="wninput.5WN.html"><B>wninput</B>(5WN)</A>
for a list of the syntactic markers for adjectives. <P>
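A program that displays words usually wants the underscores turned back into spaces and the syntactic marker separated from the text. The Python sketch below shows one way to do this, under the assumption that any trailing parenthesized suffix is a marker; it should therefore be applied only to <I>word </I> fields from adjective synsets. The name <I>split_word </I> is hypothetical.

```python
def split_word(word):
    """Return (display_text, marker) for an adjective word field.

    marker is None when the word carries no trailing "(...)" suffix.
    """
    marker = None
    if word.endswith(")") and "(" in word:
        base, _, rest = word.rpartition("(")
        if base:                      # ignore a word that is only "(...)"
            word, marker = base, rest[:-1]
    return word.replace("_", " "), marker
```

A field such as <B>galore(ip) </B> would be split into the text "galore" and the marker "ip", while a field without a marker is returned with its underscores replaced and a marker of None. <P>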
Each synset has a
<I>gloss </I> that may contain a definition, one or more example sentences, or
both. Note that glosses are enclosed in single forward quotes and parentheses: <B>'(<I>gloss<B>)'
</B></I></B>. <P>
<I>f_num </I> specifies the generic sentence frame number for word <I>w_num </I> in
the synset indicated by <I>synset_id </I>. Note that when <I>w_num </I> is <B>0 </B>, the frame
number applies to all words in the synset. If non-zero, the frame applies
to that word in the synset. <P>
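The rule for <I>w_num </I> in <B>fr </B> entries can be expressed as a simple filter. A minimal Python sketch, assuming the entries have been parsed into (synset_id, f_num, w_num) integer tuples; <I>frames_for_word </I> is a hypothetical name.

```python
def frames_for_word(fr_facts, synset_id, w_num):
    """Return the sorted frame numbers that apply to one word of a synset.

    An entry applies when it names the word directly or when its w_num
    is 0, which means the frame holds for every word in the synset.
    """
    return sorted(
        f_num
        for fact_synset, f_num, fact_w_num in fr_facts
        if fact_synset == synset_id and fact_w_num in (0, w_num)
    )
```

<P>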
In WordNet, sense numbers are assigned as
described in <A HREF="wndb.5WN.html"><B>wndb</B>(5WN)</A>
. <I>tag_count </I> is the number of times the sense was
tagged in the Semantic Concordances, and <B>0 </B> if it was not instantiated.
<H2><A NAME="sect5" HREF="#toc5">NOTES </A></H2>
Since single forward quotes are used to enclose character strings,
single quote characters found in <I>word </I> and <I>gloss </I> fields are represented
as two adjacent single quote characters. <P>
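The quoting rule is easy to apply in both directions. The following Python sketch is illustrative; <I>quote_field </I> and <I>unquote_field </I> are hypothetical helper names and are not part of WordNet.

```python
def quote_field(text):
    """Enclose text in single quotes, doubling any embedded single quote."""
    return "'" + text.replace("'", "''") + "'"


def unquote_field(quoted):
    """Reverse quote_field: strip the enclosing quotes and undouble the rest."""
    if not (len(quoted) >= 2 and quoted.startswith("'") and quoted.endswith("'")):
        raise ValueError("not a quoted field: " + repr(quoted))
    return quoted[1:-1].replace("''", "'")
```

With these helpers, the word <B>o'clock </B> is written to a file as <B>'o''clock' </B> and read back unchanged. <P>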
The load time can be greatly
reduced by creating "object language" versions of the files, an option
that is supported by some implementations, such as Quintus Prolog.
<H2><A NAME="sect6" HREF="#toc6">ENVIRONMENT
VARIABLES (UNIX) </A></H2>
<DL>
<DT><B>WNHOME</B> </DT>
<DD>Base directory for WordNet. Default is <B>/usr/local/WordNet-3.0
</B>. </DD>
</DL>
<H2><A NAME="sect7" HREF="#toc7">REGISTRY (WINDOWS) </A></H2>
<DL>
<DT><B>HKEY_LOCAL_MACHINE\SOFTWARE\WordNet\3.0\WNHome</B> </DT>
<DD>Base directory
for WordNet. Default is <B>C:\Program Files\WordNet\3.0 </B>. </DD>
</DL>
<H2><A NAME="sect8" HREF="#toc8">FILES </A></H2>
All files are
in <B>WNHOME/prolog </B> on Unix platforms and <B>WNHome\prolog </B> on Windows platforms.
<DL>
<DT><B>wn_s.pl</B> </DT>
<DD>synset pointers </DD>
<DT><B>wn_g.pl</B> </DT>
<DD>gloss pointers </DD>
<DT><B>wn_hyp.pl</B> </DT>
<DD>hypernym pointers
</DD>
<DT><B>wn_ent.pl</B> </DT>
<DD>entailment pointers </DD>
<DT><B>wn_sim.pl</B> </DT>
<DD>similar pointers </DD>
<DT><B>wn_mm.pl</B> </DT>
<DD>member
meronym pointers </DD>
<DT><B>wn_ms.pl</B> </DT>
<DD>substance meronym pointers </DD>
<DT><B>wn_mp.pl</B> </DT>
<DD>part meronym
pointers </DD>
<DT><B>wn_cs.pl</B> </DT>
<DD>cause pointers </DD>
<DT><B>wn_vgp.pl</B> </DT>
<DD>grouped verb pointers </DD>
<DT><B>wn_at.pl</B>
</DT>
<DD>attribute pointers </DD>
<DT><B>wn_ant.pl</B> </DT>
<DD>antonym pointers </DD>
<DT><B>wn_sa.pl</B> </DT>
<DD>see also pointers
</DD>
<DT><B>wn_ppl.pl</B> </DT>
<DD>participle pointers </DD>
<DT><B>wn_per.pl</B> </DT>
<DD>pertainym pointers </DD>
<DT><B>wn_fr.pl</B> </DT>
<DD>frame
pointers </DD>
</DL>
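On Unix platforms, the location of a given database file follows from <B>WNHOME </B> and the operator name. The Python sketch below is a hedged illustration: <I>database_file </I> is a hypothetical name, the fallback path is the default stated under ENVIRONMENT VARIABLES, and the Windows registry lookup is not covered.

```python
import os

# Default taken from the ENVIRONMENT VARIABLES section of this page.
DEFAULT_WNHOME = "/usr/local/WordNet-3.0"


def database_file(operator, environ=None):
    """Return the path of wn_OPERATOR.pl under WNHOME/prolog.

    environ defaults to os.environ; passing a dict makes the function
    easy to exercise without touching the real environment.
    """
    env = os.environ if environ is None else environ
    base = env.get("WNHOME", DEFAULT_WNHOME)
    return os.path.join(base, "prolog", "wn_" + operator + ".pl")
```

The function only builds the path; it does not check that the file exists.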
<H2><A NAME="sect9" HREF="#toc9">SEE ALSO </A></H2>
<A HREF="wndb.5WN.html"><B>wndb</B>(5WN)</A>
, <A HREF="wninput.5WN.html"><B>wninput</B>(5WN)</A>
, <A HREF="wngroups.7WN.html"><B>wngroups</B>(7WN)</A>
, <A HREF="wnpkgs.7WN.html"><B>wnpkgs</B>(7WN)</A>
.
<P>
<HR><P>
<A NAME="toc"><B>Table of Contents</B></A><P>
<UL>
<LI><A NAME="toc0" HREF="#sect0">NAME</A></LI>
<LI><A NAME="toc1" HREF="#sect1">DESCRIPTION</A></LI>
<UL>
<LI><A NAME="toc2" HREF="#sect2">File Format</A></LI>
<LI><A NAME="toc3" HREF="#sect3">Operators</A></LI>
<LI><A NAME="toc4" HREF="#sect4">Field Definitions</A></LI>
</UL>
<LI><A NAME="toc5" HREF="#sect5">NOTES</A></LI>
<LI><A NAME="toc6" HREF="#sect6">ENVIRONMENT VARIABLES (UNIX)</A></LI>
<LI><A NAME="toc7" HREF="#sect7">REGISTRY (WINDOWS)</A></LI>
<LI><A NAME="toc8" HREF="#sect8">FILES</A></LI>
<LI><A NAME="toc9" HREF="#sect9">SEE ALSO</A></LI>
</UL>
</BODY></HTML>