<!-- manual page source format generated by PolyglotMan v3.0.3a12, -->
<!-- available via anonymous ftp from ftp.cs.berkeley.edu:/ucb/people/phelps/tcltk/rman.tar.Z -->
<HTML>
<HEAD>
<TITLE>MORPHY(7WN) manual page</TITLE>
</HEAD>
<BODY>
<A HREF="#toc">Table of Contents</A><P>
<H2><A NAME="sect0" HREF="#toc0">NAME </A></H2>
morphy - discussion of WordNet's morphological processing
<H2><A NAME="sect1" HREF="#toc1">DESCRIPTION
</A></H2>
Although only base forms of words are usually stored in WordNet, searches
may be done on inflected forms. A set of morphology functions, Morphy,
is applied to the search string to generate a form that is present in
WordNet. <P>
Morphology in WordNet uses two types of processes to try to convert
the string passed into one that can be found in the WordNet database. There
are lists of inflectional endings, based on syntactic category, that can
be detached from individual words in an attempt to find a form of the
word that is in WordNet. There are also exception list files, one for
each syntactic category, in which a search for an inflected form is done.
Morphy tries to use these two processes in an intelligent manner to translate
the string passed to the base form found in WordNet. Morphy first checks
for exceptions, then uses the rules of detachment. The Morphy functions
are not independent of WordNet: after each transformation, WordNet is
searched for the resulting string in the syntactic category specified.
<P>
The Morphy functions are passed a string and a syntactic category. A
string is either a single word or a collocation. Since some words, such
as <B>axes </B>, can have more than one base form (<B>axe </B> and <B>axis </B>), Morphy works
in the following manner. The first time that Morphy is called with a specific
string, it returns a base form. For each subsequent call to Morphy made
with a <FONT SIZE=-1><B>NULL </B></FONT>
string argument, Morphy returns another base form. Whenever
Morphy cannot perform a transformation, whether on the first call for
a word or subsequent calls, <FONT SIZE=-1><B>NULL </B></FONT>
is returned. A transformation to a
valid English string will return <FONT SIZE=-1><B>NULL </B></FONT>
if the base form of the string
is not in WordNet. <P>
The morphological functions are found in the WordNet
library. See <A HREF="morph.3WN.html"><B>morph</B>(3WN)</A>
for information on using these functions.
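<P>
The calling convention described above (a string on the first call, <FONT SIZE=-1><B>NULL </B></FONT> on later calls) can be modelled with a small stateful object. This is only an illustrative sketch in Python, not the WordNet library interface: the class name, the method name, and the lookup table are hypothetical, and Python's None stands in for <FONT SIZE=-1><B>NULL </B></FONT>.
<PRE>
```python
class MorphySession:
    """Toy model of the call protocol: a string first, then None for more."""

    def __init__(self, base_form_table):
        # Hypothetical table mapping an inflected string to its base forms.
        self._table = base_form_table
        self._pending = []

    def call(self, string):
        # A non-None argument starts a new lookup; None continues the last one.
        if string is not None:
            self._pending = list(self._table.get(string, []))
        if not self._pending:
            return None  # nothing (more) to return
        return self._pending.pop(0)


session = MorphySession({"axes": ["axe", "axis"]})
first = session.call("axes")
second = session.call(None)
third = session.call(None)
```
</PRE>
Under these assumptions the three calls yield the two base forms of <B>axes </B> in turn and then None.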
<H3><A NAME="sect2" HREF="#toc2">Rules
of Detachment </A></H3>
The following table shows the rules of detachment used by
Morphy. If a word ends with one of the suffixes, it is stripped from the
word and the corresponding ending is added. Then WordNet is searched for
the resulting string. No rules are applicable to adverbs. <P>
<TABLE BORDER=0>
<TR> <TD ALIGN=CENTER><B>POS </B> </TD> <TD ALIGN=CENTER><B>Suffix
</B> </TD> <TD ALIGN=CENTER><B>Ending </B> </TD> </TR>
<TR> <TD ALIGN=LEFT>NOUN </TD> <TD ALIGN=LEFT>"s" </TD> <TD ALIGN=LEFT>"" </TD> </TR>
<TR> <TD ALIGN=LEFT>NOUN </TD> <TD ALIGN=LEFT>"ses" </TD> <TD ALIGN=LEFT>"s" </TD> </TR>
<TR> <TD ALIGN=LEFT>NOUN </TD> <TD ALIGN=LEFT>"xes" </TD> <TD ALIGN=LEFT>"x" </TD>
</TR>
<TR> <TD ALIGN=LEFT>NOUN </TD> <TD ALIGN=LEFT>"zes" </TD> <TD ALIGN=LEFT>"z" </TD> </TR>
<TR> <TD ALIGN=LEFT>NOUN </TD> <TD ALIGN=LEFT>"ches" </TD> <TD ALIGN=LEFT>"ch" </TD> </TR>
<TR> <TD ALIGN=LEFT>NOUN </TD> <TD ALIGN=LEFT>"shes" </TD> <TD ALIGN=LEFT>"sh" </TD> </TR>
<TR> <TD ALIGN=LEFT>NOUN
</TD> <TD ALIGN=LEFT>"men" </TD> <TD ALIGN=LEFT>"man" </TD> </TR>
<TR> <TD ALIGN=LEFT>NOUN </TD> <TD ALIGN=LEFT>"ies" </TD> <TD ALIGN=LEFT>"y" </TD> </TR>
<TR> <TD ALIGN=LEFT>VERB </TD> <TD ALIGN=LEFT>"s" </TD> <TD ALIGN=LEFT>"" </TD> </TR>
<TR> <TD ALIGN=LEFT>VERB </TD> <TD ALIGN=LEFT>"ies" </TD> <TD ALIGN=LEFT>"y"
</TD> </TR>
<TR> <TD ALIGN=LEFT>VERB </TD> <TD ALIGN=LEFT>"es" </TD> <TD ALIGN=LEFT>"e" </TD> </TR>
<TR> <TD ALIGN=LEFT>VERB </TD> <TD ALIGN=LEFT>"es" </TD> <TD ALIGN=LEFT>"" </TD> </TR>
<TR> <TD ALIGN=LEFT>VERB </TD> <TD ALIGN=LEFT>"ed" </TD> <TD ALIGN=LEFT>"e" </TD> </TR>
<TR> <TD ALIGN=LEFT>VERB </TD> <TD ALIGN=LEFT>"ed"
</TD> <TD ALIGN=LEFT>"" </TD> </TR>
<TR> <TD ALIGN=LEFT>VERB </TD> <TD ALIGN=LEFT>"ing" </TD> <TD ALIGN=LEFT>"e" </TD> </TR>
<TR> <TD ALIGN=LEFT>VERB </TD> <TD ALIGN=LEFT>"ing" </TD> <TD ALIGN=LEFT>"" </TD> </TR>
<TR> <TD ALIGN=LEFT>ADJ </TD> <TD ALIGN=LEFT>"er" </TD> <TD ALIGN=LEFT>"" </TD> </TR>
<TR> <TD ALIGN=LEFT>ADJ </TD> <TD ALIGN=LEFT>"est"
</TD> <TD ALIGN=LEFT>"" </TD> </TR>
<TR> <TD ALIGN=LEFT>ADJ </TD> <TD ALIGN=LEFT>"er" </TD> <TD ALIGN=LEFT>"e" </TD> </TR>
<TR> <TD ALIGN=LEFT>ADJ </TD> <TD ALIGN=LEFT>"est" </TD> <TD ALIGN=LEFT>"e" </TD> </TR>
</TABLE>
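<P>
As an illustration, the table can be expressed as data and applied mechanically. The sketch below is not Morphy's source code: the function name and the dictionary layout are assumptions, and it only generates candidate strings without searching WordNet for them.
<PRE>
```python
# Rules of detachment copied from the table: (suffix, ending) per category.
DETACHMENT_RULES = {
    "noun": [("s", ""), ("ses", "s"), ("xes", "x"), ("zes", "z"),
             ("ches", "ch"), ("shes", "sh"), ("men", "man"), ("ies", "y")],
    "verb": [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
             ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")],
    "adj": [("er", ""), ("est", ""), ("er", "e"), ("est", "e")],
    "adv": [],  # no rules are applicable to adverbs
}


def detach_candidates(word, pos):
    """Return each distinct string produced by applying one rule to `word`."""
    candidates = []
    for suffix, ending in DETACHMENT_RULES[pos]:
        if word.endswith(suffix):
            candidate = word[:len(word) - len(suffix)] + ending
            # Skip empty results and duplicates produced by overlapping rules.
            if candidate and candidate not in candidates:
                candidates.append(candidate)
    return candidates
```
</PRE>
Each candidate would still have to be looked up in WordNet before being accepted as a base form.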
<H3><A NAME="sect3" HREF="#toc3">Exception Lists </A></H3>
There is one
exception list file for each syntactic category. The exception lists contain
the morphological transformations for strings that are not regular and
therefore cannot be processed in an algorithmic manner. Each line of an
exception list contains an inflected form of a word or collocation, followed
by one or more base forms. The list is kept in alphabetical order and
a binary search is used to find words in these lists. See <A HREF="wndb.5WN.html"><B>wndb</B>(5WN)</A>
for
information on the format of the exception list files.
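<P>
A binary search over such a list might look like the following sketch. The sample lines are hypothetical and only imitate the layout described above (an inflected form followed by its base forms); the function name is an assumption, and the real files are read from disk rather than held in a Python list.
<PRE>
```python
import bisect

# Hypothetical excerpt: inflected form first, then one or more base forms.
# The lines must already be sorted by their first field.
NOUN_EXC_LINES = [
    "axes axe axis",
    "children child",
    "geese goose",
    "mice mouse",
]


def lookup_exception(lines, inflected):
    """Binary-search the sorted lines; return the base forms, or []."""
    keys = [line.split()[0] for line in lines]
    index = bisect.bisect_left(keys, inflected)
    if index != len(keys) and keys[index] == inflected:
        return lines[index].split()[1:]
    return []
```
</PRE>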
<H3><A NAME="sect4" HREF="#toc4">Single Words </A></H3>
In
general, single words are relatively easy to process. Morphy first looks
for the word in the exception list. If it is found, the first base form
is returned. Subsequent calls with a <FONT SIZE=-1><B>NULL </B></FONT>
argument return additional
base forms, if present. A <FONT SIZE=-1><B>NULL </B></FONT>
is returned when there are no more base
forms of the word. <P>
If the word is not found in the exception list corresponding
to the syntactic category, an algorithmic process using the rules of detachment
looks for a matching suffix. If a matching suffix is found, a corresponding
ending is applied (sometimes this ending is an empty string, so in effect
the suffix is removed from the word), and WordNet is consulted to see
if the resulting word is found in the desired part of speech.
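<P>
The order of operations for a single noun can be sketched as follows. The lexicon and exception data are toy stand-ins for the WordNet database and the noun exception list, and the function name is hypothetical; the sketch returns all base forms at once instead of one per call.
<PRE>
```python
# Toy stand-ins (hypothetical data) for WordNet and the noun exception list.
NOUN_LEXICON = {"axe", "axis", "box", "dog"}
NOUN_EXCEPTIONS = {"axes": ["axe", "axis"]}
NOUN_RULES = [("s", ""), ("ses", "s"), ("xes", "x"), ("zes", "z"),
              ("ches", "ch"), ("shes", "sh"), ("men", "man"), ("ies", "y")]


def noun_base_forms(word):
    """Exception list first, then rules of detachment checked against the lexicon."""
    if word in NOUN_EXCEPTIONS:
        return list(NOUN_EXCEPTIONS[word])
    found = []
    for suffix, ending in NOUN_RULES:
        if word.endswith(suffix):
            candidate = word[:len(word) - len(suffix)] + ending
            # Only strings present in the lexicon count as base forms.
            if candidate in NOUN_LEXICON and candidate not in found:
                found.append(candidate)
    return found
```
</PRE>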
<H3><A NAME="sect5" HREF="#toc5">Collocations
</A></H3>
As opposed to single words, collocations can be quite difficult to transform
into a base form that is present in WordNet. In general, only base forms
of words, even those comprising collocations, are stored in WordNet, such
as <B>attorney general </B>. Transforming the collocation <B>attorneys general </B>
is then simply a matter of finding the base forms of the individual words
comprising the collocation. This usually works for nouns; non-conforming
nouns, such as <B>customs duty </B>, are therefore presently entered in the noun exception
list. <P>
Verb collocations that contain prepositions, such as <B>ask for it
</B>, are more difficult. As with single words, the exception list is searched
first. If the collocation is not found, special code in Morphy determines
whether a verb collocation includes a preposition. If it does, a function
is called to try to find the base form in the following manner. It is
assumed that the first word in the collocation is a verb and that the
last word is a noun. The algorithm then builds a search string with the
base forms of the verb and noun, leaving the remainder of the collocation
(usually just the preposition, but more words may be involved) in the
middle. For example, when passed <B>asking for it </B>, the database search would
be performed with <B>ask for it </B>, which is found in WordNet, and therefore
returned from Morphy. If a verb collocation does not contain a preposition,
then the base form of each word in the collocation is found and WordNet
is searched for the resulting string.
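<P>
A rough sketch of how the search string for a verb collocation could be assembled is shown below. The preposition list and the base-form tables are small hypothetical stand-ins, the function name is an assumption, and the exception-list check and the final WordNet search are left out.
<PRE>
```python
PREPOSITIONS = {"for", "of", "to", "at", "in", "on", "with"}  # assumed short list
VERB_BASE = {"asking": "ask", "asked": "ask", "looks": "look"}  # toy verb lookups
NOUN_BASE = {"ideas": "idea"}  # toy noun lookups


def verb_collocation_search_string(collocation):
    """Build the string that would be searched for in WordNet."""
    words = collocation.split()
    if not words:
        return ""
    if len(words) == 1:
        return VERB_BASE.get(words[0], words[0])
    if any(word in PREPOSITIONS for word in words[1:]):
        # First word is assumed to be a verb and the last word a noun;
        # everything in between is left unchanged.
        first = VERB_BASE.get(words[0], words[0])
        last = NOUN_BASE.get(words[-1], words[-1])
        return " ".join([first] + words[1:-1] + [last])
    # No preposition: take the base form of every word.
    return " ".join(VERB_BASE.get(word, word) for word in words)
```
</PRE>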
<H3><A NAME="sect6" HREF="#toc6">Hyphenation </A></H3>
Hyphenation also presents
special difficulties when searching WordNet. It is often a subjective decision
as to whether a word is hyphenated, joined as one word, or is a collocation
of several words, and which of the various forms are entered into WordNet.
When Morphy breaks a string into "words", it looks for both spaces and
hyphens as delimiters. It also looks for periods in strings and removes
them if an exact match is not found. A search for an abbreviation like
<B>oct. </B> returns the synset for <B>{ October, Oct } </B>. Not every pattern of hyphenated
and collocated string is searched for properly, so it may be advantageous
to specify several search strings if the results of a search attempt seem
incomplete.
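<P>
The delimiter handling and the period fallback can be illustrated with the following sketch. The index is a toy stand-in for the WordNet database and both function names are hypothetical; the real search code is more involved.
<PRE>
```python
import re

# Toy stand-in (hypothetical data) for the WordNet index.
SYNSET_INDEX = {"oct": "{ October, Oct }"}


def split_words(string):
    """Break a string into words, treating spaces and hyphens as delimiters."""
    return [part for part in re.split(r"[ -]", string) if part]


def lookup_with_period_fallback(string):
    """Try an exact match first; if that fails, retry with periods removed."""
    if string in SYNSET_INDEX:
        return SYNSET_INDEX[string]
    return SYNSET_INDEX.get(string.replace(".", ""))
```
</PRE>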
<H3><A NAME="sect7" HREF="#toc7">Special Processing for nouns ending with 'ful' </A></H3>
Morphy contains
code that searches for nouns ending with <B>ful </B> and performs a transformation
on the substring preceding it. It then appends 'ful' back onto the resulting
string and returns it. For example, if passed the noun <B>boxesful </B>, it will
return <B>boxful </B>.
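<P>
A minimal sketch of this special case follows. The base-form table is a hypothetical stand-in for the ordinary noun processing, the function name is an assumption, and no WordNet lookup is performed on the result.
<PRE>
```python
NOUN_BASE = {"boxes": "box", "spoons": "spoon", "cups": "cup"}  # toy stand-in


def ful_base_form(noun):
    """Transform the part before 'ful', then append 'ful' again."""
    if not noun.endswith("ful") or noun == "ful":
        return None  # the special case does not apply
    stem = noun[:-len("ful")]
    return NOUN_BASE.get(stem, stem) + "ful"
```
</PRE>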
<H2><A NAME="sect8" HREF="#toc8">BUGS </A></H2>
Since many noun collocations contain prepositions,
such as <B>line of products </B>, an algorithm similar to that used for verbs
should be written for nouns. In the present scheme, if Morphy is passed
<B>lines of products </B>, the search string becomes <B>line of product </B>, which
is not in WordNet. <P>
Morphy will allow non-words to be converted to words,
if they follow one of the rules described above. For example, it will
happily convert <B>plantes </B> to <B>plants </B>.
<H2><A NAME="sect9" HREF="#toc9">ENVIRONMENT VARIABLES (UNIX) </A></H2>
<DL>
<DT><B>WNHOME</B>
</DT>
<DD>Base directory for WordNet. Default is <B>/usr/local/WordNet-3.0 </B>. </DD>
<DT><B>WNSEARCHDIR</B>
</DT>
<DD>Directory in which the WordNet database has been installed. Default
is <B>WNHOME/dict </B>. </DD>
</DL>
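<P>
The way the two variables and their defaults combine can be sketched as below. The function name is hypothetical, and the sketch assumes that <B>WNSEARCHDIR </B> takes precedence when it is set and that <B>WNHOME </B> is otherwise used to build the default.
<PRE>
```python
import os


def wordnet_search_dir(environ=None):
    """Resolve the database directory from WNSEARCHDIR, WNHOME and the defaults."""
    env = os.environ if environ is None else environ
    home = env.get("WNHOME", "/usr/local/WordNet-3.0")
    return env.get("WNSEARCHDIR", home + "/dict")
```
</PRE>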
<H2><A NAME="sect10" HREF="#toc10">REGISTRY (WINDOWS) </A></H2>
<DL>
<DT><B>HKEY_LOCAL_MACHINE\SOFTWARE\WordNet\3.0\WNHome</B>
</DT>
<DD>Base directory for WordNet. Default is <B>C:\Program Files\WordNet\3.0 </B>. </DD>
</DL>
<H2><A NAME="sect11" HREF="#toc11">FILES
</A></H2>
<DL>
<DT><B><I>pos </I>.exc</B> </DT>
<DD>morphology exception lists </DD>
</DL>
<H2><A NAME="sect12" HREF="#toc12">SEE ALSO </A></H2>
<A HREF="wn.1WN.html"><B>wn</B>(1WN)</A>
, <A HREF="wnb.1WN.html"><B>wnb</B>(1WN)</A>
, <A HREF="binsrch.3WN.html"><B>binsrch</B>(3WN)</A>
,
<A HREF="morph.3WN.html"><B>morph</B>(3WN)</A>
, <A HREF="wndb.5WN.html"><B>wndb</B>(5WN)</A>
, <A HREF="wninput.7WN.html"><B>wninput</B>(7WN)</A>
. <P>
<HR><P>
<A NAME="toc"><B>Table of Contents</B></A><P>
<UL>
<LI><A NAME="toc0" HREF="#sect0">NAME</A></LI>
<LI><A NAME="toc1" HREF="#sect1">DESCRIPTION</A></LI>
<UL>
<LI><A NAME="toc2" HREF="#sect2">Rules of Detachment</A></LI>
<LI><A NAME="toc3" HREF="#sect3">Exception Lists</A></LI>
<LI><A NAME="toc4" HREF="#sect4">Single Words</A></LI>
<LI><A NAME="toc5" HREF="#sect5">Collocations</A></LI>
<LI><A NAME="toc6" HREF="#sect6">Hyphenation</A></LI>
<LI><A NAME="toc7" HREF="#sect7">Special Processing for nouns ending with 'ful'</A></LI>
</UL>
<LI><A NAME="toc8" HREF="#sect8">BUGS</A></LI>
<LI><A NAME="toc9" HREF="#sect9">ENVIRONMENT VARIABLES (UNIX)</A></LI>
<LI><A NAME="toc10" HREF="#sect10">REGISTRY (WINDOWS)</A></LI>
<LI><A NAME="toc11" HREF="#sect11">FILES</A></LI>
<LI><A NAME="toc12" HREF="#sect12">SEE ALSO</A></LI>
</UL>
</BODY></HTML>