|
|
|
|
|
|
|
|
|
|
|
<HTML> |
|
|
<HEAD> |
|
|
<TITLE>MORPHY(7WN) manual page</TITLE> |
|
|
</HEAD> |
|
|
<BODY> |
|
|
<A HREF="#toc">Table of Contents</A><P> |
|
|
|
|
|
<H2><A NAME="sect0" HREF="#toc0">NAME </A></H2> |
|
|
morphy - discussion of WordNet's morphological processing |
|
|
<H2><A NAME="sect1" HREF="#toc1">DESCRIPTION |
|
|
</A></H2> |
|
|
Although only base forms of words are usually stored in WordNet, searches |
|
|
may be done on inflected forms. A set of morphology functions, Morphy, |
|
|
is applied to the search string to generate a form that is present in |
|
|
WordNet. <P> |
|
|
Morphology in WordNet uses two types of processes to try to convert |
|
|
the string passed into one that can be found in the WordNet database. There |
|
|
are lists of inflectional endings, based on syntactic category, that can |
|
|
be detached from individual words in an attempt to find a form of the |
|
|
word that is in WordNet. There are also exception list files, one for |
|
|
each syntactic category, in which a search for an inflected form is done. |
|
|
Morphy tries to use these two processes in an intelligent manner to translate |
|
|
the string passed to the base form found in WordNet. Morphy first checks |
|
|
for exceptions, then uses the rules of detachment. The Morphy functions |
|
|
are not independent from WordNet. After each transformation, WordNet is |
|
|
searched for the resulting string in the syntactic category specified. |
|
|
<P> |
|
|
The Morphy functions are passed a string and a syntactic category. A |
|
|
string is either a single word or a collocation. Since some words, such |
|
|
as <B>axes </B>, can have more than one base form (<B>axe </B> and <B>axis </B>), Morphy works |
|
|
in the following manner. The first time that Morphy is called with a specific |
|
|
string, it returns a base form. For each subsequent call to Morphy made |
|
|
with a <FONT SIZE=-1><B>NULL </B></FONT> |
|
|
string argument, Morphy returns another base form. Whenever |
|
|
Morphy cannot perform a transformation, whether on the first call for |
|
|
a word or subsequent calls, <FONT SIZE=-1><B>NULL </B></FONT> |
|
|
is returned. A transformation to a |
|
|
valid English string will return <FONT SIZE=-1><B>NULL </B></FONT> |
|
|
if the base form of the string |
|
|
is not in WordNet. <P> |
|
|
The morphological functions are found in the WordNet |
|
|
library. See <A HREF="morph.3WN.html"><B>morph</B>(3WN)</A> |
|
|
for information on using these functions. |
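<P>
The calling convention described above (a first call with a string, then repeated calls with NULL until NULL comes back) can be sketched as follows. This is only an illustration in Python, not the WordNet library interface: the class name <B>MorphyCursor</B> and the table <B>TOY_BASE_FORMS</B> are hypothetical, and Python's None stands in for NULL.

```python
from typing import Optional

# Hypothetical toy data: inflected form -> base forms (not real WordNet files).
TOY_BASE_FORMS = {"axes": ["axe", "axis"]}


class MorphyCursor:
    """Mimics the 'call again with NULL' protocol, using None for NULL."""

    def __init__(self):
        self._pending = []

    def morphy(self, word: Optional[str]) -> Optional[str]:
        if word is not None:
            # A new string restarts the search for base forms.
            self._pending = list(TOY_BASE_FORMS.get(word, []))
        # Hand back the next base form, or None when none remain.
        return self._pending.pop(0) if self._pending else None


cursor = MorphyCursor()
form = cursor.morphy("axes")
while form is not None:
    print(form)
    form = cursor.morphy(None)
```

A caller therefore loops until the sentinel value is returned, which is how all base forms of a word such as <B>axes </B> can be collected.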
|
|
<H3><A NAME="sect2" HREF="#toc2">Rules |
|
|
of Detachment </A></H3> |
|
|
The following table shows the rules of detachment used by |
|
|
Morphy. If a word ends with one of the suffixes, that suffix is stripped from the |
|
|
word and the corresponding ending is added. Then WordNet is searched for |
|
|
the resulting string. No rules are applicable to adverbs. <P> |
|
|
<TABLE BORDER=0> |
|
|
<TR> <TD ALIGN=CENTER><B>POS </B> </TD> <TD ALIGN=CENTER><B>Suffix |
|
|
</B> </TD> <TD ALIGN=CENTER><B>Ending </B> </TD> </TR> |
|
|
<TR> <TD ALIGN=LEFT>NOUN </TD> <TD ALIGN=LEFT>"s" </TD> <TD ALIGN=LEFT>"" </TD> </TR> |
|
|
<TR> <TD ALIGN=LEFT>NOUN </TD> <TD ALIGN=LEFT>"ses" </TD> <TD ALIGN=LEFT>"s" </TD> </TR> |
|
|
<TR> <TD ALIGN=LEFT>NOUN </TD> <TD ALIGN=LEFT>"xes" </TD> <TD ALIGN=LEFT>"x" </TD> |
|
|
</TR> |
|
|
<TR> <TD ALIGN=LEFT>NOUN </TD> <TD ALIGN=LEFT>"zes" </TD> <TD ALIGN=LEFT>"z" </TD> </TR> |
|
|
<TR> <TD ALIGN=LEFT>NOUN </TD> <TD ALIGN=LEFT>"ches" </TD> <TD ALIGN=LEFT>"ch" </TD> </TR> |
|
|
<TR> <TD ALIGN=LEFT>NOUN </TD> <TD ALIGN=LEFT>"shes" </TD> <TD ALIGN=LEFT>"sh" </TD> </TR> |
|
|
<TR> <TD ALIGN=LEFT>NOUN |
|
|
</TD> <TD ALIGN=LEFT>"men" </TD> <TD ALIGN=LEFT>"man" </TD> </TR> |
|
|
<TR> <TD ALIGN=LEFT>NOUN </TD> <TD ALIGN=LEFT>"ies" </TD> <TD ALIGN=LEFT>"y" </TD> </TR> |
|
|
<TR> <TD ALIGN=LEFT>VERB </TD> <TD ALIGN=LEFT>"s" </TD> <TD ALIGN=LEFT>"" </TD> </TR> |
|
|
<TR> <TD ALIGN=LEFT>VERB </TD> <TD ALIGN=LEFT>"ies" </TD> <TD ALIGN=LEFT>"y" |
|
|
</TD> </TR> |
|
|
<TR> <TD ALIGN=LEFT>VERB </TD> <TD ALIGN=LEFT>"es" </TD> <TD ALIGN=LEFT>"e" </TD> </TR> |
|
|
<TR> <TD ALIGN=LEFT>VERB </TD> <TD ALIGN=LEFT>"es" </TD> <TD ALIGN=LEFT>"" </TD> </TR> |
|
|
<TR> <TD ALIGN=LEFT>VERB </TD> <TD ALIGN=LEFT>"ed" </TD> <TD ALIGN=LEFT>"e" </TD> </TR> |
|
|
<TR> <TD ALIGN=LEFT>VERB </TD> <TD ALIGN=LEFT>"ed" |
|
|
</TD> <TD ALIGN=LEFT>"" </TD> </TR> |
|
|
<TR> <TD ALIGN=LEFT>VERB </TD> <TD ALIGN=LEFT>"ing" </TD> <TD ALIGN=LEFT>"e" </TD> </TR> |
|
|
<TR> <TD ALIGN=LEFT>VERB </TD> <TD ALIGN=LEFT>"ing" </TD> <TD ALIGN=LEFT>"" </TD> </TR> |
|
|
<TR> <TD ALIGN=LEFT>ADJ </TD> <TD ALIGN=LEFT>"er" </TD> <TD ALIGN=LEFT>"" </TD> </TR> |
|
|
<TR> <TD ALIGN=LEFT>ADJ </TD> <TD ALIGN=LEFT>"est" |
|
|
</TD> <TD ALIGN=LEFT>"" </TD> </TR> |
|
|
<TR> <TD ALIGN=LEFT>ADJ </TD> <TD ALIGN=LEFT>"er" </TD> <TD ALIGN=LEFT>"e" </TD> </TR> |
|
|
<TR> <TD ALIGN=LEFT>ADJ </TD> <TD ALIGN=LEFT>"est" </TD> <TD ALIGN=LEFT>"e" </TD> </TR> |
|
|
</TABLE> |
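<P>
As a rough illustration of how the table above can be applied, the following Python sketch transcribes the rules and tries each matching suffix in turn. The set <B>TOY_LEXICON</B> is a hypothetical stand-in for the WordNet database search, and the function name <B>detach</B> is invented for this example. Unlike Morphy, which returns one base form per call, the sketch collects every candidate in a list.

```python
# (suffix, ending) pairs per syntactic category, copied from the table above.
RULES = {
    "noun": [("s", ""), ("ses", "s"), ("xes", "x"), ("zes", "z"),
             ("ches", "ch"), ("shes", "sh"), ("men", "man"), ("ies", "y")],
    "verb": [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
             ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")],
    "adj": [("er", ""), ("est", ""), ("er", "e"), ("est", "e")],
    "adv": [],  # no rules are applicable to adverbs
}

# Hypothetical stand-in for searching the WordNet database.
TOY_LEXICON = {
    "noun": {"box", "church", "woman", "fly"},
    "verb": {"hope", "hop"},
    "adj": {"large"},
}


def detach(word, pos):
    """Return the candidates found in the toy lexicon, in rule order."""
    found = []
    for suffix, ending in RULES[pos]:
        if word.endswith(suffix):
            # Strip the suffix and add the corresponding ending.
            candidate = word[: len(word) - len(suffix)] + ending
            if candidate in TOY_LEXICON.get(pos, set()) and candidate not in found:
                found.append(candidate)
    return found


print(detach("boxes", "noun"))
print(detach("hoped", "verb"))
```

With this toy lexicon, <B>hoped </B> yields both <B>hope </B> and <B>hop </B>, because two different "ed" rules produce strings that are present.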
|
|
|
|
|
<H3><A NAME="sect3" HREF="#toc3">Exception Lists </A></H3> |
|
|
There is one |
|
|
exception list file for each syntactic category. The exception lists contain |
|
|
the morphological transformations for strings that are not regular and |
|
|
therefore cannot be processed in an algorithmic manner. Each line of an |
|
|
exception list contains an inflected form of a word or collocation, followed |
|
|
by one or more base forms. The list is kept in alphabetical order and |
|
|
a binary search is used to find words in these lists. See <A HREF="wndb.5WN.html"><B>wndb</B>(5WN)</A> |
|
|
for |
|
|
information on the format of the exception list files. |
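<P>
The lookup described above can be sketched in Python with the standard <B>bisect</B> module. The sample lines are hypothetical and only follow the layout stated in this section (an inflected form followed by one or more base forms, sorted alphabetically); the function names <B>parse </B> and <B>lookup </B> are invented for the example.

```python
import bisect

# Hypothetical sample lines, sorted by inflected form.
SAMPLE_LINES = [
    "axes axe axis",
    "children child",
    "geese goose",
    "mice mouse",
]


def parse(lines):
    """Split each line into its inflected form and its base forms."""
    keys, values = [], []
    for line in lines:
        inflected, *bases = line.split()
        keys.append(inflected)
        values.append(bases)
    return keys, values


def lookup(keys, values, word):
    """Binary search over the sorted inflected forms."""
    i = bisect.bisect_left(keys, word)
    if i < len(keys) and keys[i] == word:
        return values[i]
    return []


keys, values = parse(SAMPLE_LINES)
print(lookup(keys, values, "geese"))
```

The binary search only works because the list is kept sorted, which is why the ordering of the exception files matters.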
|
|
<H3><A NAME="sect4" HREF="#toc4">Single Words </A></H3> |
|
|
In |
|
|
general, single words are relatively easy to process. Morphy first looks |
|
|
for the word in the exception list. If it is found, the first base form |
|
|
is returned. Subsequent calls with a <FONT SIZE=-1><B>NULL </B></FONT> |
|
|
argument return additional |
|
|
base forms, if present. A <FONT SIZE=-1><B>NULL </B></FONT> |
|
|
is returned when there are no more base |
|
|
forms of the word. <P> |
|
|
If the word is not found in the exception list corresponding |
|
|
to the syntactic category, an algorithmic process using the rules of detachment |
|
|
looks for a matching suffix. If a matching suffix is found, a corresponding |
|
|
ending is applied (sometimes this ending is an empty |
|
|
string, so in effect |
|
|
the suffix is removed from the word), and WordNet is consulted to see |
|
|
if the resulting word is found in the desired part of speech. |
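<P>
The order of the two steps for single words (exception list first, rules of detachment only if the word is not listed) can be sketched as below for nouns. All data here is hypothetical toy data, and <B>noun_base_forms </B> is an invented name, not a WordNet library function.

```python
# Hypothetical stand-ins for the noun exception list and the noun database.
NOUN_EXCEPTIONS = {"geese": ["goose"], "axes": ["axe", "axis"]}
NOUN_LEXICON = {"goose", "axe", "axis", "dog", "box"}
NOUN_RULES = [("s", ""), ("ses", "s"), ("xes", "x"), ("zes", "z"),
              ("ches", "ch"), ("shes", "sh"), ("men", "man"), ("ies", "y")]


def noun_base_forms(word):
    # Step 1: a word found in the exception list is answered from it.
    if word in NOUN_EXCEPTIONS:
        return list(NOUN_EXCEPTIONS[word])
    # Step 2: otherwise try each matching suffix and keep the candidates
    # that the (toy) database contains.
    results = []
    for suffix, ending in NOUN_RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + ending
            if candidate in NOUN_LEXICON and candidate not in results:
                results.append(candidate)
    return results


print(noun_base_forms("geese"))
print(noun_base_forms("dogs"))
```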
|
|
<H3><A NAME="sect5" HREF="#toc5">Collocations |
|
|
</A></H3> |
|
|
As opposed to single words, collocations can be quite difficult to transform |
|
|
into a base form that is present in WordNet. In general, only base forms |
|
|
of words, even those comprising collocations, are stored in WordNet, such |
|
|
as <B>attorney general </B>. Transforming the collocation <B>attorneys general </B> |
|
|
is then simply a matter of finding the base forms of the individual words |
|
|
comprising the collocation. This usually works for nouns; non-conforming |
|
|
nouns, such as <B>customs duty </B>, are therefore entered in the noun exception |
|
|
list. <P> |
|
|
Verb collocations that contain prepositions, such as <B>ask for it |
|
|
</B>, are more difficult. As with single words, the exception list is searched |
|
|
first. If the collocation is not found, special code in Morphy determines |
|
|
whether a verb collocation includes a preposition. If it does, a function |
|
|
is called to try to find the base form in the following manner. It is |
|
|
assumed that the first word in the collocation is a verb and that the |
|
|
last word is a noun. The algorithm then builds a search string with the |
|
|
base forms of the verb and noun, leaving the remainder of the collocation |
|
|
(usually just the preposition, but more words may be involved) in the |
|
|
middle. For example, if passed <B>asking for it </B>, the database search would |
|
|
be performed with <B>ask for it </B>, which is found in WordNet and is therefore |
|
|
returned by Morphy. If a verb collocation does not contain a preposition, |
|
|
then the base form of each word in the collocation is found and WordNet |
|
|
is searched for the resulting string. |
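<P>
The handling of verb collocations that contain a preposition can be sketched as follows. The preposition set, the base-form tables and the function name <B>verb_collocation_base </B> are all assumptions made for this example; the actual code consults the exception lists and the WordNet database instead of small dictionaries, and the sketch does not cover the case without a preposition.

```python
# Hypothetical toy data for illustration only.
PREPOSITIONS = {"for", "to", "at", "in", "on", "of"}
VERB_BASES = {"asking": "ask", "asked": "ask"}
NOUN_BASES = {"trees": "tree"}
VERB_COLLOCATIONS = {"ask for it"}


def verb_collocation_base(collocation):
    words = collocation.split()
    # This sketch only handles a preposition between the first and last word.
    if len(words) < 3 or not any(w in PREPOSITIONS for w in words[1:-1]):
        return None
    first, middle, last = words[0], words[1:-1], words[-1]
    # Assume the first word is a verb and the last word is a noun.
    verb = VERB_BASES.get(first, first)
    noun = NOUN_BASES.get(last, last)
    # Leave the remainder of the collocation untouched in the middle.
    candidate = " ".join([verb, *middle, noun])
    return candidate if candidate in VERB_COLLOCATIONS else None


print(verb_collocation_base("asking for it"))
```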
|
|
<H3><A NAME="sect6" HREF="#toc6">Hyphenation </A></H3> |
|
|
Hyphenation also presents |
|
|
special difficulties when searching WordNet. It is often a subjective decision |
|
|
as to whether a word is hyphenated, joined as one word, or is a collocation |
|
|
of several words, and which of the various forms are entered into WordNet. |
|
|
When Morphy breaks a string into "words", it looks for both spaces and |
|
|
hyphens as delimiters. It also looks for periods in strings and removes |
|
|
them if an exact match is not found. A search for an abbreviation like |
|
|
<B>oct. </B> returns the synset for <B>{ October, Oct } </B>. Not every pattern of hyphenated |
|
|
and collocated string is searched for properly, so it may be advantageous |
|
|
to specify several search strings if the results of a search attempt seem |
|
|
incomplete. |
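<P>
The two behaviours described above, splitting on spaces and hyphens and dropping periods when an exact match fails, can be sketched as below. The lexicon is a hypothetical toy set and both function names are invented for this example.

```python
import re

# Hypothetical entries standing in for strings present in WordNet.
TOY_LEXICON = {"oct", "mother-in-law"}


def split_words(s):
    """Break a string into words, treating spaces and hyphens as delimiters."""
    return [w for w in re.split(r"[ -]", s) if w]


def lookup_with_period_fallback(s):
    """Try an exact match first; if that fails, retry with periods removed."""
    if s in TOY_LEXICON:
        return s
    stripped = s.replace(".", "")
    return stripped if stripped in TOY_LEXICON else None


print(split_words("mother-in-law"))
print(lookup_with_period_fallback("oct."))
```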
|
|
<H3><A NAME="sect7" HREF="#toc7">Special Processing for nouns ending with 'ful' </A></H3> |
|
|
Morphy contains |
|
|
code that searches for nouns ending with <B>ful </B> and performs a transformation |
|
|
on the substring preceding it. It then appends 'ful' back onto the resulting |
|
|
string and returns it. For example, if passed the noun <B>boxesful </B>, it will |
|
|
return <B>boxful </B>. |
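<P>
A minimal sketch of the 'ful' handling, assuming that the transformation of the preceding substring uses the noun rules of detachment and a database check on the stem. Only a subset of the noun rules is listed, the lexicon is hypothetical toy data, and <B>ful_base </B> is an invented name.

```python
# A subset of the noun rules of detachment, enough for this illustration.
NOUN_RULES = [("s", ""), ("xes", "x"), ("ches", "ch"), ("ies", "y")]

# Hypothetical stand-in for the noun database.
NOUN_LEXICON = {"box", "spoon"}


def ful_base(word):
    if not word.endswith("ful"):
        return None
    stem = word[: -len("ful")]
    for suffix, ending in NOUN_RULES:
        if stem.endswith(suffix):
            candidate = stem[: len(stem) - len(suffix)] + ending
            if candidate in NOUN_LEXICON:
                # Append 'ful' back onto the transformed stem.
                return candidate + "ful"
    return None


print(ful_base("boxesful"))
```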
|
|
<H2><A NAME="sect8" HREF="#toc8">BUGS </A></H2> |
|
|
Since many noun collocations contain prepositions, |
|
|
such as <B>line of products </B>, an algorithm similar to that used for verbs |
|
|
should be written for nouns. In the present scheme, if Morphy is passed |
|
|
<B>lines of products </B>, the search string becomes <B>line of product </B>, which |
|
|
is not in WordNet. <P> |
|
|
Morphy will allow non-words to be converted to words, |
|
|
if they follow one of the rules described above. For example, it will |
|
|
happily convert <B>plantes </B> to <B>plant </B>. |
|
|
<H2><A NAME="sect9" HREF="#toc9">ENVIRONMENT VARIABLES (UNIX) </A></H2> |
|
|
|
|
|
<DL> |
|
|
|
|
|
<DT><B>WNHOME</B> |
|
|
</DT> |
|
|
<DD>Base directory for WordNet. Default is <B>/usr/local/WordNet-3.0 </B>. </DD> |
|
|
|
|
|
<DT><B>WNSEARCHDIR</B> |
|
|
</DT> |
|
|
<DD>Directory in which the WordNet database has been installed. Default |
|
|
is <B>WNHOME/dict </B>. </DD> |
|
|
</DL> |
|
|
|
|
|
<H2><A NAME="sect10" HREF="#toc10">REGISTRY (WINDOWS) </A></H2> |
|
|
|
|
|
<DL> |
|
|
|
|
|
<DT><B>HKEY_LOCAL_MACHINE\SOFTWARE\WordNet\3.0\WNHome</B> |
|
|
</DT> |
|
|
<DD>Base directory for WordNet. Default is <B>C:\Program Files\WordNet\3.0 </B>. </DD> |
|
|
</DL> |
|
|
|
|
|
<H2><A NAME="sect11" HREF="#toc11">FILES |
|
|
</A></H2> |
|
|
|
|
|
<DL> |
|
|
|
|
|
<DT><B><I>pos </I>.exc</B> </DT> |
|
|
<DD>morphology exception lists </DD> |
|
|
</DL> |
|
|
|
|
|
<H2><A NAME="sect12" HREF="#toc12">SEE ALSO </A></H2> |
|
|
<A HREF="wn.1WN.html"><B>wn</B>(1WN)</A>, |
<A HREF="wnb.1WN.html"><B>wnb</B>(1WN)</A>, |
<A HREF="binsrch.3WN.html"><B>binsrch</B>(3WN)</A>, |
<A HREF="morph.3WN.html"><B>morph</B>(3WN)</A>, |
<A HREF="wndb.5WN.html"><B>wndb</B>(5WN)</A>, |
<A HREF="wninput.7WN.html"><B>wninput</B>(7WN)</A>. <P> |
|
|
|
|
|
<HR><P> |
|
|
<A NAME="toc"><B>Table of Contents</B></A><P> |
|
|
<UL> |
|
|
<LI><A NAME="toc0" HREF="#sect0">NAME</A></LI> |
|
|
<LI><A NAME="toc1" HREF="#sect1">DESCRIPTION</A></LI> |
|
|
<UL> |
|
|
<LI><A NAME="toc2" HREF="#sect2">Rules of Detachment</A></LI> |
|
|
<LI><A NAME="toc3" HREF="#sect3">Exception Lists</A></LI> |
|
|
<LI><A NAME="toc4" HREF="#sect4">Single Words</A></LI> |
|
|
<LI><A NAME="toc5" HREF="#sect5">Collocations</A></LI> |
|
|
<LI><A NAME="toc6" HREF="#sect6">Hyphenation</A></LI> |
|
|
<LI><A NAME="toc7" HREF="#sect7">Special Processing for nouns ending with 'ful'</A></LI> |
|
|
</UL> |
|
|
<LI><A NAME="toc8" HREF="#sect8">BUGS</A></LI> |
|
|
<LI><A NAME="toc9" HREF="#sect9">ENVIRONMENT VARIABLES (UNIX)</A></LI> |
|
|
<LI><A NAME="toc10" HREF="#sect10">REGISTRY (WINDOWS)</A></LI> |
|
|
<LI><A NAME="toc11" HREF="#sect11">FILES</A></LI> |
|
|
<LI><A NAME="toc12" HREF="#sect12">SEE ALSO</A></LI> |
|
|
</UL> |
|
|
</BODY></HTML> |
|
|
|