<HTML> |
|
|
<HEAD> |
|
|
<TITLE>GRIND(1) manual page</TITLE> |
|
|
</HEAD> |
|
|
<BODY> |
|
|
<A HREF="#toc">Table of Contents</A><P> |
|
|
|
|
|
<H2><A NAME="sect0" HREF="#toc0">NAME </A></H2> |
|
|
grind - process WordNet lexicographer files |
|
|
<H2><A NAME="sect1" HREF="#toc1">SYNOPSIS </A></H2> |
|
|
<B>grind </B> [ <B>-v |
|
|
</B> ] [ <B>-s </B> ] [ <B>-L </B><I>logfile </I> ] [ <B>-a </B> ] [ <B>-d </B> ] [ <B>-i </B> ] [ <B>-o </B> ] [ <B>-n </B> ] <I>filename </I> |
|
|
[ <I>filename </I>... ] |
|
|
<H2><A NAME="sect2" HREF="#toc2">DESCRIPTION </A></H2> |
|
|
<B>grind() </B> processes WordNet lexicographer files, |
|
|
producing database files suitable for use with the WordNet search and |
|
|
interface code and other applications. The syntactic and structural integrity |
|
|
of the input files is verified. Warnings and errors are reported via <B>stderr |
|
|
</B> and a run-time log is produced on <B>stdout </B>. A database is generated only |
|
|
if there are no errors. |
|
|
<H3><A NAME="sect3" HREF="#toc3">Input Files </A></H3> |
|
|
Input files correspond to the syntactic |
|
|
categories implemented in WordNet: <B>noun</B>, <B>verb</B>, <B>adjective</B> and <B>adverb</B>.
|
|
Each input lexicographer file consists of a list of synonym sets (<I>synsets |
|
|
</I>) for one part of speech. Although the basic synset syntax is the same |
|
|
for all of the parts of speech, some parts of the syntax only apply to |
|
|
a particular part of speech. See <A HREF="wninput.5WN.html"><B>wninput</B>(5WN)</A>
|
|
for a description of the |
|
|
input file format. <P> |
|
|
Each <I>filename </I> specified is of the form: <P> |
|
|
<blockquote><I>pathname</I>/<I>pos</I>.<I>suffix</I></blockquote>
|
|
<P> |
|
|
where |
|
|
<I>pathname </I> is optional and <I>pos </I> is either <B>noun</B>, <B>verb</B>, <B>adj</B> or <B>adv</B>. <I>suffix
|
|
</I> may be used to separate groups of synsets into different files, for example |
|
|
<B>noun.animal </B> and <B>noun.plant </B>. One or more input files, in any combination |
|
|
of syntactic categories, may be specified. See <A HREF="lexnames.5WN.html"><B>lexnames</B>(5WN)</A>
|
|
for a list |
|
|
of the lexicographer files used to build the complete WordNet database. |
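<P>
The filename form described above can be illustrated with a minimal Python sketch. It is not part of <B>grind</B>: the function name <B>split_lexfile_name</B> is hypothetical, and the sketch assumes that a <I>suffix</I> is always present after the dot, which the text above does not state explicitly.
<PRE>
```python
import os

# Part-of-speech prefixes named in the text above.
VALID_POS = ("noun", "verb", "adj", "adv")


def split_lexfile_name(filename):
    """Split 'pathname/pos.suffix' into (pathname, pos, suffix).

    Returns None when the base name does not start with a known pos
    followed by a dot. Illustrative only: grind's own rules may differ.
    """
    pathname, base = os.path.split(filename)
    pos, dot, suffix = base.partition(".")
    if pos not in VALID_POS or not dot:
        return None
    return pathname, pos, suffix


print(split_lexfile_name("dict/dbfiles/noun.animal"))
print(split_lexfile_name("verb.motion"))
print(split_lexfile_name("README"))
```
</PRE>
A name that does not match the form yields <B>None</B> here, which loosely mirrors the way <B>grind</B> reports that no input files were processed when none of the names has the appropriate form.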
|
|
|
|
|
<H3><A NAME="sect4" HREF="#toc4">Output Files </A></H3> |
|
|
<B>grind() </B> produces the following output files: <P> |
|
|
<TABLE BORDER=0> |
|
|
<TR> <TD ALIGN=CENTER><B>Filename </B></TD> <TD ALIGN=CENTER><B>Description </B></TD> </TR>
<TR> <TD ALIGN=LEFT><B>index.<I>pos </I></B> </TD> <TD ALIGN=LEFT>Index file for each syntactic category </TD> </TR>
<TR> <TD ALIGN=LEFT><B>data.<I>pos </I></B> </TD> <TD ALIGN=LEFT>Data file for each syntactic category </TD> </TR>
<TR> <TD ALIGN=LEFT><B>index.sense </B> </TD> <TD ALIGN=LEFT>Sense index </TD> </TR>
|
|
</TABLE> |
|
|
<P> |
|
|
See <A HREF="wndb.5WN.html"><B>wndb</B>(5WN)</A>
|
|
for a description of the database file formats. |
|
|
<P> |
|
|
Each time <B>grind() </B> is run, any existing database files are overwritten |
|
|
with the database files generated from the specified input files. If no |
|
|
input files from a syntactic category are specified, the corresponding |
|
|
database files are not overwritten. |
|
|
<H3><A NAME="sect5" HREF="#toc5">Sense Numbers </A></H3> |
|
|
Senses are generally |
|
|
ordered from most to least frequently used, with the most common sense |
|
|
numbered <B>1 </B>. Frequency of use is determined by the number of times a sense |
|
|
is tagged in the various semantic concordance texts. Senses that are not |
|
|
semantically tagged follow the ordered senses in an arbitrary order. |
|
|
Note that this ordering is only an estimate based on usage in a small |
|
|
corpus. <P> |
|
|
The <I>tagsense_cnt </I> field for each entry in the <B>index.<I>pos </I></B> files |
|
|
indicates how many of the senses in the list have been tagged. <P> |
|
|
The <B>cntlist |
|
|
</B> file provided with the database lists the number of times each sense |
|
|
is tagged in the semantic concordances. <B>grind() </B> uses the data from <B>cntlist |
|
|
</B> to order the senses of each word. When the <B>index.<I>pos </I></B> files are generated,
|
|
the <I>synset_offset </I>s are output in sense number order, with sense 1 first |
|
|
in the list. Senses with the same number of semantic tags are assigned |
|
|
unique but consecutive sense numbers. The WordNet <FONT SIZE=-1><B>OVERVIEW </B></FONT> |
|
|
search displays |
|
|
all senses of the specified word, in all syntactic categories, and indicates |
|
|
which of the senses are represented in the semantically tagged texts. |
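<P>
The ordering rule described in this section can be sketched in Python. This is an illustration under stated assumptions, not <B>grind</B>'s implementation: the function <B>number_senses</B> and the sense labels are hypothetical, the tag counts are made up, and ties are broken by input order only because the text does not say how <B>grind</B> breaks them.
<PRE>
```python
def number_senses(tag_counts):
    """Return {sense: sense_number}, ordered by descending tag count.

    tag_counts maps a hypothetical sense label to the number of times
    that sense is tagged; a count of 0 means the sense is untagged.
    """
    # sorted() is stable, so senses with equal counts keep their input
    # order and still receive unique, consecutive numbers.
    ordered = sorted(tag_counts, key=lambda sense: -tag_counts[sense])
    return {sense: number for number, sense in enumerate(ordered, start=1)}


# Made-up counts for four senses of one word.
counts = {"money": 20, "river": 25, "slope": 0, "tilt": 20}
numbers = number_senses(counts)

# Number of senses that have been tagged at least once.
tagsense_cnt = sum(1 for count in counts.values() if count)

print(numbers)
print(tagsense_cnt)
```
</PRE>
With these counts the most frequently tagged sense receives number 1, the two senses tied at 20 receive 2 and 3, and the untagged sense comes last; three of the four senses count as tagged.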
|
|
|
|
|
<H2><A NAME="sect6" HREF="#toc6">OPTIONS </A></H2> |
|
|
|
|
|
<DL> |
|
|
|
|
|
<DT><B>-v</B> </DT> |
|
|
<DD>Verify integrity of input without generating database. </DD> |
|
|
|
|
|
<DT><B>-s</B> </DT> |
|
|
<DD>Suppress |
|
|
generation of warning messages. Usually <B>grind </B> is run with this option |
|
|
until all syntactic and structural errors are corrected since the warning |
|
|
messages may make it difficult to spot error messages. </DD> |
|
|
|
|
|
<DT><B>-L</B><I>logfile</I> </DT> |
|
|
<DD>Write |
|
|
all messages to <I>logfile </I> instead of <B>stderr </B>. </DD> |
|
|
|
|
|
<DT><B>-a</B> </DT> |
|
|
<DD>Generate statistical report |
|
|
on input files processed. </DD> |
|
|
|
|
|
<DT><B>-d</B> </DT> |
|
|
<DD>Generate distribution of senses by string |
|
|
length report on input files processed. </DD> |
|
|
|
|
|
<DT><B>-i</B> </DT> |
|
|
<DD>Generate sense index file. </DD> |
|
|
|
|
|
<DT><B>-o</B> |
|
|
</DT> |
|
|
<DD>Order senses using <B>cntlist </B>. </DD> |
|
|
|
|
|
<DT><B>-n</B> </DT> |
|
|
<DD>Generate nominalization (derivational |
|
|
morphology) links in database. </DD> |
|
|
|
|
|
<DT><I>filename</I> </DT> |
|
|
<DD>Input file of the form described in <FONT SIZE=-1><B>Input Files </B></FONT>. </DD>
|
|
</DL> |
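<P>
As a rough illustration of how these options combine, the following Python sketch assembles an argument list for <B>grind</B> without running it. The helper <B>build_grind_argv</B> and its parameter names are hypothetical, only some of the options are covered, and the log file is attached directly to <B>-L</B> on the assumption that the form printed in the usage message is the accepted one.
<PRE>
```python
def build_grind_argv(filenames, verify_only=False, suppress_warnings=False,
                     logfile=None, sense_index=False, order_senses=False):
    """Assemble an argument list for grind from a few of its options."""
    if not filenames:
        raise ValueError("grind needs at least one input filename")
    argv = ["grind"]
    if verify_only:
        argv.append("-v")        # verify input, generate no database
    if suppress_warnings:
        argv.append("-s")        # hide warnings so errors stand out
    if logfile is not None:
        argv.append("-L" + logfile)  # attached form: -Llogfile
    if sense_index:
        argv.append("-i")        # generate the sense index file
    if order_senses:
        argv.append("-o")        # order senses using cntlist
    argv.extend(filenames)
    return argv


# A first pass that only checks the input files for errors.
print(build_grind_argv(["noun.animal", "noun.plant"],
                       verify_only=True, suppress_warnings=True))
```
</PRE>
The resulting list could be handed to a process launcher such as Python's <B>subprocess.run</B>; that step is left out because it needs an installed <B>grind</B> binary.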
|
|
|
|
|
<H2><A NAME="sect7" HREF="#toc7">FILES </A></H2> |
|
|
|
|
|
<DL> |
|
|
|
|
|
<DT><B><I>pos </I>.*</B> </DT> |
|
|
<DD>Lexicographer files used to build the database
|
|
</DD> |
|
|
|
|
|
<DT><B>cntlist</B> </DT> |
|
|
<DD>Combined <B>cntlist </B> files from the semantic concordances, used to assign sense numbers in the WordNet database. </DD>
|
|
</DL> |
|
|
|
|
|
<H2><A NAME="sect8" HREF="#toc8">SEE ALSO </A></H2> |
|
|
<A HREF="cntlist.5WN.html"><B>cntlist</B>(5WN)</A>,
<A HREF="lexnames.5WN.html"><B>lexnames</B>(5WN)</A>,
<A HREF="senseidx.5WN.html"><B>senseidx</B>(5WN)</A>,
<A HREF="wndb.5WN.html"><B>wndb</B>(5WN)</A>,
<A HREF="wninput.5WN.html"><B>wninput</B>(5WN)</A>,
<A HREF="uniqbeg.7WN.html"><B>uniqbeg</B>(7WN)</A>,
<A HREF="wngloss.7WN.html"><B>wngloss</B>(7WN)</A>.
|
|
<H2><A NAME="sect9" HREF="#toc9">DIAGNOSTICS |
|
|
</A></H2> |
|
|
Exit status is normally 0. If a non-specific error occurs, exit status is -1.
If syntactic or structural errors exist, exit status is the number of errors
detected.
|
|
<DL> |
|
|
|
|
|
<DT><B>usage: grind [-v] [-s] [-Llogfile] [-a] [-d] [-i] [-o] [-n] filename [filename...]</B> </DT>
|
|
<DD>Invalid options were specified on the command line. </DD> |
|
|
|
|
|
<DT><B>No input |
|
|
files processed.</B> </DT> |
|
|
<DD>None of the filenames specified were of the appropriate |
|
|
form. </DD> |
|
|
|
|
|
<DT><B><I>n </I> syntactic errors found.</B> </DT> |
|
|
<DD>Syntax errors were found while parsing |
|
|
the input files. </DD> |
|
|
|
|
|
<DT><B><I>n </I> structural errors found.</B> </DT> |
|
|
<DD>Pointer errors were found |
|
|
that could not be automatically corrected. </DD> |
|
|
</DL> |
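<P>
A calling script may want to interpret the exit status described above. The Python sketch below shows one way to do so on a POSIX system, where the parent process sees only the low 8 bits of the status, so -1 appears as 255. The function <B>describe_exit_status</B> is hypothetical, and the sketch assumes that <B>grind</B> passes the error count to the system unchanged.
<PRE>
```python
def describe_exit_status(status):
    """Give a rough reading of grind's exit status as seen by a POSIX parent."""
    if status == 0:
        return "success"
    if status == 255:
        # -1 truncated to 8 bits; a count of 255 errors looks the same.
        return "non-specific error (or 255 errors)"
    return f"{status} syntactic or structural error(s), modulo 256"


print(describe_exit_status(0))
print(describe_exit_status(3))
print(describe_exit_status(-1 % 256))
```
</PRE>
Under the same assumption, an error count that is a multiple of 256 would be indistinguishable from success, so checking the diagnostic messages is the safer test when the input is large.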
|
|
|
|
|
<H2><A NAME="sect10" HREF="#toc10">BUGS </A></H2> |
|
|
Please report bugs to |
|
|
<B>wordnet@princeton.edu </B>. <P> |
|
|
|
|
|
<HR><P> |
|
|
<A NAME="toc"><B>Table of Contents</B></A><P> |
|
|
<UL> |
|
|
<LI><A NAME="toc0" HREF="#sect0">NAME</A></LI> |
|
|
<LI><A NAME="toc1" HREF="#sect1">SYNOPSIS</A></LI> |
|
|
<LI><A NAME="toc2" HREF="#sect2">DESCRIPTION</A></LI> |
|
|
<UL> |
|
|
<LI><A NAME="toc3" HREF="#sect3">Input Files</A></LI> |
|
|
<LI><A NAME="toc4" HREF="#sect4">Output Files</A></LI> |
|
|
<LI><A NAME="toc5" HREF="#sect5">Sense Numbers</A></LI> |
|
|
</UL> |
|
|
<LI><A NAME="toc6" HREF="#sect6">OPTIONS</A></LI> |
|
|
<LI><A NAME="toc7" HREF="#sect7">FILES</A></LI> |
|
|
<LI><A NAME="toc8" HREF="#sect8">SEE ALSO</A></LI> |
|
|
<LI><A NAME="toc9" HREF="#sect9">DIAGNOSTICS</A></LI> |
|
|
<LI><A NAME="toc10" HREF="#sect10">BUGS</A></LI> |
|
|
</UL> |
|
|
</BODY></HTML> |