	<!-- manual page source format generated by PolyglotMan v3.0.3a12, -->
	<!-- available via anonymous ftp from ftp.cs.berkeley.edu:/ucb/people/phelps/tcltk/rman.tar.Z -->

	<HTML>
	<HEAD>
	<TITLE>GRIND(1) manual page</TITLE>
	</HEAD>
	<BODY>
	<A HREF="#toc">Table of Contents</A><P>

	<H2><A NAME="sect0" HREF="#toc0">NAME </A></H2>
	grind - process WordNet lexicographer files
	<H2><A NAME="sect1" HREF="#toc1">SYNOPSIS </A></H2>
	<B>grind </B> [ <B>-v
	</B> ] [ <B>-s </B> ] [ <B>-L </B><I>logfile </I> ] [ <B>-a </B> ] [ <B>-d </B> ] [ <B>-i </B> ] [ <B>-o </B> ] [ <B>-n </B> ] <I>filename </I>
	[ <I>filename </I>... ]
	<H2><A NAME="sect2" HREF="#toc2">DESCRIPTION </A></H2>
	<B>grind() </B> processes WordNet lexicographer files,
	producing database files suitable for use with the WordNet search and
	interface code and other applications. The syntactic and structural integrity
	of the input files is verified. Warnings and errors are reported via <B>stderr
	</B> and a run-time log is produced on <B>stdout </B>. A database is generated only
	if there are no errors.
	<H3><A NAME="sect3" HREF="#toc3">Input Files </A></H3>
	Input files correspond to the syntactic
	categories implemented in WordNet - <B>noun</B>, <B>verb</B>, <B>adjective</B> and <B>adverb</B>.
	Each input lexicographer file consists of a list of synonym sets (<I>synsets
	</I>) for one part of speech. Although the basic synset syntax is the same
	for all of the parts of speech, some parts of the syntax only apply to
	a particular part of speech. See <A HREF="wninput.5WN.html"><B>wninput</B>(5WN)</A>
	for a description of the
	input file format. <P>
	Each <I>filename </I> specified is of the form: <P>
	<blockquote><I>pathname</I>/<I>pos</I>.<I>suffix</I></blockquote>
	<P>
	where
	<I>pathname </I> is optional and <I>pos </I> is either <B>noun</B>, <B>verb</B>, <B>adj</B> or <B>adv</B>. <I>suffix
	</I> may be used to separate groups of synsets into different files, for example
	<B>noun.animal </B> and <B>noun.plant </B>. One or more input files, in any combination
	of syntactic categories, may be specified. See <A HREF="lexnames.5WN.html"><B>lexnames</B>(5WN)</A>
	for a list
	of the lexicographer files used to build the complete WordNet database.
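	<P>
	The filename form described above can be illustrated with a minimal Python sketch. It is not part of <B>grind</B>; the function name <I>parse_lexfile_name</I> and the example path are hypothetical, and the sketch assumes that <I>suffix</I> may be omitted.

```python
import os

# Part-of-speech prefixes named in the text above.
VALID_POS = ("noun", "verb", "adj", "adv")

def parse_lexfile_name(filename):
    """Split 'pathname/pos.suffix' into (pos, suffix).

    Returns None when the base name does not start with a known pos.
    Assumption: a missing suffix is accepted and reported as "".
    """
    base = os.path.basename(filename)      # drop the optional pathname
    pos, _, suffix = base.partition(".")   # text before and after the first dot
    if pos not in VALID_POS:
        return None
    return pos, suffix

print(parse_lexfile_name("dict/dbfiles/noun.animal"))
print(parse_lexfile_name("verb.motion"))
print(parse_lexfile_name("README"))
```

	<P>
	A name that does not begin with one of the four <I>pos </I> values yields <B>None</B> in this sketch, which loosely mirrors the "No input files processed." diagnostic.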

	<H3><A NAME="sect4" HREF="#toc4">Output Files </A></H3>
	<B>grind() </B> produces the following output files: <P>
	<TABLE BORDER=0>
	<TR> <TD ALIGN=CENTER><B>Filename</B></TD> <TD ALIGN=CENTER><B>Description</B></TD> </TR>
	<TR> <TD ALIGN=LEFT><B>index.<I>pos </I></B> </TD> <TD ALIGN=LEFT>Index file for each syntactic category </TD> </TR>
	<TR> <TD ALIGN=LEFT><B>data.<I>pos </I></B> </TD> <TD ALIGN=LEFT>Data file for each syntactic category </TD> </TR>
	<TR> <TD ALIGN=LEFT><B>index.sense </B> </TD> <TD ALIGN=LEFT>Sense
	index </TD> </TR>
	</TABLE>
	<P>
	See <A HREF="wndb.5WN.html"><B>wndb</B>(5WN)</A>
	for a description of the database file formats.
	<P>
	Each time <B>grind() </B> is run, any existing database files are overwritten
	with the database files generated from the specified input files. If no
	input files from a syntactic category are specified, the corresponding
	database files are not overwritten.
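	<P>
	The overwrite rule above can be sketched as follows. This is an illustration only, not <B>grind</B>'s own logic: the function name <I>files_overwritten</I> is hypothetical, and the sketch covers only the per-category <B>index.<I>pos </I></B> and <B>data.<I>pos </I></B> files.

```python
import os

def files_overwritten(filenames):
    """List the index.pos / data.pos files that would be regenerated.

    Only categories with at least one input file are included, so the
    database files of the other categories are left untouched.
    """
    categories = []
    for name in filenames:
        pos = os.path.basename(name).partition(".")[0]
        if pos in ("noun", "verb", "adj", "adv") and pos not in categories:
            categories.append(pos)
    out = []
    for pos in categories:
        out += ["index." + pos, "data." + pos]
    return out

print(files_overwritten(["noun.animal", "noun.plant", "verb.motion"]))
```

	<P>
	With only noun and verb inputs, the adjective and adverb database files do not appear in the result.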
	<H3><A NAME="sect5" HREF="#toc5">Sense Numbers </A></H3>
	Senses are generally
	ordered from most to least frequently used, with the most common sense
	numbered <B>1 </B>. Frequency of use is determined by the number of times a sense
	is tagged in the various semantic concordance texts. Senses that are not
	semantically tagged follow the ordered senses in an arbitrary order.
	Note that this ordering is only an estimate based on usage in a small
	corpus. <P>
	The <I>tagsense_cnt </I> field for each entry in the <B>index.<I>pos </I></B> files
	indicates how many of the senses in the list have been tagged. <P>
	The <B>cntlist
	</B> file provided with the database lists the number of times each sense
	is tagged in the semantic concordances. <B>grind() </B> uses the data from <B>cntlist
	</B> to order the senses of each word. When the <B>index.<I>pos </I></B> files are generated,
	the <I>synset_offset </I>s are output in sense number order, with sense 1 first
	in the list. Senses with the same number of semantic tags are assigned
	unique but consecutive sense numbers. The WordNet <FONT SIZE=-1><B>OVERVIEW </B></FONT>
	search displays
	all senses of the specified word, in all syntactic categories, and indicates
	which of the senses are represented in the semantically tagged texts.
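	<P>
	The ordering described in this section can be sketched in Python under stated assumptions. The function name <I>order_senses</I>, the input mapping, and the sample offsets and counts are all hypothetical; how <B>grind</B> breaks ties internally is not specified here, so the sketch simply keeps input order for equal counts.

```python
def order_senses(tag_counts):
    """Order the senses of one word by semantic tag count.

    tag_counts maps a synset_offset to the number of times that sense
    was tagged (0 for untagged senses). Returns the offsets in sense
    number order plus the number of tagged senses (tagsense_cnt).
    """
    # sorted() is stable: senses with equal counts keep their input
    # order, so each still receives a unique, consecutive sense number.
    ordered = sorted(tag_counts, key=lambda off: -tag_counts[off])
    tagsense_cnt = sum(1 for count in tag_counts.values() if count > 0)
    return ordered, tagsense_cnt

# Hypothetical offsets and counts, for illustration only.
counts = {"00001740": 0, "00002137": 5, "00003553": 12, "00004258": 5}
ordered, tagsense_cnt = order_senses(counts)
for sense_number, offset in enumerate(ordered, start=1):
    print(sense_number, offset, counts[offset])
print("tagsense_cnt =", tagsense_cnt)
```

	<P>
	Untagged senses have a count of zero and therefore sort after every tagged sense.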

	<H2><A NAME="sect6" HREF="#toc6">OPTIONS </A></H2>

	<DL>

	<DT><B>-v</B> </DT>
	<DD>Verify integrity of input without generating database. </DD>

	<DT><B>-s</B> </DT>
	<DD>Suppress
	generation of warning messages. Usually <B>grind </B> is run with this option
	until all syntactic and structural errors are corrected since the warning
	messages may make it difficult to spot error messages. </DD>

	<DT><B>-L</B><I>logfile</I> </DT>
	<DD>Write
	all messages to <I>logfile </I> instead of <B>stderr </B>. </DD>

	<DT><B>-a</B> </DT>
	<DD>Generate statistical report
	on input files processed. </DD>

	<DT><B>-d</B> </DT>
	<DD>Generate distribution of senses by string
	length report on input files processed. </DD>

	<DT><B>-i</B> </DT>
	<DD>Generate sense index file. </DD>

	<DT><B>-o</B>
	</DT>
	<DD>Order senses using <B>cntlist </B>. </DD>

	<DT><B>-n</B> </DT>
	<DD>Generate nominalization (derivational
	morphology) links in database. </DD>

	<DT><I>filename</I> </DT>
	<DD>Input file of the form described
	in <B>Input Files</B>, above.
	</DD>
	</DL>
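	<P>
	As a rough illustration of how the options combine, the sketch below assembles an argument list in Python. The function and keyword names are hypothetical; only the flags themselves come from the list above. The log file is attached directly to <B>-L</B>, following the usage message shown under DIAGNOSTICS, which is an assumption about the accepted form.

```python
def grind_argv(filenames, verify=False, quiet=False, logfile=None, stats=False,
               lengths=False, sense_index=False, order=False, nominalize=False):
    """Build an argument list for grind (keyword names are illustrative)."""
    argv = ["grind"]
    flags = [
        (verify, "-v"),       # verify input only, no database
        (quiet, "-s"),        # suppress warning messages
        (stats, "-a"),        # statistical report
        (lengths, "-d"),      # senses by string length report
        (sense_index, "-i"),  # generate sense index file
        (order, "-o"),        # order senses using cntlist
        (nominalize, "-n"),   # nominalization links
    ]
    for enabled, flag in flags:
        if enabled:
            argv.append(flag)
    if logfile is not None:
        argv.append("-L" + logfile)  # attached form, as in the usage message
    return argv + list(filenames)

print(grind_argv(["noun.animal", "verb.motion"], quiet=True, logfile="grind.log"))
```

	<P>
	The resulting list could be passed to a process-launching call; the sketch itself does not run <B>grind</B>.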

	<H2><A NAME="sect7" HREF="#toc7">FILES </A></H2>

	<DL>

	<DT><B><I>pos </I>.*</B> </DT>
	<DD>Lexicographer files used to build the database.
	</DD>

	<DT><B>cntlist</B> </DT>
	<DD>File of combined semantic concordance <B>cntlist </B> files. Used to
	assign sense numbers in the WordNet database. </DD>
	</DL>

	<H2><A NAME="sect8" HREF="#toc8">SEE ALSO </A></H2>
	<A HREF="cntlist.5WN.html"><B>cntlist</B>(5WN)</A>,
	<A HREF="lexnames.5WN.html"><B>lexnames</B>(5WN)</A>,
	<A HREF="senseidx.5WN.html"><B>senseidx</B>(5WN)</A>,
	<A HREF="wndb.5WN.html"><B>wndb</B>(5WN)</A>,
	<A HREF="wninput.5WN.html"><B>wninput</B>(5WN)</A>,
	<A HREF="uniqbeg.7WN.html"><B>uniqbeg</B>(7WN)</A>,
	<A HREF="wngloss.7WN.html"><B>wngloss</B>(7WN)</A>.
	<H2><A NAME="sect9" HREF="#toc9">DIAGNOSTICS
	</A></H2>
	Exit status is normally 0. Exit status is -1 if a non-specific error occurs.
	If syntactic or structural errors exist, the exit status is the number of errors
	detected.
	<DL>

	<DT><B>usage: grind [-v] [-s] [-Llogfile] [-a ] [-d] [-i] [-o] [-n] filename
	[filename...]</B> </DT>
	<DD>Invalid options were specified on the command line. </DD>

	<DT><B>No input
	files processed.</B> </DT>
	<DD>None of the filenames specified were of the appropriate
	form. </DD>

	<DT><B><I>n </I> syntactic errors found.</B> </DT>
	<DD>Syntax errors were found while parsing
	the input files. </DD>

	<DT><B><I>n </I> structural errors found.</B> </DT>
	<DD>Pointer errors were found
	that could not be automatically corrected. </DD>
	</DL>
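	<P>
	The exit status rules at the start of this section can be read as the following minimal Python sketch. The function name <I>describe_exit_status</I> is hypothetical. One assumption is added: on POSIX systems a parent process sees only the low 8 bits of the status, so an exit status of -1 is normally observed as 255, and the sketch treats both values alike.

```python
def describe_exit_status(status):
    """Map a grind exit status to a short description."""
    if status == 0:
        return "ok"
    # -1 as documented; 255 is how -1 usually appears to a POSIX parent.
    # A run with exactly 255 errors would be indistinguishable here.
    if status in (-1, 255):
        return "non-specific error"
    return "%d syntactic or structural error(s)" % status

for code in (0, 3, 255):
    print(code, describe_exit_status(code))
```
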

	<H2><A NAME="sect10" HREF="#toc10">BUGS </A></H2>
	Please report bugs to
	<B>wordnet@princeton.edu </B>. <P>

	<HR><P>
	<A NAME="toc"><B>Table of Contents</B></A><P>
	<UL>
	<LI><A NAME="toc0" HREF="#sect0">NAME</A></LI>
	<LI><A NAME="toc1" HREF="#sect1">SYNOPSIS</A></LI>
	<LI><A NAME="toc2" HREF="#sect2">DESCRIPTION</A></LI>
	<UL>
	<LI><A NAME="toc3" HREF="#sect3">Input Files</A></LI>
	<LI><A NAME="toc4" HREF="#sect4">Output Files</A></LI>
	<LI><A NAME="toc5" HREF="#sect5">Sense Numbers</A></LI>
	</UL>
	<LI><A NAME="toc6" HREF="#sect6">OPTIONS</A></LI>
	<LI><A NAME="toc7" HREF="#sect7">FILES</A></LI>
	<LI><A NAME="toc8" HREF="#sect8">SEE ALSO</A></LI>
	<LI><A NAME="toc9" HREF="#sect9">DIAGNOSTICS</A></LI>
	<LI><A NAME="toc10" HREF="#sect10">BUGS</A></LI>
	</UL>
	</BODY></HTML>