Provided by: wordnet-sense-index_3.0-39_all bug

NAME

       index.sense, sense.idx - WordNet's sense index

DESCRIPTION

       The WordNet sense index provides an alternate method for accessing synsets and word senses in the WordNet
       database.   It is useful to applications that retrieve synsets or other information related to a specific
       sense in WordNet, rather than all the senses of a word or collocation.  It can also be  used  with  tools
       like  grep  and  Perl  to  find  all senses of a word in one or more parts of speech.  A specific WordNet
       sense, encoded as a sense_key, can be used as an index into this file to obtain its WordNet sense number,
       the database byte offset of the synset containing the sense, and the number of times it has  been  tagged
       in the semantic concordance texts.

       Concatenating  the  lemma  and lex_sense fields of a semantically tagged word (represented in a <wf ... >
       attribute/value pair) in a semantic concordance file, using % as the concatenation character, creates the
       sense_key for that sense, which can in turn be used to search the sense index file.

       A sense_key is the best way to represent a sense in semantic tagging  or  other  systems  that  refer  to
       WordNet  senses.   sense_keys  are  independent  of  WordNet sense numbers and synset_offsets, which vary
       between versions of the database.  Using the sense index and a sense_key, the corresponding  synset  (via
       the  synset_offset)  and  WordNet sense number can easily be obtained.  A mapping from noun sense_keys in
       WordNet 1.6 to  corresponding  2.0  sense_keys  is  provided  with  version  2.0,  and  is  described  in
       sensemap(5WN).

       See wndb(5WN) for a thorough discussion of the WordNet database files.

   File Format
       The  sense  index  file  lists  all of the senses in the WordNet database with each line representing one
       sense.  The file is in alphabetical order, fields are separated by one space, and each line is terminated
       with a newline character.

       Each line is of the form:

              sense_key  synset_offset  sense_number  tag_cnt

       sense_key is an encoding of the word sense.  Programs can construct a sense key in this format and use it
       as a binary search key into the sense index file.  The format of a sense_key is described below.

       synset_offset is the byte offset that the synset containing the sense is found at in the database  "data"
       file  corresponding  to  the part of speech encoded in the sense_key.  synset_offset is an 8 digit, zero-
       filled decimal integer, and can be used with fseek(3) to read a synset from the data file.   When  passed
       to  the  WordNet  library  function  read_synset()  along  with  the syntactic category, a data structure
       containing the parsed synset is returned.

       sense_number is a decimal integer indicating the sense number of the word,  within  the  part  of  speech
       encoded in sense_key, in the WordNet database.  See wndb(5WN) for information about how sense numbers are
       assigned.

       tag_cnt represents the decimal number of times the sense is tagged in various semantic concordance texts.
       A tag_cnt of 0 indicates that the sense has not been semantically tagged.

   Sense Key Encoding
       A sense_key is represented as:

              lemma%lex_sense

       where lex_sense is encoded as:

              ss_type:lex_filenum:lex_id:head_word:head_id

       lemma  is  the  ASCII  text  of  the  word  or  collocation  as  found in the WordNet database index file
       corresponding to pos.  lemma is in lower case, and collocations are formed by  joining  individual  words
       with an underscore (_) character.

       ss_type is a one digit decimal integer representing the synset type for the sense.  See Synset Type below
       for a listing of the numbers corresponding to each synset type.

       lex_filenum is a two digit decimal integer representing the name of the lexicographer file containing the
       synset for the sense.  See lexnames(5WN) for the list of lexicographer file names and their corresponding
       numbers.

       lex_id  is a two digit decimal integer that, when appended onto lemma, uniquely identifies a sense within
       a lexicographer file.  lex_id numbers usually start with 00, and are incremented as additional senses  of
       the  word are added to the same file, although there is no requirement that the numbers be consecutive or
       begin with 00.  Note that a value of 00 is the default, and therefore is  not  present  in  lexicographer
       files.   Only  non-default  lex_id  values  must  be  explicitly  assigned  in  lexicographer files.  See
       wninput(5WN) for information on the format of lexicographer files.

       head_word is only present if the sense is in an adjective satellite synset.  It is the lemma of the first
       word of the satellite's head synset.

       head_id is a two digit decimal integer that, when appended onto head_word, uniquely identifies the  sense
       of  head_word  within a lexicographer file, as described for lex_id.  There is a value in this field only
       if head_word is present.

   Synset Type
       The synset type is encoded as follows:

              1    NOUN
              2    VERB
              3    ADJECTIVE
              4    ADVERB
              5    ADJECTIVE SATELLITE

NOTES

       For non-satellite senses the head_word and head_id fields have no values,  however  the  field  separator
       character (:) is present.

ENVIRONMENT VARIABLES (UNIX)

       WNHOME              Base directory for WordNet.  Default is /usr/local/WordNet-3.0.

       WNSEARCHDIR         Directory in which the WordNet database has been installed.  Default is WNHOME/dict.

REGISTRY (WINDOWS)

       HKEY_LOCAL_MACHINE\SOFTWARE\WordNet\3.0\WNHome
                           Base directory for WordNet.  Default is C:\Program Files\WordNet\3.0.

FILES

       index.sense         sense index

SEE ALSO

       binsrch(3WN), wnsearch(3WN), lexnames(5WN), wnintro(5WN), sensemap(5WN), wndb(5WN), wninput(5WN).

WordNet 3.0                                         Dec 2006                                       SENSEIDX(5WN)