Provided by: remembrance-agent_2.12-7.2_amd64 bug

NAME

       ra-index - index files for use with remembrance agent software

SYNOPSIS

       ra-index [--version] [-v] [-d] [-s] <base-dir> <source1> [<source2>] [...]  [-e <excludee1> [<excludee2>]
       [...]]

DESCRIPTION

       ra-index  and ra-retrieve make up the Savant search engine, an information retrieval engine designed as a
       back-end for the Remembrance Agent (RA).  Given a collection of the user's accumulated email, usenet news
       articles, papers, saved HTML files and other text notes, the RA attempts to find  those  documents  which
       are  most  relevant  to the user's current context.  That is, it searches this collection of text for the
       documents which bear the highest word-for-word similarity to the text the user is currently  editing,  in
       the  hope  that  they  will also bear high conceptual similarity and thus be useful to the user's current
       work.  With the Emacs front-end, these suggestions are continuously displayed in a small  buffer  at  the
       bottom  of the user's window.  If a suggestion looks useful, the full text can be retrieved with a single
       command.

       The Remembrance Agent works in two stages.  First, the user's collection of  text  documents  is  indexed
       into  a  database  saved  in  a  vector  format.   After  the database is created, the other stage of the
       Remembrance Agent is run from emacs, where it periodically takes a sample of text from the working buffer
       and finds those documents from the collection that are most similar.  It summarizes the top documents  in
       a  small  emacs  window  and allows you to retrieve the entire text of any one with a keystroke.  See the
       README file for information on using the Emacs front-end.

       At its core Savant is a text-retrieval search-engine that uses a standard TF/iDF algorithm, but  it  also
       uses  a  template system to recognize different kinds of documents and extract various field information.
       For example, ra-index can recognize subject lines and address information from email files and file  this
       information  separately.   It can also pull apart file archives into separate documents, e.g. RMAIL files
       are indexed as separate email documents.  Finally, there are filters defined for many document  types  to
       remove extraneous information like HTML tags that might otherwise cause problems in retrieval.  These are
       all  precompiled in a template structure.  It is not currently well documented, though if anyone wants to
       play with it is all defined in the source file templates/conftemplates.c.

       The RA is primarily designed as a proactive information provider that continually gives  you  information
       that  might  be  relevant to your current environment, but Savant can also be used as a standard text and
       information retrieval search engine.

   USAGE
       To index, you must have a set of source text-files, and a directory Savant can put database  files  into.
       The  <source>  arguments may be files or directories.  If a directory is in the list, Savant will use all
       its contents, recursing into all subdirectories.  Non-text files and backup files (those appended with  ~
       or  prepended with #) are ignored.  It also ignores dot-files (those starting with .) and symbolic links.
       Any files or directories specified after the optional -e flag will be  excluded.   Savant  will  use  any
       files  it  finds  to  create  a  database in the specified base directory, which must already exist.  The
       optional -v argument (verbose) will direct Savant to keep you updated on its progress.  So for example,

            ra-index -v ~/RA-indexes/mail ~/RMAIL ~/Rmail-files -e ~/Rmail-files/Old-files
       will build a database in the ~/RA-indexes/mail directory, made up of emails from my RMAIL file  plus  all
       files and subdirectories of ~/Rmail-files, excluding files and directories in ~/Rmail-files/Old-files.

       ra-index can build databases in any directory you like, but the emacs interface for the Remembrance Agent
       expects  a  particular structure.  For each database you want to make, you should create a directory, and
       all these directories should live in the same parent directory.  For example, for my own  use  I  have  a
       directory  ~/RA-indexes/,  and  within that are the directories ~/RA-indexes/mail/, ~/RA-indexes/papers/,
       etc. which actually contain the database files.

   OPTIONS
       -v     Verbose mode.  Print useful information.

       -d     Debug mode.  Print not-so-useful information.

       -e     Exclude all filenames and directories which follow

       -s     Follow symbolic links when indexing

       --version
              Print version information.

SEE ALSO

       ra-retrieve(1)

AUTHOR

       Bradley Rhodes, MIT Media Lab.   Please  send  comments  and  questions  to  ra-bugs@media.mit.edu.   New
       versions and updates can be found at http://www.media.mit.edu/~rhodes/RA/

COPYRIGHT

       All code included in versions up to and including 2.09:
          Copyright (C) 2001 Massachusetts Institute of Technology.

       All modifications subsequent to version 2.09 are copyright Bradley Rhodes or their respective authors.

       Developed  by  Bradley  Rhodes  at the Media Laboratory, MIT, Cambridge, Massachusetts, with support from
       British Telecom and Merrill Lynch.

       This program is free software; you can redistribute it and/or modify  it  under  the  terms  of  the  GNU
       General  Public License as published by the Free Software Foundation; either version 2 of the License, or
       (at your option) any later version.  For commercial licensing under other terms, please consult  the  MIT
       Technology Licensing Office.

       This  program  may be subject to the following US and/or foreign patents (pending): "Method and Apparatus
       for Automated, Context-Dependent Retrieval of Information," MIT Case No. 7870TS. If any of these  patents
       are  granted,  royalty-free  license  to  use  this  and derivative programs under the GNU General Public
       License are hereby granted.

       This program is distributed in the hope that it will be useful, but WITHOUT ANY  WARRANTY;  without  even
       the  implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
       License for more details.

       You should have received a copy of the GNU General Public License along with this program; if not,  write
       to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

BUGS

       Dates are not currently indexed, so anything trying to do a date query gets no suggestion back.

       Requires GNU make to compile.

       The template structure isn't documented.

MIT Media Lab                                   REMEMBRANCE AGENT                                    RA-INDEX(1)