mailfoot man page on DragonFly


MAILFOOT(1)							   MAILFOOT(1)

NAME
       mailfoot - a full-online-ordered-training simulator for use with dbacl.

SYNOPSIS
       mailfoot command [ command_arguments ]

DESCRIPTION
       mailfoot	 automates the task of testing email filtering and classifica‐
       tion programs such as dbacl(1).	Given a set of categorized  documents,
       mailfoot	 initiates test runs to estimate the classification errors and
       thereby permit fine tuning of the parameters of the classifier.

       Full Online Ordered Training is a learning method for email classifiers
       where  each  incoming  email  is learned as soon as it arrives, thereby
       always keeping category descriptions up to date for the next  classifi‐
       cation.	 This  directly models the way that some email classifiers are
       used in practice.

       FOOT's error rates depend directly on the order in which emails are
       seen.  A small change in ordering, as might happen due to networking
       delays, can change the number of misclassifications.  Consequently,
       mailfoot does not give meaningful results unless the sample emails
       are chosen carefully.  However, as this method is commonly used by
       spam filters, it is still worth computing to foster comparisons.
       Other methods (see mailcross(1), mailtoe(1)) attempt to capture the
       behaviour of classification errors in other ways.

       To  improve and stabilize the error rate calculation, mailfoot performs
       the FOOT simulations several times on slightly reordered email streams,
       and  averages  the  results.  The reorderings occur by multiplexing the
       emails from each category mailbox in random order. Thus	if  there  are
       three  categories,  the	first email classified is chosen randomly from
       the front of the sample email streams of each type.  The	 second	 email
       is also chosen randomly among the three types, from the front of the
       streams after the first email was removed.  Simulation stops when all
       sample streams are exhausted.
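
       The multiplexing described above can be sketched as follows.  This is
       an illustration only, not mailfoot's actual code: the function name
       multiplex, the stream files holding one message identifier per line,
       and the seed argument are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sketch: randomly interleave several ordered streams.
# Usage: multiplex SEED STREAMFILE...
# Each stream file holds one message identifier per line; the order
# inside each file is preserved in the output.
multiplex() {
    seed=$1; shift
    awk -v seed="$seed" '
        BEGIN { srand(seed) }
        # Load every stream into memory, remembering its length.
        FNR == 1 { nstreams++ }
        { len[nstreams]++; msg[nstreams, len[nstreams]] = $0 }
        END {
            for (s = 1; s <= nstreams; s++) pos[s] = 1
            remaining = NR
            while (remaining > 0) {
                # Collect the streams that still have a message in front.
                nlive = 0
                for (s = 1; s <= nstreams; s++)
                    if (pos[s] <= len[s]) live[++nlive] = s
                # Pick one of them at random and emit its front message.
                s = live[int(rand() * nlive) + 1]
                print msg[s, pos[s]]
                pos[s]++
                remaining--
            }
        }
    ' "$@"
}
```

       With three hypothetical files spam.ids, work.ids and play.ids, the
       call "multiplex 1 spam.ids work.ids play.ids" prints one possible
       classification order, and a different seed gives another reordering.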

       mailfoot uses the environment variable MAILFOOT_FILTER when  executing,
       which  permits the simulation of arbitrary filters, provided these sat‐
       isfy the compatibility conditions stated	 in  the  ENVIRONMENT  section
       below.

       For  convenience, mailfoot implements a testsuite framework with prede‐
       fined wrappers for several open source classifiers.  This  permits  the
       direct  comparison  of  dbacl(1) with competing classifiers on the same
       set of email samples. See the USAGE section below.

       During preparation, mailfoot builds a subdirectory named mailfoot.d  in
       the  current  working directory.	 All needed calculations are performed
       inside this subdirectory.

EXIT STATUS
       mailfoot returns 0 on success, 1 if a problem occurred.

COMMANDS
       prepare size
	      Prepares a subdirectory named mailfoot.d in the current  working
	      directory,  and  populates  it  with  empty  subdirectories  for
	      exactly size subsets.

       add category [ FILE ]...
	      Takes a set of emails from either FILE if specified,  or	STDIN,
	      and  associates  them  with  category.   The  ordering of emails
	      within FILE is preserved, and subsequent FILEs are  appended  to
	      the  first  in each category.  This command can be repeated sev‐
	      eral times, but should be executed at least once.

       clean  Deletes the directory mailfoot.d and all its contents.

       run    Multiplexes randomly from the email streams added earlier,
	      classifying each email and then learning it into its true
	      category.  The simulation is repeated size times.

       summarize
	      Prints average error rates for the simulations.

       plot [ ps | logscale ]...
	      Plots the number of errors over simulation time.  The "ps"
	      option, if present, writes the plot to a postscript file in the
	      directory mailfoot.d/plots instead of showing it on-screen.  The
	      "logscale" option, if present, plots both axes on a log scale.

       review truecat predcat
	      Scans the last run statistics  and  extracts  all	 the  messages
	      which  belong  to category truecat but have been classified into
	      category predcat.	 The extracted	messages  are  copied  to  the
	      directory mailfoot.d/review for perusal.

       testsuite list
	      Shows  a	list of available filters/wrapper scripts which can be
	      selected.

       testsuite select [ FILTER ]...
	      Prepares the filter(s) named FILTER to be used  for  simulation.
	      The  filter  name is the name of a wrapper script located in the
	      directory /usr/local/share/dbacl/testsuite.  Each filter	has  a
	      rigid  interface	documented  below, and the act of selecting it
	      copies it to  the	 mailfoot.d/filters  directory.	 Only  filters
	      located there are used in the simulations.

       testsuite deselect [ FILTER ]...
	      Removes  the  named filter(s) from the directory mailfoot.d/fil‐
	      ters so that they are not used in the simulation.

       testsuite run [ plots ]
	      Invokes every selected filter on the datasets added  previously,
	      and calculates misclassification rates. If the "plots" option is
	      present, each filter simulation is plotted as a postscript  file
	      in the directory mailfoot.d/plots.

       testsuite status
	      Describes the scheduled simulations.

       testsuite summarize
	      Shows the simulation results for all filters.  Only makes
	      sense after the testsuite run command.

USAGE
       The normal usage pattern is the following: first, you  should  separate
       your  email collection into several categories (manually or otherwise).
       Each category should be associated with one or more folders,  but  each
       folder  should  not  contain  more  than one category. Next, you should
       decide how many runs to use, say 10.  The more runs you use, the more
       reliable the estimated error rates, but more runs take more time.
       Now you can type

       % mailfoot prepare 10

       Next, for every category, you must add  every  folder  associated  with
       this  category. Suppose you have three categories named spam, work, and
       play, which are associated with the mbox	 files	spam.mbox,  work.mbox,
       and play.mbox respectively. You would type

       % mailfoot add spam spam.mbox
       % mailfoot add work work.mbox
       % mailfoot add play play.mbox

       You  should aim for a similar number of emails in each category, as the
       random multiplexing will be unbalanced otherwise. The ordering  of  the
       email  messages in each *.mbox file is important, and is preserved dur‐
       ing each simulation. If you repeatedly add to the  same	category,  the
       later  mailboxes	 will be appended to the first, preserving the implied
       ordering.

       You can now perform as many FOOT simulations  as	 desired.  The	multi‐
       plexed  emails  are  classified and learned one at a time, by executing
       the command given in the environment variable MAILFOOT_FILTER.  If  not
       set, a default value is used.

       % mailfoot run
       % mailfoot summarize
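
       The averaging behind the summary can be pictured with a small sketch.
       The input format used here (one line per simulation, giving the number
       of misclassified emails and the number of classified emails) is an
       assumption made for illustration; it is not the format of mailfoot's
       own log files, and average_error_rate is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical sketch: average the error rate over several runs.
# Reads lines of the form "<errors> <total>" on STDIN, one per run,
# and prints the mean error rate as a percentage.
average_error_rate() {
    awk '
        $2 > 0 { sum += $1 / $2; runs++ }
        END {
            if (runs == 0) { print "no runs"; exit 1 }
            printf "%.2f%%\n", 100 * sum / runs
        }
    '
}
```

       For instance, two runs with 5 and 15 errors out of 100 emails each
       average to an error rate of 10.00%.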

       The  testsuite  commands	 are  designed to simplify the above steps and
       allow comparison of a wide range of email  classifiers,	including  but
       not  limited  to	 dbacl.	  Classifiers  are  supported  through wrapper
       scripts, which  are  located  in	 the  /usr/local/share/dbacl/testsuite
       directory.

       The  first stage when using the testsuite is deciding which classifiers
       to compare.  You can view a list of available wrappers by typing:

       % mailfoot testsuite list

       Note that the wrapper scripts are NOT  the  actual  email  classifiers,
       which must be installed separately by your system administrator or oth‐
       erwise.	Once this is done, you can select one or more wrappers for the
       simulation by typing, for example:

       % mailfoot testsuite select dbaclA ifile

       If some of the selected classifiers cannot be found on the system, they
       are not selected. Note also that some wrappers can have hard-coded cat‐
       egory  names,  e.g.  if the classifier only supports binary classifica‐
       tion. Heed the warning messages.

       It remains only to run the simulation. Beware, this  can	 take  a  long
       time (several hours depending on the classifier).

       % mailfoot testsuite run
       % mailfoot testsuite summarize

       Once you are all done, you can delete the working files, log files etc.
       by typing

       % mailfoot clean
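
       The basic steps above can be collected into a small helper.  This is a
       sketch only: the function name foot_experiment and the convention of
       one CATEGORY.mbox file per category inside a single directory are
       assumptions, while the mailfoot commands are the ones documented in
       this page.

```shell
#!/bin/sh
# Hypothetical helper: run a complete FOOT experiment.
# Usage: foot_experiment RUNS MBOXDIR CATEGORY...
# Assumes MBOXDIR contains one file CATEGORY.mbox per category.
foot_experiment() {
    runs=$1; mboxdir=$2; shift 2
    mailfoot prepare "$runs" || return 1
    for category in "$@"; do
        # Message order inside each mbox file is preserved by mailfoot.
        mailfoot add "$category" "$mboxdir/$category.mbox" || return 1
    done
    mailfoot run || return 1
    mailfoot summarize
}
```

       Calling "foot_experiment 10 . spam work play" would then repeat the
       example session shown earlier, leaving mailfoot.d in place so that the
       results can still be plotted or reviewed before cleaning up.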

SCRIPT INTERFACE
       mailfoot testsuite takes care of learning and classifying your prepared
       email  corpora  for  each  selected  classifier. Since classifiers have
       widely varying interfaces, this is  only	 possible  by  wrapping	 those
       interfaces individually into a standard form which can be used by mail‐
       foot testsuite.

       Each wrapper script is a command line tool which accepts a single  com‐
       mand followed by zero or more optional arguments, in the standard form:

       wrapper command [argument]...

       Each  wrapper  script  also  makes  use	of  STDIN and STDOUT in a well
       defined way. If no behaviour is described,  then	 no  output  or	 input
       should be used.	The possible commands are described below:

       filter In this case, a single email is expected on STDIN, and a list of
	      category filenames is expected in $2, $3, etc. The script writes
	      the category name corresponding to the input email on STDOUT. No
	      trailing newline is required or expected.

       learn  In this case, a standard mbox stream is expected on STDIN, while
	      a	 suitable  category  file name is expected in $2. No output is
	      written to STDOUT.

       clean  In this case, a directory is expected in $2, which  is  examined
	      for  old	database  information. If any old databases are found,
	      they are purged or reset. No output is written to STDOUT.

       describe
	      In this case, a single line of text is written to STDOUT,
	      describing  the  filter's functionality. The line should be kept
	      short to prevent line wrapping on a terminal.

       bootstrap
	      In this case, a directory is expected in $2. The wrapper	script
	      first checks for the existence of its associated classifier, and
	      other prerequisites. If the check is successful, then the	 wrap‐
	      per is cloned into the supplied directory.  A courtesy notifica‐
	      tion should be given on STDOUT to express	 success  or  failure.
	      It is also permissible to give longer descriptions and caveats.

       toe    Used by mailtoe(1).

       foot   In  this	case, a list of categories is expected in $3, $4, etc.
	      Every possible category must be listed. Preceding this list, the
	      true category is given in $2.
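
       The interface above can be illustrated with a toy wrapper.  This is a
       sketch under stated assumptions, not one of the wrappers shipped with
       dbacl: the name toy_wrapper, the word-overlap "classifier", and the
       .toy database suffix are invented for the example, and the bootstrap
       and toe commands are left out.  The command word is shifted away, so
       the argument documented as $2 arrives as the first remaining argument.
       The foot command follows the DESCRIPTION in learning every email into
       its true category after classifying it, which is an assumption about
       what a real wrapper does.

```shell
#!/bin/sh
# Hypothetical wrapper skeleton around a toy word-overlap classifier.
toy_wrapper() {
    cmd=$1; [ $# -gt 0 ] && shift
    case "$cmd" in
    filter)
        # Email on STDIN, category files in the remaining arguments.
        words=$(mktemp)
        tr -cs '[:alpha:]' '\n' | sed '/^$/d' | sort -u > "$words"
        best=; bestscore=-1
        for category in "$@"; do
            # Score = number of distinct email words already learned.
            score=0
            if [ -f "$category" ]; then
                score=$(sort -u "$category" | grep -c -x -F -f "$words" || true)
            fi
            if [ "$score" -gt "$bestscore" ]; then
                best=$category; bestscore=$score
            fi
        done
        rm -f "$words"
        # No trailing newline, as the interface allows.
        printf '%s' "$best"
        ;;
    learn)
        # mbox stream on STDIN, category file as first remaining argument.
        tr -cs '[:alpha:]' '\n' | sed '/^$/d' >> "$1"
        ;;
    clean)
        # Directory as first remaining argument: purge old toy databases.
        rm -f "$1"/*.toy
        ;;
    describe)
        echo "toy word-overlap classifier (illustration only)"
        ;;
    foot)
        # True category first, then every possible category.
        truecat=$1; shift
        msg=$(mktemp)
        cat > "$msg"
        toy_wrapper filter "$@" < "$msg"
        toy_wrapper learn "$truecat" < "$msg"
        rm -f "$msg"
        ;;
    *)
        echo "usage: toy_wrapper command [argument]..." >&2
        return 1
        ;;
    esac
}
```

       A real wrapper would be a standalone script that replaces the toy
       scoring with calls to the classifier it wraps; only the command
       dispatch and the use of STDIN and STDOUT carry over.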

ENVIRONMENT
       Right  after loading, mailfoot reads the hidden file .mailfootrc in the
       $HOME directory, if it exists, so this would be a good place to	define
       custom values for environment variables.

       MAILFOOT_FILTER
	      This variable contains a shell command to be executed repeatedly
	      during the running stage.	 The command should  accept  an	 email
	      message  on  STDIN  and output a resulting category name. On the
	      command line, it should also  accept  first  the	true  category
	      name,  then  a list of all possible category file names.	If the
	      output category does not match the true category, then the rele‐
	      vant    categories   are	 assumed   to	have   been   silently
	      updated/relearned.  If MAILFOOT_FILTER  is  undefined,  mailfoot
	      uses a default value.

       TEMPDIR
	      This  directory  is exported for the benefit of wrapper scripts.
	      Scripts which need to create temporary files should place them at
	      the location given in TEMPDIR.
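
       A .mailfootrc along the following lines would select a custom filter.
       This is a sketch under assumptions: it supposes the file is read as
       ordinary shell assignments, and the script path $HOME/bin/mywrapper
       is hypothetical, as is the choice to pass foot as its first argument.

```shell
# ~/.mailfootrc (hypothetical example)
# Command executed for every email during the running stage.  The true
# category and the list of category files follow on its command line.
MAILFOOT_FILTER="$HOME/bin/mywrapper foot"
export MAILFOOT_FILTER
```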

NOTES
       The  subdirectory  mailfoot.d  can grow quite large. It contains a full
       copy of the training corpora, as well as learning files for size	 times
       all the added categories, and various log files.

       FOOT simulations for dbacl(1) are very, very slow (order n squared) and
       will take all night to perform. This is not easy to improve.
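
       The quadratic growth can be made concrete with a rough count.  The
       sketch below assumes, for illustration only, that learning the k-th
       email means reprocessing all k emails seen so far; the function name
       foot_cost is hypothetical.

```shell
#!/bin/sh
# Hypothetical cost model: total emails processed over a FOOT run of
# n emails when step k reprocesses k emails (1 + 2 + ... + n).
foot_cost() {
    n=$1
    echo $(( n * (n + 1) / 2 ))
}

foot_cost 1000    # prints 500500
foot_cost 2000    # prints 2001000, roughly four times as many
```

       Under this model, doubling the number of sample emails roughly
       quadruples the work.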

WARNING
       Because the ordering of emails within the added mailboxes matters,  the
       estimated  error	 rates	are  not well defined or even meaningful in an
       objective sense.	 However, if the sample	 emails	 represent  an	actual
       snapshot	 of a user's incoming email, then the error rates are somewhat
       meaningful. The simulations can then be interpreted as alternate reali‐
       ties where a given classifier would have intercepted the incoming mail.

SOURCE
       The  source code for the latest version of this program is available at
       the following locations:

       http://www.lbreyer.com/gpl.html
       http://dbacl.sourceforge.net

AUTHOR
       Laird A. Breyer <laird@lbreyer.com>

SEE ALSO
       bayesol(1), dbacl(1), mailcross(1), mailinspect(1), mailtoe(1), regex(7)

Version 1.14.1	      Bayesian Text Classification Tools	   MAILFOOT(1)