Mail::Mbox::MessageParser(3)  User Contributed Perl Documentation  Mail::Mbox::MessageParser(3)
NAME
Mail::Mbox::MessageParser - A fast and simple mbox folder reader
SYNOPSIS
#!/usr/bin/perl
use FileHandle;
use Mail::Mbox::MessageParser;
my $file_name = 'mail/saved-mail';
my $file_handle = new FileHandle($file_name);
# Set up cache. (Not necessary if enable_cache is false.)
Mail::Mbox::MessageParser::SETUP_CACHE(
{ 'file_name' => '/tmp/cache' } );
my $folder_reader =
new Mail::Mbox::MessageParser( {
'file_name' => $file_name,
'file_handle' => $file_handle,
'enable_cache' => 1,
'enable_grep' => 1,
} );
die $folder_reader unless ref $folder_reader;
# Any newlines or such before the start of the first email
my $prologue = $folder_reader->prologue;
print $prologue;
# This is the main loop. It's executed once for each email
while(!$folder_reader->end_of_file())
{
my $email = $folder_reader->read_next_email();
print $$email;
}
DESCRIPTION
This module implements a fast but simple mbox folder reader. One of
three implementations (Cache, Grep, Perl) will be used depending on the
wishes of the user and the system configuration. The first
implementation is a cache-based one which stores email information
about mailboxes on the file system. Subsequent accesses will be faster
because no analysis of the mailbox will be needed. The second
implementation is one based on GNU grep, and is significantly faster
than the Perl version for mailboxes which contain very large (10MB)
emails. The final implementation is a fast Perl-based one which should
always be applicable.
The Cache implementation is about 6 times faster than the standard Perl
implementation. The Grep implementation is about 4 times faster than
the standard Perl implementation. If you have GNU grep, it's best to
enable both the Cache and Grep implementations. If the cache
information is available, you'll get very fast speeds. Otherwise,
you'll take about a 1/3 performance hit when the Grep version is used
instead.
The overriding requirement for this module is speed. If you need more
sophisticated parsing, use Mail::MboxParser (which is based on this
module) or Mail::Box.
METHODS AND FUNCTIONS
SETUP_CACHE(...)
SETUP_CACHE( { 'file_name' => <cache file name> } );
<cache file name> - the file name of the cache
Call this function once to set up the cache before creating any
parsers. You must provide the location to the cache file. There is
no default value.
new(...)
new( { 'file_name' => <mailbox file name>,
'file_handle' => <mailbox file handle>,
'enable_cache' => <1 or 0>,
'enable_grep' => <1 or 0>,
'force_processing' => <1 or 0>,
'debug' => <1 or 0>,
} );
<mailbox file name> - the file name of the mailbox
<mailbox file handle> - the already opened file handle for the mailbox
<enable_cache> - true to attempt to use the cache implementation
<enable_grep> - true to attempt to use the grep implementation
<force_processing> - true to force processing of files that look invalid
<debug> - true to print some debugging information to STDERR
The constructor takes either a file name or a file handle, or both.
If the file handle is not defined, Mail::Mbox::MessageParser will
attempt to open the file using the file name. You should always
pass the file name if you have it, so that the parser can cache the
mailbox information.
This module will automatically decompress the mailbox as necessary.
If a filename is available but the file handle is undef, the module
will call either tzip, bzip2, or gzip to decompress the file in memory if
the filename ends with .tz, .bz2, or .gz, respectively. If the file
handle is defined, it will detect the type of compression and apply
the correct decompression program.
The Cache, Grep, or Perl implementation of the parser will be
loaded, whichever is most appropriate. For example, the first time
you use caching, there will be no cache. In this case, the grep
implementation can be used instead. The cache will be updated in
memory as the grep implementation parses the mailbox, and the cache
will be written after the program exits. The file name is optional;
if it is omitted, enable_cache and enable_grep must both be false.
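The rule about the optional file name can be captured in a small helper. This is a minimal sketch, not part of the module: parser_options is a hypothetical name, and the cache path /tmp/cache is borrowed from the SYNOPSIS. The helper only builds the hash that new() expects, switching the Cache and Grep implementations off when no file name is available.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by the module): build the options
# hash for new(). Without a file name, enable_cache and enable_grep
# are set to 0, as the constructor requires.
sub parser_options {
    my ($file_name, $file_handle) = @_;
    my $has_name = defined $file_name ? 1 : 0;
    my %options = (
        'enable_cache' => $has_name,
        'enable_grep'  => $has_name,
    );
    $options{'file_name'}   = $file_name   if defined $file_name;
    $options{'file_handle'} = $file_handle if defined $file_handle;
    return \%options;
}

# Demo: runs only when a mailbox path is given on the command line.
if (@ARGV) {
    require Mail::Mbox::MessageParser;

    # The cache must be set up before any parser is created.
    Mail::Mbox::MessageParser::SETUP_CACHE({ 'file_name' => '/tmp/cache' });

    my $reader = Mail::Mbox::MessageParser->new(parser_options($ARGV[0]));
    die $reader unless ref $reader;    # a plain scalar is an error message

    my $count = 0;
    while (!$reader->end_of_file()) {
        my $email = $reader->read_next_email();
        last unless defined $email;
        $count++;
    }
    print "$count emails\n";
}
```

When only a file handle is at hand, parser_options(undef, $file_handle) yields options with both enable_cache and enable_grep set to 0, so the Perl implementation is the only candidate.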
force_processing will cause the module to process folders that look
to be binary, or whose text data doesn't look like a mailbox.
Returns a reference to a Mail::Mbox::MessageParser object on
success, and a scalar describing an error on failure ("Not a
mailbox", "Can't open <filename>: <system error>", or "Can't execute
<uncompress command> for file <filename>").
reset()
Reset the filehandle and all internal state. Note that this will
not work with filehandles which are streams. If there is enough
demand, I may add the ability to store the previously read stream
data internally so that reset() will work correctly.
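One use for reset() is a two-pass read: count the emails first, then rewind and fetch a specific one. The sketch below is illustrative only. last_email_text is a hypothetical helper that works on any object offering end_of_file(), read_next_email(), and reset() as documented here, and it assumes the parser was opened on a regular file rather than a stream.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text of the last email in a folder.
# Pass 1 counts the emails; reset() rewinds; pass 2 reads up to the last one.
sub last_email_text {
    my ($reader) = @_;

    my $total = 0;
    while (!$reader->end_of_file()) {
        my $email = $reader->read_next_email();
        last unless defined $email;
        $total++;
    }
    return unless $total;    # empty folder

    $reader->reset();

    my $email;
    $email = $reader->read_next_email() for 1 .. $total;
    return $$email;
}
```

Given a parser created with new(), print last_email_text($folder_reader) would print the final email. Reading the folder twice costs more than a single pass, so this pattern only makes sense when the first pass is needed anyway.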
endline()
Returns "\n" or "\r\n", depending on the file format.
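The value returned by endline() is handy when emails from a DOS-format mailbox need Unix line endings. The following is a small sketch under that assumption. to_unix_newlines is a hypothetical helper, not something the module provides; it takes the scalar reference returned by read_next_email() and the string returned by endline().

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the email text with the
# mailbox's line ending replaced by "\n". The original is not modified.
sub to_unix_newlines {
    my ($email_ref, $endline) = @_;
    my $text = $$email_ref;
    $text =~ s/\Q$endline\E/\n/g if $endline ne "\n";
    return $text;
}
```

In a read loop this would be called as to_unix_newlines($email, $folder_reader->endline()).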
prologue()
Returns any newlines or other content at the start of the mailbox
prior to the first email.
end_of_file()
Returns true if the end of the file has been encountered.
line_number()
Returns the line number for the start of the last email read.
number()
Returns the number of the last email read. (The first email is
number 1.)
length()
Returns the length of the last email read.
offset()
Returns the byte offset of the last email read.
read_next_email()
Returns a reference to a scalar holding the text of the next email
in the mailbox, or undef at the end of the file.
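The accessors above all describe the email most recently returned by read_next_email(), so they are normally called inside the read loop. The sketch below shows one way to combine them. summarize_folder is a hypothetical helper name, and the demo at the end runs only when a mailbox path is supplied on the command line.

```perl
use strict;
use warnings;

# Hypothetical helper: collect one summary line per email, using
# number(), line_number(), offset() and length() after each read.
sub summarize_folder {
    my ($reader) = @_;
    my @rows;
    while (!$reader->end_of_file()) {
        my $email = $reader->read_next_email();
        last unless defined $email;    # undef signals the end of the file
        push @rows, sprintf(
            'email %d: line %d, offset %d, length %d',
            $reader->number(), $reader->line_number(),
            $reader->offset(), $reader->length()
        );
    }
    return @rows;
}

# Demo: runs only when a mailbox path is given on the command line.
if (@ARGV) {
    require Mail::Mbox::MessageParser;
    my $reader = Mail::Mbox::MessageParser->new( {
        'file_name'    => $ARGV[0],
        'enable_cache' => 0,
        'enable_grep'  => 0,
    } );
    die $reader unless ref $reader;
    print "$_\n" for summarize_folder($reader);
}
```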
BUGS
No known bugs.
Contact david@coppit.org for bug reports and suggestions.
AUTHOR
David Coppit <david@coppit.org>.
LICENSE
This software is distributed under the terms of the GPL. See the file
"LICENSE" for more information.
HISTORY
This code was originally part of the grepmail distribution. See
http://grepmail.sf.net/ for previous versions of grepmail which
included early versions of this code.
SEE ALSO
Mail::MboxParser, Mail::Box
perl v5.14.1 2011-06-28 Mail::Mbox::MessageParser(3)