Tagger(3) User Contributed Perl Documentation Tagger(3)
NAME
Lingua::EN::Tagger - Part-of-speech tagger for English natural language
processing.
SYNOPSIS
use Lingua::EN::Tagger;
# Create a parser object
my $p = Lingua::EN::Tagger->new;
# Add part of speech tags to a text
my $tagged_text = $p->add_tags( $text );
...
# Get a list of all nouns and noun phrases with occurrence counts
my %word_list = $p->get_words( $text );
...
# Get a readable version of the tagged text
my $readable_text = $p->get_readable( $text );
DESCRIPTION
The module is a probability-based, corpus-trained tagger that assigns
POS tags to English text using a lookup dictionary and a set of
probability values. Tags are chosen by conditional probability: the
tagger examines the preceding tag to determine the appropriate tag for
the current word. Unknown words are classified according to word
morphology, or can be configured to be treated as nouns or other parts
of speech.
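The idea of choosing a tag from the preceding tag can be illustrated
with a small greedy sketch. This is not the module's own code: the
%lexicon and %transition tables, their probability values, and the
tag_words function are all hypothetical, and unknown words simply fall
back to a noun tag.

```perl
use strict;
use warnings;

# Hypothetical toy data: P(tag | word) and P(tag | previous tag).
my %lexicon = (
    the  => { det => 1.0 },
    dog  => { nn  => 0.9, vb  => 0.1 },
    runs => { vbz => 0.7, nns => 0.3 },
);
my %transition = (
    start => { det => 0.6, nn  => 0.3, vb => 0.1 },
    det   => { nn  => 0.9, vb  => 0.01 },
    nn    => { vbz => 0.6, nns => 0.2 },
);

# Greedily pick, for each word, the tag that maximises
# P(tag | word) * P(tag | previous tag).
sub tag_words {
    my @words = @_;
    my $prev  = 'start';
    my @tags;
    for my $word (@words) {
        # Unknown words are treated as nouns in this sketch.
        my $candidates = $lexicon{ lc $word } || { nn => 1.0 };
        my ( $best, $best_score ) = ( undef, -1 );
        for my $tag ( sort keys %$candidates ) {
            # Unseen tag pairs get a small assumed floor probability.
            my $score = $candidates->{$tag}
                      * ( $transition{$prev}{$tag} // 0.001 );
            ( $best, $best_score ) = ( $tag, $score )
                if $score > $best_score;
        }
        push @tags, $best;
        $prev = $best;
    }
    return @tags;
}

print join( ' ', tag_words(qw(The dog runs)) ), "\n";   # prints "det nn vbz"
```

With these toy numbers, "runs" is tagged vbz rather than nns because
the preceding nn tag makes a verb far more likely than a plural noun.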
The tagger also extracts as many nouns and noun phrases as it can,
using a set of regular expressions.
CONSTRUCTOR
new %PARAMS
Class constructor. Takes a hash with the following parameters
(shown with default values):
unknown_word_tag => ''
Tag to assign to unknown words
stem => 0
Stem single words using Lingua::Stem::EN
weight_noun_phrases => 0
When returning occurrence counts for a noun phrase, multiply
the value by the number of words in the NP.
longest_noun_phrase => 5
Will ignore noun phrases longer than this threshold. This
affects only the get_words() and get_nouns() methods.
relax => 0
Relax the Hidden Markov Model. This may improve accuracy for
uncommon words, particularly words used in more than one sense.
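As an illustration, a tagger with non-default settings might be
constructed as sketched below. The parameter names come from the list
above; the values are arbitrary, and the tag name 'nn' passed to
unknown_word_tag is an assumption about how noun tags are spelled.

```perl
use strict;
use warnings;
use Lingua::EN::Tagger;

my $p = Lingua::EN::Tagger->new(
    unknown_word_tag    => 'nn',   # assumed tag name: treat unknown words as nouns
    stem                => 0,      # leave single words unstemmed
    weight_noun_phrases => 1,      # multiply NP counts by the NP's word count
    longest_noun_phrase => 3,      # ignore noun phrases longer than 3 words
    relax               => 1,      # relax the Hidden Markov Model
);
```

Parameters that are left out should keep the defaults shown above.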
METHODS
add_tags TEXT
Examine the string provided and return it fully tagged (XML style).
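One way to post-process the tagged string is to pull out word/tag
pairs with a regular expression. The sketch below works on a
hard-coded sample and assumes each token looks like <tag>word</tag>;
the sample string and tag names are illustrative, not captured output
from the module.

```perl
use strict;
use warnings;

# Assumed shape of add_tags output; not real output from the module.
my $tagged = '<det>The</det> <nn>dog</nn> <vbz>barks</vbz>';

# Collect [word, tag] pairs, matching each opening tag to its close.
my @pairs;
while ( $tagged =~ m{<(\w+)>([^<]+)</\1>}g ) {
    push @pairs, [ $2, $1 ];
}

print join( ' ', map { "$_->[0]/$_->[1]" } @pairs ), "\n";
# prints "The/det dog/nn barks/vbz"
```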
get_words TEXT
Given a text string, return as many nouns and noun phrases as
possible. Applies add_tags and involves three stages:
* Tag the text
* Extract all the maximal noun phrases
* Recursively extract all noun phrases from the MNPs
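A minimal usage sketch, assuming get_words returns a flat list of
phrase/count pairs that can be assigned to a hash as in the SYNOPSIS.
The sample text is made up for illustration.

```perl
use strict;
use warnings;
use Lingua::EN::Tagger;

my $p    = Lingua::EN::Tagger->new;
my $text = 'The tagger reads English text. The tagger finds noun phrases.';

# Phrases are keys; occurrence counts are values.
my %word_list = $p->get_words($text);

# Print the most frequent phrases first, ties in alphabetical order.
for my $phrase (
    sort { $word_list{$b} <=> $word_list{$a} || $a cmp $b } keys %word_list
  )
{
    printf "%-30s %d\n", $phrase, $word_list{$phrase};
}
```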
get_readable TEXT
Return an easy-on-the-eyes tagged version of a text string.
Applies add_tags and reformats to be easier to read.
get_sentences TEXT
Returns an anonymous array of sentences (without POS tags) from a
text.
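Because the return value is an array reference rather than a list, it
has to be dereferenced before iterating. A short sketch with made-up
input text:

```perl
use strict;
use warnings;
use Lingua::EN::Tagger;

my $p         = Lingua::EN::Tagger->new;
my $sentences = $p->get_sentences('It rained. We stayed in. Nobody minded.');

# $sentences is an array reference; number each sentence from 1.
for my $i ( 0 .. $#$sentences ) {
    printf "%d: %s\n", $i + 1, $sentences->[$i];
}
```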
get_proper_nouns TAGGED_TEXT
Given a POS-tagged text, this method returns a hash of all proper
nouns and their occurrence frequencies. The method is greedy and
will return multi-word phrases, if possible, so it would find
"Linguistic Data Consortium" as a single unit, rather than as
three individual proper nouns. This method does not stem the found
words.
get_nouns TAGGED_TEXT
Given a POS-tagged text, this method returns all nouns and their
occurrence frequencies.
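Methods that take TAGGED_TEXT expect the output of add_tags, so the
text only needs to be tagged once. The sketch below assumes that
get_nouns, like get_proper_nouns, returns a hash of word => count;
the sample text is illustrative.

```perl
use strict;
use warnings;
use Lingua::EN::Tagger;

my $p      = Lingua::EN::Tagger->new;
my $text   = 'The Linguistic Data Consortium publishes corpora. '
           . 'Researchers use the corpora.';
my $tagged = $p->add_tags($text);    # tag once, reuse below

my %proper = $p->get_proper_nouns($tagged);
my %nouns  = $p->get_nouns($tagged); # assumed to return word => count

for my $set ( [ 'Proper nouns', \%proper ], [ 'Nouns', \%nouns ] ) {
    my ( $label, $counts ) = @$set;
    print "$label:\n";
    print "  $_ => $counts->{$_}\n" for sort keys %$counts;
}
```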
get_max_noun_phrases TAGGED_TEXT
Given a POS-tagged text, this method returns only the maximal noun
phrases. May be called directly, but is also used by
get_noun_phrases.
get_noun_phrases TAGGED_TEXT
Similar to get_words, but requires a POS-tagged text as an
argument.
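To see the difference between the maximal noun phrases and the full
set, both methods can be run on the same tagged text. The return
types are not stated above; this sketch assumes each method returns a
hash of phrase => count, and the sample text is made up.

```perl
use strict;
use warnings;
use Lingua::EN::Tagger;

my $p      = Lingua::EN::Tagger->new;
my $tagged = $p->add_tags('The natural language processing toolkit is free.');

# Both calls are assumed to return phrase => count pairs.
my %max_nps = $p->get_max_noun_phrases($tagged);
my %all_nps = $p->get_noun_phrases($tagged);

printf "maximal noun phrases: %d\n", scalar keys %max_nps;
printf "all noun phrases:     %d\n", scalar keys %all_nps;

# Phrases found only by the fuller extraction.
print "  sub-phrase: $_\n"
    for grep { !exists $max_nps{$_} } sort keys %all_nps;
```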
install
Reads some included corpus data and saves it in a stored hash on
the local file system. This is called automatically if the tagger
can't find the stored lexicon.
AUTHORS
Aaron Coburn <aaron@coburncuadrado.com>
CONTRIBUTORS
Maciej Ceglowski <developer@ceglowski.com>
Eric Nichols, Nara Institute of Science and Technology
COPYRIGHT AND LICENSE
Copyright 2003-2010 Aaron Coburn <aaron@coburncuadrado.com>
This program is free software; you can redistribute it and/or modify
it under the terms of version 3 of the GNU General Public License as
published by the Free Software Foundation.
perl v5.14.2 2010-05-11 Tagger(3)