Lingua::StopWords man page on Fedora

Man page or keyword search:  
man Server   31170 pages
apropos Keyword Search (all sections)
Output format
Fedora logo
[printable version]

Lingua::StopWords(3)  User Contributed Perl Documentation Lingua::StopWords(3)

NAME
       Lingua::StopWords - Stop words for several languages.

SYNOPSIS
	   use Lingua::StopWords qw( getStopWords );
	   my $stopwords = getStopWords('en');

	   my @words = qw( i am the walrus goo goo g'joob );

	   # prints "walrus goo goo g'joob"
	   print join ' ', grep { !$stopwords->{$_} } @words;

DESCRIPTION
       In keyword search, it is common practice to suppress a collection of
       "stopwords": words such as "the", "and", "maybe", etc. which exist in
       in a large number of documents and do not tell you anything important
       about any document which contains them.	This module provides such
       "stoplists" in several languages.

   Supported Languages
	   |-----------------------------------------------------------|
	   | Language	| ISO code | default encoding | also available |
	   |-----------------------------------------------------------|
	   | Danish	| da	   | ISO-8859-1	      | UTF-8	       |
	   | Dutch	| nl	   | ISO-8859-1	      | UTF-8	       |
	   | English	| en	   | ISO-8859-1	      | UTF-8	       |
	   | Finnish	| fi	   | ISO-8859-1	      | UTF-8	       |
	   | French	| fr	   | ISO-8859-1	      | UTF-8	       |
	   | German	| de	   | ISO-8859-1	      | UTF-8	       |
	   | Hungarian	| hu	   | ISO-8859-1	      | UTF-8	       |
	   | Italian	| it	   | ISO-8859-1	      | UTF-8	       |
	   | Norwegian	| no	   | ISO-8859-1	      | UTF-8	       |
	   | Portuguese | pt	   | ISO-8859-1	      | UTF-8	       |
	   | Spanish	| es	   | ISO-8859-1	      | UTF-8	       |
	   | Swedish	| sv	   | ISO-8859-1	      | UTF-8	       |
	   | Russian	| ru	   | KOI8-R	      | UTF-8	       |
	   |-----------------------------------------------------------|

FUNCTIONS
   getStopWords
	   my $stoplist	     = getStopWords('en');
	   my $utf8_stoplist = getStopWords('en', 'UTF-8');

       Retrieve a stoplist in the form of a hashref where the keys are all
       stopwords and the values are all 1.

	   $stoplist = {
	       and => 1,
	       if  => 1,
	       # ...
	   };

       getStopWords() expects 1-2 arguments.  The first, which is required, is
       an ISO code representing a supported language.  If the ISO code cannot
       be found, getStopWords returns undef.

       The second argument should be 'UTF-8' if you want the stopwords encoded
       in UTF-8.  The UTF-8 flag will be turned on, so make sure you
       understand all the implications of that.

SEE ALSO
       The stoplists supplied by this module were created as part of the
       Snowball project (see <http://snowball.tartarus.org>,
       Lingua::Stem::Snowball).

       Lingua::EN::StopWords provides a different stoplist for English.

AUTHOR
       Maintained by Marvin Humphrey <marvin at rectangular dot com>.
       Original author Fabien Potencier, <fabpot at cpan dot org>.

COPYRIGHT AND LICENSE
       Copyright 2004-2008 Fabien Potencier, Marvin Humphrey

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself, either Perl version 5.8.3 or, at
       your option, any later version of Perl 5 you may have available.

perl v5.14.1			  2008-08-22		  Lingua::StopWords(3)
[top]

List of man pages available for Fedora

Copyright (c) for man pages and the logo by the respective OS vendor.

For those who want to learn more, the polarhome community provides shell access and support.

[legal] [privacy] [GNU] [policy] [cookies] [netiquette] [sponsors] [FAQ]
Tweet
Polarhome, production since 1999.
Member of Polarhome portal.
Based on Fawad Halim's script.
....................................................................
Vote for polarhome
Free Shell Accounts :: the biggest list on the net