LANGIDENT(1)	      User Contributed Perl Documentation	  LANGIDENT(1)

NAME
       langident - identifies the language files are written in

SYNOPSIS
	 langident [OPTIONS] file1 [file2 ...]

DESCRIPTION
       Identifies the language each file is written in, using the Perl module
       Lingua::Identify.

   OPTIONS
   -a
       Show all results (not just the most probable language).

   -c
       Show the confidence level for the most probable language (printed as
       the first value right after that language).

   -d
       Debug (development only).

   -E ENCODING
       Select an input encoding. Defaults to UTF-8.

	 # use ISO-8859-1 (latin1)
	 langident -E ISO-8859-1 file

   -e METHODS
       Select the method(s) to use. There are three ways of doing this:

	 # simply using a method
	 langident -e ngrams3 file

	 # using several methods (separate them with commas)
	 langident -e prefixes3,suffixes3 file

	 # using several methods and assigning a different weight to each
	 langident -e smallwords=2,prefixes=1,ngrams3=1.3 file

       The available methods are the following: smallwords, prefixes1,
       prefixes2, prefixes3, prefixes4, suffixes1, suffixes2, suffixes3,
       suffixes4, ngrams1, ngrams2, ngrams3 and ngrams4.
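
       Since a METHODS value is a comma-separated list of names with optional
       =WEIGHT suffixes, it can be checked against the list above before
       langident is called. The following is only a sketch: check_methods and
       KNOWN_METHODS are hypothetical names, not part of langident, and the
       sketch assumes that names outside the list above are not accepted.

```shell
#!/bin/sh
# Hypothetical helper (not part of langident): check a -e METHODS value
# against the method names listed in this man page.
KNOWN_METHODS="smallwords prefixes1 prefixes2 prefixes3 prefixes4 suffixes1 suffixes2 suffixes3 suffixes4 ngrams1 ngrams2 ngrams3 ngrams4"

check_methods() {
    # $1 is a spec such as "ngrams2=2,ngrams1"
    old_ifs=$IFS
    IFS=,
    set -f                          # no filename expansion while splitting
    for entry in $1; do
        name=${entry%%=*}           # drop an optional "=WEIGHT" suffix
        case " $KNOWN_METHODS " in
            *" $name "*) ;;         # documented method name
            *)
                IFS=$old_ifs
                set +f
                echo "unknown method: $name" >&2
                return 1
                ;;
        esac
    done
    IFS=$old_ifs
    set +f
    return 0
}
```

       A call such as check_methods "ngrams2=2,ngrams1" && langident -e
       "ngrams2=2,ngrams1" file then runs langident only when every name in
       the spec appears in the list.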

   -h
       Display help message and exit.

   -l
       List all available languages and exit.

   -m NUMBER
       Set the maximum number of results (languages) to display (shows the N
       most probable languages, in descending order of probability).

       Overrides the -a switch.

   -o LANGUAGES
       Only work with specified languages.

	 # identify between Portuguese and English only
	 langident -o pt,en *

   -p
       Also show percentages.

   -s SIZE
       Maximum size to examine.

   -v
       Show version and exit.

EXAMPLES
       Use methods ngrams2 and ngrams1, giving ngrams2 twice the weight of
       ngrams1 (-e switch); the output will include the three most probable
       languages (-m switch) with their percentages (-p switch) and also the
       confidence level (-c switch) of the first result.

	 $ langident -e ngrams2=2,ngrams1 -c -p -m 3 README
	 README:en 65.7209505939491 7.8971987481393 ga 4.11905889385895 tr 4.08487011400505
	 $
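
       The output line above can be split into fields by a script. The sketch
       below works on a copy of that sample line, so it does not need
       langident installed. It assumes the layout seen in the sample
       (FILE:LANG, confidence, then a percentage per language) and a file
       name without spaces or colons. parse_line and top_two_ratio are
       hypothetical helper names. In this sample the confidence value equals
       the top percentage divided by the sum of the top two; that is an
       observation about these numbers, not something this page specifies.

```shell
#!/bin/sh
# Sample line copied from the example above (-c -p -m 3 output).
line='README:en 65.7209505939491 7.8971987481393 ga 4.11905889385895 tr 4.08487011400505'

parse_line() {
    # Prints file, top language and confidence, then "lang percentage" pairs.
    printf '%s\n' "$1" | LC_ALL=C awk '{
        split($1, head, ":")             # "README:en" -> file and language
        printf "file=%s top=%s confidence=%.2f\n", head[1], head[2], $2
        printf "%s %.2f\n", head[2], $3  # percentage of the top language
        for (i = 4; i < NF; i += 2)      # remaining "lang percentage" pairs
            printf "%s %.2f\n", $i, $(i + 1)
    }'
}

top_two_ratio() {
    # 100 * p1 / (p1 + p2), computed from the first two percentages.
    printf '%s\n' "$1" | LC_ALL=C awk '{ printf "%.2f\n", 100 * $3 / ($3 + $5) }'
}

parse_line "$line"
# file=README top=en confidence=65.72
# en 7.90
# ga 4.12
# tr 4.08

top_two_ratio "$line"
# 65.72
```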

TO DO
       ·     Add a switch to ignore HTML tags (and maybe other formats too)

SEE ALSO
       Lingua::Identify(3), Text::ExtractWords(3), Text::Ngram(3),
       Text::Affixes(3).

       A linguist and/or a shrink.

       The latest CVS version of "Lingua::Identify" (which includes langident)
       can be obtained at
       http://natura.di.uminho.pt/natura/viewcvs.cgi/Lingua/Identify/

       ISO 639 Language Codes, at http://www.w3.org/WAI/ER/IG/ert/iso639.htm

AUTHOR
       Jose Alves de Castro, <cog@cpan.org>

COPYRIGHT AND LICENSE
       Copyright 2004 by Jose Alves de Castro

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

perl v5.20.2			  2010-05-21			  LANGIDENT(1)