KinoSearch::Index::Similarity man page on Fedora

KinoSearch::Index::Similarity(3)  User Contributed Perl Documentation  KinoSearch::Index::Similarity(3)

NAME
       KinoSearch::Index::Similarity - Judge how well a document matches a
       query.

SYNOPSIS
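	   package MySimilarity;
	   use base qw( KinoSearch::Index::Similarity );    # inherit new() and defaults

	   sub length_norm { return 1.0 }    # disable length normalization

	   package MyFullTextType;
	   use base qw( KinoSearch::Plan::FullTextType );

	   # Have fields of this type score with the custom Similarity.
	   sub make_similarity { MySimilarity->new }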

DESCRIPTION
       After determining whether a document matches a given query, a score
       must be calculated which indicates how well the document matches the
       query.  The Similarity class is used to judge how "similar" the query
       and the document are to each other; the closer the resemblance, the
       higher the document scores.

       The default implementation uses Lucene's modified cosine similarity
       measure.  Subclasses might tweak the existing algorithms, or might be
       used in conjunction with custom Query subclasses to implement arbitrary
       scoring schemes.
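
       For orientation, plain (unmodified) cosine similarity between two
       term-weight vectors can be sketched as below.  This is not KinoSearch's
       or Lucene's actual scoring code: the function name cosine_similarity
       and the representation of a query or document as a hash of term
       weights are assumptions made purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of KinoSearch): cosine of the angle between
# two sparse vectors, each given as a hash ref mapping term => weight.
sub cosine_similarity {
    my ( $query, $doc ) = @_;
    my ( $dot, $query_sq, $doc_sq ) = ( 0, 0, 0 );
    for my $term ( keys %$query ) {
        $query_sq += $query->{$term}**2;
        # Only terms present in both vectors contribute to the dot product.
        $dot += $query->{$term} * $doc->{$term} if exists $doc->{$term};
    }
    $doc_sq += $_**2 for values %$doc;
    return 0 if $query_sq == 0 || $doc_sq == 0;    # empty vector: no match
    return $dot / ( sqrt($query_sq) * sqrt($doc_sq) );
}

my %query = ( kafka => 1, trial  => 1 );
my %doc   = ( kafka => 2, castle => 1 );
printf "%.3f\n", cosine_similarity( \%query, \%doc );
```

       The result is 0 when the vectors share no terms and approaches 1 as
       their term weights become proportional to each other.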

       Most of the methods operate on single fields, but some are used to
       combine scores from multiple fields.

CONSTRUCTORS
   new()
	   my $sim = KinoSearch::Index::Similarity->new;

       Constructor. Takes no arguments.

METHODS
   length_norm(num_tokens)
       Dampen the scores of long documents.

       After a field is broken up into terms at index-time, each term must be
       assigned a weight.  One of the factors in calculating this weight is
       the number of tokens that the original field was broken into.

       Typically, we assume that the more tokens in a field, the less
       important any one of them is -- so that, e.g. 5 mentions of "Kafka" in
       a short article are given more heft than 5 mentions of "Kafka" in an
       entire book.  The default implementation of length_norm expresses this
       using an inverted square root.

       However, the inverted square root has a tendency to reward very short
       fields highly, which isn't always appropriate for fields you expect to
       have a lot of tokens on average.
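
       The effect can be seen in a small standalone sketch.  It assumes that
       "inverted square root" means 1/sqrt(num_tokens); the function names
       inverse_sqrt_norm and floored_norm and the threshold of 100 tokens are
       hypothetical, and the flooring strategy is only one possible way to
       avoid over-rewarding short fields, not the library's own method.

```perl
use strict;
use warnings;

# Assumed form of the "inverted square root": 1 / sqrt(num_tokens).
sub inverse_sqrt_norm {
    my ($num_tokens) = @_;
    return 0 if $num_tokens <= 0;    # guard against division by zero
    return 1 / sqrt($num_tokens);
}

# Hypothetical variant for fields expected to be long: any field shorter
# than $min_tokens is weighted as if it had exactly $min_tokens, so very
# short fields gain nothing further from their brevity.
sub floored_norm {
    my ( $num_tokens, $min_tokens ) = @_;
    $num_tokens = $min_tokens if $num_tokens < $min_tokens;
    return inverse_sqrt_norm($num_tokens);
}

for my $n ( 1, 4, 100, 10_000 ) {
    printf "%6d tokens: plain=%.3f floored=%.3f\n",
        $n, inverse_sqrt_norm($n), floored_norm( $n, 100 );
}
```

       Under the plain formula a one-token field is weighted ten times as
       heavily as a 100-token field, while the floored variant treats them
       equally.  When overriding length_norm in a Similarity subclass, the
       invocant would presumably arrive as the first argument, ahead of
       num_tokens.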

INHERITANCE
       KinoSearch::Index::Similarity isa KinoSearch::Object::Obj.

COPYRIGHT AND LICENSE
       Copyright 2005-2010 Marvin Humphrey

       This program is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

perl v5.14.1			  2011-06-20  KinoSearch::Index::Similarity(3)