KinoSearch::Index::SimUseriContributed Perl DoKinoSearch::Index::Similarity(3)NAMEKinoSearch::Index::Similarity - Judge how well a document matches a
query.
SYNOPSIS
package MySimilarity;
sub length_norm { return 1.0 } # disable length normalization
package MyFullTextType;
use base qw( KinoSearch::Plan::FullTextType );
sub make_similarity { MySimilarity->new }
DESCRIPTION
After determining whether a document matches a given query, a score
must be calculated which indicates how well the document matches the
query. The Similarity class is used to judge how "similar" the query
and the document are to each other; the closer the resemblance, they
higher the document scores.
The default implementation uses Lucene's modified cosine similarity
measure. Subclasses might tweak the existing algorithms, or might be
used in conjunction with custom Query subclasses to implement arbitrary
scoring schemes.
Most of the methods operate on single fields, but some are used to
combine scores from multiple fields.
CONSTRUCTORSnew()
my $sim = KinoSearch::Index::Similarity->new;
Constructor. Takes no arguments.
METHODSlength_norm(num_tokens)
Dampen the scores of long documents.
After a field is broken up into terms at index-time, each term must be
assigned a weight. One of the factors in calculating this weight is
the number of tokens that the original field was broken into.
Typically, we assume that the more tokens in a field, the less
important any one of them is -- so that, e.g. 5 mentions of "Kafka" in
a short article are given more heft than 5 mentions of "Kafka" in an
entire book. The default implementation of length_norm expresses this
using an inverted square root.
However, the inverted square root has a tendency to reward very short
fields highly, which isn't always appropriate for fields you expect to
have a lot of tokens on average.
INHERITANCEKinoSearch::Index::Similarity isa KinoSearch::Object::Obj.
COPYRIGHT AND LICENSE
Copyright 2005-2010 Marvin Humphrey
This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
perl v5.14.1 2011-06-20 KinoSearch::Index::Similarity(3)