WordList(3)							   WordList(3)

NAME
       WordList - abstract class to manage and use an inverted index file.

SYNOPSIS
       #include <mifluz.h>

       WordContext context;

       WordList* words = context.List();

       delete words;

DESCRIPTION
       WordList	 is the mifluz equivalent of a database handler. Each WordList
       object is bound to an inverted index file and implements the operations
       to  create  it,	fill  it with word occurrences and search for an entry
       matching a given criterion.

       WordList is an abstract class and cannot be instantiated. The List
       method of the class WordContext will create an instance using the
       appropriate derived class, either WordListOne or WordListMulti. Refer
       to the corresponding manual pages for more information on their
       specific semantics.

       When doing bulk insertions, mifluz creates temporary files that contain
       the entries to be inserted in the index. Those files are typically
       named indexC00000000. The maximum size of a temporary file is
       wordlist_cache_size / 2. When the maximum size of the temporary file is
       reached, mifluz creates another temporary file named indexC00000001.
       This continues until mifluz has created 50 temporary files. At this
       point it merges all temporary files into one that replaces the first,
       indexC00000000. It then starts creating temporary files again and keeps
       following this algorithm until the bulk insertion is finished. When the
       bulk insertion is finished, mifluz has one big file named
       indexC00000000 that contains all the entries to be inserted in the
       index. mifluz inserts all the entries from indexC00000000 into the
       index and deletes the temporary file when done. The insertion is fast
       since all the entries in indexC00000000 are already sorted.
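
       The run-and-merge scheme described above can be sketched as follows.
       This is only an illustration under stated assumptions, not the mifluz
       implementation: the RunBuffer class and its methods are hypothetical,
       the temporary files are replaced by in-memory vectors of integers, and
       the two limits (entries per run, number of runs before a merge) are
       constructor arguments rather than values derived from
       wordlist_cache_size or the figure of 50 files.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for the indexC* temporary files.
// Each "file" is modelled as a sorted run of integer keys.
class RunBuffer {
public:
    RunBuffer(std::size_t max_run_entries, std::size_t max_runs)
        : max_run_entries_(max_run_entries), max_runs_(max_runs) {
        assert(max_run_entries > 0 && max_runs > 1);
    }

    // Add one entry; seal the current run once it reaches its maximum size.
    void Insert(int key) {
        current_.push_back(key);
        if (current_.size() >= max_run_entries_) SealCurrentRun();
    }

    // End of the bulk insertion: seal the partial run, merge every run into
    // one and hand back the single sorted sequence.
    std::vector<int> Finish() {
        if (!current_.empty()) SealCurrentRun();
        MergeAllRuns();
        std::vector<int> result;
        if (!runs_.empty()) result = std::move(runs_.front());
        runs_.clear();
        return result;
    }

    // Number of sealed runs currently held.
    std::size_t RunCount() const { return runs_.size(); }

private:
    // Sort the current run and store it; merge when too many runs exist.
    void SealCurrentRun() {
        std::sort(current_.begin(), current_.end());
        runs_.push_back(std::move(current_));
        current_.clear();
        if (runs_.size() >= max_runs_) MergeAllRuns();
    }

    // Merge all runs into a single sorted run that replaces them.
    void MergeAllRuns() {
        if (runs_.size() < 2) return;
        std::vector<int> merged;
        for (const std::vector<int>& run : runs_) {
            std::vector<int> next;
            next.reserve(merged.size() + run.size());
            std::merge(merged.begin(), merged.end(), run.begin(), run.end(),
                       std::back_inserter(next));
            merged.swap(next);
        }
        runs_.clear();
        runs_.push_back(std::move(merged));
    }

    std::size_t max_run_entries_;
    std::size_t max_runs_;
    std::vector<int> current_;
    std::vector<std::vector<int> > runs_;
};
```

       Because every run is sorted before it is stored, the final sequence
       returned by Finish() is sorted as well, which is the property the
       paragraph above relies on for a fast final insertion.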

       The parameter wordlist_cache_max can be used to prevent the temporary
       files from growing indefinitely. If the total cumulative size of the
       indexC* files grows beyond this parameter, they are merged into the
       main index and deleted. For instance, setting this parameter to 500MB
       guarantees that the total size of the indexC* files will not grow
       above 500MB.

CONFIGURATION
       For  more  information  on  the configuration attributes and a complete
       list of attributes, see the mifluz(3) manual page.

       wordlist_extend {true|false} (default false)
	      If true, maintain a reference count of unique words. The
	      Noccurrence method gives access to this count.

       wordlist_verbose <number> (default 0)
	      Set the verbosity level of the WordList class.

	      1 walk logic

	      2 walk logic details

	      3 walk logic lots of details

       wordlist_page_size <bytes> (default 8192)
	      Berkeley DB page size (see Berkeley DB documentation)

       wordlist_cache_size <bytes> (default 500K)
	      Berkeley DB cache size (see Berkeley DB documentation). The
	      cache makes a huge difference in performance. It must be at
	      least 2% of the expected total data size. Note that if
	      compression is activated the data size is eight times larger
	      than the actual file size. In this case the cache must be
	      scaled to 2% of the data size, not 2% of the file size. See
	      Cache tuning in the mifluz guide for more hints. See the
	      DESCRIPTION section for the rationale behind cache file
	      handling.

       wordlist_cache_max <bytes> (default 0)
	      Maximum cumulative size of the cache files generated when doing
	      bulk insertion with the BatchStart() function. When this limit
	      is reached, the cache files are all merged into the inverted
	      index. The value 0 means no size limit. See the DESCRIPTION
	      section for the rationale behind cache file handling.

       wordlist_cache_inserts {true|false} (default false)
	      If true, all Insert calls are cached in memory. When the
	      WordList object is closed or a different access method is
	      called, the cached entries are flushed to the inverted index.

       wordlist_compress {true|false} (default false)
	      Activate compression of the index. The resulting index is	 eight
	      times smaller than the uncompressed index.
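
       The sizing rule given for wordlist_cache_size (at least 2% of the data
       size, where the data size is eight times the file size when
       compression is active) amounts to simple arithmetic. The helper below
       is a hypothetical sketch of that calculation; MinCacheBytes is not a
       mifluz function, and the result is only the lower bound stated above,
       not a tuned value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of mifluz: smallest cache size in bytes
// that satisfies the "2% of the data size" rule from this page.
std::uint64_t MinCacheBytes(std::uint64_t index_file_bytes, bool compressed) {
    // With compression the data is taken to be eight times the file size.
    const std::uint64_t data_bytes =
        compressed ? index_file_bytes * 8 : index_file_bytes;
    // 2% of the data size, rounded up to a whole byte.
    return (data_bytes * 2 + 99) / 100;
}
```

       For a 100,000,000 byte index file this gives 2,000,000 bytes without
       compression and 16,000,000 bytes with compression.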

METHODS
       inline WordContext* GetContext()
	      Return  a	 pointer to the WordContext object used to create this
	      instance.

       inline const WordContext* GetContext() const
	      Return a pointer to the WordContext object used to  create  this
	      instance as a const.

       virtual inline int Override(const WordReference& wordRef)
	      Insert wordRef in index. If the Key() part of the wordRef exists
	      in the index, override it.  Returns  OK  on  success,  NOTOK  on
	      error.

       virtual int Exists(const WordReference& wordRef)
	      Returns OK if wordRef exists in the index, NOTOK otherwise.

       inline int Exists(const String& word)
	      Returns OK if word exists in the index, NOTOK otherwise.

       virtual int WalkDelete(const WordReference& wordRef)
	      Delete all entries in the index whose key matches the Key() part
	      of wordRef, using the Walk method. Returns the number of entries
	      successfully deleted.

       virtual int Delete(const WordReference& wordRef)
	      Delete the entry in the index that exactly matches the Key()
	      part of wordRef. Returns OK if the deletion is successful, NOTOK
	      otherwise.

       virtual int Open(const String& filename, int mode)
	      Open  inverted  index filename.  mode may be O_RDONLY or O_RDWR.
	      If mode is O_RDWR it can be or'ed with O_TRUNC to reset the con‐
	      tent of an existing inverted index.  Return OK on success, NOTOK
	      otherwise.

       virtual int Close()
	      Close inverted index.  Return OK on success, NOTOK otherwise.

       virtual unsigned int Size() const
	      Return the size of the index in pages.

       virtual int Pagesize() const
	      Return the page size.

       virtual WordDict *Dict()
	      Return a pointer to the inverted index dictionary.

       const String& Filename() const
	      Return the filename given to the last call to Open.

       int Flags() const
	      Return the mode given to the last call to Open.

       inline List *Find(const WordReference& wordRef)
	      Returns the list of word occurrences exactly matching the	 Key()
	      part  of	wordRef.   The List returned contains pointers to Wor‐
	      dReference objects. It is the responsibility of  the  caller  to
	      free the list. See List.h header for usage.

       inline List *FindWord(const String& word)
	      Returns  the list of word occurrences exactly matching the word.
	      The List returned contains pointers to WordReference objects. It
	      is the responsibility of the caller to free the list. See List.h
	      header for usage.

       virtual List *operator [] (const WordReference& wordRef)
	      Alias to the Find method.

       inline List *operator [] (const String& word)
	      Alias to the FindWord method.

       virtual List *Prefix (const WordReference& prefix)
	      Returns the list of word occurrences matching the Key() part of
	      prefix. In the Key(), the string (accessed with GetWord())
	      matches any string that begins with it. The List returned
	      contains pointers to WordReference objects. It is the
	      responsibility of the caller to free the list.

       inline List *Prefix (const String& prefix)
	      Returns the list of word occurrences matching the word prefix.
	      In the Key(), the string (accessed with GetWord()) matches any
	      string that begins with it. The List returned contains pointers
	      to WordReference objects. It is the responsibility of the caller
	      to free the list.

       virtual List *Words()
	      Returns a list of all unique words  contained  in	 the  inverted
	      index. The List returned contains pointers to String objects. It
	      is the responsibility of the caller to free the list. See List.h
	      header for usage.

       virtual List *WordRefs()
	      Returns  a  list of all entries contained in the inverted index.
	      The List returned contains pointers to WordReference objects. It
	      is the responsibility of the caller to free the list. See List.h
	      header for usage.

       virtual WordCursor  *Cursor(wordlist_walk_callback_t  callback,	Object
       *callback_data)
	      Create a cursor that searches all the occurrences in the
	      inverted index and calls callback with callback_data for every
	      match.

       virtual	WordCursor  *Cursor(const  WordKey  &searchKey,	 int  action =
       HTDIG_WORDLIST_WALKER)
	      Create a cursor that searches all the occurrences in the
	      inverted index that match searchKey. If action is set to
	      HTDIG_WORDLIST_WALKER, calls searchKey.callback with
	      searchKey.callback_data for every match. If action is set to
	      HTDIG_WORDLIST_COLLECT, pushes each match in the
	      searchKey.collectRes data member as a WordReference object. It
	      is the responsibility of the caller to free the
	      searchKey.collectRes list.

       virtual	   WordCursor	   *Cursor(const      WordKey	   &searchKey,
       wordlist_walk_callback_t callback, Object * callback_data)
	      Create a cursor that searches all the occurrences in the
	      inverted index that match searchKey and calls callback with
	      callback_data for every match.

       virtual WordKey Key(const String& bufferin)
	      Create  a WordKey object and return it. The bufferin argument is
	      used to initialize the key, as in the WordKey::Set method.   The
	      first component of bufferin must be a word that is translated to
	      the  corresponding  numerical  id	 using	the   WordDict::Serial
	      method.

       virtual WordReference Word(const String& bufferin, int exists = 0)
	      Create  a WordReference object and return it. The bufferin argu‐
	      ment is used to initialize the structure, as in  the  WordRefer‐
	      ence::Set	 method.   The	first  component of bufferin must be a
	      word that is translated to the corresponding numerical id	 using
	      the  WordDict::Serial  method.  If the exists argument is set to
	      1, the method WordDict::SerialExists is used instead, that is no
	      serial  is assigned to the word if it does not already have one.
	      Before translation  the  word  is	 normalized  using  the	 Word‐
	      Type::Normalize  method.	The word is saved using the WordRefer‐
	      ence::SetWord method.

       virtual WordReference WordExists(const String& bufferin)
	      Alias for Word(bufferin, 1).

       virtual void BatchStart()
	      Accelerate bulk insertions in the inverted index. All
	      insertions done with the Override method are batched instead of
	      updating the inverted index immediately. No update of the
	      inverted index file is done before the BatchEnd method is
	      called.

       virtual void BatchEnd()
	      Terminate a bulk insertion started with a call to the BatchStart
	      method. When all insertions are done the AllRef method is called
	      to restore statistics.

       virtual int Noccurrence(const String& key, unsigned  int&  noccurrence)
       const
	      Return  in  noccurrence  the number of occurrences of the string
	      contained in the GetWord() part of key.  Returns OK on  success,
	      NOTOK otherwise.

       virtual int Write(FILE* f)
	      Write an ASCII description of the index to the stream f. Each
	      line of the file contains the ASCII description of one
	      WordReference. Return OK on success, NOTOK otherwise.

       virtual int WriteDict(FILE* f)
	      Write the complete dictionary with statistics to the stream f.
	      Return OK on success, NOTOK otherwise.

       virtual int Read(FILE* f)
	      Read WordReference ASCII descriptions from f. Returns the
	      number of inserted WordReference objects, or a value < 0 if an
	      error occurs. Invalid descriptions and empty lines are ignored.
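
       The matching rule used by the Prefix methods (a stored word matches if
       it begins with the given string) can be illustrated without mifluz.
       The sketch below is an assumption-laden stand-in: the index is
       replaced by a sorted vector of strings, PrefixMatches is a
       hypothetical function, and it returns copies of the words rather than
       a List of WordReference pointers.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a prefix search: sorted_words plays the role of
// the sorted index, and the result holds every word that begins with prefix.
std::vector<std::string> PrefixMatches(
        const std::vector<std::string>& sorted_words,
        const std::string& prefix) {
    assert(std::is_sorted(sorted_words.begin(), sorted_words.end()));
    std::vector<std::string> matches;
    // The first word that is not less than the prefix starts the range of
    // candidates; words sharing the prefix are contiguous in sorted order.
    std::vector<std::string>::const_iterator it =
        std::lower_bound(sorted_words.begin(), sorted_words.end(), prefix);
    for (; it != sorted_words.end() &&
           it->compare(0, prefix.size(), prefix) == 0; ++it) {
        matches.push_back(*it);
    }
    return matches;
}
```

       Because the words are sorted, the scan stops at the first word that
       no longer begins with the prefix instead of visiting the whole index.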

AUTHORS
       Loic Dachary loic@gnu.org

       The Ht://Dig group http://dev.htdig.org/

SEE ALSO
       htdb_dump(1), htdb_stat(1), htdb_load(1), mifluzdump(1), mifluzload(1),
       mifluzsearch(1),	 mifluzdict(1),	 WordContext(3), WordDict(3), WordLis‐
       tOne(3), WordKey(3), WordKeyInfo(3), WordType(3),  WordDBInfo(3),  Wor‐
       dRecordInfo(3),	WordRecord(3),	WordReference(3), WordCursor(3), Word‐
       CursorOne(3), WordMonitor(3), Configuration(3), mifluz(3)

				     local			   WordList(3)