tokyocabinet man page on DragonFly

Man page or keyword search:  
man Server   44335 pages
apropos Keyword Search (all sections)
Output format
DragonFly logo
[printable version]

TOKYOCABINET(3)			 Tokyo Cabinet		       TOKYOCABINET(3)

NAME
       tokyocabinet - a modern implementation of DBM

INTRODUCTION
       Tokyo  Cabinet  is  a library of routines for managing a database.  The
       database is a simple data file containing records, each is a pair of  a
       key  and	 a  value.   Every key and value is serial bytes with variable
       length.	Both binary data and character string can be used as a key and
       a  value.   There  is  neither  concept	of data tables nor data types.
       Records are organized in hash table, B+ tree, or fixed-length array.

       As for database of hash table, each key must be unique within  a	 data‐
       base, so it is impossible to store two or more records with a key over‐
       laps.  The following access methods are provided to the database: stor‐
       ing  a  record  with  a	key  and  a value, deleting a record by a key,
       retrieving a record by a key.  Moreover, traversal access to every  key
       are  provided,  although	 the order is arbitrary.  These access methods
       are similar to ones of DBM (or its followers: NDBM  and	GDBM)  library
       defined	in the UNIX standard.  Tokyo Cabinet is an alternative for DBM
       because of its higher performance.

       As for database of B+ tree, records whose keys are  duplicated  can  be
       stored.	 Access	 methods of storing, deleting, and retrieving are pro‐
       vided as with the database of hash table.  Records are stored in	 order
       by  a comparison function assigned by a user.  It is possible to access
       each record with the cursor in ascending or descending order.   Accord‐
       ing  to	this  mechanism, forward matching search for strings and range
       search for integers are realized.

       As for database of fixed-length array, records are stored  with	unique
       natural	numbers.  It is impossible to store two or more records with a
       key overlaps.  Moreover, the length of each record is  limited  by  the
       specified  length.   Provided  operations  are the same as ones of hash
       database.

       Table database is also provided as a variant of	hash  database.	  Each
       record is identified by the primary key and has a set of named columns.
       Although there is no concept of data schema, it is possible  to	search
       for  records  with  complex  conditions efficiently by using indices of
       arbitrary columns.

       Tokyo Cabinet is written in the C language, and provided as API	of  C,
       Perl,  Ruby,  Java,  and	 Lua.  Tokyo Cabinet is available on platforms
       which have API conforming to C99 and POSIX.  Tokyo Cabinet  is  a  free
       software licensed under the GNU Lesser General Public License.

THE DINOSAUR WING OF THE DBM FORKS
       Tokyo  Cabinet  is  developed  as the successor of GDBM and QDBM on the
       following purposes.  They are achieved and Tokyo Cabinet replaces  con‐
       ventional DBM products.

	      improves space efficiency : smaller size of database file.
	      improves time efficiency : faster processing speed.
	      improves	parallelism : higher performance in multi-thread envi‐
	      ronment.
	      improves usability : simplified API.
	      improves robustness : database file is not corrupted even	 under
	      catastrophic situation.
	      supports	64-bit	architecture : enormous memory space and data‐
	      base file are available.

       As with QDBM, the following three restrictions of  traditional  DBM:  a
       process	can handle only one database, the size of a key and a value is
       bounded, a database file is sparse, are cleared.	 Moreover, the follow‐
       ing  three restrictions of QDBM: the size of a database file is limited
       to 2GB, environments with different byte orders can not share  a	 data‐
       base  file, only one thread can search a database at the same time, are
       cleared.

       Tokyo Cabinet runs very fast.  For example, elapsed  time  to  store  1
       million	records	 is 0.7 seconds for hash database, and 1.6 seconds for
       B+ tree database.  Moreover, the size of database of Tokyo  Cabinet  is
       very  small.   For  example, overhead for a record is 16 bytes for hash
       database, and 5 bytes for B+ tree database.   Furthermore,  scalability
       of Tokyo Cabinet is great.  The database size can be up to 8EB (9.22e18
       bytes).

EFFECTIVE IMPLEMENTATION OF HASH DATABASE
       Tokyo Cabinet uses hash algorithm to retrieve  records.	 If  a	bucket
       array  has  sufficient  number  of  elements,  the  time	 complexity of
       retrieval is "O(1)".  That is, time required for retrieving a record is
       constant,  regardless  of the scale of a database.  It is also the same
       about storing and deleting.  Collision of hash  values  is  managed  by
       separate chaining.  Data structure of the chains is binary search tree.
       Even if a bucket array has unusually scarce elements, the time complex‐
       ity of retrieval is "O(log n)".

       Tokyo  Cabinet attains improvement in retrieval by loading RAM with the
       whole of a bucket array.	 If a bucket array is on RAM, it  is  possible
       to  access a region of a target record by about one path of file opera‐
       tions.  A bucket array saved in a file is not read into	RAM  with  the
       `read'  call  but  directly mapped to RAM with the `mmap' call.	There‐
       fore, preparation time on connecting to a database is very  short,  and
       two or more processes can share the same memory map.

       If  the	number	of elements of a bucket array is about half of records
       stored within a database, although it depends on characteristic of  the
       input,  the  probability	 of  collision	of  hash values is about 56.7%
       (36.8% if the same, 21.3% if twice, 11.5% if four times, 6.0% if	 eight
       times).	 In  such  case, it is possible to retrieve a record by two or
       less paths of file operations.  If it is made into a performance index,
       in  order  to  handle  a	 database containing one million of records, a
       bucket array with half a million of elements is needed.	 The  size  of
       each  element  is 4 bytes.  That is, if 2M bytes of RAM is available, a
       database containing one million records can be handled.

       Traditional DBM provides two modes of the storing operations:  "insert"
       and  "replace".	 In  the  case	a key overlaps an existing record, the
       insert mode keeps the existing value, while the replace mode transposes
       it to the specified value.  In addition to the two modes, Tokyo Cabinet
       provides "concatenate" mode.  In the mode, the specified value is  con‐
       catenated at the end of the existing value and stored.  This feature is
       useful when adding an element to a value as an array.

       Generally speaking, while  succession  of  updating,  fragmentation  of
       available  regions  occurs,  and	 the size of a database grows rapidly.
       Tokyo Cabinet deal with this  problem  by  coalescence  of  dispensable
       regions	and  reuse  of	them.	When overwriting a record with a value
       whose size is greater than the existing one, it is necessary to	remove
       the  region to another position of the file.  Because the time complex‐
       ity of the operation depends on the size of the	region	of  a  record,
       extending  values  successively is inefficient.	However, Tokyo Cabinet
       deal with this problem by alignment.  If increment can be put  in  pad‐
       ding, it is not necessary to remove the region.

       The  "free block pool" to reuse dispensable regions efficiently is also
       implemented.  It keeps a list of	 dispensable  regions  and  reuse  the
       "best  fit" region, that is the smallest region in the list, when a new
       block is requested.  Because fragmentation is inevitable even then, two
       kinds  of  optimization	(defragmentation)  mechanisms are implemented.
       The first is called static optimization which deploys all records  into
       another	file  and  then writes them back to the original file at once.
       The second is called dynamic optimization which gathers up  dispensable
       regions	by  replacing the locations of records and dispensable regions
       gradually.

USEFUL IMPLEMENTATION OF B+ TREE DATABASE
       Although B+ tree database is slower than	 hash  database,  it  features
       ordering	 access	 to  each record.  The order can be assigned by users.
       Records of B+ tree are sorted and arranged in  logical  pages.	Sparse
       index organized in B tree that is multiway balanced tree are maintained
       for each page.  Thus, the time complexity of retrieval  and  so	on  is
       "O(log  n)".   Cursor  is provided to access each record in order.  The
       cursor can jump to a position specified by a key and can	 step  forward
       or  backward  from the current position.	 Because each page is arranged
       as double linked list,  the  time  complexity  of  stepping  cursor  is
       "O(1)".

       B+ tree database is implemented, based on above hash database.  Because
       each page of B+ tree is stored as each record of hash database, B+ tree
       database	 inherits  efficiency  of storage management of hash database.
       Because the header of each record is smaller and alignment of each page
       is  adjusted  according	to  the	 page size, in most cases, the size of
       database file is	 cut  by  half	compared  to  one  of  hash  database.
       Although	 operation  of many pages are required to update B+ tree, QDBM
       expedites the process by caching pages and  reducing  file  operations.
       In  most	 cases, because whole of the sparse index is cached on memory,
       it is possible to retrieve a record by one or less path of file	opera‐
       tions.

       Each  pages  of B+ tree can be stored with compressed.  Two compression
       method; Deflate of ZLIB and Block  Sorting  of  BZIP2,  are  supported.
       Because	each record in a page has similar patterns, high efficiency of
       compression is expected due to the Lempel-Ziv or	 the  BWT  algorithms.
       In  case handling text data, the size of a database is reduced to about
       25%.  If the scale of a database is large and disk I/O is  the  bottle‐
       neck,  featuring	 compression  makes the processing speed improved to a
       large extent.

NAIVE IMPLEMENTATION OF FIXED-LENGTH DATABASE
       Fixed-length database has restrictions that each key should be a	 natu‐
       ral number and that the length of each value is limited.	 However, time
       efficiency and space efficiency are higher than the other  data	struc‐
       tures as long as the use case is within the restriction.

       Because	the  whole  region  of the database is mapped on memory by the
       `mmap' call and referred as  a  multidimensional	 array,	 the  overhead
       related	to  the	 file I/O is minimized.	 Due to this simple structure,
       fixed-length database works faster than hash database, and its  concur‐
       rency in multi-thread environment is prominent.

       The  size  of the database is proportional to the range of keys and the
       limit size of each value.  That is, the smaller the range of keys is or
       the  smaller  the  length  of each value is, the higher the space effi‐
       ciency is.  For example, if the maximum key is 1000000  and  the	 limit
       size  of the value is 100 bytes, the size of the database will be about
       100MB.  Because regions around referred records are only loaded on  the
       RAM,  you can increase the size of the database to the size of the vir‐
       tual memory.

FLEXIBLE IMPLEMENTATION OF TABLE DATABASE
       Table  database	does  not  express  simple  key/value  structure   but
       expresses a structure like a table of relational database.  Each record
       is identified by the primary key and has	 a  set	 of  multiple  columns
       named with arbitrary strings.  For example, a stuff in your company can
       be expressed by a record identified by the primary key of the  employee
       ID  number and structured by columns of his name, division, salary, and
       so on.  Unlike relational database, table database  does	 not  need  to
       define  any  data  schema and can contain records of various structures
       different from each other.

       Table database supports query functions with not only the  primary  key
       but  also  with conditions about arbitrary columns.  Each column condi‐
       tion is composed of the name of a column and  a	condition  expression.
       Operators of full matching, forward matching, regular expression match‐
       ing, and so on are provided for the string  type.   Operators  of  full
       matching,  range	 matching  and so on are provided for the number type.
       Operators for tag search and full-text search  are  also	 provided.   A
       query can contain multiple conditions for logical intersection.	Search
       by multiple queries for logical union is also available.	 The order  of
       the result set can be specified as the ascending or descending order of
       strings or numbers.

       You can create indices for arbitrary columns to improve performance  of
       search  and  sorting.  Although columns do not have data types, indices
       have types for strings or numbers.  Inverted indices  for  space	 sepa‐
       rated tokens and character N-gram tokens are also supported.  The query
       optimizer uses  indices	in  suitable  way  according  to  each	query.
       Indices are implemented as different files of B+ tree database.

PRACTICAL FUNCTIONALITY
       Databases on the filesystem feature transaction mechanisms.  It is pos‐
       sible to commit a series of operations between the  beginning  and  the
       end  of the transaction in a lump, or to abort the transaction and per‐
       form rollback to the state before the transaction.  Two isolation  lev‐
       els  are	 supported;  serializable and read uncommitted.	 Durability is
       secured by write ahead logging and shadow paging.

       Tokyo Cabinet provides two modes to connect to a database: "reader" and
       "writer".   A  reader  can  perform  retrieving but neither storing nor
       deleting.  A writer can perform all access methods.  Exclusion  control
       between	processes  is  performed when connecting to a database by file
       locking.	 While a writer is connected to a  database,  neither  readers
       nor  writers  can be connected.	While a reader is connected to a data‐
       base, other readers can be connect, but writers can not.	 According  to
       this  mechanism,	 data consistency is guaranteed with simultaneous con‐
       nections in multitasking environment.

       Functions of API of  Tokyo  cabinet  are	 reentrant  and	 available  in
       multi-thread  environment.  Discrete database object can be operated in
       parallel entirely.  For simultaneous operations of  the	same  database
       object,	read-write lock is used for exclusion control.	That is, while
       a writing thread is operating the database, other reading  threads  and
       writing	threads are blocked.  However, while a reading thread is oper‐
       ating the database, reading threads are not blocked.  The locking gran‐
       ularity	of  hash database and fixed-length database is per record, and
       that of the other databases is per file.

SIMPLE BUT VARIOUS INTERFACES
       Tokyo Cabinet provides simple API based on the object oriented  design.
       Every  operation	 for  database	is encapsulated and published as lucid
       methods as `open'  (connect),  `close'  (disconnect),  `put'  (insert),
       `out'  (remove),	 `get'	(retrieve),  and  so on.  Because the three of
       hash, B+ tree, and fixed-length array database APIs  are	 very  similar
       with  each other, porting an application from one to the other is easy.
       Moreover, the abstract API is provided to handle these  databases  with
       the same interface.  Applications of the abstract API can determine the
       type of the database in runtime.

       The utility API is also provided.  Such fundamental data	 structure  as
       list  and  map  are  included.  And, some useful features; memory pool,
       string processing, encoding, are also included.

       Six kinds of API; the utility API, the hash database API, the  B+  tree
       database	 API,  the  fixed-length database API, the table database API,
       and the abstract database API, are provided for the C  language.	  Com‐
       mand line interfaces are also provided corresponding to each API.  They
       are useful for prototyping, test, and debugging.	 Except for  C,	 Tokyo
       Cabinet	provides  APIs	for Perl, Ruby, Java, and Lua.	APIs for other
       languages will hopefully be provided by third party.

       In cases that multiple processes access a database at the same time  or
       some  processes	access a database on a remote host, the remote service
       is useful.  The remote service is composed of a database server and its
       access  library.	  Applications can access the database server by using
       the remote database API.	 The server implements HTTP and the  memcached
       protocol	 partly	 so  that  client programs on almost all platforms can
       access the server easily.

HOW TO USE THE LIBRARY
       Tokyo Cabinet provides API of the C language and	 it  is	 available  by
       programs	 conforming  to the C89 (ANSI C) standard or the C99 standard.
       As the header files  of	Tokyo  Cabinet	are  provided  as  `tcutil.h',
       `tchdb.h',  and	`tcbdb.h',  applications should include one or more of
       them accordingly to use	the  API.   As	the  library  is  provided  as
       `libtokyocabinet.a'   and   `libtokyocabinet.so'	  and	they   depends
       `libz.so',  `librt.so',	`libpthread.so',  `libm.so',  and   `libc.so',
       linker  options	`-ltokyocabinet', `-lz', `-lbz2', `-lrt', `-lpthread',
       `-lm', and `-lc' are required for build command.	 A typical build  com‐
       mand is the following.

	      gcc -I/usr/local/include tc_example.c -o tc_example \
		-L/usr/local/lib  -ltokyocabinet  -lz -lbz2 -lrt -lpthread -lm
	      -lc

       You can also use Tokyo Cabinet in programs  written  in	C++.   Because
       each  header is wrapped in C linkage (`extern "C"' block), you can sim‐
       ply include them into your C++ programs.

LICENSE
       Tokyo Cabinet is free software; you can redistribute it	and/or	modify
       it  under  the  terms  of the GNU Lesser General Public License as pub‐
       lished by the Free Software  Foundation;	 either	 version  2.1  of  the
       License or any later version.

       Tokyo  Cabinet  is  distributed in the hope that it will be useful, but
       WITHOUT ANY  WARRANTY;  without	even  the  implied  warranty  of  MER‐
       CHANTABILITY  or	 FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser
       General Public License for more details.

       You should have received a  copy	 of  the  GNU  Lesser  General	Public
       License	along  with  Tokyo  Cabinet  (See the file `COPYING'); if not,
       write to the Free Software Foundation, Inc.,  59	 Temple	 Place,	 Suite
       330, Boston, MA 02111-1307 USA.

       Tokyo  Cabinet  was written by FAL Labs.	 You can contact the author by
       e-mail to `info@fallabs.com'.

SEE ALSO
       tcutil(3), tchdb(3), tcbdb(3), tcfdb(3), tctdb(3), tcadb(3)

       Please see http://1978th.net/tokyocabinet/ for detail.

Man Page			  2012-08-18		       TOKYOCABINET(3)
[top]

List of man pages available for DragonFly

Copyright (c) for man pages and the logo by the respective OS vendor.

For those who want to learn more, the polarhome community provides shell access and support.

[legal] [privacy] [GNU] [policy] [cookies] [netiquette] [sponsors] [FAQ]
Tweet
Polarhome, production since 1999.
Member of Polarhome portal.
Based on Fawad Halim's script.
....................................................................
Vote for polarhome
Free Shell Accounts :: the biggest list on the net