resperf man page on DragonFly

Man page or keyword search:  
man Server   44335 pages
apropos Keyword Search (all sections)
Output format
DragonFly logo
[printable version]

RESPERF(1)			    Nominum			    RESPERF(1)

NAME
       resperf - test the resolution performance of a caching DNS server

SYNOPSIS
       resperf-report [-a local_addr] [-d datafile] [-s server_addr] [-p port]
       [-x local_port] [-t timeout] [-b bufsize] [-f family] [-e] [-D]
       [-y [alg:]name:secret] [-h] [-i interval] [-m max_qps] [-r rampup_time]
       [-L max_loss]

       resperf [-a local_addr] [-d datafile] [-s server_addr] [-p port]
       [-x local_port] [-t timeout] [-b bufsize] [-f family] [-e] [-D]
       [-y name:secret] [-h] [-i interval] [-m max_qps] [-P plot_data_file]
       [-r rampup_time] [-L max_loss]

DESCRIPTION
       resperf	is a companion tool to dnsperf. dnsperf was primarily designed
       for benchmarking authoritative servers, and it does not work well  with
       caching	servers	 that are talking to the live Internet. One reason for
       this is that dnsperf uses a "self-pacing" approach, which is  based  on
       the assumption that you can keep the server 100% busy simply by sending
       it a small burst of back-to-back queries to fill	 up  network  buffers,
       and  then  send	a new query whenever you get a response back. This ap‐
       proach works well for authoritative servers that process queries in or‐
       der  and	 one at a time; it also works pretty well for a caching server
       in a closed laboratory environment  talking  to	a  simulated  Internet
       that's all on the same LAN. Unfortunately, it does not work well with a
       caching server talking to the actual Internet, which may need  to  work
       on  thousands of queries in parallel to achieve its maximum throughput.
       There have been numerous attempts to use dnsperf (or  its  predecessor,
       queryperf) for benchmarking live caching servers, usually with poor re‐
       sults. Therefore, a separate tool  designed  specifically  for  caching
       servers is needed.

   How resperf works
       Unlike  the "self-pacing" approach of dnsperf, resperf works by sending
       DNS queries at a controlled, steadily increasing rate. By default, res‐
       perf  will  send traffic for 60 seconds, linearly increasing the amount
       of traffic from zero to 100,000 queries per second.

       During the test, resperf listens for  responses	from  the  server  and
       keeps  track  of	 response rates, failure rates, and latencies. It will
       also continue listening for responses for an additional 40 seconds  af‐
       ter it has stopped sending traffic, so that there is time for the serv‐
       er to respond to the last queries sent. This time period was chosen  to
       be  longer  than the overall query timeout of both Nominum CNS and cur‐
       rent versions of BIND.

       If the test is successful, the query rate will at some point exceed the
       capacity	 of  the  server  and queries will be dropped, causing the re‐
       sponse rate to stop growing or even decrease as the query rate increas‐
       es.

       The  result of the test is a set of measurements of the query rate, re‐
       sponse rate, failure response rate, and average query latency as	 func‐
       tions of time.

   What you will need
       Benchmarking  a live caching server is serious business. A fast caching
       server like Nominum CNS running on an Opteron server, resolving	a  mix
       of cacheable and non-cacheable queries typical of ISP customer traffic,
       is capable of resolving more than 50,000 queries	 per  second.  In  the
       process, it will send more than 20,000 queries per second to authorita‐
       tive servers on the Internet, and receive responses to  most  of	 them.
       Assuming an average request size of 50 bytes and a response size of 100
       bytes, this amounts to some 8 Mbps of outgoing and 16 Mbps of  incoming
       traffic.	 If  your  Internet connection can't handle the bandwidth, you
       will end up measuring the speed of the connection, not the server,  and
       may  saturate the connection causing a degradation in service for other
       users.

       Make sure there is no stateful firewall between the server and the  In‐
       ternet, because most of them can't handle the amount of UDP traffic the
       test will generate and will end up dropping packets, skewing  the  test
       results. Some will even lock up or crash.

       You  should  run	 resperf  on  a machine separate from the server under
       test, on the same LAN. Preferably, this should be  a  Gigabit  Ethernet
       network.	 The machine running resperf should be at least as fast as the
       machine being tested; otherwise, it may end up being the bottleneck.

       There should be no other applications running on	 the  machine  running
       resperf.	 Performance  testing at the traffic levels involved is essen‐
       tially a hard real-time application - consider the fact that at a query
       rate  of	 100,000  queries  per second, if resperf gets delayed by just
       1/100 of a second, 1000 incoming UDP packets will arrive in  the	 mean‐
       time. This is more than most operating systems will buffer, which means
       packets will be dropped.

       Because the granularity of the timers provided by operating systems  is
       typically  too  coarse  to  accurately schedule packet transmissions at
       sub-millisecond intervals, resperf will busy-wait between packet trans‐
       missions,  constantly polling for responses in the meantime. Therefore,
       it is normal for resperf to consume 100% CPU during the whole test run,
       even during periods where query rates are relatively low.

       You  will  also	need a set of test queries in the dnsperf file format.
       See the dnsperf man page for instructions  on  how  to  construct  this
       query  file.  To	 make  the  test as realistic as possible, the queries
       should be derived from recorded production client DNS traffic,  without
       removing	 duplicate  queries  or other filtering. With the default set‐
       tings, resperf will use up to 3 million queries in each test run.

       If the caching server to be tested has a configurable limit on the num‐
       ber  of simultaneous resolutions, like the max-recursive-clients state‐
       ment in Nominum CNS or the recursive-clients option in BIND 9, you will
       probably have to increase it. As a starting point, we recommend a value
       of 10000 for Nominum CNS and 100000 for BIND 9.	Should	the  limit  be
       reached,	 it  will show up in the plots as an increase in the number of
       failure responses.

       The server being tested should be restarted at the  beginning  of  each
       test  to make sure it is starting with an empty cache. If the cache al‐
       ready contains data from a previous test run that used the same set  of
       queries,	 almost	 all queries will be answered from the cache, yielding
       inflated performance numbers.

       To use the resperf-report script, you need to have  gnuplot  installed.
       Make  sure  your installed version of gnuplot supports the png terminal
       driver. If your gnuplot doesn't support png but does support  gif,  you
       can change the line saying terminal=png in the resperf-report script to
       terminal=gif.

   Running the test
       Resperf is typically invoked via the resperf-report script, which  will
       run resperf with its output redirected to a file and then automatically
       generate an illustrated report in HTML format. Command  line  arguments
       given to resperf-report will be passed on unchanged to resperf.

       When  running  resperf-report,  you  will  need to specify at least the
       server IP address and the query data file. A  typical  invocation  will
       look like

	      resperf-report -s 10.0.0.2 -d queryfile

       With  default  settings, the test run will take at most 100 seconds (60
       seconds of ramping up traffic and then 40 seconds of  waiting  for  re‐
       sponses),  but in practice, the 60-second traffic phase will usually be
       cut short. To be precise, resperf can transition from the traffic-send‐
       ing phase to the waiting-for-responses phase in three different ways:

       · Running for the full allotted time and successfully reaching the max‐
	 imum query rate (by default, 60 seconds and 100,000 qps,  respective‐
	 ly).  Since  this  is a very high query rate, this will rarely happen
	 (with today's hardware); one of the other two conditions listed below
	 will usually occur first.

       · Exceeding  65,536 outstanding queries. This often happens as a result
	 of (successfully) exceeding the capacity of the server being  tested,
	 causing the excess queries to be dropped. The limit of 65,536 queries
	 comes from the number of possible values for the ID field in the  DNS
	 packet.  Resperf  needs  to allocate a unique ID for each outstanding
	 query, and is therefore unable to send further queries if the set  of
	 possible IDs is exhausted.

       · When resperf finds itself unable to send queries fast enough. Resperf
	 will notice if it is falling behind in its scheduled query  transmis‐
	 sions, and if this backlog reaches 1000 queries, it will print a mes‐
	 sage like "Fell behind by 1000 queries" (or whatever the actual  num‐
	 ber is at the time) and stop sending traffic.

       Regardless  of which of the above conditions caused the traffic-sending
       phase of the test to end, you should examine  the  resulting  plots  to
       make  sure  the server's response rate is flattening out toward the end
       of the test. If it is not, then you are not loading the server  enough.
       If  you	are  getting the "Fell behind" message, make sure that the ma‐
       chine running resperf is fast enough and has no other applications run‐
       ning.

       You  should  also  monitor  the	CPU usage of the server under test. It
       should reach close to 100% CPU at the point of maximum traffic;	if  it
       does  not, you most likely have a bottleneck in some other part of your
       test setup, for example, your external Internet connection.

       The report generated by resperf-report will be  stored  with  a	unique
       file name based on the current date and time, e.g., 20060812-1550.html.
       The PNG images of the plots and other auxiliary files will be stored in
       separate	 files	beginning  with the same date-time string. To view the
       report, simply open the .html file in a web browser.

       If you need to copy the report to a separate machine for viewing,  make
       sure  to	 copy the .png files along with the .html file (or simply copy
       all the files, e.g., using scp 20060812-1550.* host:directory/).

   Interpreting the report
       The .html file produced by resperf-report consists of two sections. The
       first  section, "Resperf output", contains output from the resperf pro‐
       gram such as progress messages, a summary of  the  command  line	 argu‐
       ments,  and  summary  statistics. The second section, "Plots", contains
       two plots generated by gnuplot: "Query/response/failure rate" and  "La‐
       tency".

       The  "Query/response/failure  rate"  plot  contains  three  graphs. The
       "Queries sent per second" graph shows the amount of traffic being  sent
       to  the	server; this should be very close to a straight diagonal line,
       reflecting the linear ramp-up of traffic.

       The "Total responses received per second" graph shows how many  of  the
       queries received a response from the server. All responses are counted,
       whether successful (NOERROR or NXDOMAIN) or not (e.g., SERVFAIL).

       The "Failure responses received per second" graph shows how many of the
       queries	received  a failure response. A response is considered to be a
       failure if its RCODE is neither NOERROR nor NXDOMAIN.

       By visually inspecting the graphs, you can get an idea of how the serv‐
       er  behaves  under  increasing  load. The "Total responses received per
       second" graph will initially closely follow the "Queries sent per  sec‐
       ond"  graph (often rendering it invisible in the plot as the two graphs
       are plotted on top of one another), but when the load exceeds the serv‐
       er's  capacity, the "Total responses received per second" graph may di‐
       verge from the "Queries sent per second" graph and flatten  out,	 indi‐
       cating that some of the queries are being dropped.

       The  "Failure responses received per second" graph will normally show a
       roughly linear ramp close to the bottom of the plot  with  some	random
       fluctuation,  since  typical query traffic will contain some small per‐
       centage of failing queries randomly interspersed	 with  the  successful
       ones.  As  the total traffic increases, the number of failures will in‐
       crease proportionally.

       If the "Failure responses received per second" graph turns sharply  up‐
       wards,  this  can  be another indication that the load has exceeded the
       server's capacity. This will happen if the server reacts to overload by
       sending	SERVFAIL  responses  rather  than  by  dropping queries. Since
       Nominum CNS and BIND 9 will both respond with SERVFAIL when they exceed
       their max-recursive-clients or recursive-clients limit, respectively, a
       sudden increase in the number of failures could	mean  that  the	 limit
       needs to be increased.

       The  "Latency"  plot  contains a single graph marked "Average latency".
       This shows how the latency varies during the course of the test.	 Typi‐
       cally,  the  latency  graph  will exhibit a downwards trend because the
       cache hit rate improves as ever more responses are  cached  during  the
       test,  and the latency for a cache hit is much smaller than for a cache
       miss. The latency graph is provided as an aid in determining the	 point
       where  the server gets overloaded, which can be seen as a sharp upwards
       turn in the graph. The latency graph is not intended for	 making	 abso‐
       lute latency measurements or comparisons between servers; the latencies
       shown in the graph are not representative of production	latencies  due
       to  the	initially  empty  cache	 and the deliberate overloading of the
       server towards the end of the test.

       Note that all measurements are displayed on the plot at the  horizontal
       position	 corresponding	to  the point in time when the query was sent,
       not when the response (if any) was received. This makes it it  easy  to
       compare	the  query  and response rates; for example, if no queries are
       dropped, the query and response graphs will be  identical.  As  another
       example,	 if  the plot shows 10% failure responses at t=5 seconds, this
       means that 10% of the queries sent at t=5  seconds  eventually  failed,
       not that 10% of the responses received at t=5 seconds were failures.

   Determining the server's maximum throughput
       Often, the goal of running resperf is to determine the server's maximum
       throughput, in other words, the number of queries per second it is  ca‐
       pable of handling. This is not always an easy task, because as a server
       is driven into overload, the service it provides may deteriorate gradu‐
       ally,  and this deterioration can manifest itself either as queries be‐
       ing dropped, as an increase in the number of SERVFAIL responses, or  an
       increase	 in  latency.	The  maximum  throughput may be defined as the
       highest level of traffic at which the server still provides an  accept‐
       able  level of service, but that means you first need to decide what an
       acceptable level of service means in terms of packet  drop  percentage,
       SERVFAIL percentage, and latency.

       The  summary  statistics	 in the "Resperf output" section of the report
       contains a "Maximum throughput" value which by  default	is  determined
       from the maximum rate at which the server was able to return responses,
       without regard to the number of queries being  dropped  or  failing  at
       that  point. This method of throughput measurement has the advantage of
       simplicity, but it may or may not be appropriate for  your  needs;  the
       reported value should always be validated by a visual inspection of the
       graphs to ensure that service has not already deteriorated unacceptably
       before  the maximum response rate is reached. It may also be helpful to
       look at the "Lost at that point" value in the summary statistics;  this
       indicates  the  percentage of the queries that was being dropped at the
       point in the test when the maximum throughput was reached.

       Alternatively, you can make resperf report the throughput at the	 point
       in  the	test  where  the percentage of queries dropped exceeds a given
       limit (or the maximum as above if the limit is  never  exceeded).  This
       can be a more realistic indication of how much the server can be loaded
       while still providing an acceptable level of service. This is done  us‐
       ing  the	 -L  command  line option; for example, specifying -L 10 makes
       resperf report the highest throughput reached before the server	starts
       dropping more than 10% of the queries.

       There  is  no  corresponding  way of automatically constraining results
       based on the number of failed queries, because unlike dropped  queries,
       resolution  failures  will  occur even when the the server is not over‐
       loaded, and the number of such failures is  heavily  dependent  on  the
       query data and network conditions. Therefore, the plots should be manu‐
       ally inspected to ensure that there is not an abnormal number of	 fail‐
       ures.

GENERATING CONSTANT TRAFFIC
       In  addition to ramping up traffic linearly, resperf also has the capa‐
       bility to send a constant stream of traffic. This can  be  useful  when
       using  resperf  for tasks other than performance measurement; for exam‐
       ple, it can be used to "soak test" a server by subjecting it to a  sus‐
       tained load for an extended period of time.

       To  generate  a	constant traffic load, use the -c command line option,
       together with the -m option which specifies the desired constant	 query
       rate. For example, to send 10000 queries per second for an hour, use -m
       10000 -c 3600. This will include the usual 30-second gradual ramp-up of
       traffic	at the beginning, which may be useful to avoid initially over‐
       whelming a server that is starting with an empty cache.	To  start  the
       onslaught of traffic instantly, use -m 10000 -c 3600 -r 0.

       To be precise, resperf will do a linear ramp-up of traffic from 0 to -m
       queries per second over a period of -r seconds, followed by  a  plateau
       of steady traffic at -m queries per second lasting for -c seconds, fol‐
       lowed by waiting for responses for an  extra  40	 seconds.  Either  the
       ramp-up or the plateau can be suppressed by supplying a duration of ze‐
       ro seconds with -r 0 and -c 0, respectively. The latter is the default.

       Sending traffic at high rates for hours on end will of  course  require
       very large amounts of input data. Also, a long-running test will gener‐
       ate a large amount of plot data, which is kept in memory for the	 dura‐
       tion  of the test.  To reduce the memory usage and the size of the plot
       file, consider increasing the interval between  measurements  from  the
       default of 0.5 seconds using the -i option in long-running tests.

       When  using  resperf  for  long-running tests, it is important that the
       traffic rate specified using the -m is one that both resperf itself and
       the  server under test can sustain. Otherwise, the test is likely to be
       cut short as a result of either running out of query  IDs  (because  of
       large  numbers  of  dropped  queries)  or of resperf falling behind its
       transmission schedule.

OPTIONS
       Because the resperf-report script passes its command line  options  di‐
       rectly  to  the	resperf programs, they both accept the same set of op‐
       tions, with one exception: resperf-report automatically adds an	appro‐
       priate  -P  to  the resperf command line, and therefore does not itself
       take a -P option.

       -d datafile
	      Specifies the input data file. If not  specified,	 resperf  will
	      read from standard input.

       -s server_addr
	      Specifies	 the  name  or address of the server to which requests
	      will be sent.  The default is the loopback address, 127.0.0.1.

       -p port
	      Sets the port on which the DNS packets are sent. If  not	speci‐
	      fied, the standard DNS port (53) is used.

       -a local_addr
	      Specifies the local address from which to send requests. The de‐
	      fault is the wildcard address.

       -x local_port
	      Specifies the local port from which to send  requests.  The  de‐
	      fault is the wildcard port (0).

       -t timeout
	      Specifies the request timeout value, in seconds. resperf will no
	      longer wait for a response to a particular  request  after  this
	      many seconds have elapsed. The default is 45 seconds.

	      resperf  times out unanswered requests in order to reclaim query
	      IDs so that the query ID space will not be exhausted in a	 long-
	      running  test,  such  as when "soak testing" a server for an day
	      with -m 10000 -c 86400.  The timeouts and the  ability  to  tune
	      them are of little use in the more typical use case of a perfor‐
	      mance test lasting only a minute or two.

	      The default timeout of 45 seconds was chosen to be  longer  than
	      the  query timeout of current caching servers. Note that this is
	      longer  than  the	 corresponding	default	 in  dnsperf,  because
	      caching  servers can take many orders of magnitude longer to an‐
	      swer a query than authoritative servers do.

	      If a short timeout is used, there is a possibility that  resperf
	      will  receive  a	response  after	 the corresponding request has
	      timed out; in this case, a message like Warning: Received a  re‐
	      sponse with an unexpected id: 141 will be printed.

       -b bufsize
	      Sets the size of the socket's send and receive buffers, in kilo‐
	      bytes. If not specified, the default value is 32k.

       -f family
	      Specifies the address family used for sending DNS	 packets.  The
	      possible values are "inet", "inet6", or "any". If "any" (the de‐
	      fault value) is specified, resperf will  use  whichever  address
	      family is appropriate for the server it is sending packets to.

       -e
	      Enables  EDNS0 [RFC2671], by adding an OPT record to all packets
	      sent.

       -D
	      Sets the DO (DNSSEC OK) bit [RFC3225] in all packets sent.  This
	      also enables EDNS0, which is required for DNSSEC.

       -y [alg:]name:secret
	      Add a TSIG record [RFC2845] to all packets sent, using the spec‐
	      ified TSIG key algorithm, name and secret, where	the  algorithm
	      defaults	to  hmac-md5  and the secret is expressed as a base-64
	      encoded string.

       -h
	      Print a usage statement and exit.

       -i interval
	      Specifies the time interval between  data	 points	 in  the  plot
	      file. The default is 0.5 seconds.

       -m max_qps
	      Specifies the target maximum query rate (in queries per second).
	      This should be higher than the expected  maximum	throughput  of
	      the server being tested.	Traffic will be ramped up at a linear‐
	      ly increasing rate until this value is reached, or until one  of
	      the other conditions described in the section "Running the test"
	      occurs. The default is 100000 queries per second.

       -P plot_data_file
	      Specifies the name of the plot data file. The  default  is  res‐
	      perf.gnuplot.

       -r rampup_time
	      Specifies	 the  length of time over which traffic will be ramped
	      up. The default is 60 seconds.

       -c constant_traffic_time
	      Specifies the length of time for which traffic will be sent at a
	      constant	rate  following	 the initial ramp-up. The default is 0
	      seconds, meaning no sending of traffic at a constant  rate  will
	      be done.

       -L max_loss
	      Specifies	 the maximum acceptable query loss percentage for pur‐
	      poses of determining the maximum throughput value.  The  default
	      is  100%, meaning that resperf will measure the maximum through‐
	      put without regard to query loss.

THE PLOT DATA FILE
       The plot data file is written by the resperf program and	 contains  the
       data  to	 be  plotted  using gnuplot. When running resperf via the res‐
       perf-report script, there is no need for the user  to  deal  with  this
       file directly, but its format and contents are documented here for com‐
       pleteness and in case you wish to run resperf directly and use its out‐
       put for purposes other than viewing it with gnuplot.

       The  first line of the file is a comment identifying the fields. It may
       be recognized as a comment by its leading hash sign (#).

       Subsequent lines contain the actual plot data. For purposes of generat‐
       ing  the plot data file, the test run is divided into time intervals of
       0.5 seconds (or some other length of time specified with the -i command
       line  option). Each line corresponds to one such interval, and contains
       the following values as floating-point numbers:

       Time
	      The midpoint of this time interval, in seconds since the	begin‐
	      ning of the run

       Target queries per second
	      The  number  of  queries per second scheduled to be sent in this
	      time interval

       Actual queries per second
	      The number of queries per second actually sent in this time  in‐
	      terval

       Responses per second
	      The  number  of responses received corresponding to queries sent
	      in this time interval, divided by the length of the interval

       Failures per second
	      The number of responses received corresponding to	 queries  sent
	      in  this time interval and having an RCODE other than NOERROR or
	      NXDOMAIN, divided by the length of the interval

       Average latency
	      The average time between sending the query and receiving	a  re‐
	      sponse, for queries sent in this time interval

AUTHOR
       Nominum, Inc.

SEE ALSO
       dnsperf(1)

Nominum				 Nov 22, 2011			    RESPERF(1)
[top]

List of man pages available for DragonFly

Copyright (c) for man pages and the logo by the respective OS vendor.

For those who want to learn more, the polarhome community provides shell access and support.

[legal] [privacy] [GNU] [policy] [cookies] [netiquette] [sponsors] [FAQ]
Tweet
Polarhome, production since 1999.
Member of Polarhome portal.
Based on Fawad Halim's script.
....................................................................
Vote for polarhome
Free Shell Accounts :: the biggest list on the net