LINKCHECKER(1)		 LinkChecker commandline usage		LINKCHECKER(1)

NAME
       linkchecker  - command line client to check HTML documents and websites
       for broken links

SYNOPSIS
       linkchecker [options] [file-or-url]...

DESCRIPTION
       LinkChecker features

       ·      recursive and multithreaded checking,

       ·      output in colored or normal text, HTML, SQL, CSV, XML or a
	      sitemap graph in different formats,

       ·      support for HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Telnet
	      and local file links,

       ·      restriction of link checking with URL filters,

       ·      proxy support,

       ·      username/password authorization for HTTP, FTP and Telnet,

       ·      support for the robots.txt exclusion protocol,

       ·      support for cookies,

       ·      support for HTML5,

       ·      HTML and CSS syntax checking,

       ·      antivirus checking,

       ·      a command line, GUI and web interface.

EXAMPLES
       The most common use checks the given domain recursively:
	 linkchecker http://www.example.com/
       Beware that this checks the whole site, which can have thousands of
       URLs. Use the -r option to restrict the recursion depth.
       Don't check URLs whose names contain /secret. All other links are
       checked as usual:
	 linkchecker --ignore-url=/secret mysite.example.com
       Checking a local HTML file on Unix:
	 linkchecker ../bla.html
       Checking a local HTML file on Windows:
	 linkchecker c:\temp\test.html
       You can skip the http:// URL part if the domain starts with www.:
	 linkchecker www.example.com
       You can skip the ftp:// URL part if the domain starts with ftp.:
	 linkchecker -r0 ftp.example.com
       Generate a sitemap graph and convert it with the graphviz dot utility:
	 linkchecker -odot -v www.example.com | dot -Tps > sitemap.ps

OPTIONS
   General options
       -fFILENAME, --config=FILENAME
	      Use FILENAME as configuration file. By default LinkChecker uses
	      ~/.linkchecker/linkcheckerrc.

       -h, --help
	      Help me! Print usage information for this program.

       --stdin
	      Read list of white-space separated URLs to check from stdin.
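	      For example, a minimal sketch feeding URLs from a hypothetical
	      file urls.txt (one URL per line):

		cat urls.txt | linkchecker --stdin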

       -tNUMBER, --threads=NUMBER
	      Generate	no more than the given number of threads. Default num‐
	      ber of threads is 100. To disable threading specify a  non-posi‐
	      tive number.

       -V, --version
	      Print version and exit.

       --list-plugins
	      Print available check plugins and exit.

   Output options
       -DSTRING, --debug=STRING
	      Print debugging output for the given logger.  Available loggers
	      are cmdline, checking, cache, gui, dns and all.  Specifying all
	      is an alias for specifying all available loggers.  The option
	      can be given multiple times to debug with more than one logger.
	      For accurate results, threading will be disabled during debug
	      runs.

       -FTYPE[/ENCODING][/FILENAME], --file-output=TYPE[/ENCODING][/FILENAME]
	      Output to a file linkchecker-out.TYPE,
	      $HOME/.linkchecker/blacklist for blacklist output, or FILENAME
	      if specified. The ENCODING specifies the output encoding; the
	      default is that of your locale. Valid encodings are listed at
	      http://docs.python.org/library/codecs.html#standard-encodings.
	      For the none output type, the FILENAME and ENCODING parts are
	      ignored; otherwise, if the file already exists, it will be
	      overwritten. You can specify this option more than once. Valid
	      file output types are text, html, sql, csv, gml, dot, xml,
	      sitemap, none or blacklist. Default is no file output. The
	      various output types are documented below. Note that you can
	      suppress all console output with the option -o none.
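	      For example, the following writes a CSV report to the default
	      output file linkchecker-out.csv while suppressing console
	      output:

		linkchecker -Fcsv -o none http://www.example.com/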

       --no-status
	      Do not print check status messages.

       --no-warnings
	      Don't log warnings. Default is to log warnings.

       -oTYPE[/ENCODING], --output=TYPE[/ENCODING]
	      Specify output type as text, html, sql, csv, gml, dot, xml,
	      sitemap, none or blacklist. Default type is text. The various
	      output types are documented below.
	      The ENCODING specifies the output encoding; the default is that
	      of your locale. Valid encodings are listed at
	      http://docs.python.org/library/codecs.html#standard-encodings.

       -q, --quiet
	      Quiet operation, an alias for -o none.  This is only useful with
	      -F.

       -v, --verbose
	      Log  all	checked	 URLs. Default is to log only errors and warn‐
	      ings.

       -WREGEX, --warning-regex=REGEX
	      Define a regular expression which prints a warning if it matches
	      any content of the checked link. This applies only to valid
	      pages, so that their content can be retrieved.
	      Use this to check for pages that contain some form of error, for
	      example "This page has moved" or "Oracle Application error".
	      Note that multiple values can be combined in the regular
	      expression, for example "(This page has moved|Oracle
	      Application error)".
	      See section REGULAR EXPRESSIONS for more info.
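	      A sketch of such a run, using the example pattern above:

		linkchecker \
		    --warning-regex="(This page has moved|Oracle Application error)" \
		    http://www.example.com/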

   Checking options
       --cookiefile=FILENAME
	      Read  a file with initial cookie data. The cookie data format is
	      explained below.

       --check-extern
	      Check external URLs.

       --ignore-url=REGEX
	      URLs matching the given regular expression will be  ignored  and
	      not checked.
	      This option can be given multiple times.
	      See section REGULAR EXPRESSIONS for more info.

       -NSTRING, --nntp-server=STRING
	      Specify  an NNTP server for news: links. Default is the environ‐
	      ment variable NNTP_SERVER. If no host is given, only the	syntax
	      of the link is checked.

       --no-follow-url=REGEX
	      Check  but  do  not recurse into URLs matching the given regular
	      expression.
	      This option can be given multiple times.
	      See section REGULAR EXPRESSIONS for more info.

       -p, --password
	      Read a password from console and use it for HTTP and FTP	autho‐
	      rization.	  For FTP the default password is anonymous@. For HTTP
	      there is no default password. See also -u.

       -rNUMBER, --recursion-level=NUMBER
	      Check recursively all links up to given depth.  A negative depth
	      will enable infinite recursion.  Default depth is infinite.

       --timeout=NUMBER
	      Set  the timeout for connection attempts in seconds. The default
	      timeout is 60 seconds.

       -uSTRING, --user=STRING
	      Try the given username for HTTP and FTP authorization.  For  FTP
	      the  default username is anonymous. For HTTP there is no default
	      username. See also -p.
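	      For example, the following sketch checks a password-protected
	      site, prompting for the password on the console (the username
	      webmaster is illustrative):

		linkchecker --user=webmaster --password http://www.example.com/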

       --user-agent=STRING
	      Specify the User-Agent string to send to the  HTTP  server,  for
	      example  "Mozilla/4.0".  The  default is "LinkChecker/X.Y" where
	      X.Y is the current version of LinkChecker.

CONFIGURATION FILES
       Configuration files can specify all options above. They can also	 spec‐
       ify  some  options  that	 cannot	 be  set  on  the  command  line.  See
       linkcheckerrc(5) for more info.
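
       As an illustrative sketch, a configuration file might set a thread
       count and an ignore pattern. The authoritative section and key names
       are documented in linkcheckerrc(5); treat the following as an
       assumption about its INI-style layout:

	 [checking]
	 threads=10

	 [filtering]
	 ignore=/secret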

OUTPUT TYPES
       Note that by default only errors and warnings are logged.   You	should
       use  the --verbose option to get the complete URL list, especially when
       outputting a sitemap graph format.

       text   Standard text logger, logging URLs in keyword: argument fashion.

       html   Log URLs in keyword: argument fashion, formatted as HTML.	 Addi‐
	      tionally	has  links  to the referenced pages. Invalid URLs have
	      HTML and CSS syntax check links appended.

       csv    Log check result in CSV format with one URL per line.

       gml    Log parent-child relations between linked URLs as a GML  sitemap
	      graph.

       dot    Log  parent-child relations between linked URLs as a DOT sitemap
	      graph.

       gxml   Log check result as a GraphXML sitemap graph.

       xml    Log check result as machine-readable XML.

       sitemap
	      Log check result as an XML sitemap whose protocol is  documented
	      at http://www.sitemaps.org/protocol.html.

       sql    Log check result as SQL script with INSERT commands. An example
	      script to create the initial SQL table is included as
	      create.sql.

       blacklist
	      Suitable	for  cron  jobs.  Logs	the  check  result into a file
	      ~/.linkchecker/blacklist	which  only  contains	entries	  with
	      invalid URLs and the number of times they have failed.

       none   Logs nothing. Suitable for debugging or checking the exit code.
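
       For example, to write an XML sitemap to the default output file
       linkchecker-out.sitemap (note --verbose, so the complete URL list is
       logged):

	 linkchecker -v -Fsitemap http://www.example.com/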

REGULAR EXPRESSIONS
       LinkChecker     accepts	   Python     regular	  expressions.	   See
       http://docs.python.org/howto/regex.html for an introduction.

       As an extension, a leading exclamation mark negates the regular
       expression.
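
       For example, the following sketch ignores every URL that does not
       contain /blog/, so only /blog/ URLs are checked (the path is
       illustrative):

	 linkchecker --ignore-url='!/blog/' http://www.example.com/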

COOKIE FILES
       A  cookie  file	contains standard HTTP header (RFC 2616) data with the
       following possible names:

       Host (required)
	      Sets the domain the cookies are valid for.

       Path (optional)
	      Gives the path the cookies are valid for; default path is /.

       Set-cookie (required)
	      Set cookie name/value. Can be given more than once.

       Multiple entries are separated by a blank line.	The example below will
       send  two  cookies  to all URLs starting with http://example.com/hello/
       and one to all URLs starting with https://example.org/:

	Host: example.com
	Path: /hello
	Set-cookie: ID="smee"
	Set-cookie: spam="egg"

	Host: example.org
	Set-cookie: baggage="elitist"; comment="hologram"
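
       Assuming the two entries above are stored in a file cookies.txt (an
       illustrative filename), they can be used in a check like this:

	 linkchecker --cookiefile=cookies.txt http://example.com/hello/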

PROXY SUPPORT
       To use a proxy on Unix or Windows set the $http_proxy, $https_proxy or
       $ftp_proxy environment variables to the proxy URL. The URL should be of
       the form http://[user:pass@]host[:port].  LinkChecker also detects
       manual proxy settings of Internet Explorer under Windows systems, and
       gconf or KDE on Linux systems.  On a Mac use the Internet Config to
       select a proxy.  You can also set a comma-separated domain list in the
       $no_proxy environment variable to ignore any proxy settings for these
       domains.  Setting an HTTP proxy on Unix for example looks like this:

	 export http_proxy="http://proxy.example.com:8080"

       Proxy authentication is also supported:

	 export http_proxy="http://user1:mypass@proxy.example.org:8081"

       Setting a proxy on the Windows command prompt:

	 set http_proxy=http://proxy.example.com:8080
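
       Bypassing the proxy for specific domains on Unix, as described above:

	 export no_proxy="example.com,example.org"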

PERFORMED CHECKS
       All URLs have to pass a preliminary syntax test. Minor quoting mistakes
       issue a warning; all other invalid syntax issues are errors.
       After  the syntax check passes, the URL is queued for connection check‐
       ing. All connection check types are described below.

       HTTP links (http:, https:)
	      After connecting to the given HTTP  server  the  given  path  or
	      query  is	 requested.  All  redirections	are  followed,	and if
	      user/password is given it will be	 used  as  authorization  when
	      necessary.   All	final  HTTP  status  codes  other than 2xx are
	      errors.  HTML page contents are checked for recursion.

       Local files (file:)
	      A regular, readable file that can be opened is valid. A readable
	      directory	 is  also  valid.  All other files, for example device
	      files, unreadable or non-existing files  are  errors.   HTML  or
	      other parseable file contents are checked for recursion.

       Mail links (mailto:)
	      A mailto: link eventually resolves to a list of email addresses.
	      If one address fails, the whole list will fail. For each mail
	      address we check the following things:
		1) Check the address syntax, both of the part before and
		   after the @ sign.
		2) Look up the MX DNS records. If we find no MX record,
		   print an error.
		3) Check if one of the mail hosts accepts an SMTP connection.
		   Check hosts with higher priority first.
		   If no host accepts SMTP, we print a warning.
		4) Try to verify the address with the VRFY command. If we get
		   an answer, print the verified address as an info.
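
	      Assuming a mailto: URL may be passed directly on the command
	      line like any other URL argument (the address is illustrative),
	      a single address can be checked like this:

		linkchecker mailto:webmaster@example.com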

       FTP links (ftp:)

		For FTP links we do:

		1) connect to the specified host
		2) try to log in with the given user and password. The
		   default user is ``anonymous``, the default password is
		   ``anonymous@``.
		3) try to change to the given directory
		4) list the file with the NLST command
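
		A sketch of such a check, restricting recursion with -r0 as
		in the EXAMPLES section (the path is illustrative):

		  linkchecker -r0 ftp://ftp.example.com/pub/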

       Telnet links (``telnet:``)

		We try to connect and if user/password are given, login to the
		given telnet server.

       NNTP links (``news:``, ``snews:``, ``nntp:``)

		We try to connect to the given NNTP server. If a news group or
		article is specified, try to request it from the server.

       Unsupported links (``javascript:``, etc.)

		An unsupported link will only print a warning. No further
		checking will be made.

		The complete list of recognized, but unsupported links can be
		found in the linkcheck/checker/unknownurl.py source file.
		The most prominent of them should be JavaScript links.

PLUGINS
       There are two plugin types: connection and content plugins.  Connection
       plugins are run after a successful connection to the URL host.  Content
       plugins are run if the URL type has content (mailto: URLs, for example,
       have no content) and if the check is not forbidden (e.g. by HTTP
       robots.txt).  See linkchecker --list-plugins for a list of plugins and
       their  documentation.  All plugins are enabled via the linkcheckerrc(5)
       configuration file.

RECURSION
       Before descending recursively into a URL, it  has  to  fulfill  several
       conditions. They are checked in this order:

       1. A URL must be valid.

       2. A URL must be parseable. This currently includes HTML files,
	  Opera bookmarks files, and directories. If a file type cannot
	  be determined (for example it does not have a common HTML file
	  extension, and the content does not look like HTML), it is assumed
	  to be non-parseable.

       3. The URL content must be retrievable. This is usually the case
	  except for example mailto: or unknown URL types.

       4. The maximum recursion level must not be exceeded. It is configured
	  with the --recursion-level option and is unlimited per default.

       5. It must not match the ignored URL list. This is controlled with
	  the --ignore-url option.

       6. The Robots Exclusion Protocol must allow links in the URL to be
	  followed recursively. This is checked by searching for a
	  "nofollow" directive in the HTML header data.

       Note  that  the	directory recursion reads all files in that directory,
       not just a subset like index.htm*.
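
       For example, the following sketch combines two of the conditions
       above, limiting recursion to two levels and skipping PDF files (the
       pattern is illustrative):

	 linkchecker --recursion-level=2 --ignore-url='\.pdf$' http://www.example.com/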

NOTES
       URLs on the commandline starting with ftp. are treated like ftp://ftp.,
       URLs  starting  with  www.  are treated like http://www..  You can also
       give local files as arguments.

       If you have your system configured to automatically establish a connec‐
       tion  to	 the internet (e.g. with diald), it will connect when checking
       links not pointing to your local host.  Use the --ignore-url option  to
       prevent this.

       Javascript links are not supported.

       If  your	 platform  does not support threading, LinkChecker disables it
       automatically.

       You can supply multiple user/password pairs in a configuration file.

       When checking news: links the given NNTP host doesn't need  to  be  the
       same as the host of the user browsing your pages.

ENVIRONMENT
       NNTP_SERVER - specifies default NNTP server
       http_proxy - specifies default HTTP proxy server
       https_proxy - specifies default HTTPS proxy server
       ftp_proxy - specifies default FTP proxy server
       no_proxy - comma-separated list of domains to not contact over a proxy
       server
       LC_MESSAGES, LANG, LANGUAGE - specify output language

RETURN VALUE
       The return value is 2 when

       ·      a program error occurred.

       The return value is 1 when

       ·      invalid links were found or

       ·      link warnings were found and warnings are enabled.

       Otherwise the return value is zero.
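
       This makes LinkChecker easy to script. A minimal sketch of acting on
       the exit status in a shell:

	 linkchecker -o none http://www.example.com/
	 case $? in
	   0) echo "no problems found";;
	   1) echo "invalid links or warnings";;
	   2) echo "program error";;
	 esac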

LIMITATIONS
       LinkChecker consumes memory for each queued URL to  check.  With	 thou‐
       sands  of  queued  URLs	the amount of consumed memory can become quite
       large. This might slow down the program or even the whole system.

FILES
       ~/.linkchecker/linkcheckerrc - default configuration file
       ~/.linkchecker/blacklist - default blacklist logger output filename
       linkchecker-out.TYPE - default logger file output name
       http://docs.python.org/library/codecs.html#standard-encodings  -	 valid
       output encodings
       http://docs.python.org/howto/regex.html - regular expression documenta‐
       tion

SEE ALSO
       linkcheckerrc(5)

AUTHOR
       Bastian Kleineidam <bastian.kleineidam@web.de>

COPYRIGHT
       Copyright © 2000-2014 Bastian Kleineidam

LinkChecker			  2010-07-01			LINKCHECKER(1)