URLGRABBER(1)							 URLGRABBER(1)

NAME
       urlgrabber - a high-level cross-protocol url-grabber.

SYNOPSIS
       urlgrabber [OPTIONS] URL [FILE]

DESCRIPTION
       urlgrabber is a command-line program and Python module for fetching
       files. It is designed to be used in programs that need common (but not
       necessarily simple) url-fetching features.

OPTIONS
       --help, -h
	   print a help page listing the options available to the
	   command-line program.

       --copy-local
	   ignored except for file:// urls, in which case it specifies whether
	   urlgrab should still make a copy of the file, or simply point to
	   the existing copy.
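
	   For module use, the corresponding keyword option appears to be
	   copy_local; a minimal sketch (the option name and call form are
	   assumptions based on the usual mapping between flags and module
	   options):

	       from urlgrabber.grabber import URLGrabber
	       g = URLGrabber()
	       # copy_local=1 forces a real copy even for file:// urls;
	       # copy_local=0 may simply point to the existing local file
	       path = g.urlgrab('file:///tmp/foo', copy_local=1)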

       --throttle=NUMBER
	   if it's an int, it's the bytes/second throttle limit. If it's a
	   float, it is first multiplied by bandwidth. If throttle == 0,
	   throttling is disabled. If None, the module-level default (which
	   can be set with set_throttle) is used.

       --bandwidth=NUMBER
	   the nominal max bandwidth in bytes/second. If throttle is a float
	   and bandwidth == 0, throttling is disabled. If None, the
	   module-level default (which can be set with set_bandwidth) is used.
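
	   The two options interact when throttle is a float; a minimal
	   sketch using the module-level setters named above (the exact
	   import location is an assumption):

	       import urlgrabber.grabber as grabber
	       grabber.set_bandwidth(100000)  # nominal max bandwidth, bytes/second
	       grabber.set_throttle(10000)    # int: hard cap of 10000 bytes/second
	       grabber.set_throttle(0.5)      # float: 0.5 * 100000 = 50000 bytes/second
	       grabber.set_throttle(0)        # 0 disables throttling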

       --range=RANGE
	   a tuple of the form first_byte,last_byte describing a byte range to
	   retrieve. Either or both of the values may be specified. If
	   first_byte is None, byte offset 0 is assumed. If last_byte is None,
	   the last byte available is assumed. Note that both the first_byte
	   and last_byte values are inclusive, so a range of (10,11) would
	   return the 10th and 11th bytes of the resource.
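
	   A minimal sketch of the same option in module form (passing range
	   as a keyword is an assumption; the endpoint behavior follows the
	   inclusive semantics described above):

	       from urlgrabber.grabber import URLGrabber
	       url = 'http://example.com/file'  # hypothetical URL
	       g = URLGrabber()
	       data = g.urlread(url, range=(10, 11))    # bytes 10 and 11
	       head = g.urlread(url, range=(None, 99))  # byte 0 through byte 99
	       tail = g.urlread(url, range=(500, None)) # byte 500 to the end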

       --user-agent=STR
	   the user-agent string to provide if the url is HTTP.
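
	   In module form the option is spelled user_agent (the keyword name
	   is an assumption based on the usual flag-to-option mapping):

	       from urlgrabber.grabber import URLGrabber
	       g = URLGrabber(user_agent='my-fetcher/1.0')  # hypothetical string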

       --retry=NUMBER
	   the number of times to retry the grab before bailing. If this is
	   zero, it will retry forever. This was intentional... really, it was
	   :). If this value is not supplied, or is supplied but is None,
	   retrying does not occur.

       --retrycodes
	   a sequence of error codes (values of e.errno) for which it should
	   retry. See the documentation on URLGrabError for more details.
	   retrycodes defaults to -1,2,4,5,6,7 if not specified explicitly.
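
	   Taken together, retry and retrycodes control the retry loop; a
	   minimal module-level sketch (the keyword names are assumptions
	   based on the flags above):

	       from urlgrabber.grabber import URLGrabber
	       # retry up to 3 times, and only for the default retryable errors
	       g = URLGrabber(retry=3, retrycodes=[-1, 2, 4, 5, 6, 7])
	       data = g.urlread('http://example.com/file')  # hypothetical URL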

MODULE USE EXAMPLES
       In its simplest form, urlgrabber can be a replacement for urllib2's
       urlopen, or even Python's built-in file if you're just reading:

	     from urlgrabber import urlopen
	     fo = urlopen(url)
	     data = fo.read()
	     fo.close()

       Here, the url can be http, https, ftp, or file. It's also pretty smart
       so if you just give it something like /tmp/foo, it will figure it out.
       For even more fun, you can also do:

	     from urlgrabber import urlgrab, urlread
	     local_filename = urlgrab(url)  # grab a local copy of the file
	     data = urlread(url)	    # just read the data into a string

       Now, like urllib2, what's really happening here is that you're using a
       module-level object (called a grabber) that kind of serves as a
       default. That's just fine, but you might want to get your own private
       version for a couple of reasons:

	   * it's a little ugly to modify the default grabber because you have to
	     reach into the module to do it
	   * you could run into conflicts if different parts of the code
	     modify the default grabber and therefore expect different
	     behavior

       Therefore, you're probably better off making your own. This also gives
       you lots of flexibility for later, as you'll see:

	     from urlgrabber.grabber import URLGrabber
	     g = URLGrabber()
	     data = g.urlread(url)

       This is nice because you can specify options when you create the
       grabber. For example, let's turn on simple reget mode so that if we
       have part of a file, we only need to fetch the rest:

	     from urlgrabber.grabber import URLGrabber
	     g = URLGrabber(reget='simple')
	     local_filename = g.urlgrab(url)

       The available options are listed in the module documentation, and can
       usually be specified as a default at the grabber level or as options
       to the method:

	     from urlgrabber.grabber import URLGrabber
	     g = URLGrabber(reget='simple')
	     local_filename = g.urlgrab(url, filename=None, reget=None)
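
       Options passed to the method take effect for that one call only; a
       sketch, reusing the retry option described above (the url value is a
       hypothetical stand-in):

	     from urlgrabber.grabber import URLGrabber
	     url = 'http://example.com/file'   # hypothetical URL
	     g = URLGrabber(retry=3)           # default: give up after 3 tries
	     data = g.urlread(url, retry=10)   # this call alone gets 10 tries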

AUTHORS
       Written by Michael D. Stenner <mstenner@linux.duke.edu> and Ryan
       Tomayko <rtomayko@naeblis.cx>.

       This manual page was written by Kevin Coyner <kevin@rustybear.com> for
       the Debian system (but may be used by others). It borrows heavily from
       the documentation included in the urlgrabber module. Permission is
       granted to copy, distribute and/or modify this document under the
       terms of the GNU General Public License, Version 2 or any later
       version published by the Free Software Foundation.

RESOURCES
       Main web site: http://linux.duke.edu/projects/urlgrabber/

				  04/09/2007			 URLGRABBER(1)