<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
|
|
<html> |
|
|
<head> |
|
|
<meta name="generator" content="groff -Thtml, see www.gnu.org"> |
|
|
<meta http-equiv="Content-Type" content="text/html; charset=US-ASCII"> |
|
|
<meta name="Content-Style" content="text/css"> |
|
|
<style type="text/css"> |
|
|
p { margin-top: 0; margin-bottom: 0; vertical-align: top } |
|
|
pre { margin-top: 0; margin-bottom: 0; vertical-align: top } |
|
|
table { margin-top: 0; margin-bottom: 0; vertical-align: top } |
|
|
h1 { text-align: center } |
|
|
</style> |
|
|
<title>WGET</title> |
|
|
|
|
|
</head> |
|
|
<body> |
|
|
|
|
|
<h1 align="center">WGET</h1> |
|
|
|
|
|
<a href="#NAME">NAME</a><br> |
|
|
<a href="#SYNOPSIS">SYNOPSIS</a><br> |
|
|
<a href="#DESCRIPTION">DESCRIPTION</a><br> |
|
|
<a href="#OPTIONS">OPTIONS</a><br> |
|
|
<a href="#ENVIRONMENT">ENVIRONMENT</a><br> |
|
|
<a href="#EXIT STATUS">EXIT STATUS</a><br> |
|
|
<a href="#FILES">FILES</a><br> |
|
|
<a href="#BUGS">BUGS</a><br> |
|
|
<a href="#SEE ALSO">SEE ALSO</a><br> |
|
|
<a href="#AUTHOR">AUTHOR</a><br> |
|
|
<a href="#COPYRIGHT">COPYRIGHT</a><br> |
|
|
|
|
|
<hr> |
|
|
|
|
|
|
|
|
<h2>NAME |
|
|
<a name="NAME"></a> |
|
|
</h2> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em">Wget - |
|
|
The non-interactive network downloader.</p> |
|
|
|
|
|
<h2>SYNOPSIS |
|
|
<a name="SYNOPSIS"></a> |
|
|
</h2> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em">wget |
|
|
[<i>option</i>]... [ <i><small>URL</small></i> ]...</p> |
|
|
|
|
|
<h2>DESCRIPTION |
|
|
<a name="DESCRIPTION"></a> |
|
|
</h2> |
|
|
|
|
|
|
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em"><small>GNU</small> |
|
|
Wget is a free utility for non-interactive download of files |
|
|
from the Web. It supports <small>HTTP, HTTPS,</small> and |
|
|
<small>FTP</small> protocols, as well as retrieval through |
|
|
<small>HTTP</small> proxies.</p> |
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em">Wget is |
|
|
non-interactive, meaning that it can work in the background, |
|
|
while the user is not logged on. This allows you to start a |
|
|
retrieval and disconnect from the system, letting Wget |
|
|
finish the work. By contrast, most Web browsers
require the user’s constant presence, which can be a great
hindrance when transferring a lot of data.</p>
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em">Wget can follow |
|
|
links in <small>HTML, XHTML,</small> and <small>CSS</small> |
|
|
pages, to create local versions of remote web sites, fully |
|
|
recreating the directory structure of the original site. |
|
|
This is sometimes referred to as "recursive |
|
|
downloading." While doing that, Wget respects the Robot |
|
|
Exclusion Standard (<i>/robots.txt</i>). Wget can be |
|
|
instructed to convert the links in downloaded files to point |
|
|
at the local files, for offline viewing.</p> |
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em">Wget has been |
|
|
designed for robustness over slow or unstable network |
|
|
connections; if a download fails due to a network problem, |
|
|
it will keep retrying until the whole file has been |
|
|
retrieved. If the server supports regetting, it will |
|
|
instruct the server to continue the download from where it |
|
|
left off.</p> |
|
|
|
|
|
<h2>OPTIONS |
|
|
<a name="OPTIONS"></a> |
|
|
</h2> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em"><b>Option |
|
|
Syntax</b> <br> |
|
|
Since Wget uses <small>GNU</small> getopt to process |
|
|
command-line arguments, every option has a long form along |
|
|
with the short one. Long options are more convenient to |
|
|
remember, but take time to type. You may freely mix |
|
|
different option styles, or specify options after the |
|
|
command-line arguments. Thus you may write:</p> |
|
|
|
|
|
<pre style="margin-left:11%; margin-top: 1em"> wget -r --tries=10 http://fly.srk.fer.hr/ -o log</pre> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em">The space |
|
|
between the option accepting an argument and the argument |
|
|
may be omitted. Instead of <b>-o log</b> you can write |
|
|
<b>-olog</b>.</p> |
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em">You may put |
|
|
several options that do not require arguments together, |
|
|
like:</p> |
|
|
|
|
|
<pre style="margin-left:11%; margin-top: 1em"> wget -drc &lt;URL&gt;</pre>
|
|
|
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em">This is |
|
|
completely equivalent to:</p> |
|
|
|
|
|
<pre style="margin-left:11%; margin-top: 1em"> wget -d -r -c &lt;URL&gt;</pre>
|
|
|
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em">Since the |
|
|
options can be specified after the arguments, you may |
|
|
terminate them with <b>--</b>. So the following |
|
|
will try to download <small>URL</small> <b>-x</b>, |
|
|
reporting failure to <i>log</i>:</p> |
|
|
|
|
|
<pre style="margin-left:11%; margin-top: 1em"> wget -o log -- -x</pre> |
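<p style="margin-left:11%; margin-top: 1em">The syntax rules
above are the usual <small>GNU</small> getopt conventions, so they can be
tried out with the Python standard library function
<tt>getopt.gnu_getopt</tt>. The sketch below is only an
illustration: the option table is a small hypothetical
subset, not the real Wget table, and <tt>URL</tt> is a
placeholder.</p>

<pre style="margin-left:11%; margin-top: 1em">import getopt

# Hypothetical subset of the option table: -d, -r and -c take no
# argument; -o and --tries take one.
SHORT = "drco:"
LONG = ["tries="]

def parse(argv):
    """Return (options, urls) using GNU-style permuting getopt."""
    return getopt.gnu_getopt(argv, SHORT, LONG)

# Options may follow the URL argument.
mixed = parse(["-r", "--tries=10", "http://fly.srk.fer.hr/", "-o", "log"])
# -olog is the same as -o log.
joined = parse(["-olog"])
# -drc is the same as -d -r -c.
bundled = parse(["-drc", "URL"])
# After "--", -x is treated as a URL rather than as an option.
terminated = parse(["-o", "log", "--", "-x"])

for result in (mixed, joined, bundled, terminated):
    print(result)</pre>

<p style="margin-left:11%; margin-top: 1em">Each result is a
pair: the list of recognized options with their arguments,
followed by the list of remaining arguments (the URLs).</p>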
|
|
|
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em">The options |
|
|
that accept comma-separated lists all respect the convention |
|
|
that specifying an empty list clears its value. This can be |
|
|
useful to clear the <i>.wgetrc</i> settings. For instance, |
|
|
if your <i>.wgetrc</i> sets |
|
|
<tt>"exclude_directories"</tt> to |
|
|
<i>/cgi-bin</i>, the following example will first |
|
|
reset it, and then set it to exclude <i>/~nobody</i> and |
|
|
<i>/~somebody</i>. You can also clear the lists in |
|
|
<i>.wgetrc</i>.</p> |
|
|
|
|
|
<pre style="margin-left:11%; margin-top: 1em"> wget -X "" -X /~nobody,/~somebody</pre>
|
|
|
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em">Most options |
|
|
that do not accept arguments are <i>boolean</i> options, so |
|
|
named because their state can be captured with a yes-or-no |
|
|
("boolean") variable. For example, |
|
|
<b>--follow-ftp</b> tells Wget to follow |
|
|
<small>FTP</small> links from <small>HTML</small> files and, |
|
|
on the other hand, <b>--no-glob</b> tells |
|
|
it not to perform file globbing on <small>FTP</small> URLs. |
|
|
A boolean option is either <i>affirmative</i> or |
|
|
<i>negative</i> (beginning with <b>--no</b>). |
|
|
All such options share several properties.</p> |
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em">Unless stated |
|
|
otherwise, it is assumed that the default behavior is the |
|
|
opposite of what the option accomplishes. For example, the |
|
|
documented existence of |
|
|
<b>--follow-ftp</b> assumes that the |
|
|
default is to <i>not</i> follow <small>FTP</small> links |
|
|
from <small>HTML</small> pages.</p> |
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em">Affirmative |
|
|
options can be negated by prepending the |
|
|
<b>--no-</b> to the option name; negative |
|
|
options can be negated by omitting the |
|
|
<b>--no-</b> prefix. This might seem |
|
|
superfluous---if the default for an |
|
|
affirmative option is to not do something, then why provide |
|
|
a way to explicitly turn it off? But the startup file may in |
|
|
fact change the default. For instance, using |
|
|
<tt>"follow_ftp = on"</tt> in <i>.wgetrc</i> makes |
|
|
Wget <i>follow</i> <small>FTP</small> links by default, and |
|
|
using <b>--no-follow-ftp</b> is the |
|
|
only way to restore the factory default from the command |
|
|
line.</p> |
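<p style="margin-left:11%; margin-top: 1em">The order in which
a boolean setting is decided can be modelled in a few lines
of Python. This is a simplified sketch of the rule described
above, not code taken from Wget; the function name
<tt>effective</tt> and its parameters are hypothetical.</p>

<pre style="margin-left:11%; margin-top: 1em">def effective(factory_default, wgetrc=None, command_line=None):
    """Resolve one boolean setting; later sources override earlier ones."""
    value = factory_default
    for override in (wgetrc, command_line):
        if override is not None:
            value = override
    return value

# follow_ftp: off by default, switched on in .wgetrc, and switched
# off again on the command line with --no-follow-ftp.
by_default = effective(False)
with_wgetrc = effective(False, wgetrc=True)
with_no_flag = effective(False, wgetrc=True, command_line=False)
print(by_default, with_wgetrc, with_no_flag)</pre>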
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em"><b>Basic |
|
|
Startup Options</b></p> |
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="3%"> |
|
|
|
|
|
|
|
|
<p><b>-V</b></p></td> |
|
|
<td width="86%"> |
|
|
</td></tr> |
|
|
</table> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--version</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Display the version of |
|
|
Wget.</p> |
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="3%"> |
|
|
|
|
|
|
|
|
<p><b>-h</b></p></td> |
|
|
<td width="86%"> |
|
|
</td></tr> |
|
|
</table> |
|
|
|
|
|
<p style="margin-left:11%;"><b>--help</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Print a help message describing |
|
|
all of Wget’s command-line options.</p> |
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="3%"> |
|
|
|
|
|
|
|
|
<p><b>-b</b></p></td> |
|
|
<td width="86%"> |
|
|
</td></tr> |
|
|
</table> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--background</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Go to background immediately |
|
|
after startup. If no output file is specified via the |
|
|
<b>-o</b>, output is redirected to |
|
|
<i>wget-log</i>.</p> |
|
|
|
|
|
<p style="margin-left:11%;"><b>-e</b> <i>command</i> |
|
|
<b><br> |
|
|
--execute</b> <i>command</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Execute <i>command</i> as if it |
|
|
were a part of <i>.wgetrc</i>. A command thus invoked will |
|
|
be executed <i>after</i> the commands in <i>.wgetrc</i>, |
|
|
thus taking precedence over them. If you need to specify |
|
|
more than one wgetrc command, use multiple instances of |
|
|
<b>-e</b>.</p> |
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em"><b>Logging and |
|
|
Input File Options <br> |
|
|
-o</b> <i>logfile</i> <b><br> |
|
|
--output-file=</b><i>logfile</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Log all messages to |
|
|
<i>logfile</i>. The messages are normally reported to |
|
|
standard error.</p> |
|
|
|
|
|
<p style="margin-left:11%;"><b>-a</b> <i>logfile</i> |
|
|
<b><br> |
|
|
--append-output=</b><i>logfile</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Append to <i>logfile</i>. This |
|
|
is the same as <b>-o</b>, only it appends to |
|
|
<i>logfile</i> instead of overwriting the old log file. If |
|
|
<i>logfile</i> does not exist, a new file is created.</p> |
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="3%"> |
|
|
|
|
|
|
|
|
<p><b>-d</b></p></td> |
|
|
<td width="86%"> |
|
|
</td></tr> |
|
|
</table> |
|
|
|
|
|
<p style="margin-left:11%;"><b>--debug</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Turn on debug output, meaning |
|
|
various information important to the developers of Wget if |
|
|
it does not work properly. Your system administrator may |
|
|
have chosen to compile Wget without debug support, in which |
|
|
case <b>-d</b> will not work. Please note that |
|
|
compiling with debug support is always |
|
|
safe---Wget compiled with the debug |
|
|
support will <i>not</i> print any debug info unless |
|
|
requested with <b>-d</b>.</p> |
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="3%"> |
|
|
|
|
|
|
|
|
<p><b>-q</b></p></td> |
|
|
<td width="86%"> |
|
|
</td></tr> |
|
|
</table> |
|
|
|
|
|
<p style="margin-left:11%;"><b>--quiet</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Turn off Wget’s |
|
|
output.</p> |
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="3%"> |
|
|
|
|
|
|
|
|
<p><b>-v</b></p></td> |
|
|
<td width="86%"> |
|
|
</td></tr> |
|
|
</table> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--verbose</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Turn on verbose output, with |
|
|
all the available data. The default output is verbose.</p> |
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="4%"> |
|
|
|
|
|
|
|
|
<p><b>-nv</b></p></td> |
|
|
<td width="85%"> |
|
|
</td></tr> |
|
|
</table> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--no-verbose</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Turn off verbose without being |
|
|
completely quiet (use <b>-q</b> for that), which means |
|
|
that error messages and basic information still get |
|
|
printed.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--report-speed=</b><i>type</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Output bandwidth as |
|
|
<i>type</i>. The only accepted value is <b>bits</b>.</p> |
|
|
|
|
|
<p style="margin-left:11%;"><b>-i</b> <i>file</i> |
|
|
<b><br> |
|
|
--input-file=</b><i>file</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Read URLs from a local or |
|
|
external <i>file</i>. If <b>-</b> is specified as |
|
|
<i>file</i>, URLs are read from the standard input. (Use |
|
|
<b>./-</b> to read from a file literally named |
|
|
<b>-</b>.)</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">If this |
|
|
function is used, no URLs need be present on the command |
|
|
line. If there are URLs both on the command line and in an |
|
|
input file, those on the command line will be the first
|
|
ones to be retrieved. If |
|
|
<b>--force-html</b> is not specified, then |
|
|
<i>file</i> should consist of a series of URLs, one per |
|
|
line.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">However, if you |
|
|
specify <b>--force-html</b>, the document |
|
|
will be regarded as <b>html</b>. In that case you may have |
|
|
problems with relative links, which you can solve either by |
|
|
adding <tt>"&lt;base href="</tt><i>url</i><tt>"&gt;"</tt> to the
|
|
documents or by specifying |
|
|
<b>--base=</b><i>url</i> on the command |
|
|
line.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">If the |
|
|
<i>file</i> is an external one, the document will be |
|
|
automatically treated as <b>html</b> if the Content-Type |
|
|
matches <b>text/html</b>. Furthermore, the |
|
|
<i>file</i>’s location will be implicitly used as base |
|
|
href if none was specified.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--input-metalink=</b><i>file</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Download the files listed in the
local Metalink <i>file</i>. Metalink versions 3 and 4 are
supported.</p>
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--keep-badhash</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Keep downloaded Metalink files
that have a bad hash. Wget appends .badhash to the name of
each Metalink file with a checksum mismatch, without
overwriting existing files.</p>
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--metalink-over-http</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Issues <small>HTTP HEAD</small> |
|
|
request instead of <small>GET</small> and extracts Metalink |
|
|
metadata from response headers. Then it switches to Metalink |
|
|
download. If no valid Metalink metadata is found, it falls |
|
|
back to ordinary <small>HTTP</small> download. This option also
enables downloading and processing of files served with
<b>Content-Type: application/metalink4+xml</b>.</p>
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--metalink-index=</b><i>number</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Set the ordinal
<small>NUMBER</small> of the Metalink
<b>application/metalink4+xml</b> metaurl to use. Valid
values range from 1 to the total number of
"application/metalink4+xml" metaurls available. Specify 0
or <b>inf</b> to choose the first good one. Metaurls, such
as those from
<b>--metalink-over-http</b>, may
have been sorted by the value of their priority key; keep this in
mind when choosing <small>NUMBER</small>.</p>
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--preferred-location</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Set the preferred location for
Metalink resources. This has an effect only when multiple resources
with the same priority are available.</p>
|
|
|
|
|
<p style="margin-left:11%;"><b>--xattr</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Enable use of file |
|
|
system’s extended attributes to save the original |
|
|
<small>URL</small> and the Referer <small>HTTP</small> |
|
|
header value if used.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Be aware that |
|
|
the <small>URL</small> might contain private information |
|
|
like access tokens or credentials.</p> |
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="3%"> |
|
|
|
|
|
|
|
|
<p><b>-F</b></p></td> |
|
|
<td width="86%"> |
|
|
</td></tr> |
|
|
</table> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--force-html</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">When input is read from a file, |
|
|
force it to be treated as an <small>HTML</small> file. This |
|
|
enables you to retrieve relative links from existing |
|
|
<small>HTML</small> files on your local disk, by adding |
|
|
<tt>"&lt;base href="</tt><i>url</i><tt>"&gt;"</tt> to
|
|
<small>HTML,</small> or using the <b>--base</b> |
|
|
command-line option.</p> |
|
|
|
|
|
<p style="margin-left:11%;"><b>-B</b> |
|
|
<i><small>URL</small></i> <b><br> |
|
|
--base=</b> <i><small>URL</small></i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Resolves relative links using |
|
|
<i><small>URL</small></i> as the point of reference, when |
|
|
reading links from an <small>HTML</small> file specified via |
|
|
the <b>-i</b>/<b>--input-file</b> |
|
|
option (together with <b>--force-html</b>, |
|
|
or when the input file was fetched remotely from a server |
|
|
describing it as <small>HTML</small>). This is equivalent
|
|
to the presence of a <tt>"BASE"</tt> tag in the |
|
|
<small>HTML</small> input file, with |
|
|
<i><small>URL</small></i> as the value for the |
|
|
<tt>"href"</tt> attribute.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">For instance, |
|
|
if you specify <b>http://foo/bar/a.html</b> for |
|
|
<i><small>URL</small></i>, and Wget reads
|
|
<b>../baz/b.html</b> from the input file, it would be |
|
|
resolved to <b>http://foo/baz/b.html</b>.</p> |
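<p style="margin-left:17%; margin-top: 1em">The resolution in
this example follows the standard rules for relative
references, which can be checked with
<tt>urllib.parse.urljoin</tt> from the Python standard
library. Wget uses its own resolver, so treat this as an
illustration of the rule rather than of the Wget
implementation.</p>

<pre style="margin-left:17%; margin-top: 1em">from urllib.parse import urljoin

base = "http://foo/bar/a.html"   # the value given to --base
link = "../baz/b.html"           # a relative link read from the input file

# The last path segment of the base (a.html) is dropped before the
# relative link is applied, so ".." climbs out of /bar/.
resolved = urljoin(base, link)
print(resolved)   # http://foo/baz/b.html</pre>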
|
|
|
|
|
<p style="margin-left:11%;"><b>--config=</b> |
|
|
<i><small>FILE</small></i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Specify the location of a |
|
|
startup file you wish to use instead of the default one(s). |
|
|
Use --no-config to disable reading of |
|
|
config files. If both --config and |
|
|
--no-config are given, |
|
|
--no-config is ignored.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--rejected-log=</b><i>logfile</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Logs all <small>URL</small> |
|
|
rejections to <i>logfile</i> as comma-separated values. The
values include the reason for rejection, the
|
|
<small>URL</small> and the parent <small>URL</small> it was |
|
|
found in.</p> |
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em"><b>Download |
|
|
Options <br> |
|
|
--bind-address=</b> |
|
|
<i><small>ADDRESS</small></i></p> |
|
|
|
|
|
<p style="margin-left:17%;">When making client |
|
|
<small>TCP/IP</small> connections, bind to |
|
|
<i><small>ADDRESS</small></i> on the local machine. |
|
|
<i><small>ADDRESS</small></i> may be specified as a hostname |
|
|
or <small>IP</small> address. This option can be useful if |
|
|
your machine is bound to multiple IPs.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--bind-dns-address=</b> |
|
|
<i><small>ADDRESS</small></i></p> |
|
|
|
|
|
<p style="margin-left:17%;">[libcares only] This address |
|
|
overrides the route for <small>DNS</small> requests. If you |
|
|
ever need to circumvent the standard settings from |
|
|
/etc/resolv.conf, this option together with |
|
|
<b>--dns-servers</b> is your friend. |
|
|
<i><small>ADDRESS</small></i> must be specified either as |
|
|
IPv4 or IPv6 address. Wget needs to be built with libcares |
|
|
for this option to be available.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--dns-servers=</b> |
|
|
<i><small>ADDRESSES</small></i></p> |
|
|
|
|
|
<p style="margin-left:17%;">[libcares only] The given |
|
|
address(es) override the standard nameserver addresses, e.g. |
|
|
as configured in /etc/resolv.conf. |
|
|
<i><small>ADDRESSES</small></i> may be specified either as |
|
|
IPv4 or IPv6 addresses, comma-separated. Wget needs to be |
|
|
built with libcares for this option to be available.</p> |
|
|
|
|
|
<p style="margin-left:11%;"><b>-t</b> <i>number</i> |
|
|
<b><br> |
|
|
--tries=</b><i>number</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Set number of tries to |
|
|
<i>number</i>. Specify 0 or <b>inf</b> for infinite |
|
|
retrying. The default is to retry 20 times, with the |
|
|
exception of fatal errors like "connection |
|
|
refused" or "not found" (404), which are not |
|
|
retried.</p> |
|
|
|
|
|
<p style="margin-left:11%;"><b>-O</b> <i>file</i> |
|
|
<b><br> |
|
|
--output-document=</b><i>file</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">The documents will not be |
|
|
written to the appropriate files, but all will be |
|
|
concatenated together and written to <i>file</i>. If |
|
|
<b>-</b> is used as <i>file</i>, documents will be |
|
|
printed to standard output, disabling link conversion. (Use |
|
|
<b>./-</b> to print to a file literally named |
|
|
<b>-</b>.)</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Use of |
|
|
<b>-O</b> is <i>not</i> intended to mean simply |
|
|
"use the name <i>file</i> instead of the one in the |
|
|
<small>URL</small>;" rather, it is analogous to shell
|
|
redirection: <b>wget -O file http://foo</b> is |
|
|
intended to work like <b>wget -O - http://foo |
|
|
> file</b>; <i>file</i> will be truncated immediately, |
|
|
and <i>all</i> downloaded content will be written there.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">For this |
|
|
reason, <b>-N</b> (for timestamp-checking) is not |
|
|
supported in combination with <b>-O</b>: since |
|
|
<i>file</i> is always newly created, it will always have a |
|
|
very new timestamp. A warning will be issued if this |
|
|
combination is used.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Similarly, |
|
|
using <b>-r</b> or <b>-p</b> with |
|
|
<b>-O</b> may not work as you expect: Wget won’t |
|
|
just download the first file to <i>file</i> and then |
|
|
download the rest to their normal names: <i>all</i> |
|
|
downloaded content will be placed in <i>file</i>. This was |
|
|
disabled in version 1.11, but has been reinstated (with a |
|
|
warning) in 1.11.2, as there are some cases where this |
|
|
behavior can actually have some use.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">A combination |
|
|
with <b>-nc</b> is only accepted if the given output |
|
|
file does not exist.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Note that a |
|
|
combination with <b>-k</b> is only permitted when |
|
|
downloading a single document, as in that case it will just |
|
|
convert all relative URIs to external ones; <b>-k</b> |
|
|
makes no sense for multiple URIs when they’re all |
|
|
being downloaded to a single file; <b>-k</b> can be |
|
|
used only when the output is a regular file.</p> |
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="4%"> |
|
|
|
|
|
|
|
|
<p><b>-nc</b></p></td> |
|
|
<td width="85%"> |
|
|
</td></tr> |
|
|
</table> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--no-clobber</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">If a file is downloaded more |
|
|
than once in the same directory, Wget’s behavior |
|
|
depends on a few options, including <b>-nc</b>. In |
|
|
certain cases, the local file will be <i>clobbered</i>, or |
|
|
overwritten, upon repeated download. In other cases it will |
|
|
be preserved.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">When running |
|
|
Wget without <b>-N</b>, <b>-nc</b>, |
|
|
<b>-r</b>, or <b>-p</b>, downloading the same |
|
|
file in the same directory will result in the original copy |
|
|
of <i>file</i> being preserved and the second copy being |
|
|
named <i>file</i><b>.1</b>. If that file is downloaded yet |
|
|
again, the third copy will be named <i>file</i><b>.2</b>, |
|
|
and so on. (This is also the behavior with <b>-nd</b>, |
|
|
even if <b>-r</b> or <b>-p</b> are in effect.) |
|
|
When <b>-nc</b> is specified, this behavior is |
|
|
suppressed, and Wget will refuse to download newer copies of |
|
|
<i>file</i>. Therefore, <tt>"no-clobber"</tt> is
actually a misnomer in this mode: it is not clobbering
that is prevented (as the numeric suffixes were already
preventing clobbering), but rather the saving of multiple
versions.</p>
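<p style="margin-left:17%; margin-top: 1em">The numeric-suffix
naming described above can be sketched as a small Python
function. It models only the documented behavior (no
<b>-N</b>, <b>-nc</b>, <b>-r</b>, or <b>-p</b>); the name
<tt>unique_name</tt> is hypothetical and the code is not
taken from Wget.</p>

<pre style="margin-left:17%; margin-top: 1em">def unique_name(name, existing):
    """Return name if it is free, else name.N for the lowest free N."""
    if name not in existing:
        return name
    n = 1
    while "{}.{}".format(name, n) in existing:
        n += 1
    return "{}.{}".format(name, n)

first = unique_name("file", set())
second = unique_name("file", {"file"})
third = unique_name("file", {"file", "file.1"})
print(first, second, third)   # file file.1 file.2</pre>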
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">When running |
|
|
Wget with <b>-r</b> or <b>-p</b>, but without |
|
|
<b>-N</b>, <b>-nd</b>, or <b>-nc</b>, |
|
|
re-downloading a file will result in the new copy simply |
|
|
overwriting the old. Adding <b>-nc</b> will prevent |
|
|
this behavior, instead causing the original version to be |
|
|
preserved and any newer copies on the server to be |
|
|
ignored.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">When running |
|
|
Wget with <b>-N</b>, with or without <b>-r</b> |
|
|
or <b>-p</b>, the decision as to whether or not to |
|
|
download a newer copy of a file depends on the local and |
|
|
remote timestamp and size of the file. <b>-nc</b> may |
|
|
not be specified at the same time as <b>-N</b>.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">A combination |
|
|
with |
|
|
<b>-O</b>/<b>--output-document</b> |
|
|
is only accepted if the given output file does not |
|
|
exist.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Note that when |
|
|
<b>-nc</b> is specified, files with the suffixes |
|
|
<b>.html</b> or <b>.htm</b> will be loaded from the local |
|
|
disk and parsed as if they had been retrieved from the |
|
|
Web.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--backups=</b><i>backups</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Before (over)writing a file, |
|
|
back up an existing file by adding a <b>.1</b> suffix |
|
|
(<b>_1</b> on <small>VMS</small>) to the file name. Such
|
|
backup files are rotated to <b>.2</b>, <b>.3</b>, and so on, |
|
|
up to <i>backups</i> (and lost beyond that).</p> |
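<p style="margin-left:17%; margin-top: 1em">The rotation can be
pictured with a dictionary that maps file names to their
contents. The following Python sketch models the description
above for the <b>.N</b> suffix form; the function name
<tt>rotate</tt> and the sample contents are hypothetical.</p>

<pre style="margin-left:17%; margin-top: 1em">def rotate(files, name, backups):
    """Return a copy of files after rotating the backups of name."""
    files = dict(files)
    # Work from the oldest slot down, so that nothing is overwritten
    # before it has been moved. The oldest backup is lost.
    for i in range(backups, 0, -1):
        src = name if i == 1 else "{}.{}".format(name, i - 1)
        dst = "{}.{}".format(name, i)
        if src in files:
            files[dst] = files.pop(src)
    return files

before = {"f": "newest", "f.1": "older", "f.2": "oldest"}
after = rotate(before, "f", 2)
print(after)</pre>

<p style="margin-left:17%; margin-top: 1em">With
<i>backups</i> set to 2, the content labelled "oldest" is
lost, and the name <i>f</i> is free for the new
download.</p>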
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--no-netrc</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Do not try to obtain
credentials from the <i>.netrc</i> file. By default, the
<i>.netrc</i> file is searched for credentials when none
have been passed on the command line and authentication is
required.</p>
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="3%"> |
|
|
|
|
|
|
|
|
<p><b>-c</b></p></td> |
|
|
<td width="86%"> |
|
|
</td></tr> |
|
|
</table> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--continue</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Continue getting a |
|
|
partially-downloaded file. This is useful when you want to |
|
|
finish up a download started by a previous instance of Wget, |
|
|
or by another program. For instance:</p> |
|
|
|
|
|
<pre style="margin-left:17%; margin-top: 1em"> wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z</pre> |
|
|
|
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">If there is a |
|
|
file named <i>ls-lR.Z</i> in the current directory, |
|
|
Wget will assume that it is the first portion of the remote |
|
|
file, and will ask the server to continue the retrieval from |
|
|
an offset equal to the length of the local file.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Note that you |
|
|
don’t need to specify this option if you just want the |
|
|
current invocation of Wget to retry downloading a file |
|
|
should the connection be lost midway through. This is the |
|
|
default behavior. <b>-c</b> only affects resumption of |
|
|
downloads started <i>prior</i> to this invocation of Wget, |
|
|
and whose local files are still sitting around.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Without |
|
|
<b>-c</b>, the previous example would just download |
|
|
the remote file to <i>ls-lR.Z.1</i>, leaving the |
|
|
truncated <i>ls-lR.Z</i> file alone.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">If you use |
|
|
<b>-c</b> on a non-empty file, and the server does not |
|
|
support continued downloading, Wget will restart the |
|
|
download from scratch and overwrite the existing file |
|
|
entirely.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Beginning with |
|
|
Wget 1.7, if you use <b>-c</b> on a file that is the
same size as the one on the server, Wget will refuse to
|
|
download the file and print an explanatory message. The same |
|
|
happens when the file is smaller on the server than locally |
|
|
(presumably because it was changed on the server since your |
|
|
last download attempt)---because |
|
|
"continuing" is not meaningful, no download |
|
|
occurs.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">On the other |
|
|
side of the coin, while using <b>-c</b>, any file |
|
|
that’s bigger on the server than locally will be |
|
|
considered an incomplete download and only |
|
|
<tt>"(length(remote) - length(local))"</tt> |
|
|
bytes will be downloaded and tacked onto the end of the |
|
|
local file. This behavior can be desirable in certain |
|
|
cases---for instance, you can use <b>wget |
|
|
-c</b> to download just the new portion that’s |
|
|
been appended to a data collection or log file.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">However, if the |
|
|
file is bigger on the server because it’s been |
|
|
<i>changed</i>, as opposed to just <i>appended</i> to, |
|
|
you’ll end up with a garbled file. Wget has no way of |
|
|
verifying that the local file is really a valid prefix of |
|
|
the remote file. You need to be especially careful of this |
|
|
when using <b>-c</b> in conjunction with |
|
|
<b>-r</b>, since every file will be considered as an |
|
|
"incomplete download" candidate.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Another |
|
|
instance where you’ll get a garbled file if you try to |
|
|
use <b>-c</b> is if you have a faulty
|
|
<small>HTTP</small> proxy that inserts a "transfer |
|
|
interrupted" string into the local file. In the future |
|
|
a "rollback" option may be added to deal with this |
|
|
case.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Note that |
|
|
<b>-c</b> only works with <small>FTP</small> servers |
|
|
and with <small>HTTP</small> servers that support the |
|
|
<tt>"Range"</tt> header.</p> |
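<p style="margin-left:17%; margin-top: 1em">The size
comparison that <b>-c</b> relies on can be summarized in a
short Python sketch. It models only the rules stated above
for <small>HTTP</small>; the function name
<tt>resume_plan</tt> is hypothetical, and the open-ended
<tt>bytes=N-</tt> form is the standard syntax of the
<tt>"Range"</tt> header.</p>

<pre style="margin-left:17%; margin-top: 1em">def resume_plan(local_size, remote_size):
    """Return (range_value, bytes_to_fetch), or None if nothing is fetched."""
    # Same size, or smaller on the server: continuing is not meaningful.
    if local_size >= remote_size:
        return None
    # Ask for everything from the end of the local file onwards.
    return "bytes={}-".format(local_size), remote_size - local_size

print(resume_plan(100, 250))   # ('bytes=100-', 150)
print(resume_plan(250, 250))   # None
print(resume_plan(300, 250))   # None</pre>

<p style="margin-left:17%; margin-top: 1em">Note that the
comparison looks at sizes only, which is why a file that was
changed rather than appended to ends up garbled.</p>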
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--start-pos=</b> |
|
|
<i><small>OFFSET</small></i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Start downloading at zero-based |
|
|
position <i><small>OFFSET</small></i>. Offset may be
|
|
expressed in bytes, kilobytes with the ‘k’ |
|
|
suffix, or megabytes with the ‘m’ suffix, |
|
|
etc.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em"><b>--start-pos</b> |
|
|
takes precedence over <b>--continue</b>.
When <b>--start-pos</b> and
<b>--continue</b> are both specified, Wget
emits a warning, then proceeds as if
<b>--continue</b> were absent.</p>
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Server support |
|
|
for continued download is required, otherwise |
|
|
<b>--start-pos</b> cannot help. See |
|
|
<b>-c</b> for details.</p> |
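<p style="margin-left:17%; margin-top: 1em">As a rough guide to
the <i><small>OFFSET</small></i> suffixes, the Python sketch
below converts an offset string to a byte count. It assumes
binary multiples (1k = 1024 bytes, 1m = 1024k) and handles
only whole numbers with the two suffixes named above; both
the assumption and the function name <tt>parse_offset</tt>
are illustrative, not taken from Wget.</p>

<pre style="margin-left:17%; margin-top: 1em"># Assumed multipliers: binary kilobytes and megabytes.
MULTIPLIERS = {"k": 1024, "m": 1024 ** 2}

def parse_offset(text):
    """Convert an offset such as 512, 10k or 2m to a number of bytes."""
    suffix = text[-1].lower()
    if suffix in MULTIPLIERS:
        return int(text[:-1]) * MULTIPLIERS[suffix]
    return int(text)

print(parse_offset("512"), parse_offset("10k"), parse_offset("2m"))</pre>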
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--progress=</b><i>type</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Select the type of the progress |
|
|
indicator you wish to use. Legal indicators are |
|
|
"dot" and "bar".</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">The |
|
|
"bar" indicator is used by default. It draws an |
|
|
<small>ASCII</small> progress bar (a.k.a. |
|
|
"thermometer" display) indicating the status of |
|
|
retrieval. If the output is not a <small>TTY,</small> the |
|
|
"dot" indicator will be used by default.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Use |
|
|
<b>--progress=dot</b> to switch to the |
|
|
"dot" display. It traces the retrieval by printing |
|
|
dots on the screen, each dot representing a fixed amount of |
|
|
downloaded data.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">The progress |
|
|
<i>type</i> can also take one or more parameters. The |
|
|
parameters vary based on the <i>type</i> selected. |
|
|
Parameters to <i>type</i> are passed by appending them to |
|
|
the type separated by a colon (:) like this: |
|
|
<b>--progress=</b><i>type</i><b>:</b><i>parameter1</i><b>:</b><i>parameter2</i>.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">When using the |
|
|
dotted retrieval, you may set the <i>style</i> by specifying |
|
|
the type as <b>dot:</b><i>style</i>. Different styles assign |
|
|
different meaning to one dot. With the |
|
|
<tt>"default"</tt> style each dot represents 1K, |
|
|
there are ten dots in a cluster and 50 dots in a line. The |
|
|
<tt>"binary"</tt> style has a more |
|
|
"computer"-like |
|
|
orientation: 8K dots, 16-dot |
|
|
clusters and 48 dots per line (which makes for 384K lines). |
|
|
The <tt>"mega"</tt> style is suitable for |
|
|
downloading large files: each dot |
|
|
represents 64K retrieved, there are eight dots in a cluster, |
|
|
and 48 dots on each line (so each line contains 3M). If |
|
|
<tt>"mega"</tt> is not enough then you can use the |
|
|
<tt>"giga"</tt> style: each dot |
|
|
represents 1M retrieved, there are eight dots in a cluster, |
|
|
and 32 dots on each line (so each line contains 32M).</p> |
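<p style="margin-left:17%; margin-top: 1em">The per-line totals quoted above follow from multiplying the size of one dot by the number of dots per line. The sketch below redoes that arithmetic in kilobytes; the helper name <tt>dot_line_kb</tt> is made up for this illustration.</p>

```shell
# Kilobytes covered by one full line of dots:
# (kilobytes per dot) * (dots per line).
dot_line_kb() {
    echo $(( $1 * $2 ))
}

echo "default: $(dot_line_kb 1 50)K per line"
echo "binary:  $(dot_line_kb 8 48)K per line"
echo "mega:    $(dot_line_kb 64 48)K per line"    # 3072K, i.e. 3M
echo "giga:    $(dot_line_kb 1024 32)K per line"  # 32768K, i.e. 32M
```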
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">With |
|
|
<b>--progress=bar</b>, there are currently two |
|
|
possible parameters, <i>force</i> and <i>noscroll</i>.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">When the output |
|
|
is not a <small>TTY,</small> the progress bar always falls |
|
|
back to "dot", even if |
|
|
<b>--progress=bar</b> was passed to Wget during |
|
|
invocation. This behaviour can be overridden and the |
|
|
"bar" output forced by using the "force" |
|
|
parameter as <b>--progress=bar:force</b>.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">By default, the |
|
|
<b>bar</b> style progress bar scrolls the name of the file |
|
|
from left to right for the file being downloaded if the |
|
|
filename exceeds the maximum length allotted for its |
|
|
display. In certain cases, such as with |
|
|
<b>--progress=bar:force</b>, one may not want |
|
|
the scrolling filename in the progress bar. By passing the |
|
|
"noscroll" parameter, Wget can be forced to |
|
|
display as much of the filename as possible without |
|
|
scrolling through it.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Note that you |
|
|
can set the default style using the |
|
|
<tt>"progress"</tt> command in <i>.wgetrc</i>. |
|
|
That setting may be overridden from the command line. For |
|
|
example, to force the bar output without scrolling, use |
|
|
<b>--progress=bar:force:noscroll</b>.</p> |
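<p style="margin-left:17%; margin-top: 1em">A minimal sketch of how the two bar parameters might be combined in practice. The wrapper name <tt>fetch_with_bar</tt>, the URL and the log file name are assumptions made for the example.</p>

```shell
# Hypothetical helper: keep the bar display, without filename scrolling,
# even when the output is not a TTY.
fetch_with_bar() {
    wget --progress=bar:force:noscroll "$@"
}

# Example (not run here): capture the bar output in a log file.
# fetch_with_bar https://example.com/file.tar.gz 2>&1 | tee download.log
```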
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--show-progress</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Force wget to display the |
|
|
progress bar in any verbosity.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">By default, |
|
|
wget only displays the progress bar in verbose mode. One may, |
|
|
however, want wget to display the progress bar on screen in |
|
|
conjunction with any other verbosity modes like |
|
|
<b>--no-verbose</b> or |
|
|
<b>--quiet</b>. This is often a desired |
|
|
property when invoking wget to download several small/large |
|
|
files. In such a case, wget could simply be invoked with |
|
|
this parameter to get a much cleaner output on the |
|
|
screen.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">This option |
|
|
will also force the progress bar to be printed to |
|
|
<i>stderr</i> when used alongside the |
|
|
<b>--output-file</b> option.</p> |
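<p style="margin-left:17%; margin-top: 1em">For instance, the combination with <b>--quiet</b> mentioned above could be wrapped as follows. The function name <tt>quiet_fetch</tt> and the URLs are hypothetical.</p>

```shell
# Hypothetical helper: suppress the usual messages but keep the progress bar.
quiet_fetch() {
    wget --quiet --show-progress "$@"
}

# Example (not run here):
# quiet_fetch https://example.com/a.zip https://example.com/b.zip
```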
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="3%"> |
|
|
|
|
|
|
|
|
<p><b>-N</b></p></td> |
|
|
<td width="86%"> |
|
|
</td></tr> |
|
|
</table> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--timestamping</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Turn on time-stamping.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--no-if-modified-since</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Do not send If-Modified-Since |
|
|
header in <b>-N</b> mode. Send preliminary |
|
|
<small>HEAD</small> request instead. This only has an effect in |
|
|
<b>-N</b> mode.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--no-use-server-timestamps</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Don’t set the local |
|
|
file’s timestamp by the one on the server.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">By default, |
|
|
when a file is downloaded, its timestamps are set to match |
|
|
those from the remote file. This allows the use of |
|
|
<b>--timestamping</b> on subsequent invocations |
|
|
of wget. However, it is sometimes useful to base the local |
|
|
file’s timestamp on when it was actually downloaded; |
|
|
for that purpose, the |
|
|
<b>--no-use-server-timestamps</b> |
|
|
option has been provided.</p> |
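<p style="margin-left:17%; margin-top: 1em">The two timestamp behaviours can be contrasted with a pair of small wrappers. This is only a sketch: the function names and the URL are invented for the example.</p>

```shell
# Hypothetical helper: rely on time-stamping for repeat runs
# (see the description of --timestamping).
refresh_if_newer() {
    wget --timestamping "$@"
}

# Hypothetical helper: stamp the local file with the download time
# rather than the time reported by the server.
fetch_stamped_now() {
    wget --no-use-server-timestamps "$@"
}

# Examples (not run here):
# refresh_if_newer  https://example.com/data.csv
# fetch_stamped_now https://example.com/data.csv
```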
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="3%"> |
|
|
|
|
|
|
|
|
<p><b>-S</b></p></td> |
|
|
<td width="86%"> |
|
|
</td></tr> |
|
|
</table> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--server-response</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Print the headers sent by |
|
|
<small>HTTP</small> servers and responses sent by |
|
|
<small>FTP</small> servers.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--spider</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">When invoked with this option, |
|
|
Wget will behave as a Web <i>spider</i>, which means that it |
|
|
will not download the pages, just check that they are there. |
|
|
For example, you can use Wget to check your bookmarks:</p> |
|
|
|
|
|
<pre style="margin-left:17%; margin-top: 1em"> wget --spider --force-html -i bookmarks.html</pre> |
|
|
|
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">This feature |
|
|
needs much more work for Wget to get close to the |
|
|
functionality of real web spiders.</p> |
|
|
|
|
|
<p style="margin-left:11%;"><b>-T</b> <i>seconds</i> <b><br> |
|
|
--timeout=</b><i>seconds</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Set the network timeout to |
|
|
<i>seconds</i> seconds. This is equivalent to specifying |
|
|
<b>--dns-timeout</b>, |
|
|
<b>--connect-timeout</b>, and |
|
|
<b>--read-timeout</b>, all at the same |
|
|
time.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">When |
|
|
interacting with the network, Wget can check for timeout and |
|
|
abort the operation if it takes too long. This prevents |
|
|
anomalies like hanging reads and infinite connects. The only |
|
|
timeout enabled by default is a 900-second read |
|
|
timeout. Setting a timeout to 0 disables it altogether. |
|
|
Unless you know what you are doing, it is best not to change |
|
|
the default timeout settings.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">All |
|
|
timeout-related options accept decimal values, as well as |
|
|
subsecond values. For example, <b>0.1</b> seconds is a legal |
|
|
(though unwise) choice of timeout. Subsecond timeouts are |
|
|
useful for checking server response times or for testing |
|
|
network latency.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--dns-timeout=</b><i>seconds</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Set the <small>DNS</small> |
|
|
lookup timeout to <i>seconds</i> seconds. <small>DNS</small> |
|
|
lookups that don’t complete within the specified time |
|
|
will fail. By default, there is no timeout on |
|
|
<small>DNS</small> lookups, other than that implemented by |
|
|
system libraries.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--connect-timeout=</b><i>seconds</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Set the connect timeout to |
|
|
<i>seconds</i> seconds. <small>TCP</small> connections that |
|
|
take longer to establish will be aborted. By default, there |
|
|
is no connect timeout, other than that implemented by system |
|
|
libraries.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--read-timeout=</b><i>seconds</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Set the read (and write) |
|
|
timeout to <i>seconds</i> seconds. The "time" of |
|
|
this timeout refers to <i>idle time</i>: if, at any point in |
|
|
the download, no data is received for more than the |
|
|
specified number of seconds, reading fails and the download |
|
|
is restarted. This option does not directly affect the |
|
|
duration of the entire download.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Of course, the |
|
|
remote server may choose to terminate the connection sooner |
|
|
than this option requires. The default read timeout is 900 |
|
|
seconds.</p> |
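<p style="margin-left:17%; margin-top: 1em">If the three timeouts do need to differ, they can be given separately instead of through <b>--timeout</b>. The values below are arbitrary placeholders rather than recommendations, and the wrapper name <tt>fetch_with_timeouts</tt> is hypothetical.</p>

```shell
# Hypothetical helper: set each timeout individually (values in seconds).
# The numbers are illustrative only; the defaults are usually the safer choice.
fetch_with_timeouts() {
    wget --dns-timeout=5 --connect-timeout=10 --read-timeout=60 "$@"
}

# Example (not run here):
# fetch_with_timeouts https://example.com/report.pdf
```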
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--limit-rate=</b><i>amount</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Limit the download speed to |
|
|
<i>amount</i> bytes per second. Amount may be expressed in |
|
|
bytes, kilobytes with the <b>k</b> suffix, or megabytes with |
|
|
the <b>m</b> suffix. For example, |
|
|
<b>--limit-rate=20k</b> will limit the |
|
|
retrieval rate to 20KB/s. This is useful when, for whatever |
|
|
reason, you don’t want Wget to consume the entire |
|
|
available bandwidth.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">This option |
|
|
allows the use of decimal numbers, usually in conjunction |
|
|
with power suffixes; for example, |
|
|
<b>--limit-rate=2.5k</b> is a legal |
|
|
value.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Note that Wget |
|
|
implements the limiting by sleeping the appropriate amount |
|
|
of time after a network read that took less time than |
|
|
specified by the rate. Eventually this strategy causes the |
|
|
<small>TCP</small> transfer to slow down to approximately |
|
|
the specified rate. However, it may take some time for this |
|
|
balance to be achieved, so don’t be surprised if |
|
|
limiting the rate doesn’t work well with very small |
|
|
files.</p> |
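<p style="margin-left:17%; margin-top: 1em">A short sketch of passing a rate with a suffix. The helper name <tt>fetch_limited</tt> and the URL are assumptions for the example; the rate strings use the suffixes described above.</p>

```shell
# Hypothetical helper: cap the download rate.
# $1 = rate (for example 20k, 2.5k or 1m), $2 = URL.
fetch_limited() {
    wget --limit-rate="$1" "$2"
}

# Example (not run here): limit retrieval to roughly 20KB/s.
# fetch_limited 20k https://example.com/large.iso
```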
|
|
|
|
|
<p style="margin-left:11%;"><b>-w</b> <i>seconds</i> |
|
|
<b><br> |
|
|
--wait=</b><i>seconds</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Wait the specified number of |
|
|
seconds between the retrievals. Use of this option is |
|
|
recommended, as it lightens the server load by making the |
|
|
requests less frequent. Instead of in seconds, the time can |
|
|
be specified in minutes using the <tt>"m"</tt> |
|
|
suffix, in hours using <tt>"h"</tt> suffix, or in |
|
|
days using <tt>"d"</tt> suffix.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Specifying a |
|
|
large value for this option is useful if the network or the |
|
|
destination host is down, so that Wget can wait long enough |
|
|
to reasonably expect the network error to be fixed before |
|
|
the retry. The waiting interval specified by this option |
|
|
is influenced by |
|
|
<tt>"--random-wait"</tt>, described |
|
|
below.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--waitretry=</b><i>seconds</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">If you don’t want Wget to |
|
|
wait between <i>every</i> retrieval, but only between |
|
|
retries of failed downloads, you can use this option. Wget |
|
|
will use <i>linear backoff</i>, waiting 1 second after the |
|
|
first failure on a given file, then waiting 2 seconds after |
|
|
the second failure on that file, up to the maximum number of |
|
|
<i>seconds</i> you specify.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">By default, |
|
|
Wget will assume a value of 10 seconds.</p> |
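<p style="margin-left:17%; margin-top: 1em">The linear backoff described above can be modelled as a simple capped sequence. The sketch below only mirrors the wording of this section (1 second, then 2 seconds, up to the maximum); it is not taken from Wget's source, and the function name <tt>backoff_schedule</tt> is hypothetical.</p>

```shell
# Print the wait, in seconds, before each successive retry of one file.
# $1 = --waitretry value (the cap), $2 = number of failures to model.
backoff_schedule() {
    max=$1
    failures=$2
    n=1
    out=""
    while [ "$n" -le "$failures" ]; do
        # The wait grows by one second per failure until it reaches the cap.
        if [ "$n" -lt "$max" ]; then wait_s=$n; else wait_s=$max; fi
        out="$out${out:+ }$wait_s"
        n=$(( n + 1 ))
    done
    echo "$out"
}

# Model twelve consecutive failures with the default cap of 10 seconds.
backoff_schedule 10 12
```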
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--random-wait</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Some web sites may perform log |
|
|
analysis to identify retrieval programs such as Wget by |
|
|
looking for statistically significant similarities in the |
|
|
time between requests. This option causes the time between |
|
|
requests to vary between 0.5 and 1.5 * <i>wait</i> seconds, |
|
|
where <i>wait</i> was specified using the |
|
|
<b>--wait</b> option, in order to mask |
|
|
Wget’s presence from such analysis.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">A 2001 article |
|
|
in a publication devoted to development on a popular |
|
|
consumer platform provided code to perform this analysis on |
|
|
the fly. Its author suggested blocking at the class C |
|
|
address level to ensure automated retrieval programs were |
|
|
blocked despite changing DHCP-supplied addresses.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">The |
|
|
<b>--random-wait</b> option was inspired |
|
|
by this ill-advised recommendation to block many unrelated |
|
|
users from a web site due to the actions of one.</p> |
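<p style="margin-left:17%; margin-top: 1em">To make the range used by <b>--random-wait</b> concrete, the sketch below computes its bounds, 0.5 and 1.5 times the <b>--wait</b> value. The helper name <tt>random_wait_range</tt> is made up for this illustration, and the use of <tt>awk</tt> for the decimal arithmetic is a choice of the example.</p>

```shell
# Print the lower and upper bound, in seconds, of the randomised delay
# for a given --wait value: 0.5 * wait and 1.5 * wait.
random_wait_range() {
    LC_ALL=C awk -v w="$1" 'BEGIN { printf "%.1f %.1f\n", 0.5 * w, 1.5 * w }'
}

# Bounds for --wait=2.
random_wait_range 2

# A matching invocation (not run here) might look like:
# wget --wait=2 --random-wait -r https://example.com/
```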
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--no-proxy</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Don’t use proxies, even |
|
|
if the appropriate <tt>*_proxy</tt> environment variable is |
|
|
defined.</p> |
|
|
|
|
|
<p style="margin-left:11%;"><b>-Q</b> <i>quota</i> |
|
|
<b><br> |
|
|
--quota=</b><i>quota</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Specify download quota for |
|
|
automatic retrievals. The value can be specified in bytes |
|
|
(default), kilobytes (with <b>k</b> suffix), or megabytes |
|
|
(with <b>m</b> suffix).</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Note that quota |
|
|
will never affect downloading a single file. So if you |
|
|
specify <b>wget -Q10k |
|
|
https://example.com/ls-lR.gz</b>, all of the |
|
|
<i>ls-lR.gz</i> will be downloaded. The same goes even |
|
|
when several URLs are specified on the command-line. |
|
|
However, quota is respected when retrieving either |
|
|
recursively, or from an input file. Thus you may safely type |
|
|
<b>wget -Q2m -i |
|
|
sites</b>; the download will be aborted when |
|
|
the quota is exceeded.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Setting quota |
|
|
to 0 or to <b>inf</b> unlimits the download quota.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--no-dns-cache</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Turn off caching of |
|
|
<small>DNS</small> lookups. Normally, Wget remembers the |
|
|
<small>IP</small> addresses it looked up from |
|
|
<small>DNS</small> so it doesn’t have to repeatedly |
|
|
contact the <small>DNS</small> server for the same |
|
|
(typically small) set of hosts it retrieves from. This cache |
|
|
exists in memory only; a new Wget run will contact |
|
|
<small>DNS</small> again.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">However, it has |
|
|
been reported that in some situations it is not desirable to |
|
|
cache host names, even for the duration of a short-running |
|
|
application like Wget. With this option Wget issues a new |
|
|
<small>DNS</small> lookup (more precisely, a new call to |
|
|
<tt>"gethostbyname"</tt> or |
|
|
<tt>"getaddrinfo"</tt>) each time it makes a new |
|
|
connection. Please note that this option will <i>not</i> |
|
|
affect caching that might be performed by the resolving |
|
|
library or by an external caching layer, such as |
|
|
<small>NSCD.</small></p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">If you |
|
|
don’t understand exactly what this option does, you |
|
|
probably won’t need it.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--restrict-file-names=</b><i>modes</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Change which characters found |
|
|
in remote URLs must be escaped during generation of local |
|
|
filenames. Characters that are <i>restricted</i> by this |
|
|
option are escaped, i.e. replaced with <b>%HH</b>, where |
|
|
<b><small>HH</small></b> is the hexadecimal number that |
|
|
corresponds to the restricted character. This option may |
|
|
also be used to force all alphabetical cases to be either |
|
|
lower- or uppercase.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">By default, |
|
|
Wget escapes the characters that are not valid or safe as |
|
|
part of file names on your operating system, as well as |
|
|
control characters that are typically unprintable. This |
|
|
option is useful for changing these defaults, perhaps |
|
|
because you are downloading to a non-native partition, or |
|
|
because you want to disable escaping of the control |
|
|
characters, or you want to further restrict characters to |
|
|
only those in the <small>ASCII</small> range of values.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">The |
|
|
<i>modes</i> are a comma-separated set of text values. The |
|
|
acceptable values are <b>unix</b>, <b>windows</b>, |
|
|
<b>nocontrol</b>, <b>ascii</b>, <b>lowercase</b>, and |
|
|
<b>uppercase</b>. The values <b>unix</b> and <b>windows</b> |
|
|
are mutually exclusive (one will override the other), as are |
|
|
<b>lowercase</b> and <b>uppercase</b>. Those last are |
|
|
special cases, as they do not change the set of characters |
|
|
that would be escaped, but rather force local file paths to |
|
|
be converted either to lower- or uppercase.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">When |
|
|
"unix" is specified, Wget escapes the character |
|
|
<b>/</b> and the control characters in the ranges |
|
|
0--31 and 128--159. This is the |
|
|
default on Unix-like operating systems.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">When |
|
|
"windows" is given, Wget escapes the characters |
|
|
<b>\</b>, <b>|</b>, <b>/</b>, <b>:</b>, <b>?</b>, |
|
|
<b>"</b>, <b>*</b>, <b><</b>, <b>></b>, and the |
|
|
control characters in the ranges 0--31 and |
|
|
128--159. In addition to this, Wget in Windows |
|
|
mode uses <b>+</b> instead of <b>:</b> to separate host and |
|
|
port in local file names, and uses <b>@</b> instead of |
|
|
<b>?</b> to separate the query portion of the file name from |
|
|
the rest. Therefore, a <small>URL</small> that would be |
|
|
saved as <b>www.xemacs.org:4300/search.pl?input=blah</b> in |
|
|
Unix mode would be saved as |
|
|
<b>www.xemacs.org+4300/search.pl@input=blah</b> in Windows |
|
|
mode. This mode is the default on Windows.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">If you specify |
|
|
<b>nocontrol</b>, then the escaping of the control |
|
|
characters is also switched off. This option may make sense |
|
|
when you are downloading URLs whose names contain |
|
|
<small>UTF-8</small> characters, on a system which can |
|
|
save and display filenames in <small>UTF-8</small> |
|
|
(some possible byte values used in |
|
|
<small>UTF-8</small> byte sequences fall in the range |
|
|
of values designated by Wget as "controls").</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">The |
|
|
<b>ascii</b> mode is used to specify that any bytes whose |
|
|
values are outside the range of <small>ASCII</small> |
|
|
characters (that is, greater than 127) shall be escaped. |
|
|
This can be useful when saving filenames whose encoding does |
|
|
not match the one used locally.</p> |
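<p style="margin-left:17%; margin-top: 1em">The <b>%HH</b> form mentioned at the start of this option's description can be reproduced for a single ASCII character with the shell's <tt>printf</tt>. This is a sketch of the notation only, not of Wget's implementation; the helper name <tt>escape_char</tt> is hypothetical, and the use of uppercase hexadecimal digits is an assumption.</p>

```shell
# Print the %HH escape for one ASCII character: a percent sign followed by
# the character's two-digit hexadecimal code. A leading single quote in a
# printf argument yields the numeric value of the character after it.
escape_char() {
    printf '%%%02X\n' "'$1"
}

# Two of the characters that are restricted in "windows" mode.
escape_char '|'
escape_char ':'
```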
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="3%"> |
|
|
|
|
|
|
|
|
<p><b>-4</b></p></td> |
|
|
<td width="86%"> |
|
|
</td></tr> |
|
|
</table> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--inet4-only</b></p> |
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="3%"> |
|
|
|
|
|
|
|
|
<p><b>-6</b></p></td> |
|
|
<td width="86%"> |
|
|
</td></tr> |
|
|
</table> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--inet6-only</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Force connecting to IPv4 or |
|
|
IPv6 addresses. With <b>--inet4-only</b> |
|
|
or <b>-4</b>, Wget will only connect to IPv4 hosts, |
|
|
ignoring <small>AAAA</small> records in <small>DNS,</small> |
|
|
and refusing to connect to IPv6 addresses specified in URLs. |
|
|
Conversely, with <b>--inet6-only</b> or |
|
|
<b>-6</b>, Wget will only connect to IPv6 hosts and |
|
|
ignore A records and IPv4 addresses.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Neither option |
|
|
should be needed normally. By default, an IPv6-aware |
|
|
Wget will use the address family specified by the |
|
|
host’s <small>DNS</small> record. If the |
|
|
<small>DNS</small> responds with both IPv4 and IPv6 |
|
|
addresses, Wget will try them in sequence until it finds one |
|
|
it can connect to. (Also see |
|
|
<tt>"--prefer-family"</tt> |
|
|
option described below.)</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">These options |
|
|
can be used to deliberately force the use of IPv4 or IPv6 |
|
|
address families on dual family systems, usually to aid |
|
|
debugging or to deal with broken network configuration. Only |
|
|
one of <b>--inet6-only</b> and |
|
|
<b>--inet4-only</b> may be specified at |
|
|
the same time. Neither option is available in Wget compiled |
|
|
without IPv6 support.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--prefer-family=none/IPv4/IPv6</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">When given a choice of several |
|
|
addresses, connect to the addresses with specified address |
|
|
family first. The address order returned by |
|
|
<small>DNS</small> is used without change by default.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">This avoids |
|
|
spurious errors and connect attempts when accessing hosts |
|
|
that resolve to both IPv6 and IPv4 addresses from IPv4 |
|
|
networks. For example, <b>www.kame.net</b> resolves to |
|
|
<b>2001:200:0:8002:203:47ff:fea5:3085</b> and to |
|
|
<b>203.178.141.194</b>. When the preferred family is |
|
|
<tt>"IPv4"</tt>, the IPv4 address is used first; |
|
|
when the preferred family is <tt>"IPv6"</tt>, the |
|
|
IPv6 address is used first; if the specified value is |
|
|
<tt>"none"</tt>, the address order returned by |
|
|
<small>DNS</small> is used without change.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Unlike |
|
|
<b>-4</b> and <b>-6</b>, this option |
|
|
doesn’t inhibit access to any address family, it only |
|
|
changes the <i>order</i> in which the addresses are |
|
|
accessed. Also note that the reordering performed by this |
|
|
option is <i>stable</i>: it doesn’t |
|
|
affect order of addresses of the same family. That is, the |
|
|
relative order of all IPv4 addresses and of all IPv6 |
|
|
addresses remains intact in all cases.</p> |
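<p style="margin-left:17%; margin-top: 1em">The stable reordering can be pictured as a partition that moves the preferred family to the front and leaves the order within each family untouched. The sketch below models that idea only; it is not Wget's code. The function name <tt>prefer_family</tt> is hypothetical, the addresses are documentation examples, and treating any address containing a colon as IPv6 is a simplifying assumption.</p>

```shell
# Reorder a list of addresses so that one family comes first, keeping the
# original relative order inside each family.
# $1 = IPv4, IPv6 or none; remaining arguments = addresses in DNS order.
prefer_family() {
    family=$1
    shift
    first=""
    rest=""
    for addr in "$@"; do
        # Simplification: a colon marks an IPv6 literal.
        case $addr in
            *:*) kind=IPv6 ;;
            *)   kind=IPv4 ;;
        esac
        if [ "$kind" = "$family" ]; then
            first="$first${first:+ }$addr"
        else
            rest="$rest${rest:+ }$addr"
        fi
    done
    echo "$first${first:+${rest:+ }}$rest"
}

prefer_family IPv4 2001:db8::1 192.0.2.1 2001:db8::2 192.0.2.2
```

<p style="margin-left:17%; margin-top: 1em">With <tt>none</tt> as the first argument nothing matches the preferred family, so the list is printed in its original order.</p>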
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--retry-connrefused</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Consider "connection |
|
|
refused" a transient error and try again. Normally Wget |
|
|
gives up on a <small>URL</small> when it is unable to |
|
|
connect to the site because failure to connect is taken as a |
|
|
sign that the server is not running at all and that retries |
|
|
would not help. This option is for mirroring unreliable |
|
|
sites whose servers tend to disappear for short periods of |
|
|
time.</p> |
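<p style="margin-left:17%; margin-top: 1em">When mirroring such a site, this option is typically paired with a retry limit and a pause between retries. The sketch below is one possible combination; the wrapper name <tt>fetch_flaky</tt>, the URL and the chosen numbers are assumptions, not recommendations.</p>

```shell
# Hypothetical helper: treat "connection refused" as transient, retry up to
# 10 times, and back off up to 5 seconds between retries.
fetch_flaky() {
    wget --retry-connrefused --tries=10 --waitretry=5 "$@"
}

# Example (not run here):
# fetch_flaky https://example.com/nightly.tar.gz
```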
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--user=</b><i>user</i> |
|
|
<b><br> |
|
|
--password=</b><i>password</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Specify the username |
|
|
<i>user</i> and password <i>password</i> for both |
|
|
<small>FTP</small> and <small>HTTP</small> file retrieval. |
|
|
These parameters can be overridden using the |
|
|
<b>--ftp-user</b> and |
|
|
<b>--ftp-password</b> options for |
|
|
<small>FTP</small> connections and the |
|
|
<b>--http-user</b> and |
|
|
<b>--http-password</b> options for |
|
|
<small>HTTP</small> connections.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--ask-password</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Prompt for a password for each |
|
|
connection established. Cannot be specified when |
|
|
<b>--password</b> is being used, because they |
|
|
are mutually exclusive.</p> |
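<p style="margin-left:17%; margin-top: 1em">A sketch of supplying the user name on the command line while leaving the password to the prompt. The function name <tt>fetch_as</tt>, the user name and the URL are hypothetical.</p>

```shell
# Hypothetical helper: pass the user name explicitly and let Wget prompt
# for the password instead of placing it on the command line.
# $1 = user name, $2 = URL.
fetch_as() {
    wget --user="$1" --ask-password "$2"
}

# Example (not run here):
# fetch_as alice https://example.com/private/report.pdf
```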
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--use-askpass=</b><i>command</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Prompt for a user and password |
|
|
using the specified command. If no command is specified then |
|
|
the command in the environment variable |
|
|
<small>WGET_ASKPASS</small> is used. If |
|
|
<small>WGET_ASKPASS</small> is not set then the command in |
|
|
the environment variable <small>SSH_ASKPASS</small> is |
|
|
used.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">You can set the |
|
|
default command for use-askpass in the <i>.wgetrc</i>. That |
|
|
setting may be overridden from the command line.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--no-iri</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Turn off internationalized |
|
|
<small>URI</small> (<small>IRI</small>) support. Use |
|
|
<b>--iri</b> to turn it on. <small>IRI</small> |
|
|
support is activated by default.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">You can set the |
|
|
default state of <small>IRI</small> support using the |
|
|
<tt>"iri"</tt> command in <i>.wgetrc</i>. That |
|
|
setting may be overridden from the command line.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--local-encoding=</b><i>encoding</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Force Wget to use |
|
|
<i>encoding</i> as the default system encoding. That affects |
|
|
how Wget converts URLs specified as arguments from locale to |
|
|
<small>UTF-8</small> for <small>IRI</small> |
|
|
support.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Wget uses the |
|
|
function <tt>"nl_langinfo()"</tt> and then the |
|
|
<tt>"CHARSET"</tt> environment variable to get the |
|
|
locale. If it fails, <small>ASCII</small> is used.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">You can set the |
|
|
default local encoding using the |
|
|
<tt>"local_encoding"</tt> command in |
|
|
<i>.wgetrc</i>. That setting may be overridden from the |
|
|
command line.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--remote-encoding=</b><i>encoding</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Force Wget to use |
|
|
<i>encoding</i> as the default remote server encoding. That |
|
|
affects how Wget converts URIs found in files from remote |
|
|
encoding to <small>UTF-8</small> during a recursive |
|
|
fetch. This option is only useful for <small>IRI</small> |
|
|
support, for the interpretation of non-ASCII characters.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">For |
|
|
<small>HTTP,</small> remote encoding can be found in |
|
|
<small>HTTP</small> <tt>"Content-Type"</tt> |
|
|
header and in <small>HTML</small> |
|
|
<tt>"Content-Type http-equiv"</tt> |
|
|
meta tag.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">You can set the |
|
|
default encoding using the |
|
|
<tt>"remoteencoding"</tt> command in |
|
|
<i>.wgetrc</i>. That setting may be overridden from the |
|
|
command line.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--unlink</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Force Wget to unlink an existing file |
|
|
instead of clobbering it. This option is useful |
|
|
for downloading to a directory with hard links.</p> |
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em"><b>Directory |
|
|
Options</b></p> |
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="4%"> |
|
|
|
|
|
|
|
|
<p><b>-nd</b></p></td> |
|
|
<td width="85%"> |
|
|
</td></tr> |
|
|
</table> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--no-directories</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Do not create a hierarchy of |
|
|
directories when retrieving recursively. With this option |
|
|
turned on, all files will get saved to the current |
|
|
directory, without clobbering (if a name shows up more than |
|
|
once, the filenames will get extensions <b>.n</b>).</p> |
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="3%"> |
|
|
|
|
|
|
|
|
<p><b>-x</b></p></td> |
|
|
<td width="86%"> |
|
|
</td></tr> |
|
|
</table> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--force-directories</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">The opposite of |
|
|
<b>-nd</b>: create a hierarchy of |
|
|
directories, even if one would not have been created |
|
|
otherwise. E.g. <b>wget -x |
|
|
http://fly.srk.fer.hr/robots.txt</b> will save the |
|
|
downloaded file to <i>fly.srk.fer.hr/robots.txt</i>.</p> |
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="4%"> |
|
|
|
|
|
|
|
|
<p><b>-nH</b></p></td> |
|
|
<td width="85%"> |
|
|
</td></tr> |
|
|
</table> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--no-host-directories</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Disable generation of |
|
|
host-prefixed directories. By default, invoking Wget with |
|
|
<b>-r http://fly.srk.fer.hr/</b> will create a |
|
|
structure of directories beginning with |
|
|
<i>fly.srk.fer.hr/</i>. This option disables such |
|
|
behavior.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--protocol-directories</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Use the protocol name as a |
|
|
directory component of local file names. For example, with |
|
|
this option, <b>wget -r http://</b><i>host</i> will |
|
|
save to <b>http/</b><i>host</i><b>/...</b> rather than just |
|
|
to <i>host</i><b>/...</b>.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--cut-dirs=</b><i>number</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Ignore <i>number</i> directory |
|
|
components. This is useful for getting a fine-grained |
|
|
control over the directory where recursive retrieval will be |
|
|
saved.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Take, for |
|
|
example, the directory at |
|
|
<b>ftp://ftp.xemacs.org/pub/xemacs/</b>. If you retrieve it |
|
|
with <b>-r</b>, it will be saved locally under |
|
|
<i>ftp.xemacs.org/pub/xemacs/</i>. While the |
|
|
<b>-nH</b> option can remove the |
|
|
<i>ftp.xemacs.org/</i> part, you are still stuck with |
|
|
<i>pub/xemacs</i>. This is where |
|
|
<b>--cut-dirs</b> comes in handy; it makes |
|
|
Wget not "see" <i>number</i> remote directory |
|
|
components. Here are several examples of how the |
|
|
<b>--cut-dirs</b> option works.</p> |
|
|
|
|
|
<pre style="margin-left:17%; margin-top: 1em"> No options -> ftp.xemacs.org/pub/xemacs/ |
|
|
-nH -> pub/xemacs/ |
|
|
-nH --cut-dirs=1 -> xemacs/ |
|
|
-nH --cut-dirs=2 -> . |
|
|
--cut-dirs=1 -> ftp.xemacs.org/xemacs/ |
|
|
...</pre> |
|
|
|
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">If you just |
|
|
want to get rid of the directory structure, this option is |
|
|
similar to a combination of <b>-nd</b> and |
|
|
<b>-P</b>. However, unlike <b>-nd</b>, |
|
|
<b>--cut-dirs</b> does not discard
subdirectories: for instance, with
<b>-nH --cut-dirs=1</b>, a
<i>beta/</i> subdirectory will be placed in
|
|
<i>xemacs/beta</i>, as one would expect.</p> |
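<p style="margin-left:17%; margin-top: 1em">The mapping in the table above can be mimicked with a small shell function. This is only an illustrative sketch of the rule as described here, not Wget's own code; the function name <tt>local_dir</tt> and its argument order are made up for this example.</p>

```shell
# Illustrative only: mimics the directory mapping described above.
# local_dir HOST REMOTE_DIR CUT [nH]
#   HOST        remote host name, e.g. ftp.xemacs.org
#   REMOTE_DIR  remote directory without leading/trailing slash, e.g. pub/xemacs
#   CUT         value that would be given to --cut-dirs
#   nH          pass the literal word "nH" to emulate -nH
local_dir() {
    host=$1 dir=$2 cut=$3 mode=${4:-}
    # Drop the first CUT components of the remote directory.
    while [ "$cut" -gt 0 ] && [ -n "$dir" ]; do
        case $dir in
            */*) dir=${dir#*/} ;;
            *)   dir= ;;
        esac
        cut=$((cut - 1))
    done
    # Prepend the host directory unless -nH is emulated.
    [ "$mode" = nH ] || dir=$host${dir:+/$dir}
    if [ -n "$dir" ]; then printf '%s/\n' "$dir"; else printf '.\n'; fi
}

local_dir ftp.xemacs.org pub/xemacs 0      # ftp.xemacs.org/pub/xemacs/
local_dir ftp.xemacs.org pub/xemacs 0 nH   # pub/xemacs/
local_dir ftp.xemacs.org pub/xemacs 1 nH   # xemacs/
local_dir ftp.xemacs.org pub/xemacs 2 nH   # .
local_dir ftp.xemacs.org pub/xemacs 1      # ftp.xemacs.org/xemacs/
```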
|
|
|
|
|
<p style="margin-left:11%;"><b>-P</b> <i>prefix</i> |
|
|
<b><br> |
|
|
--directory-prefix=</b><i>prefix</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Set directory prefix to |
|
|
<i>prefix</i>. The <i>directory prefix</i> is the directory |
|
|
where all other files and subdirectories will be saved,
|
|
i.e. the top of the retrieval tree. The default is <b>.</b> |
|
|
(the current directory).</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em"><b><small>HTTP</small> |
|
|
Options <br> |
|
|
--default-page=</b><i>name</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Use <i>name</i> as the default |
|
|
file name when it isn’t known (i.e., for URLs that end |
|
|
in a slash), instead of <i>index.html</i>.</p> |
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="3%"> |
|
|
|
|
|
|
|
|
<p><b>-E</b></p></td> |
|
|
<td width="86%"> |
|
|
</td></tr> |
|
|
</table> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--adjust-extension</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">If a file of type |
|
|
<b>application/xhtml+xml</b> or <b>text/html</b> is |
|
|
downloaded and the <small>URL</small> does not end with the |
|
|
regexp <b>\.[Hh][Tt][Mm][Ll]?</b>, this option will cause |
|
|
the suffix <b>.html</b> to be appended to the local |
|
|
filename. This is useful, for instance, when you’re |
|
|
mirroring a remote site that uses <b>.asp</b> pages, but you |
|
|
want the mirrored pages to be viewable on your stock Apache |
|
|
server. Another good use for this is when you’re |
|
|
downloading CGI-generated materials. A <small>URL</small> |
|
|
like <b>http://site.com/article.cgi?25</b> will be saved as |
|
|
<i>article.cgi?25.html</i>.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Note that |
|
|
filenames changed in this way will be re-downloaded every |
|
|
time you re-mirror a site, because Wget can’t tell |
|
|
that the local <i>X.html</i> file corresponds to remote |
|
|
<small>URL</small> <i>X</i> (since it doesn’t yet know |
|
|
that the <small>URL</small> produces output of type |
|
|
<b>text/html</b> or <b>application/xhtml+xml</b>).</p>
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">As of version |
|
|
1.12, Wget will also ensure that any downloaded files of |
|
|
type <b>text/css</b> end in the suffix <b>.css</b>, and the |
|
|
option was renamed from |
|
|
<b>--html-extension</b>, to better reflect |
|
|
its new behavior. The old option name is still acceptable, |
|
|
but should now be considered deprecated.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">As of version |
|
|
1.19.2, Wget will also ensure that any downloaded files with |
|
|
a <tt>"Content-Encoding"</tt> of <b>br</b>, |
|
|
<b>compress</b>, <b>deflate</b> or <b>gzip</b> end in the |
|
|
suffix <b>.br</b>, <b>.Z</b>, <b>.zlib</b> or <b>.gz</b>,
respectively.</p>
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">At some point |
|
|
in the future, this option may well be expanded to include |
|
|
suffixes for other types of content, including content types |
|
|
that are not parsed by Wget.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--http-user=</b><i>user</i> |
|
|
<b><br> |
|
|
--http-password=</b><i>password</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Specify the username |
|
|
<i>user</i> and password <i>password</i> on an |
|
|
<small>HTTP</small> server. According to the type of the |
|
|
challenge, Wget will encode them using either the |
|
|
<tt>"basic"</tt> (insecure), the |
|
|
<tt>"digest"</tt>, or the Windows |
|
|
<tt>"NTLM"</tt> authentication scheme.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Another way to |
|
|
specify username and password is in the <small>URL</small> |
|
|
itself. Either method reveals your password to anyone who |
|
|
bothers to run <tt>"ps"</tt>. To prevent the |
|
|
passwords from being seen, use the
<b>--use-askpass</b> option or store them in
<i>.wgetrc</i> or <i>.netrc</i>, and make sure to protect
those files from other users with
<tt>"chmod"</tt>. If the passwords are really
important, do not leave them lying in those files
either: edit the files and delete the passwords
|
|
after Wget has started the download.</p> |
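<p style="margin-left:17%; margin-top: 1em">A minimal sketch of the file-based approach follows. It assumes that the <tt>http_user</tt> and <tt>http_password</tt> startup-file commands and the <tt>WGETRC</tt> environment variable are available in your build; the temporary directory, the user <tt>foo</tt> and the password <tt>bar</tt> are placeholders chosen for this example.</p>

```shell
# Assumption: a throwaway directory stands in for $HOME; "foo" and "bar"
# are placeholder credentials.
rc_dir=$(mktemp -d)
rc_file=$rc_dir/wgetrc

# Create the file with owner-only permissions before any secret is written.
( umask 077 && : > "$rc_file" )
cat > "$rc_file" <<'EOF'
http_user = foo
http_password = bar
EOF
chmod 600 "$rc_file"

# Show the resulting permissions (expected to start with -rw-------).
ls -l "$rc_file"

# Point Wget at the file, so the password is not on the command line:
# WGETRC="$rc_file" wget http://example.com/protected/
```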
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--no-http-keep-alive</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Turn off the |
|
|
"keep-alive" feature for <small>HTTP</small> |
|
|
downloads. Normally, Wget asks the server to keep the |
|
|
connection open so that, when you download more than one |
|
|
document from the same server, they get transferred over the |
|
|
same <small>TCP</small> connection. This saves time and at |
|
|
the same time reduces the load on the server.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">This option is |
|
|
useful when, for some reason, persistent (keep-alive) |
|
|
connections don’t work for you, for example due to a |
|
|
server bug or due to the inability of server-side scripts to |
|
|
cope with the connections.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--no-cache</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Disable server-side cache. In |
|
|
this case, Wget will send the remote server appropriate |
|
|
directives (<b>Cache-Control: no-cache</b> and <b>Pragma: |
|
|
no-cache</b>) to get the file from the remote service, |
|
|
rather than a cached copy. This is especially
|
|
useful for retrieving and flushing out-of-date documents on |
|
|
proxy servers.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Caching is |
|
|
allowed by default.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--no-cookies</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Disable the use of cookies. |
|
|
Cookies are a mechanism for maintaining server-side state. |
|
|
The server sends the client a cookie using the |
|
|
<tt>"Set-Cookie"</tt> header, and the client |
|
|
responds with the same cookie upon further requests. Since |
|
|
cookies allow the server owners to keep track of visitors |
|
|
and for sites to exchange this information, some consider |
|
|
them a breach of privacy. The default is to use cookies; |
|
|
however, <i>storing</i> cookies is not on by default.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--load-cookies</b> |
|
|
<i>file</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Load cookies from <i>file</i> |
|
|
before the first <small>HTTP</small> retrieval. <i>file</i> |
|
|
is a textual file in the format originally used by |
|
|
Netscape’s <i>cookies.txt</i> file.</p> |
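<p style="margin-left:17%; margin-top: 1em">For orientation, the sketch below writes a one-cookie file by hand in the commonly used Netscape layout: seven tab-separated fields per line. The domain, cookie name, value and expiry time are made up for this example, and the field order shown is the conventional one rather than something stated in this manual.</p>

```shell
cookie_file=$(mktemp)

# Fields: domain, include-subdomains flag, path, secure flag,
# expiry (seconds since the epoch), name, value.
{
    printf '# Netscape HTTP Cookie File\n'
    printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
        example.com FALSE / FALSE 2147483647 session_id abc123
} > "$cookie_file"

# Sanity check: every non-comment line has exactly seven fields.
awk -F '\t' '!/^#/ && NF != 7 { bad = 1 } END { exit bad }' "$cookie_file" \
    && echo "cookie file looks well-formed"

# wget --load-cookies "$cookie_file" http://example.com/members/
```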
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">You will |
|
|
typically use this option when mirroring sites that require |
|
|
that you be logged in to access some or all of their |
|
|
content. The login process typically works by the web server |
|
|
issuing an <small>HTTP</small> cookie upon receiving and |
|
|
verifying your credentials. The cookie is then resent by the |
|
|
browser when accessing that part of the site, and so proves |
|
|
your identity.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Mirroring such |
|
|
a site requires Wget to send the same cookies your browser |
|
|
sends when communicating with the site. This is achieved by |
|
|
<b>--load-cookies</b>: simply
|
|
point Wget to the location of the <i>cookies.txt</i> file, |
|
|
and it will send the same cookies your browser would send in |
|
|
the same situation. Different browsers keep textual cookie |
|
|
files in different locations: <br> |
|
|
"Netscape 4.x."</p> |
|
|
|
|
|
<p style="margin-left:23%;">The cookies are in |
|
|
<i>~/.netscape/cookies.txt</i>.</p> |
|
|
|
|
|
<p style="margin-left:17%;">"Mozilla and Netscape |
|
|
6.x."</p> |
|
|
|
|
|
<p style="margin-left:23%;">Mozilla’s cookie file is |
|
|
also named <i>cookies.txt</i>, located somewhere under |
|
|
<i>~/.mozilla</i>, in the directory of your profile. The |
|
|
full path usually ends up looking somewhat like |
|
|
<i>~/.mozilla/default/some-weird-string/cookies.txt</i>.</p> |
|
|
|
|
|
<p style="margin-left:17%;">"Internet |
|
|
Explorer."</p> |
|
|
|
|
|
<p style="margin-left:23%;">You can produce a cookie file |
|
|
Wget can use by using the File menu, Import and Export, |
|
|
Export Cookies. This has been tested with Internet Explorer |
|
|
5; it is not guaranteed to work with earlier versions.</p> |
|
|
|
|
|
<p style="margin-left:17%;">"Other browsers."</p> |
|
|
|
|
|
<p style="margin-left:23%;">If you are using a different |
|
|
browser to create your cookies, |
|
|
<b>--load-cookies</b> will only work if |
|
|
you can locate or produce a cookie file in the Netscape |
|
|
format that Wget expects.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">If you cannot |
|
|
use <b>--load-cookies</b>, there might |
|
|
still be an alternative. If your browser supports a |
|
|
"cookie manager", you can use it to view the |
|
|
cookies used when accessing the site you’re mirroring. |
|
|
Write down the name and value of the cookie, and manually |
|
|
instruct Wget to send those cookies, bypassing the |
|
|
"official" cookie support:</p> |
|
|
|
|
|
<pre style="margin-left:17%; margin-top: 1em"> wget --no-cookies --header "Cookie: <name>=<value>"</pre> |
|
|
|
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--save-cookies</b> |
|
|
<i>file</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Save cookies to <i>file</i> |
|
|
before exiting. This will not save cookies that have expired |
|
|
or that have no expiry time (so-called "session |
|
|
cookies"), but also see |
|
|
<b>--keep-session-cookies</b>.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--keep-session-cookies</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">When specified, causes |
|
|
<b>--save-cookies</b> to also save session |
|
|
cookies. Session cookies are normally not saved because they |
|
|
are meant to be kept in memory and forgotten when you exit |
|
|
the browser. Saving them is useful on sites that require you |
|
|
to log in or to visit the home page before you can access |
|
|
some pages. With this option, multiple Wget runs are |
|
|
considered a single browser session as far as the site is |
|
|
concerned.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Since the |
|
|
cookie file format does not normally carry session cookies, |
|
|
Wget marks them with an expiry timestamp of 0. Wget’s |
|
|
<b>--load-cookies</b> recognizes those as |
|
|
session cookies, but it might confuse other browsers. Also |
|
|
note that cookies so loaded will be treated as other session |
|
|
cookies, which means that if you want |
|
|
<b>--save-cookies</b> to preserve them |
|
|
again, you must use |
|
|
<b>--keep-session-cookies</b> |
|
|
again.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--ignore-length</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Unfortunately, some |
|
|
<small>HTTP</small> servers (<small>CGI</small> programs,
to be more precise) send out bogus
<tt>"Content-Length"</tt> headers, which
makes Wget go wild, as it thinks that not all of the document was
retrieved. You can spot this syndrome if Wget retries
|
|
getting the same document again and again, each time |
|
|
claiming that the (otherwise normal) connection has closed |
|
|
on the very same byte.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">With this |
|
|
option, Wget will ignore the |
|
|
<tt>"Content-Length"</tt> |
|
|
header, as if it never existed.</p>
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--header=</b><i>header-line</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Send <i>header-line</i> along |
|
|
with the rest of the headers in each <small>HTTP</small> |
|
|
request. The supplied header is sent as-is, which means it |
|
|
must contain a name and a value separated by a colon, and must not
|
|
contain newlines.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">You may define |
|
|
more than one additional header by specifying |
|
|
<b>--header</b> more than once.</p> |
|
|
|
|
|
<pre style="margin-left:17%; margin-top: 1em"> wget --header='Accept-Charset: iso-8859-2' \ |
|
|
--header='Accept-Language: hr' \ |
|
|
http://fly.srk.fer.hr/</pre> |
|
|
|
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Specification |
|
|
of an empty string as the header value will clear all |
|
|
previous user-defined headers.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">As of Wget |
|
|
1.10, this option can be used to override headers otherwise |
|
|
generated automatically. This example instructs Wget to |
|
|
connect to localhost, but to specify <b>foo.bar</b> in the |
|
|
<tt>"Host"</tt> header:</p> |
|
|
|
|
|
<pre style="margin-left:17%; margin-top: 1em"> wget --header="Host: foo.bar" http://localhost/</pre> |
|
|
|
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">In versions of |
|
|
Wget prior to 1.10 such use of <b>--header</b> |
|
|
caused sending of duplicate headers.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--compression=</b><i>type</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Choose the type of compression |
|
|
to be used. Legal values are <b>auto</b>, <b>gzip</b> and |
|
|
<b>none</b>.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">If <b>auto</b> |
|
|
or <b>gzip</b> is specified, Wget asks the server to
|
|
compress the file using the gzip compression format. If the |
|
|
server compresses the file and responds with the |
|
|
<tt>"Content-Encoding"</tt> header field set |
|
|
appropriately, the file will be decompressed |
|
|
automatically.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">If <b>none</b> |
|
|
is specified, Wget will not ask the server to compress the
|
|
file and will not decompress any server responses. This is |
|
|
the default.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Compression |
|
|
support is currently experimental. In case it is turned on, |
|
|
please report any bugs to |
|
|
<tt>"bug-wget@gnu.org"</tt>.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--max-redirect=</b><i>number</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Specifies the maximum number of |
|
|
redirections to follow for a resource. The default is 20, |
|
|
which is usually far more than necessary. However, on those |
|
|
occasions where you want to allow more (or fewer), this is |
|
|
the option to use.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--proxy-user=</b><i>user</i> |
|
|
<b><br> |
|
|
--proxy-password=</b><i>password</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Specify the username |
|
|
<i>user</i> and password <i>password</i> for authentication |
|
|
on a proxy server. Wget will encode them using the |
|
|
<tt>"basic"</tt> authentication scheme.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Security |
|
|
considerations similar to those with |
|
|
<b>--http-password</b> pertain here as |
|
|
well.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--referer=</b><i>url</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Include a ‘Referer:
<i>url</i>’ header in the <small>HTTP</small> request.
This is useful for retrieving documents with server-side processing
|
|
that assume they are always being retrieved by interactive |
|
|
web browsers and only come out properly when Referer is set |
|
|
to one of the pages that point to them.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--save-headers</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Save the headers sent by the |
|
|
<small>HTTP</small> server to the file, preceding the actual |
|
|
contents, with an empty line as the separator.</p> |
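<p style="margin-left:17%; margin-top: 1em">A file saved this way can be split back into headers and body at the first empty line. The sketch below uses a hand-made sample file in place of real Wget output; the helper names <tt>headers_of</tt> and <tt>body_of</tt> are invented for this example, and the patterns tolerate header lines with or without a trailing carriage return.</p>

```shell
# Assumption: "saved" is a hand-made stand-in for a file written with --save-headers.
saved=$(mktemp)
printf 'HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello\nworld\n' > "$saved"

# Print only the header block (everything before the first empty line).
headers_of() { awk '/^\r?$/ { exit } { sub(/\r$/, ""); print }' "$1"; }

# Print only the body (everything after the first empty line).
body_of() { awk 'in_body { print; next } /^\r?$/ { in_body = 1 }' "$1"; }

headers_of "$saved"
body_of "$saved"
```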
|
|
|
|
|
<p style="margin-left:11%;"><b>-U</b> |
|
|
<i>agent-string</i> <b><br> |
|
|
--user-agent=</b><i>agent-string</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Identify as <i>agent-string</i> |
|
|
to the <small>HTTP</small> server.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">The |
|
|
<small>HTTP</small> protocol allows the clients to identify |
|
|
themselves using a <tt>"User-Agent"</tt> |
|
|
header field. This enables distinguishing the |
|
|
<small>WWW</small> software, usually for statistical |
|
|
purposes or for tracing of protocol violations. Wget |
|
|
normally identifies as <b>Wget/</b><i>version</i>, |
|
|
<i>version</i> being the current version number of Wget.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">However, some |
|
|
sites have been known to impose the policy of tailoring the |
|
|
output according to the |
|
|
<tt>"User-Agent"</tt>-supplied |
|
|
information. While this is not such a bad idea in theory, it |
|
|
has been abused by servers denying information to clients |
|
|
other than (historically) Netscape or, more frequently, |
|
|
Microsoft Internet Explorer. This option allows you to |
|
|
change the <tt>"User-Agent"</tt> line issued |
|
|
by Wget. Use of this option is discouraged, unless you |
|
|
really know what you are doing.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Specifying |
|
|
an empty user agent with
|
|
<b>--user-agent=""</b> instructs |
|
|
Wget not to send the <tt>"User-Agent"</tt> |
|
|
header in <small>HTTP</small> requests.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--post-data=</b><i>string</i> |
|
|
<b><br> |
|
|
--post-file=</b><i>file</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Use <small>POST</small> as the |
|
|
method for all <small>HTTP</small> requests and send the |
|
|
specified data in the request body. |
|
|
<b>--post-data</b> sends <i>string</i> as |
|
|
data, whereas <b>--post-file</b> sends the |
|
|
contents of <i>file</i>. Other than that, they work in |
|
|
exactly the same way. In particular, they <i>both</i> expect |
|
|
content of the form |
|
|
<tt>"key1=value1&key2=value2"</tt>, with |
|
|
percent-encoding for special characters; the only difference |
|
|
is that one expects its content as a command-line parameter |
|
|
and the other accepts its content from a file. In |
|
|
particular, <b>--post-file</b> is |
|
|
<i>not</i> for transmitting files as form attachments: those |
|
|
must appear as <tt>"key=value"</tt> data (with |
|
|
appropriate percent-encoding) just like everything else. Wget
|
|
does not currently support |
|
|
<tt>"multipart/form-data"</tt> for |
|
|
transmitting <small>POST</small> data; only |
|
|
<tt>"application/x-www-form-urlencoded"</tt>. |
|
|
Only one of <b>--post-data</b> and |
|
|
<b>--post-file</b> should be |
|
|
specified.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Please note |
|
|
that Wget does not require the content to be of the form
<tt>"key1=value1&key2=value2"</tt>, and
neither does it test for it. Wget will simply transmit
whatever data is provided to it. Most servers, however, expect
|
|
the <small>POST</small> data to be in the above format when |
|
|
processing <small>HTML</small> Forms.</p> |
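<p style="margin-left:17%; margin-top: 1em">Because Wget sends the data unchanged, any percent-encoding has to be done before the string is passed in. The helper below is a small sketch of one way to do that in the shell; the name <tt>urlencode</tt> is made up, the form field names are placeholders, and the function has only been reasoned through for plain ASCII input.</p>

```shell
# Percent-encode every character outside the unreserved set
# (letters, digits, ".", "_", "~", "-"). Runs in a subshell so that
# the locale change and the scratch variables do not leak out.
urlencode() (
    LC_ALL=C
    s=$1 out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
)

post_data="user=$(urlencode 'foo bar')&password=$(urlencode 'p&ss=w0rd')"
printf '%s\n' "$post_data"

# wget --post-data "$post_data" http://example.com/auth.php
```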
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">When sending a |
|
|
<small>POST</small> request using the |
|
|
<b>--post-file</b> option, Wget treats the |
|
|
file as a binary file and will send every character in the |
|
|
<small>POST</small> request without stripping trailing |
|
|
newline or formfeed characters. Any other control characters |
|
|
in the text will also be sent as-is in the |
|
|
<small>POST</small> request.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Please be aware |
|
|
that Wget needs to know the size of the <small>POST</small> |
|
|
data in advance. Therefore the argument to |
|
|
<tt>"--post-file"</tt> must be a |
|
|
regular file; specifying a <small>FIFO</small> or something |
|
|
like <i>/dev/stdin</i> won’t work. It’s not |
|
|
quite clear how to work around this limitation inherent in |
|
|
<small>HTTP/1.0.</small> Although <small>HTTP/1.1</small> |
|
|
introduces <i>chunked</i> transfer that doesn’t |
|
|
require knowing the request length in advance, a client |
|
|
can’t use chunked unless it knows it’s talking |
|
|
to an <small>HTTP/1.1</small> server. And it can’t |
|
|
know that until it receives a response, which in turn |
|
|
requires the request to have been completed: a
|
|
chicken-and-egg problem.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Note: As of |
|
|
version 1.15, if Wget is redirected after the
|
|
<small>POST</small> request is completed, its behaviour will |
|
|
depend on the response code returned by the server. In case |
|
|
of a 301 Moved Permanently, 302 Moved Temporarily or 307 |
|
|
Temporary Redirect, Wget will, in accordance with |
|
|
<small>RFC2616,</small> continue to send a |
|
|
<small>POST</small> request. In case a server wants the |
|
|
client to change the Request method upon redirection, it |
|
|
should send a 303 See Other response code.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">This example |
|
|
shows how to log in to a server using <small>POST</small> |
|
|
and then proceed to download the desired pages, presumably |
|
|
only accessible to authorized users:</p> |
|
|
|
|
|
<pre style="margin-left:17%; margin-top: 1em"> # Log in to the server. This needs to be done only once.
|
|
wget --save-cookies cookies.txt \ |
|
|
--post-data 'user=foo&password=bar' \ |
|
|
http://example.com/auth.php |
|
|
# Now grab the page or pages we care about. |
|
|
wget --load-cookies cookies.txt \ |
|
|
-p http://example.com/interesting/article.php</pre> |
|
|
|
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">If the server |
|
|
is using session cookies to track user authentication, the |
|
|
above will not work because |
|
|
<b>--save-cookies</b> will not save them |
|
|
(and neither will browsers) and the <i>cookies.txt</i> file |
|
|
will be empty. In that case use |
|
|
<b>--keep-session-cookies</b> along |
|
|
with <b>--save-cookies</b> to force saving |
|
|
of session cookies.</p> |
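<p style="margin-left:17%; margin-top: 1em">The login example, adapted for sites that rely on session cookies, might look like the sketch below. The function name <tt>login_and_fetch</tt> and the <tt>WGET</tt> override variable are inventions of this example (setting <tt>WGET=echo</tt> merely prints the commands instead of running them); the URLs and credentials are the same placeholders used above.</p>

```shell
# $1 = cookie jar file. WGET is a hypothetical override hook; by default
# the real wget is run.
login_and_fetch() {
    jar=$1
    # Log in and keep session cookies in the jar as well.
    "${WGET:-wget}" --save-cookies "$jar" --keep-session-cookies \
        --post-data 'user=foo&password=bar' \
        http://example.com/auth.php &&
    # Reuse the jar for the pages we care about.
    "${WGET:-wget}" --load-cookies "$jar" \
        -p http://example.com/interesting/article.php
}

# Preview the two commands without touching the network:
WGET=echo login_and_fetch cookies.txt
```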
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--method=</b><i>HTTP-Method</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">For the purpose of RESTful |
|
|
scripting, Wget allows sending of other <small>HTTP</small> |
|
|
Methods without the need to explicitly set them using |
|
|
<b>--header=Header-Line</b>. Wget will use |
|
|
whatever string is passed to it after |
|
|
<b>--method</b> as the <small>HTTP</small> |
|
|
Method to the server.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--body-data=</b><i>Data-String</i> |
|
|
<b><br> |
|
|
--body-file=</b><i>Data-File</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Must be set when additional |
|
|
data needs to be sent to the server along with the Method |
|
|
specified using <b>--method</b>. |
|
|
<b>--body-data</b> sends <i>Data-String</i> as
data, whereas <b>--body-file</b> sends the
contents of <i>Data-File</i>. Other than that, they work in
|
|
exactly the same way.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Currently, |
|
|
<b>--body-file</b> is <i>not</i> for |
|
|
transmitting files as a whole. Wget does not currently |
|
|
support <tt>"multipart/form-data"</tt> for |
|
|
transmitting data; only |
|
|
<tt>"application/x-www-form-urlencoded"</tt>. |
|
|
In the future, this may be changed so that Wget sends the
|
|
<b>--body-file</b> as a complete file |
|
|
instead of sending its contents to the server. Please be |
|
|
aware that Wget needs to know the contents of |
|
|
<small>BODY</small> Data in advance, and hence the argument |
|
|
to <b>--body-file</b> should be a regular |
|
|
file. See <b>--post-file</b> for a more |
|
|
detailed explanation. Only one of |
|
|
<b>--body-data</b> and |
|
|
<b>--body-file</b> should be |
|
|
specified.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">If Wget is |
|
|
redirected after the request is completed, Wget will suspend |
|
|
the current method and send a <small>GET</small> request |
|
|
until the redirection is completed. This is true for all
redirection response codes except 307 Temporary Redirect,
which is used to explicitly specify that the request method
|
|
should <i>not</i> change. Another exception is when the |
|
|
method is set to <tt>"POST"</tt>, in which case |
|
|
the redirection rules specified under |
|
|
<b>--post-data</b> are followed.</p> |
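<p style="margin-left:17%; margin-top: 1em">As a rough sketch of RESTful scripting with these options, the wrapper below sends an arbitrary method with an optional form-encoded body and writes the response to standard output. The name <tt>rest_call</tt>, the <tt>WGET</tt> override variable and the example URL are assumptions made for illustration only.</p>

```shell
# rest_call METHOD URL [BODY]
# WGET is a hypothetical override hook (WGET=echo previews the command).
rest_call() {
    if [ -n "${3:-}" ]; then
        "${WGET:-wget}" --method="$1" --body-data="$3" --output-document=- "$2"
    else
        "${WGET:-wget}" --method="$1" --output-document=- "$2"
    fi
}

# Preview what would be run:
WGET=echo rest_call DELETE http://example.com/items/42
WGET=echo rest_call PUT http://example.com/items/42 'name=widget'
```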
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--content-disposition</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">If this is set to on, |
|
|
experimental (not fully-functional) support for |
|
|
<tt>"Content-Disposition"</tt> headers is |
|
|
enabled. This can currently result in extra round-trips to |
|
|
the server for a <tt>"HEAD"</tt> request, and is |
|
|
known to suffer from a few bugs, which is why it is not |
|
|
currently enabled by default.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">This option is |
|
|
useful for some file-downloading <small>CGI</small> programs |
|
|
that use <tt>"Content-Disposition"</tt> |
|
|
headers to describe what the name of a downloaded file |
|
|
should be.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">When combined |
|
|
with <b>--metalink-over-http</b> and |
|
|
<b>--trust-server-names</b>, a |
|
|
<b>Content-Type: application/metalink4+xml</b> file is named |
|
|
using the <tt>"Content-Disposition"</tt> |
|
|
filename field, if available.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--content-on-error</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">If this is set to on, Wget will
not skip the content when the server responds with an <small>HTTP</small>
status code that indicates an error.</p>
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--trust-server-names</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">If this is set, on a redirect, |
|
|
the local file name will be based on the redirection |
|
|
<small>URL.</small> By default the local file name is based |
|
|
on the original <small>URL.</small> During recursive
retrieval, this can be helpful because on many web sites
|
|
redirected URLs correspond to an underlying file structure, |
|
|
while link URLs do not.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--auth-no-challenge</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">If this option is given, Wget |
|
|
will send Basic <small>HTTP</small> authentication |
|
|
information (plaintext username and password) for all |
|
|
requests, just like Wget 1.10.2 and prior did by |
|
|
default.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Use of this |
|
|
option is not recommended, and is intended only to support |
|
|
a few obscure servers, which never send
|
|
<small>HTTP</small> authentication challenges, but accept |
|
|
unsolicited auth info, say, in addition to form-based |
|
|
authentication.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--retry-on-host-error</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Consider host errors, such as |
|
|
"Temporary failure in name resolution", as |
|
|
non-fatal, transient errors.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--retry-on-http-error=</b><i>code[,code,...]</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Consider given |
|
|
<small>HTTP</small> response codes as non-fatal, transient |
|
|
errors. Supply a comma-separated list of 3-digit |
|
|
<small>HTTP</small> response codes as argument. Useful to |
|
|
work around special circumstances where retries are |
|
|
required, but the server responds with an error code |
|
|
normally not retried by Wget. Such errors might be 503 |
|
|
(Service Unavailable) and 429 (Too Many Requests). Retries |
|
|
enabled by this option are performed subject to the normal |
|
|
retry timing and retry count limitations of Wget.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Using this |
|
|
option is intended to support special use cases only and is |
|
|
generally not recommended, as it can force retries even in |
|
|
cases where the server is actually trying to decrease its |
|
|
load. Please use it wisely and only if you know what you are
|
|
doing.</p> |
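<p style="margin-left:17%; margin-top: 1em">One possible way to combine this option with the usual retry limits is sketched below. The retry count and wait time are arbitrary example values, and the function name <tt>fetch_with_retries</tt> and the <tt>WGET</tt> override variable are made up for this illustration.</p>

```shell
# Retry on 429 and 503, but at most 5 tries with up to 10 seconds of
# back-off between them. WGET is a hypothetical override hook.
fetch_with_retries() {
    "${WGET:-wget}" --tries=5 --waitretry=10 \
        --retry-on-http-error=429,503 "$@"
}

# Preview the command line without contacting any server:
WGET=echo fetch_with_retries http://example.com/file.tar.gz
```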
|
|
|
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em"><b><small>HTTPS</small> |
|
|
(<small>SSL/TLS</small>) Options</b> <br>
To support encrypted <small>HTTP</small>
(<small>HTTPS</small>) downloads, Wget must be compiled with
an external <small>SSL</small> library. The current default
is GnuTLS. Wget also supports
<small>HSTS</small> (<small>HTTP</small> Strict Transport
|
|
Security). If Wget is compiled without <small>SSL</small> |
|
|
support, none of these options are available. <b><br> |
|
|
--secure-protocol=</b><i>protocol</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Choose the secure protocol to |
|
|
be used. Legal values are <b>auto</b>, <b>SSLv2</b>, |
|
|
<b>SSLv3</b>, <b>TLSv1</b>, <b>TLSv1_1</b>, <b>TLSv1_2</b>, |
|
|
<b>TLSv1_3</b> and <b><small>PFS</small></b>. If
|
|
<b>auto</b> is used, the <small>SSL</small> library is given |
|
|
the liberty of choosing the appropriate protocol |
|
|
automatically, which is achieved by sending a TLSv1 |
|
|
greeting. This is the default.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Specifying |
|
|
<b>SSLv2</b>, <b>SSLv3</b>, <b>TLSv1</b>, <b>TLSv1_1</b>, |
|
|
<b>TLSv1_2</b> or <b>TLSv1_3</b> forces the use of the |
|
|
corresponding protocol. This is useful when talking to old |
|
|
and buggy <small>SSL</small> server implementations that |
|
|
make it hard for the underlying <small>SSL</small> library |
|
|
to choose the correct protocol version. Fortunately, such |
|
|
servers are quite rare.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Specifying |
|
|
<b><small>PFS</small></b> enforces the use of the so-called |
|
|
Perfect Forward Secrecy cipher suites. In short,
<small>PFS</small> adds security by creating a one-time key
for each <small>SSL</small> connection. It has a bit more
<small>CPU</small> impact on client and server. Wget uses ciphers
known to be secure (e.g. no <small>MD4</small>) and the
<small>TLS</small> protocol. This mode also explicitly
|
|
excludes non-PFS key exchange methods, such as |
|
|
<small>RSA.</small></p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--https-only</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">When in recursive mode, only |
|
|
<small>HTTPS</small> links are followed.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--ciphers</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Set the cipher list string. |
|
|
Typically this string sets the cipher suites and other |
|
|
<small>SSL/TLS</small> options that the user wishes to be
used, in a set order of preference (GnuTLS calls it a
"priority string"). This string will be fed
verbatim to the <small>SSL/TLS</small> engine (OpenSSL or
GnuTLS), and hence its format and syntax depend on
that engine. Wget will not process or manipulate it in any way.
|
|
Refer to the OpenSSL or GnuTLS documentation for more |
|
|
information.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--no-check-certificate</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Don’t check the server |
|
|
certificate against the available certificate authorities. |
|
|
Also don’t require the <small>URL</small> host name to |
|
|
match the common name presented by the certificate.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">As of Wget |
|
|
1.10, the default is to verify the server’s |
|
|
certificate against the recognized certificate authorities, |
|
|
breaking the <small>SSL</small> handshake and aborting the |
|
|
download if the verification fails. Although this provides |
|
|
more secure downloads, it does break interoperability with |
|
|
some sites that worked with previous Wget versions, |
|
|
particularly those using self-signed, expired, or otherwise |
|
|
invalid certificates. This option forces an |
|
|
"insecure" mode of operation that turns the |
|
|
certificate verification errors into warnings and allows you |
|
|
to proceed.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">If you |
|
|
encounter "certificate verification" errors or |
|
|
ones saying that "common name doesn’t match |
|
|
requested host name", you can use this option to bypass |
|
|
the verification and proceed with the download. <i>Only use |
|
|
this option if you are otherwise convinced of the |
|
|
site’s authenticity, or if you really don’t care |
|
|
about the validity of its certificate.</i> It is almost |
|
|
always a bad idea not to check the certificates when |
|
|
transmitting confidential or important data. For |
|
|
self-signed/internal certificates, you should download |
|
|
the certificate and verify against that instead of forcing |
|
|
this insecure mode. If you are really sure of not desiring |
|
|
any certificate verification, you can specify |
|
|
<b>--check-certificate=quiet</b> to tell Wget to |
|
|
not print any warning about invalid certificates, albeit in |
|
|
most cases this is the wrong thing to do.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--certificate=</b><i>file</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Use the client certificate |
|
|
stored in <i>file</i>. This is needed for servers that are |
|
|
configured to require certificates from the clients that |
|
|
connect to them. Normally a certificate is not required and |
|
|
this switch is optional.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--certificate-type=</b><i>type</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Specify the type of the client |
|
|
certificate. Legal values are <b><small>PEM</small></b> |
|
|
(assumed by default) and <b><small>DER</small></b>, also |
|
|
known as <b><small>ASN1</small></b>.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--private-key=</b><i>file</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Read the private key from |
|
|
<i>file</i>. This allows you to provide the private key in a |
|
|
file separate from the certificate.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--private-key-type=</b><i>type</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Specify the type of the private |
|
|
key. Accepted values are <b><small>PEM</small></b> (the |
|
|
default) and <b><small>DER</small></b>.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--ca-certificate=</b><i>file</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Use <i>file</i> as the file |
|
|
with the bundle of certificate authorities |
|
|
("<small>CA</small>") to verify the peers. The |
|
|
certificates must be in <small>PEM</small> format.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Without this |
|
|
option Wget looks for <small>CA</small> certificates at the |
|
|
system-specified locations, chosen at OpenSSL installation |
|
|
time.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--ca-directory=</b><i>directory</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Specifies the directory containing |
|
|
<small>CA</small> certificates in <small>PEM</small> format. |
|
|
Each file contains one <small>CA</small> certificate, and |
|
|
the file name is based on a hash value derived from the |
|
|
certificate. This is achieved by processing a certificate |
|
|
directory with the <tt>"c_rehash"</tt> utility |
|
|
supplied with OpenSSL. Using |
|
|
<b>--ca-directory</b> is more efficient |
|
|
than <b>--ca-certificate</b> when many |
|
|
certificates are installed because it allows Wget to fetch |
|
|
certificates on demand.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Without this |
|
|
option Wget looks for <small>CA</small> certificates at the |
|
|
system-specified locations, chosen at OpenSSL installation |
|
|
time.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--crl-file=</b><i>file</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Specifies a <small>CRL</small> |
|
|
file in <i>file</i>. This is needed for certificates that |
|
|
have been revoked by the CAs.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--pinnedpubkey=file/hashes</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Tells Wget to use the specified |
|
|
public key file (or hashes) to verify the peer. This can be |
|
|
a path to a file which contains a single public key in |
|
|
<small>PEM</small> or <small>DER</small> format, or any |
|
|
number of base64-encoded sha256 hashes preceded by |
|
|
"sha256//" and separated by ";".</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">When |
|
|
negotiating a <small>TLS</small> or <small>SSL</small> |
|
|
connection, the server sends a certificate indicating its |
|
|
identity. A public key is extracted from this certificate |
|
|
and if it does not exactly match the public key(s) provided |
|
|
to this option, Wget will abort the connection before |
|
|
sending or receiving any data.</p> |
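<p style="margin-left:17%; margin-top: 1em">As a rough illustration of the hash-list form described above, the following shell sketch joins base64 hashes into a single option value. The helper name <tt>pin_list</tt> and the two hash strings are placeholders invented for this example, not real key hashes. The value should be quoted when passed to Wget, because ";" is also a shell command separator.</p>

<pre style="margin-left:17%; margin-top: 1em">
```shell
# Join base64-encoded SHA-256 hashes into the "sha256//...;sha256//..." form.
# pin_list is a hypothetical helper, not part of Wget.
pin_list() {
  out=''
  for h in "$@"; do
    # Prefix each hash with sha256// and separate entries with ";".
    out="${out:+$out;}sha256//$h"
  done
  printf '%s\n' "$out"
}

# Placeholder hashes, for illustration only.
pins=$(pin_list 'HASH_ONE_BASE64=' 'HASH_TWO_BASE64=')
printf '%s\n' "$pins"

# The quoted value could then be passed as:
#   wget --pinnedpubkey="$pins" https://example.com/
```
</pre>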
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--random-file=</b><i>file</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">[OpenSSL and LibreSSL only] Use |
|
|
<i>file</i> as the source of random data for seeding the |
|
|
pseudo-random number generator on systems without |
|
|
<i>/dev/urandom</i>.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">On such systems |
|
|
the <small>SSL</small> library needs an external source of |
|
|
randomness to initialize. Randomness may be provided by |
|
|
<small>EGD</small> (see <b>--egd-file</b> |
|
|
below) or read from an external source specified by the |
|
|
user. If this option is not specified, Wget looks for random |
|
|
data in <tt>$RANDFILE</tt> or, if that is unset, in |
|
|
<i>$HOME/.rnd</i>.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">If you’re |
|
|
getting the "Could not seed OpenSSL <small>PRNG</small>; |
|
|
disabling <small>SSL</small>." error, you should |
|
|
provide random data using some of the methods described |
|
|
above.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--egd-file=</b><i>file</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">[OpenSSL only] Use <i>file</i> |
|
|
as the <small>EGD</small> socket. <small>EGD</small> stands |
|
|
for <i>Entropy Gathering Daemon</i>, a user-space program |
|
|
that collects data from various unpredictable system sources |
|
|
and makes it available to other programs that might need it. |
|
|
Encryption software, such as the <small>SSL</small> library, |
|
|
needs sources of non-repeating randomness to seed the random |
|
|
number generator used to produce cryptographically strong |
|
|
keys.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">OpenSSL allows |
|
|
the user to specify their own source of entropy using the |
|
|
<tt>"RANDFILE"</tt> environment variable. If this |
|
|
variable is unset, or if the specified file does not produce |
|
|
enough randomness, OpenSSL will read random data from the |
|
|
<small>EGD</small> socket specified using this option.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">If this option |
|
|
is not specified (and the equivalent startup command is not |
|
|
used), <small>EGD</small> is never contacted. |
|
|
<small>EGD</small> is not needed on modern Unix systems that |
|
|
support <i>/dev/urandom</i>.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--no-hsts</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Wget supports |
|
|
<small>HSTS</small> (<small>HTTP</small> Strict Transport |
|
|
Security, <small>RFC 6797</small>) by default. Use |
|
|
<b>--no-hsts</b> to make Wget act as a |
|
|
non-HSTS-compliant <small>UA</small>. As a consequence, Wget |
|
|
will ignore all |
|
|
<tt>"Strict-Transport-Security"</tt> |
|
|
headers and will not enforce any existing |
|
|
<small>HSTS</small> policy.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--hsts-file=</b><i>file</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">By default, Wget stores its |
|
|
<small>HSTS</small> database in <i>~/.wget-hsts</i>. |
|
|
You can use <b>--hsts-file</b> to override |
|
|
this. Wget will use the supplied file as the |
|
|
<small>HSTS</small> database. Such file must conform to the |
|
|
correct <small>HSTS</small> database format used by Wget. If |
|
|
Wget cannot parse the provided file, the behaviour is |
|
|
unspecified.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Wget’s |
|
|
<small>HSTS</small> database is a plain text |
|
|
file. Each line contains an <small>HSTS</small> entry (i.e. a |
|
|
site that has issued a |
|
|
<tt>"Strict-Transport-Security"</tt> |
|
|
header and that therefore has specified a concrete |
|
|
<small>HSTS</small> policy to be applied). Lines starting |
|
|
with a hash (<tt>"#"</tt>) are ignored by Wget. |
|
|
Please note that in spite of this convenient |
|
|
human-readability, hand-hacking the <small>HSTS</small> |
|
|
database is generally not a good idea.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">An |
|
|
<small>HSTS</small> entry line consists of several fields |
|
|
separated by one or more whitespace characters:</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em"><tt>"&lt;hostname&gt; |
|
|
SP [&lt;port&gt;] SP &lt;include subdomains&gt; SP |
|
|
&lt;created&gt; SP &lt;max-age&gt;"</tt></p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">The |
|
|
<i>hostname</i> and <i>port</i> fields indicate the hostname |
|
|
and port to which the given <small>HSTS</small> policy |
|
|
applies. The <i>port</i> field may be zero, and in most |
|
|
cases it will be. That means that the port number will not |
|
|
be taken into account when deciding whether such |
|
|
<small>HSTS</small> policy should be applied on a given |
|
|
request (only the hostname will be evaluated). When |
|
|
<i>port</i> is non-zero, both the target hostname |
|
|
and the port will be evaluated and the <small>HSTS</small> |
|
|
policy will only be applied if both of them match. This |
|
|
feature has been included for testing/development purposes |
|
|
only. The Wget testsuite (in <i>testenv/</i>) creates |
|
|
<small>HSTS</small> databases with explicit ports with the |
|
|
purpose of ensuring Wget’s correct behaviour. Applying |
|
|
<small>HSTS</small> policies to ports other than the default |
|
|
ones is discouraged by <small>RFC 6797</small> (see Appendix |
|
|
B "Differences between <small>HSTS</small> Policy and |
|
|
Same-Origin Policy"). Thus, this functionality should |
|
|
not be used in production environments and <i>port</i> will |
|
|
typically be zero. The last three fields do what they are |
|
|
expected to. The field <i>include_subdomains</i> can either |
|
|
be <tt>1</tt> or <tt>0</tt> and it signals whether the |
|
|
subdomains of the target domain should be part of the given |
|
|
<small>HSTS</small> policy as well. The <i>created</i> and |
|
|
<i>max-age</i> fields hold the timestamp values of when such |
|
|
entry was created (first seen by Wget) and the HSTS-defined |
|
|
value ’max-age’, which states how long |
|
|
that <small>HSTS</small> policy should remain active, |
|
|
measured in seconds elapsed since the timestamp stored in |
|
|
<i>created</i>. Once that time has passed, that |
|
|
<small>HSTS</small> policy will no longer be valid and will |
|
|
eventually be removed from the database.</p> |
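<p style="margin-left:17%; margin-top: 1em">The field layout and the port rule described above can be sketched in shell as follows. The sample entry, the host names and the helper name <tt>hsts_matches</tt> are assumptions made for this illustration. The sketch ignores <i>include_subdomains</i> and is not how Wget itself reads the database.</p>

<pre style="margin-left:17%; margin-top: 1em">
```shell
# Hypothetical sample entry: hostname, port, include_subdomains, created, max-age.
entry='example.com 0 1 1700000000 31536000'

# Split the whitespace-separated fields (this replaces the positional parameters).
set -- $entry
host=$1
port=$2
include_subdomains=$3
created=$4
max_age=$5

# The policy remains active until created + max-age seconds.
expires=$((created + max_age))
echo "$host: port=$port subdomains=$include_subdomains expires=$expires"

# hsts_matches is a hypothetical helper: a zero port means only the
# hostname is compared; a non-zero port must match as well.
# Arguments: request-host request-port
hsts_matches() {
  [ "$1" = "$host" ] || return 1
  [ "$port" -eq 0 ] || [ "$port" -eq "$2" ]
}

if hsts_matches example.com 8443; then
  echo "policy applies to example.com:8443"
fi
```
</pre>

<p style="margin-left:17%; margin-top: 1em">Because the sample entry has a port of zero, the check succeeds for any port on <i>example.com</i>.</p>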
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">If you supply |
|
|
your own <small>HSTS</small> database via |
|
|
<b>--hsts-file</b>, be aware that Wget may |
|
|
modify the provided file if any change occurs between the |
|
|
<small>HSTS</small> policies requested by the remote servers |
|
|
and those in the file. When Wget exits, it effectively |
|
|
updates the <small>HSTS</small> database by rewriting the |
|
|
database file with the new entries.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">If the supplied |
|
|
file does not exist, Wget will create one. This file will |
|
|
contain the new <small>HSTS</small> entries. If no |
|
|
<small>HSTS</small> entries were generated (no |
|
|
<tt>"Strict-Transport-Security"</tt> |
|
|
headers were sent by any of the servers) then no file will |
|
|
be created, not even an empty one. This behaviour applies to |
|
|
the default database file (<i>~/.wget-hsts</i>) as |
|
|
well: it will not be created until some server enforces an |
|
|
<small>HSTS</small> policy.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Care is taken |
|
|
not to overwrite possible changes made by other Wget |
|
|
processes at the same time over the <small>HSTS</small> |
|
|
database. Before dumping the updated <small>HSTS</small> |
|
|
entries on the file, Wget will re-read it and merge the |
|
|
changes.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Using a custom |
|
|
<small>HSTS</small> database and/or modifying an existing |
|
|
one is discouraged. For more information about the potential |
|
|
security threats arising from such practice, see section 14 |
|
|
"Security Considerations" of <small>RFC |
|
|
6797</small>, especially section 14.9 "Creative |
|
|
Manipulation of <small>HSTS</small> Policy Store".</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--warc-file=</b><i>file</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Use <i>file</i> as the |
|
|
destination <small>WARC</small> file.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--warc-header=</b><i>string</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Insert <i>string</i> into the |
|
|
warcinfo record.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--warc-max-size=</b><i>size</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Set the maximum size of the |
|
|
<small>WARC</small> files to <i>size</i>.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--warc-cdx</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Write <small>CDX</small> index |
|
|
files.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--warc-dedup=</b><i>file</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Do not store records listed in |
|
|
this <small>CDX</small> file.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--no-warc-compression</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Do not compress |
|
|
<small>WARC</small> files with <small>GZIP.</small></p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--no-warc-digests</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Do not calculate |
|
|
<small>SHA1</small> digests.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--no-warc-keep-log</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Do not store the log file in a |
|
|
<small>WARC</small> record.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--warc-tempdir=</b><i>dir</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Specify the location for |
|
|
temporary files created by the <small>WARC</small> |
|
|
writer.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em"><b><small>FTP</small> |
|
|
Options <br> |
|
|
--ftp-user=</b><i>user</i> <b><br> |
|
|
--ftp-password=</b><i>password</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Specify the username |
|
|
<i>user</i> and password <i>password</i> on an |
|
|
<small>FTP</small> server. Without this, or the |
|
|
corresponding startup option, the password defaults to |
|
|
<b>-wget@</b>, normally used for anonymous |
|
|
<small>FTP.</small></p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Another way to |
|
|
specify username and password is in the <small>URL</small> |
|
|
itself. Either method reveals your password to anyone who |
|
|
bothers to run <tt>"ps"</tt>. To prevent the |
|
|
passwords from being seen, store them in <i>.wgetrc</i> or |
|
|
<i>.netrc</i>, and make sure to protect those files from |
|
|
other users with <tt>"chmod"</tt>. If the |
|
|
passwords are really important, do not leave them lying in |
|
|
those files either; edit the files and |
|
|
delete them after Wget has started the download.</p> |
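<p style="margin-left:17%; margin-top: 1em">A minimal sketch of the advice above, assuming a made-up host and credentials. It writes to a scratch directory so that no real <i>.netrc</i> is touched; in practice the file would live in your home directory.</p>

<pre style="margin-left:17%; margin-top: 1em">
```shell
# Use a scratch directory so that a real ~/.netrc is never modified.
dir=$(mktemp -d)
netrc="$dir/.netrc"

# Create the file with owner-only permissions from the start.
# Host name, login and password are made up for illustration.
(
  umask 077
  printf 'machine ftp.example.com login alice password s3cret\n' > "$netrc"
)
chmod 600 "$netrc"

# Show the resulting mode; only the owner may read or write.
ls -l "$netrc" | cut -c1-10
```
</pre>

<p style="margin-left:17%; margin-top: 1em">Setting the umask in a subshell keeps the file unreadable to others even in the short interval before <tt>"chmod"</tt> runs.</p>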
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--no-remove-listing</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Don’t remove the |
|
|
temporary <i>.listing</i> files generated by |
|
|
<small>FTP</small> retrievals. Normally, these files contain |
|
|
the raw directory listings received from <small>FTP</small> |
|
|
servers. Not removing them can be useful for debugging |
|
|
purposes, or when you want to be able to easily check on the |
|
|
contents of remote server directories (e.g. to verify that a |
|
|
mirror you’re running is complete).</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Note that even |
|
|
though Wget writes to a known filename for this file, this |
|
|
is not a security hole in the scenario of a user making |
|
|
<i>.listing</i> a symbolic link to <i>/etc/passwd</i> or |
|
|
something and asking <tt>"root"</tt> to run Wget |
|
|
in his or her directory. Depending on the options used, |
|
|
either Wget will refuse to write to <i>.listing</i>, making |
|
|
the globbing/recursion/time-stamping operation fail, |
|
|
or the symbolic link will be deleted and replaced with the |
|
|
actual <i>.listing</i> file, or the listing will be written |
|
|
to a <i>.listing.number</i> file.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Even though |
|
|
this situation isn’t a problem, |
|
|
<tt>"root"</tt> should never run Wget in a |
|
|
non-trusted user’s directory. A user could do |
|
|
something as simple as linking <i>index.html</i> to |
|
|
<i>/etc/passwd</i> and asking <tt>"root"</tt> to |
|
|
run Wget with <b>-N</b> or <b>-r</b> so the file |
|
|
will be overwritten.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--no-glob</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Turn off <small>FTP</small> |
|
|
globbing. Globbing refers to the use of shell-like special |
|
|
characters (<i>wildcards</i>), like <b>*</b>, <b>?</b>, |
|
|
<b>[</b> and <b>]</b> to retrieve more than one file from |
|
|
the same directory at once, like:</p> |
|
|
|
|
|
<pre style="margin-left:17%; margin-top: 1em"> wget ftp://gnjilux.srk.fer.hr/*.msg</pre> |
|
|
|
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">By default, |
|
|
globbing will be turned on if the <small>URL</small> |
|
|
contains a globbing character. This option may be used to |
|
|
turn globbing on or off permanently.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">You may have to |
|
|
quote the <small>URL</small> to protect it from being |
|
|
expanded by your shell. Globbing makes Wget look for a |
|
|
directory listing, which is system-specific. This is why it |
|
|
currently works only with Unix <small>FTP</small> servers |
|
|
(and the ones emulating Unix <tt>"ls"</tt> |
|
|
output).</p> |
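<p style="margin-left:17%; margin-top: 1em">The quoting advice above can be illustrated with a short shell sketch. The host name is hypothetical; the point is only that the single-quoted <small>URL</small> reaches the program as one argument with the wildcard intact.</p>

<pre style="margin-left:17%; margin-top: 1em">
```shell
# Hypothetical host name. Single quotes stop the local shell from
# expanding "*" before Wget sees it.
url='ftp://ftp.example.com/pub/*.msg'

# Count the arguments Wget would receive: one, with the "*" intact.
set -- "$url"
echo "$# argument: $1"
```
</pre>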
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--no-passive-ftp</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Disable the use of the |
|
|
<i>passive</i> <small>FTP</small> transfer mode. Passive |
|
|
<small>FTP</small> mandates that the client connect to the |
|
|
server to establish the data connection rather than the |
|
|
other way around.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">If the machine |
|
|
is connected to the Internet directly, both passive and |
|
|
active <small>FTP</small> should work equally well. Behind |
|
|
most firewall and <small>NAT</small> configurations, passive |
|
|
<small>FTP</small> has a better chance of working. However, |
|
|
in some rare firewall configurations, active |
|
|
<small>FTP</small> actually works when passive |
|
|
<small>FTP</small> doesn’t. If you suspect this to be |
|
|
the case, use this option, or set |
|
|
<tt>"passive_ftp=off"</tt> in your init file.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--preserve-permissions</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Preserve remote file |
|
|
permissions instead of permissions set by umask.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--retr-symlinks</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">By default, when retrieving |
|
|
<small>FTP</small> directories recursively and a symbolic |
|
|
link is encountered, the symbolic link is traversed and the |
|
|
pointed-to files are retrieved. Currently, Wget does not |
|
|
traverse symbolic links to directories to download them |
|
|
recursively, though this feature may be added in the |
|
|
future.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">When |
|
|
<b>--retr-symlinks=no</b> is specified, |
|
|
the linked-to file is not downloaded. Instead, a matching |
|
|
symbolic link is created on the local filesystem. The |
|
|
pointed-to file will not be retrieved unless this recursive |
|
|
retrieval would have encountered it separately and |
|
|
downloaded it anyway. This option poses a security risk |
|
|
where a malicious <small>FTP</small> server may cause Wget |
|
|
to write to files outside of the intended directories |
|
|
through a specially crafted <i>.listing</i> file.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Note that when |
|
|
retrieving a file (not a directory) because it was specified |
|
|
on the command-line, rather than because it was recursed to, |
|
|
this option has no effect. Symbolic links are always |
|
|
traversed in this case.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em"><b><small>FTPS</small> |
|
|
Options <br> |
|
|
--ftps-implicit</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">This option tells Wget to use |
|
|
<small>FTPS</small> implicitly. Implicit <small>FTPS</small> |
|
|
consists of initializing <small>SSL/TLS</small> from the |
|
|
very beginning of the control connection. This option does |
|
|
not send an <tt>"AUTH TLS"</tt> command: it |
|
|
assumes the server speaks <small>FTPS</small> and directly |
|
|
starts an <small>SSL/TLS</small> connection. If the attempt |
|
|
is successful, the session continues just like regular |
|
|
<small>FTPS</small> (<tt>"PBSZ"</tt> and |
|
|
<tt>"PROT"</tt> are sent, etc.). Implicit |
|
|
<small>FTPS</small> is no longer a requirement for |
|
|
<small>FTPS</small> implementations, and thus many servers |
|
|
may not support it. If |
|
|
<b>--ftps-implicit</b> is passed and no |
|
|
explicit port number is specified, the default port for |
|
|
implicit <small>FTPS, 990,</small> will be used, instead of |
|
|
the default port for the "normal" (explicit) |
|
|
<small>FTPS</small> which is the same as that of <small>FTP, |
|
|
21.</small></p> |
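<p style="margin-left:17%; margin-top: 1em">The port selection rule described above can be summarised in a small shell sketch. The helper name <tt>ftps_port</tt> and its arguments are invented for this illustration and do not correspond to anything in Wget itself.</p>

<pre style="margin-left:17%; margin-top: 1em">
```shell
# ftps_port is a hypothetical helper mirroring the rule above.
# $1: "implicit" or "explicit"; $2: port given in the URL, if any.
ftps_port() {
  if [ -n "${2:-}" ]; then
    echo "$2"        # a port given explicitly always wins
  elif [ "$1" = implicit ]; then
    echo 990         # default for implicit FTPS
  else
    echo 21          # explicit FTPS shares the FTP default
  fi
}

ftps_port implicit        # prints 990
ftps_port explicit        # prints 21
ftps_port implicit 2121   # prints 2121
```
</pre>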
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--no-ftps-resume-ssl</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Do not resume the |
|
|
<small>SSL/TLS</small> session in the data channel. When |
|
|
starting a data connection, Wget tries to resume the |
|
|
<small>SSL/TLS</small> session previously started in the |
|
|
control connection. <small>SSL/TLS</small> session |
|
|
resumption avoids performing an entirely new handshake by |
|
|
reusing the <small>SSL/TLS</small> parameters of a previous |
|
|
session. Typically, the <small>FTPS</small> servers want it |
|
|
that way, so Wget does this by default. Under rare |
|
|
circumstances however, one might want to start an entirely |
|
|
new <small>SSL/TLS</small> session in every data connection. |
|
|
This is what |
|
|
<b>--no-ftps-resume-ssl</b> is |
|
|
for.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--ftps-clear-data-connection</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">All the data connections will |
|
|
be in plain text. Only the control connection will be under |
|
|
<small>SSL/TLS.</small> Wget will send a <tt>"PROT |
|
|
C"</tt> command to achieve this, which must be approved |
|
|
by the server.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--ftps-fallback-to-ftp</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Fall back to <small>FTP</small> |
|
|
if <small>FTPS</small> is not supported by the target |
|
|
server. For security reasons, this option is not asserted by |
|
|
default. The default behaviour is to exit with an error. If |
|
|
a server does not successfully reply to the initial |
|
|
<tt>"AUTH TLS"</tt> command, or in the case of |
|
|
implicit <small>FTPS,</small> if the initial |
|
|
<small>SSL/TLS</small> connection attempt is rejected, it is |
|
|
considered that such server does not support |
|
|
<small>FTPS.</small></p> |
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em"><b>Recursive |
|
|
Retrieval Options</b></p> |
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="3%"> |
|
|
|
|
|
|
|
|
<p><b>-r</b></p></td> |
|
|
<td width="86%"> |
|
|
</td></tr> |
|
|
</table> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--recursive</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Turn on recursive retrieving. |
|
|
The default maximum depth is 5.</p> |
|
|
|
|
|
<p style="margin-left:11%;"><b>-l</b> <i>depth</i> |
|
|
<b><br> |
|
|
--level=</b><i>depth</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Set the maximum recursion depth |
|
|
to <i>depth</i>.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--delete-after</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">This option tells Wget to |
|
|
delete every single file it downloads, <i>after</i> having |
|
|
done so. It is useful for pre-fetching popular pages through |
|
|
a proxy, e.g.:</p> |
|
|
|
|
|
<pre style="margin-left:17%; margin-top: 1em"> wget -r -nd --delete-after http://whatever.com/~popular/page/</pre> |
|
|
|
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">The |
|
|
<b>-r</b> option is to retrieve recursively, and |
|
|
<b>-nd</b> to not create directories.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Note that |
|
|
<b>--delete-after</b> deletes files on the |
|
|
local machine. It does not issue the |
|
|
<b><small>DELE</small></b> command to remote |
|
|
<small>FTP</small> sites, for instance. Also note that when |
|
|
<b>--delete-after</b> is specified, |
|
|
<b>--convert-links</b> is ignored, so |
|
|
<b>.orig</b> files are simply not created in the first |
|
|
place.</p> |
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="3%"> |
|
|
|
|
|
|
|
|
<p><b>-k</b></p></td> |
|
|
<td width="86%"> |
|
|
</td></tr> |
|
|
</table> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--convert-links</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">After the download is complete, |
|
|
convert the links in the document to make them suitable for |
|
|
local viewing. This affects not only the visible hyperlinks, |
|
|
but any part of the document that links to external content, |
|
|
such as embedded images, links to style sheets, hyperlinks |
|
|
to non-HTML content, etc.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Each link will |
|
|
be changed in one of two ways:</p> |
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="17%"></td> |
|
|
<td width="1%"> |
|
|
|
|
|
|
|
|
<p>•</p></td> |
|
|
<td width="5%"></td> |
|
|
<td width="77%"> |
|
|
|
|
|
|
|
|
<p>The links to files that have been downloaded by Wget |
|
|
will be changed to refer to the file they point to as a |
|
|
relative link.</p></td></tr> |
|
|
</table> |
|
|
|
|
|
<p style="margin-left:23%; margin-top: 1em">Example: if the |
|
|
downloaded file <i>/foo/doc.html</i> links to |
|
|
<i>/bar/img.gif</i>, also downloaded, then the link in |
|
|
<i>doc.html</i> will be modified to point to |
|
|
<b>../bar/img.gif</b>. This kind of transformation works |
|
|
reliably for arbitrary combinations of directories.</p> |
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="17%"></td> |
|
|
<td width="1%"> |
|
|
|
|
|
|
|
|
<p style="margin-top: 1em">•</p></td> |
|
|
<td width="5%"></td> |
|
|
<td width="77%"> |
|
|
|
|
|
|
|
|
<p style="margin-top: 1em">The links to files that have not |
|
|
been downloaded by Wget will be changed to include host name |
|
|
and absolute path of the location they point to.</p></td></tr> |
|
|
</table> |
|
|
|
|
|
<p style="margin-left:23%; margin-top: 1em">Example: if the |
|
|
downloaded file <i>/foo/doc.html</i> links to |
|
|
<i>/bar/img.gif</i> (or to <i>../bar/img.gif</i>), then the |
|
|
link in <i>doc.html</i> will be modified to point to |
|
|
<i>http://hostname/bar/img.gif</i>.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Because of |
|
|
this, local browsing works reliably: if a linked file was |
|
|
downloaded, the link will refer to its local name; if it was |
|
|
not downloaded, the link will refer to its full Internet |
|
|
address rather than presenting a broken link. The fact that |
|
|
the former links are converted to relative links ensures |
|
|
that you can move the downloaded hierarchy to another |
|
|
directory.</p> |
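<p style="margin-left:17%; margin-top: 1em">To make the relative-link case concrete, here is a rough shell sketch that derives the relative path from one downloaded file to another. The helper name <tt>relative_link</tt> is hypothetical, both arguments are assumed to be absolute paths, and this is not Wget’s own implementation.</p>

<pre style="margin-left:17%; margin-top: 1em">
```shell
# relative_link is a hypothetical helper, not Wget's own code.
# $1: absolute path of the referring document; $2: absolute path of the target.
relative_link() {
  from=${1%/*}/      # directory of the document, with a trailing slash
  to=$2
  up=''
  # Climb one directory at a time until "from" is a prefix of the target.
  while [ "${to#"$from"}" = "$to" ]; do
    from=${from%/*/}/
    up="../$up"
  done
  printf '%s\n' "$up${to#"$from"}"
}

relative_link /foo/doc.html /bar/img.gif     # prints ../bar/img.gif
```
</pre>

<p style="margin-left:17%; margin-top: 1em">Because the result never mentions the top-level directory, the link stays valid when the whole hierarchy is moved.</p>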
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Note that only |
|
|
at the end of the download can Wget know which links have |
|
|
been downloaded. Because of that, the work done by |
|
|
<b>-k</b> will be performed at the end of all the |
|
|
downloads.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--convert-file-only</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">This option converts only the |
|
|
filename part of the URLs, leaving the rest of the URLs |
|
|
untouched. This filename part is sometimes referred to as |
|
|
the "basename", although we avoid that term here |
|
|
in order not to cause confusion.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">It works |
|
|
particularly well in conjunction with |
|
|
<b>--adjust-extension</b>, although this |
|
|
coupling is not enforced. It proves useful to populate |
|
|
Internet caches with files downloaded from different |
|
|
hosts.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Example: if |
|
|
some link points to <i>//foo.com/bar.cgi?xyz</i> with |
|
|
<b>--adjust-extension</b> asserted and its |
|
|
local destination is intended to be |
|
|
<i>./foo.com/bar.cgi?xyz.css</i>, then the link would be |
|
|
converted to <i>//foo.com/bar.cgi?xyz.css</i>. Note that |
|
|
only the filename part has been modified. The rest of the |
|
|
<small>URL</small> has been left untouched, including the |
|
|
net path (<tt>"//"</tt>) which would otherwise be |
|
|
processed by Wget and converted to the effective scheme (i.e. |
|
|
<tt>"http://"</tt>).</p> |
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="3%"> |
|
|
|
|
|
|
|
|
<p><b>-K</b></p></td> |
|
|
<td width="86%"> |
|
|
</td></tr> |
|
|
</table> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--backup-converted</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">When converting a file, back up |
|
|
the original version with a <b>.orig</b> suffix. Affects the |
|
|
behavior of <b>-N</b>.</p> |
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="3%"> |
|
|
|
|
|
|
|
|
<p><b>-m</b></p></td> |
|
|
<td width="86%"> |
|
|
</td></tr> |
|
|
</table> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--mirror</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Turn on options suitable for |
|
|
mirroring. This option turns on recursion and time-stamping, |
|
|
sets infinite recursion depth and keeps <small>FTP</small> |
|
|
directory listings. It is currently equivalent to |
|
|
<b>-r -N -l inf |
|
|
--no-remove-listing</b>.</p> |
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="3%"> |
|
|
|
|
|
|
|
|
<p><b>-p</b></p></td> |
|
|
<td width="86%"> |
|
|
</td></tr> |
|
|
</table> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--page-requisites</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">This option causes Wget to |
|
|
download all the files that are necessary to properly |
|
|
display a given <small>HTML</small> page. This includes such |
|
|
things as inlined images, sounds, and referenced |
|
|
stylesheets.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Ordinarily, |
|
|
when downloading a single <small>HTML</small> page, any |
|
|
requisite documents that may be needed to display it |
|
|
properly are not downloaded. Using <b>-r</b> together |
|
|
with <b>-l</b> can help, but since Wget does not |
|
|
ordinarily distinguish between external and inlined |
|
|
documents, one is generally left with "leaf |
|
|
documents" that are missing their requisites.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">For instance, |
|
|
say document <i>1.html</i> contains an |
|
|
<tt>"&lt;IMG&gt;"</tt> tag referencing
<i>1.gif</i> and an <tt>"&lt;A&gt;"</tt> tag
|
|
pointing to external document <i>2.html</i>. Say that |
|
|
<i>2.html</i> is similar but that its image is <i>2.gif</i> |
|
|
and it links to <i>3.html</i>. Say this continues up to some |
|
|
arbitrarily high number.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">If one executes |
|
|
the command:</p> |
|
|
|
|
|
<pre style="margin-left:17%; margin-top: 1em"> wget -r -l 2 http://&lt;site&gt;/1.html</pre>
|
|
|
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">then |
|
|
<i>1.html</i>, <i>1.gif</i>, <i>2.html</i>, <i>2.gif</i>, |
|
|
and <i>3.html</i> will be downloaded. As you can see, |
|
|
<i>3.html</i> is without its requisite <i>3.gif</i> because |
|
|
Wget is simply counting the number of hops (up to 2) away |
|
|
from <i>1.html</i> in order to determine where to stop the |
|
|
recursion. However, with this command:</p> |
|
|
|
|
|
<pre style="margin-left:17%; margin-top: 1em"> wget -r -l 2 -p http://&lt;site&gt;/1.html</pre>
|
|
|
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">all the above |
|
|
files <i>and</i> <i>3.html</i>’s requisite <i>3.gif</i> will
|
|
be downloaded. Similarly,</p> |
|
|
|
|
|
<pre style="margin-left:17%; margin-top: 1em"> wget -r -l 1 -p http://&lt;site&gt;/1.html</pre>
|
|
|
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">will cause |
|
|
<i>1.html</i>, <i>1.gif</i>, <i>2.html</i>, and <i>2.gif</i> |
|
|
to be downloaded. One might think that:</p> |
|
|
|
|
|
<pre style="margin-left:17%; margin-top: 1em"> wget -r -l 0 -p http://&lt;site&gt;/1.html</pre>
|
|
|
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">would download |
|
|
just <i>1.html</i> and <i>1.gif</i>, but unfortunately this |
|
|
is not the case, because <b>-l 0</b> is equivalent to |
|
|
<b>-l inf</b>, that is, infinite
|
|
recursion. To download a single <small>HTML</small> page (or |
|
|
a handful of them, all specified on the command-line or in a |
|
|
<b>-i</b> <small>URL</small> input file) and its (or |
|
|
their) requisites, simply leave off <b>-r</b> and |
|
|
<b>-l</b>:</p> |
|
|
|
|
|
<pre style="margin-left:17%; margin-top: 1em"> wget -p http://&lt;site&gt;/1.html</pre>
|
|
|
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Note that Wget |
|
|
will behave as if <b>-r</b> had been specified, but |
|
|
only that single page and its requisites will be downloaded. |
|
|
Links from that page to external documents will not be |
|
|
followed. Actually, to download a single page and all its |
|
|
requisites (even if they exist on separate websites), and |
|
|
make sure the lot displays properly locally, this author |
|
|
likes to use a few options in addition to |
|
|
<b>-p</b>:</p> |
|
|
|
|
|
<pre style="margin-left:17%; margin-top: 1em"> wget -E -H -k -K -p http://&lt;site&gt;/&lt;document&gt;</pre>
|
|
|
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">To finish off |
|
|
this topic, it’s worth knowing that Wget’s idea |
|
|
of an external document link is any <small>URL</small> |
|
|
specified in an <tt>"&lt;A&gt;"</tt> tag, an
<tt>"&lt;AREA&gt;"</tt> tag, or a
<tt>"&lt;LINK&gt;"</tt> tag other than
<tt>"&lt;LINK REL="stylesheet"&gt;"</tt>.</p>
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--strict-comments</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Turn on strict parsing of |
|
|
<small>HTML</small> comments. The default is to terminate |
|
|
comments at the first occurrence of |
|
|
<b>--></b>.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">According to |
|
|
specifications, <small>HTML</small> comments are expressed |
|
|
as <small>SGML</small> <i>declarations</i>. A declaration is
special markup that begins with <b>&lt;!</b> and ends with
<b>&gt;</b>, such as <b>&lt;!DOCTYPE ...&gt;</b>, that may
contain comments between a pair of <b>--</b>
delimiters. <small>HTML</small> comments are "empty
declarations", <small>SGML</small> declarations without
any non-comment text. Therefore,
<b>&lt;!--foo--&gt;</b> is a valid
comment, and so is <b>&lt;!--one--
--two--&gt;</b>, but
<b>&lt;!--1--2--&gt;</b>
is not.</p>
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">On the other |
|
|
hand, most <small>HTML</small> writers don’t perceive |
|
|
comments as anything other than text delimited with |
|
|
<b>&lt;!--</b> and <b>--&gt;</b>,
which is not quite the same. For example, something like
<b>&lt;!------------&gt;</b>
|
|
works as a valid comment as long as the number of dashes is |
|
|
a multiple of four (!). If not, the comment technically |
|
|
lasts until the next <b>--</b>, which may be at |
|
|
the other end of the document. Because of this, many popular |
|
|
browsers completely ignore the specification and implement |
|
|
what users have come to expect: comments delimited with |
|
|
<b>&lt;!--</b> and
<b>--&gt;</b>.</p>
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Until version |
|
|
1.9, Wget interpreted comments strictly, which resulted in |
|
|
missing links in many web pages that displayed fine in |
|
|
browsers, but had the misfortune of containing non-compliant |
|
|
comments. Beginning with version 1.9, Wget has joined the |
|
|
ranks of clients that implement "naive" comments,
|
|
terminating each comment at the first occurrence of |
|
|
<b>--></b>.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">If, for |
|
|
whatever reason, you want strict comment parsing, use this |
|
|
option to turn it on.</p> |
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em"><b>Recursive |
|
|
Accept/Reject Options <br> |
|
|
-A</b> <i>acclist</i> <b>--accept</b> |
|
|
<i>acclist</i> <b><br> |
|
|
-R</b> <i>rejlist</i> <b>--reject</b> |
|
|
<i>rejlist</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Specify comma-separated lists |
|
|
of file name suffixes or patterns to accept or reject. Note |
|
|
that if any of the wildcard characters, <b>*</b>, <b>?</b>, |
|
|
<b>[</b> or <b>]</b>, appear in an element of <i>acclist</i> |
|
|
or <i>rejlist</i>, it will be treated as a pattern, rather |
|
|
than a suffix. In this case, you have to enclose the pattern |
|
|
in quotes to prevent your shell from expanding it, as in
<b>-A "*.mp3"</b> or <b>-A
'*.mp3'</b>.</p>
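<p style="margin-left:17%; margin-top: 1em">The suffix-versus-pattern
rule above can be sketched in shell. This is only an illustration of the
rule as stated in this section, not Wget's own code, and
<tt>classify_list_element</tt> is a hypothetical helper name.</p>

```shell
# Hypothetical helper: report how one acclist/rejlist element would be
# treated under the rule described above. An element containing any of
# the wildcard characters * ? [ ] counts as a pattern; anything else
# counts as a plain suffix.
classify_list_element() {
    case "$1" in
        *\**|*\?*|*\[*|*\]*) echo "pattern" ;;
        *)                   echo "suffix" ;;
    esac
}

# Example calls (note the quoting, which keeps the shell from expanding
# the wildcard before the helper sees it):
classify_list_element 'mp3'
classify_list_element '*.mp3'
```

<p style="margin-left:17%; margin-top: 1em">The same quoting applies
when the element is passed to Wget itself, for example in
<b>-A '*.mp3'</b>.</p>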
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--accept-regex</b> |
|
|
<i>urlregex</i> <b><br> |
|
|
--reject-regex</b> <i>urlregex</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Specify a regular expression to |
|
|
accept or reject the complete <small>URL.</small></p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--regex-type</b> |
|
|
<i>regextype</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Specify the regular expression |
|
|
type. Possible types are <b>posix</b> or <b>pcre</b>. Note |
|
|
that to be able to use <b>pcre</b> type, wget has to be |
|
|
compiled with libpcre support.</p> |
|
|
|
|
|
<p style="margin-left:11%;"><b>-D</b> |
|
|
<i>domain-list</i> <b><br> |
|
|
--domains=</b><i>domain-list</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Set domains to be followed. |
|
|
<i>domain-list</i> is a comma-separated list of domains. |
|
|
Note that it does <i>not</i> turn on <b>-H</b>.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--exclude-domains</b> |
|
|
<i>domain-list</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Specify the domains that are |
|
|
<i>not</i> to be followed.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--follow-ftp</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Follow <small>FTP</small> links |
|
|
from <small>HTML</small> documents. Without this option, |
|
|
Wget will ignore all the <small>FTP</small> links.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--follow-tags=</b><i>list</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Wget has an internal table of |
|
|
<small>HTML</small> tag / attribute pairs that it considers |
|
|
when looking for linked documents during a recursive |
|
|
retrieval. If a user wants only a subset of those tags to be |
|
|
considered, however, he or she should specify such tags
|
|
in a comma-separated <i>list</i> with this option.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--ignore-tags=</b><i>list</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">This is the opposite of the |
|
|
<b>--follow-tags</b> option. To skip |
|
|
certain <small>HTML</small> tags when recursively looking |
|
|
for documents to download, specify them in a comma-separated |
|
|
<i>list</i>.</p> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">In the past, |
|
|
this option was the best bet for downloading a single page |
|
|
and its requisites, using a command-line like:</p> |
|
|
|
|
|
<pre style="margin-left:17%; margin-top: 1em"> wget --ignore-tags=a,area -H -k -K -r http://&lt;site&gt;/&lt;document&gt;</pre>
|
|
|
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">However, the |
|
|
author of this option came across a page with tags like |
|
|
<tt>"&lt;LINK REL="home"
HREF="/"&gt;"</tt> and came to the
realization that specifying tags to ignore was not enough.
One can’t just tell Wget to ignore
<tt>"&lt;LINK&gt;"</tt>, because then stylesheets
|
|
will not be downloaded. Now the best bet for downloading a |
|
|
single page and its requisites is the dedicated |
|
|
<b>--page-requisites</b> option.</p> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--ignore-case</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Ignore case when matching files |
|
|
and directories. This influences the behavior of the <b>-R</b>,
<b>-A</b>, <b>-I</b>, and <b>-X</b> options, as well as
|
|
globbing implemented when downloading from |
|
|
<small>FTP</small> sites. For example, with this option, |
|
|
<b>-A "*.txt"</b> will match |
|
|
<b>file1.txt</b>, but also <b>file2.TXT</b>, |
|
|
<b>file3.TxT</b>, and so on. The quotes in the example are |
|
|
to prevent the shell from expanding the pattern.</p> |
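<p style="margin-left:17%; margin-top: 1em">The effect of
case-insensitive matching can be illustrated with a small shell sketch.
It assumes that lowercasing both sides before a glob match is a fair
model of the behavior described above; Wget's internal matching may
differ, and <tt>matches_ignoring_case</tt> is a hypothetical helper
name.</p>

```shell
# Hypothetical helper: succeed when file name $2 matches glob pattern $1
# once both have been folded to lower case.
matches_ignoring_case() {
    pattern=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    name=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    case "$name" in
        $pattern) return 0 ;;   # unquoted, so it is used as a glob
        *)        return 1 ;;
    esac
}

# Report which of the sample names match "*.txt" when case is ignored.
for candidate in file1.txt file2.TXT file3.TxT notes.pdf; do
    if matches_ignoring_case '*.txt' "$candidate"; then
        echo "$candidate matches"
    else
        echo "$candidate does not match"
    fi
done
```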
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="3%"> |
|
|
|
|
|
|
|
|
<p><b>-H</b></p></td> |
|
|
<td width="86%"> |
|
|
</td></tr> |
|
|
</table> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--span-hosts</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Enable spanning across hosts |
|
|
during recursive retrieval.</p>
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="3%"> |
|
|
|
|
|
|
|
|
<p><b>-L</b></p></td> |
|
|
<td width="86%"> |
|
|
</td></tr> |
|
|
</table> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--relative</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Follow relative links only. |
|
|
Useful for retrieving a specific home page without any |
|
|
distractions, not even those from the same hosts.</p> |
|
|
|
|
|
<p style="margin-left:11%;"><b>-I</b> <i>list</i> |
|
|
<b><br> |
|
|
--include-directories=</b><i>list</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Specify a comma-separated list |
|
|
of directories you wish to follow when downloading. Elements |
|
|
of <i>list</i> may contain wildcards.</p> |
|
|
|
|
|
<p style="margin-left:11%;"><b>-X</b> <i>list</i> |
|
|
<b><br> |
|
|
--exclude-directories=</b><i>list</i></p> |
|
|
|
|
|
<p style="margin-left:17%;">Specify a comma-separated list |
|
|
of directories you wish to exclude from download. Elements |
|
|
of <i>list</i> may contain wildcards.</p> |
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="4%"> |
|
|
|
|
|
|
|
|
<p><b>-np</b></p></td> |
|
|
<td width="85%"> |
|
|
</td></tr> |
|
|
</table> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%;"><b>--no-parent</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Do not ever ascend to the |
|
|
parent directory when retrieving recursively. This is a |
|
|
useful option, since it guarantees that only the files |
|
|
<i>below</i> a certain hierarchy will be downloaded.</p> |
|
|
|
|
|
<h2>ENVIRONMENT |
|
|
<a name="ENVIRONMENT"></a> |
|
|
</h2> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em">Wget supports |
|
|
proxies for both <small>HTTP</small> and <small>FTP</small> |
|
|
retrievals. The standard way to specify proxy location, |
|
|
which Wget recognizes, is using the following environment |
|
|
variables: <b><br> |
|
|
http_proxy <br> |
|
|
https_proxy</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">If set, the <b>http_proxy</b> |
|
|
and <b>https_proxy</b> variables should contain the URLs of |
|
|
the proxies for <small>HTTP</small> and <small>HTTPS</small> |
|
|
connections respectively.</p> |
|
|
|
|
|
<p style="margin-left:11%;"><b>ftp_proxy</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">This variable should contain |
|
|
the <small>URL</small> of the proxy for <small>FTP</small> |
|
|
connections. It is quite common that <b>http_proxy</b> and |
|
|
<b>ftp_proxy</b> are set to the same <small>URL.</small></p> |
|
|
|
|
|
<p style="margin-left:11%;"><b>no_proxy</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">This variable should contain a
comma-separated list of domain extensions for which the proxy
should <i>not</i> be used. For instance, if the value of
<b>no_proxy</b> is <b>.mit.edu</b>, the proxy will not be used
to retrieve documents from <small>MIT.</small></p>
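<p style="margin-left:17%; margin-top: 1em">A minimal sketch of
setting these variables in a POSIX shell follows. The proxy
<small>URL</small> and the second <b>no_proxy</b> entry are placeholders,
and <tt>bypasses_proxy</tt> is a hypothetical helper that only models
the domain-suffix rule described above; it is not how Wget itself
evaluates <b>no_proxy</b>.</p>

```shell
# Placeholder proxy location; substitute the proxy for your own network.
export http_proxy="http://proxy.example.com:8080/"
export https_proxy="http://proxy.example.com:8080/"
export ftp_proxy="http://proxy.example.com:8080/"

# Hosts under these domain extensions are fetched without the proxy.
export no_proxy=".mit.edu,.example.org"

# Hypothetical helper: succeed when host $1 ends with one of the
# comma-separated entries in no_proxy. The function body runs in a
# subshell so that changing IFS does not leak into the caller.
bypasses_proxy() (
    host=$1
    IFS=,
    for suffix in $no_proxy; do
        [ -n "$suffix" ] || continue    # skip empty entries
        case "$host" in
            *"$suffix") exit 0 ;;
        esac
    done
    exit 1
)

if bypasses_proxy web.mit.edu; then
    echo "web.mit.edu: direct connection"
fi
```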
|
|
|
|
|
<h2>EXIT STATUS |
|
|
<a name="EXIT STATUS"></a> |
|
|
</h2> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em">Wget may return |
|
|
one of several error codes if it encounters problems.</p> |
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="1%"> |
|
|
|
|
|
|
|
|
<p>0</p></td> |
|
|
<td width="5%"></td> |
|
|
<td width="83%"> |
|
|
|
|
|
|
|
|
<p>No problems occurred.</p></td></tr> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="1%"> |
|
|
|
|
|
|
|
|
<p>1</p></td> |
|
|
<td width="5%"></td> |
|
|
<td width="83%"> |
|
|
|
|
|
|
|
|
<p>Generic error code.</p></td></tr> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="1%"> |
|
|
|
|
|
|
|
|
<p>2</p></td> |
|
|
<td width="5%"></td> |
|
|
<td width="83%"> |
|
|
|
|
|
|
|
|
<p>Parse error, for instance when
|
|
parsing command-line options, the <b>.wgetrc</b> or |
|
|
<b>.netrc</b>...</p> </td></tr> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="1%"> |
|
|
|
|
|
|
|
|
<p>3</p></td> |
|
|
<td width="5%"></td> |
|
|
<td width="83%"> |
|
|
|
|
|
|
|
|
<p>File I/O error.</p></td></tr> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="1%"> |
|
|
|
|
|
|
|
|
<p>4</p></td> |
|
|
<td width="5%"></td> |
|
|
<td width="83%"> |
|
|
|
|
|
|
|
|
<p>Network failure.</p></td></tr> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="1%"> |
|
|
|
|
|
|
|
|
<p>5</p></td> |
|
|
<td width="5%"></td> |
|
|
<td width="83%"> |
|
|
|
|
|
|
|
|
<p><small>SSL</small> verification failure.</p></td></tr> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="1%"> |
|
|
|
|
|
|
|
|
<p>6</p></td> |
|
|
<td width="5%"></td> |
|
|
<td width="83%"> |
|
|
|
|
|
|
|
|
<p>Username/password authentication failure.</p></td></tr> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="1%"> |
|
|
|
|
|
|
|
|
<p>7</p></td> |
|
|
<td width="5%"></td> |
|
|
<td width="83%"> |
|
|
|
|
|
|
|
|
<p>Protocol errors.</p></td></tr> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="1%"> |
|
|
|
|
|
|
|
|
<p>8</p></td> |
|
|
<td width="5%"></td> |
|
|
<td width="83%"> |
|
|
|
|
|
|
|
|
<p>Server issued an error response.</p></td></tr> |
|
|
</table> |
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em">With the |
|
|
exceptions of 0 and 1, the lower-numbered exit codes take |
|
|
precedence over higher-numbered ones, when multiple types of |
|
|
errors are encountered.</p> |
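<p style="margin-left:11%; margin-top: 1em">A script can act on
these codes by inspecting <tt>$?</tt> after Wget returns. The sketch
below simply maps each documented code to its description from the
table above; <tt>describe_wget_status</tt> is a hypothetical helper
name, and the <small>URL</small> in the usage comment is a
placeholder.</p>

```shell
# Hypothetical helper: translate a Wget exit status into the description
# given in the EXIT STATUS table.
describe_wget_status() {
    case "$1" in
        0) echo "No problems occurred." ;;
        1) echo "Generic error code." ;;
        2) echo "Parse error." ;;
        3) echo "File I/O error." ;;
        4) echo "Network failure." ;;
        5) echo "SSL verification failure." ;;
        6) echo "Username/password authentication failure." ;;
        7) echo "Protocol errors." ;;
        8) echo "Server issued an error response." ;;
        *) echo "Unknown exit status: $1" ;;
    esac
}

# Typical use after a real run:
#   wget http://example.com/file
#   describe_wget_status "$?"
describe_wget_status 8
```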
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em">In versions of |
|
|
Wget prior to 1.12, Wget’s exit status tended to be |
|
|
unhelpful and inconsistent. Recursive downloads would |
|
|
virtually always return 0 (success), regardless of any |
|
|
issues encountered, and non-recursive fetches only returned |
|
|
the status corresponding to the most recently-attempted |
|
|
download.</p> |
|
|
|
|
|
<h2>FILES |
|
|
<a name="FILES"></a> |
|
|
</h2> |
|
|
|
|
|
|
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em"><b>/usr/local/etc/wgetrc</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">Default location of the |
|
|
<i>global</i> startup file.</p> |
|
|
|
|
|
<p style="margin-left:11%;"><b>.wgetrc</b></p> |
|
|
|
|
|
<p style="margin-left:17%;">User startup file.</p> |
|
|
|
|
|
<h2>BUGS |
|
|
<a name="BUGS"></a> |
|
|
</h2> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em">You are welcome |
|
|
to submit bug reports via the <small>GNU</small> Wget bug |
|
|
tracker (see |
|
|
&lt;<b>https://savannah.gnu.org/bugs/?func=additem&amp;group=wget</b>&gt;)
|
|
or to our mailing list |
|
|
&lt;<b>bug-wget@gnu.org</b>&gt;.</p>
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em">Visit |
|
|
&lt;<b>https://lists.gnu.org/mailman/listinfo/bug-wget</b>&gt;
|
|
to get more info (how to subscribe, list archives, ...).</p> |
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em">Before actually |
|
|
submitting a bug report, please try to follow a few simple |
|
|
guidelines.</p> |
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="3%"> |
|
|
|
|
|
|
|
|
<p>1.</p></td> |
|
|
<td width="3%"></td> |
|
|
<td width="83%"> |
|
|
|
|
|
|
|
|
<p>Please try to ascertain that the behavior you see really |
|
|
is a bug. If Wget crashes, it’s a bug. If Wget does |
|
|
not behave as documented, it’s a bug. If things work |
|
|
strangely, but you are not sure about the way they are
|
|
supposed to work, it might well be a bug, but you might want |
|
|
to double-check the documentation and the mailing lists.</p></td></tr> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="3%"> |
|
|
|
|
|
|
|
|
<p>2.</p></td> |
|
|
<td width="3%"></td> |
|
|
<td width="83%"> |
|
|
|
|
|
|
|
|
<p>Try to repeat the bug in as simple circumstances as |
|
|
possible. E.g. if Wget crashes while running <b>wget
|
|
-rl0 -kKE -t5 --no-proxy |
|
|
http://example.com -o /tmp/log</b>, you should try to |
|
|
see if the crash is repeatable, and if it will occur with a
|
|
simpler set of options. You might even try to start the |
|
|
download at the page where the crash occurred to see if that |
|
|
page somehow triggered the crash.</p></td></tr> |
|
|
</table> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Also, while I |
|
|
will probably be interested to know the contents of your |
|
|
<i>.wgetrc</i> file, just dumping it into the debug message |
|
|
is probably a bad idea. Instead, you should first try to see |
|
|
if the bug repeats with <i>.wgetrc</i> moved out of the way. |
|
|
Only if it turns out that <i>.wgetrc</i> settings affect the |
|
|
bug, mail me the relevant parts of the file.</p> |
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="3%"> |
|
|
|
|
|
|
|
|
<p style="margin-top: 1em">3.</p></td> |
|
|
<td width="3%"></td> |
|
|
<td width="83%"> |
|
|
|
|
|
|
|
|
<p style="margin-top: 1em">Please start Wget with |
|
|
<b>-d</b> option and send us the resulting output (or |
|
|
relevant parts thereof). If Wget was compiled without debug |
|
|
support, recompile it; it is <i>much</i>
|
|
easier to trace bugs with debug support on.</p></td></tr> |
|
|
</table> |
|
|
|
|
|
<p style="margin-left:17%; margin-top: 1em">Note: please |
|
|
make sure to remove any potentially sensitive information |
|
|
from the debug log before sending it to the bug address. The |
|
|
<b>-d</b> option won’t go out of its way
|
|
to collect sensitive information, but the log <i>will</i> |
|
|
contain a fairly complete transcript of Wget’s |
|
|
communication with the server, which may include passwords |
|
|
and pieces of downloaded data. Since the bug address is |
|
|
publicly archived, you may assume that all bug reports are |
|
|
visible to the public.</p> |
|
|
|
|
|
<table width="100%" border="0" rules="none" frame="void" |
|
|
cellspacing="0" cellpadding="0"> |
|
|
<tr valign="top" align="left"> |
|
|
<td width="11%"></td> |
|
|
<td width="3%"> |
|
|
|
|
|
|
|
|
<p style="margin-top: 1em">4.</p></td> |
|
|
<td width="3%"></td> |
|
|
<td width="83%"> |
|
|
|
|
|
|
|
|
<p style="margin-top: 1em">If Wget has crashed, try to run |
|
|
it in a debugger, e.g. <tt>"gdb `which wget` |
|
|
core"</tt> and type <tt>"where"</tt> to get |
|
|
the backtrace. This may not work if the system administrator |
|
|
has disabled core files, but it is safe to try.</p></td></tr> |
|
|
</table> |
|
|
|
|
|
<h2>SEE ALSO |
|
|
<a name="SEE ALSO"></a> |
|
|
</h2> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em">This is |
|
|
<b>not</b> the complete manual for <small>GNU</small> Wget. |
|
|
For more complete information, including more detailed |
|
|
explanations of some of the options, and a number of |
|
|
commands available for use with <i>.wgetrc</i> files and the |
|
|
<b>-e</b> option, see the <small>GNU</small> Info |
|
|
entry for <i>wget</i>.</p> |
|
|
|
|
|
<h2>AUTHOR |
|
|
<a name="AUTHOR"></a> |
|
|
</h2> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em">Originally |
|
|
written by Hrvoje Nik&scaron;i&#263;
&lt;hniksic@xemacs.org&gt;.</p>
|
|
|
|
|
<h2>COPYRIGHT |
|
|
<a name="COPYRIGHT"></a> |
|
|
</h2> |
|
|
|
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em">Copyright (c) |
|
|
1996-2011, 2015, 2018-2019 Free Software |
|
|
Foundation, Inc.</p> |
|
|
|
|
|
<p style="margin-left:11%; margin-top: 1em">Permission is |
|
|
granted to copy, distribute and/or modify this document |
|
|
under the terms of the <small>GNU</small> Free Documentation |
|
|
License, Version 1.3 or any later version published by the |
|
|
Free Software Foundation; with no Invariant Sections, with |
|
|
no Front-Cover Texts, and with no Back-Cover Texts. A copy |
|
|
of the license is included in the section entitled
"<small>GNU</small> Free Documentation License".</p>
|
|
<hr> |
|
|
</body> |
|
|
</html> |
|
|
|