Download the entire content of a site

wget is a command-line tool for downloading files from the net. It can also be used to download the entire content of a site. Here is an example of doing so:

wget -r -l0 --no-parent http://urltodowload

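Here, and in the longer command further down, the switches mean:

-r = recursive: follow links and download the pages they point to

-l = maximum recursion depth. -l0 means unlimited depth (the same as -l inf), while -l 2 stops two levels down

--no-parent = never ascend to the parent directory while recursing

--limit-rate = caps the download speed to make your download less conspicuous; 20K (20 KB per second) would be nice, I think

-nc = no-clobber: avoids downloading linked files that have already been downloaded
-w = seconds to wait between retrievals, e.g. -w 4 waits 4 seconds
--random-wait = randomizes the -w time; current wget versions wait between 0.5 and 1.5 times the -w value (older releases used 0 to 2 times)

-A jpg,jpeg = tells wget to only save .jpg and .jpeg files

-U = sets the User-Agent string sent to the server
-S = prints the server response headers
http_proxy = an environment variable, not a switch, that names the proxy wget should use (note that I don't know whether wget automatically falls back to a direct connection when the proxy is unavailable)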
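As a rough sketch of the http_proxy point: wget picks up the proxy address from the http_proxy environment variable, so it can be set in the shell before wget runs. The helper name proxy_url and the port 8080 below are made up for illustration; substitute your own proxy details.

```shell
#!/bin/sh
# Hypothetical helper: build a proxy URL from its parts.
# The user name and password are optional (only if the proxy needs them).
proxy_url() {
    host=$1
    port=$2
    user=${3:-}
    pass=${4:-}
    if [ -n "$user" ]; then
        echo "http://$user:$pass@$host:$port"
    else
        echo "http://$host:$port"
    fi
}

# Assumed values: yourproxy.com and 8080 are placeholders, not real settings.
http_proxy=$(proxy_url yourproxy.com 8080)
export http_proxy

# Any wget started from this shell will now see the variable.
echo "proxy in use: $http_proxy"
```

Exporting the variable keeps the proxy address out of the wget command line itself, which makes the command easier to reuse with or without a proxy.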

Now the wget command with all these switches:

http_proxy="http://username:passwrd@(ifneeded)yourproxy.com:port" wget -U "Mozilla/5.0 (X11; U; Linux i686; nl; rv:1.7.3) Gecko/20040916" -r -l 2 -A jpg,jpeg -nc --limit-rate=20K -w 4 --random-wait -S http://insertdomainhere.com/directory
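To make the long command easier to tweak, the same switches can be kept in variables in a small shell script. This is only a sketch: the function name mirror_images and the WGET_RUN variable are my own inventions, and the script merely prints the command unless WGET_RUN=1 is set.

```shell
#!/bin/sh
# Settings taken from the command above; change them to suit your site.
URL="http://insertdomainhere.com/directory"
UA="Mozilla/5.0 (X11; U; Linux i686; nl; rv:1.7.3) Gecko/20040916"
RATE="20K"        # --limit-rate value
WAIT=4            # -w value, in seconds
DEPTH=2           # -l value (0 would mean unlimited)
TYPES="jpg,jpeg"  # -A value: file suffixes to keep

# Hypothetical wrapper: assemble the wget command for the URL given as $1.
# It only prints the command unless WGET_RUN=1 is set in the environment.
mirror_images() {
    set -- wget -U "$UA" -r -l "$DEPTH" -A "$TYPES" -nc \
        --limit-rate="$RATE" -w "$WAIT" --random-wait -S "$1"
    if [ "${WGET_RUN:-0}" = "1" ]; then
        "$@"
    else
        # Dry run: show the command (quotes around the user agent are not shown)
        printf '%s\n' "$*"
    fi
}

mirror_images "$URL"
```

Run it once as is to check the printed command, then run it again with WGET_RUN=1 in the environment to start the actual download.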
