Open window

Think globally, act locally!!

Download the entire content of a site

July 20, 2008

Filed under: Learnings — Sheikh Jafar Tarique @ 2:27 pm

wget is a command-line tool for downloading files from the net. It can also be used to download the entire content of a site. Here is an example of doing so:

wget -r -l0 --no-parent http://urltodowload

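Here,

-r = recursive

-l0 = sets the recursion depth; 0 means no limit (without -l, wget stops at 5 levels)

--no-parent = never climbs to the parent directory, so only content below the given URL is fetched

Some other useful switches: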

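--limit-rate = caps the download speed to make your download less conspicuous; 20 KB per second would be nice, I think

-nc = avoids downloading linked files that have already been downloaded
-w = waits the given number of seconds between retrievals (4 in the command below)
--random-wait = multiplies the -w time by a random factor (0.5 to 1.5 in current wget releases; older releases used 0 to 2)

-A jpg,jpeg = tells it to save only .jpg and .jpeg files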

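-U = the user-agent string sent to the server
-S = prints the server response headers
http_proxy = not a switch but an environment variable that tells wget which proxy to use (note that I don't know whether wget might automatically fall back on no proxy or not)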
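Each short switch above also has a long name, which can be easier to read in a script. The sketch below simply prints those long names, one per line. The function name wget_long_opts is made up for illustration, and the values (depth 2, 4 seconds, 20K, jpg,jpeg) are just the ones used in this post. -U is left out because its value contains spaces; its long form is --user-agent="...".

```shell
# Hypothetical helper: prints the long-name form of the short switches.
#   -r  -> --recursive        -l  -> --level
#   -nc -> --no-clobber       -w  -> --wait
#   -A  -> --accept           -S  -> --server-response
wget_long_opts() {
  printf '%s\n' \
    --recursive \
    --level=2 \
    --no-parent \
    --no-clobber \
    --wait=4 \
    --random-wait \
    --limit-rate=20K \
    --accept=jpg,jpeg \
    --server-response
}
```

Since none of these options contains a space, the output can be dropped straight into a command, for example: wget $(wget_long_opts) http://insertdomainhere.com/directory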

Now the wget command with all these switches:

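http_proxy=http://username:passwrd@yourproxy.com:port wget -U "Mozilla/5.0 (X11; U; Linux i686; nl; rv:1.7.3) Gecko/20040916" -r -l 2 -A jpg,jpeg -nc --limit-rate=20K -w 4 --random-wait -S http://insertdomainhere.com/directory

Here http_proxy is set as an environment variable in front of the command, because wget would treat it as just another URL if it were passed as an argument. The username:passwrd@ part is only needed if your proxy asks for a login.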
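If you run this often, the command can be wrapped in a small shell function. This is only a rough sketch, assuming the proxy is handed to wget through the http_proxy environment variable. The names polite_mirror and DRY_RUN are invented for this example and are not part of wget. With DRY_RUN=1 the function only prints the command it would run, which is handy for checking the switches before hitting a real site.

```shell
# polite_mirror URL [PROXY_URL]
# Hypothetical wrapper around the wget command from this post.
# Set DRY_RUN=1 to print the command instead of running it.
polite_mirror() {
  url=$1
  proxy=${2:-}
  ua='Mozilla/5.0 (X11; U; Linux i686; nl; rv:1.7.3) Gecko/20040916'

  # Keep the arguments in the positional parameters so the user-agent
  # string, which contains spaces, stays a single argument.
  set -- -U "$ua" -r -l 2 -A jpg,jpeg -nc --limit-rate=20K \
         -w 4 --random-wait -S "$url"

  if [ "${DRY_RUN:-0}" = 1 ]; then
    if [ -n "$proxy" ]; then
      printf 'http_proxy=%s ' "$proxy"
    fi
    printf 'wget'
    printf ' %s' "$@"
    printf '\n'
    return 0
  fi

  if [ -n "$proxy" ]; then
    # The variable is set only for this one wget call.
    http_proxy=$proxy wget "$@"
  else
    wget "$@"
  fi
}
```

For example, DRY_RUN=1 followed by polite_mirror http://insertdomainhere.com/directory shows the full command without downloading anything. The dry-run output is meant for reading, so the user-agent is printed without its quotes.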

 
