[plug] Re: mapping out a website

Mageaere mageaere at hushmail.com
Thu Jun 10 22:29:41 WST 2004


Programs like wget and HTTrack (www.httrack.org), which is the site mirroring
tool I use, and crawlers like Google follow a protocol that allows websites
to opt out of being mirrored or spidered by these programs. Look at
http://www.gnu.org/robots.txt and you will see that these programs are not
supposed to search in certain places on this website. I don't think
that is your problem though.
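
To make the opt-out idea concrete, here is a rough sketch of how a spider
might check a path against robots.txt rules before fetching it. It uses
Python's standard urllib.robotparser module. The rules, the host
www.example.org and the agent name "example-spider" are all made up for
illustration; they are not the actual contents of http://www.gnu.org/robots.txt,
and this is not how wget or HTTrack implement the check internally.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""

# Hypothetical site; no network request is made in this sketch.
BASE_URL = "http://www.example.org"

# Parse the rules from memory instead of downloading them.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="example-spider"):
    """Return True if the rules let the given agent fetch this path."""
    return parser.can_fetch(agent, BASE_URL + path)


if __name__ == "__main__":
    for path in ("/index.html", "/private/notes.html", "/cgi-bin/search"):
        verdict = "allowed" if allowed(path) else "disallowed"
        print(path, "->", verdict)
```

A well-behaved spider would run a check like this for every link it finds
and simply skip the disallowed ones, which is why parts of a site can be
missing from a mirror.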

On Wed, 09 Jun 2004 23:04:38 -0700 David Buddrige <buddrige at wasp.net.au>
wrote:
>I have been experimenting with the --spider option, but couldn't get it to
>work... here's a transcript of running wget from my isp shell account:
>
>[buddrige@wasp buddrige]$ wget -r --spider -o test.txt http://www.gnu.org
>[buddrige@wasp buddrige]$ cat test.txt
> --13:57:48--  http://www.gnu.org/
>          => `www.gnu.org/index.html'
>Resolving www.gnu.org... done.
>Connecting to www.gnu.org[199.232.41.10]:80... connected.
>HTTP request sent, awaiting response... 200 OK
>Length: 12,756 [text/html]
>200 OK 
>
>www.gnu.org/index.html: No such file or directory 
>
>FINISHED --13:57:58--
>Downloaded: 0 bytes in 0 files
>[buddrige@wasp buddrige]$
>
>I wasn't sure what to search for to do this task... will try searching for
>"web mapping" on google... but also, am a bit confused as to wget's
>behaviour... was primarily interested in wget because it is [theoretically]
>scriptable...
>
>thanks 
>
>David. 






