[plug] Re: mapping out a website

David Buddrige buddrige at wasp.net.au
Thu Jun 10 14:04:38 WST 2004


I have been experimenting with the --spider option, but couldn't get it to 
work... here's a transcript of running wget from my ISP shell account: 


[buddrige at wasp buddrige]$ wget -r --spider -o test.txt http://www.gnu.org
[buddrige at wasp buddrige]$ cat test.txt
 --13:57:48--  http://www.gnu.org/
          => `www.gnu.org/index.html'
Resolving www.gnu.org... done.
Connecting to www.gnu.org[199.232.41.10]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12,756 [text/html]
200 OK 

www.gnu.org/index.html: No such file or directory 

FINISHED --13:57:58--
Downloaded: 0 bytes in 0 files
[buddrige at wasp buddrige]$ 

I wasn't sure what to search for to do this task... will try searching for 
"web mapping" on Google... but I'm also a bit confused by wget's 
behaviour... I was primarily interested in wget because it is [theoretically] 
scriptable... 
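Judging from the "0 bytes in 0 files" above, it looks like --spider doesn't 
save the pages it fetches, so with -r there is nothing on disk for wget to 
parse for further links. Since wget is scriptable, one rough sketch (the 
filename map.log and the --delete-after run are assumptions on my part, not 
something I've tested against a real site) is to let wget download each page 
so it can still parse the links, delete the page straight away, and then pull 
the URL list out of the log afterwards: 

```shell
#!/bin/sh
# A run like the following (not executed here) would keep the full log
# while deleting each downloaded page after wget has parsed its links:
#   wget -r -nd --delete-after -o map.log http://abc.com/
#
# Hypothetical sample log, in the same format wget printed above:
cat > map.log <<'EOF'
 --13:57:48--  http://www.gnu.org/
          => `www.gnu.org/index.html'
HTTP request sent, awaiting response... 200 OK
 --13:57:51--  http://www.gnu.org/gnu/gnu.html
EOF

# Pull the unique URLs out of the log:
grep -Eo 'https?://[^ ]+' map.log | sort -u
```

The grep pattern just grabs anything in the log that looks like a URL, so it 
may need tightening for a real run. 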

thanks 

David. 

Mark O'Shea writes: 

> On Thu, 10 Jun 2004, David Buddrige wrote:
>> I have been asked to map out all the pages in a given intranet website.  So
>> for example, given website url: 
>>
>> http://abc.com/ 
>>
>> They want a list of every url that can be got at from the links on the
>> initial page, sort of like this: 
>>
> Would this work for you?:
> wget -r --spider -o logfile.txt http://abc.com/ 
> 
> Have you tried searching google for website mapping or similar? 
> 
> Regards,
> -- 
> Mark O'Shea
> _______________________________________________
> PLUG discussion list: plug at plug.linux.org.au
> http://mail.plug.linux.org.au/cgi-bin/mailman/listinfo/plug
> Committee e-mail: committee at plug.linux.org.au
 


