[plug] mapping out a website

Steve Grasso steveg at calm.wa.gov.au
Thu Jun 10 16:39:32 WST 2004


Hi David,

A reliable site-mapping Perl script I use is tree.pl:
http://www.danielnaber.de/tree/
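If wget doesn't pan out, the same idea also fits in a short Python sketch using only the standard library. This is a rough illustration rather than the tree.pl script above: the `LinkExtractor` and `crawl` names are made up for the example, and a real crawler would also want to cap the recursion depth and honour robots.txt.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collect <a href=...> targets, resolved against a base URL."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links and drop any #fragment part.
                    url, _frag = urldefrag(urljoin(self.base_url, value))
                    self.links.append(url)

def crawl(url, prefix, depth=0, seen=None):
    """Print an indented tree of links, staying within `prefix`.

    Deep sites may need an explicit depth cap; this sketch relies on
    `seen` to stop cycles but will still recurse one level per link.
    """
    if seen is None:
        seen = set()
    if url in seen or not url.startswith(prefix):
        return
    seen.add(url)
    print("   " * depth + url)
    try:
        html = urlopen(url).read().decode("utf-8", errors="replace")
    except OSError:
        return  # unreachable page; keep mapping the rest of the site
    parser = LinkExtractor(url)
    parser.feed(html)
    for link in parser.links:
        crawl(link, prefix, depth + 1, seen)
```

Called as `crawl("http://abc.com/", "http://abc.com/")`, it prints the indented list of URLs much like the layout you sketched, without saving any pages to disk.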

HTH
Steve

Quoting David Buddrige <buddrige at wasp.net.au>:

> Hi all, 
> 
> I have been asked to map out all the pages in a given intranet website.  So 
> for example, given website url: 
> 
> http://abc.com/ 
> 
> They want a list of every URL that can be reached from the links on the 
> initial page, sort of like this: 
> 
> http://abc.com/
>   http://abc.com/page1.html
>   http://abc.com/page2.html
>      http://abc.com/page2a.html
>      http://abc.com/page2b.html
>      http://abc.com/page2c.html
>   http://abc.com/page3.html
>   http://abc.com/page4.html
>      http://abc.com/page4a.html
>      http://abc.com/page4b.html
>         http://abc.com/page4b1.html
>      http://abc.com/page4c.html 
> 
> And so on, mapping out the structure of links in the website. 
> 
> It seemed to me that this ought to be something that is scriptable - most 
> likely using wget or something... I have been experimenting with wget, 
> however I have not been able to determine a way of just getting the URLs 
> as opposed to actually downloading the entire page... 
> 
> Does anyone know if wget can be used just to map out the tree of URLs in a 
> given website, as opposed to fully downloading and mirroring the entire 
> website? 
> 
> I've been poring over the wget manual, but to no avail... is there a 
> similar command that is more appropriate to what I am trying to do? 
> 
> thanks heaps guys 
> 
> David. 
> 
> _______________________________________________
> PLUG discussion list: plug at plug.linux.org.au
> http://mail.plug.linux.org.au/cgi-bin/mailman/listinfo/plug
> Committee e-mail: committee at plug.linux.org.au
> 