[plug] mapping out a website

David Buddrige buddrige at wasp.net.au
Thu Jun 10 13:35:59 WST 2004


Hi all, 

I have been asked to map out all the pages in a given intranet website.  So 
for example, given the website URL: 

http://abc.com/ 

They want a list of every URL reachable from the links on the 
initial page, something like this: 

http://abc.com/
  http://abc.com/page1.html
  http://abc.com/page2.html
     http://abc.com/page2a.html
     http://abc.com/page2b.html
     http://abc.com/page2c.html
  http://abc.com/page3.html
  http://abc.com/page4.html
     http://abc.com/page4a.html
     http://abc.com/page4b.html
        http://abc.com/page4b1.html
     http://abc.com/page4c.html 

And so on, mapping out the structure of links in the website. 

It seemed to me that this ought to be scriptable - most likely using wget or 
something similar. I have been experimenting with wget, but I have not been 
able to find a way to get just the URLs, as opposed to actually downloading 
every page in full... 

Does anyone know if wget can be used just to map out the tree of URLs in a 
given website, as opposed to fully downloading and mirroring the whole 
site? 

I've been poring over the wget manual, but to no avail... is there a 
similar command better suited to what I am trying to do? 
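Failing that, I suppose a small spider wouldn't be much work to write by hand. A minimal sketch of the link-extraction half, using only the Python standard library - the site URL and page contents below are hypothetical, and a real crawl would fetch each page with urllib.request.urlopen():

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links and drop any #fragment.
                    url, _fragment = urldefrag(urljoin(self.base_url, value))
                    self.links.append(url)

def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links

# Demo on a hypothetical page rather than a live fetch:
page = '<a href="page1.html">1</a> <a href="/page2.html#top">2</a>'
print(extract_links(page, "http://abc.com/"))
# -> ['http://abc.com/page1.html', 'http://abc.com/page2.html']
```

A full spider would then recurse: fetch each extracted URL (skipping ones already seen and ones outside the site), printing each with indentation proportional to its depth to get the tree layout above.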

thanks heaps guys 

David. 



