[plug] mapping out a website
David Buddrige
buddrige at wasp.net.au
Thu Jun 10 13:35:59 WST 2004
Hi all,
I have been asked to map out all the pages in a given intranet website. So,
for example, given the website URL:
http://abc.com/
They want a list of every URL reachable from the links on the
initial page, sort of like this:
http://abc.com/
http://abc.com/page1.html
http://abc.com/page2.html
http://abc.com/page2a.html
http://abc.com/page2b.html
http://abc.com/page2c.html
http://abc.com/page3.html
http://abc.com/page4.html
http://abc.com/page4a.html
http://abc.com/page4b.html
http://abc.com/page4b1.html
http://abc.com/page4c.html
And so on, mapping out the structure of links in the website.
It seemed to me that this ought to be scriptable - most likely using wget
or something similar. I have been experimenting with wget; however, I have
not been able to find a way of just listing the URLs, as opposed to
actually downloading every page.
Does anyone know if wget can be used just to map out the tree of URLs in a
given website, as opposed to fully downloading and mirroring the entire
website?
I've been poring over the wget manual, but to no avail... Is there a
similar command that is better suited to what I am trying to do?
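For what it's worth, here is a minimal sketch of the kind of crawler I have in mind, written in Python with only the standard library. It is an illustration rather than a finished tool: the function names (`extract_links`, `crawl`) are just what I'm calling them here, and it makes simplifying assumptions (it only follows `<a href>` links, stays on the starting host, and ignores robots.txt).

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkParser(HTMLParser):
    """Collect the href attribute of every <a> tag seen in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html, base_url):
    """Return absolute same-host URLs linked from the given HTML."""
    parser = LinkParser()
    parser.feed(html)
    host = urlparse(base_url).netloc
    urls = []
    for link in parser.links:
        absolute = urljoin(base_url, link)   # resolve relative links
        if urlparse(absolute).netloc == host:
            urls.append(absolute.split("#")[0])  # drop #fragments
    return urls

def crawl(start_url):
    """Breadth-first walk of the site, printing each URL once."""
    seen = {start_url}
    queue = [start_url]
    while queue:
        url = queue.pop(0)
        print(url)
        try:
            with urllib.request.urlopen(url) as response:
                html = response.read().decode("utf-8", errors="replace")
        except OSError:
            continue  # unreachable or non-HTML page: list it, don't descend
        for link in extract_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
```

Calling `crawl("http://abc.com/")` would print each reachable same-host URL once, in breadth-first order, which is roughly the listing shown above.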
thanks heaps guys
David.