[plug] wget query

Nick Bannon nick at ucc.gu.uwa.edu.au
Fri Sep 3 18:11:22 WST 1999


On Fri, Sep 03, 1999 at 05:01:24PM +0800, Matt Kemner wrote:
> On Fri, 3 Sep 1999, Bret Busby wrote:
> > [wget] is quite useful; except where a path includes extended ASCII
> > characters, such as the tilde.
> 
> I've never had a problem with wget and websites containing ~
> Can you let me know (either in private or on the list) what website you
> are trying to download, and what errors wget is giving you?
[...]

This is very good advice.

FWIW, the way I usually use wget is:
	wget -m --no-parent <URL>

-m is for mirror, which implies recursion (plus time-stamping and
unlimited depth); --no-parent makes it start at that location and work
down only, so it doesn't try to download the whole site (unless I give
it the URL of the whole site).
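
To spell out what -m bundles together: going by the current GNU wget
manual, it is shorthand for -r -N -l inf --no-remove-listing (recursion,
time-stamping, unlimited depth, keep FTP listings). The sketch below only
prints the long-form command rather than running it; mirror_cmd is a
made-up helper name, not part of wget.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the long-form equivalent of
#   wget -m --no-parent URL
# The expansion of -m is taken from the current GNU wget manual and
# may differ in other wget versions.
mirror_cmd() {
    printf 'wget -r -N -l inf --no-remove-listing --no-parent %s\n' "$1"
}

# Dry run: shows the command, touches no network.
mirror_cmd "http://www.ucc.gu.uwa.edu.au/~nick/test/"
```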

I have given it URLs containing ~ plenty of times (i.e. a user home
directory), and it downloads them fine, but, yes, it does convert the ~
into %7E in the local directory names.

Hence:
	wget -m --no-parent http://www.ucc.gu.uwa.edu.au/~nick/test/

produces the directory www.ucc.gu.uwa.edu.au ;
	containing the subdirectory %7Enick ;
		containing the subdirectory test ;
			containing the files index.html
					     file1.html
					     file2.html
					     file3.html
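
The layout above can be worked out from the URL by hand: drop the
scheme, then swap each ~ for %7E. A small sketch, assuming only the
behaviour described here (url_to_local_path is a hypothetical helper,
and other wget versions may name the directories differently):

```shell
#!/bin/sh
# Hypothetical helper: map a URL to the local directory path described
# in this post. This imitates the observed result; it is not wget's
# actual implementation.
url_to_local_path() {
    # 1st expression: strip a leading scheme such as "http://".
    # 2nd expression: replace every ~ with its escaped form %7E.
    printf '%s\n' "$1" | sed -e 's|^[a-z]*://||' -e 's|~|%7E|g'
}

url_to_local_path "http://www.ucc.gu.uwa.edu.au/~nick/test/"
```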

The reason is that URLs are tightly defined and can't just contain
any old character. "Special" characters, which wget takes to include ~,
are escaped, or "stuffed", by sending %, then the ASCII value of that
character as two hex digits. ~ is 0x7E, hence %7E. If you needed to
send a literal %, you'd have to send %25 .
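
As a rough illustration of that escaping rule, here is a minimal sketch
in plain sh. pct_encode is a hypothetical name, and the set of
characters it leaves alone (letters, digits, and . _ / -) is an
assumption chosen for the example, not the exact set from the RFC or
from wget. It also assumes plain ASCII input.

```shell
#!/bin/sh
# Hypothetical helper: percent-encode a string one character at a time.
# The body runs in a subshell "( ... )" so its variables stay local.
pct_encode() (
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        case $c in
            # Assumed "safe" set for this sketch: passed through as-is.
            [A-Za-z0-9._/-]) out=$out$c ;;
            # Anything else: % followed by the ASCII value in two hex
            # digits. printf reads "'X" as the numeric code of X.
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
)

pct_encode "~nick/test/"
pct_encode "%"
```

With the inputs shown, the two calls print %7Enick/test/ and %25,
matching the directory name and the % case described above.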

For the full details, refer to RFC 2396.

Nick.

-- 
  Nick Bannon  | "I made this letter longer than usual because
nick at it.net.au | I lack the time to make it shorter." - Pascal

