[plug] Virtual Filesystems & CMS

Fri Jun 11 14:05:20 WST 2004

Rather than some kind of spiffy content management, the Debian web
site uses WML to generate static HTML from WML source (which is
basically HTML + define-your-own-tags + #include + embedded Perl).
The PLUG web site does the same ;-)  If that kind of thing appeals to
you, you might want to take a look at http://www.thewml.org/
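For anyone who hasn't seen this style of site generation, here's a toy
illustration of the general idea: expand #include lines and user-defined
tags at build time, then serve plain static HTML.  This is NOT WML's real
syntax or implementation; the `#include "file"` line form, the `<name/>`
tag form and the `include()`/`build()` functions are made up for the
example.

```python
import re
from pathlib import Path

# Made-up syntax for illustration only (not WML's): a whole line of the form
# #include "file" splices in another source file, and a self-closing
# <name/> tag is replaced by a user-defined snippet.
INCLUDE_RE = re.compile(r'^#include[ \t]+"([^"]+)"[ \t]*$', re.MULTILINE)
TAG_RE = re.compile(r"<([a-z][a-z0-9-]*)/>")


def include(path: Path, depth: int = 0) -> str:
    """Read `path`, recursively splicing in #include'd files."""
    if depth > 10:
        raise RecursionError(f"#include nested too deeply at {path}")
    text = path.read_text()
    # Included paths are resolved relative to the including file.
    return INCLUDE_RE.sub(
        lambda m: include(path.parent / m.group(1), depth + 1), text)


def build(path: Path, tags: dict) -> str:
    """Produce static HTML: expand includes first, then user-defined tags."""
    html = include(path)
    # Unknown tags such as <br/> are left exactly as written.
    return TAG_RE.sub(lambda m: tags.get(m.group(1), m.group(0)), html)
```

The point is that all the cleverness happens once, when the site is
built; the web server only ever sees ordinary .html files.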

Trevor Phillips wrote:
| Version control - how scalable and useful is an existing system (e.g. 
| Subversion) to this sort of file repository? Thousands of files. Mostly HTML, 
| but with large pockets of binary files (images, PDFs). Is it worth using some 
| other system? Or should I just do my own? It wouldn't be too hard. Maybe have 
| a directory based on the file name, and store snapshot versions of the file 
| in the dir?
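For what it's worth, the roll-your-own scheme described above really is
only a few lines.  A minimal sketch, assuming full copies per version and
a hypothetical `<name>.versions/` directory holding numbered snapshots
(the layout and the `store_version()`/`get_version()` names are made up):

```python
import shutil
from pathlib import Path
from typing import Optional

# Hypothetical layout: every tracked file gets a directory
# "<name>.versions/" holding numbered full copies: 1, 2, 3, ...
SUFFIX = ".versions"


def _versions(vdir: Path) -> list:
    """Sorted version numbers already stored in vdir."""
    if not vdir.is_dir():
        return []
    return sorted(int(p.name) for p in vdir.iterdir() if p.name.isdigit())


def store_version(repo: Path, rel_path: str, src: Path) -> Optional[Path]:
    """Snapshot `src` as the next version of `rel_path`; None if unchanged."""
    vdir = repo / (rel_path + SUFFIX)
    vdir.mkdir(parents=True, exist_ok=True)
    versions = _versions(vdir)
    if versions and (vdir / str(versions[-1])).read_bytes() == src.read_bytes():
        return None  # identical to the latest snapshot: store nothing
    dest = vdir / str(versions[-1] + 1 if versions else 1)
    shutil.copy2(src, dest)  # full copy every time, no deltas
    return dest


def get_version(repo: Path, rel_path: str, n: Optional[int] = None) -> bytes:
    """Contents of version n of rel_path (default: the latest)."""
    vdir = repo / (rel_path + SUFFIX)
    versions = _versions(vdir)
    if not versions:
        raise FileNotFoundError(rel_path)
    return (vdir / str(versions[-1] if n is None else n)).read_bytes()
```

The catch is that every edit to a large PDF or image costs another
complete copy, and there's no notion of a change spanning several files.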

The last time I used Subversion, attempting to store Really Huge
amounts of data (the two things that I tried were a Linux kernel
source tree and a ~1GB subset of my digital camera photos) would bring
it to its knees.  Subversion would start taking up hundreds of MB of
RAM to do the import and to check the repository (I can't remember
whether this was on the server, the client or both).

I normally advocate GNU Arch (http://wiki.gnuarch.org/) whenever SVN is
mentioned but I'm not sure it'd be really appropriate for what you
want.  Arch has a lot of nice features, but a lot of them are less
important when you're really just using it as a versioned file system
(rather than taking advantage of its distributed development
features).  I'm also not sure how well it scales: there are periodic
complaints about its performance, and suggestions for tweaking it, on
the gnu-arch-users mailing list, though it has been claimed that,
depending on what guarantees you can make about the software writing
to your web space, it's much faster than the competition.  We're using
it for the PLUG web site, but that's on a much smaller scale (we
currently have 614 files in the output/ tree, totalling 13 MB).

Also, Arch isn't particularly efficient at storing changes to large
binary files (it stores a complete copy of the file rather than rdiffs
or xdeltas or similar).  Whether this is relevant to you depends on
what kinds of changes you make to your binary files, and how often.
Images aren't likely to benefit from xdeltas, and I would imagine
that even PDFs, because they are compressed, wouldn't get much benefit
from them either.
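To see why compressed files don't delta well, here's a deliberately
crude comparison: split two versions of some data into fixed 64-byte
blocks and count how many blocks are unchanged at the same offset.  Real
tools like rdiff and xdelta are much smarter (they find matches at any
offset), so treat this only as an illustration; `aligned_block_match()`
and the test data are made up.

```python
import zlib


def aligned_block_match(old: bytes, new: bytes, block: int = 64) -> float:
    """Fraction of new's fixed-size blocks identical to old's at the same offset.

    A crude stand-in for a real delta tool such as rdiff or xdelta.
    """
    n_blocks = (len(new) + block - 1) // block
    if n_blocks == 0:
        return 1.0
    same = sum(1 for i in range(0, len(new), block)
               if old[i:i + block] == new[i:i + block])
    return same / n_blocks


# Made-up "web page" text, then a one-byte in-place edit near the start.
old = "".join(f"line {i}: some fairly repetitive web page text\n"
              for i in range(2000)).encode()
edited = bytearray(old)
edited[100] = ord("X")
new = bytes(edited)

plain = aligned_block_match(old, new)
packed = aligned_block_match(zlib.compress(old, 9), zlib.compress(new, 9))
print(f"uncompressed: {plain:.1%} of blocks reusable")
print(f"compressed:   {packed:.1%} of blocks reusable")
```

With the uncompressed text only the one block containing the edit
changes; once the same data is compressed, the output typically differs
from the edit point onwards, so very little of the old version can be
reused.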

I'm using rsync and cron to make (and expire) periodic snapshots of
important stuff at work.  I could give you the shell scripts that I
use, but there's little point, as this article explains the technique
quite well:

        http://www.mikerubel.org/computers/rsync_snapshots/index.html
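The core trick in that article is rotating snapshot directories and
hard-linking files that haven't changed, so each extra snapshot only
costs the space of the files that did change.  A rough Python rendition
of that idea, for illustration only: it compares file contents directly
instead of running rsync, and the `snapshot.N` directory names,
`make_snapshot()` and `keep` are made up.

```python
import filecmp
import os
import shutil
from pathlib import Path


def make_snapshot(src: Path, snap_root: Path, keep: int = 4) -> Path:
    """Rotate snapshot.0..snapshot.{keep-1}, then take a fresh snapshot.0.

    Files identical to the previous snapshot are hard-linked, not copied,
    so they share disk space.  Never edit a snapshot in place: the change
    would show up in every snapshot sharing that file.
    """
    snap_root.mkdir(parents=True, exist_ok=True)
    # Expire the oldest snapshot.
    oldest = snap_root / f"snapshot.{keep - 1}"
    if oldest.exists():
        shutil.rmtree(oldest)
    # Shift snapshot.N -> snapshot.N+1, starting from the older end.
    for i in range(keep - 2, -1, -1):
        d = snap_root / f"snapshot.{i}"
        if d.exists():
            d.rename(snap_root / f"snapshot.{i + 1}")
    prev = snap_root / "snapshot.1"
    new = snap_root / "snapshot.0"
    for dirpath, _dirs, files in os.walk(src):
        rel = Path(dirpath).relative_to(src)
        (new / rel).mkdir(parents=True, exist_ok=True)
        for name in files:
            s, p, d = Path(dirpath) / name, prev / rel / name, new / rel / name
            if p.is_file() and filecmp.cmp(s, p, shallow=False):
                os.link(p, d)  # unchanged: share the previous copy
            else:
                shutil.copy2(s, d)  # new or changed: store a real copy
    return new
```

Run from cron, each call gives you a browsable full tree per snapshot
while unchanged files are stored only once.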

| URI Transparency - One of my pet hates is Web-apps which pretend to be a 
| website, yet have the ugliest CGI-style URLs. Ugh! It's unnecessary! (Sorry - 
| bit of a Rant of mine.)

I hate ugly URLs too! :-)  I think this one also ties in with your
"keeping everything in a database" complaint...
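Keeping content in a database doesn't have to mean CGI-style URLs,
though: the request path itself can be the lookup key.  A minimal
sketch, assuming Python's WSGI interface and a dict standing in for the
database (`PAGES` and its paths are made-up data):

```python
# Stand-in for a content database, keyed by clean URL path (made-up data).
PAGES = {
    "/": "<h1>Home</h1>",
    "/news/2004/06/meeting": "<h1>June meeting</h1>",
}


def app(environ, start_response):
    """WSGI app: the URL path is the key, so no ?page=123 in the URL."""
    path = environ.get("PATH_INFO", "/") or "/"
    # Treat "/foo/" and "/foo" as the same page; "/" stays "/".
    key = path.rstrip("/") or "/"
    body = PAGES.get(key)
    if body is None:
        start_response("404 Not Found",
                       [("Content-Type", "text/plain; charset=utf-8")])
        return [b"Not found\n"]
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body.encode("utf-8")]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app)`
followed by `.serve_forever()` on the result will serve it on port 8000.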

Cameron.