[plug] .xsession-errors
Craig Ringer
craig at postnewspapers.com.au
Thu Dec 2 12:08:51 WST 2004
On Thu, 2004-12-02 at 11:36, Matt Kemner wrote:
> Handy tip: You can truncate files with the > operand
>
> ie bash:~# > .xsession-errors
>
> This will overwrite the file with a 0 length one, instantly saving you the
> space.
Even more interestingly, programs that still have it open will be able
to keep writing to the file fine at whatever position they had reached
before you truncated it (unless they opened it in append mode, in which
case each write goes to the new end). The gap will be a hole in the
file, so your .xsession-errors will be a sparse file.
Generally that's no big deal, but can get somewhat confusing when using
text processing tools. A sparse file has a 'size' that's larger than the
actual disk space used:
[craig at rasputin craig]$ dd if=/dev/zero bs=1M seek=1000 of=sp count=1
1+0 records in
1+0 records out
[craig at rasputin craig]$ du -h sp
1.1M sp
[craig at rasputin craig]$ ls -l sp
-rw-rw-r-- 1 craig craig 1049624576 Dec 2 11:44 sp
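If you'd rather check the same thing from a script, here's a rough
Python 3 sketch of the dd/du/ls comparison above. The helper names
(make_sparse, sizes) are made up for illustration, and it assumes a
Unix-like system where os.stat() reports st_blocks in 512-byte units.
Whether the hole really costs no disk space depends on the filesystem.

```python
import os
import tempfile


def make_sparse(path, hole_bytes, payload=b"x" * 4096):
    """Seek past a hole, then write a little real data (like dd seek=)."""
    with open(path, "wb") as f:
        f.seek(hole_bytes)
        f.write(payload)


def sizes(path):
    """Return (apparent size, allocated bytes) for path."""
    st = os.stat(path)
    # st_size is what 'ls -l' shows; st_blocks * 512 is roughly what du counts
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "sp")
        make_sparse(p, 10 * 1024 * 1024)
        apparent, allocated = sizes(p)
        print("apparent:", apparent, "allocated:", allocated)
```

On a filesystem that supports holes, the allocated figure should come
out far smaller than the apparent one.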
Here's a Python example that shows how a file can continue writing at
the old position even after a file is truncated, and how the disk space
still gets saved. While all this happens in one process, it's just as
easy for another process (say, 'echo') to truncate the file. This is
pasted from an interactive session, though I've added comments after the
fact:
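# First, open our pretend log for writing
>>> import os
>>> writer = file("log", "w")
# and write a bunch of dummy data to it
>>> writer.writelines( ["."*79+"\n"]*1000 )
# then make sure the buffer is written out to disk: flush() empties
# Python's buffer, os.fsync() asks the kernel to commit it
>>> writer.flush()
>>> os.fsync(writer.fileno())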
# and show the position we're at in the file. The next write()
# call will add the data after this position in the file.
>>> writer.tell()
80000L
# Truncate the file to zero length
# note that if we did 'retval = os.system("> log")'
# it'd have the same effect. It doesn't matter what process
# truncates the file.
>>> writer.truncate(0)
# Note that we're still at the same position, even though the file
# is now zero bytes long.
>>> writer.tell()
80000L
# ... and takes up no disk space
>>> retval = os.system("du -h log")
0 log
# Now, append some text at the old position. Note that there's
# literally 'nothing' between position zero in the file and
# position 80,000
>>> writer.write("this comes after position 80,000 in the file\n")
>>> writer.flush()
# So, what happened?
# The file's length is now the old position plus the amount of data
# we wrote (45 characters):
>>> retval = os.system("ls -l log")
-rw-rw-r-- 1 craig craig 80045 Dec 2 11:50 log
# and at the place we said:
>>> writer.tell()
80045L
# so what the heck is in between position zero and 80,000? A hole.
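To see what the hole reads back as, here's a small Python 3 sketch of
the same experiment. It's not from the original session: the function
name is hypothetical, and a second open() with mode "wb" stands in for
another process running '> log'. The writer is unbuffered so each
write() goes straight to the file descriptor.

```python
import os
import tempfile


def truncate_under_writer(path):
    """Write, truncate through a second handle, write again.

    Returns (total length, bytes before offset 80000, bytes after it).
    """
    line = b"." * 79 + b"\n"
    tail = b"this comes after position 80,000 in the file\n"
    with open(path, "wb", buffering=0) as writer:
        writer.write(line * 1000)      # writer's offset is now 80000
        with open(path, "wb"):         # opening with "wb" truncates to zero
            pass
        writer.write(tail)             # lands at the old offset, 80000
    with open(path, "rb") as reader:
        data = reader.read()
    return len(data), data[:80000], data[80000:]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        size, head, tail = truncate_under_writer(os.path.join(d, "log"))
        print(size, head == b"\0" * 80000, tail)
```

Everything before position 80,000 reads back as NUL bytes, which is what
tends to confuse text processing tools.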
This is a filesystem trick on *NIX that I really like. Files can be
"sparse" - that is, they can have holes in them that are treated as
giant runs of zeros, and aren't actually stored on disk. Space for the
data is allocated as real data is written. This means that you can, for
example, have four 40GB disk images for an OS emulator on a 20GB disk,
and only use the amount of space _actually_ used in those disk images.
(It's not that simple for OS disk images, but close enough for now).
It also means that you can truncate a log file while programs have it
open, free up the disk space, but still have them writing happily to the
file as if nothing had happened. Some programs are smart enough to
notice you've truncated the file and seek to the new end position, but
many aren't - and they don't have to care.
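One case where the program doesn't have to notice anything: if it opened
the file in append mode (O_APPEND), the kernel puts every write at the
current end of the file, so no hole appears after a truncation. This is
my own illustration rather than part of the session above, and the
function name is made up. A Python 3 sketch:

```python
import os
import tempfile


def append_after_truncate(path):
    """Write in append mode, truncate via another handle, write again."""
    with open(path, "ab", buffering=0) as writer:
        writer.write(b"." * 80000)
        with open(path, "wb"):         # stand-in for '> log' elsewhere
            pass
        # O_APPEND makes this write start at the new end, i.e. offset 0
        writer.write(b"after truncation\n")
    return os.path.getsize(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(append_after_truncate(os.path.join(d, "log")))
```

The resulting file holds only the second write, with no run of zeros in
front of it.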
In case it ever matters, note that this will NOT work if the program
uses mmap() to access the file. It's the truncation rather than the hole
that bites: touching a mapped page that now lies entirely beyond the
end of the file gets the process a SIGBUS.
This is also a reason why you should not rely on 'ls -l' to tell you
the truth about the disk space a file takes up, and use du (or 'ls -s',
which also reports allocated blocks) instead.
Well, I guess I've complicated a nice, simple process enough for one day
;-)
Even if you get nothing else out of this, remember that you can create a
sparse file with dd using the 'seek=' argument, so if you need a 20GB
file, you don't necessarily have to wait while all 20GB is copied from
/dev/zero and written to disk.
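The same trick works without dd. As a hedged sketch (Python 3, with a
hypothetical helper name), extending an empty file with truncate() sets
its length without writing any data. On most Unix filesystems the new
area is a hole that reads as zeros, though that is up to the platform.

```python
import os
import tempfile


def make_hole_only(path, size):
    """Create a file of the given length without writing any data."""
    with open(path, "wb") as f:
        f.truncate(size)   # extends the empty file to 'size' bytes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "big")
        make_hole_only(p, 64 * 1024 * 1024)
        st = os.stat(p)
        # st_blocks is in 512-byte units on Linux and most Unix systems
        print("apparent:", st.st_size, "allocated:", st.st_blocks * 512)
```

Unlike dd with count=1, this leaves no real data at the end of the file
at all.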
--
Craig Ringer