[plug] xfs errors?

Chris Hoy Poy chris at hoypoy.id.au
Sat Jul 16 20:20:18 AWST 2022


Yeah, xfs_repair will do things like try to hold an entire tree for the disk
in RAM. It's sometimes impossible to run xfs_repair without a bigger disk to
soak up the swap overflow. (Been there, fsck'd that).
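
For anyone who hits the same wall, a rough sketch of the workaround - paths,
sizes and the device name are assumptions, not anything from this thread:

    # add a temporary swap file on a fast disk
    dd if=/dev/zero of=/mnt/nvme/repair.swap bs=1M count=65536
    chmod 600 /mnt/nvme/repair.swap
    mkswap /mnt/nvme/repair.swap
    swapon /mnt/nvme/repair.swap

    # and/or cap xfs_repair's memory usage (the -m value is in megabytes)
    xfs_repair -m 57344 /dev/sdX1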

If it's not making bad sounds, and you can physically swap the drives out,
it's worth a deep dive into your journalling options - but it's probably
just truncated a bunch of small files after a few bad unmounts / unsynced
disconnects.

It *will* truncate files in its default config on most kernels, but it
knows it has done so, so it's often not a big deal.
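
If you want to see how the journal is actually set up before diving in,
something like this works (the /backup mount point is just a placeholder):

    # show filesystem geometry, including the log (journal) section
    xfs_info /backup

    # and the mount options currently in effect
    grep /backup /proc/mounts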

I can't remember if it will try to reuse those sectors if it gets crammed
for space; I suspect it won't until they've been verified clear by an
xfs_repair.

But if it's a big drive with a lot of files, it's gonna be a little while.
The algorithm is likely single threaded and aiming for correctness rather
than throughput efficiency.
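
For what it's worth, newer xfs_repair can be pushed to work on allocation
groups in parallel; a rough sketch, with the stride value and device name
being guesses rather than a recommendation:

    # process allocation groups with multiple threads
    xfs_repair -o ag_stride=8 /dev/sdX1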

/Chris

On Sat, 16 July 2022, 8:10 pm Brad Campbell, <brad at fnarfbargle.com> wrote:

> On 16/7/22 20:03, Chris Hoy Poy wrote:
> > Yeah.
> >
> > XFS dumps a lot more detail out about this stuff. I've had good luck
> recovering files from xfs when it hits this point.
> >
> > The bad sounds are the worrying indicator; nothing good ever comes of
> that.
>
> Oh, when I said scary-sounding it was in reference to the xfs_repair
> output. The drive is physically fine and passes a SMART long once a week.
> Every drive in every system I have/maintain gets at least a full media
> check weekly. All RAID gets a full scrub monthly. As they say, once caught.
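>
> A minimal sketch of that sort of routine, assuming Linux md RAID and
> hypothetical device names:
>
>     # weekly: SMART extended (long) self-test on each drive
>     smartctl -t long /dev/sda
>     # monthly: full read scrub of an md RAID array
>     echo check > /sys/block/md0/md/sync_action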
>
> I'm more concerned about these errors xfs_repair is being vocal about. I've
> had to bring extra swap online now, as it's eaten all 64G of physical RAM
> and is now >20G into the swap.
> Thankfully I had some spare space on a reasonably quick nvme because it's
> hitting that hard.
>
> Now I understand why xfs_repair died with a segfault when trying to run it
> on a 4G Raspberry Pi.
>
>
> >
> > If you haven't been running regular scrubs, and the volume is not full
> - then some bad sectors have turned up on old remnants, lucky you.
> Sometimes it's hardware, sometimes it's software doing dumb things or disks
> being disconnected at the wrong time. XFS is a journalling system, but
> often it's only journalling metadata, not full data. That's generally
> enough? "It depends".
> >
> > A regular read scrub is never terrible, as disk sectors will die
> silently until you need them.
> >
> > I've also had a few experiences where xfs drives have dropped a bunch of
> bad sectors, which the drive has remapped, and xfs_repair fixed the issues
> and the drive has been fine for years.
> >
> > Would I trust the drive with critical data? No. Redundancy is your
> friend.
> >
> > XFS and ext4 are two of the most well tested and utilised file
> systems on the kernel.org infra, but spurious hardware problems are not
> unknown and are sometimes meaningless. Doesn't mean you can trust the
> drive :-) (ugh, drives. So untrustworthy to start with).
> >
> > /Chris
> >
> >
> > On Sat, 16 July 2022, 7:40 pm Brad Campbell, <brad at fnarfbargle.com> wrote:
> >
> >     G'day All,
> >
> >     Back in 2020 I did a bit of a shootout between ext4 and xfs for an
> rsync rotating backup repository.
> >     Hedging bets, I ended up with one 4TB drive of each, and they've
> been doing nightly backups since ~Feb 2020.
> >
> >     Let me be clear here: *I'm not having issues with either.*
> >
> >     As in, the backups work, all files appear coherent, I've had no
> reports of problems from the kernel and frankly it all looks good.
> >
> >     Last night I unmounted both drives and ran e2fsck and xfs_repair
> respectively just as a "Let's see how it's all doing".
> >
> >     e2fsck ran to completion without an issue. xfs_repair has been
> spitting out errors constantly for about the last 18 hours.
> >
> >     Fun stuff like:
> >
> >     entry at block 214 offset 176 in directory inode 1292331586 has illegal name "/606316974.14676_0.srv:2,a"
> >     entry at block 214 offset 216 in directory inode 1292331586 has illegal name "/606318637.23354_0.srv:2,a"
> >     entry at block 214 offset 256 in directory inode 1292331586 has illegal name "/606318639.23364_0.srv:2,a"
> >     entry at block 214 offset 296 in directory inode 1292331586 has illegal name "/606318640.23369_0.srv:2,a"
> >     entry at block 214 offset 336 in directory inode 1292331586 has illegal name "/606318646.23391_0.srv:2,a"
> >     entry at block 214 offset 376 in directory inode 1292331586 has illegal name "/606319148.26097_0.srv:2,a"
> >     entry at block 214 offset 416 in directory inode 1292331586 has illegal name "/606319150.26107_0.srv:2,a"
> >     entry at block 214 offset 456 in directory inode 1292331586 has illegal name "/606319152.26158_0.srv:2,a"
> >     entry at block 3 offset 3816 in directory inode 1292331587 has illegal name "/606350201.7742_1.srv:2,Sa"
> >     entry at block 3 offset 3856 in directory inode 1292331587 has illegal name "/606369099.14439_1.srv:2,Sa"
> >     imap claims a free inode 1292346502 is in use, correcting imap and clearing inode
> >     cleared inode 1292346502
> >     imap claims a free inode 1292439884 is in use, correcting imap and clearing inode
> >     cleared inode 1292439884
> >     imap claims a free inode 1292442224 is in use, correcting imap and clearing inode
> >     cleared inode 1292442224
> >
> >     It started with a continuous whine about inodes with bad magic and
> lots of scary sounding stuff during stage 3 and has settled down to this in
> stage 4.
> >
> >     From the file names I'm seeing, I suspect they're deleted files and
> directories. As you'd imagine, 2 and a half years of rotating backups sees
> lots of stuff added, linked and deleted.
> >
> >     I can stop xfs_repair, mount and check the filesystem contents. It
> all looks good. When I unmount and re-run xfs_repair it pretty much picks
> up where it left off. I've had to add an extra 32G of ram in the machine
> and even then I've had to limit xfs_repair to ~58G because it was using all
> 64G of ram and heading towards 20G of swap.
> >
> >     I'm new at xfs. Generally when e2fsck reports anything like this the
> filesystem is toast. In this case I can't find anything missing or
> corrupt, but xfs_repair is going bonkers.
> >
> >     This is an xfs V4 filesystem, and I've upgraded to xfsprogs 5.18,
> but it's all the same really.
> >
> >     I've made an emergency second backup of the systems this drive was
> backing up in case it all goes south but despite the spew of errors the
> actual filesystem looks perfectly fine. Has anyone seen anything similar?
> >
> >     Regards,
> >     Brad
> >
>
>