[plug] xfs errors?

Brad Campbell brad at fnarfbargle.com
Sat Jul 16 20:09:39 AWST 2022


On 16/7/22 20:03, Chris Hoy Poy wrote:
> Yeah.
> 
> XFS dumps a lot more detail out about this stuff. I've had good luck recovering files from xfs when it hits this point. 
> 
> The bad sounds are the worrying indicator, nothing good ever comes of that.

Oh, when I said scary sounding it was in reference to the xfs_repair output. The drive is physically fine and passes a SMART long once a week. Every drive in every system I have/maintain gets at least a full media check weekly. All RAID gets a full scrub monthly. As they say, once caught.

I'm more concerned about these errors xfs_repair is being vocal about. I've had to bring extra swap online now, as it's eaten all 64G of physical RAM and is now >20G into the swap.
Thankfully I had some spare space on a reasonably quick nvme because it's hitting that hard.
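
For anyone wanting to get a handle on the volume of errors rather than watch it scroll past, here is a rough sketch that tallies the message types and counts illegal-name entries per directory inode. It assumes the xfs_repair output has been captured to a file first. The regexes only cover the three message forms quoted further down in this thread, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Patterns taken from the xfs_repair lines quoted later in this thread.
# Anything else non-blank is lumped together as "other".
ILLEGAL_NAME = re.compile(r"in directory inode (\d+) has illegal name")
IMAP_FREE = re.compile(r"imap claims a free inode (\d+) is in use")
CLEARED = re.compile(r"cleared inode (\d+)")


def summarise(lines):
    """Return (message-type counts, illegal-name counts per directory inode)."""
    kinds = Counter()
    per_dir = Counter()
    for line in lines:
        # findall copes with several entries run together on one line.
        names = ILLEGAL_NAME.findall(line)
        if names:
            kinds["illegal name"] += len(names)
            per_dir.update(names)
        if IMAP_FREE.search(line):
            kinds["imap free inode in use"] += 1
        elif CLEARED.search(line):
            kinds["cleared inode"] += 1
        elif not names and line.strip():
            kinds["other"] += 1
    return kinds, per_dir


# A few lines copied from the output quoted in this thread.
sample = [
    'entry at block 214 offset 456 in directory inode 1292331586 has illegal name "/606319152.26158_0.srv:2,a": ',
    'entry at block 3 offset 3816 in directory inode 1292331587 has illegal name "/606350201.7742_1.srv:2,Sa": ',
    'entry at block 3 offset 3856 in directory inode 1292331587 has illegal name "/606369099.14439_1.srv:2,Sa": ',
    "imap claims a free inode 1292346502 is in use, correcting imap and clearing inode",
    "cleared inode 1292346502",
]

kinds, per_dir = summarise(sample)
for kind, n in kinds.most_common():
    print(f"{n:8d}  {kind}")
for inode, n in per_dir.most_common(10):
    print(f"{n:8d}  illegal names in directory inode {inode}")
```

On a real run the list would be replaced by the captured file, e.g. summarise(open("repair.log", errors="replace")), where repair.log is a hypothetical file name.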

Now I understand why xfs_repair died with a segfault when trying to run it on a 4G Raspberry Pi.
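
On the memory side, xfs_repair has a -m option that sets an approximate memory ceiling in megabytes, and -n for a no-modify check. The sketch below only builds and prints the command line so it can be inspected before running. The helper name and /dev/sdX1 are placeholders, and this is not necessarily how the limit mentioned in the original post was applied.

```python
import shlex


def xfs_repair_cmd(device, max_mem_gib=None, dry_run=True):
    """Build an xfs_repair argument list (hypothetical helper, runs nothing)."""
    cmd = ["xfs_repair"]
    if dry_run:
        cmd.append("-n")  # no-modify mode: report problems only
    if max_mem_gib is not None:
        # -m takes the approximate maximum memory in megabytes
        cmd += ["-m", str(int(max_mem_gib * 1024))]
    cmd.append(device)
    return cmd


# /dev/sdX1 is a placeholder; substitute the unmounted XFS device.
print(shlex.join(xfs_repair_cmd("/dev/sdX1", max_mem_gib=58)))
```

The returned list can be handed to subprocess.run() as-is once the device name is right. Whether a small ceiling is enough for a filesystem with this many inodes is a separate question.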


> 
> If you haven't been running regular scrubs, and the volume is not full, then some bad sectors have turned up on old remnants, lucky you. Sometimes it's hardware, sometimes it's software doing dumb things or disks being disconnected at the wrong time. XFS is a journalling system, but often it's only journalling metadata, not full data. That's generally enough? "It depends".
> 
> A regular read scrub is never terrible, as disk sectors will die silently until you need them.
> 
> I've also had a few experiences where xfs drives have dropped a bunch of bad sectors, which the drive has remapped, and xfs_repair fixed the issues and the drive has been fine for years.
> 
> Would I trust the drive with critical data? No. Redundancy is your friend. 
> 
> XFS and ext4 are two of the most well tested and utilised file systems on the kernel.org <http://kernel.org> infra, but spurious hardware problems are not unknown and sometimes meaningless. Doesn't mean you can trust the drive :-) (ugh drives. So untrustworthy to start with).
> 
> /Chris
> 
> 
> On Sat, 16 July 2022, 7:40 pm Brad Campbell, <brad at fnarfbargle.com> wrote:
> 
>     G'day All,
> 
>     Back in 2020 I did a bit of a shootout between ext4 and xfs for an rsync rotating backup repository.
>     Hedging bets I ended up with one 4TB drive with each and they've been doing nightly backups since ~Feb 2020.
> 
>     Let me be clear here  : * I'm not having issues with either. *
> 
>     As in, the backups work, all files appear coherent, I've had no reports of problems from the kernel and frankly it all looks good.
> 
>     Last night I unmounted both drives and ran e2fsck and xfs_repair respectively just as a "Let's see how it's all doing".
> 
>     e2fsck ran to completion without an issue. xfs_repair has been spitting out errors constantly for about the last 18 hours.
> 
>     Fun stuff like:
>     entry at block 214 offset 176 in directory inode 1292331586 has illegal name "/606316974.14676_0.srv:2,a":
>     entry at block 214 offset 216 in directory inode 1292331586 has illegal name "/606318637.23354_0.srv:2,a":
>     entry at block 214 offset 256 in directory inode 1292331586 has illegal name "/606318639.23364_0.srv:2,a":
>     entry at block 214 offset 296 in directory inode 1292331586 has illegal name "/606318640.23369_0.srv:2,a":
>     entry at block 214 offset 336 in directory inode 1292331586 has illegal name "/606318646.23391_0.srv:2,a":
>     entry at block 214 offset 376 in directory inode 1292331586 has illegal name "/606319148.26097_0.srv:2,a":
>     entry at block 214 offset 416 in directory inode 1292331586 has illegal name "/606319150.26107_0.srv:2,a":
>     entry at block 214 offset 456 in directory inode 1292331586 has illegal name "/606319152.26158_0.srv:2,a":
>     entry at block 3 offset 3816 in directory inode 1292331587 has illegal name "/606350201.7742_1.srv:2,Sa":
>     entry at block 3 offset 3856 in directory inode 1292331587 has illegal name "/606369099.14439_1.srv:2,Sa":
>     imap claims a free inode 1292346502 is in use, correcting imap and clearing inode
>     cleared inode 1292346502
>     imap claims a free inode 1292439884 is in use, correcting imap and clearing inode
>     cleared inode 1292439884
>     imap claims a free inode 1292442224 is in use, correcting imap and clearing inode
>     cleared inode 1292442224
> 
>     It started with a continuous whine about inodes with bad magic and lots of scary sounding stuff during stage 3 and has settled down to this in stage 4.
> 
>     From the file names I'm seeing, I suspect they're deleted files and directories. As you'd imagine, 2 and a half years of rotating backups sees lots of stuff added, linked and deleted.
> 
>     I can stop xfs_repair, mount and check the filesystem contents. It all looks good. When I unmount and re-run xfs_repair it pretty much picks up where it left off. I've had to add an extra 32G of ram in the machine and even then I've had to limit xfs_repair to ~58G because it was using all 64G of ram and heading towards 20G of swap.
> 
>     I'm new at xfs. Generally when e2fsck reports anything like this the filesystem is toast. In this case I can't find anything missing or corrupt, but xfs_repair is going bonkers.
> 
>     This is an xfs V4 filesystem, and I've upgraded to xfsprogs 5.18, but it's all the same really.
> 
>     I've made an emergency second backup of the systems this drive was backing up in case it all goes south but despite the spew of errors the actual filesystem looks perfectly fine. Has anyone seen anything similar?
> 
>     Regards,
>     Brad
>     _______________________________________________
>     PLUG discussion list: plug at plug.org.au <mailto:plug at plug.org.au>
>     http://lists.plug.org.au/mailman/listinfo/plug <http://lists.plug.org.au/mailman/listinfo/plug>
>     Committee e-mail: committee at plug.org.au <mailto:committee at plug.org.au>
>     PLUG Membership: http://www.plug.org.au/membership <http://www.plug.org.au/membership>
> 


