Debian with Automated Snapper Rollbacks is a short tutorial about setting up a Debian Linux system with automated BTRFS snapshots of the system and easy rollback to previous auto-generated snapshots. Once it's set up, it automatically takes pre/post snapshots when you run `apt`, and you can boot them from GRUB.
Monday, July 10. 2023
Debian Apt Btrfs Auto-Snapshot Retrofit
Sunday, September 15. 2019
linux: serious corruption issue with btrfs
From Debian Bug report logs - #940105:
There were some reports over the last weeks from users on linux-btrfs which suffered from catastrophic btrfs corruption.
The bug, which is apparently a regression introduced in 5.2, has now been found[0] and a patch is available[1].
Since it's unclear how long it will take to be part of a stable release and when Debian will pick this up in unstable, please consider to cherry-pick the patch.
- [0] lore.kernel.org
- [1] patchwork.kernel.org
Sunday, September 24. 2017
BTRFS on Debian
Debian has a BTRFS Wiki. One item there, which affected me, is that kernel 4.11 has issues and will cause corruption. I am now on kernel 4.12. I'm not sure if having duplicated metadata would have prevented some of the pain of recovery. To see if metadata is redundant:
# btrfs fi df /
Data, single: total=14.00GiB, used=12.63GiB
System, single: total=32.00MiB, used=16.00KiB
Metadata, single: total=520.00MiB, used=317.27MiB
GlobalReserve, single: total=31.22MiB, used=0.00B

This is on a laptop with a single SSD. It has been written elsewhere that, even if metadata duplication is requested, the SSD may deduplicate it anyway.
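To run the same check from a script, something like the following might do. It is only a sketch: the helper names metadata_profile and metadata_is_redundant are my own (not part of btrfs-progs), and the parsing assumes the "Metadata, <profile>: ..." line format shown in the output above.

```shell
#!/bin/sh
# Hypothetical helpers; they read `btrfs filesystem df` output on stdin.

# Print the profile (single, DUP, RAID1, ...) found on the Metadata line.
metadata_profile() {
    awk -F'[ ,:]+' '$1 == "Metadata" { print $2; exit }'
}

# Succeed only when the metadata profile keeps more than one copy
# (or parity). single and RAID0 do not; an empty result means the
# Metadata line was not found, which is treated as "not redundant".
metadata_is_redundant() {
    case "$(metadata_profile)" in
        ''|single|RAID0|raid0) return 1 ;;
        *) return 0 ;;
    esac
}

# Usage (needs root and a mounted btrfs file system):
#   btrfs filesystem df / | metadata_is_redundant || echo "metadata is not redundant"
```

Keep in mind that this only reports the profile btrfs was asked to use; it cannot tell whether the SSD deduplicates the copies internally.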
So... regular maintenance and scanning is recommended.
For maintenance, the wiki article suggests regular defragmentation (the -t 32M option is not needed since Debian 9 (Stretch)). Replace /path/to/target with the directory or mount point to defragment:
sudo ionice -c idle btrfs filesystem defragment -f -t 32M -r /path/to/target
The -f parameter is recommended for flushing after each file, particularly when there are snapshots or reflinked files.
One way to find btrfs formatted file systems:
# grep btrfs /etc/fstab
UUID=b5714bf3-eec4-431d-8e3e-6b062f7e5c55 / btrfs noatime,nodiratime 0 0
UUID=affc8ed9-c1c0-403d-8ba1-b8ca68d2d7d7 /var btrfs noatime,nodiratime 0 0
UUID=b662aa71-5b72-4028-a10a-e286c56b87cf /home/rpb btrfs noatime,nodiratime 0 0
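A plain grep also matches commented-out entries and any line that merely mentions btrfs. As a slightly stricter sketch, the function below (btrfs_mounts is a name I made up) prints only the mount points whose file system type column is btrfs. It assumes the usual whitespace-separated fstab layout.

```shell
#!/bin/sh
# Hypothetical helper: list mount points of btrfs entries in an
# fstab-format file. $1 is the file to read; it defaults to /etc/fstab.
btrfs_mounts() {
    # Field 1 is the device, field 2 the mount point, field 3 the type.
    # Lines whose first field starts with '#' are comments and are skipped.
    awk '$1 !~ /^#/ && $3 == "btrfs" { print $2 }' "${1:-/etc/fstab}"
}

# Usage:
#   btrfs_mounts              # reads /etc/fstab
#   btrfs_mounts /tmp/fstab   # reads another file
```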
To check for errors:
# btrfs dev stats /home
[/dev/nvme0n1p2].write_io_errs 0
[/dev/nvme0n1p2].read_io_errs 0
[/dev/nvme0n1p2].flush_io_errs 0
[/dev/nvme0n1p2].corruption_errs 0
[/dev/nvme0n1p2].generation_errs 0
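For a cron job or a monitoring check, it may be handier to print only the counters that are not zero. The sketch below assumes the "name count" line format of the dev stats output shown here; check_dev_stats is my own name for the helper.

```shell
#!/bin/sh
# Hypothetical helper: read `btrfs dev stats` output on stdin, print every
# counter with a nonzero value, and return 1 if any was found, else 0.
check_dev_stats() {
    awk '$2 + 0 != 0 { print; bad = 1 } END { exit (bad ? 1 : 0) }'
}

# Usage (needs root and a mounted btrfs file system):
#   btrfs dev stats /home | check_dev_stats || echo "btrfs reported device errors"
```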
To manually initiate an online scrub and monitor status:
# btrfs scrub start /mnt
scrub started on /mnt, fsid ab27f528-d417-4ff9-9eb4-b59ad940290f (pid=14535)

# btrfs scrub status /mnt
scrub status for ab27f528-d417-4ff9-9eb4-b59ad940290f
scrub started at Sun Sep 24 19:55:56 2017, running for 00:00:10
total bytes scrubbed: 2.08GiB with 0 errors
A scrub with detailed results running in foreground:
# btrfs scrub start -B -d -R /
scrub device /dev/nvme0n1p2 (id 1) done
scrub started at Sun Oct 8 12:16:54 2017 and finished after 00:00:05
data_extents_scrubbed: 373524
tree_extents_scrubbed: 20306
data_bytes_scrubbed: 13566894080
tree_bytes_scrubbed: 332693504
read_errors: 0
csum_errors: 0
verify_errors: 0
no_csum: 25579
csum_discards: 0
super_errors: 0
malloc_errors: 0
uncorrectable_errors: 0
unverified_errors: 0
corrected_errors: 0
last_physical: 15590227968
Useful BTRFS pages:
- archlinux with an entry on doing a btrfs scrub using a timer service
- Marc's Public Blog - Linux Btrfs Blog Posts: with some entries about mounting a system with errors and bypassing checksum problems.
- Working with btrfs and common troubleshooting by the Container Linux people.
Monday, April 6. 2015
BTRFS Troubleshooting
I have had a couple of instances where the user interface (KDE) of my Linux workstation, which is based upon Debian Testing / Jessie, has become non-responsive. Yet I was still able to SSH into the machine, where I saw that systemd, some IRQ processes, and VirtualBox had high utilization. Both or all three times (I can't remember the count now), the issue occurred while debugging a program I've been writing which uses OpenGL. At the same time, I had VirtualBox running a Windows 10 guest. So there were many things running, any of which might cause issues. It was probably OpenGL related, but I have not yet come up with a way of proving this one way or another.
I am also running BTRFS on the machine. In looking at general BTRFS and NFS configurations, I saw the mailing list article at: BTRFS hangs - possibly NFS related?. In that article, a couple of troubleshooting commands are shown. They use sysrq flags. I will have to examine them if/when my issue re-asserts itself:
echo 1 > /proc/sys/kernel/sysrq
echo w > /proc/sysrq-trigger
dmesg
The results may show 'SysRq : Show Blocked State' entries. These will be places to further examine for issues.
In the same article, some other things to think about:
- With the right tools CPU/load can be categorized into several areas, low-priority/niced, normal, kernel, IRQ, soft-IRQ, IO-wait, steal, guest, although steal and guest are VM related (steal is CPU taken by the hypervisor or another guest if measured from within a guest, and thus not available to it, guest is of course guests, when measured from the hypervisor) and will be zero if you're not running them, and irq and soft-irq won't show much either in the normal case. And of course niced doesn't show either unless you're running something niced.
- Or simply use the alt-srq-w combo if you're on x86 and have it available (there's more about magic-srq in the kernel's Documentation/sysrq.txt file).
- If you don't have a tool that shows all that, one available tool that does is htop. It's a "better" top, ncurses/semi-gui-based so run it in a terminal window or text-login VT.
- Of course you can see which threads are using all that CPU-time "load" that isn't, while you're at it.
- Also check out iotop, to see what processes are actually doing IO and the total IO speed. Both these tools have manpages...
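The CPU categories mentioned in the list above are the same ones the kernel exposes on the first line of /proc/stat. As a rough sketch for when htop is not installed, the function below prints each category as a share of the total. The name cpu_breakdown is my own, and the numbers are cumulative since boot, not a live view like htop gives.

```shell
#!/bin/sh
# Hypothetical helper: break the aggregate "cpu" line of a /proc/stat
# format file into per-category percentages.
# $1 is the file to read; it defaults to /proc/stat.
cpu_breakdown() {
    awk '$1 == "cpu" {
        # Field order as documented in proc(5).
        n = split("user nice system idle iowait irq softirq steal guest guest_nice", name, " ")
        # Sum user..steal only: guest time is already counted in user,
        # and guest_nice in nice.
        total = 0
        for (i = 2; i <= 9 && i <= NF; i++) total += $i
        for (i = 2; i <= NF && i - 1 <= n; i++)
            printf "%-10s %5.1f%%\n", name[i - 1], (total ? 100 * $i / total : 0)
        exit
    }' "${1:-/proc/stat}"
}

# Usage:
#   cpu_breakdown
```

A large iowait share would point toward storage (and so possibly BTRFS), while large irq or softirq shares would point toward drivers.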
A workaround for the original poster's problem was to use:
btrfs filesystem sync /mnt/btrfs
He even went so far as to put that into the crontab and run it once a minute.
Some BTRFS documentation and help links:
- Oracle Tech Network Article: How I Use the Advanced Capabilities of Btrfs by Margaret Bierman with Lenz Grimmer