I have had a couple of instances where the user interface (KDE) of my Linux workstation, which is based on Debian Testing (Jessie), became unresponsive. Yet I was still able to SSH into the machine. From there I saw that systemd, some IRQ processes, and VirtualBox had high utilization. On each occasion (two or three, I can't remember the count now), the issue occurred while I was debugging a program I've been writing which uses OpenGL. At the same time, I had VirtualBox running a Windows 10 guest. So there were many things running, any of which might cause issues. It was probably OpenGL related, but I have not yet come up with a way of proving this one way or the other.
I am also running BTRFS on the machine. While looking into general BTRFS and NFS configurations, I came across the mailing list article "BTRFS hangs - possibly NFS related?". That article shows a couple of troubleshooting commands that use the kernel's SysRq interface: the first enables the SysRq functions, the second asks the kernel to dump blocked tasks to its log, and the third reads the log. I will have to try them if/when my issue reasserts itself:
echo 1 > /proc/sys/kernel/sysrq
echo w > /proc/sysrq-trigger
dmesg
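The dmesg output may then contain 'SysRq : Show Blocked State' entries, followed by the tasks that are stuck in an uninterruptible (blocked) state. Those tasks and their stack traces are the places to examine further.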
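To make those entries easier to find next time, a small filter along these lines might help. This is only a sketch: the function name show_blocked is my own invention, and I am assuming the header line reads either 'SysRq : Show Blocked State' or 'sysrq: Show Blocked State' depending on the kernel version. It reads a kernel log on stdin and prints everything from the most recent such header onward.

```shell
# show_blocked (hypothetical helper): read kernel log text on stdin and
# print it from the LAST "Show Blocked State" header to the end.
# Assumption: the header is spelled "SysRq : ..." or "sysrq: ...".
show_blocked() {
    awk 'tolower($0) ~ /sysrq ?: show blocked state/ { buf = ""; found = 1 }
         found { buf = buf $0 "\n" }
         END { printf "%s", buf }'
}

# Intended use, as root, after the echo commands above:
#   dmesg | show_blocked
```

Keeping only the last dump avoids wading through older ones if the trigger was used several times since boot.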
The same article offers some other things to think about:
- With the right tools, CPU load can be broken down into several categories: low-priority/niced, normal, kernel, IRQ, soft-IRQ, IO-wait, steal, and guest. Steal and guest are VM related (steal is CPU time taken by the hypervisor or another guest, as measured from within a guest, and thus not available to it; guest is time spent running guests, as measured from the hypervisor) and will be zero if you're not running VMs. IRQ and soft-IRQ won't show much in the normal case either, and niced shows nothing unless you're running something niced.
- Instead of echoing to /proc/sysrq-trigger, you can simply use the alt-srq-w combo if you're on x86 and have it available. There's more about magic-srq in the kernel's Documentation/sysrq.txt file.
- If you don't have a tool that shows all that, one available tool that does is htop. It's a "better" top, ncurses/semi-gui-based so run it in a terminal window or text-login VT.
- While you're at it, you can of course also see which threads account for the "load", including load that isn't actually CPU time (for example, tasks waiting on IO).
- Also check out iotop, to see what processes are actually doing IO and the total IO speed. Both these tools have manpages...
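If htop isn't installed on the box I've SSH'd into, the same categories can be read straight from the first line of /proc/stat. The following is a rough sketch, not something from the article: the function name cpu_breakdown is made up, and the numbers are cumulative since boot rather than a live view like htop gives. The field order (user, nice, system, idle, iowait, irq, softirq, steal, guest) is the one documented in proc(5).

```shell
# cpu_breakdown (hypothetical helper): print each CPU time category from
# the aggregate "cpu" line as a percentage of total time since boot.
# Argument: file to read (defaults to /proc/stat).
cpu_breakdown() {
    awk '$1 == "cpu" {
        # Total over user..steal; guest time is already counted in user.
        total = 0
        for (i = 2; i <= 9; i++) total += $i
        split("user nice system idle iowait irq softirq steal guest", name, " ")
        for (i = 2; i <= 10; i++)
            printf "%-8s %5.1f%%\n", name[i - 1], (total ? 100 * $i / total : 0)
    }' "${1:-/proc/stat}"
}

# Intended use:
#   cpu_breakdown
```

A high iowait share would point toward the filesystem or disk, while high irq or softirq would fit what I saw with the IRQ processes.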
A workaround for the original poster's problem was to run:
btrfs filesystem sync /mnt/btrfs
He even went so far as to put that into the crontab and run it once a minute.
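I don't know what his crontab entry actually looked like, but a once-a-minute entry could be generated roughly like this. The function name btrfs_sync_cron_line is hypothetical, and /bin/btrfs is only my guess at where the binary lives (cron runs with a minimal PATH, so an absolute path is the safer choice; check with "command -v btrfs").

```shell
# btrfs_sync_cron_line (hypothetical helper): print a crontab line that
# runs "btrfs filesystem sync" on a mount point once a minute.
# Arguments: $1 = mount point, $2 = path to btrfs (assumed /bin/btrfs).
btrfs_sync_cron_line() {
    mount_point=$1
    btrfs_bin=${2:-/bin/btrfs}
    printf '* * * * * %s filesystem sync %s\n' "$btrfs_bin" "$mount_point"
}

btrfs_sync_cron_line /mnt/btrfs
```

The printed line would then be pasted into root's crontab with "crontab -e".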
Some BTRFS documentation and help links:
- Oracle Tech Network Article: How I Use the Advanced Capabilities of Btrfs by Margaret Bierman with Lenz Grimmer