Debugging kernel hard locks with netconsole

Plenty has already been written on this subject, but I thought I would throw in my vote.

Currently I am building a pair of kernel tools. As I have learned, debugging more complex kernel modules could be considered an art of its own. A rather large and fuzzy section of my code is causing a hard lock of my entire test system. Unfortunately, because this is a hard lock (the entire kernel is in a corrupted and unrecoverable state), none of my debugging output makes it to the daemons responsible for displaying it before the whole system grinds to a screeching halt.

After some googling I came to the conclusion that I needed an external logging mechanism. A highly recommended one is to use a null-modem serial cable and tell the test kernel to output all messages to the serial port as well. Long story short, one trip to The Source, about 4 hours of troubleshooting and $30 later, my dev machine was inexplicably still not outputting system logs.

Shortly before giving up for the day, I found several articles about a kernel module called "netconsole". This module basically works very low in the network card driver stack to spew kernel messages out onto the network via UDP. After less than 30 minutes I had this tool working beautifully and had identified the line of code causing the lockup.

There are several tutorials on how to use this, so I won't go too deep into it, but essentially I only had to follow three simple steps.

Simple Procedure (Debian)

1. Set the options for the module (the ports, IPs, etc. that messages are sent from and to on the network) on my test machine. I did this by adding a file in /etc/modprobe.d called netconsole.conf which looked something like this, where <out_eth_port> is the name of the outgoing network interface, such as eth0 (I run Debian testing, by the way):
options netconsole netconsole=<from_port>@<from_ip>/<out_eth_port>,<to_port>@<to_ip>/<to_mac_addr>
2. Load the module. You may also want to set the module to load at boot if you have not already.
> modprobe netconsole
3. Log data from the remote computer. On the machine with the IP/MAC you specified as the "to" location, use netcat to listen on that port and dump out whatever arrives:
> nc -l -u -p <to_port>
At this point you should be getting kernel messages from the other machine!
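
To make the placeholder syntax in step 1 concrete, here is a minimal Python sketch that assembles the options line from its six fields. Every value in the example call (ports, IP addresses, interface name, MAC address) is made up, and the helper name netconsole_options is my own invention rather than part of any tool, so substitute the values for your own machines.

```python
def netconsole_options(from_port, from_ip, out_dev, to_port, to_ip, to_mac):
    """Build the modprobe options line for netconsole.

    Field order follows the line shown in step 1:
    <from_port>@<from_ip>/<out_eth_port>,<to_port>@<to_ip>/<to_mac_addr>
    """
    # Reject port numbers outside the valid UDP range before formatting.
    for name, port in (("from_port", from_port), ("to_port", to_port)):
        if not 0 < port < 65536:
            raise ValueError(f"{name} must be between 1 and 65535, got {port}")
    source = f"{from_port}@{from_ip}/{out_dev}"
    target = f"{to_port}@{to_ip}/{to_mac}"
    return f"options netconsole netconsole={source},{target}"


# Hypothetical example values; none of these come from a real setup.
line = netconsole_options(
    from_port=6665,
    from_ip="192.168.1.10",
    out_dev="eth0",
    to_port=6666,
    to_ip="192.168.1.20",
    to_mac="00:11:22:33:44:55",
)
print(line)
# prints: options netconsole netconsole=6665@192.168.1.10/eth0,6666@192.168.1.20/00:11:22:33:44:55
```

With these made-up values, the printed line is what would go into netconsole.conf on the test machine, and the netcat listener in step 3 would use port 6666.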

Firewall

If you aren't getting any messages, make sure your firewall isn't blocking the port you are using to send the logs.
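
One way to tell a firewall problem apart from a netconsole problem is to send an ordinary UDP datagram along the same path. The Python sketch below is my own assumption of how to do that, not something that ships with netconsole: listen waits for datagrams on the logging machine much like the nc command does, and send_test fires a single datagram from the test machine. The function names, the default message, and the timeout are all hypothetical choices.

```python
import socket


def listen(port, host="0.0.0.0", count=1, timeout=10.0):
    """Receive up to `count` UDP datagrams on `port` and return them as text."""
    received = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        sock.settimeout(timeout)
        try:
            while len(received) < count:
                data, sender = sock.recvfrom(65535)
                text = data.decode("utf-8", errors="replace")
                # Show where each datagram came from, then its contents.
                print(f"{sender[0]}:{sender[1]} {text.rstrip()}")
                received.append(text)
        except socket.timeout:
            print(f"no more datagrams within {timeout} seconds")
    return received


def send_test(to_ip, to_port, message="netconsole path test\n"):
    """Send one UDP datagram to the logging machine."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), (to_ip, to_port))
```

For example, with a made-up logging machine at 192.168.1.20, you would call listen(6666) there and then send_test("192.168.1.20", 6666) on the test machine. If the test message shows up, UDP traffic on that port is getting through, and any remaining problem is more likely in the netconsole options than in the firewall.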

Conclusion

In conclusion, netconsole is very effective at what it does, and it is simpler and cheaper to set up than a serial console. The only downside I can see is if you are trying to debug a network card driver, since netconsole depends on that driver to get its messages out. All in all, super useful.
