Troubleshooting

A Quick and Practical Reference for tcpdump

When it comes to tcpdump, most admins fall into one of two categories: they either know tcpdump and all of its flags like the back of their hand, or they kind of know it but need a reference for anything beyond basic usage. This is because tcpdump is a fairly advanced command, and it is easy to get deep into how networking works when using it.

Using sysdig to Troubleshoot like a boss

If you haven’t seen it yet, there is a new troubleshooting tool out called sysdig. It’s been touted as strace meets tcpdump and, well, it seems to be living up to the hype. I would actually rather compare sysdig to SystemTap meets tcpdump, as it has the command-line syntax of tcpdump but the power of SystemTap. In this article I am going to cover some basic and cool examples for sysdig; for a more complete list, you can look over the sysdig wiki.

Managing DNS locally with /etc/hosts

Before the advent of a distributed domain name system, networked computers used local files to map hostnames to IP addresses. On Unix systems this file was named /etc/hosts, or “the hosts file”. In those days networks were small, and managing a file with a handful of hosts was easy. However, as networks grew, so did the methods of mapping hostnames to IP addresses. Today, with the internet totaling somewhere around 246 million domain names (as of 2012), the hosts file has been replaced with a more scalable, distributed DNS service.
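To make the hostname-to-IP mapping concrete, here is a minimal Python sketch that parses text in the hosts-file layout (an IP address followed by one or more names, with "#" starting a comment). The function name parse_hosts, the sample addresses, and the example.com hostnames are all hypothetical, and letting the first entry for a name win is a simplifying assumption of this sketch rather than a statement about any particular resolver.

```python
def parse_hosts(text):
    """Map each hostname or alias in hosts-file text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        # Everything after '#' is a comment and is ignored.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # First field is the address, the rest are canonical name and aliases.
        ip, *names = line.split()
        for name in names:
            # Assumption of this sketch: the first entry for a name wins.
            mapping.setdefault(name.lower(), ip)
    return mapping


# Hypothetical sample data in /etc/hosts layout.
SAMPLE = """\
127.0.0.1     localhost
# internal hosts
192.168.0.10  db01.example.com db01   # primary database
192.168.0.11  web01.example.com web01
"""

if __name__ == "__main__":
    hosts = parse_hosts(SAMPLE)
    print(hosts["db01"])
```

On a real system the same function could be pointed at the contents of /etc/hosts instead of the sample string.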

EMC PowerPath: superblock could not be read

Recently, while working on a system that uses EMC PowerPath, I ran into a little issue after rebooting.

The Issue:

    fsck.ext3: No such file or directory while trying to open /dev/emcpowera1
    /dev/emcpowera1: The superblock could not be read or does not describe a correct ext2 filesystem.

The Cause: The root cause of this issue is pretty simple. When a Linux system boots, it performs file system checks on the file systems listed in the /etc/fstab file.

Adding and Troubleshooting Static Routes on Red Hat based Linux Distributions

Adding static routes in Linux can be troublesome, but it is also absolutely necessary depending on your network configuration. I call static routes troublesome because they are often the cause of long troubleshooting sessions spent wondering why one server can’t connect to another. This is especially true when dealing with teams that may not fully understand or know the remote server’s IP configuration.

The Default Route: Linux, like any other OS, has a routing table that determines the next hop for every packet.
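As a rough illustration of how a routing table picks the next hop, the Python sketch below applies longest-prefix matching: the most specific route containing the destination wins, and the default route (0.0.0.0/0) catches everything else. The route entries, gateway addresses, and the next_hop function are hypothetical, and the sketch ignores details a real kernel also considers, such as metrics and interfaces.

```python
import ipaddress

# Hypothetical routing table: (destination prefix, gateway).
ROUTES = [
    ("0.0.0.0/0", "192.168.1.1"),      # default route
    ("10.0.0.0/8", "192.168.1.254"),   # static route for a remote network
    ("10.1.2.0/24", "192.168.1.253"),  # more specific static route
]


def next_hop(destination, routes=ROUTES):
    """Return the gateway of the most specific route matching destination."""
    dest = ipaddress.ip_address(destination)
    best = None  # (network, gateway) of the longest matching prefix so far
    for prefix, gateway in routes:
        network = ipaddress.ip_network(prefix)
        if dest in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, gateway)
    return best[1] if best else None


if __name__ == "__main__":
    for addr in ("10.1.2.5", "10.9.9.9", "8.8.8.8"):
        print(addr, "via", next_hop(addr))
```

A missing or wrong static route shows up here as traffic falling through to the default gateway, which is the kind of symptom that tends to start those long troubleshooting sessions.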

Advanced Linux System Statistics and Diagnostics with SystemTap

In one of the first posts on this blog I covered some basic SystemTap functionality, taken from an email that I sent to members of my team, but I have always felt that I haven’t given SystemTap as thorough an article as this incredible tool deserves. Today I want to correct that. In today’s article I will show how to compile SystemTap scripts on one server and run the compiled module on a production server, without installing debug-info or devel packages in production.

Troubleshooting High I/O Wait in Linux

Linux has many tools available for troubleshooting; some are easy to use, and some are more advanced. I/O wait is an issue that requires some of the more advanced tools, as well as advanced usage of some of the basic ones. I/O wait is difficult to troubleshoot because, by default, there are plenty of tools that will tell you your system is I/O bound, but not as many that can narrow the problem down to a specific process or processes.
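For the first half of that problem (confirming the system is I/O bound), here is a small Python sketch that computes the I/O wait share between two samples of the aggregate "cpu" line from /proc/stat. It assumes the usual Linux field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names and the sample counter values are hypothetical. Like the default tools, it says nothing about which process is responsible.

```python
def cpu_times(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of counters."""
    fields = stat_line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # user, nice, system, idle, iowait, irq, softirq, steal
    # (guest counters are skipped because they are already part of user time)
    return [int(value) for value in fields[1:9]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(cpu_times(before), cpu_times(after))]
    total = sum(deltas)
    # Index 4 is the iowait counter.
    return 100.0 * deltas[4] / total if total else 0.0


# Hypothetical samples taken a few seconds apart.
BEFORE = "cpu  100 0 50 800 50 0 0 0 0 0"
AFTER = "cpu  200 0 100 1500 200 0 0 0 0 0"

if __name__ == "__main__":
    print(f"iowait: {iowait_percent(BEFORE, AFTER):.1f}%")
```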

When Zombies Invade Linux: What are Zombie Processes and What to do about them

Zombies don’t just appear in scary movies anymore; sometimes they also appear on your Linux systems. But don’t fret, they are mostly harmless.

What is a Zombie Process? Before we get started, I want to cover what exactly a zombie process is. On both Linux and Unix, a process can create a sub-process, otherwise known as a “child process”. Once a process creates a new sub-process, the first process becomes a “parent process”, as it has spawned a child process during its execution.
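The parent and child relationship can be sketched in a few lines of Python. In this minimal, Linux-oriented example the parent forks a child that exits immediately; until the parent collects the exit status, the child lingers as a zombie (state "Z" in /proc/<pid>/stat). The helper names are hypothetical, and reading /proc is an assumption that only holds on Linux, so the state is reported as None elsewhere.

```python
import os


def process_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if unavailable."""
    try:
        with open(f"/proc/{pid}/stat") as stat_file:
            data = stat_file.read()
    except OSError:
        return None
    # The state is the first field after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def make_and_reap_zombie():
    """Fork a child, observe it as a zombie, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child process: exit immediately
    # Parent: block until the child has exited, but WNOWAIT leaves it
    # unreaped, so it stays in the process table as a zombie.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    state_before = process_state(pid)
    # Collecting the exit status reaps the child and removes the zombie.
    _, status = os.waitpid(pid, 0)
    state_after = process_state(pid)
    return state_before, os.WEXITSTATUS(status), state_after


if __name__ == "__main__":
    print(make_and_reap_zombie())
```

The takeaway is that a zombie is just an exited child whose parent has not yet asked for its exit status.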

Linux Troubleshooting with strace

Today I want to cover one of the best troubleshooting tools in any sysadmin’s arsenal: strace. Strace is a command that traces the system calls and signals of a specified command. What does that mean in layman’s terms? Strace will output the inner workings of a process you run it against. If a process opens a file or binds a port, strace will print that action. It is a great utility for troubleshooting when a process is not behaving as expected and you can’t find any reason in the command’s output or log files.

How to check if a cron job ran

Cron is a time-based job scheduling daemon that runs on most common Unix/Linux distributions. Because cron jobs are time based, it is sometimes necessary to validate that a job ran at its scheduled time. Some people configure a cron job to send the script’s output to a user via system mail or to redirect it to a file. However, not all cron jobs are set up the same way, and many are configured to send output to /dev/null, which hinders any ability to validate that the job ran.
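One way to validate a run even when output goes to /dev/null is to look for the entry the cron daemon itself writes to the system log. The Python sketch below filters log lines for such entries. It assumes the common "CRON[pid]: (user) CMD (command)" layout (CROND on Red Hat based systems); the sample lines, hostnames, script paths, and the cron_runs function are hypothetical, and the log file location varies by distribution.

```python
import re

# Assumed layout of a cron execution entry: CRON[pid]: (user) CMD (command)
CRON_LINE = re.compile(
    r"\bCROND?\[(?P<pid>\d+)\]: \((?P<user>[^)]+)\) CMD \((?P<command>.*)\)\s*$"
)


def cron_runs(log_lines, command_fragment):
    """Return (log prefix, user, command) for each logged run matching the fragment."""
    runs = []
    for line in log_lines:
        match = CRON_LINE.search(line)
        if match and command_fragment in match.group("command"):
            # Everything before the CRON tag is the timestamp and hostname.
            prefix = line[: match.start()].strip()
            runs.append((prefix, match.group("user"), match.group("command")))
    return runs


# Hypothetical log excerpt for illustration only.
SAMPLE_LOG = [
    "Jan  5 02:00:01 web01 CRON[4321]: (root) CMD (/usr/local/bin/backup.sh > /dev/null 2>&1)",
    "Jan  5 02:00:01 web01 CRON[4322]: (www-data) CMD (/usr/bin/php /var/www/cron.php)",
    "Jan  5 02:00:02 web01 sshd[999]: Accepted publickey for admin",
]

if __name__ == "__main__":
    for run in cron_runs(SAMPLE_LOG, "backup.sh"):
        print(run)
```

An empty result for the expected time window suggests the job never started, while a logged entry only shows that cron launched the command, not that the command succeeded.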