jeudi 12 juillet 2012

Debugging with strace

There are times when your software is failing but it doesn't output any error messages. Maybe you are getting a 500 error when you load a cgi script but the log is not telling you why. Strace can often help in situations like this.

Strace does not trace the internals of your application, it only knows about syscalls, or calls into the kernel. This can be enough in many cases to show you when your application might be trying to look for a file that is not there, or when it cannot allocate memory or some other error.

Generally I call strace with option -f so it will follow child processes, and then the name of the command to trace. To save the output to a file, use -o.

Code:
[josephw@myhost ~]$ strace -f -o strace.out ls /
bin   dev  home    lib         media  mnt  proc  sbin     space  sys  usr
boot  etc  initrd  lost+found  misc   opt  root  selinux  srv    tmp  var


[josephw@myhost ~]$ less strace.out
6368  execve("/bin/ls", ["ls", "/"], [/* 17 vars */]) = 0
6368  uname({sys="Linux", node="dss03.spry.com", ...}) = 0
6368  brk(0)                            = 0x8cc5000
6368  access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
6368  open("/etc/ld.so.cache", O_RDONLY) = 3
6368  fstat64(3, {st_mode=S_IFREG|0644, st_size=36486, ...}) = 0
6368  old_mmap(NULL, 36486, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f8f000
6368  close(3)                          = 0
6368  open("/lib/tls/librt.so.1", O_RDONLY) = 3

....output trimmed....
The first column shows the process id, and then you see the syscall and its result. Here it opens /etc/ld.so.cache in readonly mode, and the result is file descriptor 3. After this it calls fstat64 and old_mmap on file descriptor 3. Then it closes file descriptor 3. Once it has been closed, it can be reused as you see in the next call where it opens /lib/tls/librt.so.1.

In cases where you want to trace a process that is already running, pass the -p option to strace. I will run strace on Apache.
Code:
[josephw@myhost ~]$ ps aux |grep http
apache    6457  0.0  0.5 18764 5456 ?        S    Oct31   0:00 /usr/sbin/httpd
apache    6498  0.0  0.5 18764 5464 ?        S    Oct31   0:00 /usr/sbin/httpd
apache    6500  0.0  0.5 18768 6000 ?        S    Oct31   0:00 /usr/sbin/httpd
apache    6505  0.0  0.5 18768 6080 ?        S    Oct31   0:00 /usr/sbin/httpd
apache    6507  0.0  0.5 18764 5464 ?        S    Oct31   0:00 /usr/sbin/httpd
Here I have the process IDs for apache running on my box. Which one to trace? Pick one, then keep reloading your web site until the process you are tracing is the one that serves up your page. If the process does not belong to you, you will probably need to be root in order to trace it.
Code:
[root@myhost ~]# strace -f -o strace.out -p 6457 &
[root@myhost ~]# tail -f strace.out

....output trimmed....

21361 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
21361 open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/tls/i686/sse2/libperl.so", O_RDONLY) = -1 ENOENT
 (No such file or directory)
21361 stat64("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/tls/i686/sse2", 0xbfeca780) = -1 ENOENT (No su
ch file or directory)
21361 open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/tls/i686/libperl.so", O_RDONLY) = -1 ENOENT (No
such file or directory)
21361 stat64("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/tls/i686", 0xbfeca780) = -1 ENOENT (No such fi
le or directory)
21361 open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/tls/sse2/libperl.so", O_RDONLY) = -1 ENOENT (No
such file or directory)
21361 stat64("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/tls/sse2", 0xbfeca780) = -1 ENOENT (No such fi
le or directory)
21361 open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/tls/libperl.so", O_RDONLY) = -1 ENOENT (No such
file or directory)
21361 stat64("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/tls", 0xbfeca780) = -1 ENOENT (No such file or
 directory)
21361 open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/i686/sse2/libperl.so", O_RDONLY) = -1 ENOENT (No
 such file or directory)
21361 stat64("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/i686/sse2", 0xbfeca780) = -1 ENOENT (No such f
ile or directory)
21361 open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/i686/libperl.so", O_RDONLY) = -1 ENOENT (No such
 file or directory)
21361 stat64("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/i686", 0xbfeca780) = -1 ENOENT (No such file o
r directory)
21361 open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/sse2/libperl.so", O_RDONLY) = -1 ENOENT (No such
 file or directory)
21361 stat64("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/sse2", 0xbfeca780) = -1 ENOENT (No such file o
r directory)
21361 open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so", O_RDONLY) = 3

....output trimmed....
I ran strace in the background by using & so I could view the output of strace.out with tail -f. This way I could know when my process was the one that was doing the work. Then I reloaded my site about 5 times until it cycled around to process 6457.

In the beginning of the output shown, it may look like there is an error, because it keeps showing "No such file or directory" while trying to load libperl.so. But you can see that it is looking for the library in multiple paths and eventually it does find it in the directory /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/. In this case, the page was loaded without any errors. I will do it again, but this time I will try to load a page that does not exist, in this case http://10.2.4.15/test.
Code:
....output trimmed....

6457  lstat64("/usr/local/www/test", 0xbffb3104) = -1 ENOENT (No such file or directory)

....output trimmed....

6457  read(18, 0xbffb3270, 512)         = -1 ECONNRESET (Connection reset by peer)
6457  close(18)                         = 0
6457  read(5, 0xbffb347f, 1)            = -1 EAGAIN (Resource temporarily unavailable)
6457  semop(7307267, 0x1c9740, 1 <unfinished ...>
It tried to find /usr/local/www/test but could not so it failed in the end.
 
Source : http://forums.spry.com/howtos/1502-debugging-strace.html