April, 2010

27
Apr 10

Unix signal programming in Ruby

I recently thought I would be able to use Unix signals to solve a problem in a Ruby program I’m writing. It turned out not to be workable, but it was a fun journey into Unix signals and how they work (or don’t) with MRI.

I’m not a signals expert – corrections and opinions are very welcome! Also, I started to feel a bit out of my depth when patching the longjmp() calls. I’d love to find out more about this stuff.

Photo by David Blaikie

Brief signals primer

Signals are used to alert processes or threads about a particular event. Synchronous signals are usually the result of errors in executing some instruction (such as an illegal address reference) and are delivered to the thread that caused the error. Asynchronous signals are external to the execution context and are probably the ones you’re more familiar with – they can be sent between processes using things like kill or delivered when needed by the kernel.

When a signal is generated it is immediately put into the “pending” state. If the process has a thread that has not blocked signals of that type, it is delivered straight away. If that type of signal is blocked by all threads in the process, it remains pending until one of the threads unblocks it, at which point it is delivered immediately. Delivered signals can be ignored, handled by the default action (which for many signals terminates the process) or processed by a signal handler. In Ruby, we define a SIGUSR1 handler like this:

Signal.trap("USR1") do
  puts "USR1 caught"
end
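To see a handler actually fire, a process can send the signal to itself. The following is a minimal sketch rather than anything from the program described here: `deliver_to_self` is a hypothetical helper name, and the two-second polling deadline is an arbitrary choice to cover the short gap between the signal arriving and MRI running the trap block.

```ruby
# Hypothetical helper: install a handler, signal ourselves, wait for the handler.
def deliver_to_self(signame)
  caught = []
  previous = Signal.trap(signame) { caught << signame }
  Process.kill(signame, Process.pid)
  # MRI runs trap blocks at a safe point on the main thread, so poll briefly.
  deadline = Time.now + 2
  sleep 0.01 while caught.empty? && Time.now < deadline
  caught
ensure
  # Put back whatever handler was installed before we started.
  Signal.trap(signame, previous) if previous
end

p deliver_to_self("USR1")
```

The handler only appends to an array, which keeps the work done inside the trap block as small as possible.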

Why might we want to block signals?

Blocking signals is often used when we have a section of code that must not be interrupted. To enable this, each thread maintains a signal mask. This is the list of signal types that the thread is blocking, which we can examine and change using pthread_sigmask() (sigprocmask() in single-threaded programs). A new thread inherits the signal mask of the thread that created it. However, threads do not have their own sets of signal handlers – these are shared throughout the process. Asynchronous signals that are delivered to the process can be processed by any thread that has not blocked those signals.
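The point that handlers are shared by the whole process, not held per thread, can be checked from plain Ruby. This is only a sketch: `handler_visible_across_threads?` is a made-up name, and it relies on Signal.trap returning the previously installed handler. Note that it leaves the signal set to its default action afterwards.

```ruby
# Hypothetical helper: install a handler in one thread, read it back from another.
def handler_visible_across_threads?(signame)
  handler = proc { } # a do-nothing handler
  Thread.new { Signal.trap(signame, handler) }.join
  # Signal.trap returns the handler that was installed before this call.
  previous = Signal.trap(signame, "DEFAULT")
  previous.equal?(handler)
end

puts handler_visible_across_threads?("USR2")
```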

Signals in MRI

Unfortunately, MRI isn’t really very friendly to Unix programmers wanting to play with the signal mask, as we’ll see.

MRI defines the Signal module, which contains only two methods: Signal.trap and Signal.list, the latter providing the mapping of signal names to numbers for your platform. Since none of the other libc signal handling functions are exposed, I created a library, syscalls, to provide them (and, in time, some other system calls). It is built using the lovely FFI library and mirrors the libc functions closely, with a couple of Ruby-style shortcuts added.
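As a quick illustration of Signal.list, the sketch below looks a signal up by name and builds the reverse mapping. `signal_number` and `signal_name` are hypothetical helper names. Some numbers, such as the one for USR1, differ between platforms, so only the lookup itself is shown.

```ruby
# Hypothetical helpers around Signal.list, which maps names to numbers.
def signal_number(name)
  Signal.list.fetch(name)
end

def signal_name(number)
  # Several names can share one number (aliases), so invert keeps just one.
  Signal.list.invert[number]
end

puts "USR1 is signal #{signal_number("USR1")} on this platform"
puts "Signal 9 is #{signal_name(9)}"
```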

Ruby 1.8

As Joe Damato found, MRI 1.8 with pthreads enabled is rather rt_sigprocmask() happy. It seemed obvious that all that mucking about with the signal mask would cause strange behaviour when blocking signals, but let’s see how:

require "syscalls/signal"

mask = Syscalls::Sigset_t.new.to_ptr
Syscalls.sigemptyset(mask)
Syscalls.sigaddset(mask, "USR1")

puts "Block and roll!"
Syscalls.sigprocmask(Syscalls::SIG_SETMASK, mask, nil)

puts "Looks fine so far - let's raise an exception..."

begin
  raise
rescue
end

puts "Aw-naw!"

What’s going on?

Using strace with ruby 1.8.6 (2009-08-04 patchlevel 383) [x86_64-linux] gives us:

write(1, "Block and roll!\n", 16Block and roll!
)       = 16
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [USR1], NULL, 8) = 0
write(1, "Looks fine so far - let's raise "..., 48Looks fine so far - let's raise an exception...
) = 48
rt_sigprocmask(SIG_BLOCK, NULL, [USR1], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [USR1], NULL, 8) = 0
write(1, "Aw-naw!\n", 8Aw-naw!
)                = 8

Line 3 is from Ruby calling getcontext – it is pretty harmless since it is passing SIG_BLOCK along with an empty set of signals, which adds nothing to the existing signal mask. Line 4 is our own call to sigprocmask – note we’re using SIG_SETMASK, which replaces the existing mask. So far, so good. However, on lines 7-9 Ruby stores the old mask (our SIGUSR1), replaces it with an empty mask and then immediately replaces that with our SIGUSR1 mask again.

But, the mask is only empty for a fraction of a second – I think I’ll be alright!

Think again! It’s tempting to think that this wouldn’t be a problem in most real-world situations, but you may recall that when a signal cannot be delivered because it’s blocked it is put into a pending state. When that signal type is unblocked, the signal is immediately delivered. This means there can actually be plenty of time to queue up a signal to cause a problem here.

The REE stuff below all applies to the other MRI 1.8 flavours I tested too – that’s 1.8.{6,7} on 64-bit Linux.

REE 1.8.7-2010.01

REE has Joe’s --disable-ucontext patch applied, which meant a lot fewer sigprocmask()s to wade through! In fact, it nearly worked – just our old SIG_SETMASK friend set during the exception handling:

write(1, "Block and roll!\n", 16Block and roll!
)       = 16
rt_sigprocmask(SIG_SETMASK, [USR1], NULL, 8) = 0
write(1, "Looks fine so far - let's raise "..., 48Looks fine so far - let's raise an exception...
) = 48
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
write(1, "Aw-naw!\n", 8Aw-naw!
)                = 8

Time to dig around and see why Ruby is doing that.

Calling raise resulted in a call to rb_longjmp(), which appears to be a reimplementation of siglongjmp() (or _longjmp() – I don’t know which). This in turn calls rb_trap_restore_mask(), which sets the signal mask back to the mask that was stored when Ruby started up or when Signal.trap was last called.

Patching Ruby 1.8 – including 1.8.6, 1.8.7 and REE

This might be dangerous or not even sensible. Let me know if you find out!

As far as I can tell, simply removing the call to rb_trap_restore_mask() shouldn’t break anything since the places that the trap_last_mask variable is set are very limited. It may not be the best place for the fix (if rb_longjmp() is actually siglongjmp(), this might break the reimplementation), but it does at least appear to work.

Here’s the truly tiny patch.

Ruby 1.9

Ruby 1.9.1-p376 is a slightly more tricky case. As you’ll be aware, MRI 1.9 maps each Ruby thread to a native C thread and uses the GIL to ensure only one runs at any one time. The interpreter uses a thread to trigger an interrupt in order to schedule threads. This thread is created on initialisation of the interpreter and means that even very simple programs have two native threads running.

As we can see in the abbreviated strace below, the signal mask is empty when the timer thread is created, and the new thread inherits it. This means that if we block a signal type in our main Ruby thread, signals of that type can still be delivered to, and handled by, the timer thread.

Below that we can see rb_trap_restore_mask() emptying the mask when it is called from rb_longjmp(). The sigaltstack() and following sigaction() call on lines 4-5 tell Ruby to handle segfaults on a different stack.

rt_sigaction(SIGHUP, {0x48b9f0, [], SA_RESTORER|SA_SIGINFO, 0x3deca0f0f0}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGUSR1, {0x48b9f0, [], SA_RESTORER|SA_SIGINFO, 0x3deca0f0f0}, {SIG_DFL, [], 0}, 8) = 0
...
sigaltstack({ss_sp=0x1b0baf0, ss_flags=0, ss_size=16384}, {ss_sp=0, ss_flags=SS_DISABLE, ss_size=0}) = 0
rt_sigaction(SIGSEGV, {0x48bce0, [], SA_RESTORER|SA_STACK|SA_SIGINFO, 0x3deca0f0f0}, {SIG_DFL, [], 0}, 8)
...
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
...
clone(Process 23020 attached
child_stack=0x7f3b2c188ff0, flags=CLONE_VM|CLONE_FS| CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM| CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f3b2c1899e0, tls=0x7f3b2c189710, child_tidptr=0x7f3b2c1899e0) = 23020
...
[pid 23019] write(1, "Block and roll!", 15Block and roll!) = 15
[pid 23019] write(1, "\n", 1
)           = 1
[pid 23019] rt_sigprocmask(SIG_SETMASK, [USR1], NULL, 8) = 0
[pid 23019] write(1, "Looks fine so far - let's raise "..., 47Looks fine so far - let's raise an exception...) = 47
[pid 23019] write(1, "\n", 1
)           = 1
[pid 23019] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 23019] write(1, "Aw-naw!", 7Aw-naw!)      = 7
[pid 23019] write(1, "\n", 1
)           = 1

Patching Ruby 1.9.1

First off, we need to mask all signals as soon as the timer thread is created. This makes sure that all signals can only be delivered to the main Ruby thread (until we create some more). I suppose this will actually slow down signal delivery by some small amount.

Ruby itself blocks some signals for a time to allow sections of code to run without interrupts, using rb_disable_interrupt() and rb_enable_interrupt(). These functions mask and unmask all signals. We need to make Ruby save the existing signal mask while it blocks all signals, then restore the old mask, rather than unblocking everything.

Here’s the patch. Again, I don’t think this has any negative side-effects, but it also might not be a good idea. Let me know if you find out!
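The save-then-restore idea can be illustrated from Ruby itself. The sketch below is not the patch and does not use the syscalls library: it calls libc through Fiddle from the standard library. It rests on several assumptions: the numeric values of SIG_BLOCK and SIG_SETMASK (0 and 2 on Linux, 1 and 3 on macOS and the BSDs), a 128-byte buffer being large enough for sigset_t, and sigprocmask() acting on the calling thread as it does on Linux. The helper names are made up for illustration.

```ruby
require "fiddle"

# Assumed values: Linux uses 0 and 2, macOS and the BSDs use 1 and 3.
LINUX = RUBY_PLATFORM.include?("linux")
SIG_BLOCK = LINUX ? 0 : 1
SIG_SETMASK = LINUX ? 2 : 3

LIBC = Fiddle.dlopen(nil)

def libc_function(name, arg_types)
  Fiddle::Function.new(LIBC[name], arg_types, Fiddle::TYPE_INT)
end

SIGEMPTYSET = libc_function("sigemptyset", [Fiddle::TYPE_VOIDP])
SIGADDSET = libc_function("sigaddset", [Fiddle::TYPE_VOIDP, Fiddle::TYPE_INT])
SIGISMEMBER = libc_function("sigismember", [Fiddle::TYPE_VOIDP, Fiddle::TYPE_INT])
SIGPROCMASK = libc_function("sigprocmask",
                            [Fiddle::TYPE_INT, Fiddle::TYPE_VOIDP, Fiddle::TYPE_VOIDP])

# Assumption: 128 bytes covers sigset_t (its size on Linux; smaller elsewhere).
def new_sigset
  set = Fiddle::Pointer.malloc(128, Fiddle::RUBY_FREE)
  SIGEMPTYSET.call(set)
  set
end

# Query the current mask without changing it (a NULL set means "read only").
def blocked?(signo)
  current = new_sigset
  SIGPROCMASK.call(SIG_BLOCK, nil, current)
  SIGISMEMBER.call(current, signo) == 1
end

# Block one signal for the duration of the block, then restore the saved mask
# instead of simply unblocking everything.
def with_signal_blocked(signo)
  to_block = new_sigset
  SIGADDSET.call(to_block, signo)
  saved = new_sigset
  SIGPROCMASK.call(SIG_BLOCK, to_block, saved)
  yield
ensure
  SIGPROCMASK.call(SIG_SETMASK, saved, nil) if saved
end

usr1 = Signal.list["USR1"]
with_signal_blocked(usr1) { puts "USR1 blocked? #{blocked?(usr1)}" }
puts "USR1 blocked afterwards? #{blocked?(usr1)}"
```

Restoring the saved mask means a signal that was already blocked before the call stays blocked afterwards, which is the behaviour the patch aims for.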

2
Apr 10

Airport antics

Sarah doesn’t like flying with me. It might not be very relaxing, but I am good at getting luggage through airports. Today, I was flying alone to meet her.

Arriving at Edinburgh I was glad I had plenty of time to spare, since the Ryanair check-in desk team took no prisoners. With two bags, one weighing 8kgs and one 21kgs, I was looking at £120 excess baggage charge, which didn’t please me much at all. Unfortunately, one bag was a telescope tripod wrapped in a camping mat with duct tape which made distribution difficult. The foot-under-the-scales trick worked nicely first time around and brought the second bag down to an apparent 17kgs, but I still had to ditch 2kgs. I duct taped my kilt shoe bag to the tripod and started pulling out heavy clothes.

On the second weighing I couldn’t hold my foot still enough and the bag weight was fluctuating wildly. The woman was sharp and started calling her supervisor over because the scales were ‘malfunctioning’. I decided to stop in case I got caught and was left with another 4kgs to ditch. The usual appeals to me being wee and skinny were ignored, of course – “Sorry, it’s Ryanair!” was the familiar response. Fifteen minutes later I escaped, feeling much more heavily laden than when I arrived.

Security were friendly, chatting about the metal in my back and the telescope in my hand luggage.

The plane boarding desk were tougher. I realised early that I was way over the hand baggage limit, with loads of clothes in plastic bags, two rucksacks and a pair of binoculars. I decided to go last in the queue to try to make the staff feel more sorry for me and hopefully more lenient. Sound reasoning, but the gamble didn’t pay off. The woman told me my first bag was fine, then £35 per bag after that and that I was carrying four bags (and that I had to hurry up)! Some onlookers in the next queue gasped and watched for my next move.

Only me left to get on the plane. Rapid-style I started re-packing, again.

As I started putting on clothes more people started watching and laughing. First a hoodie, then a kilt, one of Sarah’s jackets went on and one round my waist. The ensemble was topped off nicely with a kilt jacket and the binoculars stuck up my top. Feeling chuffed I went to pay my £35, boiling, sweating and with some giggles from the crowd only to be told it was cash only – “And you’ll have to REALLY run, because we’re already late!”.

As I sprinted to the cash machine (for some reason carrying both rucksacks!), the rest of the airport lounge seemed to enjoy the sight. It felt like miles. They enjoyed it on the return leg too!

At the desk I was hurried along, I paid and elatedly started to head through the doors as loud applause and laughter broke out from the lounge. I turned, smiled and gave it a bit of the “Fucking-yeeeeaahh!” style arms! Quality – what a way to board a plane, feeling like a superhero!

On the runway I kept hearing “Hurry!” and “Run, run!”. I pegged it up the steps and onto the plane to be greeted by more passengers laughing at the sight of me.

All really rather good fun.