2009-02-20

Fiber

I haven't been here for quite a while. That's simply because I'm not a blogging kind of guy.

Let's have a look at one of the new features in Ruby 1.9.

class Fiber
You can think of a fibre as a kind of thread: not one that runs in the background all the time, but one that is responsible for some specialised task and is activated only to do a piece of work and return a result.

Also, a fibre is not a thread.

Let's skip to some code:

FILE=__FILE__

require 'fiber'

reader=Fiber::new\
{
File::open(FILE){|f| f.each_line{|l| Fiber.yield(l) unless l.strip.empty?}}
}

puts "Let's get a line: #{reader.resume}"
puts "Let's get another line: #{reader.resume}"
puts "And the rest:"
puts reader.resume while reader.alive?


This produces the following output:

Let's get a line: FILE=__FILE__
Let's get another line: require 'fiber'
And the rest:
reader=Fiber::new\
{
File::open(FILE){|f| f.each_line{|l| Fiber.yield(l) unless l.strip.empty?}}
}
puts "Let's get a line: #{reader.resume}"
puts "Let's get another line: #{reader.resume}"
puts "And the rest:"
puts reader.resume while reader.alive?
#<File:0xb114bc>


So first: let's have a look at the fibre code. (I write fibre rather than fiber because I prefer British English; the truth is that each time I want to use the class Fiber I first spell it Fibre and must correct it later; I'll make an alias one day.) The fibre opens the file and calls Fiber.yield, passing each consecutive nonempty line to it. Think of it like this: the fibre creates a virtual array containing all the elements that you put there, and Fiber.yield puts an element in it. So we have a virtual array containing all the nonempty lines of code from __FILE__.

Of course the array does not exist; it is only a way to imagine how the fibre works. This saves memory: you don't have to load all the lines into memory before accessing them. Think of how you would write this simple program without a fibre (assume that the file you're going to display is something like 1 GB and you cannot simply load it all into memory). I'm quite sure that using a fibre is one of the best ways to do it.
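To see that the "virtual array" really is produced on demand, here is a minimal sketch that has nothing to do with files. The fibre name fib and the choice of Fibonacci numbers are mine, just for illustration. The sequence is endless, so a real array could never hold it, yet the fibre hands out one element each time it is resumed.

```ruby
# A fibre that produces Fibonacci numbers one at a time, on demand.
# (The name `fib` is only an illustration, not part of any API.)
fib = Fiber.new do
  a, b = 0, 1
  loop do
    Fiber.yield a      # hand one value to the caller and pause here
    a, b = b, a + b    # runs only when the caller asks for the next value
  end
end

# Ask for exactly ten elements; nothing beyond them is ever computed.
first_ten = Array.new(10) { fib.resume }
puts first_ten.inspect   # prints [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

The fibre is still alive after this, paused inside its loop, and would happily produce the eleventh number on the next resume.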

Now, how do we read from this virtual array? To read an element we call reader.resume. Simple, isn't it? Analyse the output to see that it did what we expected: first it printed the first line, then the second (nonempty) line, and then the rest.

There's only one mysterious thing at the end: #<File:0xb114bc>. The explanation is: the virtual array created by a fibre is filled by all the calls to Fiber.yield, and when the fibre finishes, its final value (the value of the last expression in the fibre block) is also added to the array. In our case, the last (and only) operation is the File::open call. With a block, File::open returns the value of that block, which here is the result of each_line, and each_line returns the file stream itself, so the stream was also added to the array. One can like this feature or not, but one has to live with it. So if we don't want this line of output, we can change the end of the code to this:
puts "And the rest:"
loop\
{
l=reader.resume
break unless reader.alive?
puts l
}

Now it works like it should. More lines of code but oh well.
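Another way to deal with the extra value, sketched here under the assumption that the fibre never needs to yield nil or false as real data: end the fibre block with an explicit nil, so that the final resume returns nil and can serve as an end marker. To keep the sketch self-contained I read from a StringIO (the variable source is a made-up stand-in for the file).

```ruby
require 'stringio'

# Stand-in for the file, so that the sketch runs on its own.
source = StringIO.new("first\n\nsecond\nthird\n")

reader = Fiber.new do
  source.each_line { |l| Fiber.yield(l) unless l.strip.empty? }
  nil   # final value of the block: an end marker instead of the stream object
end

lines = []
while (l = reader.resume)   # stops at the nil returned when the block finishes
  lines << l
end
puts lines
```

The reading loop no longer needs alive? at all, at the price of reserving nil as a special value.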

fiber.resume(*args)
We've seen that if you pass an argument to Fiber.yield then it becomes the value of the fiber.resume call. This lets you pass data from the body of the fibre to the outer world. Passing data in the opposite direction is also possible, and is by no means harder. As you might have already guessed: the arguments passed to fiber.resume become the value of Fiber.yield. So if we wanted a writer instead of a reader:
FILE="test.txt"

require 'fiber'

writer=Fiber::new\
{
File::open(FILE,"w")\
{ |f|
loop\
{
l=Fiber.yield
break unless l
f.puts l
}
f.puts "---"
}
}

writer.resume "Line 1"
writer.resume "Line 2"
writer.resume "Line 3"
writer.resume

Why do we have to create a loop inside the fibre and break from it? Simply because now it's the outer world that decides when to finish the fibre. It signals the fibre to close the file (and to add the "---", just to tell us that the file was closed properly). If we remove the last line of the code (the one that calls the writer with no argument), we'll see that the file won't have --- added at the end. It will still be closed properly, thanks to the finaliser of the File object, but it will be closed and released no sooner than the whole program ends, so in general it is a good idea to force the file to close manually.

But wait, there's no line 1 in the file! Right, it's not there, and here's why: the first call to writer.resume did not correspond to a call to Fiber.yield within the fibre, because at the time of this call the fibre had not yet been started, so it was not waiting on Fiber.yield but at the beginning of its block. So the first call just activated the fibre, but did not save the line to the file.
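The same effect can be seen without any file at all. In this small sketch (the names collector and received are made up for the illustration) the fibre stores whatever Fiber.yield gives it in an array, and the value passed to the very first resume never gets there.

```ruby
received = []

collector = Fiber.new do
  loop do
    value = Fiber.yield   # receives the argument of the *next* resume
    break if value.nil?
    received << value
  end
  :done                   # final value of the block
end

collector.resume "one"    # only starts the fibre; "one" is dropped
collector.resume "two"    # becomes the value of the first Fiber.yield
collector.resume "three"  # becomes the value of the second Fiber.yield
last = collector.resume   # nil ends the loop; the block's value comes back

puts received.inspect     # prints ["two", "three"]
puts last.inspect         # prints :done
```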

What's the solution? First: to get the value of the first call to writer.resume you have to add parameters to the fibre block itself. So one of the solutions is like this:
writer=Fiber::new\
{ |l0|
File::open(FILE,"w")\
{ |f|
f.puts(l0)
loop\
{
l=Fiber.yield
break unless l
f.puts l
}
f.puts "---"
}
}


But it doesn't look too nice, nor is it. In our case a better solution might be to treat the first call as a special case and pass the file name in it, like this:
FILE="test.txt"

require 'fiber'

writer=Fiber::new\
{ |file|
File::open(file,"w")\
{ |f|
loop\
{
l=Fiber.yield
break unless l
f.puts l
}
f.puts "---"
}
}

writer.resume FILE
writer.resume "Line 1"
writer.resume "Line 2"
writer.resume "Line 3"
writer.resume

For most cases I'd use this form.

Of course there are many more uses of Fiber, including ones that pass values in both directions at once, not just in one of them as in the examples above.
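As a small taste of the two-way kind, here is a sketch of a fibre that keeps a running total. Every resume sends a number in, and every Fiber.yield sends the current total back out. The name totaller is mine, and the first number arrives through the block parameter, as described earlier.

```ruby
# Each resume sends a number in; each Fiber.yield sends the running total back.
totaller = Fiber.new do |first|
  total = first
  loop do
    n = Fiber.yield(total)   # pass the total out, then wait for the next number
    total += n
  end
end

totals = [10, 5, -3].map { |n| totaller.resume(n) }
puts totals.inspect   # prints [10, 15, 12]
```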

Producer - Consumer
Here's one more way of looking at the whole fibre thing: it's a sort of producer-consumer pattern with a queue whose size is limited to zero. An element is produced no sooner than it is needed, and most of the time there are zero elements waiting on the queue. And because a fibre is not a thread, there are no synchronisation problems to worry about.
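The zero-length queue can be made visible with a little logging. In this sketch (the names log and producer are invented for the purpose) both sides write to the same array, and the entries come out strictly interleaved: nothing is produced before the consumer asks for it.

```ruby
log = []

producer = Fiber.new do
  3.times do |i|
    log << "produce #{i}"
    Fiber.yield i          # hand the item over and wait to be asked again
  end
  nil                      # end marker for the consumer
end

# The consumer pulls items one by one (0 is truthy in Ruby, so it is not
# mistaken for the end marker).
while (item = producer.resume)
  log << "consume #{item}"
end

puts log
```

Running it prints produce 0, consume 0, produce 1, consume 1, produce 2, consume 2, one per line, and all of it happens in a single thread.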

I hereby certify the Fiber class for everyday use.
