2008-09-16

StringBuffer

Today, a potentially useful code sample: class StringBuffer. The class is a buffer into which you can write strings using various printing methods, and from which you can read all the data in order of arrival. It is thread-safe, so one such object can carry one-way communication between two threads (and two such objects can carry two-way communication).

The class will use StringIO, a standard-library class that has several printing methods (we will add two more) and allows random-access reading and writing. We will remember the read and write positions in the buffer and restore them for each read and write command. Data that has already been read is also discarded once enough of it piles up, to prevent unbounded memory usage.
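To make the position bookkeeping concrete, here is a minimal sketch using a bare StringIO. The variable names (buff, read_pos) are only illustrative and are not taken from the class itself.

```ruby
require 'stringio'

buff = StringIO.new   # empty, readable and writable
read_pos = 0          # where the next read should start

# Writing: always jump to the end first, then print.
buff.pos = buff.length
buff.print "abc"
buff.pos = buff.length
buff.puts "def"

# Reading: jump back to the saved position, read, save the new position.
buff.pos = read_pos
first = buff.read(2)      # "ab"
read_pos = buff.pos

# A later write does not disturb the saved read position.
buff.pos = buff.length
buff.print "gh"

buff.pos = read_pos
rest = buff.read          # "cdef\ngh"
read_pos = buff.pos

# Dropping the data that has already been read keeps the buffer small.
buff.string = buff.string[read_pos..-1]
read_pos = 0
```

Because every write seeks to the end and every read seeks to the saved position, the two kinds of access never interfere with each other.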

Here's the code, with comments inside this time.
require 'pp' # allows method pp (pretty print); test in irb
require 'stringio' # StringIO comes from the standard library
require 'thread'

# Thread synchronizer. A thread calls +wait+ and is stopped
# until the +timeout+ passes or until the method +signal+
# of this object is called to release all waiting threads.
class Synchronizer

def initialize
@waiting=[]
@mutex=Mutex::new
end

attr_reader :mutex # in case somebody wants to use it

def wait(timeout=nil)
thr=Thread.current
begin
# be sure to add myself to the list of waiting threads
@mutex.synchronize{@waiting<<thr}
# sleep given time or forever (yes, it does it)
sleep(*[timeout].compact)
ensure
# be sure to remove myself
@mutex.synchronize{@waiting.delete(thr)}
end
end

def signal
# wake up all waiting threads
@mutex.synchronize{@waiting.each{|t| t.wakeup}}
end

# Check if the thread is currently waiting
def waiting_thread?(thr)
raise ArgumentError,"Argument must be Thread!"\
unless thr.is_a? Thread
# read the list under the mutex, like every other access to it
@mutex.synchronize{@waiting.include?(thr)}
end

end

# We add two methods to the +StringIO+.
class StringIO

# Inspect into the buffer (self).
def p(*args)
puts args.map{|a| a.inspect}
end

# Pretty print into buffer (self).
def pp(*args)
args.each{|a| PP.pp(a,self)}
nil
end

end

class StringBuffer

# List of writing methods.
WRITE_METHODS=[:write,:<<,:print,:puts,:putc,:printf,:p,:pp]

# List of reading methods.
READ_METHODS=[:read,:gets,:getc,:readchar,:readline]

# If this much data is in the buffer, empty it.
TRUNC_LENGTH=1000

# dynamically define all writing methods
WRITE_METHODS.each\
{ |wm|
class_eval(
<<-METHOD
def #{wm}(*args)
# safely (mutex)
@mutex.synchronize\
{
# move the +StringIO+ pointer to the end
@buff.pos=@buff.length
# call the same method on the internal buffer
ret=@buff.#{wm}(*args)
# signal the synchronizer in case
# some thread was waiting for data
@synchronizer.signal unless empty?
ret
}
end
METHOD
)
}

# dynamically define read methods
READ_METHODS.each\
{ |rm|
class_eval(
<<-METHOD
def #{rm}(*args)
@mutex.synchronize\
{
# move pointer to the saved position of last read
@buff.pos=@r
# perform the read
ret=@buff.#{rm}(*args)
# save the new pointer
@r=@buff.pos
# call +trunc+ if there is at least +TRUNC_LENGTH+
# bytes of unnecessary data
trunc if @r>=TRUNC_LENGTH
ret
}
end
METHOD
)
}

def initialize
@buff=StringIO::new
# read position
@r=0
@mutex=Mutex::new
# synchronizer for threads waiting for data
@synchronizer=Synchronizer::new
end

attr_reader :synchronizer

def length
@buff.length-@r
end

def eof?
length==0
end

alias empty? eof?

def wait_for_data(timeout=nil)
@synchronizer.wait(timeout) if empty?
self unless empty?
end

private

def trunc
@buff.string=@buff.string[@r..-1]
@r=0
end

end
A small test
irb(main):002:0> s=StringBuffer::new
=> #<StringBuffer:0x2bf4f44 @mutex=#<Mutex:0x2bf4ef4>, @r=0,
# @buff=#<StringIO:0x2bf4f1c>,
# @synchronizer=#<Synchronizer:0x2bf4ee0 @mutex=#<Mutex:0x2bf4e54>,
# @waiting=[]>>
irb(main):003:0> Thread::new{loop{sleep 1;s.print "X"}}
=> #<Thread:0x2bf0b60 sleep>
irb(main):004:0> loop{s.wait_for_data;puts s.read}
XXXXXXXXXX
X
X
X
X
X
X
X
The first line contains many X's because that many had accumulated in the buffer before I ran the command on line 004.
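For comparison, the same "wait until data arrives" idea can be sketched with Mutex and ConditionVariable from the standard library instead of the Synchronizer class above. This is a different technique from the one used in the post, and TinyBuffer is a hypothetical, stripped-down stand-in, not a replacement for StringBuffer.

```ruby
# TinyBuffer is a hypothetical name used only for this sketch.
class TinyBuffer
  def initialize
    @data  = String.new
    @mutex = Mutex.new
    @cond  = ConditionVariable.new
  end

  # Append data and wake up every thread waiting for it.
  def write(str)
    @mutex.synchronize do
      @data << str
      @cond.broadcast
    end
    nil
  end

  # Block until some data is available, then return it and clear the buffer.
  def wait_and_read
    @mutex.synchronize do
      @cond.wait(@mutex) while @data.empty?
      out = @data
      @data = String.new
      out
    end
  end
end

buf = TinyBuffer.new
reader = Thread.new { buf.wait_and_read }  # may start waiting before the write
buf.write "X"
received = reader.value                    # joins the thread, returns its result
```

Checking the condition in a loop under the mutex means it does not matter whether the reader or the writer gets there first.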

2008-09-06

class Array

A handful of methods that could be added to the class
Array
:
class Array

def sum
s=0
each{|e| s+=e}
s
end

def mul
m=1
each{|e| m*=e}
m
end

def mean
sum.to_f/length
end

def map_with_index
i=-1
map{|e| yield(e,i+=1)}
end

def map_with_index!
i=-1
map!{|e| yield(e,i+=1)}
end

def any_with_index?
each_with_index{|e,i| return true if yield(e,i)}
false
end

def all_with_index?
each_with_index{|e,i| return false unless yield(e,i)}
true
end

def find_index
each_with_index{|v,i| return i if yield(v)}
nil # nothing matched (each_with_index alone would return self)
end

def find_indices
ret=[]
each_with_index{|v,i| ret<<i if yield(v)}
ret
end

def select_by_index(*indices)
ret=[]
indices.each{|ind| ret<<self[ind]}
ret
end

alias find_indexes find_indices

def to_hash
raise "Cannot convert to Hash!" unless all?\
{ |e|
e.respond_to? :length and e.length==2 and e.respond_to? :[]
}
h={}
each{|e| h[e[0]]=e[1]}
h
end

def keys_to_hash
h={}
each{|e| h[e]=yield(e)}
h
end

def keys_with_index_to_hash
h={}
each_with_index{|e,i| h[e]=yield(e,i)}
h
end

def with(a2)
ensure_same_length(a2)
map_with_index{|e,i| [e,a2[i]]}
end

def with_to_hash(a2)
ensure_same_length(a2)
h={}
each_with_index{|e,i| h[e]=a2[i]}
h
end

def count_all
h={}
each\
{ |e|
h[e]||=0
h[e]+=1
}
h
end

def group_by
h={}
each\
{ |e|
g=yield(e)
h[g]||=[]
h[g]<< e
}
h.map{|g,ee| ee}
end

alias contain? include?
alias has? include?

def rand
a=to_a
a[Kernel.rand(a.length)] unless a.empty?
end

private

def ensure_same_length(arg)
raise ArgumentError,"Argument must be of the same length!"\
unless arg.respond_to? :length and length==arg.length\
and arg.respond_to? :[]
end

end
They are not perfect, but I use them quite a lot.

Some of them (or similar methods) will be present in Ruby 1.9. For example, the method inject (also available as reduce) will accept an operator symbol and work like this:
[1,4,5].reduce(:*)           #=> 20 # 1*4*5
["a","b","dd"].reduce(:+) #=> "abdd"
As you see, this is more general than my sum and mul, because it works for any type for which the operation is defined. Such a reduce is not hard to implement either, but it will probably run a bit faster when built into the Ruby core.
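A minimal sketch of how such a method could be written by hand. The name my_reduce is hypothetical, chosen so that the built-in method is not overridden, and this is not how Ruby itself implements it.

```ruby
class Array
  # my_reduce is a hypothetical name used only for this sketch.
  # Combines all elements with the operator named by the symbol +op+.
  def my_reduce(op)
    return nil if empty?
    acc = first
    # fold the remaining elements into the accumulator
    self[1..-1].each { |e| acc = acc.send(op, e) }
    acc
  end
end

[1, 4, 5].my_reduce(:*)          #=> 20
["a", "b", "dd"].my_reduce(:+)   #=> "abdd"
```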

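Ruby 1.9 is also going to have group_by. It groups the elements the same way as mine, but it returns the whole hash (block result => array of elements), whereas mine returns only the array of groups.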

Move to Enumerable
One more enhancement to the code above is to move all the methods into the module Enumerable (just write module Enumerable instead of class Array at the top). This lets you use the methods with other enumerable types, like Hash. You'll have to test the methods, though: not all of them make sense for structures whose elements are not ordered, and some rely on methods that Enumerable itself does not provide, such as length, map! and [].
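As a small illustration, here is count_all from the listing above defined in Enumerable instead of Array. It only needs each, so it is one of the methods that moves over without changes.

```ruby
module Enumerable
  # Count how many times each element occurs.
  def count_all
    h = {}
    each do |e|
      h[e] ||= 0
      h[e] += 1
    end
    h
  end
end

letters = "hello".each_char.count_all
# an Enumerator is enumerable too: {"h"=>1, "e"=>1, "l"=>2, "o"=>1}

pairs = {:a => 1, :b => 1}.count_all
# a Hash yields [key, value] pairs: {[:a, 1]=>1, [:b, 1]=>1}

numbers = [1, 2, 2, 3].count_all
# arrays still work, because Array includes Enumerable
```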

Add to load path
If you create some files that you'd like to be easily accessible from your Ruby programs, you can add their directory to Ruby's load path, so that you can require them without giving the full path. Under Windows, just go to the environment variables and add
RUBYLIB = P:/ath/To/Your/Dir
The path is then added to Ruby's load path each time Ruby starts, which you can verify by typing $: (or $LOAD_PATH) in irb and looking for your path.
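The load path can also be changed from inside a running program, which is a handy way to see what it does. The sketch below uses a temporary directory and a hypothetical file name (my_helpers.rb); unlike RUBYLIB, the change lasts only for the current process.

```ruby
require 'tmpdir'

loaded = nil
Dir.mktmpdir do |dir|
  # my_helpers.rb is a hypothetical file created only for this demonstration.
  File.write(File.join(dir, 'my_helpers.rb'), "MY_HELPERS_LOADED = true\n")

  $LOAD_PATH.unshift(dir)        # put the directory at the front of the load path
  loaded = require 'my_helpers'  # found by name alone, no full path needed
  $LOAD_PATH.delete(dir)         # tidy up before the directory is removed
end
```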

If you want some of your files to be loaded even without the need to require them, you can name them in the environment variable RUBYOPT. This variable may already contain -rubygems. If you want the file P:/ath/To/Your/Dir/start.rb to be loaded at startup, change the variable to
-rubygems -rstart
Each word starting with -r makes Ruby load the file named by the rest of the word. Ruby will find your file because you have already added its directory to the load path. If you want to load more files at startup, it is best to require them from within your first file.

As you might have guessed, there is a file named ubygems, which the original content of the variable causes to be loaded. The strange name was in fact chosen only to make the whole option read sensibly. All it does is load rubygems.rb, which initialises the Gems engine, enabling programs to use additional libraries.

2008-09-01

{block}

Blocks: the most powerful of Ruby's basic features.

A block is a way to pass a bit of code into a function, so that the function can execute it if it wants to, as many times as it wants to. You already know some examples, like [1,2,5].each{|e| puts e}, where the function each calls the block three times, once for each element of self.

Let's learn how to write a function that takes a block. I'd like to have a method of the class
Array
that converts the array to
Hash
, where the original array elements become keys, and the values are computed inside the block. Example of how it is supposed to work:
[1,5,3].keys_to_hash{|k| k**2}
#=> {1=>1,5=>25,3=>9}

["Ruby","Al2","O3","Cr"].keys_to_hash{|k| k.length}
#=> {"Al2"=>3,"O3"=>2,"Ruby"=>4,"Cr"=>2}
# remember that in Ruby 1.8 a Hash does not maintain the order of elements,
# so they might get reordered when printed in irb
So, our function definitely takes a block, and executes it once for each element, and collects the return values of the block as values in the hash. The code that does it is like this:
class Array
def keys_to_hash
raise LocalJumpError,"Block not given!" unless block_given?
h={}
each\
{ |e|
h[e]=yield(e)
}
h
end
end
First we raise an exception if the method was called without a block. This line is not obligatory, as the exception would be raised anyway at the moment when we try to execute the block, so I raise it here mostly to show you how to check if a block is given.
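A quick way to convince yourself that the exception really is raised anyway. The method name call_twice is only illustrative.

```ruby
def call_twice
  yield(1)
  yield(2)
end

seen = []
call_twice { |n| seen << n }   # the block runs twice; seen is now [1, 2]

error = begin
  call_twice                   # no block given, so yield itself raises
  nil
rescue LocalJumpError => e
  e
end
```

Here error ends up holding the LocalJumpError that yield raised, even though call_twice never checks block_given?.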

Then we create an empty hash, and then for each element of
self
(works like
self.each
) we write an element to the hash, using the current element
e
as the key, and
yield(e)
as the value. As you must have guessed by now, the keyword
yield
is a call to the passed block.

Finally we return the created
h
as the function result. You can check that the function works as expected.

Just one more example:
class Array
def each_consequent(n)
for i in (0..length-n)
yield(*self[i,n])
end
self
end
end

[2,3,5,7,11,13,17,19,23].each_consequent(3)\
{ |a,b,c|
puts "#{a} #{b} #{c}"
}

# output:
2 3 5
3 5 7
5 7 11
7 11 13
11 13 17
13 17 19
17 19 23
I'll explain just the most suspicious part here: self[i,n] is an array (a subarray of self), and we call yield with * before the array to make the array splat into the three block arguments |a,b,c|. This splat operator is not always necessary, but it's nice to include it to make it clear that the array gets spread over the arguments.
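The sketch below shows when the * makes a difference, assuming Ruby 1.9 or later. The method names are illustrative only.

```ruby
def yield_array
  yield([1, 2, 3])     # one argument: the array itself
end

def yield_spread
  yield(*[1, 2, 3])    # three separate arguments
end

# With several block parameters the array is spread out either way.
three_a = yield_array  { |a, b, c| [a, b, c] }
three_s = yield_spread { |a, b, c| [a, b, c] }

# With a single block parameter the difference shows up.
one_a = yield_array  { |a| a }   # the whole array
one_s = yield_spread { |a| a }   # only the first element
```

So the * can be left out only as long as the block takes more than one parameter.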

If block is an object...
There are in general two ways of passing a block on to another function. Let's define two functions that behave exactly like each:
class Array

def my_each1
each{|a| yield(a)}
end

def my_each2(&b)
each(&b)
end

end
The first one makes a trivial block itself - the block is created just to call the original block coming to
my_each1
with the argument. The second one uses the
&
operator to make the block be assigned into the variable
b
. Inside
my_each2
the variable
b
is a
Proc
object. You could call it by hand inside the function, using
b.call(arg)
or for short
b[arg]
, but in our example it is instead passed to
each
, and the operator & turns it back into a block. Two other ways to do it (not very elegant, though): p=Proc::new{|a| yield(a)}; each(&p), or another ugly way: each{|a| b.call(a)}. I give these examples just to exercise your brain and help you understand!

If no block is passed to a function declared with a block parameter, like my_each2, the value of b is nil, so you can test b directly instead of calling block_given?.
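For instance, in this small sketch (describe is a hypothetical method name):

```ruby
def describe(&b)
  if b
    "got a #{b.class} taking #{b.arity} argument(s)"
  else
    "no block"
  end
end

describe                 #=> "no block"
describe { |x, y| x }    #=> "got a Proc taking 2 argument(s)"
```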

...then we can store it
Now another useful trick. If we can receive a block as an object, or wrap it into a new
Proc
, then it's an object, and can be stored in a variable. Look:
class K

def store_block(&b)
@b=b
end

def call_block(*args)
@b.call(*args)
end

end

k=K::new
k.store_block{|a,b| puts "#{a}::#{b}"} # no output to the console
k.call_block("Al2O3","Cr") # output: Al2O3::Cr
k.call_block("Hi","there") # output: Hi::there
So, we saved the passed block, and called it later. Note one very useful trick: if we receive the arguments as
*args
and pass them on as
*args
as well, then any set of arguments, no matter how many of them you pass to
call_block
, will get forwarded to the block call. (Of course now calling
k.call_block(1,2,3)
will print just
"1::2"
because our block takes two arguments, which means it ignores the third one; but the argument gets lost in the block, and not in
call_block
).

This block saving is not useless. You can for example call a method that saves a block, and executes it later as a callback to an event that happens inside the object. This is a very useful behaviour.
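A minimal sketch of such a callback. The class Countdown and its method on_zero are hypothetical names made up for this example.

```ruby
# Countdown and on_zero are hypothetical names used only for this sketch.
class Countdown

  def initialize(start)
    @left = start
  end

  # Store the block; it is called later, when the counter reaches zero.
  def on_zero(&callback)
    @callback = callback
    self
  end

  def tick
    return if @left.zero?
    @left -= 1
    @callback.call(self) if @left.zero? && @callback
  end

end

fired = []
c = Countdown.new(3).on_zero { |cd| fired << :done }
4.times { c.tick }   # the callback fires exactly once, on the third tick
```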

Passing more blocks
Unfortunately, Ruby doesn't support passing more than one block to a function. You can have only one parameter with &, and yield always calls that one block. But Ruby does allow passing multiple regular arguments, so what's the problem? Let's write a function that sort of takes two blocks, and calls one of them with the result returned by calling the other with the argument 5, or the other way round:
def random_caller(b1,b2)
raise ArgumentError,"Arguments must be Procs"\
unless b1.is_a? Proc and b2.is_a? Proc
if rand(2).zero?
b1.call(b2.call(5))
else
b2.call(b1.call(5))
end
end

q=lambda\
{
random_caller(lambda{|x| x+2},lambda{|x| x**2})
}
q[] #=> 27
q[] #=> 27
q[] #=> 49
q[] #=> 27
q[] #=> 27
First we check that what we got really are procs. Then we randomly call one of them with
5
and the other with the result of the first one, or the opposite, and return the result.

Now the call. The structure
lambda{|arg| exp}
is more or less the same as
Proc::new{|arg| exp}
and
proc{|arg| exp}
. So
random_caller(lambda{|x| x+2},lambda{|x| x**2})
is a call to our function, and we can expect the result of the call to be either
(5+2)**2
which is
49
, or
(5**2)+2
which is
27
.
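The "more or less" matters mainly for argument checking. A short sketch of the difference, assuming Ruby 1.9 or later (where proc behaves like Proc::new):

```ruby
pr = Proc.new { |a, b| [a, b] }
la = lambda   { |a, b| [a, b] }

pr.call(1, 2, 3)   #=> [1, 2]    extra arguments are dropped
pr.call(1)         #=> [1, nil]  missing arguments become nil
la.call(1, 2)      #=> [1, 2]

# A lambda insists on the exact number of arguments.
arity_error = begin
  la.call(1, 2, 3)
  nil
rescue ArgumentError => e
  e
end

pr.lambda?   #=> false
la.lambda?   #=> true
```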

Now we must call our function multiple times. We could do it like this:
5.times{random_caller(lambda{|x| x+2},lambda{|x| x**2})}
But, as a part of this tutorial, I made the call to the function into another proc, and stored it in
q
. As you see, you don't even have to pass a block to a function to store it somewhere. You can create a proc just like that, and store it in a local variable, and then call it using
q[]
or
q.call
.

Scope
The scope visible to a block is the scope in which it was declared. Interestingly, even after control has left the function and that scope can no longer be reached from outside, the scope lives on as long as a lambda declared there can still use it. This example illustrates the complicated words I just said:
def create_blocks
x=nil
getter=lambda{x}
setter=lambda{|v| x=v}
[setter,getter]
end

s,g=*create_blocks
s[6] # or s.call(6)
g[] #=> 6 (or g.call)
s[:R]
g[] #=> :R
The scope from inside
create_blocks
is not lost, even though the control left the method and will never return. The variable
x
is still accessible by the lambdas declared in the scope.
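Each call of the method creates a fresh scope, so lambdas from different calls do not share their variables. A small sketch with a hypothetical make_counter method:

```ruby
def make_counter
  count = 0
  increment = lambda { count += 1 }
  current   = lambda { count }
  [increment, current]
end

inc1, cur1 = make_counter
inc2, cur2 = make_counter

inc1.call
inc1.call
inc2.call

cur1.call   #=> 2
cur2.call   #=> 1, the second counter has its own count
```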

Other sources
Here are some links to learn more about gotchas in Ruby's blocks.
Ruby blocks gotchas
Proc vs lambda
Wikipedia - Closure (in many other languages the Ruby block thing is called a closure, or, more likely, closures are called blocks in Ruby)
Wikipedia - Smalltalk (these blocks are pretty modern and fresh programming things, aren't they? Well, they're not; have a look at Smalltalk (1980))