2008-09-16

StringBuffer

Today a potentially useful code sample:
class StringBuffer
. The class is a buffer into which you can write strings using various printing methods, and from which you can read all the data in order of arrival. It is thread-safe, so it can be useful for one-way communication between two threads (or, with two such objects, for two-way communication).

The class will use
StringIO
- a built-in object that has several printing methods (we will add two more) and which allows random access reading and writing. We will remember the reading and writing points in the buffer, and use them during read and write commands. The buffer will also be emptied once it gets too long, to prevent high (infinite) memory usage.
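
Before the full class, here is a minimal sketch of the idea of separate reading and writing points on a single StringIO. The variable name read_pos is only an illustration and does not appear in the class below.

```ruby
require 'stringio'

buff = StringIO.new
read_pos = 0 # hypothetical name for the saved read position

# Writing: always jump to the end first, then append.
buff.pos = buff.length
buff.print "abc"
buff.pos = buff.length
buff.print "def"

# Reading: jump back to where the last read stopped.
buff.pos = read_pos
first = buff.read(4)
read_pos = buff.pos

# A later write does not disturb the saved read position.
buff.pos = buff.length
buff.print "gh"
buff.pos = read_pos
rest = buff.read
```

Because every operation sets the position explicitly before touching the buffer, reads and writes can be interleaved in any order.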

Here's the code, with comments inside this time.
require 'pp' # allows method pp (pretty print); test in irb
require 'thread'

# Thread synchronizer. A thread calls +wait+ and is stopped
# until the +timeout+ passes or until the method +signal+
# of this object is called to release all waiting threads.
class Synchronizer

def initialize
@waiting=[]
@mutex=Mutex::new
end

attr_reader :mutex # in case somebody wants to use it

def wait(timeout=nil)
thr=Thread.current
begin
# be sure to add myself to the list of waiting threads
@mutex.synchronize{@waiting<<thr}
# sleep given time or forever (yes, it does it)
sleep(*[timeout].compact)
ensure
# be sure to remove myself
@mutex.synchronize{@waiting.delete(thr)}
end
end

def signal
# wake up all waiting threads
@mutex.synchronize{@waiting.each{|t| t.wakeup}}
end

# Check if the thread is currently waiting
def waiting_thread?(thr)
raise ArgumentError,"Argument must be Thread!"\
unless thr.is_a? Thread
@waiting.include?(thr)
end

end

# We add two methods to the +StringIO+.
class StringIO

# Inspect into the buffer (self).
def p(*args)
puts args.map{|a| a.inspect}
end

# Pretty print into buffer (self).
def pp(*args)
args.each{|a| PP.pp(a,self)}
nil
end

end

class StringBuffer

# List of writing methods.
WRITE_METHODS=[:write,:<<,:print,:puts,:putc,:printf,:p,:pp]

# List of reading methods.
READ_METHODS=[:read,:gets,:getc,:readchar,:readline]

# If this much data is in the buffer, empty it.
TRUNC_LENGTH=1000

# dynamically define all writing methods
WRITE_METHODS.each\
{ |wm|
class_eval(
<<-METHOD
def #{wm}(*args)
# safely (mutex)
@mutex.synchronize\
{
# move the +StringIO+ pointer to the end
@buff.pos=@buff.length
# call the same method on the internal buffer
ret=@buff.#{wm}(*args)
# signal the synchronizer in case
# some thread was waiting for data
@synchronizer.signal unless empty?
ret
}
end
METHOD
)
}

# dynamically define read methods
READ_METHODS.each\
{ |rm|
class_eval(
<<-METHOD
def #{rm}(*args)
@mutex.synchronize\
{
# move pointer to the saved position of last read
@buff.pos=@r
# perform the read
ret=@buff.#{rm}(*args)
# save the new pointer
@r=@buff.pos
# call +trunc+ if there is at least +TRUNC_LENGTH+
# bytes of unnecessary data
trunc if @r>=TRUNC_LENGTH
ret
}
end
METHOD
)
}

def initialize
@buff=StringIO::new
# read position
@r=0
@mutex=Mutex::new
# synchronizer for threads waiting for data
@synchronizer=Synchronizer::new
end

attr_reader :synchronizer

def length
@buff.length-@r
end

def eof?
length==0
end

alias empty? eof?

def wait_for_data(timeout=nil)
@synchronizer.wait(timeout) if empty?
self unless empty?
end

private

def trunc
@buff.string=@buff.string[@r..-1]
@r=0
end

end
A small test
irb(main):002:0> s=StringBuffer::new
=> #<StringBuffer:0x2bf4f44 @mutex=#<Mutex:0x2bf4ef4>, @r=0,
# @buff=#<StringIO:0x2bf4f1c>,
# @synchronizer=#<Synchronizer:0x2bf4ee0 @mutex=#<Mutex:0x2bf4e54>,
# @waiting=[]>>
irb(main):003:0> Thread::new{loop{sleep 1;s.print "X"}}
=> #<Thread:0x2bf0b60 sleep>
irb(main):004:0> loop{s.wait_for_data;puts s.read}
XXXXXXXXXX
X
X
X
X
X
X
X
The first line has lots of
X
's because that many of them had accumulated in the buffer before I ran the command in line
004
.

2008-09-06

class Array

A handful of methods that could be added to the class
Array
:
class Array

def sum
s=0
each{|e| s+=e}
s
end

def mul
m=1
each{|e| m*=e}
m
end

def mean
sum.to_f/length
end

def map_with_index
i=-1
map{|e| yield(e,i+=1)}
end

def map_with_index!
i=-1
map!{|e| yield(e,i+=1)}
end

def any_with_index?
each_with_index{|e,i| return true if yield(e,i)}
false
end

def all_with_index?
each_with_index{|e,i| return false unless yield(e,i)}
true
end

def find_index
each_with_index{|v,i| return i if yield(v)}
nil # nothing found (each_with_index alone would return the whole array)
end

def find_indices
ret=[]
each_with_index{|v,i| ret<<i if yield(v)}
ret
end

def select_by_index(*indices)
ret=[]
indices.each{|ind| ret<<self[ind]}
ret
end

alias find_indexes find_indices

def to_hash
raise "Cannot convert to Hash!" unless all?\
{ |e|
e.respond_to? :length and e.length==2 and e.respond_to? :[]
}
h={}
each{|e| h[e[0]]=e[1]}
h
end

def keys_to_hash
h={}
each{|e| h[e]=yield(e)}
h
end

def keys_with_index_to_hash
h={}
each_with_index{|e,i| h[e]=yield(e,i)}
h
end

def with(a2)
ensure_same_length(a2)
map_with_index{|e,i| [e,a2[i]]}
end

def with_to_hash(a2)
ensure_same_length(a2)
h={}
each_with_index{|e,i| h[e]=a2[i]}
h
end

def count_all
h={}
each\
{ |e|
h[e]||=0
h[e]+=1
}
h
end

def group_by
h={}
each\
{ |e|
g=yield(e)
h[g]||=[]
h[g]<< e
}
h.map{|g,ee| ee}
end

alias contain? include?
alias has? include?

def rand
a=to_a
a[Kernel.rand(a.length)] unless a.empty?
end

private

def ensure_same_length(arg)
raise ArgumentError,"Argument must be of the same length!"\
unless arg.respond_to? :length and length==arg.length\
and arg.respond_to? :[]
end

end
They are not perfect, but I use them quite a lot.

Some of them (or similar methods) will be present in Ruby 1.9. For example there will be a method
inject
(or
reduce
) working like this:
[1,4,5].reduce(:*) #=> 20 # 1*4*5
["a","b","dd"].reduce(:+) #=> "abdd"
As you see, they are better than my
sum
, because they work for any type for which the operation is defined. This reduce is not hard to implement, too, but it will probably work a bit faster when included in Ruby core.
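
To back up the claim that this is not hard to implement, here is a rough sketch. The name my_reduce is made up so that it does not clash with the built-in method, and it handles only the symbol form shown above.

```ruby
module Enumerable
  # Hypothetical name; combines all elements with the operator named by +op+.
  def my_reduce(op)
    acc = nil
    first = true
    each do |e|
      if first
        # the first element becomes the starting value
        acc = e
        first = false
      else
        acc = acc.send(op, e)
      end
    end
    acc
  end
end

product = [1, 4, 5].my_reduce(:*)
joined = ["a", "b", "dd"].my_reduce(:+)
```

An empty collection gives nil here, because there is no element to start from.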

Ruby 1.9 is also going to have
group_by
, which works like mine except that it returns a hash mapping each block result to its group, rather than a plain array of groups.
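
A small sketch of the built-in group_by on a current Ruby, with made-up sample data. The result is a hash keyed by the block's return value, and calling values on it gives the plain list of groups.

```ruby
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# group the words by their first letter
by_letter = words.group_by { |w| w[0, 1] }

# only the groups, without the keys
groups = by_letter.values
```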

Move to Enumerable
One more enhancement that can be done in the above code is to move all the methods in the module
Enumerable
(just write
module Enumerable
instead of
class Array
at the top). It allows you to use these methods also with other enumerable types, like
Hash
. You'll have to test the methods, though, as not all of them make sense when used with structures where the elements are not ordered.
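
As a sketch of what the move looks like, here is count_all from above defined in Enumerable instead of Array, tried on an enumerator and on a hash. The sample data is made up.

```ruby
module Enumerable
  # Same idea as Array#count_all above, now available to every Enumerable.
  def count_all
    h = {}
    each { |e| h[e] = (h[e] || 0) + 1 }
    h
  end
end

letters = "banana".each_char.count_all
numbers = (1..3).count_all

# With a Hash, each element is a [key, value] pair, which may not be what you wanted.
pairs = { :x => 1, :y => 1 }.count_all
```

The last call shows why testing is needed: the method works, but it counts pairs rather than keys or values.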

Add to load path
If you create some files that you'd like to be easily accessible in your Ruby programs, you can add the path to your files to Ruby load path, so that you will be able to
require
your files without giving the full path. Under Windows, just go to environment variables, and add
RUBYLIB = P:/ath/To/Your/Dir
The path will be automatically added to Ruby load path each time Ruby starts, which can be verified by typing
$:
(or
$LOAD_PATH
) in irb and looking for your path.
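
If you cannot change the environment variables, the load path can also be extended from inside a program. This is only a sketch; the directory name my_ruby_lib is an assumption made for the example.

```ruby
# hypothetical directory holding your own library files
my_lib_dir = File.expand_path("my_ruby_lib", Dir.pwd)

# put it at the front of the load path, but only once
$LOAD_PATH.unshift(my_lib_dir) unless $LOAD_PATH.include?(my_lib_dir)
```

Unlike RUBYLIB, this affects only the running program.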

If you want some of your files to be loaded even without the need to
require
them, then you can add them to the environment variable RUBYOPT. This variable can already contain -rubygems. If you want the file P:/ath/To/Your/Dir/start.rb to be loaded at startup, change the variable to
-rubygems -rstart
Each word starting with -r makes Ruby load a file named by the rest of the word. Ruby will find your file because you have already added its directory to the Ruby load path. If you want to load more files at startup, it is best to
require
them from within your first file.

As you might have guessed, there is a file named ubygems.rb, which the original content of the variable caused to be loaded. The strange name was chosen only to make the whole option read sensibly. All it does is load rubygems.rb, which initialises the Gems engine, enabling programs to use additional libraries.

2008-09-01

{block}

Blocks: the most powerful of the basic features of Ruby.

A block is a way to pass a bit of code into a function, to let the function execute it if it wants to, and as many times as it wants to. You already know some examples, like
[1,2,5].each{|e| puts e}
, where the function
each
calls the block three times - once for each element of
self
.

Let's learn how to write a function that takes a block. I'd like to have a method of the class
Array
that converts the array to
Hash
, where the original array elements become keys, and the values are computed inside the block. Example of how it is supposed to work:
[1,5,3].keys_to_hash{|k| k**2}
#=> {1=>1,5=>25,3=>9}

["Ruby","Al2","O3","Cr"].keys_to_hash{|k| k.length}
#=> {"Al2"=>3,"O3"=>2,"Ruby"=>4,"Cr"=>2}
# remember that in Ruby 1.8 a Hash does not maintain the order of elements,
# so they might get reordered when printed in irb
So, our function definitely takes a block, and executes it once for each element, and collects the return values of the block as values in the hash. The code that does it is like this:
class Array
def keys_to_hash
raise LocalJumpError,"Block not given!" unless block_given?
h={}
each\
{ |e|
h[e]=yield(e)
}
h
end
end
First we raise an exception if the method was called without a block. This line is not obligatory, as the exception would be raised anyway at the moment when we try to execute the block, so I raise it here mostly to show you how to check if a block is given.

Then we create an empty hash, and then for each element of
self
(works like
self.each
) we write an element to the hash, using the current element
e
as the key, and
yield(e)
as the value. As you must have guessed by now, the keyword
yield
is a call to the passed block.

Finally we return the created
h
as the function result. You can check that the function works as expected.
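
One more minimal sketch of the same two ingredients, yield and block_given?, outside of any class. The method name repeat and the :no_block return value are made up for this illustration.

```ruby
# Calls the block n times and collects what it returns.
def repeat(n)
  # without a block there is nothing to call
  return :no_block unless block_given?
  results = []
  n.times { |i| results << yield(i) }
  results
end

with_block = repeat(3) { |i| i * 10 }
without_block = repeat(3)
```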

Just one more example:
class Array
def each_consequent(n)
for i in (0..length-n)
yield(*self[i,n])
end
self
end
end

[2,3,5,7,11,13,17,19,23].each_consequent(3)\
{ |a,b,c|
puts "#{a} #{b} #{c}"
}

# output:
2 3 5
3 5 7
5 7 11
7 11 13
11 13 17
13 17 19
17 19 23
I'll explain just the most suspicious part here:
self[i,n]
is an array (subarray of
self
) and we call
yield
with
*
before the array to make the array splat into the three block arguments
|a,b,c|
. This splat operator is not always necessary, but it's nice to include it to make it clear that the arguments get splatted.
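
To see when the operator matters, here is a small sketch on a current Ruby, with two made-up helper methods. A block with several parameters unpacks a single array argument by itself, while a block with one parameter does not.

```ruby
def yield_splatted
  yield(*[1, 2, 3]) # passes three separate arguments
end

def yield_array
  yield([1, 2, 3]) # passes one argument, an array
end

# Three parameters: both calls end up with a=1, b=2, c=3.
three_from_splat = yield_splatted { |a, b, c| [a, b, c] }
three_from_array = yield_array { |a, b, c| [a, b, c] }

# One parameter: now the difference shows.
one_from_splat = yield_splatted { |a| a }
one_from_array = yield_array { |a| a }
```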

If block is an object...
There are in general two ways of passing a block to a next function. Let's define two functions that behave exactly like
each
:
class Array

def my_each1
each{|a| yield(a)}
end

def my_each2(&b)
each(&b)
end

end
The first one makes a trivial block itself - the block is created just to call the original block coming to
my_each1
with the argument. The second one uses the
&
operator to make the block be assigned into the variable
b
. Inside
my_each2
the variable
b
is a
Proc
object. You could call it by hand inside the function, using
b.call(arg)
or for short
b[arg]
, but in our example it is instead passed to
each
, and the operator
&
turns it back into a block (a sort of reverse splat). Two other ways to do it (not very elegant, though):
p=Proc::new{|a| yield(a)}; each(&p)
, or another ugly way:
each{|a| b.call(a)}
. I give these examples just to stretch your brain and help you understand!

If the block is not passed into a function declared with a block parameter, like
my_each2
, the value of
b
is
nil
, and you don't have to call
block_given?
to check it.
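
A tiny sketch of both cases, using a made-up method capture that just returns whatever arrived in its block parameter:

```ruby
def capture(&b)
  b # a Proc if a block was given, nil otherwise
end

nothing = capture
add_one = capture { |x| x + 1 }

by_call = add_one.call(2)
by_brackets = add_one[2]
```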

...then we can store it
Now another useful trick. If we can receive a block as an object, or wrap it into a new
Proc
, then it's an object, and can be stored in a variable. Look:
class K

def store_block(&b)
@b=b
end

def call_block(*args)
@b.call(*args)
end

end

k=K::new
k.store_block{|a,b| puts "#{a}::#{b}"} # no output to the console
k.call_block("Al2O3","Cr") # output: Al2O3::Cr
k.call_block("Hi","there") # output: Hi::there
So, we saved the passed block, and called it later. Note one very useful trick: if we receive the arguments as
*args
and pass them on as
*args
as well, then any set of arguments, no matter how many of them you pass to
call_block
, will get forwarded to the block call. (Of course now calling
k.call_block(1,2,3)
will print just
"1::2"
because our block takes two arguments, which means it ignores the third one; but the argument gets lost in the block, and not in
call_block
).

This block saving is not useless. You can for example call a method that saves a block, and executes it later as a callback to an event that happens inside the object. This is a very useful behaviour.
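
A sketch of such a callback, under made-up names (Counter, on_change): the object keeps the block and calls it every time its value changes.

```ruby
class Counter
  def initialize
    @count = 0
    @on_change = nil
  end

  # store the block for later
  def on_change(&b)
    @on_change = b
    self
  end

  def increment
    @count += 1
    # call the stored block, if there is one
    @on_change.call(@count) if @on_change
    @count
  end
end

log = []
c = Counter.new
c.on_change { |n| log << "count is now #{n}" }
c.increment
c.increment
```

The block also keeps access to the local variable log, which is why it can collect the messages.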

Passing more blocks
Unfortunately, Ruby doesn't support passing more than one block to a function. You can have only one parameter with
&
, and there is only one
yield
too. But Ruby does allow passing multiple regular arguments, so what's the problem? Let's write a function that sort of takes two blocks: it calls one of them with the argument 5 and passes the result to the other, choosing at random which one goes first:
def random_caller(b1,b2)
raise ArgumentError,"Arguments must be Procs"\
unless b1.is_a? Proc and b2.is_a? Proc
if rand(2).zero?
b1.call(b2.call(5))
else
b2.call(b1.call(5))
end
end

q=lambda\
{
random_caller(lambda{|x| x+2},lambda{|x| x**2})
}
q[] #=> 27
q[] #=> 27
q[] #=> 49
q[] #=> 27
q[] #=> 27
First we check if what we really got are procs. Then we randomly call one of them with
5
and the other with the result of the first one, or the opposite, and return the result.

Now the call. The structure
lambda{|arg| exp}
is more or less the same as
Proc::new{|arg| exp}
and
proc{|arg| exp}
. So
random_caller(lambda{|x| x+2},lambda{|x| x**2})
is a call to our function, and we can expect the result of the call to be either
(5+2)**2
which is
49
, or
(5**2)+2
which is
27
.
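
The "more or less" deserves one illustration. On a current Ruby, a proc made with Proc::new is relaxed about the number of arguments, while a lambda checks it. A minimal sketch:

```ruby
strict = lambda { |a, b| [a, b] }
loose = Proc.new { |a, b| [a, b] }

# A plain proc fills missing arguments with nil and drops extra ones.
loose_short = loose.call(1)
loose_long = loose.call(1, 2, 3)

# A lambda insists on exactly two arguments.
strict_error = begin
  strict.call(1)
rescue ArgumentError => e
  e.class
end
```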

Now we must call our function multiple times. We could do it like this:
5.times{random_caller(lambda{|x| x+2},lambda{|x| x**2})}
But, as a part of this tutorial, I made the call to the function into another proc, and stored it in
q
. As you see, you don't even have to pass a block to a function to store it somewhere. You can create a proc just like that, and store it in a local variable, and then call it using
q[]
or
q.call
.

Scope
The scope visible to a block is the scope in which it was declared. Interestingly, even when that scope is no longer reachable because control has left the function, it still exists as long as a lambda declared there can use it. This example illustrates the complicated words I just said:
def create_blocks
x=nil
getter=lambda{x}
setter=lambda{|v| x=v}
[setter,getter]
end

s,g=*create_blocks
s[6] # or s.call(6)
g[] #=> 6 # or g.call
s[:R]
g[] #=> :R
The scope from inside
create_blocks
is not lost, even though the control left the method and will never return. The variable
x
is still accessible by the lambdas declared in the scope.
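
The same mechanism gives the classic counter. In this sketch (make_counter is a made-up name) every call to the method creates a fresh scope, so two counters do not share their variable.

```ruby
def make_counter
  count = 0
  increment = lambda { count += 1 }
  current = lambda { count }
  [increment, current]
end

inc, cur = make_counter
inc.call
inc.call
first_value = cur.call

# a second call creates a separate, independent scope
_inc2, cur2 = make_counter
second_value = cur2.call
```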

Other sources
Here are some links to learn more about gotchas in Ruby's blocks.
Ruby blocks gotchas
Proc vs lambda
Wikipedia - Closure (in many other languages the Ruby block thing is called a closure, or probably more like the closures are called blocks in Ruby)
Wikipedia - Smalltalk (these blocks are pretty modern and fresh programming things, aren't they? well, they're not; have a look at Smalltalk (1980))

2008-08-28

include Module

Hello. Today about including a module, and about modules in general.

One potential use of a module, as a namespace for a set of functions, you have seen here: Fibonacci numbers - lazy evaluation. Now it's time to show the main use of modules, the one for which they are introduced in Ruby: mixins.

Mixins
Or, more descriptive: (mix-in)s, things that you mix-in. Let's skip the theory for now and go to an example.

Let's say we have a class whose objects we want to make comparable. Let it be
class Person
, and let
p1<p2
if person
p1
is younger than person
p2
. The traditional approach here is like this:
class Person

def initialize(name,age)
# skip validation for simplicity
@name=name
@age=age
end

attr_reader :name,:age

def <(p2)
@age<p2.age
end

end
OK, now we can compare two people like:
t=Person::new("Tom",23.3)
z=Person::new("Zuz",23.2)
t<z #=> false # as expected
But if we try
t>z
or
t==z
or
t>=z
..., it will be a
NoMethodError
, because only the method
<
has been defined. Of course we can define all the 6 comparison methods, but that wouldn't make today's post, would it? Let's do it using a module
Comparable
, already existing in Ruby, and the operator
<=>
.

<=>
The method
<=>
works just like
compareTo
in Java - it takes an argument and compares
self
with the argument, returning
-1
,
0
or
1
if
self
is less than, equal, or greater than the argument, respectively. This strange-looking operator is defined for all built-in comparable types in Ruby, try
5<=>7
for instance. Let's define it, bearing in mind that it is already defined for standard types!
class Person
def <=>(p2)
@age<=>p2.age
end
end
That was trivial. Now we could define all the operators like this:
class Person
def <(p2);(self<=>p2)<0;end
def >(p2);(self<=>p2)>0;end
def ==(p2);(self<=>p2)==0;end
def >=(p2);(self<=>p2)>=0;end
def <=(p2);(self<=>p2)<=0;end
end
Remember one thing: in Ruby, if something looks inefficient, it probably is. And the above code looks inefficient: first, because it has very low entropy, meaning it repeats the same thing over and over again; and second, because if we define another class that we also want to be comparable, we'll have to copy the five lines unchanged, duplicating the code and lowering the entropy even more.

One more word - we don't define
!=
because it is automagically defined as the opposite of
==
and, in Ruby 1.8, cannot even be redefined (Ruby 1.9 makes it an ordinary method).

module
Let's do it like it should be done! Let's define the five comparison methods in a module, and let's mix the module into our class like this:
module Comparable
def <(p2);(self<=>p2)<0;end
def >(p2);(self<=>p2)>0;end
def ==(p2);(self<=>p2)==0;end
def >=(p2);(self<=>p2)>=0;end
def <=(p2);(self<=>p2)<=0;end
end

class Person
include Comparable
end

class OtherComparableClass
include Comparable
end
Isn't that better? Now it's going to turn out even better when I tell you the module
Comparable
is already defined in Ruby, with the five functions just like we defined them here, so when you want to make a class comparable, you just
include Comparable
and
def <=>(other)
, and all works! The module has also a bonus:
between?(min,max)
, working like expected (both ends inclusive).
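
Putting it together, a complete sketch of the Person class relying on the built-in module. The third person, Ann, is made up for the between? check.

```ruby
class Person
  include Comparable

  attr_reader :name, :age

  def initialize(name, age)
    @name = name
    @age = age
  end

  # the only comparison method we write ourselves
  def <=>(other)
    age <=> other.age
  end
end

tom = Person.new("Tom", 23.3)
zuz = Person.new("Zuz", 23.2)
ann = Person.new("Ann", 40)

tom_older = tom > zuz
tom_in_range = tom.between?(zuz, ann)
youngest = [tom, ann, zuz].min.name
```

The last line works because min, max and sort also rely on the <=> method.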

Now note one thing: the module uses the method
<=>
even though it is not defined in it, nor in its ancestors, nor anywhere else. But a module is a trusting animal: it trusts you not to include it unless you define all the missing methods it uses!

The biggest thing in the world
Let's play for a moment with
Comparable
. Let's define an object that claims to be the biggest object in the world.
biggest=Object::new

class << biggest
include Comparable
def <=>(other)
other.equal?(self) ? 0 : 1
end
end

biggest>5 #=> true
biggest<=1000000 #=> false
biggest>["X"] #=> true
biggest>biggest #=> false
What we did: we defined the object, then we sort of declared a sort of class of which our
biggest
is sort of an instance (in fact it's the object's eigenclass, but let's leave that for another post). Just understand that declaring the methods the way we do here is exactly like declaring them in the object's regular class, except that they are accessible only to our
biggest
, and not to all the Object instances. It's defining methods for just one object (as there is only one biggest object, of course!).
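
A smaller sketch of the same per-object definition, with two made-up objects. It also shows the alternative def object.method form, which has the same effect for a single method.

```ruby
special = Object.new
ordinary = Object.new

class << special
  def greet
    "hi from the one and only"
  end
end

# the same kind of definition, written for one method at a time
def special.shout
  greet.upcase
end

greeting = special.greet
loud = special.shout
others_can_greet = ordinary.respond_to?(:greet)
```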

The
other.equal?(self) ? 0 : 1
part might need an explanation. If we just returned
1
, then the object would be greater than all objects including itself, so
biggest>biggest
would yield
true
, and
biggest<biggest
would return
false
. We want the object to know that it is as big as it is, so when the object is compared with itself, we want it to know they are equal. That's why we check whether the argument is the object itself. Now, why do we use
equal?
and not
==
? Well, the method
==
defined inside
Comparable
calls
<=>
which in turn would call
==
and so on until
SystemStackError
. But the function
equal?
works in another way: it checks if the two objects are the same
instance
:
a="abc"
a=="abc" #=> true
a.equal?("abc") #=> false # other instance of String
a.equal?(a) #=> true
a.equal?(a.dup) #=> false
So that's the method we're looking for - it will return
true
only if we compare
biggest
with
biggest
itself.

One word of a summary: if you define a method within a class, you have to instantiate the class to make the method really accessible to the world. But if you define a method within a module, you first have to include the module in a class, and then to instantiate the class, to be able to call the method.

Kernel and puts
What's this
Kernel
? It's a module that is included in the class
Object
, and thus in all the classes you ever define, as they all are descendants of the class
Object
. Now there comes the explanation how come you can write
puts
and it works.

If you open irb, or take an empty .rb file, you are inside an object. Check it:
irb(main):099:0> self
=> main
irb(main):100:0> self.class
=> Object
So, we're inside some
Object
instance. So what happens if we write
puts
? We call our object's private method
puts
. We can call it even though it is private because we are inside the object (but if you try
self.puts
you'll see it is private; by the way, it is not like in Java here -
self
is just as any other object so you cannot call private methods on
self
, you can only do it without prefixing them with
self
). But the truth is that the method is not defined in the object - it is defined inside the
Kernel
module (as a private function), and in this way it got to our
main
object. And to any other object, too:
class K
def kk
puts "in K"
end
end
Now, the call to
puts
that you do when writing
k=K::new; k.kk
is a call to
k
's private method
puts
, which is there also because of mixing in
Kernel
into the class
K
(into the
K
's ancestor:
Object
, precisely speaking). So, if we had a class
StupidClass
, and we were tired of all the noise this class's instances make by
puts
ing stupid things, we could mute all the instances of the class:
class StupidClass
def puts(*args)
# ignore
end
end
Other classes, including our
main
, will still be able to
puts
messages.

But if we don't want to mute the messages completely, but just to make them less noisy, we can do this:
class StupidClass
def puts(*args)
Kernel.puts(*args.map{|a| a.to_s.downcase})
end
end

stupid_object=StupidClass::new
stupid_object.puts "AbC",5,:XX
# Output:
abc
5
xx
Of course we have to call the
Kernel
's static method
puts
and not just write
puts
, because that would call our own method again and recurse forever. Also note that the static method
Kernel.puts
is not the one that is included in the class
Object
.
Kernel
has two
puts
methods:
irb(main):009:0> Kernel.private_instance_methods.grep /puts/
=> ["puts"] # instance method, called by Object::new.puts
irb(main):010:0> Kernel.singleton_methods.grep /puts/
=> ["puts"] # static method, called by Kernel.puts
The two methods are independent - neither of them calls the other.
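
The same pair of methods can be produced in plain Ruby with module_function. This is a sketch with a made-up module Greeter, not a description of how Kernel itself is written.

```ruby
module Greeter
  module_function

  # becomes both Greeter.greet and a private instance method
  def greet(name)
    "Hello, #{name}!"
  end
end

class Host
  include Greeter

  def welcome
    greet("Zuz") # the private, mixed-in copy
  end
end

direct = Greeter.greet("Tom") # the "static" copy
mixed_in = Host.new.welcome
```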

2008-08-24

Object **arr;

What? What do these asterisks do? We're having Ruby here, and the whole old pointers things from C++ are gone, aren't they?

Well, they are not, sorry. Remember one thing, please. This pointers thing is not a part of C++ or any other particular language. Pointers are how computers work. Each (reasonable) programming language either has pointers, or is inefficient. Ruby has them too. The difference is that in Ruby we don't use asterisks to denote we're using them.

What's "wrong"

Have a look at the example that proves we have pointers in Ruby:
a=[2,3,5,7]   #=> [2, 3, 5, 7]
b=a #=> [2, 3, 5, 7]
a<<11 #=> [2, 3, 5, 7, 11]
a #=> [2, 3, 5, 7, 11]
b #=> [2, 3, 5, 7, 11]
So, as you see,
a
and
b
are just pointers. And when we make the assignment in the second line, we just make them point to the same object in memory (an array, in our example), so when we modify the object using an in-place method (like
reverse!
,
clear
and others), the change will be also visible through the other variable.

We observe exactly the same behaviour when we use a string instead of an array:
a="abc"       #=> "abc"
b=a #=> "abc"
a<<"x" #=> "abcx"
a #=> "abcx"
b #=> "abcx"
Note that if you use
+=
instead of
<<
, a new instance of the string with the
"x"
appended is created and assigned to
a
, so if you want to make a string buffer and append to it some lines, it's probably better to use
<<
, because it does not create another object.
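
The difference can be checked with equal?, which tells whether two variables point to the same object. A minimal sketch:

```ruby
a = String.new("abc")
b = a

a << "x" # modifies the one shared object
same_after_append = a.equal?(b)

a += "y" # creates a new string and points a at it
same_after_plus = a.equal?(b)
```

After the last line, b still points to the old string, which was not changed by the +=.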

So, what if we would like to have an independent copy of a given array or a string? Ruby comes with a function
dup
that makes a copy of an object. (In fact there's also a function
clone
that behaves similarly; for today let's assume the two functions do exactly the same thing (they don't), and let's use
dup
.) Change the line
b=a
to
b=a.dup
in both examples above and you will see it works as expected - modifying
a
does not modify
b
and vice versa.

Happy? So, is that all for today? Not quite. Have a look:
a=["a","b"]
b=a.dup
a<<"c"
b #=> ["a", "b"] # as expected
a[0]<<"x"
a             #=> ["ax", "b", "c"]
b #=> ["ax", "b"] # oops!
What happened? Well, now
b
is a copy of
a
, which means they are two separate arrays, so when we add
"c"
to one of them, the other does not get modified, we already know that. But in Ruby everything is an
Object
, so the elements of the arrays are objects too and, what's worse, they are the same objects. When we did
b=a.dup
, we created a separate array, but the elements of the array are pointers to the same strings as the elements of the original array. Our copy is not deep, we separated the top-level objects but not the elements of the array. So when we modified in place one of the elements, it got modified also in the other array, even though the arrays are separate objects.

Another way to check what happens:
x=Object::new       #=> #<Object:0x2d8ce10>
x.dup #=> #<Object:0x2d91dfc>
[x] #=> [#<Object:0x2d8ce10>]
[x].dup #=> [#<Object:0x2d8ce10>]
As you see,
dup
on an object makes a new object (the addresses differ), but
dup
on an array makes a new array but does not make a copy of the elements - it just makes a new array with its elements pointing to the original objects.

How to "fix" it
OK, so let's change the behaviour of
dup
for
Array
so that it makes a deep copy calling
dup
recursively on all its elements. Good idea? Yes, but NO NO NO!

Never do such a thing as changing the behaviour of a standard function! Someone can depend on how it works now, so you cannot change it! Remember well this lesson! Even if you think it's broken, don't fix it in this way!

Sorry for shouting, but it is important. Let's define a new function with the functionality we've just described. Here's how we do it:
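class Object
def deep_dup
dup
rescue TypeError
# numbers, symbols, nil and the like cannot be duplicated in Ruby 1.8,
# and they cannot be modified in place anyway, so we return them as they are
self
end
end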

class Array
def deep_dup
map{|e| e.deep_dup}
end
end

class Hash
def deep_dup
h={}
each{|k,v| h[k.deep_dup]=v.deep_dup}
h
end
end
First we define
deep_dup
for "normal" objects as a simple copy, as they do not need any special treatment. Then we redefine (override) the function for
Array
and also for
Hash
, as there's exactly the same case with hashes as with arrays. You can check that after changing
dup
to
deep_dup
in the examples above, everything will work as expected.

Of course there are also other classes like
Set
(available after
require 'set'
) that might need overriding
deep_dup
for them to work.
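
A sketch of what such an override could look like for Set. The fallback Object#deep_dup is repeated here only to keep the example self-contained, and the sample data is made up.

```ruby
require 'set'

class Object
  # same fallback as above: a plain shallow copy
  def deep_dup
    dup
  end
end

class Set
  def deep_dup
    copy = Set.new
    each { |e| copy << e.deep_dup }
    copy
  end
end

original = Set.new([["a", "b"], ["c"]])
shallow = original.dup
deep = original.deep_dup

shallow_shares = shallow.first.equal?(original.first)
deep_shares = deep.first.equal?(original.first)
```

The elements are arrays on purpose: a set freezes its own copy of any string added to it, so strings would not show the sharing problem.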

So why use dup?
Now another lesson: why at all use this strange-working
dup
if we have such a nice
deep_dup
? Well, the answer is simple: as
deep_dup
copies everything, it might use much more memory (and time) than
dup
. That's why I said that if a language doesn't have pointers (that is, if a normal assignment works just like our
deep_dup
), it is inefficient. Because the solution is not to stop using
dup
. The solution is to use it carefully and wisely, and to remember that there are pointers under the nice skin of Ruby.

Other methods
If we need a deep copy, the approach described above is probably the best one, but not the only one. One of the most secure methods to create a complete deep copy of an object is to serialise it to a "soul-less" string and deserialise it back. Then we can be absolutely sure that no part of the new object will be a part of the original one, as deserialisation for sure creates the object from scratch.

Ruby provides two easy serialisation methods:
YAML
and
Marshal
.
require 'yaml'

class Object

def m_dup
Marshal.load(Marshal.dump(self))
end

def y_dup
YAML.load(YAML.dump(self))
end

end
Both
YAML
and
Marshal
have a method
dump
that returns a string representation of the object passed (you can see what the strings look like by calling
dump
on various objects in irb), and the method
load
that does the reverse.

Differences:
- as you can see,
Marshal
's string is a compact binary format and is usually shorter than the YAML text, so it is probably more efficient.
-
YAML
does not always work. I don't know why this is happening, but some complicated structures with sets, arrays, strings and hashes fail to load from the dumped string.

Both methods are most probably worse (slower) than our
deep_dup
, because they need to parse the string.
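
A quick sketch of the Marshal round trip at work, with made-up sample data. It also shows one limitation worth knowing: some objects, such as procs, cannot be dumped at all.

```ruby
original = [String.new("a"), { :k => [1, 2] }]
copy = Marshal.load(Marshal.dump(original))

# modify the original in place, at two different depths
original[0] << "x"
original[1][:k] << 3

# a proc has no serialised form, so dumping it fails
proc_error = begin
  Marshal.dump(lambda { 1 })
rescue TypeError => e
  e.class
end
```

The copy is untouched by both modifications.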

Last problem - self-references
There's one point, however, where the serialisation method works, and our
deep_dup
fails. It is when an array is an element of itself:
a=[]
a<<a
a #=> [[...]] # the three dots denote a self-reference
a.deep_dup
SystemStackError: stack level too deep
from (irb):87:in `deep_dup'
from (irb):87:in `deep_dup'
from (irb):87:in `map'
from (irb):87:in `deep_dup'
from (irb):87:in `deep_dup'
from (irb):87:in `map'
from (irb):87:in `deep_dup'
from (irb):87:in `deep_dup'
from (irb):87:in `map'
from (irb):87:in `deep_dup'
from (irb):87:in `deep_dup'
from (irb):87:in `map'
from (irb):87:in `deep_dup'
from (irb):87:in `deep_dup'
from (irb):87:in `map'
from (irb):87:in `deep_dup'
... 15577 levels...
from (irb):87:in `map'
from (irb):87:in `deep_dup'
from (irb):87:in `deep_dup'
from (irb):87:in `map'
from (irb):87:in `deep_dup'
from (irb):87:in `deep_dup'
from (irb):87:in `map'
from (irb):87:in `deep_dup'
from (irb):87:in `deep_dup'
from (irb):87:in `map'
from (irb):87:in `deep_dup'
from (irb):87:in `deep_dup'
from (irb):87:in `map'
from (irb):87:in `deep_dup'
from (irb):115
from (null):0

Aww, a failure, because duplicating
a
needs duplicating
a
first, and so on. That's why our method is not so perfect, and will fail also on examples like this:
a=[1,2,{:a=>4}]
a[2][:b]=a
a #=> [1, 2, {:a=>4, :b=>[...]}]
It can be fixed and handled just as it is handled in serialisation (it works without problems with such objects), and also in
inspect
(if it wasn't, you'd have an infinite output after creating such an object in irb), but I'll leave this as an exercise for the reader.
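
For readers who want to compare notes after trying the exercise, here is one possible sketch. It is not the only way to do it, and the name cycle_safe_dup is made up. The idea is to remember every object already copied, so that a repeated reference gets the existing copy instead of a new recursion.

```ruby
# Hypothetical helper: deep copy that survives self-references.
# +seen+ maps each original object (compared by identity) to its copy.
def cycle_safe_dup(obj, seen = {}.compare_by_identity)
  return seen[obj] if seen.key?(obj)

  case obj
  when Array
    # register the copy before descending, so inner references find it
    copy = seen[obj] = []
    obj.each { |e| copy << cycle_safe_dup(e, seen) }
    copy
  when Hash
    copy = seen[obj] = {}
    obj.each { |k, v| copy[cycle_safe_dup(k, seen)] = cycle_safe_dup(v, seen) }
    copy
  else
    seen[obj] = begin
      obj.dup
    rescue TypeError
      obj # values that cannot be duplicated are shared
    end
  end
end

a = []
a << a
b = cycle_safe_dup(a)
```

Here b is a new array whose only element is b itself, just like the original.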

One more method
This last method is so bad that you should never use it, I just mention it to give you another knol to think about.
class Object
def i_dup
eval(inspect)
end
end
It calls the method
inspect
to create a human-readable representation of the object, just like irb does after each command, and then passes the string to
eval
that simply executes the string.

You should know by yourself why this method is bad, but just to make it clear:
- it only works for objects "made of"
Array
,
Hash
,
Numeric
,
String
,
Range
and
Symbol
instances (maybe some more I forgot about now), it won't work for
Object::new
,
- it doesn't handle self-reference (it fails when it sees the three dots).

That's all for today, I hope you learnt something new.