2008-08-28

include Module

Hello. Today's post is about including a module, and about modules in general.

One potential use of a module, as a namespace for a set of functions, you have seen here: Fibonacci numbers - lazy evaluation. Now it's time to show the main use of modules, the one for which they are introduced in Ruby: mixins.

Mixins
Or, more descriptive: (mix-in)s, things that you mix-in. Let's skip the theory for now and go to an example.

Let's say we have a class whose objects we want to make comparable. Let it be
class Person
, and let
p1<p2
if person
p1
is younger than person
p2
. The traditional approach here is like this:
class Person

def initialize(name,age)
# skip validation for simplicity
@name=name
@age=age
end

attr_reader :name,:age

def <(p2)
@age<p2.age
end

end
OK, now we can compare two people like:
t=Person::new("Tom",23.3)
z=Person::new("Zuz",23.2)
t<z #=> false # as expected
But if we try
t>z
or
t==z
or
t>=z
..., it will be a
NoMethodError
, because only the method
<
has been defined. Of course we can define all the 6 comparison methods, but that wouldn't make today's post, would it? Let's do it using a module
Comparable
, already existing in Ruby, and the operator
<=>
.

<=>
The method
<=>
works just like
compareTo
in Java - it takes an argument and compares
self
with the argument, yielding
-1
,
0
or
1
if
self
is less than, equal to, or greater than the argument, respectively. This strange-looking operator is defined for all built-in comparable types in Ruby, try
5<=>7
for instance. Let's define it, bearing in mind that it is already defined for standard types!
class Person
def <=>(p2)
@age<=>p2.age
end
end
That was trivial. Now we could define all the operators like this:
class Person
def <(p2);(self<=>p2)<0;end
def >(p2);(self<=>p2)>0;end
def ==(p2);(self<=>p2)==0;end
def >=(p2);(self<=>p2)>=0;end
def <=(p2);(self<=>p2)<=0;end
end
Remember one thing: In Ruby, if something looks inefficient, it probably is. So, the above code looks inefficient. First, because it has very low entropy, meaning it repeats the same thing over and over again, and second, because if we define another class which we also want to be comparable, we'll have to copy the 5 lines without any difference duplicating the code and lowering the entropy even more.

One more word - we don't define
!=
because it is automagically defined as the opposite to
==
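and cannot even be redefined (that holds for Ruby 1.8, current when this was written; since Ruby 1.9 it is an ordinary method that can be overridden).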

module
Let's do it like it should be done! Let's define the five comparison methods in a module, and let's mix the module into our class like this:
module Comparable
def <(p2);(self<=>p2)<0;end
def >(p2);(self<=>p2)>0;end
def ==(p2);(self<=>p2)==0;end
def >=(p2);(self<=>p2)>=0;end
def <=(p2);(self<=>p2)<=0;end
end

class Person
include Comparable
end

class OtherComparableClass
include Comparable
end
Isn't that better? It gets even better when I tell you that the module
Comparable
is already defined in Ruby, with the five functions just like we defined them here, so when you want to make a class comparable, you just
include Comparable
and
def <=>(other)
, and it all works! The module also has a bonus:
between?(min,max)
, working as expected (both ends inclusive).
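To see the whole thing working together, here is a minimal sketch. It assumes the Person class from above with age as the only comparison key; the third person, Ann, is made up for the illustration.

```ruby
class Person
  include Comparable            # brings in <, <=, ==, >=, > and between?

  attr_reader :name, :age

  def initialize(name, age)
    @name = name
    @age = age
  end

  # The only method we write ourselves; Comparable builds the rest on top of it.
  def <=>(other)
    age <=> other.age
  end
end

tom = Person.new("Tom", 23.3)
zuz = Person.new("Zuz", 23.2)
ann = Person.new("Ann", 30)     # hypothetical third person

tom > zuz                       #=> true
tom.between?(zuz, ann)          #=> true
[tom, ann, zuz].min.name        #=> "Zuz"
```

Note that methods like min and sort also rely on <=>, so they work with our class as well.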

Now note one thing: the module uses the method
<=>
even though it is not defined in it, nor in its ancestors, nor anywhere. But a module is a trusting animal: it trusts that you won't include it unless you define all the missing methods it uses!
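What happens when that trust is broken? A small sketch, using a made-up module Greeter and two made-up classes (none of them exist in Ruby): the module calls a method name that it expects the including class to supply.

```ruby
# Hypothetical module: it uses `name` but does not define it.
module Greeter
  def greet
    "Hello, #{name}!"
  end
end

class Named
  include Greeter
  def name
    "Tom"
  end
end

class Nameless
  include Greeter               # includes the module but breaks the promise
end

Named.new.greet                 #=> "Hello, Tom!"

begin
  Nameless.new.greet
rescue NameError => e           # NoMethodError is a subclass of NameError
  failure = e.class
end
```

Including the module never fails by itself; the problem only shows up when the missing method is actually called.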

The biggest thing in the world
Let's play for a moment with
Comparable
. Let's define an object that claims to be the biggest object in the world.
biggest=Object::new

class << biggest
include Comparable
def <=>(other)
other.equal?(self) ? 0 : 1
end
end

biggest>5 #=> true
biggest<=1000000 #=> false
biggest>["X"] #=> true
biggest>biggest #=> false
What we did: we defined the object, then we sort of declared a sort of class of which our
biggest
is a sort of instance (in fact it's the object's eigenclass, but let's leave that for another post). Just understand that declaring the methods like we do here is exactly like declaring them in the object's regular class, except that they are accessible only to our
biggest
, and not to all the Object instances. It's defining methods for just one object (as there is only one biggest object, of course!).

The
other.equal?(self) ? 0 : 1
part might need an explanation. If we just returned
1
, then the object would be greater than all objects including itself, so
biggest>biggest
would yield
true
, and
biggest<biggest
would return
false
. We want the object to know that it is as big as it is, so when the object is compared with itself, we want it to know they are equal. That's why we compare it with itself. Now, why do we use
equal?
and not
==
? Well, the method
==
defined inside
Comparable
calls
<=>
which in turn calls
==
and so on until
SystemStackError
. But the function
equal?
works in another way: it checks if the two objects are the same
instance
:
a="abc"
a=="abc" #=> true
a.equal?("abc") #=> false # other instance of String
a.equal?(a) #=> true
a.equal?(a.dup) #=> false
So that's the method we're looking for - it will return
true
only if we compare
biggest
with
biggest
itself.

One word of a summary: if you define a method within a class, you have to instantiate the class to make the method really accessible to the world. But if you define a method within a module, you first have to include the module in a class, and then to instantiate the class, to be able to call the method.

Kernel and puts
What's this
Kernel
? It's a module that is included in the class
Object
, and thus in all the classes you ever define, as they all are descendants of the class
Object
. Now comes the explanation of how come you can write
puts
and it works.

If you open irb, or take an empty .rb file, you are inside an object. Check it:
irb(main):099:0> self
=> main
irb(main):100:0> self.class
=> Object
So, we're inside some
Object
instance. So what happens if we write
puts
? We call our object's private method
puts
. We can call it even though it is private because we are inside the object (but if you try
self.puts
you'll see it is private; by the way, it is not like in Java here -
self
is just like any other object, so you cannot call private methods on
self
, you can only do it without prefixing them with
self
; Ruby 2.7 and later relax this rule and accept a literal self as the receiver). But the truth is that the method is not defined in the object - it is defined inside the
Kernel
module (as a private function), and in this way it got to our
main
object. And to any other object, too:
class K
def kk
puts "in K"
end
end
Now, the call to
puts
that you do when writing
k=K::new; k.kk
is a call to
k
's private method
puts
, which is there also because of mixing in
Kernel
into the class
K
(into the
K
's ancestor:
Object
, precisely speaking). So, if we had a class
StupidClass
, and we were tired of all the noise this class' instances make by
puts
-ing stupid things, we could mute all the class' instances:
class StupidClass
def puts(*args)
# ignore
end
end
Other classes, including our
main
, will still be able to
puts
messages.

But if we don't want to mute the messages completely, but just to make them less noisy, we can do this:
class StupidClass
def puts(*args)
Kernel.puts(*args.map{|a| a.to_s.downcase})
end
end

stupid_object=StupidClass::new
stupid_object.puts "AbC",5,:XX
# Output:
abc
5
xx
Of course we have to call the
Kernel
's static method
puts
and not just write
puts
, because that would make our method call itself endlessly. Also note that the static method
Kernel.puts
is not the one that is included in the class
Object
.
Kernel
has two
puts
methods:
irb(main):009:0> Kernel.private_instance_methods.grep /puts/
=> ["puts"] # instance method, called by Object::new.puts
irb(main):010:0> Kernel.singleton_methods.grep /puts/
=> ["puts"] # static method, called by Kernel.puts
The two methods are independent - neither of them calls the other.
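If you want to check these claims on a current Ruby, here is a small sketch. One difference from the irb session above: since Ruby 1.9 the method lists contain symbols instead of strings, so we ask for :puts. The variable refused is just a name picked for this example.

```ruby
Object.include?(Kernel)                            #=> true
Kernel.private_instance_methods.include?(:puts)    #=> true, the mixed-in private method
Kernel.respond_to?(:puts)                          #=> true, the public module-level method

begin
  Object.new.puts "hello"      # explicit receiver, so the private method is refused
rescue NoMethodError => e
  refused = e.message          # the message says that a private method was called
end
```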

2008-08-24

Object **arr;

What? What do these asterisks do? We're doing Ruby here, and the whole old pointer thing from C++ is gone, isn't it?

Well, they are not, sorry. Remember one thing, please. This pointers thing is not a part of C++ or any other particular language. Pointers are how computers work. Each (reasonable) programming language either has pointers, or is inefficient. Ruby has them too. The difference is that in Ruby we don't use asterisks to denote we're using them.

What's "wrong"

Have a look at the example that proves we have pointers in Ruby:
a=[2,3,5,7]   #=> [2, 3, 5, 7]
b=a #=> [2, 3, 5, 7]
a<<11 #=> [2, 3, 5, 7, 11]
a #=> [2, 3, 5, 7, 11]
b #=> [2, 3, 5, 7, 11]
So, as you see,
a
and
b
are just pointers. And when we make the assignment in the second line, we just make them point to the same object in memory (array, in our example), so when we modify the object using an in-place changing method (like
reverse!
,
clear
and others), the change will be also visible through the other variable.

We observe exactly the same behaviour when we use a string instead of an array:
a="abc"       #=> "abc"
b=a #=> "abc"
a<<"x" #=> "abcx"
a #=> "abcx"
b #=> "abcx"
Note that if you use
+=
instead of
<<
, a new instance of the string with the
"x"
appended is created and assigned to
a
, so if you want to make a string buffer and append to it some lines, it's probably better to use
<<
, because it does not create another object.
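The difference can be checked with equal?, which tells whether two variables point to the same object. A minimal sketch (the .dup on the literal is only there so that the example also runs when string literals are frozen):

```ruby
a = "abc".dup
b = a

a << "x"                        # modifies the one shared object in place
shared = a.equal?(b)            #=> true, still the same object

a += "y"                        # builds a new string and rebinds a to it
rebound = a.equal?(b)           #=> false, a now points somewhere else

a                               #=> "abcxy"
b                               #=> "abcx"
```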

So, what if we would like to have an independent copy of a given array or a string? Ruby comes with a function
dup
that makes a copy of an object. (In fact there's also a function
clone
that behaves similarly; for today let's assume the functions do exactly the same (they don't), and let's use
dup
.) Change the line
b=a
to
b=a.dup
in both examples above and you will see it works as expected - modifying
a
does not modify
b
and vice versa.

Happy? So, is that all for today? Not quite. Have a look:
a=["a","b"]
b=a.dup
a<<"c"
b #=> ["a", "b"] # as expected
a[0]<<"x"
a             #=> ["ax", "b", "c"]
b #=> ["ax", "b"] # oops!
What happened? Well, now
b
is a copy of
a
, which means they are two separate arrays, so when we add
"c"
to one of them, the other does not get modified, we already know that. But in Ruby everything is an
Object
, so the elements of the arrays are objects too, and, what's worse, they are the same objects. When we did
b=a.dup
, we created a separate array, but the elements of the array are pointers to the same strings as the elements of the original array. Our copy is not deep, we separated the top-level objects but not the elements of the array. So when we modified in place one of the elements, it got modified also in the other array, even though the arrays are separate objects.

Another way to check what happens:
x=Object::new       #=> #<Object:0x2d8ce10>
x.dup #=> #<Object:0x2d91dfc>
[x] #=> [#<Object:0x2d8ce10>]
[x].dup #=> [#<Object:0x2d8ce10>]
As you see,
dup
on an object makes a new object (the addresses differ), but
dup
on an array makes a new array but does not make a copy of the elements - it just makes a new array with its elements pointing to the original objects.

How to "fix" it
OK, so let's change the behaviour of
dup
for
Array
so that it makes a deep copy calling
dup
recursively on all its elements. Good idea? Yes, but NO NO NO!

Never do such a thing as changing the behaviour of a standard function! Someone can depend on how it works now, so you cannot change it! Remember well this lesson! Even if you think it's broken, don't fix it in this way!

Sorry for shouting, but it is important. Let's define a new function with the functionality we've just described. Here's how we do it:
class Object
def deep_dup
dup
end
end

class Array
def deep_dup
map{|e| e.deep_dup}
end
end

class Hash
def deep_dup
h={}
each{|k,v| h[k.deep_dup]=v.deep_dup}
h
end
end
First we define
deep_dup
for "normal" object as a simple copy, as they do not need any special treatment. Then we redefine (override) the function for
Array
and also for
Hash
, as there's exactly the same case with hashes as with arrays. You can check that after changing
dup
to
deep_dup
in the examples above, everything will work as expected.
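For the impatient, here is that check as a self-contained sketch. It repeats the three definitions from above and uses only strings as elements and keys, because on older Rubies calling dup on numbers or symbols raises an error.

```ruby
class Object
  def deep_dup
    dup
  end
end

class Array
  def deep_dup
    map { |e| e.deep_dup }
  end
end

class Hash
  def deep_dup
    h = {}
    each { |k, v| h[k.deep_dup] = v.deep_dup }
    h
  end
end

a = ["a".dup, "b".dup]
b = a.deep_dup
a[0] << "x"
a                               #=> ["ax", "b"]
b                               #=> ["a", "b"], the copy is untouched

h = { "list" => ["p".dup] }
c = h.deep_dup
h["list"][0] << "q"
c                               #=> {"list"=>["p"]}
```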

Of course there are also other classes like
Set
(available after
require 'set'
) that might need overriding
deep_dup
for them to work.

So why use dup?
Now another lesson: why use this strangely behaving
dup
at all if we have such a nice
deep_dup
? Well, the answer is simple: as
deep_dup
copies everything, it might use much more memory (and time) than
dup
. That's why I said that if a language doesn't have pointers (that is, if a normal assignment works just like our
deep_dup
), it is inefficient. Because the solution is not to stop using
dup
. The solution is to use it carefully and wisely. And to remember that there are pointers under the nice skin of Ruby.

Other methods
If we need a deep copy, the approach described above is probably the best one, but not the only one. One of the most secure methods to create a complete deep copy of an object is to serialise it to a "soul-less" string and deserialise it back. Then we can be absolutely sure that no part of the new object will be a part of the original one, as deserialisation for sure creates the object from scratch.

Ruby provides two easy serialisation methods:
YAML
and
Marshal
.
require 'yaml'

class Object

def m_dup
Marshal.load(Marshal.dump(self))
end

def y_dup
YAML.load(YAML.dump(self))
end

end
Both
YAML
and
Marshal
have a method
dump
that returns a string representation of the object passed (you can see what the strings look like by calling
dump
on various objects in irb), and the method
load
that does the reverse.

Differences:
- as you can see,
Marshal
's string is shorter so probably it's better and more efficient.
-
YAML
does not always work. I don't know why this is happening, but some complicated structures with sets, arrays, strings and hashes fail to load from the dumped string.

Both methods are most probably worse (slower) than our
deep_dup
, because they need to parse the string.
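A quick sketch of the Marshal round trip on a nested structure, just to show that nothing is shared afterwards. The variable names are made up for this example.

```ruby
original = ["a".dup, { :k => ["b".dup] }]
copy = Marshal.load(Marshal.dump(original))

# Modify the nested strings of the original in place.
original[0] << "x"
original[1][:k][0] << "y"

original                        #=> ["ax", {:k=>["by"]}]
copy                            #=> ["a", {:k=>["b"]}], nothing leaked through
```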

Last problem - self-references
There's one point, however, where the serialisation method works, and our
deep_dup
fails. It is when an array is an element of itself:
a=[]
a<<a
a #=> [[...]] # the three dots denote a self-reference
a.deep_dup
SystemStackError: stack level too deep
from (irb):87:in `deep_dup'
from (irb):87:in `deep_dup'
from (irb):87:in `map'
from (irb):87:in `deep_dup'
from (irb):87:in `deep_dup'
from (irb):87:in `map'
from (irb):87:in `deep_dup'
from (irb):87:in `deep_dup'
from (irb):87:in `map'
from (irb):87:in `deep_dup'
from (irb):87:in `deep_dup'
from (irb):87:in `map'
from (irb):87:in `deep_dup'
from (irb):87:in `deep_dup'
from (irb):87:in `map'
from (irb):87:in `deep_dup'
... 15577 levels...
from (irb):87:in `map'
from (irb):87:in `deep_dup'
from (irb):87:in `deep_dup'
from (irb):87:in `map'
from (irb):87:in `deep_dup'
from (irb):87:in `deep_dup'
from (irb):87:in `map'
from (irb):87:in `deep_dup'
from (irb):87:in `deep_dup'
from (irb):87:in `map'
from (irb):87:in `deep_dup'
from (irb):87:in `deep_dup'
from (irb):87:in `map'
from (irb):87:in `deep_dup'
from (irb):115
from (null):0

Aww, a failure, because duplicating
a
needs duplicating
a
first, and so on. That's why our method is not so perfect, and will fail also on examples like this:
a=[1,2,{:a=>4}]
a[2][:b]=a
a #=> [1, 2, {:a=>4, :b=>[...]}]
It can be fixed and handled just as it is handled in serialisation (it works without problems with such objects), and also in
inspect
(if it wasn't, you'd have an infinite output after creating such an object in irb), but I'll leave this as an exercise for the reader.
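If you want a hint for the exercise, here is one possible sketch, not necessarily how Marshal or inspect do it internally. The function cyclic_deep_dup and its seen table are names invented here: the table remembers the copy made for every original already visited, so a self-reference is answered with the existing copy instead of another recursive call. It assumes Ruby 2.4 or later, where dup on numbers and symbols simply returns the object.

```ruby
# Hypothetical helper; `seen` maps each original object (by identity) to its copy.
def cyclic_deep_dup(obj, seen = {}.compare_by_identity)
  return seen[obj] if seen.key?(obj)   # already copied: reuse that copy

  case obj
  when Array
    copy = []
    seen[obj] = copy                   # register BEFORE descending into elements
    obj.each { |e| copy << cyclic_deep_dup(e, seen) }
    copy
  when Hash
    copy = {}
    seen[obj] = copy
    obj.each { |k, v| copy[cyclic_deep_dup(k, seen)] = cyclic_deep_dup(v, seen) }
    copy
  else
    seen[obj] = obj.dup                # plain objects get a simple copy
  end
end

a = [1, 2, { :a => 4 }]
a[2][:b] = a
b = cyclic_deep_dup(a)

b.equal?(a)                     #=> false, a separate array
b[2].equal?(a[2])               #=> false, a separate hash
b[2][:b].equal?(b)              #=> true, the copy refers to itself, not to a
```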

One more method
This last method is so bad that you should never use it, I just mention it to give you another knol to think about.
class Object
def i_dup
eval(inspect)
end
end
It calls the method
inspect
to create a human-readable representation of the object, just like irb does after each command, and then passes the string to
eval
that simply executes the string.

You should know by yourself why this method is bad, but just to make it clear:
- it only works for objects "made of"
Array
,
Hash
,
Numeric
,
String
,
Range
and
Symbol
instances (maybe some more I forgot about now), it won't work for
Object::new
,
- it doesn't handle self-reference (it fails when it sees the three dots).

That's all for today, I hope you learnt something new.

2008-08-17

Lazy evaluation

Let's make an array of all Fibonacci numbers. All of them? Yes, all of them!

Of course it's impossible, we would have to spend infinite time computing them and use infinite memory to store them. But we can trick users of our array into thinking it does have all the numbers precomputed in it. We will use lazy evaluation.

How does it work? Well, our array will not really be an Array from which we can read fields at random. It will be an object, and each time the user wants to get some number from it, we will peek into the user's request, and compute the number, if not already computed. This simple concept is the definition of lazy evaluation (as opposed to eager evaluation, which is precomputing values before they are needed).

The object that will look like the array will be a module. We don't want to create a class, because we wouldn't want to instantiate it. It will work just like this:
Fi[10]
will yield
55
, for instance (we assume
Fi[0]==0 and Fi[1]==1
).

Let's start with the code, as usual.
module Fi

class << self

def initialize
@fi=[0,1]
end

def [](n)
raise ArgumentError,"Incorrect index!" unless n.is_a? Integer and n>=0
get(n)
end

def each
i=0
loop\
{
yield(get(i))
i+=1
}
end

private

def get(n)
want(n)
@fi[n]
end

def want(n)
until @fi[n]
@fi<<@fi[-1]+@fi[-2]
end
end

end

initialize

end
Note that this is not a good implementation; a good implementation would use the matrix form to compute the numbers much faster, but we're talking about laziness here, so let's leave the good implementation to the eager folks (well, if you do it, it will probably increase your Ruby skills, so try, if you dare). Also, what I want to demonstrate doesn't get lost among the matrices in this simple implementation, so let's stick with it for now.

So, we declared a
module
, and then there's this funny
class << self
in it. Well, for now, just remember that that's the way (one of the ways) to turn a module into sort of a singleton class - a structure that has functions and maintains its state, but cannot be freely instantiated. Also note how it is initialized: we defined an initializer inside the class, but we have to call it explicitly after the class definition in the module (in fact we could have called this function anything else).

After this introduction, just remember that the module can be treated like an object. We define this object's most important method:
[]
(defining a method
[](params)
enables us to use the object as an Array - one more teaspoon of the syntactic sugar).

After validating the argument, we call the function
get
, which calls
want
and after that reads the value from the field
@fi
, which is the underlying real array of Fibonacci numbers. The function
want
simply ensures that there are at least
n
numbers counted by counting consecutive numbers until the one we want is inserted into the array (
@fi[-1]
is of course the last element of the array - another teaspoon, very handy).

Time to test our module:
irb(main):042:0> Fi[10]
=> 55
Exactly what we wanted! Note that when we ask for
Fi[100000]
, the console hangs for a moment before giving us the result, but when we want the same number again, it works immediately (if you see a delay, it's caused exclusively by the time needed to print the output; to see the real time the computation takes, try
Fi[100000];nil
to suppress the output).

Now get back to our code and have a look at the
each
method. Arrays have a method
each
that calls the passed block once for each element. Our object is a bit like an Array, so why not add this option? The only problem is, of course, that the function will never finish, unless the user breaks it like this:
Fi.each\
{ |f|
puts f
break if f>100000
}
In fact the
break
breaks the loop inside the method, and lets it finish.

Again, our
each
is implemented badly. It should precompute some number of numbers ahead to reduce the number of calls to
get
. Well, calling
get
and not
[]
is already an optimisation, because thanks to this we don't validate the argument each time inside
each
.

We could also implement
each_with_index
and others, also
map
could be implemented, but it would have to behave differently from what it does for Arrays, because if we
break
inside
map
of an Array object, it returns
nil
, and our
map
would have to return the already computed part of the array. But definitely the most important change would be to introduce the matrix form to the calculations, so that if we ask for just one number, not all previous numbers would have to be calculated. I leave it as an exercise for the reader.
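As a starting point for that exercise, here is a rough sketch of the matrix form on its own, without the laziness. It relies on the well-known identity that the n-th power of [[1,1],[1,0]] is [[F(n+1),F(n)],[F(n),F(n-1)]], and computes the power by repeated squaring. The names mat_mul and fib_matrix are invented for this example.

```ruby
# Multiply two 2x2 matrices given as [[a, b], [c, d]].
def mat_mul(x, y)
  [[x[0][0] * y[0][0] + x[0][1] * y[1][0], x[0][0] * y[0][1] + x[0][1] * y[1][1]],
   [x[1][0] * y[0][0] + x[1][1] * y[1][0], x[1][0] * y[0][1] + x[1][1] * y[1][1]]]
end

# Raise [[1,1],[1,0]] to the n-th power by squaring and read F(n) from it.
def fib_matrix(n)
  result = [[1, 0], [0, 1]]     # identity matrix
  base = [[1, 1], [1, 0]]
  while n > 0
    result = mat_mul(result, base) if n.odd?
    base = mat_mul(base, base)
    n >>= 1
  end
  result[0][1]
end

fib_matrix(0)                   #=> 0
fib_matrix(1)                   #=> 1
fib_matrix(10)                  #=> 55
```

This needs only about log2(n) matrix multiplications for a single number, instead of computing all the previous numbers first.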

2008-08-16

IO::popen

Today a short post about running an external program from Ruby. I'll start by giving you a nice link: 6 Ways to Run Shell Commands in Ruby. Today I'm going to concentrate on only one of the methods described there:
IO::popen
.

Here is a small Ruby program I use when I access a PostgreSQL database on my local machine. What it does (in order):
- ensure PostgreSQL server is not running
- start the server
- start the pgAdmin
- wait for the pgAdmin to finish
- stop the server.
POSTGRESQL_DIR="C:/Program Files/PostgreSQL/8.2"
SERVER_PATH="\"#{POSTGRESQL_DIR}/bin/pg_ctl.exe\""
DATA_DIR="\"#{POSTGRESQL_DIR}/data/\""

START_COMMAND="#{SERVER_PATH} start -D #{DATA_DIR}"
STOP_COMMAND="#{SERVER_PATH} stop -D #{DATA_DIR}"
ADMIN_PATH="\"#{POSTGRESQL_DIR}/bin/pgAdmin3.exe\""

Process.waitpid(IO::popen(STOP_COMMAND).pid)
puts "PostgreSQL Server stopped."
IO::popen(START_COMMAND)
puts "PostgreSQL Server started."
adm_pid=IO::popen(ADMIN_PATH).pid
puts "pgAdmin started."
puts "Waiting for pgAdmin to finish..."
Process.waitpid(adm_pid)
puts "pgAdmin closed."
Process.waitpid(IO::popen(STOP_COMMAND).pid)
puts "PostgreSQL Server stopped."
As you see, I use
IO::popen
to spawn a process, and also I get the process'
pid
to wait for it to finish using
Process.waitpid(pid)
.
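The same pattern in a form you can try without PostgreSQL. This is only a sketch; it assumes a shell with an echo command, which both Windows and Unix-like systems provide.

```ruby
# Block form: read the child's output; when the block ends the pipe is
# closed, the child is waited for, and $? holds its exit status.
output = IO.popen("echo hello") { |io| io.read }
status = $?
output.strip                    #=> "hello"
status.success?                 #=> true

# Non-block form, as in the script above: keep the pid and wait for it ourselves.
io = IO.popen("echo done")
text = io.read                  # read first, so the child is never blocked on a full pipe
Process.waitpid(io.pid)
finished_ok = $?.success?       # $? is set by waitpid
```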

One more thing I was trying to do is to redirect the Server console output to the program's own console. I succeeded with this, but for some reason I couldn't make the
Process::waitpid
work when I was redirecting the output. Does anybody know why?

Here's how I was redirecting output:
Thread::new(IO::popen(START_COMMAND))\
{ |srv|
until srv.eof?
print srv.readpartial(1024)
end
}
As I said, the redirection was working fine, but if there was no output from the server process when pgAdmin finished,
Process::waitpid
was still waiting (it returned once some output came from the server console, though).

I tried to use several other variations with
sysread
,
readpartial
and
select
, but none was better, some even worse than this one. Thanks for any suggestions.

One day I'll post about how to connect to the database from Ruby, too.

2008-08-15

rescue

About exceptions today. First - if you're new to exceptions, probably 'exception' means to you 'an ugly error message, full of Access denied, memory under 0x462F3ED4 cannot be read stuff, contact the sucker who sold you this product.' Time to change this way of thinking!

Why the hell they invented exceptions
Imagine a complex system, let's say, a system to operate some machine in a factory. The system is divided into layers, that is (for example, in order) direct motor control, motor control proxy, motor controller, control logic, user commands layer, several other layers, the machine operation layer, several more layers, and finally the user interface, where user wants the machine to do something.

So you can imagine, that when user presses a button, then a function from user interface layer calls a function from the next layer, that calls a function from the machine operation logic layer, that... that finally sends a signal to the inverter to run the motor. And now imagine that the motor is completely broken, burnt, stalled or stolen. It's a critical error, so user must be immediately informed about this, and no further action of any of the layers is needed.

The naïve solution: each of the functions should return an integer with an error code, and each function, when calling the next one in the chain, should check its return value, and if it signals an error, it should return immediately with the same error code, until the last function (the first that was called) gets the message. It's a good solution, meaning that it can work. But it's a very bad solution:
- The code becomes ugly and long.
- Functions cannot return any other value because they already return the error code, and it complicates even further.
- If a function has to do something no matter if it succeeded or not (like close a file or release resources), it complicates even further than further.
- The code becomes ugly and long.
- The code becomes ugly and long.
Have a look at an example:
def prepare()
if motor_in_bad_mood?
return 225 # the error code for the problem
end
inverter.init()
return 0
end

def do_it()
if ufo_stole_the_cables?
return 843 # the error code
end
inverter.operate()
# further operation
return 0
end

def almost_do_it()
f=allocate_resources()
if (ret=prepare())!=0
f.free_resources()
return ret
end
if (ret=do_it())!=0
f.free_resources()
return ret
end
f.free_resources()
return 0
end

def user_says_do_it()
if (ret=almost_do_it())!=0
puts "The operation returned the error code! (#{ret})"
end
end
If you're not completely blind, you see the ugliness of this code. (Of course in Ruby we could make some improvements, but imagine it is C. And remember there are many more functions, each calling the next one.)

Any idea for a solution? Well, the best one would be if the innermost function could inform the outermost function that something's completely wrong. But how to do it? And what if we don't want to inform the outermost function, for example because the problem is not so critical, and can be resolved by the program?

Let's finally present the solution with exceptions, or
Exceptions
, as we should call them now.
class CriticalMotorError < RuntimeError
end

def prepare()
if motor_in_bad_mood?
raise CriticalMotorError,225
end
inverter.init()
end

def do_it()
if ufo_stole_the_cables?
raise CriticalMotorError,843
end
inverter.operate()
# further operation
end

def almost_do_it()
f=allocate_resources()
begin
prepare()
do_it()
ensure
f.free_resources()
end
end

def user_says_do_it()
begin
almost_do_it()
rescue CriticalMotorError => e
puts "The operation returned the error code! (#{e.message})"
rescue RuntimeError => e
puts "Something even worse happened: #{e.message}."
end
end
Notice any differences? Let's explain what happened. First, we declared our error class, descendant of the standard error class in Ruby,
RuntimeError
(which is a descendant of
Exception
used for most errors that happen during program runtime). Our exception class doesn't do anything special, it just inherits from the ancestors.

Now, when an error occurs, we
raise
an exception (in other languages the keyword throw is often used here). That means that we create a new instance of our exception, pass it an argument (exception simply stores it as an error message and does nothing with it), and then
raise
the newly created exception. Raising means that all further actions in the current function are aborted and control returns to the calling function; there, too, all further actions are aborted and the function exits immediately, and so on up the whole chain. If we just raised an exception and then didn't take care of it, it would exit all the functions, and finally also exit the program with an error message (try typing this in irb:
def x;0/0;end;def y;x;end;def z;y;end;z
to see this behaviour in action).

But here we don't want the program to exit. So we make a trap. A trap is the
begin
and
end
in
user_says_do_it
, and the
rescue
. Basically, if things that are called after
begin
throw an exception, and the exception is mentioned in one of the
rescue
clauses, then the rest of the block after
begin
is skipped and the control goes to the right
rescue
clause (the first that mentions the actual class of the exception), and then resumes after
end
, and continues to run the program normally (the exception is cancelled once the control enters the
rescue
clause, and the exit-immediately madness stops).

What we do in the
rescue
clause, we print an error message, including the message (error code) that we read from within the object
e
which is our exception. Simple, elegant, painless.

As you see, there's one more trap, inside
almost_do_it
. It also detects exceptions raised from the block, but it doesn't rescue them: it isn't interested in what really happened, or it decides it isn't in a position to handle the errors correctly, so it just lets the exceptions pass through, also skipping the rest of the block (so if the exception was raised by
prepare
,
do_it
won't even be executed). But here's the trick: the
ensure
block gets executed no matter what happened inside the
begin
block. It executes both when the block completes normally, and when it is interrupted by an exception, but
ensure
doesn't cancel the exception, it just stops for a moment to do what it has to do, and goes on with the unrolling madness.
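The order of events is easiest to see when every step leaves a trace. A minimal sketch with a made-up error class DemoError and a made-up function risky that record what happens in an array:

```ruby
class DemoError < RuntimeError; end   # hypothetical error class

def risky(log)
  log << :start
  raise DemoError, "843"
  log << :never_reached               # skipped: the exception aborts the function
ensure
  log << :cleanup                     # runs anyway, then the exception travels on
end

log = []
begin
  risky(log)
  log << :skipped                     # skipped too: control jumps straight to rescue
rescue DemoError => e
  log << "rescued #{e.message}"
end

log                             #=> [:start, :cleanup, "rescued 843"]
```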

As you see, the exceptions are very useful, simple, elegant, powerful and in general good. Use them!! Learn them, use them, think about them, or else you're not a programmer for me.

Final remarks about exceptions
The two following examples are equivalent:
begin
try_something()
puts "Success." # of course this line is executed only if no exception was raised
rescue SomeError => e
error()
# possibly more rescue clauses
end
And the second, which looks a bit nicer to me:
begin
try_something()
rescue SomeError => e
error()
# possibly more rescue clauses
else
puts "Success."
end

Another remark.
begin
# ...
rescue RuntimeError
# ...
rescue CriticalMotorError
# ...
end
This is useless, because our
CriticalMotorError
is also a
RuntimeError
, so the first
rescue
will be triggered, and always when a
rescue
is triggered, all following
rescue
s are skipped and not even checked.
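You can convince yourself with a tiny sketch; the function which_clause is made up for this example and simply reports which clause caught the error.

```ruby
class CriticalMotorError < RuntimeError; end

def which_clause
  raise CriticalMotorError, "225"
rescue RuntimeError
  :runtime_error_clause         # matches first, because CriticalMotorError is a RuntimeError
rescue CriticalMotorError
  :critical_clause              # never reached
end

which_clause                    #=> :runtime_error_clause
```

So always put the more specific classes first and the more general ones after them.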

Also, as you see, the
=> exception_var
can be omitted. The exception class name can be omitted also, and it defaults to
StandardError
. For standard exception classes, check QuickRef, the part Exceptions, Catch, and Throw (
catch
and
throw
are not so good tools, though).

The last remark. If a function in the calling chain thinks it cannot handle the exception, but thinks that it could add some additional info to the error, it can
rescue
it and either throw a new error with some more data (possibly of some other class than the original exception), or it can do something like this:
begin
# ...
rescue SomeException => e
puts "The exception was rescued: #{e.message}"
puts some_additional_info
raise e
end
In this way, the exception gets partially handled, and then re-raised so that it does not get cancelled at this point.

Use it, use it, use it!

2008-08-14

Downloading a file

Today we will download a Garfield comic strip. We will not display it, as displaying is a whole lot harder, it needs a window and so on, maybe I'll cover this subject one day, but for now just download it to your hard drive.

Let's start with code and then follow with step-by-step explanation.
require 'date'
require 'open-uri'

GARFIELD_START=Date::new(1978,6,19)

puts "When were you born? (YYYY MM DD please)"
print "?> "
t_date=gets.strip.split(/[^0-9]+/).reject{|e| e.empty?}.map{|e| e.to_i}
if t_date.length!=3
puts "YYYY MM DD, I said!"
exit
end
date=Date::new(*t_date)
if date<GARFIELD_START
puts "You are older than Garfield, so no comic strip for your Birthday."
exit
end
remote=date.strftime("http://images.ucomics.com/comics/ga/%Y/ga%y%m%d.gif")
local=date.strftime("C:/Garfield %Y-%m-%d.gif")
data=open(remote).read # on Ruby 3.0 and later, write URI.open(remote).read instead
File::open(local,"wb"){|f| f<<data}
puts "Comic strip for your Birthday downloaded."
First, we need two additional libraries, so we
require
them. The first one adds a lot of functionality to the class
Date
(for manipulating dates, if you have not guessed that), and the other makes it possible to download files from the internet very easily (of course there are also other ways to do it, such as opening a regular HTTP connection, but let's leave that for another day).

GARFIELD_START=Date::new(1978,6,19)
- we define a constant (in Ruby, if the first letter of a variable name is a capital letter, then it's a constant). This is the first day for which Garfield is available on the web. The constructor takes the year, month and day.

Now we print a question and a prompt. And now the next line:
t_date=gets.strip.split(/[^0-9]+/).reject{|e| e.empty?}.map{|e| e.to_i}
first we call
gets
- it reads a line from standard input, that is, from the console. Then we do some magic with it, because we want to make it possible to enter 2000 01 01 as well as 2000-01-01 or 2000/01/01 or bwah2000----01??01yeah. We want to be flexible.

So first we call
strip
to strip whitespace from the beginning and the end of what the user has entered (this is not really necessary here but let's do it anyway). Now we want to get all the digit groups from the string. As you should already know from a previous post, this should work:
.scan(/[0-9]+/)
, but here I wanted to use another (worse) way to do it, to teach you something new. So we do not scan the string for groups of digits, we split the string by groups of non-digits instead. That means that all groups of non-digits become separators and are left out, and what was between them is returned in an array.

To test how exactly this works, simply enter something like
"bwah2000----01??01yeah".split(/[^0-9]+/)
in irb. You will notice that it works, the only problem is that the returned Array has one more element than we wanted:
["", "2000", "01", "01"]
(that's why
split
is worse than
scan
here). This is, of course, because the string began with a non-digit group, and when that group became a separator, the empty string before it became an element.
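The two approaches side by side, as a small sketch you can paste into irb (the variable names are made up):

```ruby
input = "bwah2000----01??01yeah"

# Split on runs of non-digits: the leading "bwah" acts as a separator,
# so an empty string appears in front of the first number.
by_split = input.split(/[^0-9]+/)

# Scan for runs of digits: only the matches themselves are returned.
by_scan = input.scan(/[0-9]+/)
```

Note that no empty string shows up at the end, even though the input ends with a separator: split without a limit argument drops trailing empty strings.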

And that's why we call
.reject{|e| e.empty?}
now. What does it do? It executes the block once for each element of the array, and it also checks what the block returned. The block returns
true
for empty elements, and
false
for the others. The method
reject
, as the name says, rejects from the array those elements for which the return value of the block was
true
. So this simply drops the empty elements (it returns a new array without them); in our case only the first element can be empty. You can append the call to this function to your irb line to check it.

So finally we have three elements (for correct input), but they are still not numbers. You see? They are in quotes; they are parts of the input string, so they are Strings. We want to convert them all to Integers. We do it with the last element in the chain:
.map{|e| e.to_i}
. This function again calls the block with each element in turn, and returns a new array in which each element is replaced by what the block returned for it. It's best to call it in irb to see.
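The last two steps of the chain, taken one at a time so that the intermediate results are visible (a sketch; the variable names are made up):

```ruby
parts = ["", "2000", "01", "01"]           # what split gave us

# reject returns a new array without the elements for which the block is true.
non_empty = parts.reject { |e| e.empty? }

# map returns a new array built from the block's return values.
numbers = non_empty.map { |e| e.to_i }     # to_i ignores the leading zero: "01" becomes 1
```

Neither call modifies the array it is called on; parts still has its four elements afterwards.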

Now we check if we finally have 3 numbers.
exit
exits the whole program.

date=Date::new(*t_date)
- here we create the
Date
object for the specified date. The asterisk before the argument is the splat operator: it makes our array of 3 elements be passed not as one Array, but as 3 separate arguments to the function.
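A short sketch of what the asterisk does, using a fixed array instead of user input:

```ruby
require 'date'

t_date = [2000, 1, 1]

# With the asterisk, the array is unpacked into three arguments...
splatted = Date.new(*t_date)

# ...so the call is equivalent to writing them out by hand.
explicit = Date.new(2000, 1, 1)
```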

The date comparison needs no explanation.

Now we use
strftime
to create strings that have parts of the date in them. It's best to check the results in the console.
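For example, for the first of January 2000 (a sketch; the local file name here is shortened to avoid the drive letter):

```ruby
require 'date'

date = Date.new(2000, 1, 1)

# %Y is the four-digit year, %y the two-digit year, %m and %d are zero-padded.
remote = date.strftime("http://images.ucomics.com/comics/ga/%Y/ga%y%m%d.gif")
local  = date.strftime("Garfield %Y-%m-%d.gif")
```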

Now we do what the included file
'open-uri'
allowed us to do - we open a remote file simply by calling
open(url)
, and read data from the file. All in one line! The data is stored in a variable as a String, but here String means just that it is a string of bytes, and not something readable.
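One caveat if you run this on a newer Ruby: since Ruby 3.0, a plain open(url) no longer goes through open-uri, and the library's entry point is URI.open instead. A minimal sketch, where the helper name fetch_bytes is made up for illustration:

```ruby
require 'open-uri'

# Hypothetical helper: downloads a URL and returns the body as a String of bytes.
def fetch_bytes(url)
  URI.open(url) { |io| io.read }
end

# Usage (needs a network connection, so it is left commented out):
# data = fetch_bytes("http://images.ucomics.com/comics/ga/2000/ga000101.gif")
```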

After that, we open the local file on your hard drive. We open the file with the second argument
"wb"
to denote that we want only to write to the file (and overwrite it, if it already exists), and that the data we want to operate on is binary. This is very important! If you do not specify binary mode and then write or read binary data, something will go wrong, almost always. Remember.

Now, how do we use files? We could do it like this:
file=File::open(name,mode)
# operations on file
file.close()
But then we have to remember to close the file, especially if we write to it, or else the data may not get flushed to disk. But we can also pass a block to
File::open
, and then the method doesn't return the file; instead it calls our block and passes the newly opened file object to it, and after the block finishes, it closes the file gracefully, so that we do not have to do this. This is a good way to write to files, more elegant and safer. (Note that here the block gets executed only once. Do not associate a block with a loop: it's the called function that decides what to do with the passed block, and what
File
does with it here is also a common pattern.)
File::open(name,mode)\
{ |file|
# operations on file
}
So inside the block in our program, the variable
f
is the opened local file. Now we just write the data to it (
<<
is the same as
write()
), and finish the program. Check that it works!
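Here is a self-contained sketch of the block form that writes a few bytes to a temporary file and reads them back. The path and the sample bytes are made up for the demonstration.

```ruby
require 'tmpdir'

data = "GIF89a\x00\x01".b   # a few bytes standing in for real image data
path = File.join(Dir.tmpdir, "file_open_block_demo_#{Process.pid}.bin")

# With a block, File.open closes the file for us and returns the block's value.
block_value = File.open(path, "wb") do |f|
  f << data          # writes the bytes, like f.write(data)
  f.closed?          # false: the file is still open inside the block
end

read_back = File.binread(path)   # read the bytes back in binary mode
File.delete(path)                # clean up the temporary file
```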

One question might arise, why didn't I just write
f<<open(remote).read
. Well, if you had some connection error so that
open
would fail and interrupt the program, you would already have an empty file on your hard drive, and it would remain there and you would have to remove it manually (or overwrite it by running the program again, successfully). But when you first read the data and only then open the file, then in case of an error the file-opening line doesn't even get executed, and the file is not created.
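You can check this without a network connection by simulating the failure. In the sketch below, failing_download is a made-up stand-in for a download that always fails, and both file names are temporary.

```ruby
require 'tmpdir'

# Stand-in for open(remote).read when the connection fails.
failing_download = lambda { raise IOError, "simulated connection error" }

open_first = File.join(Dir.tmpdir, "open_first_#{Process.pid}.gif")
read_first = File.join(Dir.tmpdir, "read_first_#{Process.pid}.gif")

# Variant A: the file is opened (and created) before the download fails.
begin
  File.open(open_first, "wb") { |f| f << failing_download.call }
rescue IOError
end

# Variant B: the download fails first, so File.open is never reached.
begin
  data = failing_download.call
  File.open(read_first, "wb") { |f| f << data }
rescue IOError
end

open_first_left_file = File.exist?(open_first)
read_first_left_file = File.exist?(read_first)
File.delete(open_first) if open_first_left_file   # clean up the leftover
```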

2008-08-13

Non-standard standard methods

Hello there again. Today we're discussing one useless and potentially destructive thing. But it's gonna be very educational.

The mad machine
There's a story about two engineers: Trurl and Klapaucius. They once built a machine that was absolutely sure that two plus two is seven. They had a lot of problems, as the machine was chasing them to prove it was right, so beware today!

When you open irb and write
2+2
, do you suppose the stupid machine could answer
7
? Well, let's imagine I'm Trurl and you're Klapaucius. Copy the following into your irb (you can copy all lines at a go):
class Fixnum
alias q +
def +(b)
if self==2 and b==2
7
else
q(b)
end
end
end
Try now:
2+2
Oh no, RUN, the machine is MAD! ... ahh no, sorry. We just changed the way that numbers get added to each other. Let's analyse what we just did here. First, we said
class Fixnum
. That would create a class if the class did not already exist. But since it does exist, this instruction does not create anything; it just signals that we're going to add or change something inside this class. Note here the difference in behaviour between "redeclaring" classes and methods: when you declare the same method again, it destroys the old one (I explained this in the previous post), but if we "redeclare" a class, we just enter the inside of the class. Opening a class in this way is never destructive.

OK, what now? Let's skip for a moment the second line. We want to redeclare the method
+
. We do it here:
def +(b)
. As you see, we can define operators just like any other methods. This also means that writing
10.+(5)
in irb will work just like
10+5
- they both simply call a method
+
of the object
10
with the argument
5
.
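A quick sketch showing three spellings of the same call (send is the general way to call a method by name):

```ruby
plain  = 10 + 5            # the usual operator syntax
dotted = 10.+(5)           # the same call written as an ordinary method call
sent   = 10.send(:+, 5)    # the same method, looked up by its name
```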

So what do we do inside this newly defined addition? If both the object whose method it is (
self
) (that is: the first operand), and the second operand
b
are 2, we return 7. Otherwise, we call a method
q
(it's a one-argument method, wait a moment) (it could be written:
self.q(b)
, because it is method of the current object, that is the first operand).

So what's that method
q
? This method is defined in this line
alias q +
, and this line means: define method
q
that does exactly the same as the method
+
. Now note well that the
alias
line is above the
def
line, so the new method
q
is exactly the same as the old method
+
, not the new one that we just define below! So now we've covered the method
+
with the new one, so the old one is not accessible under its name
+
, but, what a lucky coincidence - we've made a "copy" of the old method
+
by giving it a new name
q
. So now, from within the new adding method, we can call the old one under its new name, and in this way we are able to add the two numbers if they are not two and two. That's how we maintain the old functionality while adding new one under the same name (the same method name, namely).
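If you want to try the alias trick without driving your whole irb mad, here is a sketch of the same pattern on a made-up class, Machine, with made-up methods add and old_add. (On Ruby 2.4 and later, Fixnum has been merged into Integer, so Integer is the class you would reopen there.)

```ruby
# Hypothetical class standing in for the built-in number class.
class Machine
  def add(a, b)
    a + b
  end
end

# Reopen the class: nothing is destroyed, we only add to it.
class Machine
  alias old_add add          # copy of the current add, taken before redefining it

  def add(a, b)
    if a == 2 && b == 2
      7                      # the mad answer
    else
      old_add(a, b)          # everything else goes to the original method
    end
  end
end

machine = Machine.new
mad    = machine.add(2, 2)
sane   = machine.add(2, 3)
backup = machine.old_add(2, 2)
```

The old implementation stays reachable under its new name, which is exactly what q does for + above.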

What are you doing, irb?
Let's try a variation now. (Close the console and open a new one to get rid of the mad machine.)
class Fixnum
alias q +
def +(b)
puts "#{self}+#{b}"
q(b)
end
end
Note that we used a very useful notation here: if you type
"a String #{with_something_inside} enclosed like this,"
the thing in the braces gets evaluated and the result is inserted into the string.

Now let's type something trivial in the console. I typed this:
"Al2O3::Cr"
:
irb(main):009:0> "Al2O3::Cr"
8+1
85+1
0+1
86+1
1+1
86+1
1+1
87+1
2+1
88+1
3+1
89+1
4+1
90+1
5+1
91+1
6+1
92+1
7+1
93+1
8+1
94+1
9+1
95+1
10+1
96+1
9+1
=> "Al2O3::Cr"
irb(main):010:0>
Wow, now we know what additions were performed by irb! Usually you cannot tell what these additions really do, but the last one is probably irb incrementing its line number. The others are responsible for parsing the string and displaying it.

inspect
One more trick. You must have noticed that the objects can be displayed in two ways: as an internal representation, or as a displayable thing:
irb(main):001:0> obj="a \"string\"\nsecond line"
=> "a \"string\"\nsecond line"
irb(main):002:0> puts obj
a "string"
second line
As you see, when we
puts
a string, the result is something else than when we just use it, and irb shows us how it looks after the
=>
sign. The same is true for other types than String:
irb(main):003:0> obj=["a",5,:x]
=> ["a", 5, :x]
irb(main):004:0> puts obj
a
5
x
=> nil
irb(main):005:0> obj.to_s
=> "a5x"
Even one more representation here, obtained by calling
to_s
. So, how does it work?

The method
to_s
for a String returns itself, and for an Array returns the concatenation of its elements'
to_s
results (note that this is going to change in Ruby 1.9!). But when irb wants to show us the result of an operation, it doesn't use
to_s
. It uses the method
inspect
. Try it:
[1,"a"].inspect
returns something that looks like
"[1, \"a\"]"
in irb, and what is really
[1, "a"]
, that is exactly how irb presents this value after the
=>
sign. You can emulate irb's behaviour like this:
puts obj.inspect
or easier
p obj
, but you are not very likely to use it in real programs, save for debugging purposes.
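The difference is easy to check directly. A small sketch with made-up variable names follows; the last line reflects the change mentioned above, since on Ruby 1.9 and later Array#to_s gives the same result as inspect.

```ruby
str = "a \"string\"\nsecond line"

str_to_s    = str.to_s       # for a String, to_s returns the string itself
str_inspect = str.inspect    # quotes added, inner quotes and the newline escaped

arr = [1, "a"]
arr_inspect = arr.inspect    # the form irb prints after the => sign
arr_to_s    = arr.to_s       # on Ruby 1.9 and later: same as inspect
```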

Let's do a little change now:
class String
alias inspect_old inspect
def inspect
inspect_old.gsub(/(^")|("$)/,"*")
end
end
We replace the method
inspect
with a new one, that calls the old one, and then replaces the quotation marks (
"
) at the beginning and the end of the result of the inspection with asterisks. Now just try it out:
irb(main):019:0> "Susan"
=> *Susan*
Do you like it?
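If you would rather see what the regular expression does without touching String itself, this sketch applies the same gsub to the result of inspect by hand. The variant with \A and \z is my own stricter alternative: ^ and $ match at line boundaries in Ruby, while \A and \z match only at the very start and end of the string. For inspect output the two give the same result, since newlines are escaped there.

```ruby
inspected = "Susan".inspect                          # the text "Susan" with its quotes

# The expression from the post: a quote at a line start, or a quote at a line end.
starred = inspected.gsub(/(^")|("$)/, "*")

# Stricter alternative: only the very first and very last character.
strict = inspected.gsub(/\A"|"\z/, "*")

# Quotes inside the string are escaped by inspect and are left alone.
inner = 'say "hi"'.inspect.gsub(/(^")|("$)/, "*")
```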