2009-02-28

Gmail Notifier using IMAP

A simple program to check if there are any new messages in our Gmail inbox.

We will connect to Gmail using the IMAP protocol and get a list of new (unread) messages from the mail server. Here's more or less the code that does it:
require 'net/imap'

class GNotifier

  GMAIL_IMAP="imap.gmail.com"

  LOGIN="login@gmail.com" # or just "login"
  PASSWORD="password"

  def initialize
    @envs={} # UID => envelope of every unread message seen so far
  end

  def check
    begin
      # connect and log in only if there is no live connection
      unless @imap
        @imap=Net::IMAP::new(GMAIL_IMAP,993,true,nil,false)
        @imap.login(LOGIN,PASSWORD)
      end
      @imap.select("INBOX")
      ids=@imap.search(["NOT","SEEN"])
      uids=ids.empty? ? [] : @imap.fetch(ids,"UID").map{|e| e.attr["UID"]}
      # forget messages that are no longer unread
      @envs.reject!{|uid,env| not uids.include?(uid)}
      new_uids=uids-@envs.keys
      if new_uids.empty?
        new_envs=[]
      else
        new_envs=@imap.uid_fetch(new_uids,"ENVELOPE").map{|e| e.attr["ENVELOPE"]}
        new_uids.each_with_index{|uid,i| @envs[uid]=new_envs[i]}
      end
      new_mail(new_envs) unless new_envs.empty?
    rescue ThreadError, Errno::ECONNABORTED, Timeout::Error, IOError => e
      @imap=nil # drop the broken connection and start over
      retry
    end
  end

  def new_mail(new_m)
    # ...
  end

end

The code is just a draft but shows the most important part. Let's explain it a bit.

First, @imap is the instance of the IMAP connector. It is created once (in the first call to check), and if nothing goes wrong, all subsequent calls to check reuse that connection instead of creating a new one and logging in again. If an error occurs, though, the connection is discarded and re-created (in the rescue clause).

The field @envs is a hash that maps the UID (unique identifier) of each new message in the inbox to that message's envelope. At the beginning, it is empty.

So how do we fetch new mail? First we call @imap.search to get the IDs (not UIDs; don't mix up the two) of all messages that do not have the SEEN flag set. Then we fetch these messages' UIDs (the ternary operator is there because fetch fails when ids is empty).

So now we have the UIDs of all new messages, and we can compare them with the list of messages that we have already fetched. First, we use @envs.reject! to drop all messages that were new but no longer are (they have been deleted or marked as read; it doesn't matter to us which). Then we compute new_uids, the UIDs of messages that are new for the first time (they were not on our list), and for those messages we fetch some more information, the ENVELOPE, into new_envs and add it to @envs. Finally, we call new_mail and pass it the envelopes of all the mail that has just arrived. This method can also be left unimplemented if we just want to know which new messages are on the server (this information is in @envs, of course) and do not need a notification whenever a brand-new message arrives.
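
The UID bookkeeping described above can be tried without any mail server. This is only a sketch: diff_unread is a helper name made up for the illustration, and the strings stand in for real envelopes.

```ruby
# Hypothetical helper: the same reject!/subtract steps as in check,
# but working on plain data instead of an IMAP connection.
def diff_unread(known, unread_uids)
  # forget messages that are no longer unread (read or deleted)
  known.reject! { |uid, _env| !unread_uids.include?(uid) }
  # whatever is unread but not yet known is new for the first time
  unread_uids - known.keys
end

known = { 101 => "envelope 101", 102 => "envelope 102" }
new_uids = diff_unread(known, [102, 103, 104])

p known.keys  # 101 is gone, 102 stays
p new_uids    # the UIDs we still need to fetch envelopes for
```

Message 101 has been read in the meantime, so it drops out of the hash, and only 103 and 104 need an ENVELOPE fetch.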

Some technical details
When creating the connector, we could have written just Net::IMAP::new(GMAIL_IMAP,993,true), but that will not work in Ruby 1.9, where the last parameter (verify, which controls SSL certificate verification) is true by default.

The line @imap.select("INBOX") could be called just once, inside the conditional above it, but then for some reason not all new messages are visible over IMAP. Calling it every time sort of refreshes the inbox.

The ENVELOPE attribute that we download from the server contains the information that would be on the envelope of a regular letter: sender, recipient(s), date and also subject, all accessible simply by method calls. Helpful link: Envelope.
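
Here is a rough idea of what new_mail could do with an envelope. To keep the sketch runnable offline, Address and Envelope below are local stand-ins that only copy the field names of Net::IMAP::Address and Net::IMAP::Envelope; describe is a made-up helper, and a real subject may additionally need MIME decoding.

```ruby
# Stand-ins with the same field names as the Net::IMAP structs.
Address  = Struct.new(:name, :route, :mailbox, :host)
Envelope = Struct.new(:date, :subject, :from, :sender, :reply_to,
                      :to, :cc, :bcc, :in_reply_to, :message_id)

# Hypothetical helper: one line of text per envelope.
def describe(env)
  sender = env.from.first                 # from is a list of addresses
  who = sender.name || "#{sender.mailbox}@#{sender.host}"
  "#{who}: #{env.subject}"
end

alice = Address.new("Alice", nil, "alice", "example.com")
bob   = Address.new(nil, nil, "bob", "example.com")
env1  = Envelope.new("Sat, 28 Feb 2009 10:00:00 +0100", "Hello", [alice])
env2  = Envelope.new("Sat, 28 Feb 2009 10:05:00 +0100", "Hi", [bob])

puts describe(env1)
puts describe(env2)
```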

If you prefer to download the whole message and not just the envelope, fetch the attribute RFC822 (or BODY[]) instead; plain BODY returns only the body structure. If you want something more specific, look into the documentation, for example here: Net::IMAP. Note that Gmail does not support some of the commands, sort for instance.

2009-02-22

Enumerator

Do you remember the Fiber? If not, better have a look there before reading on. I will show another use of fibres; this time we won't see them, but they are there, in the guts of the Enumerator.

One can use enumerators in either of the two main ways:

Virtual array
If you have read the post about fibres, you are already familiar with the virtual arrays. An enumerator is an easier and a bit more automated way to create a virtual array.

Let's say we'd like to observe a HOTPO sequence starting at any chosen number. As we know, the sequence might be infinite, so it wouldn't be very wise to create an array holding all the elements. But we can create an enumerator to iterate over them, like this:
def hotpo(v)
  Enumerator::new\
  { |y|
    loop\
    {
      y<<v
      break if v==1
      v=(v&1>0) ? 3*v+1 : v/2
    }
  }
end

hotpo(27).each{|x| print x," "}

The function takes the first value of the sequence as an argument, and returns an enumerator. We create an enumerator of this type by passing it a block and putting the sequentially computed values in the block argument. So: the
y
is the virtual array itself, say good evening.

The loop is not infinite, at least not in any of the known cases, because no infinite HOTPO sequence has been found. But if we removed the break, we would see that the program doesn't hang, even though it has an infinite loop in it! Well, it does hang, but it still produces output, so it's no more hung than an open word processor.
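
To watch this laziness without flooding the terminal, one can drop the break and ask for only the first few elements. A minimal sketch: hotpo_forever is a name made up for this example, and the start value is copied into a local so that the enumerator can be restarted from the beginning.

```ruby
def hotpo_forever(start)
  Enumerator.new do |y|
    n = start
    loop do                      # really infinite: no break here
      y << n
      n = n.odd? ? 3 * n + 1 : n / 2
    end
  end
end

seq = hotpo_forever(27)
first_five = seq.first(5)        # computes five elements and stops
p first_five                     # [27, 82, 41, 124, 62]
```

Only as many elements are computed as the caller asks for, which is why the infinite loop inside does no harm.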

This behaviour is very similar to that presented in the post about fibres, and that's because the enumerator uses them. If you have understood the fibres well, you could try to implement this kind of enumerator yourself - just as a small exercise for the reader.
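
If you'd rather peek at one possible answer, here is a deliberately tiny sketch. TinyEnumerator and tiny_hotpo are names invented for it; this is not how the real Enumerator is written, just the same idea built on a fibre, with only each supported.

```ruby
class TinyEnumerator
  DONE = Object.new              # marks the fibre's final value

  # The object handed to the block; << passes a value to the consumer.
  class Yielder
    def <<(value)
      Fiber.yield(value)
      self
    end
  end

  def initialize(&block)
    @block = block
  end

  def each
    fiber = Fiber.new do
      @block.call(Yielder.new)
      DONE                       # returned by the last resume
    end
    loop do
      value = fiber.resume
      break if value.equal?(DONE)
      yield value
    end
  end
end

def tiny_hotpo(start)
  TinyEnumerator.new do |y|
    n = start
    loop do
      y << n
      break if n == 1
      n = n.odd? ? 3 * n + 1 : n / 2
    end
  end
end

collected = []
tiny_hotpo(6).each { |x| collected << x }
p collected
```

The sentinel is there because the fibre's final value also comes out of resume and must not be mistaken for an element.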

View of an enumerable
The other way of using enumerators (the more standard way, I'd say) is to create them from existing enumerable objects. The idea of an enumerator is to allow only very limited access to the underlying enumerable.

For example, an_array.each (without passing a block) is an enumerator which can be regarded as a safe read-only view of the array. It does not allow the user to call any method of the array other than each and the methods derived from it. So you can call an_array.each.each{...} or an_array.each.map{...}, but a call to an_array.each.map!{...} cannot modify the underlying object: the enumerator has no map! method, so the call just raises NoMethodError. Still, an_array.map!.each{...} is able to do so.

The general idea of using chained calls of enumerating methods is:
- If the first method is each, then any non-modifying method can be used as the second call.
- If the first method is something else, the valid second methods are each, which simply forwards the call to the first method, and with_index, which does the same but also passes the element index to the block.

So the following, even though perfectly legal, makes no sense: an_array.select.map{...}. One of the methods should be neutral, and the neutral methods are each and each_with_index (or just with_index). So apart from making a safe view, the main advantage of using enumerators is the possibility to call an_array.map.with_index{|e,i| ...} or even an_array.select.with_index{|e,i| ...}.
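
A few of these chains in action, as a small sketch (the array and the variable names are made up for the example):

```ruby
letters = %w[a b c d]

# map with the index available in the block
labelled = letters.map.with_index { |e, i| "#{i}:#{e}" }

# select by position instead of by value
evens = letters.select.with_index { |_e, i| i.even? }

# a modifying first method still modifies the array itself
letters.map!.with_index { |e, i| e * (i + 1) }

p labelled
p evens
p letters
```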

Note that the methods all? and any? do not return an enumerator when called with no block (they return true or false right away), so you cannot chain with_index onto them.
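
One way around this, if such a check with index is needed, is to turn the chain around and start from each_with_index. A short sketch with made-up data:

```ruby
values = [0, 1, 4, 9]

# all?/any? called on the enumerator returned by each_with_index
all_squares = values.each_with_index.all? { |e, i| e == i * i }
any_fixed   = values.each_with_index.any? { |e, i| i > 1 && e == i }

p all_squares
p any_fixed
```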

2009-02-21

ASCII Art

Ruby 1.9 introduces some nice ways to write less code and make it look more mysterious at the same time. It's enough to have a look at the following pieces of ASCII art, each line is a valid expression in Ruby 1.9:

->(){}[]
0-->(){0}[]<--0
{x: :x}
{:+@=>->{:-@}}

What do they mean? First, there's a new syntax for defining lambdas: ->(args){body}, and when you define a lambda with no args and no body and then call it with [], you obtain the first line. There's also another way to call a lambda or a proc now: some_lambda.(args). This allows us to write ->(){}.(), if someone finds this even more confusing than the first option.

The second line should not be a problem now; it says (0 - ->{0}.call) < -(-0). Yes, in Ruby even --------1 is a valid expression. It's as they say in primary school: minus and minus gives plus.

There's also a new syntax for defining hashes that have symbols as keys: {k:val} is equivalent to {:k=>val}. In our example, however, one has to put whitespace between the two colons or else the interpreter gets confused, because two colons form another token, used for calling a method or getting a constant.

And the last line is just some creative nothing. It uses the symbols :+@ and :-@, which are normally used as method names for unary plus and minus. See for yourself: 5.-@() gives -5.
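
For the sceptical, here are the same pieces assigned to variables so that their values can be inspected. It is only a sketch; the variable names are invented, and the comparison and the last hash are written with extra spaces to keep them readable.

```ruby
nothing = ->(){}[]                  # empty lambda, called with []
doubled = ->(x){ x * 2 }.(21)       # the .() call syntax
compare = (0 - ->{0}.call) < -(-0)  # line two, with spaces and parentheses
many    = --------1                 # eight minuses
hash    = {x: :x}                   # new hash syntax, symbol key and value
negated = 5.-@()                    # unary minus called as a method
odd     = { :+@ => ->{ :-@ } }      # symbol key, lambda value

p nothing, doubled, compare, many, hash, negated, odd[:+@].call
```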

There are a lot of articles (and blog entries on various Ruby blogs) that cover the differences between Ruby 1.8 and Ruby 1.9, so if you're interested, just look for them and you'll find them easily. One that is worth reading if you'd like to know more than is usually contained in short presentations:

Ruby 1.8 vs Ruby 1.9

And a nice wrap up: Useful Ruby 1.9 links

2009-02-20

Fiber

I haven't been here for quite a while. That's simply because I'm not a blogging kind of guy.

Let's have a look at one of the new features in Ruby 1.9.

class Fiber
You can think of a fibre as a separate thread: not one that runs all the time in the background, but rather one that is responsible for some specialised task and is activated only to do some job and return some results.

Also, a fibre is not a thread.

Let's skip to some code:

FILE=__FILE__

require 'fiber'

reader=Fiber::new\
{
File::open(FILE){|f| f.each_line{|l| Fiber.yield(l) unless l.strip.empty?}}
}

puts "Let's get a line: #{reader.resume}"
puts "Let's get another line: #{reader.resume}"
puts "And the rest:"
puts reader.resume while reader.alive?


This produces the following output:

Let's get a line: FILE=__FILE__
Let's get another line: require 'fiber'
And the rest:
reader=Fiber::new\
{
File::open(FILE){|f| f.each_line{|l| Fiber.yield(l) unless l.strip.empty?}}
}
puts "Let's get a line: #{reader.resume}"
puts "Let's get another line: #{reader.resume}"
puts "And the rest:"
puts reader.resume while reader.alive?
#<File:0xb114bc>


So first, let's have a look at the fibre code (I use the word fibre and not fiber because I prefer British English; the truth is that each time I want to use the class Fiber I first spell it Fibre and must correct it later; I'll make an alias one day). The fibre opens the file and calls Fiber.yield, passing each consecutive nonempty line to it. Think of it like this: the fibre creates a virtual array containing all the elements that you put there, and Fiber.yield puts an element in it. So we have a virtual array containing all the nonempty lines of code from __FILE__.

Of course the array does not exist - it is only a way to imagine how the fibre works. This fact saves memory - you don't have to load all the lines into memory before accessing them. Think of how to write this simple program without a fibre (assume that the file you're going to display is something like 1 GB and you cannot simply load it all into memory) - I'm quite sure that using a fibre is one of the best ways to do it.

Now, how do we read from this virtual array? To read an element we call
reader.resume
. Simple, isn't it? Analyse the output to see that it did what we had expected: first it printed the first line, then the second (nonempty) line, and then the rest.

There's only one mysterious thing at the end: #<File:0xb114bc>. The explanation is this: the virtual array created by a fibre is filled by all calls to Fiber.yield, and when the fibre finishes, its final value (the value of the last operation within the fibre block) is also added to the array. In our case, the last (and only) operation is the call to File::open, which returns the value of its block, and that is the file stream returned by f.each_line, so it was also added to the array. One can like this feature or not, but one has to live with it. So if we don't want this line of output, we can change the end of the code to this:
puts "And the rest:"
loop\
{
  l=reader.resume
  break unless reader.alive?
  puts l
}

Now it works like it should. More lines of code but oh well.
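
The "final value" behaviour is easy to see in isolation. A minimal sketch with a made-up fibre that yields twice and then ends:

```ruby
counter = Fiber.new do
  Fiber.yield 1
  Fiber.yield 2
  :final_value            # value of the last expression in the block
end

# three resumes: two yielded values, then the block's final value
values = 3.times.map { counter.resume }
p values

# the fibre is finished now; resuming it again is an error
error = begin
  counter.resume
  nil
rescue FiberError => e
  e
end
p error.class
```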

fiber.resume(*args)
We've seen that if you pass an argument to Fiber.yield, it becomes the value of the fiber.resume call. This enables you to pass data from the body of the fibre to the outer world. Passing data is also possible in the opposite direction, and is by no means harder. As you might have already guessed: arguments passed to fiber.resume become the value of Fiber.yield. So if we wanted a writer instead of a reader:
FILE="test.txt"

require 'fiber'

writer=Fiber::new\
{
  File::open(FILE,"w")\
  { |f|
    loop\
    {
      l=Fiber.yield
      break unless l
      f.puts l
    }
    f.puts "---"
  }
}

writer.resume "Line 1"
writer.resume "Line 2"
writer.resume "Line 3"
writer.resume

Why do we have to create a loop inside the fibre and break from it? Simply because now it's the outer world that decides when to finish the fibre. It signals the fibre to close the file (and add the "---" just for our information that the file was closed properly). If we remove the last line of the code (the one that calls the writer with no argument), we'll see that the file won't have --- added at the end. It will be properly closed due to the finaliser hidden inside File::open, but it will be closed and released no sooner than the whole program ends, so in general it is a good idea to force the file to close manually.

But wait, there's no line 1 in the file! Yes, it's not there, and here's why: the first call to writer.resume did not correspond to a call to Fiber.yield from within the fibre, because at the time of this call the fibre had not yet been started, so it was not waiting on Fiber.yield but at the beginning of its block. So line 1 just activated the fibre, but was not saved to the file.

What's the solution? First: to get the value of the first call to
writer.resume
you have to add arguments to the fibre block itself. So one of the solutions is like this:
writer=Fiber::new\
{ |l0|
  File::open(FILE,"w")\
  { |f|
    f.puts(l0)
    loop\
    {
      l=Fiber.yield
      break unless l
      f.puts l
    }
    f.puts "---"
  }
}


But it doesn't look too nice, nor is it. In our case a better solution might be to treat the first call as a special case and pass the file name in it, like this:
require 'fiber'

writer=Fiber::new\
{ |file|
  File::open(file,"w")\
  { |f|
    loop\
    {
      l=Fiber.yield
      break unless l
      f.puts l
    }
    f.puts "---"
  }
}

writer.resume FILE
writer.resume "Line 1"
writer.resume "Line 2"
writer.resume "Line 3"
writer.resume

For most cases I'd use this form.
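
To see where each argument of resume ends up without touching any file, here is a small sketch that records everything in an array. The names echo and log are made up for it.

```ruby
log = []

echo = Fiber.new do |first|
  log << "block argument: #{first}"   # from the very first resume
  loop do
    msg = Fiber.yield                 # from every later resume
    break unless msg
    log << "yield returned: #{msg}"
  end
  :closed                             # final value of the fibre
end

echo.resume "A"
echo.resume "B"
echo.resume "C"
result = echo.resume                  # no argument: ends the loop

puts log
p result
```

The first argument arrives as the block parameter and all the others come out of Fiber.yield, which is exactly why the first line went missing in the original writer.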

Of course there are many more uses of Fiber, including ones that pass values in both directions at once, not just in one of them as in the examples above.
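
A minimal sketch of such two-way traffic: each resume hands a number to the fibre, and the matching Fiber.yield hands the running total back. The name summer is invented for the example.

```ruby
summer = Fiber.new do |x|
  total = 0
  loop do
    total += x
    x = Fiber.yield(total)   # send the total out, wait for the next number
  end
end

a = summer.resume(1)
b = summer.resume(2)
c = summer.resume(10)
p [a, b, c]
```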

Producer - Consumer
Here's one more way of looking at the whole fibre thing: it's sort of the producer - consumer pattern, with a queue of size limited to zero. In this way an element is produced no sooner than it is needed, and most of the time there are zero elements waiting on the queue. And since the fibre is not a thread, there are no synchronisation problems and the like.
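
The strict alternation between the two sides can be made visible by logging both of them. A sketch with made-up names; the producer ends with nil so that the consumer knows when to stop.

```ruby
events = []

producer = Fiber.new do
  3.times do |i|
    events << "produce #{i}"
    Fiber.yield i            # hand the item over and pause
  end
  nil                        # final value: nothing more to consume
end

while (item = producer.resume)
  events << "consume #{item}"
end

puts events
```

Every item is consumed right after it is produced; nothing ever waits in a queue.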

I hereby certify the Fiber class for everyday use.