2008-08-14

Downloading a file

Today we will download a Garfield comic strip. We will not display it, because displaying an image is a whole lot harder (it needs a window and so on). Maybe I'll cover that subject one day, but for now we will just download it to your hard drive.

Let's start with the code and then follow with a step-by-step explanation.
require 'date'
require 'open-uri'

GARFIELD_START=Date::new(1978,6,19)

puts "When were you born? (YYYY MM DD please)"
print "?> "
t_date=gets.strip.split(/[^0-9]+/).reject{|e| e.empty?}.map{|e| e.to_i}
if t_date.length!=3
  puts "YYYY MM DD, I said!"
  exit
end
date=Date::new(*t_date)
if date<GARFIELD_START
  puts "You are older than Garfield, so no comic strip for your Birthday."
  exit
end
remote=date.strftime("http://images.ucomics.com/comics/ga/%Y/ga%y%m%d.gif")
local=date.strftime("C:/Garfield %Y-%m-%d.gif")
data=open(remote).read
File::open(local,"wb"){|f| f<<data}
puts "Comic strip for your Birthday downloaded."
First, we need two additional libraries, so we
require
them. The first one adds a lot of functionality to the class
Date
(for manipulating dates, if you have not guessed that), and the other makes it possible to download files from the internet very easily (of course there are also other ways to do it, like opening a regular HTTP connection yourself, but let's leave that for another day).

GARFIELD_START=Date::new(1978,6,19)
- we define a constant (in Ruby, if the first letter of a variable name is a capital letter, then it's a constant). This is the first day for which Garfield is available on the web. The constructor takes the year, month and day.
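To make this concrete, here is a minimal sketch you can paste into irb. It only uses the constant from the program and the standard Date class; the second date is just a made-up example.

```ruby
require 'date'

# A name starting with a capital letter is a constant.
GARFIELD_START = Date.new(1978, 6, 19)   # year, month, day

puts GARFIELD_START         # prints 1978-06-19
puts GARFIELD_START.year    # prints 1978

# Dates can be compared with the usual operators.
older = Date.new(1970, 1, 1)             # made-up example date
puts older < GARFIELD_START # prints true
```

Ruby does let you assign to a constant a second time, but it prints a warning when you do.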

Now we print a question and a prompt. And now the next line:
t_date=gets.strip.split(/[^0-9]+/).reject{|e| e.empty?}.map{|e| e.to_i}
First we call
gets
- it reads a line from standard input, that is, from the console. Then we do some magic with it. The reason for the magic is that we want to make it possible to enter 2000 01 01 as well as 2000-01-01 or 2000/01/01 or bwah2000----01??01yeah. We want to be flexible.

So first we call
strip
to strip whitespace from the beginning and the end of what the user has entered (this is not really necessary here, but let's do it anyway). Now we want to get all the digit groups from the string. As you should already know from a previous post, this should work:
.scan(/[0-9]+/)
, but here I wanted to use another (worse) way to do it, to teach you something new. So we do not scan the string for groups of digits, we split the string by groups of non-digits instead. That means that all groups of non-digits become separators and are left out, and what was between them is returned in an array.

To test how exactly this works, simply enter something like
"bwah2000----01??01yeah".split(/[^0-9]+/)
in irb. You will notice that it works, the only problem is that the returned Array has one more element than we wanted:
["", "2000", "01", "01"]
(that's why
split
is worse than
scan
here). This is, of course, because the string began with a non-digit group, and when that group became a separator, the (empty) text before it became an element.
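The difference between the two approaches is easy to see side by side. This is just a small irb-style sketch using the same silly input as above:

```ruby
input = "bwah2000----01??01yeah"

by_split = input.split(/[^0-9]+/)   # runs of non-digits act as separators
by_scan  = input.scan(/[0-9]+/)     # runs of digits are collected directly

p by_split   # => ["", "2000", "01", "01"]
p by_scan    # => ["2000", "01", "01"]
```

Note that the "yeah" at the end does not produce an empty element, because split drops trailing empty strings when no limit is given. Only the leading one survives.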

And that's why we call
.reject{|e| e.empty?}
now. What does it do? It executes the block once for each element of the array, and it also checks what the block returned. The block returns
true
for empty elements, and
false
for the others. The method
reject
, as the name says, rejects from the array those elements for which the return value of the block was
true
. So this will simply delete the empty elements; in our case only the first element can be empty. You can append this call to your irb line to check it.
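Here is a tiny sketch of reject on its own, starting from the array we got from split. I also show select, which is the opposite (it keeps the elements for which the block returns true):

```ruby
parts = ["", "2000", "01", "01"]

cleaned = parts.reject { |e| e.empty? }
p cleaned   # => ["2000", "01", "01"]

# reject returns a new array; the original one is left alone.
p parts     # => ["", "2000", "01", "01"]

# select is the mirror image of reject.
p parts.select { |e| e.empty? }   # => [""]
```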

So finally we have three numbers (for correct input), but they are still not really numbers. You see? They are in quotes; they are parts of the input string, so they are Strings. We want to convert them all to Integers. We do it with the last element in the chain:
.map{|e| e.to_i}
. This function again calls the block with each element in turn, and it returns a new array in which each element is replaced with what the block returned for it. It's best to call it in irb to see.
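To play with the whole chain without typing at a prompt, you can wrap it in a method. The name parse_date_parts is my own invention for this sketch; the body is the same chain as in the program.

```ruby
# Hypothetical helper: the parsing chain from the program, wrapped in a method.
def parse_date_parts(line)
  line.strip.split(/[^0-9]+/).reject { |e| e.empty? }.map { |e| e.to_i }
end

p parse_date_parts("2000 01 01\n")             # => [2000, 1, 1]
p parse_date_parts("2000-01-01")               # => [2000, 1, 1]
p parse_date_parts("bwah2000----01??01yeah")   # => [2000, 1, 1]
p parse_date_parts("2000-01")                  # => [2000, 1]
```

The last call returns only two numbers, which is exactly the case the length check in the program catches.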

Now we check if we finally have 3 numbers.
exit
exits the whole program.

date=Date::new(*t_date)
- here we create the
Date
object for the specified date. The asterisk before the argument is the splat operator: it makes our array of 3 elements get passed not as one Array, but as 3 separate arguments to the function.
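A short sketch of the asterisk in action. The helper build_date is hypothetical (it is not part of the program above); I added a call to Date.valid_date? in it, because three numbers such as 2000 13 45 pass the length check but are not a real date, and Date::new would raise an error for them.

```ruby
require 'date'

# Hypothetical helper: returns a Date, or nil when the three numbers
# do not form a real calendar date.
def build_date(parts)
  return nil unless parts.length == 3 && Date.valid_date?(*parts)
  Date.new(*parts)   # same as Date.new(parts[0], parts[1], parts[2])
end

puts build_date([2000, 1, 1])    # prints 2000-01-01
p build_date([2000, 13, 45])     # => nil
p build_date([2000, 1])          # => nil
```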

The date comparison does not need explanation.

Now we use
strftime
to create strings that have parts of the date in them. Best check the results in the console.
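If you do not have a console at hand, this is what the two format strings from the program produce for a sample date (1 January 2000 is just an example I picked):

```ruby
require 'date'

date = Date.new(2000, 1, 1)

# %Y = four-digit year, %y = two-digit year, %m = month, %d = day
remote = date.strftime("http://images.ucomics.com/comics/ga/%Y/ga%y%m%d.gif")
local  = date.strftime("C:/Garfield %Y-%m-%d.gif")

puts remote   # prints http://images.ucomics.com/comics/ga/2000/ga000101.gif
puts local    # prints C:/Garfield 2000-01-01.gif
```

Everything that is not a % sequence is copied into the result unchanged.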

Now we do what the required library
'open-uri'
allowed us to do - we open a remote file simply by calling
open(url)
, and read the data from it. All in one line! (On Ruby 3.0 and newer, the plain open no longer accepts URLs; write URI.open(url) there instead.) The data is stored in a variable as a String, but here String means just a string of bytes, and not something readable.
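Below is a rough sketch of the download step for newer Ruby versions, where the call is spelled URI.open. The helper name fetch is made up, and the sketch only defines it without calling it, so nothing is downloaded when you paste it. The second half shows what "a string of bytes" means, using a hand-made sample instead of real image data.

```ruby
require 'open-uri'

# Hypothetical helper: returns the body of the given URL as a String.
# Uses URI.open, which open-uri provides on Ruby 2.5 and newer.
def fetch(url)
  URI.open(url) { |remote| remote.read }
end

# A String can hold arbitrary bytes. This hand-made sample starts like a GIF file.
sample = "GIF89a".b                 # .b gives a copy marked as binary
p sample.bytes.first(3)             # => [71, 73, 70]
p sample.encoding == Encoding::BINARY   # => true
```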

After that, we open the local file on your hard drive. We open the file with the second argument
"wb"
to denote that we only want to write to the file (overwriting it if it already exists), and that the data we want to operate on is binary. This is very important! If you read or write binary data without specifying binary mode, something will go wrong sooner or later, especially on Windows, where text mode translates newline characters. Remember.
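Here is a small sketch of a binary round trip. The helper binary_roundtrip and the sample bytes are made up for illustration; the sample deliberately contains newline and carriage return bytes, which are the ones text mode may alter on Windows.

```ruby
require 'tmpdir'

# Hypothetical helper: writes the bytes in binary mode to a temporary file
# and reads them back in binary mode.
def binary_roundtrip(bytes)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "sample.bin")
    File.open(path, "wb") { |f| f << bytes }
    File.open(path, "rb") { |f| f.read }
  end
end

# Made-up sample data: "GIF", CR, LF, 0x1A, LF, 0xFF
sample = [0x47, 0x49, 0x46, 0x0D, 0x0A, 0x1A, 0x0A, 0xFF].pack("C*")

puts binary_roundtrip(sample) == sample   # prints true
```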

Now, how do we use files. We could do it like this:
file=File::open(name,mode)
# operations on file
file.close()
But then we have to remember to close the file, especially if we write to it, or else the data might not get flushed to disk. But we can also pass a block to
File::open
, and then the method doesn't return the file. Instead it calls our block, passes the newly opened file object to it, and after the block finishes, it closes the file gracefully, so that we do not have to do this. This is a good way to write to files, more elegant and safer. (Note that here the block gets executed only once. Do not associate a block with a loop: it's the called function that decides what to do with the passed block, and what
File
does with it here is also a common pattern.)
File::open(name,mode) { |file|
  # operations on file
}
So inside the block in our program, the variable
f
is the opened local file. Now we just write the data to it (
<<
is the same as
write()
), and finish the program. Check that it works!
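If you want to see for yourself that the block form really closes the file, here is a small sketch. The helper write_with_block is hypothetical; it sneaks a reference to the file object out of the block only so that we can inspect it afterwards.

```ruby
require 'tmpdir'

# Hypothetical demo helper: writes text using the block form and reports
# what File.open returned and whether the file object got closed.
def write_with_block(path, text)
  handle = nil
  result = File.open(path, "wb") do |f|
    handle = f    # keep a reference so we can inspect it after the block
    f << text
    :done         # the value of the block
  end
  [result, handle.closed?]
end

Dir.mktmpdir do |dir|
  p write_with_block(File.join(dir, "demo.txt"), "hello")   # => [:done, true]
end
```

So with a block, File.open returns whatever the block returned, and the file is already closed by the time we get that value back.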

One question might arise: why didn't I just write
f<<open(remote).read
. Well, if you had some connection error so that
open
would fail and interrupt the program, you would already have an empty file on your hard drive, and it would remain there until you removed it manually (or overwrote it by running the program again, successfully). But when you first read the data and only then open the file, then in case of an error the file-opening line doesn't even get executed, and the file is not created.
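You can try out both orderings without a network connection. In this sketch the download is replaced by a stand-in that always fails; the names save_careless, save_careful and failing_download are made up for the demonstration.

```ruby
require 'tmpdir'

# Opens the file first and downloads inside the block.
def save_careless(path, download)
  File.open(path, "wb") { |f| f << download.call }
end

# Downloads first and opens the file only when the data is there.
def save_careful(path, download)
  data = download.call
  File.open(path, "wb") { |f| f << data }
end

# Stand-in for open(remote).read that simulates a connection error.
failing_download = -> { raise IOError, "simulated connection error" }

Dir.mktmpdir do |dir|
  careless = File.join(dir, "careless.gif")
  careful  = File.join(dir, "careful.gif")

  begin
    save_careless(careless, failing_download)
  rescue IOError
    # ignore the simulated error; we only care about what is left on disk
  end

  begin
    save_careful(careful, failing_download)
  rescue IOError
    # same here
  end

  puts File.exist?(careless)   # prints true (an empty file was left behind)
  puts File.exist?(careful)    # prints false
end
```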
