Saturday, October 2, 2010

Making Pretty CLI Applications In Ruby

Sometimes we need to process large amounts of data in Ruby: convert things from one format to another, walk through a large database, and so on. When you implement such jobs as rake tasks or similar scripts, you need some feedback, for example to see the progress. Normally people would use puts:

index = 0
count = things.size
things.each do |thing|
  do_something_about thing

  puts "#{index += 1} of #{count}"
end

But when you have several thousand or several million things to process, this approach doesn't work: it floods the console with one line per item, which is ugly. There are ways to do it that look much better and more professional.

There are gems and libraries that will help you do it properly, but in this article I'd like to show how it actually works internally.


Rewriting Strings

One of the first things you might want to learn in order to make a serious-looking CLI app is how to rewrite a string in place, over and over. It is a bit tricky in Ruby, so here is how it looks. First of all you need to learn the "\r" character, which is called "carriage return": it returns the cursor to the beginning of the current line. A simple example will demonstrate. Say you have a line of code like this

puts "one\ranother"

When you run it, you will see only "another" in the console. What happens is that Ruby prints "one", then the carriage return moves the cursor back to the start of the line, and "another" is printed over it, so you see only the last one. But the trouble is that if you write something like this

puts "one"
puts "\ranother"

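It won't work and you'll see two lines in the console, "one" and "another", because puts appends a newline after each string, so the cursor is already on the next line when "\r" arrives. For the same reason, if you put something like this into your loop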

puts "\r#{index += 1} of #{count}"

it won't work either and you will see the same ugly scroll of lines. To make it work you have to use a combination of print and STDOUT.flush calls, kind of like this

8.times do |i|
  print "\r#{i}"
  STDOUT.flush
  sleep 0.1
end

In this case print writes the string without a trailing newline, so the cursor stays on the same line. STDOUT.flush pushes whatever is buffered out to the console right away, and on the next iteration "\r" moves the cursor back to the beginning of the line so the new text is written over the old one.
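If you want to see the difference between puts and print for yourself, here is a small sketch that captures the output in memory instead of sending it to the console. The variable names are only for illustration.

```ruby
require "stringio"

# Capture output in memory so the raw characters can be inspected.
with_puts = StringIO.new
with_puts.puts "one"
with_puts.puts "\ranother"

with_print = StringIO.new
with_print.print "one"
with_print.print "\ranother"

p with_puts.string   # "one\n\ranother\n"
p with_print.string  # "one\ranother"
```

With puts there is a "\n" between the two strings, so the carriage return acts on a fresh line. With print there is none, so "\r" lands on the line that already holds "one".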

But it is still not everything. If you run a piece of code like this one

%w{looooong short}.each do |str|
  print "\r#{str}"
  STDOUT.flush
  sleep 0.5
end

You will see that on the second iteration the previous text isn't entirely overwritten: instead of "short" you will see "shortong". To make it work properly you need to write a string that is long enough, padded with spaces at the end, for example

%w{looooong short}.each do |str|
  print "\r#{str.ljust(80)}"
  STDOUT.flush
  sleep 0.5
end

String#ljust pads a string to the given length by filling the remaining places with spaces. This way you always overwrite 80 characters of the line in the console.

To wrap it up nicely you might create a simple function, and your loop will look like this

def print_r(text, size = 80)
  print "\r#{text.ljust(size)}"
  STDOUT.flush
end

index = 0
count = things.size
things.each do |thing|
  do_something_about thing

  print_r "#{index += 1} of #{count}"
end
puts "\n" # <- a final new line



Displaying the progress

With the trick above you will be able to show a constantly updating status line, but there is still some meat on this bone. Showing the user things like "345 of 87654" is not particularly user friendly, because it is annoying to calculate the actual progress in your head all the time. It would be nice to show the progress as a percentage as well. Happily, it is very simple to do using format placeholders

print_r(
  "%d of %d (%d%%)" %
    [index += 1, count, (index.to_f / count * 100)]
)
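The same formatting can be pulled out into a small function, which makes it easy to try with fixed numbers. This is just a sketch: progress_line is a name I made up, and I used %.1f instead of %d so that long jobs don't sit at "0%" for ages.

```ruby
# Hypothetical helper: builds the status text for item `index` out of `count`.
def progress_line(index, count)
  # An empty job is treated as already complete to avoid dividing by zero.
  percent = count.zero? ? 100.0 : index.to_f / count * 100
  format("%d of %d (%.1f%%)", index, count, percent)
end

puts progress_line(50, 200) # 50 of 200 (25.0%)
```

The result is a plain string, so it can be passed straight to print_r.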

The other usual problem with status reports is that you might have a particularly large set of things, say several million of them, and your script might process several thousand of them per second. In this case hitting your console several thousand times per second will seriously slow the process down, so you might need a way to skip some steps and print reports only at certain intervals. You can do that the following way

index = 0
count = things.size
step = [count / 1000, 1].max # 1/10th of a percent, but never 0

things.each do |thing|
  if (index += 1) % step == 0
    print_r "....."
  end
end

As you can see, we define the step variable and then skip all the iterations that are not a multiple of it. In this particular case the script updates the report every 1/10th of a percent of the job done, which in most cases is not a big drawback and still provides the user with progress updates.
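Here is a self-contained sketch of the skipping idea that you can run as is. The things array is a stand-in for real data, and two details are my own additions: step is clamped to at least 1 (count / 1000 is integer division, so it would be 0 for small sets), and the last item is always reported so the line ends on the full count.

```ruby
# Stand-in data; in a real script `things` would be your records.
things = (1..25_000).to_a

index   = 0
count   = things.size
step    = [count / 1000, 1].max # never 0, even for fewer than 1000 items
updates = 0

things.each do |thing|
  index += 1
  # Report every `step` items, and always on the very last one.
  next unless (index % step).zero? || index == count

  print "\r#{format('%d of %d', index, count).ljust(80)}"
  $stdout.flush
  updates += 1
end
puts

puts "#{updates} updates for #{count} items"
```

For 25,000 items this touches the console 1,000 times instead of 25,000.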

You might also think about ETA calculations, but you can probably figure that out on your own now; it's very simple.
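In case you want a starting point, here is one possible sketch of an ETA calculation. It assumes the items take roughly the same time each, which is often not true, and eta_seconds is just a name I picked.

```ruby
# Hypothetical helper: estimates the seconds left, assuming a constant rate.
# `elapsed` is the time spent so far in seconds, `done` the items processed.
def eta_seconds(elapsed, done, total)
  return nil if done.zero? # no rate to extrapolate from yet

  elapsed.to_f * (total - done) / done
end

remaining = eta_seconds(30.0, 250, 1000)
puts format("ETA %02d:%02d", *remaining.round.divmod(60)) # ETA 01:30
```

The elapsed time can be measured by storing Time.now before the loop and subtracting it from Time.now on each report.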


Add Some Colors

And the last thing I'd like to show is how to use colors in the console, which might make your application look even cooler. Some developers already know how to do that, but some don't. So here it is.

Basically it is very simple and in some ways similar to HTML tags. You use things called escape sequences, which are just markers like tags: there is an opening one and a closing one, like this

puts "\e[32mGREEN TEXT\e[0m"
puts "\e[31mRED TEXT\e[0m"
puts "\e[36mCYAN TEXT\e[0m"

As you can see, the closing sequence is always the same and the opening one differs only by a number, and this number describes the properties of the text that follows. It might be a color or a blinking effect, and several properties can be combined. Unlike HTML tags, though, they don't truly nest: "\e[0m" resets every property at once, so after an inner reset you have to reopen the outer one. You can find the full list of options on Wikipedia, in the article on ANSI escape codes.
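To avoid typing the sequences by hand every time, you might wrap them in a tiny helper. This is only a sketch: COLORS and colorize are names I made up, and the table holds just the three codes from the examples above.

```ruby
# SGR color codes from the examples above (31 = red, 32 = green, 36 = cyan).
COLORS = { red: 31, green: 32, cyan: 36 }.freeze

# Hypothetical helper: wraps `text` in an opening sequence and the reset.
def colorize(text, color)
  "\e[#{COLORS.fetch(color)}m#{text}\e[0m"
end

puts colorize("done", :green)
puts "#{colorize('3 errors', :red)}, see the log"
```

COLORS.fetch raises a KeyError for an unknown color name, which is easier to notice than a silently broken sequence.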

The only trouble with those things is that support for escape sequences differs from platform to platform. The example above was run in the OS X Terminal. The same ANSI sequences also work in typical Linux terminal emulators, while the classic Windows (DOS) console does not interpret them out of the box. You can find the details on the same Wikipedia page.
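One defensive trick, which goes a bit beyond what this article covers, is to emit the sequences only when the output really is a terminal, so that files and pipes get clean text. A minimal sketch, with paint being a hypothetical name:

```ruby
require "stringio"

# Hypothetical helper: colors `text` only when `io` is an interactive terminal.
def paint(text, code, io = $stdout)
  io.tty? ? "\e[#{code}m#{text}\e[0m" : text
end

puts paint("GREEN TEXT", 32)

log = StringIO.new # stands in for a file or a pipe
log.puts paint("GREEN TEXT", 32, log)
p log.string       # "GREEN TEXT\n"
```

This doesn't tell you whether the terminal understands the sequences, only that there is a terminal on the other end.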


This is about it. Now go and make the world prettier!
