Category Archives: computers

One year with Scala

This is gonna be rambling personal experience. You have been warned.

A year ago I took a break from my work at Inviqa to learn Scala and functional programming. This has been a great decision, a most successful endeavour, and an amazing learning experience.

I’ve been stuck with PHP for a long time. Mind, at Inviqa we were doing as good PHP as one can. I learned lots about software engineering from a multitude of conference speakers and industry leaders. This was further helped by a very active learning & development department and a generous conference/learning budget. Still, it was PHP. The language sucks.

Learning Scala as a “better Java” kind of language was easy. Case classes, options, lots of easy wins. Learning functional programming… well, I’ve been reading the awesome Functional Programming in Scala book, and I’m still reading it one year later. Despite its hands-on approach and lots of hand-holding, it’s not easy for me to digest. Functional programming is hard.

I used to really like Python. I wrote about 50 lines of Python recently, and just couldn’t believe how difficult it was. One needs to explicitly write return to return values. When just trying things out, you don’t get quick feedback from the compiler. Either write tests or run your code – the former seemed like overdoing it for such a short program, the latter was tedious. Working with the collections was a pain.

Let’s have a look at an example. What we want to do: split a string on commas, then strip whitespace from each element.

In Python:

map(lambda i: i.strip(), re.split(',', string))

In Scala:

string.split(",").map(_.trim)

The relative conciseness of Scala is nice, sure, but the Scala code basically follows the instructions: take a string, split on a comma, trim whitespace. Python is completely cryptic in comparison.

The main drawback of Scala is not the language but the ecosystem. SBT (Scala Build Tool) is powerful but often times incomprehensible. The compilation can be slow at times. Scala’s Play Framework is not nearly as polished as PHP’s Symfony. Replacing any piece of Symfony is just a dependency injection away. Replacing a component of Play is bordering on impossible, everything depends on everything and is mostly hardcoded. I wanted to do a really simple one-line change in Play Framework routing but had to give up.

Inviqa didn’t have as much interest in Scala as I hoped, so regrettably, I stopped working for them. Now I’m part of KwiqJobs (soon to be Swarms Technologies), an amazing startup. We turn people’s waiting time into a new resource for companies. Our tech team works remotely and we meet about once a month for a week. It’s lots of fun! Also we’re hiring! :)

MySQL low cardinality index efficiency

I’ve heard the opinion that indexes on low cardinality columns don’t work well. I set out to disprove that.

First, let’s create a table with a boolean column “active” and fill it with dummy data. Please note it includes a (default, B-Tree) index on the “active” column.

root@localhost test > show create table users\G *************************** 1. row *************************** Table: users Create Table: CREATE TABLE `users` ( `id` int(11) NOT NULL AUTO_INCREMENT, `name` varchar(50) DEFAULT NULL, `email` varchar(100) DEFAULT NULL, `active` tinyint(1) DEFAULT NULL, PRIMARY KEY (`id`), KEY `active` (`active`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 1 row in set (0.00 sec) root@localhost test > DELIMITER ;; root@localhost test > CREATE PROCEDURE insert_random(IN amount INT, IN percent_active INT) -> BEGIN -> DECLARE counter INT DEFAULT 0; -> myloop: LOOP -> if (counter = amount) THEN -> LEAVE myloop; -> end if; -> INSERT INTO users SET -> name = UUID(), -> email = UUID(), -> active = IF(FLOOR(RAND() * 100) < percent_active, 1, 0); -> SET counter = counter + 1; -> END LOOP; -> END ;; Query OK, 0 rows affected (0.00 sec) root@localhost test > DELIMITER ; root@localhost test > call insert_random(1000000, 10); Query OK, 1 row affected (36.60 sec) root@localhost test > select count(*) from users; +----------+ | count(*) | +----------+ | 1000000 | +----------+ 1 row in set (0.18 sec)

We’ve inserted a million users into our table! That took some time. Please note the 0.18 seconds to count them all.

Now, let’s see how fast we can count the active users:

root@localhost test > select count(*) from users where active = true; +----------+ | count(*) | +----------+ | 99887 | +----------+ 1 row in set (0.04 sec)

Cool, pretty fast. Can you explain that?

root@localhost test > explain select count(*) from users where active = true\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: users partitions: NULL type: ref possible_keys: active key: active key_len: 2 ref: const rows: 208462 filtered: 100.00 Extra: Using index 1 row in set, 1 warning (0.00 sec)

Now let’s drop the index and run the test again:

root@localhost test > alter table users drop key active; Query OK, 0 rows affected (0.05 sec) Records: 0 Duplicates: 0 Warnings: 0 root@localhost test > select count(*) from users where active = true; +----------+ | count(*) | +----------+ | 99887 | +----------+ 1 row in set (0.24 sec)

Wow, a lot slower… I’ve ran the selects a couple of times with consistent results to verify no caching was influencing the results. Let’s see the explain:

root@localhost test > explain select count(*) from users where active = true\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: users partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 912236 filtered: 10.00 Extra: Using where 1 row in set, 1 warning (0.00 sec)

As you hopefully know: “Using index” good, “Using where” bad.

TL;DR A boolean column consisting of 10% TRUE and 90% FALSE queried for TRUE values using an index takes 0.04 sec, while not using an index takes 0.24 sec. The index speeds up the query by about a factor of six.

I’ve repeated the test with worse case scenario of a binary column split 50/50. 50% true, 50% false. The numbers are a bit less consistent, but generally around 0.16 for the indexed version and 0.24 for the unindexed version. Go indexes!

My favourite vim feature

No place to be humble, I’m a vim power user.

Yes, your IDE can do things my vim can’t. But my vim can do things your IDE can’t. How many times have I watched someone using an IDE and thought “if only it were vim I’d take your keyboard and solve this in about two seconds, while you click around and keep getting lost”.

My favourite vim feature? Displaying different parts of one buffer (ie file) in two vertically split windows. Can your IDE do that? Now get off my lawn!

Open a file, split vertically, scroll each of the split windows independently:

$ vim ~/.bashrc
:vs
G

Explorations in AI – solving RoboZZle

RoboZZle is a robot programming game. You can play it in your web browser (even without Silverlight), on Android, or iPhone/Pad/Pod. Go and try it, otherwise the rest of this entry won’t make much sense.

After a while of playing the game myself, I started getting interested in creating a program to find puzzle solutions. Having completed Intro to AI by Sebastian Thrun and Peter Norvig in 2011 and never using the techniques since, this seemed like a good opportunity to explore them further. As things go, none of these techniques appear directly applicable and a simple evolutionary search seems best.

Approach

I started by writing a solution runner, the main output of which is determining whether a given RoboZZle program solves a puzzle. Next were a robozzle.com client, utilizing the awesome Python requests library, and finally solver randomly generating programs to be tested by the runner. You can browse the source code of Zlej Rob at github.

Some facts about the solver:

  1. Start with an empty program.
  2. In each generation, each of the surviving programs will have ~100 randomly generated offsprings.
  3. A program is mutated by overriding (or removing) 1 to ~3 instructions of its parent.
  4. Programs are scored: positive points for reaching a square, extra points for collecting stars, and a negative point for each program instruction.
  5. If an offspring achieved higher score than its parent, it’s added to the current program population.
  6. Program population is kept to ~100 programs with highest score.

I decided not to cross-breed programs at all, as it doesn’t feel like it would be helpful.

Surprisingly, keeping a set of all evaluated programs doesn’t eat all the memory (and prevents recalculating the same thing over and over, speeding up execution by an order of magnitude). Yay for sets of tuples!

Results

Zlej Rob has solved over a 1000 puzzles in a couple of weeks of running on my $5/mo DigitalOcean (that’s a referral link – sign up and I’ll get rich) droplet alongside this blog and a couple other things.

Zlej Rob’s good at solving one-way street puzzles with one function, bad at anything involving multiple functions, mediocre at multiple-possibilities puzzles, and passable at random walks.

Zlej Rob discovered the shortest solution for Twins 2 (which I improved by removing a redundant function call). I think that’s pretty impressive. The solution includes a lot of recursive calls, and the replay takes ages – the robot loops and loops, seemingly never getting very far.

Ideas for improvement

  1. Visualisation. Would help identifying why certain things work and others don’t. I’ve almost started writing an Angular app to do that.
  2. Smarter mutations. Mutations should include abstracting random instruction subchains into other functions (this could be very useful for multifunction problems where Zlej Rob usually fails terribly because it can’t connect the dots). It might also be better to insert instructions instead of replacing them.
  3. Program diversity (“similarity penalty”). Surviving population for each generation is around 100 programs (toyed with higher values with no positive results – if anything, having 1000 programs is terribly slow), and they can all easily end up being very similar, getting stuck in local minima.
  4. Higher score programs should have more offsprings. Could speed up certain puzzles but perhaps make less obvious solutions unatainable. Would need to ensure program diversity first.
  5. Hints. For puzzles with an “obvious” order in which squares must be visited (such as binary counting or limit your stack), mandate this order. Currently, Zlej Rob finds random solutions that sweep vast majority of the board in a haphazard manner. Forcing the order in which squares must be visited would be extremely helpful, but likely requires human participation. Perhaps that’d be cheating?

Where do we go from here?

Zlej Rob’s results have surpassed my expectations, especially considering I haven’t spent that much time on it. Getting Zlej Rob into top 10 of RoboZZle players seems possible but would likely require an order of magnitude more effort than I spent so far. Not sure if worth it?

Make Ubuntu Unity usable in five simple steps

Out of frustration with Unity, I’ve used several window managers/desktops recently. They are excellent at what they do, but I missed how everything (volume control, display brightness, keyboard switcher, sleep on laptop lid close, sound mixer, wifi applet, gnome keyring manager, automounting, etc.) just works out of the box in the default Ubuntu installation. All these things need to be taken care of separately when using a minimalistic window manager, most of them are painful, some impossible.

Unity is a strange beast. Some of its features are amazing (merged topbar with window bar, semi-maximizing windows with ctrl+windows_key+left/right), while some are awful (the unholy left menu with its terrible launcher and all the other tentacles). I figured long as I can avoid the few bad parts, I’ll be happy.

You’ll need to install compizconfig-settings-manager for most of the tweaks to work. People who say ccsm breaks things are noobs.

Get the unity sidebar out of the way:
Appearance: Behavior: Auto-hide: ON
Optionally also, Reveal location: Top Left Corner, and set sensitivity to something ridiculous so the cthulhu never pops out.

Sensible Alt+F2:
First, install gmrun, a quick and small program launcher with tab completion.
Second, unbind Alt+F2 from Unity:

gconftool-2 \
-s "/apps/compiz-1/plugins/unityshell/screen0/options/execute_command" \
-t string ""

Third, bind Alt+F2 to gmrun:
ccsm: Commands: add command “gmrun”, key bindings bind to Alt+F2

Default four desktops aren’t bourgeoise enough, get more:
ccsm: General Options: Desktop Size

Focus Follows Mouse:
ccsm: General Options: Focus & Raise: uncheck “Click To Focus”

Fullscreen Any Window:
ccsm: Extra WM Actions: Toggle Fullscreen: Alt+F11

Swap Ctrl and Caps Lock:
gnome-tweak-tool: Typing: Ctrl key position: Swap Ctrl and Caps Lock

I find the environment comfortable to use after just these five tweaks (press and hold the windows key to see many useful Unity shortcuts), though I’m still struggling with window switching. Usually bypass it by keeping one fullscreen window per desktop.

PS: Why did I write this? Further reference. Yesterday I accidentally ran “unity –reset” (do NOT ever run that) and had to google these steps (again). No more!

Dotrc aka ~/.*rc

If you don’t understand the title, you might just as well leave — this post is going to contain close to no useful information for you.

I’ve been spending a lot of my time in the shell recently. Mostly splitting my time between bash and vim, usually in screen.

I’ve always had a reasonable .bashrc, and my .vimrc used to be above average as well. But I invested some extra time to research more possibilities the dotfiles offer. You can preview and download my dotrc at github.

Here are some of the highlights, whatever I consider the “best of”.

My .bashrc is unremarkable, I just have a lot of shortcuts for the common everyday stuff. Perhaps the only thing worth noting is title setting for screen:

PROMPT_COMMAND=‘echo -ne "\033k`echo $PWD | sed "s:.*/\(.*/.*/.*\):\1:g"`\033\\"’

I actually wrote that myself, it shows the innermost three directories that you’re in. Showing running command in title is useless, as that’s in most cases either bash or vim (rarely also mysql). Showing the full path is useless, because long titles get cut off. Showing only the current directory name is not so great either, as it leaves you without context. I’ve settled for last three so far, but two might also be useful in certain situations.

Perhaps the best tip of all, reduce amount of tab hitting for completion by 50%. Put following to your .inputrc:

set show-all-if-ambiguous on

Next in line is my .vimrc (sorry, no .emacsrc, emacs sucks). Except for the usual stuff (nocompatible, colours, incsearch, etc.), I use few very useful and not very well known tricks.

set so=10 " show 10 lines of context (above and below)

“so” is short for “scrolloff”, which makes sure you have some space to breathe.

Last but not least, the Esc key is real far, hence:

set tm=400 " timeout for shortcuts

inoremap jk <esc> "pressing j and k together escapes
inoremap kj <esc>

Have I missed any useful tips & tricks?

The death of spam

Remember spam? Our email inboxes used to be full of it, it clogged online forums and was a general nuisance. My blog used to receive dozens of spam comments every day.

Now with gmail’s agressive — but usually spot on — filtering, I get a spam email in my inbox once a month. I have about 300 mails in my spam folder, which is about half of what it used to be few years ago. My blog is protected by akismet, which feels even more effective than gmail at catching spam. It also looks like spammers are giving up a little as I only received 6 spam comments on my blog during the past two weeks (that, or they think my blog is dead (it’s not, believe me!)).

So is spam really dead, or has it just moved elsewhere?

Social networks don’t suffer from spam (if you define spam as unwanted message of commercial nature) but social news sites (digg, reddit, etc.) are a different story. People try to spam them all the time. But spam on social news sites has to be disguised very well because the community tries to defend itself. If anyone even suspects that something might be spam, they downvote immediately.

Search engine spam (aka half of SEO techniques) is also prevalent and I can’t see it disappearing anytime soon. On the other hand, it’s gotten much milder and nowadays you rarely see spam-sites which are just full of ads.

All in all, I think the future is bright. Brute-force spam is easy to catch, and more clever spam sometimes even includes useful or entertaining stuff (think of viral marketing as a clever, entertaining, user-distributed spam ;)).

Yours sincerely,
Captain Obvious

Gödel, Escher, Bach

I just finished reading Gödel, Escher, Bach by Douglas R. Hofstadter. I’ll wait for you to at least read the wikipedia description of the book.

Done? Ok… The main surprise for me was how many seemingly unrelated topics the book touches. Logic (Gödel), graphic art (Escher), music (Bach) to start with. But also mathematics, molecular biology, genetics, philosophy, zen buddhism, formal systems, artificial intelligence, programming, recursion and self-reference, various paradoxes, and much much more.

It is easily the best book I’ve ever read, although I can’t claim to have understood everything. If you want to borrow my copy, I’d be glad to help (the book is sort of expensive and really hard to get in Czechia).

Firefox productivity tips

If you are like me, you spend a lot of time using firefox. Chances are that you don’t use it as effectively as I do. :)

Let’s start with the look. First, use a minimalistic theme. I use full flat, but whiteheart is also kind of cool. Such themes simply don’t get in your way, which is exactly what you need.

You probably use several toolbars which take way too much of your screen space. Right-click on a toolbar and select “customize”. Then you can drag everything around and hide the unused toolbars. I currently have three rows of toolbars:

  • First toolbar has File, Edit, View menu and others, back, forward, reload, cancel, home icons, location bar, and finally a search bar.
  • Second toolbar contains quick bookmarks – just the pictures without any text, so that I can fit in as many as possible.
  • Third toolbar is shared google toolbar (used mostly just for PageRank), webdeveloper toolbar, user agent switcher and adblock settings.

I often see people wasting their screen space by using the default configuration with menus in the first toolbar and action icons and location bar in the second one.

Speed up firefox by installing fasterfox or simply google around a bit to find out (enable pipelining, disable ipv6, etc.).

Learn to use keyboard shortcuts:

  • ctrl+T — open new tab
  • ctrl+L — go to location bar
  • ctrl+K — go to search box
  • ctrl+up/down — switch search engines when in search box (don’t forget to order them in a way that suits you)
  • alt+enter — open the search from search box in a new tab
  • ctrl+shift+T — open an accidentally closed tab (you can even repeat this shortcut — superb!)
  • ctrl+b — bookmarks
  • ctrl+h — history

If you use a mouse, install and learn to use mouse gestures. At least the few ones for back/forward and open/close tab. I personally prefer FireGestures, as that is the only gesture extension that actually knows what I mean when I perform the gestures. 8-)

Out of the helpful extensions that I use, I must mention at least Download Statusbar, Customize Google and Adblock Plus.

And I almost thought I wouldn’t make another post in March. ^^

New look again

I was thoroughly bored with the previous theme, and although I tried to revive it with the new header image, it was still bugging me. So I created a new one.

I had a draft of a new theme lying around for quite a long time, so I made few adjustments to it: made the code much longer and much less clean. But it seems to work.

Features of the new design include, but are not limited to:

  • big letters in headings (big letters rock)
  • even less images (none, except the two links to flickriver, smilies and images in posts)
  • half-fixed-width half-fluid design (the design is fixed width, but the sidebar is fluid — works well for many different widths of browser (800px — the sidebar isn’t displayed, it’s accessible through scrolling; 1024px — sidebar in one column, 1280px — two columns, more px — more columns (it is capped at three columns)))
  • emphasis on typography (lists, blockquotes, etc. are styled properly)
  • lines vertically in synch (left column, middle column and sidebar)
  • the old color scheme, I mostly like it and more importantly — couldn’t find a better one at the moment :)
  • justified text (I’m still very unsure here — justified looks way better, but left-aligned is more readable)

Bugs of the new design include, but are not limited to:

  • IE6 sometimes messes up the sidebar, not quite sure why
  • Opera doesn’t keep lines in synch when there are smileys (and I thought I had the solution, sigh…)
  • IE doesn’t align the comment date in the comment list (will look into that later)

Also, I spent ages dealing with various bugs in IE that caused things to disappear.
One such bug caused the sidebar not to appear (it was an absolutely positioned element next to a floated element — don’t ever do that), another sometimes caused titles to disappear (they were relatively positioned, now that they are static it seems ok, but I have no idea why). When repairing the sidebar, I had to move it in front of the actual content in the markup, which is wrong and I know it. I am sorry to all lynx/links users out there.

Bug reports, remarks and suggestions are welcome! ;-)