One year with Scala

This is gonna be rambling personal experience. You have been warned.

A year ago I took a break from my work at Inviqa to learn Scala and functional programming. This has been a great decision, a most successful endeavour, and an amazing learning experience.

I’ve been stuck with PHP for a long time. Mind, at Inviqa we were doing as good PHP as one can. I learned lots about software engineering from a multitude of conference speakers and industry leaders. This was further helped by a very active learning & development department and a generous conference/learning budget. Still, it was PHP. The language sucks.

Learning Scala as a “better Java” kind of language was easy. Case classes, options, lots of easy wins. Learning functional programming… well, I’ve been reading the awesome Functional Programming in Scala book, and I’m still reading it one year later. Despite its hands-on approach and lots of hand-holding, it’s not easy for me to digest. Functional programming is hard.

I used to really like Python. I wrote about 50 lines of Python recently, and just couldn’t believe how difficult it was. One needs to explicitly write return to return values. When just trying things out, you don’t get quick feedback from the compiler. Either write tests or run your code – the former seemed like overdoing it for such a short program, the latter was tedious. Working with the collections was a pain.

Let’s have a look at an example. What we want to do: split a string on commas, then strip whitespace from each element.

In Python:

map(lambda i: i.strip(), re.split(',', string))

In Scala:

string.split(",").map(_.trim)

The relative conciseness of Scala is nice, sure, but the Scala code basically follows the instructions: take a string, split on a comma, trim whitespace. Python is completely cryptic in comparison.

The main drawback of Scala is not the language but the ecosystem. SBT (Scala Build Tool) is powerful but often times incomprehensible. The compilation can be slow at times. Scala’s Play Framework is not nearly as polished as PHP’s Symfony. Replacing any piece of Symfony is just a dependency injection away. Replacing a component of Play is bordering on impossible, everything depends on everything and is mostly hardcoded. I wanted to do a really simple one-line change in Play Framework routing but had to give up.

Inviqa didn’t have as much interest in Scala as I hoped, so regrettably, I stopped working for them. Now I’m part of KwiqJobs (soon to be Swarms Technologies), an amazing startup. We turn people’s waiting time into a new resource for companies. Our tech team works remotely and we meet about once a month for a week. It’s lots of fun! Also we’re hiring! :)

ListMap in Scala

You probably don’t want to use a ListMap. It has an interface of a Map, but is “powered” (that’s too good a word) by a List. Thus pretty much any Map operations you call on it will take ages. Consider using a LinkedHashMap instead – it will preserve both the element order and your sanity.

That’s me trying to blog more.

MySQL low cardinality index efficiency

I’ve heard the opinion that indexes on low cardinality columns don’t work well. I set out to disprove that.

First, let’s create a table with a boolean column “active” and fill it with dummy data. Please note it includes a (default, B-Tree) index on the “active” column.

root@localhost test > show create table users\G *************************** 1. row *************************** Table: users Create Table: CREATE TABLE `users` ( `id` int(11) NOT NULL AUTO_INCREMENT, `name` varchar(50) DEFAULT NULL, `email` varchar(100) DEFAULT NULL, `active` tinyint(1) DEFAULT NULL, PRIMARY KEY (`id`), KEY `active` (`active`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 1 row in set (0.00 sec) root@localhost test > DELIMITER ;; root@localhost test > CREATE PROCEDURE insert_random(IN amount INT, IN percent_active INT) -> BEGIN -> DECLARE counter INT DEFAULT 0; -> myloop: LOOP -> if (counter = amount) THEN -> LEAVE myloop; -> end if; -> INSERT INTO users SET -> name = UUID(), -> email = UUID(), -> active = IF(FLOOR(RAND() * 100) < percent_active, 1, 0); -> SET counter = counter + 1; -> END LOOP; -> END ;; Query OK, 0 rows affected (0.00 sec) root@localhost test > DELIMITER ; root@localhost test > call insert_random(1000000, 10); Query OK, 1 row affected (36.60 sec) root@localhost test > select count(*) from users; +----------+ | count(*) | +----------+ | 1000000 | +----------+ 1 row in set (0.18 sec)

We’ve inserted a million users into our table! That took some time. Please note the 0.18 seconds to count them all.

Now, let’s see how fast we can count the active users:

root@localhost test > select count(*) from users where active = true; +----------+ | count(*) | +----------+ | 99887 | +----------+ 1 row in set (0.04 sec)

Cool, pretty fast. Can you explain that?

root@localhost test > explain select count(*) from users where active = true\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: users partitions: NULL type: ref possible_keys: active key: active key_len: 2 ref: const rows: 208462 filtered: 100.00 Extra: Using index 1 row in set, 1 warning (0.00 sec)

Now let’s drop the index and run the test again:

root@localhost test > alter table users drop key active; Query OK, 0 rows affected (0.05 sec) Records: 0 Duplicates: 0 Warnings: 0 root@localhost test > select count(*) from users where active = true; +----------+ | count(*) | +----------+ | 99887 | +----------+ 1 row in set (0.24 sec)

Wow, a lot slower… I’ve ran the selects a couple of times with consistent results to verify no caching was influencing the results. Let’s see the explain:

root@localhost test > explain select count(*) from users where active = true\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: users partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 912236 filtered: 10.00 Extra: Using where 1 row in set, 1 warning (0.00 sec)

As you hopefully know: “Using index” good, “Using where” bad.

TL;DR A boolean column consisting of 10% TRUE and 90% FALSE queried for TRUE values using an index takes 0.04 sec, while not using an index takes 0.24 sec. The index speeds up the query by about a factor of six.

I’ve repeated the test with worse case scenario of a binary column split 50/50. 50% true, 50% false. The numbers are a bit less consistent, but generally around 0.16 for the indexed version and 0.24 for the unindexed version. Go indexes!

On URLs

People don’t pay any attention to URLs. Recently, a highly intelligent friend of mine inadverently posted the following link on Facebook: https://www.givingwhatwecan.org/get-involved/how-rich-am-i/?country=NLD&income=56000&adults=1&children=0

URLs are the building blocks of the world wide web. When a website changes its URLs, you get those nasty 404 errors. It would be good if website maintainers paid a little more attention to that.

As a website user, when you’re sharing a URL, look at it briefly. Does it actually contain the information you want it to contain? Can other people view this URL? Pro tip: use incognito mode to find out. Does it include superfluous information you were not intending to share? E.g. when searching google, you can easily end up with URLs such as https://www.google.com/search?q=previous+search#q=current+search

Keep URLs simple & working. Look at them briefly when sharing. It’s no rocket science.

The Acer confusion

Not knowing the following table can result in rather pointless arguments.

Latin Czech Polish
Acer Javor Klon
Acer platanoides Javor mléč Klon zwyczajny
Acer pseudoplatanus Javor klen Klon jawor

One day I’ll make a compendium of Polish-Czech insanities.
Nůžky ~ nożyczki, nožičky ~ nóżki.

My favourite vim feature

No place to be humble, I’m a vim power user.

Yes, your IDE can do things my vim can’t. But my vim can do things your IDE can’t. How many times have I watched someone using an IDE and thought “if only it were vim I’d take your keyboard and solve this in about two seconds, while you click around and keep getting lost”.

My favourite vim feature? Displaying different parts of one buffer (ie file) in two vertically split windows. Can your IDE do that? Now get off my lawn!

Open a file, split vertically, scroll each of the split windows independently:

$ vim ~/.bashrc
:vs
G

Explorations in AI – solving RoboZZle

RoboZZle is a robot programming game. You can play it in your web browser (even without Silverlight), on Android, or iPhone/Pad/Pod. Go and try it, otherwise the rest of this entry won’t make much sense.

After a while of playing the game myself, I started getting interested in creating a program to find puzzle solutions. Having completed Intro to AI by Sebastian Thrun and Peter Norvig in 2011 and never using the techniques since, this seemed like a good opportunity to explore them further. As things go, none of these techniques appear directly applicable and a simple evolutionary search seems best.

Approach

I started by writing a solution runner, the main output of which is determining whether a given RoboZZle program solves a puzzle. Next were a robozzle.com client, utilizing the awesome Python requests library, and finally solver randomly generating programs to be tested by the runner. You can browse the source code of Zlej Rob at github.

Some facts about the solver:

  1. Start with an empty program.
  2. In each generation, each of the surviving programs will have ~100 randomly generated offsprings.
  3. A program is mutated by overriding (or removing) 1 to ~3 instructions of its parent.
  4. Programs are scored: positive points for reaching a square, extra points for collecting stars, and a negative point for each program instruction.
  5. If an offspring achieved higher score than its parent, it’s added to the current program population.
  6. Program population is kept to ~100 programs with highest score.

I decided not to cross-breed programs at all, as it doesn’t feel like it would be helpful.

Surprisingly, keeping a set of all evaluated programs doesn’t eat all the memory (and prevents recalculating the same thing over and over, speeding up execution by an order of magnitude). Yay for sets of tuples!

Results

Zlej Rob has solved over a 1000 puzzles in a couple of weeks of running on my $5/mo DigitalOcean (that’s a referral link – sign up and I’ll get rich) droplet alongside this blog and a couple other things.

Zlej Rob’s good at solving one-way street puzzles with one function, bad at anything involving multiple functions, mediocre at multiple-possibilities puzzles, and passable at random walks.

Zlej Rob discovered the shortest solution for Twins 2 (which I improved by removing a redundant function call). I think that’s pretty impressive. The solution includes a lot of recursive calls, and the replay takes ages – the robot loops and loops, seemingly never getting very far.

Ideas for improvement

  1. Visualisation. Would help identifying why certain things work and others don’t. I’ve almost started writing an Angular app to do that.
  2. Smarter mutations. Mutations should include abstracting random instruction subchains into other functions (this could be very useful for multifunction problems where Zlej Rob usually fails terribly because it can’t connect the dots). It might also be better to insert instructions instead of replacing them.
  3. Program diversity (“similarity penalty”). Surviving population for each generation is around 100 programs (toyed with higher values with no positive results – if anything, having 1000 programs is terribly slow), and they can all easily end up being very similar, getting stuck in local minima.
  4. Higher score programs should have more offsprings. Could speed up certain puzzles but perhaps make less obvious solutions unatainable. Would need to ensure program diversity first.
  5. Hints. For puzzles with an “obvious” order in which squares must be visited (such as binary counting or limit your stack), mandate this order. Currently, Zlej Rob finds random solutions that sweep vast majority of the board in a haphazard manner. Forcing the order in which squares must be visited would be extremely helpful, but likely requires human participation. Perhaps that’d be cheating?

Where do we go from here?

Zlej Rob’s results have surpassed my expectations, especially considering I haven’t spent that much time on it. Getting Zlej Rob into top 10 of RoboZZle players seems possible but would likely require an order of magnitude more effort than I spent so far. Not sure if worth it?

How I didn’t stop eating food

I’ve recently read How I Stopped Eating Food and think the author is nuts. Rob claims his wonder-potion helped him clear his to-do list and got him rid of his coffee addiction. While his approach is questionable, I’ve long shared many of his goals, such as a healthy and balanced diet, while not wasting time preparing food.

Here are some dishes I’ve been cooking lately:

  • First my absolute favorite: fast, clean and super-healthy. Take broccoli/cauliflower/spinach/green beans/other vegetable. Wash, cut up in a couple of pieces, put in a plastic box, add a bit of water, and microwave between one and three minutes. Steamed vegetables retain both their taste and vitamins. Meanwhile, grate some cheese to sprinkle on top and wash side-dish tomatoes/paprika/carrots/other vegetables. Add a bit of pepper and pour olive oil over it all. Voila, that took about 7 minutes, dirty dishes amount to one plastic box and the grater.
  • Preheat oven. Wrap fish in aluminium foil and put it in the oven. Go read a book for 10 minutes. Wash side-dish vegetables and perhaps cut a slice of bread. No dirty dishes.
  • Scrambled eggs are pushing it, preparation takes like 15 minutes and there’s the dirty pan left afterwards. But sometimes I still enjoy them.

I eat tomatoes and paprika with almost every meal. For meals without much protein, I add few slices of dead animals. If the dish doesn’t have enough fat, I add generous amount of olives and snack nuts afterwards. Oh and I’ve just discovered avocados!

Breakfast is my guilty pleasure, oats (+ banana and nuts) with yogurt, often times of Greek variety, with 15 grams of saturated fat per half-a-cup, instantly getting to 75% of maximum recommended daily saturated fat intake. Oh well.

Chairs

I’ve been undergoing torture all my life. These devices of the devil bend my body 90° at the waist, and then again 90° at the knees. Halfway between bent and straight. I can hardly imagine anything less comfortable.

I’d be happy if the chair-angles were either bigger or smaller. Kneeling chairs are nice, as the body/thighs angle is about 110°, which “reduces lower back strain by promoting proper spinal alignment”. Zafu is cool, because you can sit on it in many different ways, according to your current mood. Bean bags are comfortable, they adjust their shape to your shape. Sitting on the floor leaning against a wall with legs stretched out and a small pillow under your back is almost as good as sitting on a bed of leaves leaning against an oak.

But Aeron? No thank you.

Déjà vu explained

So, Wikipedia says:

The psychologist Edward B. Titchener in his book A Textbook of Psychology (1928), wrote that déjà vu is caused by a person getting a brief glimpse of an object or situation prior to full conscious perception, resulting in a false sense of familiarity. The explanation that has mostly been accepted of déjà vu is not that it is an act of “precognition” or “prophecy”, but rather that it is an anomaly of memory, giving the false impression that an experience is “being recalled”.

What a load of bollocks. Here’s the real deal:

When you experience a déjà vu, it had already happened to you. Your experiences are fully controlled by your mind — you do not perceive some objective reality but your subjective one.

Subjective reality means you can experience different situations in a very similar way, when they put you in the same emotional state. More scientifically, the same partial chain of neurons in your brain fires in the same sequence. That’s why you remember it, that’s why you sense a déjà vu — it had already happened to you.

From the half-assed-guesses-I-am-absolutely-convinced-about department.