MySQL low cardinality index efficiency

2016-11-28

I’ve heard the opinion that indexes on low cardinality columns don’t work well. I set out to disprove that.

First, let’s create a table with a boolean column “active” and fill it with dummy data. Please note it includes a (default, B-Tree) index on the “active” column.

> show create table users\G
*************************** 1. row ***************************
Table: users
Create Table: CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(50) DEFAULT NULL,
`email` varchar(100) DEFAULT NULL,
`active` tinyint(1) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `active` (`active`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

> DELIMITER ;;
> CREATE PROCEDURE insert_random(IN amount INT, IN percent_active INT)
    -> BEGIN
    ->   DECLARE counter INT DEFAULT 0;
    ->   myloop: LOOP
    ->     if (counter = amount) THEN
    ->       LEAVE myloop;
    ->     end if;
    ->     INSERT INTO users SET
    ->       name = UUID(),
    ->       email = UUID(),
    ->       active = IF(FLOOR(RAND() * 100) < percent_active, 1, 0);
    ->     SET counter = counter + 1;
    ->   END LOOP;
    -> END ;;
Query OK, 0 rows affected (0.00 sec)

> DELIMITER ;

> call insert_random(1000000, 10);
Query OK, 1 row affected (36.60 sec)

> select count(*) from users;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (0.18 sec)

We’ve inserted a million users into our table! That took some time. Please note the 0.18 seconds to count them all.

Now, let’s see how fast we can count the active users:

> select count(*) from users where active = true;
+----------+
| count(*) |
+----------+
|    99887 |
+----------+
1 row in set (0.04 sec)

Cool, pretty fast. Can you explain that?

> explain select count(*) from users where active = true\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: users
   partitions: NULL
         type: ref
possible_keys: active
          key: active
      key_len: 2
          ref: const
         rows: 208462
     filtered: 100.00
        Extra: Using index
1 row in set, 1 warning (0.00 sec)

Now let’s drop the index and run the test again:

> alter table users drop key active;
Query OK, 0 rows affected (0.05 sec)
Records: 0  Duplicates: 0  Warnings: 0

> select count(*) from users where active = true;
+----------+
| count(*) |
+----------+
|    99887 |
+----------+
1 row in set (0.24 sec)

Wow, a lot slower… I’ve ran the selects a couple of times with consistent results to verify no caching was influencing the results. Let’s see the explain:

> explain select count(*) from users where active = true\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: users
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 912236
     filtered: 10.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

As you hopefully know: “Using index” good, “Using where” bad.

TL;DR A boolean column consisting of 10% TRUE and 90% FALSE queried for TRUE values using an index takes 0.04 sec, while not using an index takes 0.24 sec. The index speeds up the query by about a factor of six.

I’ve repeated the test with worse case scenario of a binary column split 50/50. 50% true, 50% false. The numbers are a bit less consistent, but generally around 0.16 for the indexed version and 0.24 for the unindexed version. Go indexes!

On URLs

2016-05-06

People don’t pay any attention to URLs. Recently, a highly intelligent friend of mine inadverently posted the following link on Facebook: https://www.givingwhatwecan.org/get-involved/how-rich-am-i/?country=NLD&income=56000&adults=1&children=0

URLs are the building blocks of the world wide web. When a website changes its URLs, you get those nasty 404 errors. It would be good if website maintainers paid a little more attention to that.

As a website user, when you’re sharing a URL, look at it briefly. Does it actually contain the information you want it to contain? Can other people view this URL? Pro tip: use incognito mode to find out. Does it include superfluous information you were not intending to share? E.g. when searching google, you can easily end up with URLs such as https://www.google.com/search?q=previous+search#q=current+search

Keep URLs simple & working. Look at them briefly when sharing. It’s no rocket science.

The Acer confusion

2015-01-26

Not knowing the following table can result in rather pointless arguments.

Latin Czech Polish
Acer Javor Klon
Acer platanoides Javor mléč Klon zwyczajny
Acer pseudoplatanus Javor klen Klon jawor

One day I’ll make a compendium of Polish-Czech insanities.
Nůžky ~ nożyczki, nožičky ~ nóżki.

Explorations in AI – solving RoboZZle

2013-07-22

RoboZZle is a robot programming game. You can play it in your web browser (even without Silverlight), on Android, or iPhone/Pad/Pod. Go and try it, otherwise the rest of this entry won’t make much sense.

After a while of playing the game myself, I started getting interested in creating a program to find puzzle solutions. Having completed Intro to AI by Sebastian Thrun and Peter Norvig in 2011 and never using the techniques since, this seemed like a good opportunity to explore them further. As things go, none of these techniques appear directly applicable and a simple evolutionary search seems best.

Approach

I started by writing a solution runner, the main output of which is determining whether a given RoboZZle program solves a puzzle. Next were a robozzle.com client, utilizing the awesome Python requests library, and finally solver randomly generating programs to be tested by the runner. You can browse the source code of Zlej Rob at github.

Some facts about the solver:

  1. Start with an empty program.
  2. In each generation, each of the surviving programs will have ~100 randomly generated offsprings.
  3. A program is mutated by overriding (or removing) 1 to ~3 instructions of its parent.
  4. Programs are scored: positive points for reaching a square, extra points for collecting stars, and a negative point for each program instruction.
  5. If an offspring achieved higher score than its parent, it’s added to the current program population.
  6. Program population is kept to ~100 programs with highest score.

I decided not to cross-breed programs at all, as it doesn’t feel like it would be helpful.

Surprisingly, keeping a set of all evaluated programs doesn’t eat all the memory (and prevents recalculating the same thing over and over, speeding up execution by an order of magnitude). Yay for sets of tuples!

Results

Zlej Rob has solved over a 1000 puzzles in a couple of weeks of running on my $5/mo DigitalOcean (that’s a referral link – sign up and I’ll get rich) droplet alongside this blog and a couple other things.

Zlej Rob’s good at solving one-way street puzzles with one function, bad at anything involving multiple functions, mediocre at multiple-possibilities puzzles, and passable at random walks.

Zlej Rob discovered the shortest solution for Twins 2 (which I improved by removing a redundant function call). I think that’s pretty impressive. The solution includes a lot of recursive calls, and the replay takes ages – the robot loops and loops, seemingly never getting very far.

Ideas for improvement

  1. Visualisation. Would help identifying why certain things work and others don’t. I’ve almost started writing an Angular app to do that.
  2. Smarter mutations. Mutations should include abstracting random instruction subchains into other functions (this could be very useful for multifunction problems where Zlej Rob usually fails terribly because it can’t connect the dots). It might also be better to insert instructions instead of replacing them.
  3. Program diversity (“similarity penalty”). Surviving population for each generation is around 100 programs (toyed with higher values with no positive results – if anything, having 1000 programs is terribly slow), and they can all easily end up being very similar, getting stuck in local minima.
  4. Higher score programs should have more offsprings. Could speed up certain puzzles but perhaps make less obvious solutions unatainable. Would need to ensure program diversity first.
  5. Hints. For puzzles with an “obvious” order in which squares must be visited (such as binary counting or limit your stack), mandate this order. Currently, Zlej Rob finds random solutions that sweep vast majority of the board in a haphazard manner. Forcing the order in which squares must be visited would be extremely helpful, but likely requires human participation. Perhaps that’d be cheating?

Where do we go from here?

Zlej Rob’s results have surpassed my expectations, especially considering I haven’t spent that much time on it. Getting Zlej Rob into top 10 of RoboZZle players seems possible but would likely require an order of magnitude more effort than I spent so far. Not sure if worth it?

How I didn't stop eating food

2013-05-30

I’ve recently read How I Stopped Eating Food and think the author is nuts. Rob claims his wonder-potion helped him clear his to-do list and got him rid of his coffee addiction. While his approach is questionable, I’ve long shared many of his goals, such as a healthy and balanced diet, while not wasting time preparing food.

Here are some dishes I’ve been cooking lately:

  • First my absolute favorite: fast, clean and super-healthy. Take broccoli/cauliflower/spinach/green beans/other vegetable. Wash, cut up in a couple of pieces, put in a plastic box, add a bit of water, and microwave between one and three minutes. Steamed vegetables retain both their taste and vitamins. Meanwhile, grate some cheese to sprinkle on top and wash side-dish tomatoes/paprika/carrots/other vegetables. Add a bit of pepper and pour olive oil over it all. Voila, that took about 7 minutes, dirty dishes amount to one plastic box and the grater.
  • Preheat oven. Wrap fish in aluminium foil and put it in the oven. Go read a book for 10 minutes. Wash side-dish vegetables and perhaps cut a slice of bread. No dirty dishes.
  • Scrambled eggs are pushing it, preparation takes like 15 minutes and there’s the dirty pan left afterwards. But sometimes I still enjoy them.

I eat tomatoes and paprika with almost every meal. For meals without much protein, I add few slices of dead animals. If the dish doesn’t have enough fat, I add generous amount of olives and snack nuts afterwards. Oh and I’ve just discovered avocados!

Breakfast is my guilty pleasure, oats (+ banana and nuts) with yogurt, often times of Greek variety, with 15 grams of saturated fat per half-a-cup, instantly getting to 75% of maximum recommended daily saturated fat intake. Oh well.

Chairs

2012-11-15

I’ve been undergoing torture all my life. These devices of the devil bend my body 90° at the waist, and then again 90° at the knees. Halfway between bent and straight. I can hardly imagine anything less comfortable.

I’d be happy if the chair-angles were either bigger or smaller. Kneeling chairs are nice, as the body/thighs angle is about 110°, which “reduces lower back strain by promoting proper spinal alignment”. Zafu is cool, because you can sit on it in many different ways, according to your current mood. Bean bags are comfortable, they adjust their shape to your shape. Sitting on the floor leaning against a wall with legs stretched out and a small pillow under your back is almost as good as sitting on a bed of leaves leaning against an oak.

But Aeron? No thank you.

Déjà vu explained

2012-10-13

So, Wikipedia says:

The psychologist Edward B. Titchener in his book A Textbook of Psychology (1928), wrote that déjà vu is caused by a person getting a brief glimpse of an object or situation prior to full conscious perception, resulting in a false sense of familiarity. The explanation that has mostly been accepted of déjà vu is not that it is an act of “precognition” or “prophecy”, but rather that it is an anomaly of memory, giving the false impression that an experience is “being recalled”.

What a load of bollocks. Here’s the real deal:

When you experience a déjà vu, it had already happened to you. Your experiences are fully controlled by your mind – you do not perceive some objective reality but your subjective one.

Subjective reality means you can experience different situations in a very similar way, when they put you in the same emotional state. More scientifically, the same partial chain of neurons in your brain fires in the same sequence. That’s why you remember it, that’s why you sense a déjà vu – it had already happened to you.

From the half-assed-guesses-I-am-absolutely-convinced-about department.

One week in Sheffield

2012-09-25
  • Almost got run over dozen of times. Surprisingly, on a bike it’s easier to keep left.
  • Stores are open (convenient; I’ve lived in a country where stores are closed).
  • In Sainsbury’s, everything is either discounted or 3 for the price of 2. You think people would see right through that, but I still get excited when whatever I want to buy turns out to be discounted.
  • Show me a white person in Sheffield and I’ll show you two Asians.
  • Local “pale ale” is actually very tasty.
  • Female cashiers twice your age call you “my love”.
  • Yesterday at a concert I met a guy from South England who couldn’t understand the Sheffield accent either.
  • Cheddar.
  • Hot girls are usually Asian. Or Spanish exchange students.
  • Full English breakfast is not bad. Though the beans take time getting used to.
  • Sheffield is hilly. Even though five speeds are all I need on my bike, sometimes I can’t help but secretly wish for a lower speed.
  • Ordered spicy pork noodle soup in an Asian place. It was the spicy-in, spicy-out kind.
  • There’s no light in my hotel bathroom but the pub downstairs is way cool.
  • Museums and galleries are plentiful and free.
  • Had to pack and move all my things three times in one week. Bought a pull-up bar and a bicycle to make the process more challenging.
  • Got called up on stage at a local theater play. Scared me so much that when asked where I was from, answered Argentina.
  • Peak district is awesome.

Make Ubuntu Unity usable in five simple steps

2012-09-19

Out of frustration with Unity, I’ve used several window managers/desktops recently. They are excellent at what they do, but I missed how everything (volume control, display brightness, keyboard switcher, sleep on laptop lid close, sound mixer, wifi applet, gnome keyring manager, automounting, etc.) just works out of the box in the default Ubuntu installation. All these things need to be taken care of separately when using a minimalistic window manager, most of them are painful, some impossible.

Unity is a strange beast. Some of its features are amazing (merged topbar with window bar, semi-maximizing windows with ctrl+windows_key+left/right), while some are awful (the unholy left menu with its terrible launcher and all the other tentacles). I figured long as I can avoid the few bad parts, I’ll be happy.

You’ll need to install compizconfig-settings-manager for most of the tweaks to work. People who say ccsm breaks things are noobs.

Get the unity sidebar out of the way:
Appearance: Behavior: Auto-hide: ON
Optionally also, Reveal location: Top Left Corner, and set sensitivity to something ridiculous so the cthulhu never pops out.

Sensible Alt+F2:
First, install gmrun, a quick and small program launcher with tab completion.
Second, unbind Alt+F2 from Unity:

gconftool-2 \
-s "/apps/compiz-1/plugins/unityshell/screen0/options/execute_command" \
-t string ""

Third, bind Alt+F2 to gmrun:
ccsm: Commands: add command “gmrun”, key bindings bind to Alt+F2

Default four desktops aren’t bourgeoise enough, get more:
ccsm: General Options: Desktop Size

Focus Follows Mouse:
ccsm: General Options: Focus & Raise: uncheck “Click To Focus”

Fullscreen Any Window:
ccsm: Extra WM Actions: Toggle Fullscreen: Alt+F11

Swap Ctrl and Caps Lock:
gnome-tweak-tool: Typing: Ctrl key position: Swap Ctrl and Caps Lock

I find the environment comfortable to use after just these five tweaks (press and hold the windows key to see many useful Unity shortcuts), though I’m still struggling with window switching. Usually bypass it by keeping one fullscreen window per desktop.

PS: Why did I write this? Further reference. Yesterday I accidentally ran “unity –reset” (do NOT ever run that) and had to google these steps (again). No more!

From the Atlantic to the Black Sea

2012-06-17

Eurovelo is a network of long distance bicycle routes, most of which are still under development as of 2012. Eurovelo 6 is a path from the Atlantic to the Black Sea. It leads accross Europe through France, Switzerland, Germany, Austria, Slovakia, Hungary, Serbia, Bulgaria, and Romania.


See where I slept on a larger map

Why bicycle touring?

Travelling is healthy for the mind, cycling for the body. I’ve been a fan of both for a while and wanted to combine them. Turns out there are extra benefits to bicycle touring: It is the right speed to travel. Cars are too fast, walking is too slow. Cycling is fast enough to reach places, and slow enough to see them. The locals will love you. They’re tired of buses full of tourists who jump out, take a picture of the famous castle, jump back in, and speed off. A bicycle rider, you are out there in the open where everyone’s equal. You get to see both the expected and the unexpected, the mundane and the extraordinary.

Some more advantages include being able to eat as much as you can of whatever you want without getting fat, not having problems falling asleep, getting leg muscles of steel, awesome tan, new friends, stories to tell. Today morning feels like years ago, so much has happened since. A month ago feels like yesterday, the memories are so vivid.

Expectations vs. reality

CouchSurfers

CouchSurfing is rather time consuming, so I planned to surf only in the beginning, when it was too cold, and sleep in a tent later on. However, it wasn’t just a convenient way to sleep for free, but primarily a great way to spend time with the locals and get to know the culture.

I cannot overstate the importance of CouchSurfing for my trip. All my hosts took excellent care of me and we had lots of fun.

The experiences were quite varied: I met travellers, students, teachers, photographers, programmers, (organic!) farmers, pick-up artists, baristas, flutemakers, dumpsterdivers, CEOs, builders of self-sustainable houses, psychotherapists, nuclear power plant managers, orangina tasters, hikers, bikers, artists, clowns, actors, soccer fanatics, etc. It has been extremely enriching and there just isn’t enough space here to describe it.

Bicycle tourists

A reminder that I’m not insane. Or at least not the only one. There were so many! If I felt lonely for some reason, only spending each evening with new people, bicycle tourists for the rescue! Perhaps unsurprisingly, most are either young (studying or just finished) or old (retired). Not many inbetween.

Here are the cyclists who deserve special mention for their heroic feats:

  • Olek, racing for three days, sleeping outside in rain with no tent
  • Salome, rode a fixed gear from London to Basel and now rides it as a bicycle messenger
  • Norman, from London to Venezia and then from Egypt to South Africa in 3 months
  • Martin, wanted to beat the bike around the world record, gave up and was doing London-Istanbul in two weeks with average of measly 150-270km per day (funny thing, we met in the rain after Vienna, next to the Donauinsel where everyone gets lost, just as Martin was coming back from the dead end while I was heading towards it).

Furthermore, Dad joined me for a week, Kev and Mat rode with me for two days, and Mauro and Jole let me camp with them on the Black Sea beach. Many more deserve mention, but there’s not enough space.

The Countries

It would be tiring to read my daily journals, so here are the highlights and summaries for the different countries. These reviews do not include CouchSurfers, it would send the scores through the roof (except Romania).

France
Glad to have started there. Apart from great bread, wine, and cheese, France had best signposted cycle paths with good surfaces. And apart from Serbians, the French were the most hospitable people. Once I was spotted by a group of about 25 elderly cyclists who were having a pique-nique. They waved and shouted at me until I turned my bike around and went to see them. I was provided generous amounts of wine, bread, cheese, saucisson, wine, cheese, wine, … Viva la France!

My only complaints about France would be the unreasonably high price of the pensions/chambres d’hotes, and the weather. Supposedly April is the most beautiful month. Everyone tells me I got unlucky.

Switzerland, Germany, and Austria
All three have very pictoresque towns and villages.

Switzerland is expensive! Nine francs for one beer was quite a surprise. In Romania, you can have a three course meal for that. In a four star hotel.

Germany is the country with no internet. But with cheap food and accommodation. Unless there’s a party going on, the Germans are rather reserved. Hello, thank you, good bye.

I’ll never forget the terrible headwind in Austria. Otherwise, Austrians are just Czechs who happen to speak German. Pretty cool.

Slovakia and Hungary
In and out of Slovakia. People stopped responding to my greetings. Starting to look like Eastern Europe even though you can still pay in euros.

Hungary is the country where köszönöm means thank you. First real language barrier. Good food, friendly people. Budapest is an amazing city.

Serbia
Serbians are the most hospitable people. They don’t have much, but are eager to share it all with you. Strangers bought me beers. When I was buying a couple of tomatoes and a bag of strawberries I got the bag of strawberries for free. Using Antoine’s trick, I asked for water near Belgrade, and the family offered me a tasty dinner and later a comfortable bed. They spoke Serbian, I spoke Czech, we didn’t understand each other much, and somehow it wasn’t a problem.

Apart from that, the bicycle paths go on rather reasonable roads with not much traffic and there are awesome funny/silly/wise quotes on the Eurovelo signposts.

Romania and Bulgaria
Andrei says that Romania would be a beautiful country if it wasn’t full of Romanians. That’s so wrong! Romanians are very nice, but the country sucks. There’s no internet, it’s full of rabid dogs, and there are no hotels anywhere except Constanta/Mamaia, where there’s 10km of nothing but hotels lined up next to each other. Perfect vacation.

There are no city centers, no main squares. In villages, there’s the Main Street. The occupation of Romanians is to sit on the benches in front of their houses on the Main Street and Greet Cyclists.

The kids high-five you while the elders wave and shout something incomprehensible. Unfortunately, the language barrier is insurmountable, as vast majority of Romanians only speak Romanian. I speak enough to ask for water and say thank you, so that’s what the conversations are limited to.

Bulgaria differs from Romania in two aspects: It’s about half the price and Bulgarian is a slavic language. Too bad I couldn’t spend more time there.

Where are the photos?

Well, that’s a bit of a touchy topic. I’m afraid of taking pictures of people. Most get scared of the big camera and start acting in strange ways. You need to be really careful. The Indians don’t let you take a photo of them because it takes away their soul. That’s wise. A photographer needs to be able to capture the soul without taking it away. Not sure I can do that. Not yet, perhaps.

So, I have a couple of photo galleries from my trip. Along la Loire to Orleans, rest of la Loire plus Saone and Doubs, Switzerland and Germany, Austria, Slovakia, and Hungary, and Serbia, Bulgaria, and Romania. I’m even really happy about some of the pictures. But, I didn’t take the most important ones…

Summary

This trip was by far the greatest thing that ever happened to me, thanks to all the people I met. If you read this far, do yourself a favor, quit your job, go buy a bicycle and get a ticket to Saint-Nazaire.