MySQL low cardinality index efficiency

I’ve heard the opinion that indexes on low cardinality columns don’t work well. I set out to disprove that.

First, let’s create a table with a boolean column “active” and fill it with dummy data. Please note it includes a (default, B-Tree) index on the “active” column.

root@localhost test > show create table users\G *************************** 1. row *************************** Table: users Create Table: CREATE TABLE `users` ( `id` int(11) NOT NULL AUTO_INCREMENT, `name` varchar(50) DEFAULT NULL, `email` varchar(100) DEFAULT NULL, `active` tinyint(1) DEFAULT NULL, PRIMARY KEY (`id`), KEY `active` (`active`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 1 row in set (0.00 sec) root@localhost test > DELIMITER ;; root@localhost test > CREATE PROCEDURE insert_random(IN amount INT, IN percent_active INT) -> BEGIN -> DECLARE counter INT DEFAULT 0; -> myloop: LOOP -> if (counter = amount) THEN -> LEAVE myloop; -> end if; -> INSERT INTO users SET -> name = UUID(), -> email = UUID(), -> active = IF(FLOOR(RAND() * 100) < percent_active, 1, 0); -> SET counter = counter + 1; -> END LOOP; -> END ;; Query OK, 0 rows affected (0.00 sec) root@localhost test > DELIMITER ; root@localhost test > call insert_random(1000000, 10); Query OK, 1 row affected (36.60 sec) root@localhost test > select count(*) from users; +----------+ | count(*) | +----------+ | 1000000 | +----------+ 1 row in set (0.18 sec)

We’ve inserted a million users into our table! That took some time. Please note the 0.18 seconds to count them all.

Now, let’s see how fast we can count the active users:

root@localhost test > select count(*) from users where active = true; +----------+ | count(*) | +----------+ | 99887 | +----------+ 1 row in set (0.04 sec)

Cool, pretty fast. Can you explain that?

root@localhost test > explain select count(*) from users where active = true\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: users partitions: NULL type: ref possible_keys: active key: active key_len: 2 ref: const rows: 208462 filtered: 100.00 Extra: Using index 1 row in set, 1 warning (0.00 sec)

Now let’s drop the index and run the test again:

root@localhost test > alter table users drop key active; Query OK, 0 rows affected (0.05 sec) Records: 0 Duplicates: 0 Warnings: 0 root@localhost test > select count(*) from users where active = true; +----------+ | count(*) | +----------+ | 99887 | +----------+ 1 row in set (0.24 sec)

Wow, a lot slower… I’ve ran the selects a couple of times with consistent results to verify no caching was influencing the results. Let’s see the explain:

root@localhost test > explain select count(*) from users where active = true\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: users partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 912236 filtered: 10.00 Extra: Using where 1 row in set, 1 warning (0.00 sec)

As you hopefully know: “Using index” good, “Using where” bad.

TL;DR A boolean column consisting of 10% TRUE and 90% FALSE queried for TRUE values using an index takes 0.04 sec, while not using an index takes 0.24 sec. The index speeds up the query by about a factor of six.

I’ve repeated the test with worse case scenario of a binary column split 50/50. 50% true, 50% false. The numbers are a bit less consistent, but generally around 0.16 for the indexed version and 0.24 for the unindexed version. Go indexes!

On URLs

People don’t pay any attention to URLs. Recently, a highly intelligent friend of mine inadverently posted the following link on Facebook: https://www.givingwhatwecan.org/get-involved/how-rich-am-i/?country=NLD&income=56000&adults=1&children=0

URLs are the building blocks of the world wide web. When a website changes its URLs, you get those nasty 404 errors. It would be good if website maintainers paid a little more attention to that.

As a website user, when you’re sharing a URL, look at it briefly. Does it actually contain the information you want it to contain? Can other people view this URL? Pro tip: use incognito mode to find out. Does it include superfluous information you were not intending to share? E.g. when searching google, you can easily end up with URLs such as https://www.google.com/search?q=previous+search#q=current+search

Keep URLs simple & working. Look at them briefly when sharing. It’s no rocket science.

The Acer confusion

Not knowing the following table can result in rather pointless arguments.

Latin Czech Polish
Acer Javor Klon
Acer platanoides Javor mléč Klon zwyczajny
Acer pseudoplatanus Javor klen Klon jawor

One day I’ll make a compendium of Polish-Czech insanities.
Nůžky ~ nożyczki, nožičky ~ nóżki.

My favourite vim feature

No place to be humble, I’m a vim power user.

Yes, your IDE can do things my vim can’t. But my vim can do things your IDE can’t. How many times have I watched someone using an IDE and thought “if only it were vim I’d take your keyboard and solve this in about two seconds, while you click around and keep getting lost”.

My favourite vim feature? Displaying different parts of one buffer (ie file) in two vertically split windows. Can your IDE do that? Now get off my lawn!

Open a file, split vertically, scroll each of the split windows independently:

$ vim ~/.bashrc
:vs
G

Explorations in AI – solving RoboZZle

RoboZZle is a robot programming game. You can play it in your web browser (even without Silverlight), on Android, or iPhone/Pad/Pod. Go and try it, otherwise the rest of this entry won’t make much sense.

After a while of playing the game myself, I started getting interested in creating a program to find puzzle solutions. Having completed Intro to AI by Sebastian Thrun and Peter Norvig in 2011 and never using the techniques since, this seemed like a good opportunity to explore them further. As things go, none of these techniques appear directly applicable and a simple evolutionary search seems best.

Approach

I started by writing a solution runner, the main output of which is determining whether a given RoboZZle program solves a puzzle. Next were a robozzle.com client, utilizing the awesome Python requests library, and finally solver randomly generating programs to be tested by the runner. You can browse the source code of Zlej Rob at github.

Some facts about the solver:

  1. Start with an empty program.
  2. In each generation, each of the surviving programs will have ~100 randomly generated offsprings.
  3. A program is mutated by overriding (or removing) 1 to ~3 instructions of its parent.
  4. Programs are scored: positive points for reaching a square, extra points for collecting stars, and a negative point for each program instruction.
  5. If an offspring achieved higher score than its parent, it’s added to the current program population.
  6. Program population is kept to ~100 programs with highest score.

I decided not to cross-breed programs at all, as it doesn’t feel like it would be helpful.

Surprisingly, keeping a set of all evaluated programs doesn’t eat all the memory (and prevents recalculating the same thing over and over, speeding up execution by an order of magnitude). Yay for sets of tuples!

Results

Zlej Rob has solved over a 1000 puzzles in a couple of weeks of running on my $5/mo DigitalOcean (that’s a referral link – sign up and I’ll get rich) droplet alongside this blog and a couple other things.

Zlej Rob’s good at solving one-way street puzzles with one function, bad at anything involving multiple functions, mediocre at multiple-possibilities puzzles, and passable at random walks.

Zlej Rob discovered the shortest solution for Twins 2 (which I improved by removing a redundant function call). I think that’s pretty impressive. The solution includes a lot of recursive calls, and the replay takes ages – the robot loops and loops, seemingly never getting very far.

Ideas for improvement

  1. Visualisation. Would help identifying why certain things work and others don’t. I’ve almost started writing an Angular app to do that.
  2. Smarter mutations. Mutations should include abstracting random instruction subchains into other functions (this could be very useful for multifunction problems where Zlej Rob usually fails terribly because it can’t connect the dots). It might also be better to insert instructions instead of replacing them.
  3. Program diversity (“similarity penalty”). Surviving population for each generation is around 100 programs (toyed with higher values with no positive results – if anything, having 1000 programs is terribly slow), and they can all easily end up being very similar, getting stuck in local minima.
  4. Higher score programs should have more offsprings. Could speed up certain puzzles but perhaps make less obvious solutions unatainable. Would need to ensure program diversity first.
  5. Hints. For puzzles with an “obvious” order in which squares must be visited (such as binary counting or limit your stack), mandate this order. Currently, Zlej Rob finds random solutions that sweep vast majority of the board in a haphazard manner. Forcing the order in which squares must be visited would be extremely helpful, but likely requires human participation. Perhaps that’d be cheating?

Where do we go from here?

Zlej Rob’s results have surpassed my expectations, especially considering I haven’t spent that much time on it. Getting Zlej Rob into top 10 of RoboZZle players seems possible but would likely require an order of magnitude more effort than I spent so far. Not sure if worth it?

How I didn’t stop eating food

I’ve recently read How I Stopped Eating Food and think the author is nuts. Rob claims his wonder-potion helped him clear his to-do list and got him rid of his coffee addiction. While his approach is questionable, I’ve long shared many of his goals, such as a healthy and balanced diet, while not wasting time preparing food.

Here are some dishes I’ve been cooking lately:

  • First my absolute favorite: fast, clean and super-healthy. Take broccoli/cauliflower/spinach/green beans/other vegetable. Wash, cut up in a couple of pieces, put in a plastic box, add a bit of water, and microwave between one and three minutes. Steamed vegetables retain both their taste and vitamins. Meanwhile, grate some cheese to sprinkle on top and wash side-dish tomatoes/paprika/carrots/other vegetables. Add a bit of pepper and pour olive oil over it all. Voila, that took about 7 minutes, dirty dishes amount to one plastic box and the grater.
  • Preheat oven. Wrap fish in aluminium foil and put it in the oven. Go read a book for 10 minutes. Wash side-dish vegetables and perhaps cut a slice of bread. No dirty dishes.
  • Scrambled eggs are pushing it, preparation takes like 15 minutes and there’s the dirty pan left afterwards. But sometimes I still enjoy them.

I eat tomatoes and paprika with almost every meal. For meals without much protein, I add few slices of dead animals. If the dish doesn’t have enough fat, I add generous amount of olives and snack nuts afterwards. Oh and I’ve just discovered avocados!

Breakfast is my guilty pleasure, oats (+ banana and nuts) with yogurt, often times of Greek variety, with 15 grams of saturated fat per half-a-cup, instantly getting to 75% of maximum recommended daily saturated fat intake. Oh well.

Chairs

I’ve been undergoing torture all my life. These devices of the devil bend my body 90° at the waist, and then again 90° at the knees. Halfway between bent and straight. I can hardly imagine anything less comfortable.

I’d be happy if the chair-angles were either bigger or smaller. Kneeling chairs are nice, as the body/thighs angle is about 110°, which “reduces lower back strain by promoting proper spinal alignment”. Zafu is cool, because you can sit on it in many different ways, according to your current mood. Bean bags are comfortable, they adjust their shape to your shape. Sitting on the floor leaning against a wall with legs stretched out and a small pillow under your back is almost as good as sitting on a bed of leaves leaning against an oak.

But Aeron? No thank you.

Déjà vu explained

So, Wikipedia says:

The psychologist Edward B. Titchener in his book A Textbook of Psychology (1928), wrote that déjà vu is caused by a person getting a brief glimpse of an object or situation prior to full conscious perception, resulting in a false sense of familiarity. The explanation that has mostly been accepted of déjà vu is not that it is an act of “precognition” or “prophecy”, but rather that it is an anomaly of memory, giving the false impression that an experience is “being recalled”.

What a load of bollocks. Here’s the real deal:

When you experience a déjà vu, it had already happened to you. Your experiences are fully controlled by your mind — you do not perceive some objective reality but your subjective one.

Subjective reality means you can experience different situations in a very similar way, when they put you in the same emotional state. More scientifically, the same partial chain of neurons in your brain fires in the same sequence. That’s why you remember it, that’s why you sense a déjà vu — it had already happened to you.

From the half-assed-guesses-I-am-absolutely-convinced-about department.

One week in Sheffield

  • Almost got run over dozen of times. Surprisingly, on a bike it’s easier to keep left.
  • Stores are open (convenient; I’ve lived in a country where stores are closed).
  • In Sainsbury’s, everything is either discounted or 3 for the price of 2. You think people would see right through that, but I still get excited when whatever I want to buy turns out to be discounted.
  • Show me a white person in Sheffield and I’ll show you two Asians.
  • Local “pale ale” is actually very tasty.
  • Female cashiers twice your age call you “my love”.
  • Yesterday at a concert I met a guy from South England who couldn’t understand the Sheffield accent either.
  • Cheddar.
  • Hot girls are usually Asian. Or Spanish exchange students.
  • Full English breakfast is not bad. Though the beans take time getting used to.
  • Sheffield is hilly. Even though five speeds are all I need on my bike, sometimes I can’t help but secretly wish for a lower speed.
  • Ordered spicy pork noodle soup in an Asian place. It was the spicy-in, spicy-out kind.
  • There’s no light in my hotel bathroom but the pub downstairs is way cool.
  • Museums and galleries are plentiful and free.
  • Had to pack and move all my things three times in one week. Bought a pull-up bar and a bicycle to make the process more challenging.
  • Got called up on stage at a local theater play. Scared me so much that when asked where I was from, answered Argentina.
  • Peak district is awesome.

Make Ubuntu Unity usable in five simple steps

Out of frustration with Unity, I’ve used several window managers/desktops recently. They are excellent at what they do, but I missed how everything (volume control, display brightness, keyboard switcher, sleep on laptop lid close, sound mixer, wifi applet, gnome keyring manager, automounting, etc.) just works out of the box in the default Ubuntu installation. All these things need to be taken care of separately when using a minimalistic window manager, most of them are painful, some impossible.

Unity is a strange beast. Some of its features are amazing (merged topbar with window bar, semi-maximizing windows with ctrl+windows_key+left/right), while some are awful (the unholy left menu with its terrible launcher and all the other tentacles). I figured long as I can avoid the few bad parts, I’ll be happy.

You’ll need to install compizconfig-settings-manager for most of the tweaks to work. People who say ccsm breaks things are noobs.

Get the unity sidebar out of the way:
Appearance: Behavior: Auto-hide: ON
Optionally also, Reveal location: Top Left Corner, and set sensitivity to something ridiculous so the cthulhu never pops out.

Sensible Alt+F2:
First, install gmrun, a quick and small program launcher with tab completion.
Second, unbind Alt+F2 from Unity:

gconftool-2 \
-s "/apps/compiz-1/plugins/unityshell/screen0/options/execute_command" \
-t string ""

Third, bind Alt+F2 to gmrun:
ccsm: Commands: add command “gmrun”, key bindings bind to Alt+F2

Default four desktops aren’t bourgeoise enough, get more:
ccsm: General Options: Desktop Size

Focus Follows Mouse:
ccsm: General Options: Focus & Raise: uncheck “Click To Focus”

Fullscreen Any Window:
ccsm: Extra WM Actions: Toggle Fullscreen: Alt+F11

Swap Ctrl and Caps Lock:
gnome-tweak-tool: Typing: Ctrl key position: Swap Ctrl and Caps Lock

I find the environment comfortable to use after just these five tweaks (press and hold the windows key to see many useful Unity shortcuts), though I’m still struggling with window switching. Usually bypass it by keeping one fullscreen window per desktop.

PS: Why did I write this? Further reference. Yesterday I accidentally ran “unity –reset” (do NOT ever run that) and had to google these steps (again). No more!