How I Data Mined the top 300 paid apps to create Tehula's icon

July 13, 2012

This morning, I released Tehula, my first iPhone app, which lets users ask their friends where they are just by sending a regular text message – you can read more in the previous post. Although I had a few ideas about what the icon should look like in terms of shape, I am not much of an artist and had no clue which colors would be most effective at attracting users. All I knew was that the icon would be a blend of a position indicator and a silhouette, to illustrate the basic functionality of the app.

Icon basic concept, in black and white.

However, as a data scientist, I knew I could at least try to find what had worked for others. Tehula being a utility, I truly believed it should visually blend in with successful apps, so I embarked on a quest to find the characteristics of the top 300 paid apps' icons. The following analysis was done on June 26th, 2012, so the results may differ slightly from what would be obtained at a later date.

The first step was to gather a list of the top paid apps on the App Store. As it turns out, Apple provides an RSS feed of the top 300 paid apps, from which I collected the apps' unique identifiers. I then used the iTunes Search API to fetch a link to the high-resolution icon of each app. The whole process can be done with the following Python code.

# list-icons-urls.py (Python 2)
import json, re, urllib2, feedparser

# The app's numeric id appears in its store link as ".../id<digits>?..."
pattern = re.compile('/id([0-9]+)\\?')
entries = feedparser.parse('http://itunes.apple.com/us/rss/toppaidapplications/limit=300/xml')['entries']
for entry in entries:
  uid = pattern.search(entry['link']).groups()[0]
  # Look up the app's metadata, which includes the URL of its 512px artwork
  info = json.load(urllib2.urlopen("http://itunes.apple.com/lookup?id=%s" % uid))
  print info['results'][0]['artworkUrl512']

Finally, I saved the list of icon URLs to a file, passed it to wget, and waited for the icons to download.

python list-icons-urls.py > icons-urls.txt
wget -i icons-urls.txt

The first thing I was interested in was to find which color was the most popular among all those icons. I admit there must be a little bit of cargo culting on my part right here, but since I have no background in color theory I thought I’d choose the main color of Tehula’s icon by looking at what was chosen for others.

Distribution of hues of the aggregate of colors present in the top 300 iOS apps icons.

What we see above is that three colors are clearly popular among those icons: pure red, an orangeish tint, and cyan-blue. Note that this figure does not take saturation and lightness into account: when we look at the icons themselves, the orange hue is often dark and unsaturated due to the presence of browns. Also note that the figure only includes the hues of pixels that are not black, white, or gray, since hue carries no meaning for those. In the end, this left me with a choice between red and blue, and I went with my gut feeling that a shade of blue would better represent the simplicity and utility of the app.
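To make the hue analysis concrete, here is a minimal sketch of how such a histogram could be computed. It is not the code used for the figure: the function name `hue_histogram`, the number of bins, and the saturation and lightness thresholds used to discard black/white/gray pixels are all assumptions chosen for illustration, and it is written for Python 3.

```python
import colorsys
from collections import Counter

def hue_histogram(pixels, bins=12, min_saturation=0.15,
                  min_lightness=0.1, max_lightness=0.9):
    """Count pixels per hue bin, skipping near-black/white/gray pixels.

    `pixels` is an iterable of (r, g, b) tuples with 0-255 channels.
    Bins are centered on multiples of 360/bins degrees, so bin 0 is
    centered on pure red. The thresholds are illustrative guesses,
    not values taken from the post.
    """
    counts = Counter()
    for r, g, b in pixels:
        h, l, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
        # Hue is meaningless for grayscale, very dark, or very light pixels
        if s < min_saturation or l < min_lightness or l > max_lightness:
            continue
        counts[int(round(h * bins)) % bins] += 1
    return [counts[i] for i in range(bins)]
```

Assuming Pillow is installed, the pixels of a downloaded icon could be obtained with something like `Image.open(path).convert("RGB").getdata()`, and the per-icon histograms summed to get the aggregate distribution.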

For the sake of completeness, I’ll add that the distributions of luminosity and saturation are not especially interesting: a vast majority of pixels are either fully saturated or grayscale, and the luminosity ends up being a Gaussian bell skewed toward higher values. Pretty boring stuff.

Now, the more interesting part: not only did I want Tehula’s icon to blend in with the other icons in terms of colors, I also wanted to set it apart from them in some way. This is why I thought of a way to quantify the visual complexity of the icons: I grouped the pixels of each icon into 11 categories (red, orange, yellow, green, cyan, blue, purple, magenta, white, gray, black) by selecting the category closest to the pixel in terms of color difference (in CIELab). This not only gives a color profile for each icon, it also gives the number of distinct colors each icon uses (I filtered out the colors which appear in less than 5% of an icon’s pixels, as they just introduce noise in the histogram).
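The classification step described above can be sketched roughly as follows. Several things here are assumptions rather than details from the analysis: the sRGB reference value picked for each of the 11 categories, the use of plain Euclidean distance in CIELab (often called CIE76) as the color difference, and the helper names `srgb_to_lab`, `nearest_category`, and `color_profile`. The sketch targets Python 3 and uses only the standard library.

```python
from collections import Counter

# Hypothetical sRGB anchors for the 11 categories; the post does not
# give the reference values it used.
CATEGORIES = {
    "red": (255, 0, 0), "orange": (255, 128, 0), "yellow": (255, 255, 0),
    "green": (0, 255, 0), "cyan": (0, 255, 255), "blue": (0, 0, 255),
    "purple": (128, 0, 255), "magenta": (255, 0, 255),
    "white": (255, 255, 255), "gray": (128, 128, 128), "black": (0, 0, 0),
}

def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIELab (D65 white point)."""
    def linearize(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(c) for c in rgb)
    # Linear sRGB -> XYZ, normalized by the D65 reference white
    x = (0.4124 * r + 0.3576 * g + 0.1805 * b) / 0.95047
    y = (0.2126 * r + 0.7152 * g + 0.0722 * b) / 1.0
    z = (0.0193 * r + 0.1192 * g + 0.9505 * b) / 1.08883
    def f(t):
        return t ** (1 / 3.0) if t > 0.008856 else 7.787 * t + 16 / 116.0
    fx, fy, fz = f(x), f(y), f(z)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

LAB_ANCHORS = {name: srgb_to_lab(rgb) for name, rgb in CATEGORIES.items()}

def nearest_category(rgb):
    """Return the category whose anchor is closest in CIELab (Euclidean)."""
    lab = srgb_to_lab(rgb)
    return min(LAB_ANCHORS, key=lambda name: sum(
        (a - b) ** 2 for a, b in zip(lab, LAB_ANCHORS[name])))

def color_profile(pixels, min_share=0.05):
    """Map category -> share of pixels, dropping categories under min_share."""
    counts = Counter(nearest_category(p) for p in pixels)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()
            if n / total >= min_share}
```

With these definitions, `len(color_profile(pixels))` would be the number of distinct colors counted for an icon, and the category with the largest share would be its dominant color.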

Distribution of the number of distinct color categories per icon. Only icons with a blue or cyan dominant color are represented here, since this is the hue which was chosen for Tehula's icon.

As you can see above, more than 75% of the icons with a blue or cyan dominant color feature at least 3 different colors, including the blue/cyan itself. Since I wanted the icon to pop out one way or another, I chose to go with two colors. As a side effect, this allows for a kind of visual minimalism which relates to the simplicity of the app’s use. The most common secondary color when blue or cyan is dominant is white (19 out of 76 icons), which makes sense as it allows for the most contrast.
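For illustration, the two statistics quoted above (how many distinct colors the blue/cyan icons use, and which secondary color is most common) could be tallied as in the sketch below. It assumes each icon has already been reduced to a color profile, a dict mapping a category name to its share of pixels; the function name `summarize_dominant` and the toy profiles in the usage example are made up and are not data from the analysis.

```python
from collections import Counter

def summarize_dominant(profiles, dominants=("blue", "cyan")):
    """Summarize icons whose dominant category is in `dominants`.

    `profiles` is a list of dicts mapping category -> share of pixels.
    Returns two Counters: number of distinct colors -> icon count,
    and secondary color -> icon count.
    """
    n_colors = Counter()
    secondary = Counter()
    for profile in profiles:
        # Rank categories from largest to smallest share
        ranked = sorted(profile, key=profile.get, reverse=True)
        if not ranked or ranked[0] not in dominants:
            continue
        n_colors[len(ranked)] += 1
        if len(ranked) > 1:
            secondary[ranked[1]] += 1
    return n_colors, secondary

# Toy profiles, invented for the example
profiles = [
    {"blue": 0.6, "white": 0.4},
    {"cyan": 0.5, "white": 0.3, "black": 0.2},
    {"red": 0.7, "white": 0.3},
    {"blue": 0.5, "gray": 0.3, "white": 0.2},
]
n_colors, secondary = summarize_dominant(profiles)
```

Here `secondary.most_common(1)` gives the most frequent secondary color among the selected icons, and `n_colors` is the data behind a histogram like the one shown above.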

Once I had the colors, the rest was pretty straightforward. I’m pretty happy with the way the icon turned out: as a non-designer, I feel I’ve created a visual identity which is good enough and which represents the ease of use of the app. And by the way, you should definitely check out Tehula!

I am a data scientist interested in social network analysis and applied graph theory. In my free time, I run Tehula, a useful and dead simple location querying service. You shouldn't follow me on Twitter, I barely tweet.