# PALETTE EXTRACTION

While working on a different project, I found myself needing to choose an appropriate backdrop color for the display of each one of a collection of wallpaper images. Naturally, it is desirable that the color chosen complements the palette of the image itself: preferably being one of the more dominant colors present.

While manually choosing the color is doable for a collection with a few members, I was dealing with one containing hundreds of images. For such quantities an automated procedure is needed. Furthermore, the images involved are displayed as soon as they are loaded, which means the procedure must also be considerably quick (even for high-resolution inputs). In this project we set out to develop a solution satisfying these requirements.

## Approach

Our approach consists of two main steps: (1) obtaining a representative color sampling of the image, and (2) analyzing the sampled data to find dominant color clusters.

### Sampling the image

In a perfect world, we would use all the information available to us to construct our palette, however as we are dealing with high-resolution images analyzing all of the pixels might take longer than we would like. In those cases where the available computing power is not sufficient to comply with our time demands, we take only a fraction of the available pixels to further analysis.

There are several approaches to selecting our sample. A naive one would be to use a uniform distribution across the whole image, but this method is unbalanced: often there will be those regions that are sampled too often and those that are sampled not enough. To reduce variance, we use jittered grid supersampling: we overlay the image with a coarse grid, and sample uniformly within each cell. This improves the balance of samples across different regions (Figure 1).

### Detecting color clusters

To detect the dominant $$k$$ colors $$C$$ in our collection of samples $$S$$, we use the $$k$$-means algorithm to divide $$S$$ into clusters, and $$C$$ would then contain the means of these clusters. The algorithm can be summarized in the following four steps:

1. Initialize with a guess for the cluster means (our implementation just picks a random subset of $$k$$ members from $$S$$).
2. Assign each sample to the cluster whose mean it is closest to.
3. Update the means by taking the average of all samples assigned to each cluster and computing the new mean value for each.
4. Repeat steps 2 and 3 until there is no more change in the cluster assignment.

It is important to note that although the colors we obtain are initially represented in RGB-space, we perform the clustering on their respective representation in HSL-space. This, because distance between colors in HSL corresponds better with their difference as it is perceived by us.

## Source code

You can view the source code on GitHub.