In this post we take a look at some impressive data visualisation techniques applied to tree structures. Our data: the entire mutual fund universe. Our mission: to make data both understandable and attractive.
The fund world is immense and confusing. It is hard to know where to begin when investing in mutual funds. Choosing a category can help with the selection but there exist countless classifications, each one with a different objective or central focus.
A working group of EFAMA (European Fund and Asset Management Association) developed a system for classifying funds known as the European Fund Classification (EFC). The classification does a great job of organising this complex structure, using well-defined criteria to group funds.
Using this classification as a starting point, we have brought some order to our gigantic fund universe, FinUniverse, by creating a hierarchical structure of categories containing similarly behaving funds.
How to visualise the fund classification?
Since our data has a tree structure, some common data visualisations spring to mind. Examples include a basic tree plot (above) or a tree directory (below). These give a clear picture of how the data is organised, but they leave little flexibility for comparing node characteristics such as category size.
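To make the tree directory idea concrete, here is a minimal sketch that prints a nested dictionary as a directory-style listing. The category names and fund counts in FUND_TREE are made up for illustration and are not taken from FinUniverse; render_tree is a hypothetical helper, not part of any library.

```python
# Hypothetical slice of a fund hierarchy: names and counts are invented.
FUND_TREE = {
    "Equity": {"Europe": 120, "Global": 200},
    "Bond": {"Investment grade": 90, "High yield": 40},
}


def render_tree(node, prefix=""):
    """Return the lines of a directory-style listing for a nested dict."""
    lines = []
    items = list(node.items())
    for i, (name, child) in enumerate(items):
        last = i == len(items) - 1
        branch = "└── " if last else "├── "
        if isinstance(child, dict):
            # Inner node: print its name, then recurse with a deeper prefix.
            lines.append(prefix + branch + name)
            lines.extend(render_tree(child, prefix + ("    " if last else "│   ")))
        else:
            # Leaf: a final category with its number of funds.
            lines.append(f"{prefix}{branch}{name} ({child} funds)")
    return lines


print("\n".join(render_tree(FUND_TREE)))
```

The listing shows the nesting clearly, but the fund counts are just numbers in the text, which is exactly the limitation mentioned above.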
An alternative is a treemap, which uses rectangles to represent each element. In our example, the size of each rectangle represents the number of funds in each final category. These graphs are easy to produce with the great squarify package in Python.
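As a rough sketch of how such a treemap could be drawn, the snippet below flattens a nested dictionary into its final categories and passes the sizes to squarify.plot. The FUND_TREE data is invented for illustration, leaf_sizes and plot_treemap are hypothetical helper names, and the plotting function assumes squarify and matplotlib are installed.

```python
# Hypothetical slice of a fund hierarchy: names and counts are invented.
FUND_TREE = {
    "Equity": {"Europe": 120, "Global": 200},
    "Bond": {"Investment grade": 90, "High yield": 40},
}


def leaf_sizes(tree, path=()):
    """Flatten a nested dict into (label, fund_count) pairs for the leaves."""
    pairs = []
    for name, child in tree.items():
        if isinstance(child, dict):
            pairs.extend(leaf_sizes(child, path + (name,)))
        else:
            pairs.append((" / ".join(path + (name,)), child))
    return pairs


def plot_treemap(tree):
    """Draw one rectangle per final category, sized by its number of funds."""
    # Imported here so the flattening helper works without plotting libraries.
    import matplotlib.pyplot as plt
    import squarify

    labels, sizes = zip(*leaf_sizes(tree))
    ax = squarify.plot(sizes=list(sizes), label=list(labels), alpha=0.8)
    ax.axis("off")
    plt.show()


print(leaf_sizes(FUND_TREE))
```

Calling plot_treemap(FUND_TREE) would then display the chart. Note that the hierarchy survives only in the labels, since the rectangles themselves carry no information about their parent category.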
The problem here is that we lose some sense of the hierarchical structure in the data. We can play with colours to give some indication of which categories are similar, but it is still not clear how they are related or which ones are grouped together.
The solution? The Sunburst
At QuantDare we are obsessed with this circular representation of hierarchical data structures. The sunburst diagram gives you the best of both worlds, since it depicts both the structure and the data comparison.
The rings moving out from the centre are the hierarchy levels. Each ring is divided into slices, and every slice sits within the angular span of its parent slice in the ring inside it.
In our example of the FinUniverse, the first level of rings represents the six main asset classes and the partitions depend on the number of funds each one contains. Subsequent rings are split according to the relative number of funds in the subcategories (regions, credit quality or strategy type).
The sunburst speaks for itself when it comes to data visualisation. With just one glance you have a complete view of the fund universe structure and the relative sizes of all the categories. You can find the elegant recursive Python code used to create this sunburst on Stack Overflow.
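The snippet below is not that Stack Overflow code, only a minimal sketch of the same recursive idea: each slice gets a share of its parent's angle proportional to the number of funds it contains. The FUND_TREE data is invented, the helper names are hypothetical, and the plotting function assumes matplotlib is installed.

```python
import math

# Hypothetical slice of a fund hierarchy: names and counts are invented.
FUND_TREE = {
    "Equity": {"Europe": 120, "Global": 200},
    "Bond": {"Investment grade": 90, "High yield": 40},
}


def subtree_total(node):
    """Number of funds in a leaf, or summed over all leaves below a node."""
    if isinstance(node, dict):
        return sum(subtree_total(child) for child in node.values())
    return node


def sunburst_wedges(tree, start=0.0, extent=2 * math.pi, level=1):
    """Return (level, label, start_angle, angular_width) for every slice."""
    wedges = []
    total = subtree_total(tree)
    angle = start
    for name, child in tree.items():
        # A slice takes a share of its parent's span proportional to its funds.
        width = extent * subtree_total(child) / total
        wedges.append((level, name, angle, width))
        if isinstance(child, dict):
            # Children are laid out inside the parent's span, one ring further out.
            wedges.extend(sunburst_wedges(child, angle, width, level + 1))
        angle += width
    return wedges


def plot_sunburst(tree):
    """Draw the wedges as bars on a polar axis, one ring per hierarchy level."""
    import matplotlib.pyplot as plt

    ax = plt.subplot(projection="polar")
    for level, name, angle, width in sunburst_wedges(tree):
        ax.bar(angle, 1, width=width, bottom=level, align="edge", edgecolor="white")
        ax.text(angle + width / 2, level + 0.5, name,
                ha="center", va="center", fontsize=8)
    ax.set_axis_off()
    plt.show()


for wedge in sunburst_wedges(FUND_TREE):
    print(wedge)
```

Keeping the angle computation separate from the drawing makes the layout easy to check on its own: the slices in the first ring always add up to a full circle, and the children of any slice add up to exactly that slice's width.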
More pretty data visualisation…
Highcharts has an awesome interactive version of the sunburst that is really worth checking out. Another alternative is a hyperbolic tree with varying node sizes, although these can get messy with large datasets.