Statistical Plotting with Julia: AlgebraOfGraphics.jl | by Roland Schätzle | Apr, 2023


Photo by Antoine Dautry on Unsplash

How to create statistical plots using the AlgebraOfGraphics.jl (and Makie.jl) package


The Grammar of Graphics (GoG) is a theoretical concept that forms the basis of many popular graphics packages (like ggplot2 in R or ggplot in Python). Within the Julia ecosystem there are even several graphics packages based on the GoG, so the user has a choice. I have therefore created this series of articles to compare these packages and make that choice easier.

I started the series with an introduction to the GoG and have already presented the graphics packages Gadfly.jl (Statistical Plotting with Julia: Gadfly.jl) and VegaLite.jl (Statistical Plotting with Julia: VegaLite.jl).

The AlgebraOfGraphics.jl package (AoG) is the third graphics package based on the Grammar of Graphics (GoG) that I present in this series.

For the examples demonstrating AoG in this article, I’ll use exactly the same data as in the previous articles (a detailed explanation of the data can be found here), and I’ll try to create exactly the same visualizations (bar plots, scatter plots, histograms, box plots and violin plots) as I did there, in order to make a 1:1 comparison of all packages possible. I assume that the data for the examples is ready in the DataFrames countries, subregions_cum and regions_cum (as before).

The AoG package is probably the purest implementation of the GoG so far, as we will see in the following examples. It is based on sound mathematical concepts, and its authors describe it as “a declarative, question-driven language for data visualizations”. Its main developer is Pietro Vertechi.

On a technical level it takes a completely different approach from the packages we’ve seen so far: whereas Gadfly.jl is a standalone graphics package written purely in Julia, and VegaLite.jl is a Julia interface to the Vega-Lite graphics engine, AoG is an add-on package to Makie.jl. Makie itself is the youngest graphics package within the Julia ecosystem (and is also written entirely in Julia).

The boundaries between AoG and Makie are fluid. Several parts of AoG use Makie attributes, and Makie is always the fallback solution if some aspects cannot be expressed using the concepts of AoG itself.

It should also be noted that AoG is still a work in progress. Version 0.1 appeared only in 2020. Therefore it is not as complete as the other, more mature packages, and some aspects simply don’t work yet.

So let’s jump into the first visualizations, which use bar plots to depict the population sizes of the regions (i.e. continents) and of the subregions respectively.

Population by Region

First we want to show the population size (in 2019) of each region (i.e. continent) as a bar within the bar chart. In addition, each “region bar” should have a different color.

This simple example shows how the basic concepts of AoG work: in GoG terms, this visualization is based on data from the regions_cum DataFrame and consists of:

  • a mapping of the data attribute Region to the x-axis
  • a mapping of the data attribute Pop2019 to the y-axis
  • a mapping of the data attribute Region to colors
  • use of the “bar” geometry

As I explained in the introduction to the GoG, one of its ideas is that the specification of a visualization can be assembled from separate building blocks, which are combined according to specific needs. AoG has fully implemented this idea. Therefore we can translate the GoG description directly into AoG elements:

  • regionPop2xy = mapping(:Region, :Pop2019) is the mapping of Region to the x-axis and Pop2019 to the y-axis
  • region2color = mapping(color = :Region) is the mapping of Region to colors
  • barplot = visual(BarPlot) is the “bar” geometry

Now we can combine these building blocks (using the operator *), take the data from regions_cum and create the plot with a call to draw:

draw(data(regions_cum) * regionPop2xy * region2color * barplot)
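Put together into one self-contained script, the steps above might look like the following sketch. It assumes AlgebraOfGraphics.jl with the CairoMakie backend and uses a small stand-in for regions_cum: the column names Region and Pop2019 come from the text, but the numbers are dummy values, not real population figures.

```julia
using DataFrames, AlgebraOfGraphics, CairoMakie

# Hypothetical stand-in for regions_cum; the values are dummies for illustration only
regions_cum = DataFrame(
    Region  = ["Africa", "Americas", "Asia", "Europe", "Oceania"],
    Pop2019 = [1.0, 2.0, 3.0, 4.0, 5.0],
)

# The three building blocks described above
regionPop2xy = mapping(:Region, :Pop2019)   # Region -> x-axis, Pop2019 -> y-axis
region2color = mapping(color = :Region)     # Region -> color
barplot      = visual(BarPlot)              # "bar" geometry

# Combine data, mappings and geometry with * and render the result
fg = draw(data(regions_cum) * regionPop2xy * region2color * barplot)
```

Each building block is an ordinary Julia value, so it can be stored in a variable and reused in later specifications.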

This results in the following bar plot:

Population by region (1) [image by author]

As in the previous articles, we also create a beautified version of each visualization by adding labels, a title and a nice background color, among other things. In AoG this can be achieved by passing the Makie parameters axis and figure to draw:
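A minimal sketch of such a call follows. The `axis` NamedTuple is forwarded to Makie’s Axis and `figure` to Makie’s Figure; the title, labels and colors are illustrative assumptions rather than the author’s exact settings, and the data is again a dummy stand-in for regions_cum.

```julia
using DataFrames, AlgebraOfGraphics, CairoMakie

# Hypothetical stand-in for regions_cum (dummy values, not real figures)
regions_cum = DataFrame(
    Region  = ["Africa", "Americas", "Asia", "Europe", "Oceania"],
    Pop2019 = [1.0, 2.0, 3.0, 4.0, 5.0],
)

regionPop2xy = mapping(:Region, :Pop2019)
region2color = mapping(color = :Region)
barplot      = visual(BarPlot)

# The specification stays unchanged; all decorations are passed to draw
fg = draw(
    data(regions_cum) * regionPop2xy * region2color * barplot;
    axis = (
        title = "Population by region, 2019",   # illustrative texts
        xlabel = "Region",
        ylabel = "Population",
        backgroundcolor = :whitesmoke,          # background of the plot area
    ),
    figure = (backgroundcolor = :white,),       # background of the whole figure
)
```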

This leads to the following chart:

Population by region (2) [image by author]

Population by Subregion

Now let’s move on to the visualization of the population by subregion. This is basically the same as the plots above, but we take the data from subregions_cum instead of regions_cum.

So our mapping to the axes is now subregionPop2xy = mapping(:Subregion, :Pop2019). As we again want the bars for the subregions to be colored by region, we can reuse the mapping from above, and the basic plot can be drawn with:

draw(data(subregions_cum) * subregionPop2xy * region2color * barplot)

This produces the following plot:

Subregion by population (1) [image by author]

Obviously the subregion labels would be more readable in a horizontal bar plot. This can be achieved by swapping the data attributes in the mapping to the axes, subregionPop2xy_hor = mapping(:Pop2019, :Subregion), and by adding direction = :x to the visual. So the code to draw a horizontal version of this bar plot is:

draw(data(subregions_cum) * subregionPop2xy_hor * region2color *
    visual(BarPlot; direction = :x))

Unfortunately, this is a specification where it becomes clear that AoG is still a work in progress. There must be a bug in the rendering process, because the result of this draw command looks as follows:

Subregion by population (2) [image by author]

The ticks on the y-axis as well as the bars are misplaced, and the ticks on the x-axis are not what we want either.

Population by Subregion using Makie.jl

So we take this problem as an opportunity to switch to Makie.jl. Makie is a rather low-level graphics package. Many things we get automatically in the packages we’ve seen so far have to be specified explicitly in Makie. This gives the programmer a lot of control but makes the specifications quite verbose.

Another shortcoming is that Makie cannot handle nominal data. All nominal data has to be converted to a numeric type before it can be visualized. In our case this means that we have to convert the nominal data of the attributes Region and Subregion to numbers:

  • This is relatively easy for Subregion, because this attribute contains unique values. So we simply use the index values of that column of the DataFrame and store them in the new column subregion_num.
  • The Region values are not unique. Therefore we first convert them to a CategoricalArray, which implicitly maps them to numeric values. We can then obtain the corresponding numbers using the function levelcode and store them in another new column region_num.

Apart from that, we chose a suitable color scheme (Set2_8) from ColorSchemes.jl in order to get nice and distinguishable colors for the regions. This scheme looks as follows:

The color scheme Set2_8 [image by author]

For all these preparations we want the next code:
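A sketch of these preparations follows, assuming DataFrames.jl, CategoricalArrays.jl and ColorSchemes.jl and a hypothetical five-row stand-in for subregions_cum (the subregion names are real UN subregions, the Pop2019 values are dummies). The name region_colors is an illustrative choice.

```julia
using DataFrames, CategoricalArrays, ColorSchemes

# Hypothetical stand-in for subregions_cum (dummy population values)
subregions_cum = DataFrame(
    Region    = ["Africa", "Africa", "Americas", "Asia", "Europe"],
    Subregion = ["Northern Africa", "Western Africa", "South America",
                 "Eastern Asia", "Northern Europe"],
    Pop2019   = [5.0, 3.0, 4.0, 1.0, 2.0],
)

# Subregion values are unique, so the row index serves as numeric code
subregions_cum.subregion_num = collect(1:nrow(subregions_cum))

# Region values repeat: categorical assigns each distinct value a level,
# and levelcode returns the integer code of that level
subregions_cum.region_num = levelcode.(categorical(subregions_cum.Region))

# Qualitative ColorBrewer scheme with 8 colors, indexed by region code
region_colors = ColorSchemes.Set2_8
```

Because categorical sorts its levels alphabetically by default, Africa gets code 1, Americas code 2 and so on; these codes then double as indices into the color scheme.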

We then directly create a “beautified” version of the bar plot with labels etc. In Makie we need a Figure as the base element, on which the bar plot is placed. As Makie cannot handle nominal data, we also have to specify the ticks for the y-axis manually using the yticks attribute, as we can see in the following code, which creates our horizontal bar plot:
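The following is a sketch of such a horizontal bar plot in plain Makie, not the author’s original code. It repeats the preparation steps so that it runs on its own, uses the same hypothetical stand-in data, and omits a legend for the region colors; the title and labels are illustrative.

```julia
using DataFrames, CategoricalArrays, ColorSchemes, CairoMakie

# Hypothetical stand-in for subregions_cum (dummy population values)
subregions_cum = DataFrame(
    Region    = ["Africa", "Africa", "Americas", "Asia", "Europe"],
    Subregion = ["Northern Africa", "Western Africa", "South America",
                 "Eastern Asia", "Northern Europe"],
    Pop2019   = [5.0, 3.0, 4.0, 1.0, 2.0],
)
subregions_cum.subregion_num = collect(1:nrow(subregions_cum))
subregions_cum.region_num = levelcode.(categorical(subregions_cum.Region))
region_colors = ColorSchemes.Set2_8

fig = Figure(backgroundcolor = :white)
ax = Axis(fig[1, 1];
    title  = "Population by subregion, 2019",
    xlabel = "Population",
    # Makie cannot place strings on an axis, so the ticks are given
    # explicitly as (numeric positions, labels)
    yticks = (subregions_cum.subregion_num, subregions_cum.Subregion),
)

# With direction = :x the first argument holds the bar positions (on the
# y-axis) and the second the bar lengths (along the x-axis)
barplot!(ax, subregions_cum.subregion_num, subregions_cum.Pop2019;
    direction = :x,
    color = [region_colors[i] for i in subregions_cum.region_num],
)
```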

This is a lot of code, but the result looks quite pleasing:

Population by subregion (3) [image by author]

To get a version of this bar plot in which the subregions are sorted by population size, we have to sort the data in subregions_cum accordingly using sort!(subregions_cum, :Pop2019) and then execute the code above (including the mapping to numeric data) again. This leads to the following plot:

Population by subregion (4) [image by author]
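The sorting step can be sketched as follows, again with hypothetical stand-in data. Since Makie draws bar position 1 at the bottom of the y-axis, an ascending sort puts the most populous subregion at the top once the numeric codes are recomputed.

```julia
using DataFrames, CategoricalArrays

# Hypothetical stand-in for subregions_cum (dummy population values)
subregions_cum = DataFrame(
    Region    = ["Africa", "Africa", "Americas", "Asia", "Europe"],
    Subregion = ["Northern Africa", "Western Africa", "South America",
                 "Eastern Asia", "Northern Europe"],
    Pop2019   = [5.0, 3.0, 4.0, 1.0, 2.0],
)

# Reorder the rows in place, ascending by Pop2019
sort!(subregions_cum, :Pop2019)

# Recompute the numeric codes so that bar positions follow the new row order
subregions_cum.subregion_num = collect(1:nrow(subregions_cum))
subregions_cum.region_num = levelcode.(categorical(subregions_cum.Region))
```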

After this excursion to Makie, we return to AoG and try to visualize how population change depends on the size of the population. We can do this using a scatter plot as follows:

popChangeVsPop = data(countries) *
    mapping(:Pop2019, :PopChangePct) *
    mapping(color = :Region)

The specification contains a mapping of Pop2019 to the x-axis and PopChangePct to the y-axis, as well as a mapping of Region to a color (we could have reused region2color at this point, but it is also possible to specify a mapping directly). A visual can be omitted here, because AoG uses the point geometry (Scatter) by default in this context. This gives us the following plot:

Growth rate in relation to population (1) [image by author]

As in the previous articles, we now improve the visualization by using a logarithmic scale on the x-axis, as the data is quite skewed. In addition, we do our “beautification” by adding labels, a title etc. All this can be achieved by reusing the plot specification popChangeVsPop and adding the aforementioned elements via suitable parameters to draw:
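A sketch of both the plain and the improved call follows, assuming a hypothetical stand-in for countries with the columns Pop2019, PopChangePct and Region (values made up for illustration). `xscale = log10` is a Makie Axis attribute passed through the `axis` keyword; the label texts are illustrative.

```julia
using DataFrames, AlgebraOfGraphics, CairoMakie

# Hypothetical stand-in for countries (dummy values, skewed on purpose)
countries = DataFrame(
    Region       = ["Africa", "Africa", "Asia", "Europe", "Oceania"],
    Pop2019      = [1.0e3, 5.0e4, 2.0e6, 8.0e4, 3.0e2],
    PopChangePct = [2.5, 1.8, 0.4, -0.2, 1.1],
)

popChangeVsPop = data(countries) *
    mapping(:Pop2019, :PopChangePct) *
    mapping(color = :Region)

# Plain version: no visual given, so the default point geometry is used
fg_plain = draw(popChangeVsPop)

# Same specification, with a log10 x-axis and decorations added in draw
fg = draw(popChangeVsPop;
    axis = (
        xscale = log10,
        title  = "Population change vs. population size",
        xlabel = "Population 2019 (log scale)",
        ylabel = "Population change (%)",
        backgroundcolor = :whitesmoke,
    ),
)
```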

This leads to the following plot:

Growth rate in relation to population (2) [image by author]

Now we switch to histograms, which we use to depict the distribution of GDP per capita among the different countries. As AoG offers a so-called histogram analysis, the specification is quite simple:

draw(data(countries) * mapping(:GDPperCapita) * histogram())

In AoG, an analysis is a way to process data before visualizing it. Often the geometry (visual) depends directly on an analysis, as in this example, where a histogram is automatically displayed using a bar geometry.

Distribution of GDP per capita (1) [image by author]

The creation of the histogram can be influenced by changing the number of bins (via the parameter bins) and by using different normalization algorithms. So we get an improved version using the following specification:
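A possible form of histGDPperCapita and the subsequent draw call is sketched below. `bins = 20` and `normalization = :pdf` are illustrative values (the article does not state the author’s settings), and the GDPperCapita column is a dummy stand-in.

```julia
using DataFrames, AlgebraOfGraphics, CairoMakie

# Hypothetical stand-in for countries (dummy GDP per capita values)
countries = DataFrame(GDPperCapita = [800.0, 1500.0, 3200.0, 5400.0, 9800.0,
                                      15000.0, 27000.0, 42000.0, 61000.0, 85000.0])

# Specification only: data, mapping and the histogram analysis with a fixed
# number of bins and :pdf normalization (bar areas sum to 1)
histGDPperCapita = data(countries) *
    mapping(:GDPperCapita) *
    histogram(bins = 20, normalization = :pdf)

# Decorations are added separately in the call to draw
fg = draw(histGDPperCapita;
    axis = (
        title  = "Distribution of GDP per capita",
        xlabel = "GDP per capita",
        ylabel = "Density",
    ),
)
```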

This code shows again how well AoG separates the specification of the visualization (histGDPperCapita) from its “beautification” (in the call to draw), leading to the following diagram:

Distribution of GDP per capita (2) [image by author]

Finally we visualize the distribution of GDP per capita in each region using box plots and violin plots. This can be achieved with the same simplicity as above, since AoG offers specific geometries for both plot variants.

To maximize the reuse of elements, we first define the data and the mappings for the distribution (distGDPperCapita) and then add the geometry (using visual). As in all examples, the additional “beautification” can then be added using suitable parameters in the call to draw.
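A sketch of this reuse pattern: distGDPperCapita holds the data and mappings, and only the visual differs between the two plots. The stand-in data, the names boxPlot, violinPlot and decorations, and the label texts are hypothetical.

```julia
using DataFrames, AlgebraOfGraphics, CairoMakie

# Hypothetical stand-in for countries (dummy values, four countries per region)
countries = DataFrame(
    Region = repeat(["Africa", "Asia", "Europe"], inner = 4),
    GDPperCapita = [900.0, 1500.0, 2500.0, 6000.0,
                    2000.0, 4500.0, 12000.0, 60000.0,
                    15000.0, 30000.0, 45000.0, 110000.0],
)

# Shared part: data plus Region -> x (and color), GDPperCapita -> y
distGDPperCapita = data(countries) *
    mapping(:Region, :GDPperCapita, color = :Region)

# The geometry is the only difference between the two specifications
boxPlot    = distGDPperCapita * visual(BoxPlot)
violinPlot = distGDPperCapita * visual(Violin)

# One set of decorations, reused for both plots
decorations = (title = "GDP per capita by region",
               xlabel = "Region", ylabel = "GDP per capita")

fg_box    = draw(boxPlot; axis = decorations)
fg_violin = draw(violinPlot; axis = decorations)
```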

This code creates the following two diagrams:

Distribution of GDP per capita by region (1) [image by author]
Distribution of GDP per capita by region (2) [image by author]

Zooming in

As the “most interesting” part of both diagrams lies in the range from 0 to 100,000 (on the y-axis), we want to restrict the plots to that range (a kind of zoom-in).

In AoG this is possible using the datalimits parameter of visual. But there seems to be another bug in AoG, since this parameter has the desired effect only on the violin plot; it doesn’t change anything when applied to the box plot.

So using the following specification …

violinRestricted = distGDPperCapita *
    visual(Violin; datalimits = (0, 100000))

… we get this diagram:

Distribution of GDP per capita by region (3) [image by author]

As mentioned above, the AoG package is clearly the purest implementation of the Grammar of Graphics we’ve seen in this series. It really separates mappings, geometries etc. into different building blocks, which can then be combined using the * operator. It also clearly separates the more “decorative” elements (all the things we called “beautification” above) from the visualization proper, making specifications even more modular and giving us more building blocks that can be reused.

I think it’s quite normal for such a young package to still have some rough edges, but it certainly has a sound foundation and looks quite promising. Of course it was not possible to show all the functionality of AoG in this article, so please have a look at the documentation if you want to learn more about it. And last but not least, it is also worth reading about the philosophy underlying this approach, which can be found here.

