Welcome to the blog

Posts

My thoughts and ideas

Introduction to transition/transverion plots | Griffith Lab

Genomic Visualization and Interpretations

Introduction to transition/transverion plots

Transitions and transversions describe the mutation of a nucleotide from one base to another. Specifically transitions describe the mutation of nucleotides between purines or between pyrimidines and transversions from purines to pyrimidines or pyrimidines to purines. Often such events can give clues as to the origin of mutations within a sample. For example transverions are associated with ionizing radiation.

Transitions and Transversions by Krishnavedala with permission under CC BY-SA 4.0

Loading Data

Figure 1 of the manuscript “Recurrent somatic mutations affecting B-cell receptor signaling pathway genes in follicular lymphoma” plots the rate of each type of transition and transversion observed in a cohort of samples. In this section we will use the GenVisR function TvTi to create a version of this figure, shown below. The underlying data for this figure can be found in supplemental table S5 and is the same data used in the “Introduction to ggplot2” section. You can download the data here if you haven’t already.

This research was originally published in Blood. Krysiak et al. Blood 2017 129:473-483 by Krysiak et al. with permission under © the American Society of Hematology

Data Preparation

The GenVisR function TvTi() will calculate the rate of individual transitions and transversions within a cohort and plot this information as a stacked barchart. As with the waterfall() function TvTi() can accept multiple file types as specified by the parameter fileType. Here we will use the most flexible, an “MGI” file type which expects a data frame with column names “sample”, “reference”, and “variant”. Further we will re-assign the sample column to use the same sample names as in the manuscript figure. In order to match the manuscript figure we will also limit the data to just the discovery cohort and re-factor the data frame with the function factor() so samples without any data are not plotted.

# load the data into R
mutationData <- read.delim("ggplot2ExampleData.tsv")

# change the "Simple_name" column to "sample"
colnames(mutationData)[colnames(mutationData) %in% "sample"] <- "sample_1"
colnames(mutationData)[colnames(mutationData) %in% "Simple_name"] <- "sample"

# subset the data to just the discovery cohort
mutationData <- mutationData[mutationData$dataset == "discovery",]
mutationData$sample <- factor(mutationData$sample, levels=unique(mutationData$sample))

# run TvTi
TvTi(mutationData, fileType="MGI")

Plotting Frequencies with Proportions

You might notice from the previous plot that two samples look a bit off. Specifically the samples “LYM002-Naive” and “LYM005-Naive” are missing transitions and transversions. By default TvTi() plots only the proportion of transitions and transversions which can be deceiving if the frequency of mutation events is not high enough to get an accurate calculation of the relative proportion. We can use the parameter type which expects a string specifying either “proportion” or “frequency”, to instead plot the frequency of each mutation event. Let’s use this parameter two create two plots, one of frequency the other of proportion and combine them. To do this we will need to output our plots as grobs using out="grob", and use the function arrangeGrob() from the gridExtra package to combine the two plots into one. Finally we will need to call grid.draw() on the grob to output it to the current graphical device. We can see from the resulting plot that quite a few samples have low mutation frequencies and we should interpret these samples with caution.

# install and load gridExtra
install.packages("gridExtra")
library(gridExtra)
library(grid)

# make a frequency and proportion plot
tvtiFreqGrob <- TvTi(mutationData, fileType="MGI", out="grob", type="frequency")
tvtiPropGrob <- TvTi(mutationData, fileType="MGI", out="grob", type="proportion")

# combine the two plots
finalGrob <- arrangeGrob(tvtiFreqGrob, tvtiPropGrob, ncol=1)

# draw the plot
grid.draw(finalGrob)

Changing the order of samples

By default the order of the samples on the x-axis uses the levels of the sample column in the input data frame to determine an order, if that column is not of type factor it is coerced to one using the existing order of samples to define the levels within that factor. This behavior can be altered using parameter sort which will take a character vector specifying either “sample” or “tvti”. Specifying “sample” will order the x-axis alpha-numerically based on the sample names, conversely “tvti” will attempt to order the x-axis samples by decreasing proportions of each transition/transversion type. In the manuscript figure we can see that a custom sort order is being used. Let’s take advantage of the default ordering behavior in TvTi() and change the levels of the sample column to match the order in the manuscript figure.

# determine the exisitng order of samples
levels(mutationData$sample)

# create a custom sample order to match the manuscript
sampleOrder <- c("LYM145-Naive", "LYM023-Naive", "LYM045-Naive", "LYM238-Sorted", "LYM238-Naive", "LYM058-Naive", "LYM125-Naive", "LYM100-Naive", "LYM088-Naive", "LYM167-Naive", "LYM153-Naive", "LYM002-Naive", "LYM005-Naive", "LYM120-Naive", "LYM120-Treated", "LYM215-Sorted", "LYM215-Treated", "LYM139-Treated", "LYM177-Treated", "LYM202-Sorted", "LYM202-Treated", "LYM062-Treated", "LYM057-tNHL", "LYM001-tNHL", "LYM175-tNHL", "LYM193-tNHL", "LYM237-tNHL", "LYM013-tNHL")

# make sure all samples have been specified
all(mutationData$sample %in% sampleOrder)

# alter the sample levels of the input data
mutationData$sample <- factor(mutationData$sample, levels=sampleOrder)

# recreate the plot
tvtiFreqGrob <- TvTi(mutationData, fileType="MGI", out="grob", type="frequency")
tvtiPropGrob <- TvTi(mutationData, fileType="MGI", out="grob", type="proportion")
finalGrob <- arrangeGrob(tvtiFreqGrob, tvtiPropGrob, ncol=1)
grid.draw(finalGrob)

adding clinical data

As with many other GenVisR functions we can add a heatmap of clinical data via the parameter clinData which expects a data frame with column names “sample”, “variable”, and “value”. A subset of the clinical data is available to download here, go ahead and load it into R. The clinical data contains information for all samples in the experiment however we are only interested in the discovery cohort, the next step then is to subset our clinical data to only those samples in our main plot. From there we can use melt() to coerce the data frame into the required “long” format and add the clinical data to the proportions plot with the clinData parameter. In order to more closely match the manuscript figure we will also define a custom color pallette and set the number of columns in the legend with the parameters clinVarCol and clinLegCol respectively.

# read in the clinical data
clinicalData <- read.delim("FL_ClinicalData.tsv")

# subset just the discovery cohort
clinicalData <- clinicalData[clinicalData$Simple_name %in% mutationData$sample,]
all(sort(unique(as.character(clinicalData$Simple_name))) == sort(unique(as.character(mutationData$sample))))

# convert to long format
library(reshape2)
clinicalData <- melt(data=clinicalData, id.vars="Simple_name")
colnames(clinicalData) <- c("sample", "variable", "value")

# define a few clinical parameter inputs
clin_colors <- c('0'="lightblue",'1'='dodgerblue2','2'='blue1','3'='darkblue','4'='black','FL'='green3','t-NHL'='darkgreen','Treated'='darkred','TreatmentNaive'='red','NA'='lightgrey')

# add the clinical data to the plots
tvtiFreqGrob <- TvTi(mutationData, fileType="MGI", out="grob", type="frequency")
tvtiPropGrob <- TvTi(mutationData, fileType="MGI", out="grob", type="proportion", clinData=clinicalData, clinLegCol = 2, clinVarCol = clin_colors)
finalGrob <- arrangeGrob(tvtiFreqGrob, tvtiPropGrob, ncol=1, heights=c(2, 5))
grid.draw(finalGrob)

re-aligning plots

Up til this point we have glossed over what arrangeGrob() is actually doing however this is no longer a luxury we can afford. In the previous figure we can see that adding the clinical data to our plot has caused our proportion and frequency plots to go out of alignment. This has occurred because the y-axis text in the clinical plot takes up more space than in the frequency plot. Because we added clinical data with GenVisR to the proportion plot those graphics are aligned with each other however TvTi() has no idea the frequency plot exists, it’s added to our graphic after the fact with arrangeGrob(). In order to fix this issue we will need a basic understanding of “grobs”, “TableGrobs”, and “viewports”. First off a grob is just short for “grid graphical object” from the low-level graphics package grid; Think of it as a set of instructions for create a graphical object (i.e. a plot). The graphics library underneath all of ggplot2’s graphical elements are really composed of grob’s because ggplot2 uses grid underneath. A TableGrob is a class from the gtable package and provides an easier way to view and manipulate groups of grobs, it is actually the intermediary between ggplot2 and grid. A “viewport” is a graphics region for which a grob is assigned. We have already been telling GenVisR to output grobs instead of drawing a graphic because we have been using the arrangeGrob() function in gridExtra to arrange and display extra viewports in order to create a final plot. Let’s briefly look at what this actually means, we can show the layout of viewports with the function gtable_show_layout(). In the figure below we have overlayed this layout on to our plot and have shown the table grob to the right. In the table grob we can see that at the top level our grobs are compsed of 2 vertical elements and 1 horizontal elements. The column z corresponds to the order of plot layers, and cells corresponds to the viewport coordinates. So we can see that the grob listed in row 1 is plotted first and is located on the viewport spanning vertical elemnents 1-1 and horizontal elements 1-1 and thus on our layout is in the viewport (1, 1).

# load the gtable library
library(gtable)

# visualize the viewports of our plot
gtable_show_layout(finalGrob)

Pretty simple right? But not so fast the table grob in finalGrob is a little over simplified on the surface. It is true that our plot is composed of two final grobs however each grob is made up of lists of grobs in the table grob all the way down to the individual base elements which make up the plot. In the figure below we illustrate the second layer of these lists. To re-align the plots we’ll need to peel back the layers until we can acess the original grobs from ggplot2. We will then need to obtain the widths of each of these elements, find the max width with unit.pmax(), and re-assign the width of each as these grobs using the max width.

# find the layer of grobs widths we care about
proportionPlotWidth <- finalGrob$grobs[[2]]$grobs[[1]]$widths
clinicalPlotWidth <- finalGrob$grobs[[2]]$grobs[[2]]$widths
transitionPlotWidth <- finalGrob$grobs[[1]]$grobs[[1]]$widths

# find the max width of each of these
maxWidth <- unit.pmax(proportionPlotWidth, clinicalPlotWidth, transitionPlotWidth)

# re-assign the max widths
finalGrob$grobs[[2]]$grobs[[1]]$widths <- as.list(maxWidth)
finalGrob$grobs[[2]]$grobs[[2]]$widths <- as.list(maxWidth)
finalGrob$grobs[[1]]$grobs[[1]]$widths <- as.list(maxWidth)

# plot the plot
grid.draw(finalGrob)

adding additional layers

Our plot is getting getting close to looking like the one in the manuscript, there are a few final touches we should add however. Specifically let’s add in some vertical lines separating samples by “treatment-naive”, “treated”, and “t-NHL”. Let’s also remove the redundant x-axis text, we can do all this by adding additional ggplot2 layers to the various plots with the parameter layers and clinLayer.

# load ggplot2
library(ggplot2)

# create a layer for vertical lines
layer1 <- geom_vline(xintercept=c(14.5, 22.5))

# create a layer to remove redundant x-axis labels
layer2 <- theme(axis.text.x=element_blank(), axis.title.x=element_blank())

# create the plot
tvtiFreqGrob <- TvTi(mutationData, fileType="MGI", out="grob", type="frequency", layers=list(layer1, layer2))
tvtiPropGrob <- TvTi(mutationData, fileType="MGI", out="grob", type="proportion", clinData=clinicalData, clinLegCol = 2, clinVarCol = clin_colors, layers=layer1, clinLayer=layer1)
finalGrob <- arrangeGrob(tvtiFreqGrob, tvtiPropGrob, ncol=1, heights=c(2, 5))
grid.draw(finalGrob)

Exercises

The plot above has been finished, however we recreated all the plots without taking the time to realign them. Try and aswer the questions below regarding our finalGrob, then try to fix the alignment. If you get stuck go over the “realigning plots” section again.

How many grob layers are in finalGrob?

Get a hint!

try and use finalgrobs$grobs to dig into layers until you reach an endpoint, remember finalgrobs$grobs returns lists.

Answer

There are six separate layers of grobs, here they are: finalGrob$grobs[[1]]$grobs[[1]]$grobs[[15]]$grobs[[1]]$grobs

Let’s practice extracting individual grob element from finalGrob, use grid.draw() to try and draw one of the legend titles.

Get a hint!

The grob names should give a hint regarding which element they correspond to, also remember that $grobs returns a list not the actual table grob.

Answer

Here is one solution, there are two others! grid.draw(finalGrob$grobs[[1]]$grobs[[1]]$grobs[[15]]$grobs[[1]]$grobs[[2]])

Finally let’s go ahead and fix the final version of our plot by realigning the grobs in finalGrob.

Get a hint!

You will have to go three layers deep to access the proper viewports!

Answer

The answer is actually the same code used in the realign plots section of the tutorial.

Introduction to waterfall plots | Griffith Lab

Genomic Visualization and Interpretations

Introduction to waterfall plots

After sequencing a set of samples a common question to ask is what mutations are present in the sample/patient cohort. This type of data is often displayed in a heatmap like structure with rows and columns denotating genes and samples with these variables ordered in a hierarchical pattern based on recurrence. Such plots make viewing mutually exclusive and co-occurring mutational events a trivial task. The waterfall() function from the GenVisR package makes it easy to create these types of charts while also allowing additional data in the context of sample and gene data to be added to the plot. In this tutorial we will use the waterfall() function to re-create panel C of figure 2 from the paper “A Phase I Trial of BKM120 (Buparlisib) in Combination with Fulvestrant in Postmenopausal Women with Estrogen Receptor-Positive Metastatic Breast Cancer.”.

Figure 2C by Ma et al. Clinical Cancer Research with permission under AACR copyright

Installing GenVisR and loading Data

To begin let’s install and load the GenVisR library from bioconductor. We will also need to load the mutation data from Supplemental Table 3, a version that has been converted to a tab delimited format is available for download here. We also need some supplemental information regarding the clinical data which is available here and mutation burden data available here.

# Install and load GenVisR
source("https://bioconductor.org/biocLite.R")
biocLite("GenVisR")
library(GenVisR)

# Set your working directory - change this to where you downloaded your files
setwd("~/Downloads")

# Load relevant data from the manuscript
mutationData <- read.delim("BKM120_Mutation_Data.tsv")
clinicalData <- read.delim("BKM120_Clinical.tsv")
mutationBurden <- read.delim("BKM120_MutationBurden.tsv")

Creating the initial plot

The waterfall() function is designed to work with specific file types read in as data frames, the default being MAF files, however the option exists to use custom file types as long as the column names “sample”, “gene”, and “variant_class” are present. This is done by setting fileType="Custom". Let’s go ahead and re-format our mutation data and create an initial plot using this parameter. Note that when using a custom file type the priority of mutation types must be specified with the variant_class_order parameter which accepts a character vector. We’ll explain what this does in the next section, for now just give it a character vector containing all mutation types present in mutationData.

# Reformat the mutation data for waterfall()
mutationData <- mutationData[,c("patient", "gene.name", "trv.type", "amino.acid.change")]
colnames(mutationData) <- c("sample", "gene", "variant_class", "amino.acid.change")

# Create a vector to save mutation priority order for plotting
mutation_priority <- as.character(unique(mutationData$variant_class))

# Create an initial plot
waterfall(mutationData, fileType = "Custom", variant_class_order=mutation_priority)

Setting the mutation hierarchy

As you can see from the previous plot, waterfall() plots mutations in a method similar to a heatmap, however the astute user may be asking what happens in cases where there are two different mutations types for the same cell in the heatmap. The answer is waterfall() picks one based on a mutation hierarchy. For a known file format such as MAF this hierarchy is pre set such that the more deleterious mutations are higher on the hierarchy. However, for a “custom” file type there is no apriori knowledge of what this order should be and thus must be set by the user. As mentioned, this is done with the variant_class_order parameter which accepts a character vector of all mutation types within the mutation data input in order of most to least deleterious. The effect this parameter has is best exemplified in the example below.

# Make sure seed is set to 426 to reproduce!
set.seed(426)

# Install and load the gridExtra package
#install.packages("gridExtra")
library(gridExtra)

# Create a data frame of random elements to plot
inputData <- data.frame(sample = sample(letters[1:5], 20, replace = TRUE), gene = sample(letters[1:5], 20, replace = TRUE), variant_class = sample(c("x", "y", "z"), 20, replace = TRUE))

# Choose the most deleterious to plot with y being defined as the most
# deleterious
most_deleterious <- c("y", "z", "x")

# Plot the data with waterfall using the 'Custom' parameter
p1 <- waterfall(inputData, fileType = "Custom", variant_class_order = most_deleterious, mainXlabel = TRUE, out="grob")

# Change the most deleterious order
p2 <- waterfall(inputData, fileType = "Custom", variant_class_order = rev(most_deleterious), mainXlabel = TRUE, out="grob")

# Arrange the two plots side by side
grid.arrange(p1, p2, ncol=2)

# Summarize the mutation types for a given sample/gene
inputData[inputData$sample=="e" & inputData$gene=="a",]

Notice that in the figure above, the top left cell (sample: e, gene: a) has mutations of two types (y and z). Between the two plots we reversed the hierarchy of the mutations specified in variant_class_order causing mutation of type “z” to have a higher precedence in the right most plot. As we would expect waterfall() then displays the color for mutation type “z” instead of “y” in this cell. Let’s go ahead and set a variant_class_order that makes sense for the breast cancer plot we’re working on. Remember this must be a character vector and contain all mutation types in the data frame mutationData.


# Define a mutation hierarchy
mutationHierarchy <- c("nonsense", "frame_shift_del", "frame_shift_ins", "in_frame_del", "splice_site_del", "splice_site", "missense", "splice_region", "rna")

# Create waterfall plot
waterfall(mutationData, fileType = "Custom", variant_class_order=mutationHierarchy)

Changing the color of tiles

Often it is desirable to change the colors of the plotted cells, either for purely aesthetic reasons, or to group similar mutation types. The waterfall() parameter which allows this is mainPalette which expects a character vector mapping mutation types to acceptable R colors. Let’s go ahead and match the colors in our plot to the one in the paper.

# define colours for all mutations
mutationColours <- c("nonsense"='#4f00A8', "frame_shift_del"='#A80100', "frame_shift_ins"='#CF5A59', "in_frame_del"='#ff9b34', "splice_site_del"='#750054', "splice_site"='#A80079', "missense"='#009933', "splice_region"='#ca66ae', "rna"='#888811')

# create waterfall plot
waterfall(mutationData, fileType = "Custom", variant_class_order=mutationHierarchy, mainPalette=mutationColours)

Adding a custom mutation burden

You might notice that while the mutation burden between the manuscript plot and our plot are very similar they are not exactly the same. The waterfall() function aproximates the mutation burden via the formula (# of mutations)/(coverage space) * 1,000,000 where the coverage space is controlled by the coverageSpace parameter and takes an integer giving the number of base pairs adequately covered in the experiment. This is only an aproximation as the coverage per sample can fluctuate, in situations such as this the user has the option of providing user defined mutation burden calculation for each sample for which to plot. This is done with the mutBurden parameter which takes a data frame with two columns, “sample” (which should match the samples in mutationData) and “mut_burden” (giving the actual value to plot). We’ve downloaded what the actual values are and stored them in the mutationBurden data frame we created above. You’ll notice the samples between mutationBurden and mutationData don’t quite match. Specifically it looks like an identifier “WU0” or “W00” has been added to each sample in the mutationBurden data frame. Let’s fix this using a regular expression with gsub and use these mutation burden values for our plot.


# Find which samples are not in the mutationBurden data frame
# First, let's look at the sample names in the mutationData and mutationBurden
mutationData$sample
mutationBurden$sample

# Now, determine the non-overlap between these values
sampleVec <- unique(mutationData$sample)
sampleVec[!sampleVec %in% mutationBurden$sample]

# Fix mutationBurden to match mutationData
mutationBurden$sample <- gsub("^WU(0)+", "", mutationBurden$sample)

# Check for non-overlap again
sampleVec[!sampleVec %in% mutationBurden$sample]

# Create the waterfall plot
waterfall(mutationData, fileType = "Custom", variant_class_order=mutationHierarchy, mainPalette=mutationColours, mutBurden=mutationBurden)

At this stage it is appropriate to talk about the subsetting functions within waterfall(). We could subset mutationData to include, for example, just some specific genes that we wish to plot. However, doing so would reduce the accuracy of the mutation burden top sub-plot if not using custom values. The waterfall() function contains a number of parameters for limiting the genes and mutations plotted without affecting the mutation burden calculation. These parameters are mainRecurCutoff, plotGenes, maxGenes and rmvSilent.

How would you create a waterfall plot showing only genes which are mutated in 15% of samples?

set the mainRecurCutoff parameter to .15

Adding clinical data

As stated previously waterfall() can display additional data in the bottom sub-plot. In order to do this a data frame in long format with column names “sample”, “variable”, “value” must be given to the parameter clinData. As with the mutBurden parameter, the samples in both data frames must match. Let’s go ahead and reproduce the clinical sub-plot from the manuscript figure. We will also use the clinLegCol, clinVarCol and clinVarOrder parameters to specify the number of columns, the colours, and the order of variables for the legend respectively. We also apply the section_heights parameter which takes a numeric vector providing the ratio of each verticle plot. In this situation we have a total of three verticle plots, the mutation burden, main, and clinical plot so we need a numeric vector of length 3.

# reformat clinical data to long format
library(reshape2)
clinicalData_2 <- clinicalData[,c(1,2,3,5)]
colnames(clinicalData_2) <- c("sample", "Months on Study", "Best Response", "Treatment Setting")
clinicalData_2 <- melt(data=clinicalData_2, id.vars=c("sample"))

# find which samples are not in the mutationBurden data frame
sampleVec <- unique(mutationData$sample)
sampleVec[!sampleVec %in% clinicalData$sample]

# fix mutationBurden to match mutationData
clinicalData_2$sample <- gsub("^WU(0)+", "", clinicalData_2$sample)

# create the waterfall plot
waterfall(mutationData, fileType = "Custom", variant_class_order=mutationHierarchy, mainPalette=mutationColours, mutBurden=mutationBurden, clinData=clinicalData_2, clinLegCol=3, clinVarCol=c('0-6'='#ccbadc', '6.1-12'='#9975b9', '12.1+'='#663096', 'Partial Response'='#c2ed67', 'Progressive Disease'='#E63A27', 'Stable Disease'='#e69127', '1'='#90ddee', '2'='#649aa6', '3+'='#486e77'), clinVarOrder=c('1', '2', '3+', 'Partial Response', 'Stable Disease', 'Progressive Disease', '0-6', '6.1-12', '12.1+'), section_heights=c(1, 5, 1))

Adding cell labels

You might have noticed when we subsetted our data frame at the begining of this tutorial we kept an extra column named “amino.acid.change”. This was done so that we could use this column to add cell labels to our plot as the authors of the manuscript figure did. In order to do this we only need specify the column in mutationData from which to pull labels from, we do this with mainLabelCol="amino.acid.change". Further we adjust the size of the labels so that they fit in the cells with mainLabelSize = 3. You can also change the angle of labels with the parameter mainLabelAngle which may be useful if the the cells on a plot are not perfectly square.

# create the waterfall plot
waterfall(mutationData, fileType = "Custom", variant_class_order=mutationHierarchy, mainPalette=mutationColours, mutBurden=mutationBurden, clinData=clinicalData_2, clinLegCol=3, clinVarCol=c('0-6'='#ccbadc', '6.1-12'='#9975b9', '12.1+'='#663096', 'Partial Response'='#c2ed67', 'Progressive Disease'='#E63A27', 'Stable Disease'='#e69127', '1'='#90ddee', '2'='#649aa6', '3+'='#486e77'), clinVarOrder=c('1', '2', '3+', 'Partial Response', 'Stable Disease', 'Progressive Disease', '0-6', '6.1-12', '12.1+'), section_heights=c(1, 5, 1), mainLabelCol="amino.acid.change", mainLabelSize = 3)


What is the most commonly mutated gene in this cohort?

PIK3CA is the most commonly mutated, in 50% (8/16) samples.

Are there any recurrently mutated amino acid positions in this gene?

Yes. E542K, H1047R, and E545K are each mutated twice.

If yes, are they known mutation hotspots? How would you determine this?

Yes. E542K, E545K and H1047R are all well known hotspots of mutation in PIK3CA, especially in human breast cancer. You could determine this by exploring PIK3CA mutation frequencies in COSMIC or ProteinPaint.

Re-arranging genes and samples

In some cases it may be desireable to forgo the default hierarchical ordering of cells in favor of a more custom approach. We can see in the authors original figure they elected to do this such that samples were grouped by months on study and gene were grouped by pathway. We can achieve this with the parameters geneOrder and sampOrder both of which take a character vector giving the order in which to plot row and column cells respectively.

# Create a sample ordering
sample_ordering <- c("19", "5", "31", "22", "12", "2", "32", "8", "28", "18", "4", "24", "23", "17", "11", "14")

# Create a gene ordering
gene_ordering <- c("CDH1", "MALAT1", "RUNX1", "NCOR1", "GATA3", "FOXA1", "ESR1", "CBFB", "TBX3", "TAB1", "MED12", "XBP1", "TP53", "RB1CC1", "BRCA2", "ATM", "SMARCD1", "MLL3", "MLL2", "ARID1A", "FBXW7", "CAV1", "MAP3K1", "MAP2K4", "NOTCH4", "PDGFRA", "ERBB3", "ERBB2", "RELN", "MAGI3", "MTOR", "AKT2", "AKT1", "PTEN", "PIK3CA")

# Create a gene ordering
waterfall(mutationData, fileType = "Custom", variant_class_order=mutationHierarchy, mainPalette=mutationColours, mutBurden=mutationBurden, clinData=clinicalData_2, clinLegCol=3, clinVarCol=c('0-6'='#ccbadc', '6.1-12'='#9975b9', '12.1+'='#663096', 'Partial Response'='#c2ed67', 'Progressive Disease'='#E63A27', 'Stable Disease'='#e69127', '1'='#90ddee', '2'='#649aa6', '3+'='#486e77'), clinVarOrder=c('1', '2', '3+', 'Partial Response', 'Stable Disease', 'Progressive Disease', '0-6', '6.1-12', '12.1+'), section_heights=c(1, 5, 1), mainLabelCol="amino.acid.change", mainLabelSize=3, sampOrder=sample_ordering, geneOrder=gene_ordering)