Plotting with R: geom_violin with color based on two column values (using ggplot2)

In this post I explain how to color-code ggplot2 plots automatically when the number of groups varies. I worked this out because the default colors did not give enough contrast between neighboring groups.

The sample data has three columns: 'ID', 'SUB ID' and 'Value'.



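Load the data into R. If you load it with read.csv(), the column name 'SUB ID' becomes 'SUB.ID', which is the name used in the code below.
The idea is to put 'ID' and 'SUB ID' on the x axis and 'Value' on the y axis, with a different fill color for each 'ID'.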
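The original data file is not reproduced here, so this sketch uses a small hypothetical table with the same three columns. Reading it with read.csv() shows where the name SUB.ID comes from: by default (check.names = TRUE) the header is passed through make.names(), which replaces the space with a dot.

```r
# Hypothetical sample data: IDs, SUB IDs and values are made up for
# illustration and are not the data used in this post.
csv_text <- "ID,SUB ID,Value
A,1,10.2
A,1,11.0
A,1,9.7
A,2,12.5
A,2,13.1
A,2,12.0
B,1,8.4
B,1,9.0
B,1,8.8
B,3,7.5
B,3,7.9
B,3,8.1"

# read.csv() runs the header through make.names(), so "SUB ID" becomes "SUB.ID"
tbl <- read.csv(text = csv_text)

# If SUB ID is numeric in the file, make it a category so each SUB ID gets its own violin
tbl$SUB.ID <- as.factor(tbl$SUB.ID)

str(tbl)
```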

1. One way to do this is to use 'facet_grid'.


I call the table 'tbl'. Inside facet_grid(), refer to the column by its name (ID) rather than as tbl$ID.
library(ggplot2)
ggplot(tbl, aes(x = SUB.ID, y = Value)) + geom_violin(aes(fill = ID)) + facet_grid(. ~ ID)

The result is one panel per ID, with the SUB IDs along each panel's x axis.





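If each ID has its own set of SUB IDs, a shared x axis leaves empty slots in every panel. Assuming that is the case in your data, the scales and space arguments of facet_grid() can let each panel show only its own SUB IDs. The small table below is hypothetical.

```r
library(ggplot2)

# Hypothetical data: ID "A" has SUB IDs 1 and 2, ID "B" has SUB IDs 1 and 3
tbl <- data.frame(
  ID     = rep(c("A", "B"), each = 6),
  SUB.ID = factor(rep(c(1, 2, 1, 3), each = 3)),
  Value  = c(10.2, 11.0, 9.7, 12.5, 13.1, 12.0,
             8.4, 9.0, 8.8, 7.5, 7.9, 8.1)
)

# One panel per ID; scales = "free_x" drops SUB IDs a panel does not use,
# and space = "free_x" sizes each panel by the number of SUB IDs it shows
p <- ggplot(tbl, aes(x = SUB.ID, y = Value)) +
  geom_violin(aes(fill = ID)) +
  facet_grid(. ~ ID, scales = "free_x", space = "free_x")

print(p)
```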

2. If you want a layout like JMP's, with both ID and SUB ID along the bottom of a single plot, there is a somewhat roundabout method.

First, create a new column using 'paste':

tbl$ID_SUB_ID <- paste(tbl$ID,tbl$SUB.ID, sep = "_")
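On a small hypothetical table, the paste() step produces labels such as "A_1", so each (ID, SUB ID) pair becomes one category on the x axis. Since ggplot2 sorts a character x axis alphabetically, labels that start with the same ID end up next to each other. Base R's interaction() builds a comparable factor if you prefer not to add a column; it is shown here as an alternative, not as the method used in this post.

```r
# Hypothetical data with the same column names as in the post
tbl <- data.frame(
  ID     = c("B", "A", "A", "B"),
  SUB.ID = c("1", "2", "1", "3"),
  Value  = c(8.4, 12.5, 10.2, 7.5)
)

# Combine the two identifiers into one label per row
tbl$ID_SUB_ID <- paste(tbl$ID, tbl$SUB.ID, sep = "_")
print(tbl$ID_SUB_ID)

# Alternative: a factor whose levels are ordered by ID first (lex.order = TRUE),
# keeping only the combinations that actually occur (drop = TRUE)
pair <- interaction(tbl$ID, tbl$SUB.ID, sep = "_", lex.order = TRUE, drop = TRUE)
print(levels(pair))
```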




Then plot with x = ID_SUB_ID:

ggplot(tbl, aes(x=ID_SUB_ID, y=Value )) + geom_violin(aes(fill = ID))




Let's define the colors manually (based on ID).
This is a good option when the plot is generated automatically and the number of IDs varies from run to run.


The idea is to alternate dark and light colors, for example: brown, beige, darkolivegreen, khaki1, midnightblue, magenta, seagreen4, papayawhip.
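To keep this automatic when the number of IDs changes, one option is to recycle the dark/light list with rep_len(), so every ID gets a color even when there are more IDs than colors (the colors then repeat). The helper name make_palette() is hypothetical.

```r
# Dark/light alternating colors from the post
base_colors <- c("brown", "beige", "darkolivegreen", "khaki1",
                 "midnightblue", "magenta", "seagreen4", "papayawhip")

# Hypothetical helper: one color per unique ID, recycling the list if needed,
# returned as a vector named by ID
make_palette <- function(ids, colors = base_colors) {
  ids <- unique(ids)
  setNames(rep_len(colors, length(ids)), ids)
}

pal <- make_palette(c("A", "B", "C", "A"))
print(pal)
```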


Keep in mind the ggplot2 rule "Aesthetics must be either length 1 or the same as the data": the colors have to be supplied for every row of the data, not just once per ID.
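To see what "a color for every row" means, this sketch expands a per-ID lookup (hypothetical IDs and colors) to one entry per row with base R's match(). The result has the same length as the data, which is what ggplot2 expects for an aesthetic.

```r
# Hypothetical per-row IDs and a per-ID color lookup
ids      <- c("A", "A", "B", "A", "B", "C")
uniq_ids <- unique(ids)
colors   <- c("brown", "beige", "darkolivegreen")[seq_along(uniq_ids)]

# match() returns, for each row, the position of its ID among the unique IDs
row_colors <- colors[match(ids, uniq_ids)]
print(row_colors)
```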


Let's create a data.frame with the color names, in a column called 'color'.

com <- data.frame(color = c('brown', 'beige', 'darkolivegreen', 'khaki1', 'midnightblue', 'magenta', 'seagreen4', 'papayawhip'))



Let's add a serial number (index) column to the table.
com$index <- seq.int(nrow(com))




Because the color depends on ID, the colors need to be matched against the unique IDs.

uID <- data.frame(ID = unique(tbl$ID))

Let's add a serial number (index) column to this table too.
uID$index <- seq.int(nrow(uID))



Merge the two tables on index so that each ID is paired with a color.

ID_color <- merge(uID,com, by = 'index', all.x = TRUE)
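Because the merge keeps every ID (all.x = TRUE) but there are only eight colors, a ninth or later ID gets an NA color, which ggplot2 draws in its grey na.value. A quick check with hypothetical IDs:

```r
# Colors table as built in the post
com <- data.frame(color = c("brown", "beige", "darkolivegreen", "khaki1",
                            "midnightblue", "magenta", "seagreen4", "papayawhip"))
com$index <- seq.int(nrow(com))

# Hypothetical: ten unique IDs, more than the eight colors available
uID <- data.frame(ID = paste0("ID", 1:10))
uID$index <- seq.int(nrow(uID))

ID_color <- merge(uID, com, by = "index", all.x = TRUE)

# IDs left without a color
print(ID_color$ID[is.na(ID_color$color)])
```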



Now merge the data table and the color table by ID, and call the result 'tblc'.
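The code for this step is not shown in the post. Assuming the merged table is the tblc used in the next plot, a minimal version is a merge on ID; the small tables below are hypothetical stand-ins for tbl and ID_color.

```r
# Hypothetical stand-ins for the tables built earlier
tbl <- data.frame(
  ID        = c("A", "A", "B", "B"),
  ID_SUB_ID = c("A_1", "A_2", "B_1", "B_3"),
  Value     = c(10.2, 12.5, 8.4, 7.5)
)
ID_color <- data.frame(index = 1:2, ID = c("A", "B"), color = c("brown", "beige"))

# Attach each row's color by joining on ID (an inner join by default)
tblc <- merge(tbl, ID_color, by = "ID")
print(tblc)
```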



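Now plot again. On its own, aes(fill = color) treats the color names as ordinary categories, so ggplot2 still picks its default palette and lists the names in a legend:

ggplot(tblc, aes(x = ID_SUB_ID, y = Value)) + geom_violin(aes(fill = color))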



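Let's make the color names act as the actual fill colors and remove the legend, as it is not what we want. scale_fill_identity() uses the values in the color column as literal colors, and guides(fill = "none") hides the legend (passing FALSE is deprecated in current ggplot2).

ggplot(tblc, aes(x = ID_SUB_ID, y = Value)) + geom_violin(aes(fill = color)) + scale_fill_identity() + guides(fill = "none")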
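A shorter route, swapped in here rather than taken from the post, is to skip the helper tables and give scale_fill_manual() a vector of colors named by ID. The fill stays mapped to ID, so no per-row color column is needed. The data is hypothetical.

```r
library(ggplot2)

# Hypothetical data with the combined x-axis label
tbl <- data.frame(
  ID     = rep(c("A", "B"), each = 6),
  SUB.ID = rep(c("1", "2", "1", "3"), each = 3),
  Value  = c(10.2, 11.0, 9.7, 12.5, 13.1, 12.0,
             8.4, 9.0, 8.8, 7.5, 7.9, 8.1)
)
tbl$ID_SUB_ID <- paste(tbl$ID, tbl$SUB.ID, sep = "_")

# One color per unique ID, recycled from the dark/light list if needed
base_colors <- c("brown", "beige", "darkolivegreen", "khaki1",
                 "midnightblue", "magenta", "seagreen4", "papayawhip")
ids <- unique(tbl$ID)
pal <- setNames(rep_len(base_colors, length(ids)), ids)

# Named values map each ID to its color; guide = "none" hides the legend
p <- ggplot(tbl, aes(x = ID_SUB_ID, y = Value)) +
  geom_violin(aes(fill = ID)) +
  scale_fill_manual(values = pal, guide = "none")

print(p)
```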



That's it.
I will discuss how to beautify the plot in another post.