B Reach for the Stars
Needed packages
library(dplyr)
library(ggplot2)
library(knitr)
library(dygraphs)
library(nycflights13)
B.1 Sorted barplots
Building upon the example in Section 2.8:
flights_table <- table(flights$carrier)
flights_table
   9E    AA    AS    B6    DL    EV    F9    FL    HA    MQ    OO    UA    US
18460 32729   714 54635 48110 54173   685  3260   342 26397    32 58665 20536
   VX    WN    YV
 5162 12275   601
We can sort this table from highest to lowest counts by using the sort function:
sorted_flights <- sort(flights_table, decreasing = TRUE)
names(sorted_flights)
[1] "UA" "B6" "EV" "DL" "AA" "MQ" "US" "9E" "WN" "VX" "FL" "AS" "F9" "YV" "HA"
[16] "OO"
Barplots are often easier to read when the bars are ordered by height. This ordering lets the reader compare airlines by number of departed flights more easily (Robbins 2013), and it makes questions like "How many airlines have more departing flights than Southwest Airlines?" much easier to answer.
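As a quick check of the Southwest question, the sketch below counts the carriers that come before WN (Southwest's carrier code) in the sorted counts. To keep it self-contained, the counts are copied by hand from the table output shown earlier rather than computed from flights; the names carrier_counts, sorted_counts, wn_rank, and n_above_wn are made up for this illustration.

```r
# Carrier counts copied from the table() output shown earlier in this section
carrier_counts <- c(
  "9E" = 18460, "AA" = 32729, "AS" = 714, "B6" = 54635, "DL" = 48110,
  "EV" = 54173, "F9" = 685, "FL" = 3260, "HA" = 342, "MQ" = 26397,
  "OO" = 32, "UA" = 58665, "US" = 20536, "VX" = 5162, "WN" = 12275,
  "YV" = 601
)

# Sort from highest to lowest, as done for sorted_flights
sorted_counts <- sort(carrier_counts, decreasing = TRUE)

# Position of Southwest (WN) in the sorted ordering
wn_rank <- which(names(sorted_counts) == "WN")

# Every carrier ranked ahead of WN has more departing flights
n_above_wn <- wn_rank - 1
n_above_wn
```

Once the counts are sorted, the answer is just the position of WN minus one, which is the same thing a reader does by eye on a sorted barplot.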
We can use sorted_flights, the sorted table of flight counts, to reorder the carrier values on the horizontal axis.
ggplot(data = flights, mapping = aes(x = carrier)) +
  geom_bar() +
  scale_x_discrete(limits = names(sorted_flights))
The last addition here sets the order of the categories on the discrete horizontal x axis to match the names of the entries of sorted_flights.
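An alternative way to get the same ordering, sketched here under stated assumptions, is to store the order in the data itself by converting carrier to a factor whose levels follow the sorted counts; ggplot2 draws a discrete axis in factor-level order. The toy_flights data frame and the carrier_order name are invented for this illustration and stand in for flights.

```r
library(ggplot2)

# Hypothetical stand-in for flights: one row per flight, carrier only
toy_flights <- data.frame(
  carrier = c("UA", "B6", "UA", "WN", "B6", "UA"),
  stringsAsFactors = FALSE
)

# Carrier names ordered from most to fewest flights
carrier_order <- names(sort(table(toy_flights$carrier), decreasing = TRUE))

# Store that ordering as the factor levels of carrier
toy_flights$carrier <- factor(toy_flights$carrier, levels = carrier_order)

# No scale_x_discrete() call is needed: bars follow the factor levels
ggplot(data = toy_flights, mapping = aes(x = carrier)) +
  geom_bar()
```

Setting the levels once can be convenient when several plots of the same data should share the ordering, whereas scale_x_discrete(limits = ...) changes only the plot it is added to.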
B.2 Interactive graphics
B.2.1 Interactive linegraphs
Another useful tool for viewing linegraphs is the dygraph function in the dygraphs package, used in combination with the dyRangeSelector function. Together they produce an interactive plot in which we can zoom in on a selected range:
library(dygraphs)
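# Convert each flight's scheduled hour to a calendar date
flights_day <- mutate(flights, date = as.Date(time_hour))
# Median arrival delay per day; tibbles do not support row names,
# so convert the result to a base data frame
flights_summarized <- flights_day %>%
  group_by(date) %>%
  summarize(median_arr_delay = median(arr_delay, na.rm = TRUE)) %>%
  as.data.frame()
# dygraph expects the dates as the row names
rownames(flights_summarized) <- flights_summarized$date
flights_summarized <- select(flights_summarized, -date)
dyRangeSelector(dygraph(flights_summarized))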
The syntax here is a little different from what we have covered so far. The dygraph function expects the dates to be given as the rownames of the object. We therefore remove the date variable from the flights_summarized data frame, since it is already accounted for in the rownames. Lastly, we run the dygraph function on the new data frame, which contains only the median arrival delay as a column, and pass the result to dyRangeSelector to add a selector for zooming in on the interactive plot. (Note that this plot will only be interactive in the HTML version of this book.)
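To make the rownames step more concrete, here is a minimal base R sketch of the shape described above: dates as row names and a single numeric column. The toy_delays data frame and its delay values are made up for illustration and are not taken from flights.

```r
# Hypothetical daily summary: five dates with made-up median delays
toy_delays <- data.frame(
  date = as.Date("2013-01-01") + 0:4,
  median_arr_delay = c(3, 4, 1, -8, -3)
)

# Move the dates into the row names (they are stored as character strings)
rownames(toy_delays) <- toy_delays$date

# Drop the now-redundant date column, keeping a one-column data frame
toy_delays <- toy_delays[, "median_arr_delay", drop = FALSE]

toy_delays
```

With the dygraphs package loaded, an object of this shape could then be passed along as dyRangeSelector(dygraph(toy_delays)), mirroring the call above.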
References
Robbins, Naomi. 2013. Creating More Effective Graphs. Chart House.