Advantages | Description |
---|---|
Enhanced Exploration | Allows direct interaction, zooming, filtering, and dynamic parameter changes. |
Deeper Understanding | Provides context, tooltips, and additional information on demand. |
Iterative Analysis | Supports quick iteration through different views, accelerating the EDA process. |
Communication & Collaboration | Engages stakeholders, facilitates clearer explanations, and fosters better discussions. |
Complex Data Representation | Helps in visualizing multidimensional data in a more understandable manner. |
User-Centric Exploration | Puts users in control, allowing personalized insights and exploration. |
While R and packages like tidyverse provide powerful tools for data manipulation, visualization, and modeling, interactive data visualization adds another layer of exploration and understanding. I've highlighted several advantages in the context of exploratory data analysis (EDA) and data science workflows above.
Historically, researchers who wanted to convey their findings interactively often turned to different tools or programming languages, such as JavaScript, to build interactive web visuals showcasing their key discoveries. However, this shift demands a significant change in context and an entirely different skill set, which often hinders efficiency. Not only does one have to learn a completely different programming language, but the learning curve of certain interactive visualization libraries is another challenge. While modern GUI tools such as Tableau offer interactivity, they often do not fit well into a coding workflow: such systems tend to be rather "closed off", meaning they cannot easily be customized, extended, or integrated with another system.
I previously covered the grammar of graphics and ggplot2 in full detail, and several resources are available for a gentle introduction to the ggplot2 framework. Throughout this document, I will be implementing interactivity within the ggplot2 framework.
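# Simulate Height and Weight Data with R
library(tidyverse)  # provides starwars, ggplot2, and the pipe used below

misty::descript(starwars)

# Simulate Height
height <- runif(100, 120, 220)

# Simulate Weight
weight <- 50 + 0.5 * height + 0.7 * (height^1.5) + rnorm(100, sd = 300)

# Rescale
final_weight <- (weight - mean(weight)) * (40 / sd(weight)) + 150

# Collect into the data frame that later examples refer to as `dat` (assumed structure)
dat <- tibble(height, final_weight)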
Highlighting observations is a simple way to add interactivity to a plot. In the example below, I've highlighted a certain segment of observations in the dataset.
# Highlighting Data
library(plotly)

key <- highlight_key(dat)
p1 <- ggplot(key, aes(height, final_weight)) +
  geom_point()
inter_p1 <- highlight(ggplotly(p1), "plotly_selected")
Above is a simple example of how a lasso or box-selection tool can be used within an interactive framework to highlight observations of interest and save the result as an image. Giving users even this simple form of interactivity adds a new dimension to the visualization.
When a key insight surfaces, the HTML-generated figures can be easily shared via email or embedded in reports/websites. These interactive visuals, using htmlwidgets, seamlessly work in RMarkdown, Shiny apps, RStudio, Jupyter, etc. Sharing fosters discussions, allowing colleagues to offer fresh perspectives and even glean immediate insights from the graphics.
Another way to draw attention to specific observations is to annotate points of interest. In the example below, I've annotated the points of interest with a label, using the ggforce package to enclose the selected points in a hull.
library(ggforce)

dat %>%
  mutate(tall = ifelse(height > 186, 'tall', 'x')) %>%
  ggplot(aes(height, final_weight)) +
  geom_point(color = 'black', size = 3, shape = 21, fill = rgb(0, 0, 0, alpha = 0.5),
             stroke = 1.1, aes(alpha = tall), show.legend = FALSE) +
  geom_mark_hull(aes(filter = tall == 'tall', label = tall)) +
  labs(title = "Weights for Tall People",
       x = "Height",
       y = "Weight") +
  papaja::theme_apa() +
  scale_alpha_manual(values = c(1, 0.2))
Plotly is a data visualization library that offers an array of interactive plots such as scatter plots, line charts, and 3D visuals. It stands out for its interactivity, allowing users to explore data within the plots themselves by zooming, hovering for details, and toggling data visibility. Plotly works with Jupyter, Dash, Shiny, and JavaScript, and integrates with Pandas and NumPy, making it a versatile tool for creating dynamic and engaging visualizations.
Use an iframe tag or directly insert the HTML code to display the Plotly visualization.
There are two main ways to create a Plotly object:
- ggplotly(), which converts a ggplot2 object
- plot_ly(), plot_geo(), or plot_mapbox(), which initialize a Plotly object directly
Both approaches have somewhat complementary strengths and weaknesses, so it can be beneficial to learn both methods.
The Plotly package in R creates plots in R using the underlying library in JavaScript (plotly.js). The plot_ly() function has a direct connection to plotly.js, providing extra simplifications that enhance the plotting procedure. These simplifications, inspired by the Grammar of Graphics and ggplot2, notably expedite the shift between various visuals, making it easier to uncover valuable data insights.
# Bar Graphs
plotly_1 <- diamonds %>%
  plot_ly() %>%
  add_histogram(x = ~cut)

plotly_2 <- diamonds %>%
  plot_ly() %>%
  add_histogram(x = ~cut, color = I(rgb(0, 0, 0, 1)))

plotly_3 <- diamonds %>%
  plot_ly() %>%
  add_histogram(x = ~cut, color = I(rgb(0, 0, 0, 1)), stroke = I(rgb(0, 0, 0, 0.2)), span = I(15))

plotly_4 <- diamonds %>%
  plot_ly() %>%
  add_histogram(x = ~cut, color = I(rgb(0, 0, 0, 1)), stroke = I(rgb(0, 0, 0, 0.2)), span = I(15)) %>%
  layout(title = 'My Graph')

# Bar vs Histogram in Plotly
hist <- diamonds %>%
  plot_ly() %>%
  add_histogram(x = ~cut, stroke = I('black'), span = I(2), bingroup = I(5))

bars <- diamonds %>%
  group_by(cut) %>%
  count() %>%
  plot_ly() %>%
  add_bars(x = ~cut, y = ~n, stroke = I('black'), span = I(5))

x <- plotly_json(plotly_1)
build <- plotly_build(plotly_1)
The above code demonstrates creating bar graphs using plot_ly() with various configurations such as different colors, stroke settings, spans, and layout adjustments.
Plotly adopts a functional approach akin to the layered grammar of graphics. Here, most functions expect a plotly object as input and yield a modified version of it. These modifications solely depend on the function's input values, unlike base R graphics that often involve side effects. For instance, the layout() function alters layout components like the title within a plotly object.
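As a rough illustration of this side-effect-free style, the sketch below applies layout() to a plot and checks that the original object is unchanged. The toy data and the object names (p, p_before, p_titled) are illustrative assumptions, not part of the examples in this document.

```r
library(plotly)

# Toy plot; the data here is illustrative only
p <- plot_ly(x = 1:3, y = c(2, 4, 8), type = "scatter", mode = "markers")

# Keep a reference copy, then derive a titled version
p_before <- p
p_titled <- layout(p, title = "Powers of two")

# layout() returned a new plotly object and left `p` untouched
identical(p, p_before)
inherits(p_titled, "plotly")
```

Because every modifier returns a new object, intermediate versions of a plot can be kept around and compared without worrying that a later call has changed them.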
Complex plot modifications in plotly can be challenging to navigate. The %>% operator from magrittr offers a left-to-right reading sequence, placing the object on the left into the first argument of the function on the right. This approach simplifies the understanding of layered modifications (Wickham, 2014).
diamonds %>%
  group_by(cut) %>%
  count() %>%
  plot_ly(x = ~cut, y = ~n) %>%
  add_bars(stroke = I('black'), span = I(5), color = I(rgb(1, 0, 0, 0.5))) %>%
  add_text(text = ~scales::comma(n), y = ~n, textposition = "top middle", cliponaxis = FALSE)
The code above shows how to put things together in plotly: we add multiple layers to a plotly graph, a text layer on top of a bar layer, and combine them using magrittr piping. In the interactive output, viewers have the option to turn off individual layers.
Plotly layers are the building blocks of a plotly object. The add_*() functions, such as add_histogram(), add_lines(), add_markers(), etc., define how data is rendered into geometric objects, following the layered grammar of graphics. A layer, in this context, comprises five components: data, aesthetic mappings (e.g., color), geometric representation (e.g., rectangles), statistical transformations (e.g., sum), and positional adjustments (e.g., dodge).
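A minimal sketch of layering, assuming the built-in mtcars data and illustrative object names: one add_markers() layer for the raw points and one add_lines() layer whose y values come from a statistical transformation (a linear fit).

```r
library(plotly)

# Base object holds the data and the shared x/y mappings
base <- plot_ly(mtcars, x = ~wt, y = ~mpg)

layered <- base %>%
  # geometric layer: raw observations as points
  add_markers(name = "observed") %>%
  # second layer: fitted values from a linear model, drawn as a line
  add_lines(y = ~fitted(lm(mpg ~ wt)), name = "lm fit")

# Each add_*() call here contributes one trace to the built figure
length(plotly_build(layered)$x$data)
```

Both layers inherit the data and the x mapping from plot_ly(), so only the attributes that differ need to be restated in each add_*() call.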
Plotly layout is a collection of attributes that define the plot's appearance, such as the title, axis labels, and background color. The layout() function modifies the layout components of a plotly object. The layout() function is a wrapper for the layout attributes, which are defined in the plotly.js schema.
The ggplotly() function in plotly translates ggplot2 visuals into interactive plotly ones, simplifying the addition of interactivity to ggplot2 workflows. This function leverages ggplot2's intuitive interface for effortless exploration of statistical summaries across different groups using various geoms.
It handles facets and relative frequency displays, and supports extensions like ggforce, naniar, and GGally, which broadens the range of available visualizations. Geoms and stats such as geom_sina() and stat_summary() can be combined with ggplot2's other strengths for in-depth analysis and model diagnostics. ggplotly() adds interactive features like hover, zoom, and filtering, which enhance exploratory analysis and allow multiple views to be linked. The conversion is not always perfect, but modifying ggplotly()'s return value or customizing tooltips can improve the result; these advanced modifications are covered in later chapters.
# Template: Plot <- <ggplot object>, then ggplotly(Plot)

# Example
ggplotly(
  ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
    geom_point()
)
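One common form of tooltip customization can be sketched as follows, assuming ggplot2's mpg data and an illustrative label format: a text aesthetic is added to the ggplot, and the tooltip argument of ggplotly() restricts the hover label to it.

```r
library(plotly)
library(ggplot2)

# `text` is not drawn by geom_point(); ggplotly() picks it up for hover labels
gg <- ggplot(mpg, aes(displ, hwy, text = paste("Model:", model))) +
  geom_point()

# Show only the custom text on hover instead of every mapped variable
w <- ggplotly(gg, tooltip = "text")
w
```

Passing a character vector such as c("x", "y", "text") instead would show several mapped variables, in that order.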
In Plotly, a "trace" refers to a visual representation of data on a plot. Each trace represents a specific set of data points and contains information about how that data should be displayed. Traces can include scatter plots, lines, bars, histograms, heatmaps, and more.
Key Points | Description |
---|---|
Type of Data | Traces encapsulate specific types of data visualizations, such as scatter points, lines, bars, or other chart elements. |
Attributes | Each trace has its own set of attributes defining appearance and behavior, including settings for markers, lines, colors, labels, etc. |
Multiple Traces | Allows having multiple traces on a single plot, facilitating visualization and comparison of different datasets or aspects of the same dataset. |
Plot Composition | Traces are combined to create a complete plot. Each trace is added to the plot layout and configured individually, collectively representing the entire dataset or multiple datasets. |
plot_ly() %>%
  add_trace(type = "scatter", mode = "markers+lines", x = 4:6, y = 4:6)

plot_ly() %>%
  add_trace(type = "scatter", mode = "markers", x = 4:6, y = 4:6) %>%
  add_trace(type = "scatter", mode = "lines", x = 4:6, y = 4:6)
The first block of code adds a single trace to the plot. It creates a scatter plot where markers and lines are both visible (mode = "markers+lines"), and it uses the values 4, 5, and 6 for both x and y axes. In the second block, we employ the two traces separately, allowing users to interactively manipulate each individual trace.
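To see the traces as plotly stores them, the built figure can be inspected with plotly_build(). The sketch below rebuilds the two-trace version; the object names two_traces and built are illustrative.

```r
library(plotly)

two_traces <- plot_ly() %>%
  add_trace(type = "scatter", mode = "markers", x = 4:6, y = 4:6) %>%
  add_trace(type = "scatter", mode = "lines", x = 4:6, y = 4:6)

# plotly_build() evaluates the object into the list structure sent to plotly.js
built <- plotly_build(two_traces)

# Number of traces, and the mode recorded for each one
length(built$x$data)
vapply(built$x$data, function(tr) tr$mode, character(1))
```

Each element of built$x$data is one trace, which is why the two modes can be toggled independently in the legend.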
Plotly R Library Basic Charts: Plotly's R graphing library makes interactive, publication-quality graphs online. Examples of basic chart types:
Chart Type | Description |
---|---|
Scatter and Line Plots | Visualize relationships between numeric variables |
Bar Charts | Compare data across different categories |
Bubble Charts | Display three dimensions of data |
WebGL vs SVG in R | Comparing rendering techniques in R |
Filled Area Plots | Display data and emphasize trends |
Horizontal Bar Charts | Present data horizontally |
Gantt Charts | Visualize project schedules and timelines |
Sunburst Charts | Hierarchical data visualization |
Pie Charts | Show proportions of a whole |
Tables | Display structured data |
Dot Plots | Represent distributions of data |
Dumbbell Plots | Visualize changes between two points |
Sankey Diagram | Show flow or relationships between entities |
Treemap Charts | Represent hierarchical data |
In a scatter plot, markers refer to the individual data points displayed on the chart. Each data point is represented by a marker, which is a visual element such as a dot, symbol, or shape that signifies the value of the data point on both the x and y axes.
# Basic Scatterplot
p_1 <- dat %>%
  ggplot(aes(height, final_weight)) +
  geom_point(size = 4, shape = 21, stroke = 1.1, color = 'black', fill = rgb(0, 0, 0, 0.5)) +
  papaja::theme_apa() +
  labs(title = 'Height vs Weight', y = 'Weight', x = 'Height')

ggplotly(p_1)
dat %>%
  arrange(height) %>%  # sort by x so the line is drawn left to right
  plot_ly(x = ~height, y = ~final_weight, mode = "markers", type = "scatter", name = "Scatter Points") %>%
  # loess fit, so the second trace is a smoother rather than the raw points joined up
  add_trace(x = ~height, y = ~fitted(loess(final_weight ~ height)), mode = "lines", type = "scatter", name = "Smoothing Line") %>%
  layout(title = "Scatter Plot with Smoothing Line")
Below is a list of some common marker attributes that can be customized in Plotly:
Attribute | Description |
---|---|
size | Sets the marker size. |
color | Determines the marker color. |
symbol | Specifies the marker symbol type. |
opacity | Sets the marker opacity. |
line | Defines the marker's border line properties. |
gradient | Specifies the marker color scale gradient. |
sizeref | Sets the scale factor for marker size. |
sizemode | Specifies how the marker size is determined. |
showlegend | Determines whether the marker appears in the legend. |
p1 <- plot_ly(type = "scatter") %>%
  add_trace(x = 1:2, y = rep(1, 2), mode = "markers",
            marker = list(size = 20), name = "Default Marker") %>%
  add_trace(x = 1:2, y = rep(2, 2), mode = "markers",
            marker = list(size = 30, color = "red"), name = "Custom Size & Color") %>%
  add_trace(x = 1:2, y = rep(3, 2), mode = "markers",
            marker = list(size = 20, symbol = "square"), name = "Square Markers") %>%
  add_trace(x = 1:2, y = rep(4, 2), mode = "markers",
            marker = list(size = 20, symbol = "triangle-up", color = "green"), name = "Triangle Markers") %>%
  add_trace(x = 1:2, y = rep(5, 2), mode = "markers",
            marker = list(size = 20, symbol = "diamond", opacity = 0.7), name = "Custom Opacity") %>%
  add_trace(x = 1:2, y = rep(6, 2), mode = "markers",
            marker = list(size = 20, opacity = 1, line = list(color = 'black', width = 5)), name = "border") %>%
  layout(title = "Custom Marker Styles in Plotly", xaxis = list(range = c(0, 4)))
Grouped scatterplots are a great way to visualize the relationship between two variables across different groups. In the example below, we can see how the relationship between height and weight varies across different groups.
p1 <- starwars %>%
  filter(mass <= 300) %>%
  ggplot(aes(height, mass)) +
  geom_smooth(method = 'lm', se = FALSE) +
  papaja::theme_apa()

int1 <- p1 +
  geom_point(color = 'black', fill = 'red', shape = 21, size = 3.5, alpha = 0.8)

int2 <- p1 +
  geom_point(data = starwars %>% drop_na(), aes(fill = sex),
             color = 'black', shape = 21, size = 3.5, alpha = 0.8) +
  theme(legend.position = c(0.15, 0.85))

int3 <- p1 +
  geom_point(data = starwars %>% drop_na(), aes(shape = sex),
             color = 'black', size = 3.5, alpha = 0.8) +
  theme(legend.position = c(0.15, 0.85))

int_plot <- ggplotly(int3)
col <- "Set1"  # stand-in palette; the original definition of `col` is not shown

starwars %>%
  plot_ly() %>%
  add_trace(type = 'scatter', mode = 'markers', x = ~height, y = ~mass,
            color = ~sex, colors = col,
            marker = list(size = 20, opacity = 0.5, line = list(color = 'black', width = 2)))
Line plots, also known as line charts or line graphs, are a type of data visualization that displays information as a series of data points connected by straight lines. They are particularly useful for showing trends and relationships between continuous data points over a continuous interval or time period.
economics %>%
  ggplot(aes(date, unemploy)) +
  geom_line() +
  papaja::theme_apa()

int_plot2 <- (economics %>%
  arrange(psavert) %>%
  mutate(decade = 10 * (lubridate::year(date) %/% 10),
         decade = factor(decade)) %>%
  ggplot(aes(date, unemploy, group = decade, color = decade)) +
  geom_line() +
  papaja::theme_apa()
) %>%
  ggplotly()
Above is a simple demonstration of how interactivity in line plots can convey much more information. The static plot shows that unemployment changes over time; the change is non-linear and is, in part, a function of the rising population and economic conditions. With some minor tweaks, we can add interactivity that lets users toggle the series by decade, offering an entirely new flavor to the visualization.
If we wish to identify yearly trends in unemployment by month, it would be rather tedious to examine the entire plot or a faceted plot. Alternatively, we can present the plot interactively, allowing viewers to isolate individual years and identify more meaningful trends.
# econ: economics with year and month columns added (assumed definition, since it is not shown above)
econ <- economics %>%
  mutate(yr = lubridate::year(date), mnth = lubridate::month(date))

int_plot3 <- (econ %>%
  ggplot(aes(mnth, unemploy, col = ordered(yr))) +
  geom_line(show.legend = FALSE) +
  scale_x_continuous(breaks = 1:12, labels = month.abb) +
  papaja::theme_apa()) %>%
  ggplotly()
Similarly, we can use the ggplot framework to produce all kinds of plots with an interactive twist.
Bar charts are a type of data visualization that are used to display and compare the number, frequency or other measure (e.g., mean) for different discrete categories or groups. The bars can be either vertical (sometimes called a column graph) or horizontal. The height or length of the bar is proportional to the number of observations or frequency.
The add_bars() and add_histogram() functions wrap the plotly.js 'bar' and 'histogram' trace types, respectively. They differ primarily in where the data is summarized: add_bars() requires the bar heights to be supplied (both x and y), while add_histogram() needs only a single variable and lets plotly.js handle the binning in the browser. Both can visualize either numeric or discrete variables; the difference lies in where the binning occurs.
add_bars(p, x = NULL, y = NULL, ..., data = NULL, inherit = TRUE)
- p: A plotly object.
- x: Sets the x coordinates.
- y: Sets the y coordinates.
- data: An optional data frame for this layer.
- inherit: Whether to inherit attributes from plot_ly().
Further attributes of the 'bar' trace type are passed through ... to plotly.js, including:
- text: Sets text elements associated with each (x,y) pair.
- hovertext: Sets text elements associated with each (x,y) pair to appear on hover.
- hoverinfo: Determines which trace information appears on hover.
- hoverlabel: Sets the hover label properties.
- marker: Sets the marker properties (color, line, etc.).
- opacity: Sets the opacity of the bars.
- base: Sets the base of the bars.
- width: Sets the bar width.
- x0: Alternate to x.
- dx: Sets the x coordinate step.
- y0: Alternate to y.
- dy: Sets the y coordinate step.
- orientation: Sets the orientation of the bars ('v' for vertical, 'h' for horizontal).
- name: Sets the trace name.
- error_y: Sets the y-axis error bars.
- error_x: Sets the x-axis error bars.
- ids: Assigns id labels to each datum.
- showlegend: Determines whether or not an item corresponding to this trace is shown in the legend.
- legendgroup: Sets the legend group for this trace.
- alignmentgroup: Sets the alignment group.
- offsetgroup: Sets the offset group.
- customdata: Assigns extra data to each datum.
- xaxis: Sets a reference to a named x axis.
- yaxis: Sets a reference to a named y axis.
- frame: Sets the frame.
- visible: Determines whether or not this trace is visible.
d_1 <- ISLR2::Auto

d_1 %>%
  summarise(displacement = mean(displacement), .by = cylinders) %>%
  mutate(cylinders = factor(cylinders)) %>%
  plot_ly(x = ~cylinders, y = ~displacement, type = 'bar')
Simple histograms are a straightforward extension within the plotly framework.
p1 <- plot_ly(diamonds, x = ~price) %>% add_histogram(name = "plotly.js")
The add_histogram() function sends all of the observed values to the browser and lets plotly.js perform the binning.
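When the data is large, it can be preferable to bin in R and send only the bin summaries to the browser. The following is a minimal sketch of that alternative, assuming base R's hist() with the "FD" (Freedman-Diaconis) rule as one possible binning choice; the object names are illustrative.

```r
library(plotly)
library(ggplot2)  # for the diamonds data

# Bin in R with base hist(); plot = FALSE returns the bins without drawing
h <- hist(diamonds$price, breaks = "FD", plot = FALSE)

# Send only bin midpoints and counts to the browser
p_binned <- plot_ly(x = h$mids, y = h$counts) %>%
  add_bars(name = "binned in R")

# Number of bars sent, compared with the number of raw observations
length(h$counts)
nrow(diamonds)
```

The trade-off is that the bin width is fixed in R, so viewers cannot benefit from plotly.js re-binning on the fly.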
p1 <- plot_ly(diamonds, x = ~cut) %>%
  add_histogram()

p2 <- diamonds %>%
  count(cut) %>%
  plot_ly(x = ~cut, y = ~n) %>%
  add_bars()

subplot(p1, p2)
diamonds %>%
  group_split(cut) %>%
  map(~ .x %>% plot_ly(x = ~price, type = "histogram", name = paste0("Cut = ", unique(.$cut)))) %>%
  subplot(nrows = 2, shareX = TRUE, shareY = TRUE, titleX = FALSE) %>%
  hide_legend()
The code above splits the dataset into a list of subsets, which allows us to map a histogram function over each subset. The subplot() function then combines the list of plots into one figure with shared x and y axes.
The easiest way to produce grouped bar plots is similar to ggplot2: group or color by a discrete variable.
plot_ly(diamonds, x = ~cut, color = ~clarity) %>% add_histogram()
The barmode attribute of layout() specifies the type of grouping. Several groupings are available in plotly, such as "stack", "group", "overlay", and "relative".
plot_ly(diamonds, x = ~cut, color = ~clarity) %>% add_histogram() %>% layout(barmode = "stack")
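As a sketch of how the grouping mode changes the display, the snippet below builds the same histogram once and applies three barmode values ("group", "stack", and "overlay", all standard plotly.js options). The object names are illustrative.

```r
library(plotly)
library(ggplot2)  # for the diamonds data

# One trace per clarity level, all sharing the same x variable
base <- plot_ly(diamonds, x = ~cut, color = ~clarity) %>%
  add_histogram()

grouped  <- layout(base, barmode = "group")    # bars side by side
stacked  <- layout(base, barmode = "stack")    # bars on top of each other
overlaid <- layout(base, barmode = "overlay")  # bars drawn over one another

# The chosen mode ends up in the layout of the built figure
plotly_build(stacked)$x$layout$barmode
```

Since layout() returns a modified copy, all three variants can be kept and compared, for example with subplot().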
Box plots and violin plots are used to visualize the distribution of data and compare the distribution of data between different groups or categories. Box plots display the median, quartiles, and outliers of a dataset, while violin plots provide a more detailed view of the data distribution by showing the probability density of the data at different values.
p1 <- ggplot(diamonds, aes(x = cut, y = price)) +
  geom_boxplot()

ggplotly(p1)
p1 <- plot_ly(diamonds, x = ~cut, y = ~price, type = "box")
Additionally, we can easily produce violin plots, boxplots, and other combinations as follows -
p1 <- diamonds %>%
  mutate(carat = cut_interval(carat, 5)) %>%
  ggplot(aes(x = carat, y = price)) +
  geom_boxplot()

ggplotly(p1)

p2 <- diamonds %>%
  mutate(carat = cut_interval(carat, 5)) %>%
  plot_ly(x = ~carat, y = ~price, type = 'box')

p3 <- diamonds %>%
  mutate(carat = cut_interval(carat, 5)) %>%
  plot_ly(x = ~carat, y = ~price, type = 'violin')

p4 <- plot_ly(diamonds, x = ~price, y = ~interaction(clarity, cut)) %>%
  add_boxplot(color = ~clarity) %>%
  layout(yaxis = list(title = ""))
2D frequency plots are used to visualize the joint distribution of two variables. They are particularly useful for identifying patterns, trends, and associations between the variables. In Plotly, 2D frequency plots can be created using add_histogram2d(), which bins the data in the browser, or add_heatmap(), which displays pre-computed frequencies for each combination of bins or categories in a grid format.
p <- plot_ly(diamonds, x = ~log(carat), y = ~log(price))

subplot(
  add_histogram2d(p) %>%
    colorbar(title = "default") %>%
    layout(xaxis = list(title = "default")),
  add_histogram2d(p, zsmooth = "best") %>%
    colorbar(title = "zsmooth") %>%
    layout(xaxis = list(title = "zsmooth")),
  add_histogram2d(p, nbinsx = 60, nbinsy = 60) %>%
    colorbar(title = "nbins") %>%
    layout(xaxis = list(title = "nbins")),
  shareY = TRUE, titleX = TRUE
)
The plotly package offers two functions for displaying rectangular bins: add_heatmap() and add_histogram2d(). For numeric data, add_heatmap() is a 2D analog of add_bars() and requires pre-computed bins, while add_histogram2d() is a 2D analog of add_histogram() and computes bins in the browser, making it more suitable for exploratory purposes. add_histogram2d() features zsmooth for increasing bin numbers through bi-linear interpolation, and nbinsx/nbinsy to set the number of bins in the x and y directions. Additionally, filled contours can be used instead of bins with add_histogram2dcontour().
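To make the "pre-computed bins" route concrete, here is a minimal sketch that bins two numeric variables in R with cut() and table() and passes the resulting count matrix to add_heatmap(). The number of bins (30) and the object names are arbitrary choices for illustration.

```r
library(plotly)
library(ggplot2)  # for the diamonds data

n_bins <- 30  # arbitrary choice for illustration

# Assign every observation to an equal-width bin on each axis
x_bin <- cut(log(diamonds$carat), breaks = n_bins)
y_bin <- cut(log(diamonds$price), breaks = n_bins)

# table() gives rows = x bins and columns = y bins;
# a heatmap's z matrix expects rows = y, so transpose
counts <- table(x_bin, y_bin)
z <- t(matrix(counts, nrow = n_bins, ncol = n_bins))

p_heat <- plot_ly(z = z) %>%
  add_heatmap()
```

Only the 30 by 30 matrix of counts is sent to the browser here, whereas add_histogram2d() would send every observation.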
Heatmaps and contour plots are used to visualize the distribution of data across two dimensions. Heatmaps display data as a grid of colored cells, where the color intensity represents the value of the data at each cell. Contour plots, on the other hand, display data as a series of contour lines, where each line represents a constant value of the data.
# Heatmaps
p1 <- plot_ly(diamonds, x = ~cut, y = ~color, z = ~price) %>%
  add_heatmap()

# Contours
p2 <- plot_ly(diamonds, x = ~cut, y = ~color, z = ~price) %>%
  add_histogram2dcontour()

g1 <- subplot(p1, p2, nrows = 2)
In Plotly, heatmaps and contour plots can be created using the add_heatmap() and add_histogram2dcontour() functions, respectively.
3D plots are used to visualize data in three dimensions. They are particularly useful for visualizing complex relationships between multiple variables. In Plotly, 3D plots can be created using the add_trace() function.
As it turns out, by simply adding a z attribute, plot_ly() automatically renders markers, lines, and paths in three dimensions.
p1 <- plot_ly(diamonds, x = ~carat, y = ~price, z = ~depth) %>%
  add_markers(size = 0.5)

p2 <- plot_ly(diamonds, x = ~carat, y = ~price, z = ~cut) %>%
  add_paths(color = ~cut)

p3 <- plot_ly(diamonds, x = ~carat, y = ~price, z = ~cut) %>%
  add_lines(color = ~cut)

x <- seq_len(nrow(volcano)) + 100
y <- seq_len(ncol(volcano)) + 500
p4 <- plot_ly() %>%
  add_surface(x = ~x, y = ~y, z = ~volcano)

subplot(p1, p2, p3, p4, nrows = 2)
Plotly offers various ways to create maps, broadly categorized into integrated and custom maps. Integrated maps leverage Plotly's built-in support via Mapbox or d3.js, suitable for quick visualizations without sophisticated geo-spatial representations. Custom maps offer full control over rendering geo-spatial objects, ideal for more complex visualizations.
Plotly supports two types of integrated maps:
- plot_mapbox() for creating maps with dynamic basemaps.
- plot_geo() for maps with different projections.
Argument | Description |
---|---|
layout.mapbox.style | Controls the styling of the Mapbox basemap. Examples: "streets", "satellite", "dark". |
layout.updatemenus | Creates a dropdown menu to control map styles interactively. |
projection | Used in plot_geo() to specify the type of map projection. Examples: "mercator", "orthographic". |
plot_mapbox(maps::canada.cities) %>%
  add_markers(
    x = ~long,
    y = ~lat,
    size = ~pop,
    color = ~country.etc,
    colors = "Accent",
    text = ~paste(name, pop),
    hoverinfo = "text"
  )
# air and flights are assumed to be data frames of airport locations and flight paths loaded beforehand
geo <- list(
  projection = list(type = 'orthographic', rotation = list(lon = -100, lat = 40, roll = 0)),
  showland = TRUE,
  landcolor = toRGB("gray95"),
  countrycolor = toRGB("gray80")
)

plot_geo(color = I("red")) %>%
  add_markers(data = air, x = ~long, y = ~lat, text = ~airport, size = ~cnt,
              hoverinfo = "text", alpha = 0.5) %>%
  add_segments(data = group_by(flights, id), x = ~start_lon, xend = ~end_lon,
               y = ~start_lat, yend = ~end_lat, alpha = 0.3, size = I(1), hoverinfo = "none") %>%
  layout(geo = geo, showlegend = FALSE)
density <- state.x77[, "Population"] / state.x77[, "Area"]
g <- list(scope = 'usa', projection = list(type = 'albers usa'), lakecolor = toRGB('white'))
plot_geo() %>%
add_trace(z = ~density, text = state.name, locations = state.abb, locationmode = 'USA-states') %>%
layout(geo = g)
The sf R package allows for creating custom maps with more control over geo-spatial data. Here's how to create a simple map with sf:
library(rnaturalearth)
world <- ne_countries(returnclass = "sf")
plot_ly(world, color = I("gray90"), stroke = I("black"), span = I(1))
library(cartogram)
library(albersusa)  # provides usa_sf()

us_cont <- cartogram_cont(usa_sf("laea"), "pop_2014")

plot_ly(us_cont) %>%
  add_sf(color = ~pop_2014, split = ~name,
         text = ~paste(name, scales::number_si(pop_2014)),
         hoverinfo = "text", hoveron = "fills") %>%
  layout(showlegend = FALSE) %>%
  colorbar(title = "Population \n 2014")
us <- albersusa::usa_sf("laea")  # assumed definition of `us`, which is not shown above
us_dor <- cartogram_dorling(us, "pop_2014")
plot_ly(stroke = I("black"), span = I(1)) %>%
add_sf(data = us, color = I("gray95"), hoverinfo = "none") %>%
add_sf(data = us_dor, color = ~pop_2014, split = ~name, text = ~paste(name, scales::number_si(pop_2014)), hoverinfo = "text", hoveron = "fills") %>%
layout(showlegend = FALSE)
You can save any widget created using htmlwidgets packages (e.g., plotly, leaflet, DT) as a standalone HTML file using the htmlwidgets::saveWidget() function. By default, this function generates a completely self-contained HTML file, which includes all necessary JavaScript and CSS dependencies. This makes it convenient to share the widget as a single HTML file. To optimize the file size, consider using the partial_bundle() function. This function automatically creates a reduced version of the necessary dependencies, significantly reducing the overall file size, especially when using basic chart types.
p1 <- plot_ly(diamonds, x = ~carat, y = ~price, z = ~depth) %>%
  add_markers(size = 0.5)

htmlwidgets::saveWidget(p1, "plot1.html")
To include the HTML file in RMarkdown, you can use the following code template:
```{r}
htmltools::tags$iframe(
  src = "plot1.html",
  scrolling = "no",
  seamless = "seamless",
  frameBorder = "0"
)
```
Alternatively, you can use the includeHTML() function to embed the HTML file directly into the RMarkdown document.
The subplot() function in plotly offers a versatile way to combine multiple plotly objects into a single object, and it is more flexible than trellis display frameworks like ggplot2's facet_wrap(). Unlike these frameworks, subplot() does not require conditioning on a common variable. Its functionality is comparable to the grid.arrange() function from the gridExtra package, which arranges multiple ggplot2 or lattice plots in a single view.
The basic use of subplot() involves directly supplying plotly objects. For handling many plots, passing a list of plots can reduce redundancy. For example, you can create one time series per variable in a dataset and synchronize zoom/pan events across them.
Conceptually, subplot() arranges plots into a table with a specified number of rows and columns via the nrows argument. By default, rows and columns share equal proportions of height and width, but these can be adjusted using the heights and widths arguments. This flexibility is useful for various visualizations, such as joint density plots or interactive dendrograms created with the heatmaply package.
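The list-of-plots pattern can be sketched as follows, assuming ggplot2's economics data: one line chart per variable, stacked in a single column with a shared x axis so that zooming one panel zooms them all. The heights values are arbitrary and only illustrate giving the first row more space.

```r
library(plotly)
library(ggplot2)  # for the economics data

# Every column except the date becomes its own time series
vars <- setdiff(names(economics), "date")

plots <- lapply(vars, function(var) {
  plot_ly(economics, x = ~date, y = as.formula(paste0("~", var))) %>%
    add_lines(name = var)
})

# One row per variable; heights has one entry per row
fig <- subplot(
  plots,
  nrows = length(plots),
  shareX = TRUE,
  titleX = FALSE,
  heights = c(0.5, 0.125, 0.125, 0.125, 0.125)
)
```

Omitting heights falls back to equal row proportions, which is usually the sensible default when all panels are of the same kind.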
The above plot demonstrates a visual diagram of controlling the heights of rows and widths of columns. In this particular example, there are five plots being placed in two rows and three columns.
Above, we see a creative use of the subplot() function to place marginal distributions alongside a contour plot.
Subplots aren't the only way of linking multiple views. We can use the ggplot2 framework to do the same:
gg1 <- ggplot(economics_long, aes(date, value)) +
  geom_line() +
  facet_wrap(~variable, scales = "free_y", ncol = 1)

gg2 <- ggplot(economics_long, aes(factor(1), value)) +
  geom_violin() +
  facet_wrap(~variable, scales = "free_y", ncol = 1) +
  theme(axis.text = element_blank(), axis.ticks = element_blank())

subplot(gg1, gg2)
Plotly supports key frame animations through the frame argument or aesthetic in both plot_ly() and ggplotly(). Additionally, the ids argument ensures smooth transitions between objects with the same ID, facilitating object constancy.
The famous Gapminder animation demonstrates the relationship between GDP per capita and life expectancy over time.
data(gapminder, package = "gapminder")
gg <- ggplot(gapminder, aes(gdpPercap, lifeExp, color = continent)) +
geom_point(aes(size = pop, frame = year, ids = country)) +
scale_x_log10()
ggplotly(gg)
Use animation_button(), animation_slider(), and animation_opts() to customize the animation controls. The following example modifies the time between frames, transition easing, and the placement of animation controls.
base <- gapminder %>%
plot_ly(x = ~gdpPercap, y = ~lifeExp, size = ~pop, text = ~country, hoverinfo = "text") %>%
layout(xaxis = list(type = "log"))
base %>%
add_markers(color = ~continent, frame = ~year, ids = ~country) %>%
animation_opts(1000, easing = "elastic", redraw = FALSE) %>%
animation_button(x = 1, xanchor = "right", y = 0, yanchor = "bottom") %>%
animation_slider(currentvalue = list(prefix = "YEAR ", font = list(color = "red")))
Frames are ordered numerically or alphabetically by default. Using factors provides more control over frame ordering. In the example below, continents are ordered by average life expectancy.
meanLife <- with(gapminder, tapply(lifeExp, INDEX = continent, mean))
gapminder$continent <- factor(gapminder$continent, levels = names(sort(meanLife)))
base %>%
add_markers(data = gapminder, frame = ~continent) %>%
hide_legend() %>%
animation_opts(frame = 1000, transition = 0, redraw = FALSE)
You can overlay animated frames on top of a static background for better context.
base %>%
add_markers(color = ~continent, showlegend = FALSE, alpha = 0.2, alpha_stroke = 0.2) %>%
add_markers(color = ~continent, frame = ~year, ids = ~country) %>%
animation_opts(1000, redraw = FALSE)
Currently, the scatter plotly.js trace type has full support for animation. For other chart types, creative solutions are necessary. For example, to animate a population pyramid (a bar chart), use add_segments() instead of add_bars().
Animating U.S. population projections by age and gender from 2018 to 2050.
library(idbr)
us <- bind_rows(
idb1(country = "US", year = 2018:2050, variables = c("AGE", "NAME", "POP"), sex = "male"),
idb1(country = "US", year = 2018:2050, variables = c("AGE", "NAME", "POP"), sex = "female")
)
us <- us %>%
mutate(POP = if_else(SEX == 1, POP, -POP), SEX = if_else(SEX == 1, "Male", "Female"))
plot_ly(us, size = I(5), alpha = 0.5) %>%
add_segments(x = ~POP, xend = 0, y = ~AGE, yend = ~AGE, frame = ~time, color = ~factor(SEX))
Visualizing the same data using lines instead of segments.
plot_ly(us, alpha = 0.5) %>%
add_lines(x = ~AGE, y = ~abs(POP), frame = ~time, color = ~factor(SEX), line = list(simplify = FALSE)) %>%
layout(yaxis = list(title = "US population"))
Plotly's animation API provides robust tools for creating dynamic, interactive visualizations. The subplot() function allows for flexible layout arrangements, and animation controls can be customized to enhance the user experience.
Function | Description |
---|---|
animation_opts() | Customize animation settings like frame duration and easing |
animation_button() | Customize the play/pause button |
animation_slider() | Customize the slider control |
Linking views is a powerful technique for exploring complex data relationships. Plotly provides several methods for linking views, including brushing, highlighting, and filtering. These techniques allow users to interact with one view and see the corresponding changes in other views, providing a more comprehensive understanding of the data.
Graphical brushing and highlighting are interactive techniques that allow users to select data points in one view and see the corresponding changes in other views. This technique is particularly useful for exploring relationships between variables and identifying patterns in the data.
d_1 <- ISLR2::Auto
# Custom color palette
custom_colors <- c("#FF5733", "#33FF57", "#3357FF", "#F7FF33", "#FF33A6", "#33FFF7", "#F733FF")
p <- d_1 %>%
  mutate(year = factor(year)) %>%
  highlight_key(~year) %>%
  plot_ly(
    x = ~mpg, y = ~acceleration, color = ~year,
    mode = "markers+text", textposition = "top", colors = custom_colors
  ) %>%
  highlight(on = "plotly_hover", off = "plotly_doubleclick")
In the plot above, the year column in the dataset is converted to a factor, and highlight_key() is used to enable interactive highlighting based on the year values. The plot_ly() function creates a scatter plot with mpg on the x-axis and acceleration on the y-axis, with data points colored according to the custom palette. Both markers and text are displayed, with text positioned at the top of each marker. The highlight() function adds interactivity, allowing users to highlight points by hovering and reset the highlights by double-clicking.
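Highlighting becomes more useful once two views share the same key. The following is a minimal sketch, not the code behind the plot above: it uses the built-in mtcars data instead of Auto, and the names shared, left, right, and linked are illustrative. A box or lasso selection in one scatter plot should highlight the same cars in the other.

```r
library(plotly)

# Wrap the data once; both plots are built from this shared object,
# so each row keeps the same key in every view
shared <- highlight_key(mtcars)

left <- plot_ly(shared, x = ~wt, y = ~mpg) %>%
  add_markers(color = I("black"))
right <- plot_ly(shared, x = ~hp, y = ~mpg) %>%
  add_markers(color = I("black"))

# Combine the two views and brush with box/lasso selection
linked <- subplot(left, right, shareY = TRUE, titleX = TRUE) %>%
  layout(showlegend = FALSE, dragmode = "select") %>%
  highlight(on = "plotly_selected", off = "plotly_deselect")
```

Printing linked renders the widget. Because both panels draw from the same shared object, no extra wiring between them is needed.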
Adding a selector to the plot allows users to choose which group (here, a city) to highlight. This is particularly useful when a plot contains many overlapping series.
data(txhousing, package = "ggplot2")
# declare `city` as the SQL 'query by' column
tx <- highlight_key(txhousing, ~city, "Select a City")
# initiate a plotly object
base <- plot_ly(tx, color = I("black")) %>%
  group_by(city)
# create a time series of median house price
ts <- base %>%
  group_by(city) %>%
  add_lines(x = ~date, y = ~median, width = 0.5)
highlight(
  ts, on = "plotly_click", selectize = TRUE, dynamic = TRUE, persistent = TRUE,
  color = 'red', selected = attrs_selected(line = list(width = 3))
)
The plot above demonstrates how to add a selector, allowing users to choose which city to highlight. The plotly object is initiated as the base plot, and a time series of median house prices is drawn for each city. The highlight() function adds interactivity, enabling users to select a city by clicking on the plot or by picking it from the selector. The selected city is highlighted in red, and its line width is increased to 3.
Sub-plots are a powerful way to visualize multiple views of the same data. Plotly provides several methods for linking sub-plots, including shared axes, linked brushing, and synchronized zooming. These techniques allow users to interact with one sub-plot and see the corresponding changes in other sub-plots, providing a more comprehensive understanding of the data.
dot_plot <- base %>%
  summarise(miss = sum(is.na(median))) %>%
  filter(miss > 0) %>%
  add_markers(
    x = ~miss, y = ~forcats::fct_reorder(city, miss),
    hoverinfo = "x+y"
  ) %>%
  layout(
    xaxis = list(title = "Number of months missing"),
    yaxis = list(title = "")
  )
p <- subplot(dot_plot, ts, widths = c(.2, .8), titleX = TRUE) %>%
  layout(showlegend = FALSE) %>%
  highlight(on = "plotly_selected", dynamic = TRUE, selectize = TRUE)
The above plot demonstrates how to link sub-plots using shared axes, linked brushing, and synchronized zooming. The dot_plot is created to visualize the number of missing values in the median house price data for each city. The ts plot displays the time series of median house prices for each city. The subplot function is used to combine the dot_plot and ts plots, with the widths argument specifying the relative widths of the sub-plots. The highlight function adds interactivity, allowing users to select data points in the dot_plot and see the corresponding changes in the ts plot.
Function | Description |
---|---|
highlight_key() | Enable interactive highlighting based on a key variable |
highlight() | Add interactivity to a plot, allowing users to highlight data points |
subplot() | Combine multiple plotly objects into a single sub-plot |
Cross-talk is a powerful technique for filtering data across multiple views. It allows users to interact with one view and see the corresponding changes in other views, providing a more comprehensive understanding of the data. Plotly provides several methods for filtering data with cross-talk, including linked brushing, dynamic filtering, and persistent selection.
# Filtering
library(crosstalk)
# generally speaking, use a "unique" key for filter,
# especially when you have multiple filters!
d_1
tx <- highlight_key(d_1, ~name, "Select a Car")
gg <- ggplot(tx) +
  geom_point(aes(weight, horsepower, group = name), color = 'red') +
  theme_minimal()
p <- bscols(
  filter_select("id", "Select a car", tx, ~name),
  ggplotly(gg, dynamicTicks = TRUE),
  widths = c(12, 12)
)
library(crosstalk)
tx <- highlight_key(d_1)
widgets <- bscols(
  widths = c(12, 12, 12),
  filter_select("cylinders", "Cylinders", tx, ~cylinders),
  filter_slider("horsepower", "Horsepower", tx, ~horsepower),
  filter_checkbox("year", "Years", tx, ~year, inline = TRUE)
)
p <- bscols(
  widths = c(4, 8),
  widgets,
  plot_ly(tx, x = ~mpg, y = ~acceleration, showlegend = FALSE) %>%
    add_lines(color = I('red'), colors = "black")
)
While interactivity using plotly is a quick and dirty way of getting things done, interactive dashboards built with Shiny or JavaScript offer dynamic and user-friendly ways to visualize and interact with data. Shiny, an R package, allows developers to build interactive web applications directly from R. It provides a seamless way to combine the power of R's data analysis capabilities with the interactivity of web technologies. With Shiny, users can create dashboards that respond to user inputs, such as sliders, dropdowns, and buttons, to dynamically update visualizations and analyses. JavaScript, on the other hand, is a versatile language for building interactive dashboards, often using libraries like D3.js, Plotly.js, or React. These libraries offer extensive customization options and the ability to handle complex interactions and animations, making it possible to create highly interactive and visually appealing dashboards. Both Shiny and JavaScript enable real-time data updates, interactive filtering, and responsive design, enhancing the user's ability to explore and understand data in an engaging and intuitive manner.
Feature | Shiny | Plotly |
---|---|---|
Language Integration | Seamless integration with R, enabling direct use of R's data analysis and statistical functions. | Can be used with multiple languages including Python, R, and JavaScript, providing flexibility in development. |
Customization | Extensive customization of user interfaces with various input and output widgets, allowing for complex interactions and layouts. | Highly customizable visualizations with extensive styling options and the ability to create complex, interactive charts. |
Ease of Use | User-friendly for those familiar with R, with a simple syntax for creating interactive elements and linking them to data. | Easy to create interactive visualizations with minimal code, especially for users familiar with Plotly’s straightforward syntax. |
Real-time Interactivity | Supports real-time data updates and dynamic interactions through reactive programming, making it ideal for live dashboards. | Provides real-time updates and interactivity, especially useful in web-based visualizations and data exploration tasks. |
Complexity Handling | Capable of handling complex server-side computations and large data processing tasks through its reactive framework. | Efficiently handles complex visualizations and large datasets with smooth rendering and interaction capabilities. |
Deployment | Easy deployment options through Shiny Server, Shinyapps.io, or Docker, enabling quick sharing and scaling of applications. | Flexible deployment on various platforms including web browsers, Jupyter notebooks, and standalone HTML files. |
Community and Support | Strong community support with extensive documentation, tutorials, and a large number of contributed packages. | Wide community support with thorough documentation, examples, and active development in multiple programming languages. |
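
To make the comparison concrete, here is a minimal sketch of how the two can be combined: a Shiny app whose slider input reactively filters the data behind a plotly chart. It assumes the shiny and plotly packages are installed and uses the built-in mtcars data. The input ID max_hp, the output ID scatter, and the object name app are illustrative choices, not part of either package.

```r
library(shiny)
library(plotly)

# UI: one slider and one plotly output
ui <- fluidPage(
  sliderInput(
    "max_hp", "Maximum horsepower",
    min = min(mtcars$hp), max = max(mtcars$hp), value = max(mtcars$hp)
  ),
  plotlyOutput("scatter")
)

# Server: the chart is re-rendered whenever the slider value changes
server <- function(input, output, session) {
  output$scatter <- renderPlotly({
    shown <- mtcars[mtcars$hp <= input$max_hp, ]
    plot_ly(shown, x = ~wt, y = ~mpg, type = "scatter", mode = "markers")
  })
}

# Build the app object; nothing is launched until it is printed or run
app <- shinyApp(ui = ui, server = server)
```

In an interactive session, shiny::runApp(app) launches the app. Unlike the crosstalk filters shown earlier, the filtering here runs in R on the server, so arbitrary R code can sit between the input and the chart.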
I am a second-year PhD student in the Quantitative Methods Department at York University. Although my research primarily revolves around adapting machine learning methodologies to Psychology, I take immense pleasure in improving statistical literacy for everyone involved.
Feel free to reach out to me at arjun10@yorku.ca.
Here are some resources that I found particularly useful while creating this document -