Arjun Ghumman

Interactive Data Visualization

Introduction to IDV (Interactive Data Visualization)

Advantage | Description
Enhanced Exploration | Allows direct interaction, zooming, filtering, and dynamic parameter changes.
Deeper Understanding | Provides context, tooltips, and additional information on demand.
Iterative Analysis | Supports quick iteration through different views, accelerating the EDA process.
Communication & Collaboration | Engages stakeholders, facilitates clearer explanations, and fosters better discussions.
Complex Data Representation | Helps in visualizing multidimensional data in a more understandable manner.
User-Centric Exploration | Puts users in control, allowing personalized insights and exploration.

While R and packages such as the tidyverse provide powerful tools for data manipulation, visualization, and modeling, interactive data visualization adds another layer of exploration and understanding. I've highlighted several of its advantages for exploratory data analysis (EDA) and data science workflows in the table above.

Historically, when researchers wanted to convey their findings, they often turned to different tools or programming languages, such as JavaScript, to create interactive web visuals showcasing their key discoveries. However, this shift demands a significant change in context and an entirely different skill set, which often hinders efficiency. Not only does one have to learn a completely different programming language, but the learning curve of many interactive visualization libraries is a challenge in itself. Modern interactive tools such as Tableau do offer interactivity and other useful features, but they often do not fit well into a coding workflow: such GUI-based systems tend to be rather "closed off", meaning they cannot easily be customized, extended, or integrated with other systems.

Grammar of Graphics and ggplot2

I have previously covered the grammar of graphics and ggplot2 in detail, and several resources offer a gentle introduction to the ggplot2 framework. Throughout this document, I implement interactivity within the ggplot2 framework where possible, alongside plotly's own plot_ly() interface.

Simulating Height and Weight Data

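            # Simulate Height and Weight Data with R
            library(dplyr)  # provides %>%, the starwars data, and verbs used later

            # Descriptive statistics for the starwars data (used in later examples)
            misty::descript(starwars)

            set.seed(123)  # make the simulation reproducible

            # Simulate height for 100 people
            height <- runif(100, 120, 220)

            # Simulate weight as a non-linear function of height plus noise
            weight <- 50 + 0.5 * height + 0.7 * (height^1.5) + rnorm(100, sd = 300)

            # Rescale weight to a mean of 150 and a standard deviation of 40
            final_weight <- (weight - mean(weight)) * (40 / sd(weight)) + 150

            # Combine into the data frame used throughout this document
            dat <- data.frame(height, final_weight)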
        

Highlighting Observations

Highlighting observations is a simple way to add interactivity to a plot. In the example below, I've highlighted a certain segment of observations in the dataset.

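            # Highlighting Data
            library(ggplot2)
            library(plotly)

            # Wrap the data so plotly can track which observations are selected
            key <- highlight_key(dat)

            p1 <- ggplot(key, aes(height, final_weight)) +
              geom_point()

            # Highlight points chosen with the box or lasso selection tools
            inter_p1 <- highlight(ggplotly(p1), on = "plotly_selected")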
        

Above is a simple example of how a lasso or box selection tool can be used in an interactive framework to highlight observations of interest; the result can then be saved as an image from the plot's toolbar. Even this simple form of interactivity adds a new dimension to visualization.
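The same highlight_key() mechanism can also link several views of one dataset, so that a selection in one plot highlights the same observations in another. The sketch below is illustrative only: the small data frame `df` and its `id` key column are hypothetical, and it assumes the standard "plotly_selected" and "plotly_deselect" selection events.

```r
library(plotly)

# Hypothetical example data: 50 simulated people with an id column used as the key
set.seed(42)
df <- data.frame(id = 1:50, height = runif(50, 150, 200))
df$weight <- 0.5 * df$height + rnorm(50, sd = 5)

# Wrap the data in a shared (crosstalk) object keyed by id
key <- highlight_key(df, ~id)

# Two views built from the same shared data
p_scatter <- plot_ly(key, x = ~height, y = ~weight) %>% add_markers(name = "people")
p_hist <- plot_ly(key, x = ~height) %>% add_histogram(name = "height")

# Brushing points in one view highlights the same ids in the other
linked <- subplot(p_scatter, p_hist) %>%
  highlight(on = "plotly_selected", off = "plotly_deselect")
```

Because both views share one key, no extra code is needed to keep them in sync; the linking happens in the browser.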

When a key insight surfaces, the HTML figures can easily be shared via email or embedded in reports and websites. Because they are built with htmlwidgets, these interactive visuals work in R Markdown, Shiny apps, RStudio, Jupyter, and similar environments. Sharing fosters discussion, allowing colleagues to offer fresh perspectives and to draw insights directly from the graphics.

Annotating Points

Another way to draw attention to observations of interest is to annotate them. In the example below, I use the ggforce package to draw a labelled hull around the tall observations (height above 186).

            library(dplyr)
            library(ggplot2)
            library(ggforce)

            dat %>% mutate(tall = ifelse(height > 186, 'tall', 'x')) %>% 
              ggplot(aes(height, final_weight)) + 
              geom_point(color = 'black', size = 3, shape = 21, fill = rgb(0, 0, 0, alpha = 0.5),
                         stroke = 1.1, aes(alpha = tall), show.legend = FALSE) +
              # Draw a labelled hull around the tall observations only
              geom_mark_hull(aes(filter = tall == 'tall', label = tall)) +
              labs(title = "Weights for Tall People",
                   x = "Height",
                   y = "Weight") +
              papaja::theme_apa() +
              # Tall points fully opaque, all others faded
              scale_alpha_manual(values = c(tall = 1, x = 0.2))

Plotly

Plotly is a data visualization library that offers an array of interactive plots, such as scatter plots, line charts, and 3D visuals. It stands out for its interactivity: users can explore data within the plots themselves by zooming, hovering for details, and toggling the visibility of data series. Built on the plotly.js JavaScript library, with interfaces for R and Python, compatibility with Jupyter, Dash, and Shiny, and integration with pandas and NumPy, Plotly is a versatile tool for creating dynamic and engaging visualizations.

  1. Create a Plotly Visualization: Generate an interactive plot using Plotly in Python or R.
  2. Export the Plot as HTML: Save the interactive Plotly visualization as an HTML file, using plotly.offline.plot() (or a figure's write_html() method) in Python, or htmlwidgets::saveWidget() in R.
  3. Include HTML File in Website: Embed the saved HTML file into your website using an HTML iframe tag or directly insert the HTML code to display the Plotly visualization.
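The three steps above can be sketched in R as follows. This is a minimal illustration under assumptions: the file name `my_plot.html` and the iframe dimensions are placeholders, and the file is written to a temporary directory rather than to a real website.

```r
library(plotly)
library(htmlwidgets)

# Step 1: create a plotly visualization (any plot_ly() or ggplotly() object works)
p <- plot_ly(x = c("a", "b", "c"), y = c(3, 1, 2), type = "bar")

# Step 2: export it as an HTML file. selfcontained = FALSE writes the JavaScript
# dependencies to a side folder instead of inlining them; the default
# (selfcontained = TRUE) produces a single shareable file.
out_file <- file.path(tempdir(), "my_plot.html")
saveWidget(p, out_file, selfcontained = FALSE)

# Step 3: build an iframe tag that a web page can use to embed the saved file
# (the src path is relative to wherever my_plot.html is uploaded)
iframe_tag <- sprintf(
  '<iframe src="%s" width="100%%" height="450" frameborder="0"></iframe>',
  basename(out_file)
)
cat(iframe_tag, "\n")
```

If the page is built with R Markdown or Quarto instead, the widget can usually be included directly and the iframe step is unnecessary.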

There are two main ways to create a Plotly object in R: directly with plot_ly(), or by converting a ggplot2 object with ggplotly().

Both approaches have somewhat complementary strengths and weaknesses, so it can be beneficial to learn both methods.

Plotly (plot_ly)

The plotly package for R creates plots using the underlying JavaScript library, plotly.js. The plot_ly() function maps directly onto plotly.js while adding simplifications inspired by the grammar of graphics and ggplot2. These simplifications make it much faster to switch between different visuals, and thus easier to uncover valuable insights in the data.

            library(plotly)   # also re-exports the %>% pipe
            library(dplyr)
            library(ggplot2)  # for the diamonds data

            # Bar Graphs: counts of each cut, progressively styled
            # (stroke = outline colour, span = outline width)
            plotly_1 <- diamonds %>% plot_ly() %>% add_histogram(x = ~cut)
            plotly_2 <- diamonds %>% plot_ly() %>% add_histogram(x = ~cut, color = I(rgb(0,0,0,1)))
            plotly_3 <- diamonds %>% plot_ly() %>% add_histogram(x = ~cut, color = I(rgb(0,0,0,1)), stroke = I(rgb(0,0,0,0.2)), span = I(15))
            plotly_4 <- diamonds %>% plot_ly() %>% add_histogram(x = ~cut, color = I(rgb(0,0,0,1)), stroke = I(rgb(0,0,0,0.2)), span = I(15)) %>% layout(title = 'My Graph')

            # Bar vs Histogram in Plotly: raw values vs pre-computed counts
            hist <- diamonds %>% plot_ly() %>% add_histogram(x = ~cut, stroke = I('black'), span = I(2), bingroup = I(5))
            bars <- diamonds %>% group_by(cut) %>% count() %>% plot_ly() %>% add_bars(x = ~cut, y = ~n, stroke = I('black'), span = I(5))

            # Inspect the JSON sent to plotly.js and the fully built object
            x <- plotly_json(plotly_1)
            build <- plotly_build(plotly_1)
        

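The above code demonstrates creating bar graphs with plot_ly() using various configurations, such as different fill colors, outline (stroke) colors and widths (span), and layout adjustments. It also contrasts add_histogram(), which counts the raw values, with add_bars(), which plots pre-computed counts, and finally inspects the plot with plotly_json() (the JSON specification sent to plotly.js) and plotly_build() (the fully built plotly object).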

Plotly adopts a functional approach akin to the layered grammar of graphics. Here, most functions expect a plotly object as input and yield a modified version of it. These modifications solely depend on the function's input values, unlike base R graphics that often involve side effects. For instance, the layout() function alters layout components like the title within a plotly object.

Complex plot modifications in plotly can be challenging to navigate. The %>% operator from magrittr offers a left-to-right reading sequence, placing the object on the left into the first argument of the function on the right. This approach simplifies the understanding of layered modifications (Wickham, 2014).
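To make the functional style and the pipe concrete, here is a small sketch built on a hypothetical table of counts. It shows that nested calls and piped calls describe the same plot, and that layout() returns a modified copy instead of changing its input.

```r
library(plotly)

# Hypothetical counts per diamond cut
counts <- data.frame(cut = c("Fair", "Good", "Ideal"), n = c(10, 25, 40))

# Nested calls: read inside-out
p_nested <- layout(add_bars(plot_ly(counts, x = ~cut, y = ~n)), title = "Counts by cut")

# Piped calls: the same operations, read left to right
p_piped <- counts %>%
  plot_ly(x = ~cut, y = ~n) %>%
  add_bars() %>%
  layout(title = "Counts by cut")

# Functional style: layout() returns a new object; p_base is left untouched
p_base <- plot_ly(counts, x = ~cut, y = ~n)
p_titled <- layout(p_base, title = "Counts by cut")

built_nested <- plotly_build(p_nested)$x$data
built_piped <- plotly_build(p_piped)$x$data
```

The pipe does not change what is computed; it only changes the reading order, which matters more as the number of layers grows.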

Putting things together

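            library(plotly)
            library(dplyr)
            library(ggplot2)  # for the diamonds data

            diamonds %>% group_by(cut) %>% count() %>% 
              plot_ly(x = ~cut, y = ~n) %>%
              # Bar layer: semi-transparent red bars with black outlines
              add_bars(stroke = I('black'), span = I(5), color = I(rgb(1, 0, 0, 0.5))) %>%
              # Text layer: formatted counts placed above each bar
              add_text(text = ~scales::comma(n), y = ~n, textposition = "top middle", cliponaxis = FALSE)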
        

The above code shows how to put things together in plotly: a single graph receives multiple layers, a text layer in addition to the bars, combined with magrittr piping. Because each layer becomes its own trace, viewers can switch individual layers on or off by clicking their entries in the legend.

Plotly Layers

Plotly layers are the building blocks of a plotly object.

  1. In addition to layout(), there are add_*() functions (e.g., add_histogram(), add_lines(), etc.) defining how data is rendered into geometric objects, following the layered grammar of graphics.
  2. A layer, in this context, comprises five components: data, aesthetic mappings (e.g., color), geometric representation (e.g., rectangles), statistical transformations (e.g., sum), and positional adjustments (e.g., dodge).
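  3. plot_ly() adds a layer automatically when no add_*() function is supplied (e.g., plot_ly(diamonds, x = ~cut) infers a histogram trace and prints a message saying so); adding layers explicitly, as in plot_ly(diamonds, x = ~cut) %>% add_histogram(), makes the plot's elements clear.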
  4. plotly offers add_*() functions like add_histogram() and add_bars(), where add_histogram() computes statistics dynamically, while add_bars() needs pre-specified bar heights.
  5. There are several other add_*() functions, with some performing statistical calculations in the browser (e.g., add_histogram2d(), add_contour(), add_boxplot()), while others focus more on graphics than statistics.
  6. Non-statistical layers generally offer faster runtime due to reduced computational load, while statistical layers provide more client-side interactivity options.
  7. When a plot renders a large number of graphical elements, WebGL rendering on a Canvas via toWebGL() may perform better than the default SVG rendering.
  8. Combining multiple graphical layers into one plot is common, and understanding plot_ly() becomes crucial for this purpose.
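Point 7 can be sketched with toWebGL(). The example below is an assumption-laden illustration: the 20,000 random points are hypothetical, and the check relies on plotly_build() reporting the WebGL variant of the trace type ("scattergl") once toWebGL() has been applied.

```r
library(plotly)

# Hypothetical large dataset: 20,000 random points
set.seed(1)
big <- data.frame(x = rnorm(2e4), y = rnorm(2e4))

# Default rendering: an SVG-based "scatter" trace
p_svg <- plot_ly(big, x = ~x, y = ~y) %>% add_markers(alpha = 0.3)

# Same plot, flagged to render with WebGL when it is built
p_gl <- toWebGL(p_svg)

trace_types <- c(
  svg = plotly_build(p_svg)$x$data[[1]]$type,
  webgl = plotly_build(p_gl)$x$data[[1]]$type
)
print(trace_types)
```

The plot definition itself is unchanged; only the rendering back end differs, so toWebGL() can be added as a last step once a plot becomes sluggish.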

Plotly layout is a collection of attributes that define the plot's appearance, such as the title, axis labels, and background color. The layout() function modifies the layout components of a plotly object. The layout() function is a wrapper for the layout attributes, which are defined in the plotly.js schema.
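A minimal sketch of layout() setting the attributes mentioned above (title, axis labels, and background colors). The data and color values are arbitrary choices for illustration; plot_bgcolor and paper_bgcolor are plotly.js layout attributes.

```r
library(plotly)

p <- plot_ly(x = 1:5, y = c(2, 5, 3, 6, 4), type = "scatter", mode = "lines+markers") %>%
  layout(
    title = "A customised layout",
    xaxis = list(title = "Index"),
    yaxis = list(title = "Value"),
    plot_bgcolor = "#f2f2f2",   # colour of the plotting area
    paper_bgcolor = "white"     # colour of the surrounding canvas
  )

# The built object shows the layout attributes passed on to plotly.js
built_layout <- plotly_build(p)$x$layout
```

Nested attributes such as axis titles are passed as lists, mirroring the nested structure of the plotly.js schema.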

GGPlotly

The ggplotly() function in plotly translates ggplot2 visuals into interactive plotly ones, simplifying the addition of interactivity to ggplot2 workflows. This function leverages ggplot2's intuitive interface for effortless exploration of statistical summaries across different groups using various geoms.

ggplotly() handles facets and relative-frequency displays, and it supports ggplot2 extensions such as ggforce, naniar, and GGally. Because it builds on ggplot2, geoms and statistics such as geom_sina() (from ggforce) and stat_summary() remain available for in-depth analysis and model diagnostics, while the converted plot gains interactive features such as hover, zoom, and filtering, and can be linked with other views. The conversion is not always perfect, but modifying the object that ggplotly() returns, or customizing its tooltips, can usually improve the result.

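            library(ggplot2)
            library(plotly)

            # General pattern: build a ggplot object, then convert it
            # plot <- ggplot(data, aes(...)) + geom_*()
            # ggplotly(plot)

            # Example
            plot <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) + geom_point()
            ggplotly(plot)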
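Tooltips can be customized through ggplotly()'s `tooltip` argument. In this sketch, the 500-row sample and the tooltip wording are illustrative choices; the `text` aesthetic is ignored by ggplot2 itself but picked up by ggplotly() when `tooltip = "text"`.

```r
library(ggplot2)
library(plotly)

# A smaller random sample keeps the widget light
set.seed(123)
small <- diamonds[sample(nrow(diamonds), 500), ]

g <- ggplot(small, aes(carat, price, color = cut,
                       text = paste0("Clarity: ", clarity, "<br>Price: $", price))) +
  geom_point()

# Show only the custom text in the tooltip
p_text <- ggplotly(g, tooltip = "text")

# Alternatively, restrict the default tooltip to selected aesthetics
p_xy <- ggplotly(g, tooltip = c("x", "y"))

hover_text <- unlist(lapply(plotly_build(p_text)$x$data, function(tr) tr$text))
```

The `<br>` tag inside the text is interpreted by plotly.js as a line break in the hover label.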
        

Traces

In Plotly, a "trace" refers to a visual representation of data on a plot. Each trace represents a specific set of data points and contains information about how that data should be displayed. Traces can include scatter plots, lines, bars, histograms, heatmaps, and more.

Key Point | Description
Type of Data | Traces encapsulate specific types of data visualizations, such as scatter points, lines, bars, or other chart elements.
Attributes | Each trace has its own set of attributes defining appearance and behavior, including settings for markers, lines, colors, labels, etc.
Multiple Traces | A single plot can hold multiple traces, facilitating visualization and comparison of different datasets or different aspects of the same dataset.
Plot Composition | Traces are combined to create a complete plot. Each trace is added to the plot and configured individually, collectively representing the entire dataset or multiple datasets.
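
            library(plotly)

            # One trace drawing both markers and lines
            p_one <- plot_ly() %>% 
              add_trace(
                type = "scatter",
                mode = "markers+lines",
                x = 4:6,
                y = 4:6
              )

            # Two separate traces: one for markers, one for lines
            p_two <- plot_ly() %>% 
              add_trace(
                type = "scatter",
                mode = "markers",
                x = 4:6,
                y = 4:6
              ) %>% 
              add_trace(
                type = "scatter",
                mode = "lines",
                x = 4:6,
                y = 4:6
              )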
        

The first block of code adds a single trace to the plot: a scatter trace in which both markers and lines are visible (mode = "markers+lines"), using the values 4, 5, and 6 for both the x and y axes. The second block adds markers and lines as two separate traces, so each gets its own legend entry and users can show or hide each one independently.
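To see the traces that are actually sent to plotly.js, the plots can be passed through plotly_build(). This sketch rebuilds the two examples above (with illustrative trace names) and collects each trace's mode; the helper `trace_modes()` is a hypothetical name introduced here.

```r
library(plotly)

# One trace that draws both markers and lines
p_single <- plot_ly() %>%
  add_trace(type = "scatter", mode = "markers+lines", x = 4:6, y = 4:6)

# Two traces: markers and lines are drawn (and toggled) independently
p_double <- plot_ly() %>%
  add_trace(type = "scatter", mode = "markers", x = 4:6, y = 4:6, name = "points") %>%
  add_trace(type = "scatter", mode = "lines", x = 4:6, y = 4:6, name = "line")

# Hypothetical helper: the mode of every trace in the built plot
trace_modes <- function(p) {
  vapply(plotly_build(p)$x$data, function(tr) tr$mode, character(1))
}

single_modes <- trace_modes(p_single)
double_modes <- trace_modes(p_double)
print(single_modes)
print(double_modes)
```

Because plot_ly() is called without a trace type here, it contributes no trace of its own; every trace comes from an add_trace() call.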

ToolTip

Layout

Chart Types

Plotly's R graphing library makes interactive, publication-quality graphs. Basic chart types include:

Chart Type | Use
Scatter and Line Plots | Visualize relationships between numeric variables
Bar Charts | Compare data across different categories
Bubble Charts | Display three dimensions of data
WebGL vs SVG in R | Compare rendering techniques in R
Filled Area Plots | Display data and emphasize trends
Horizontal Bar Charts | Present data horizontally
Gantt Charts | Visualize project schedules and timelines
Sunburst Charts | Visualize hierarchical data
Pie Charts | Show proportions of a whole
Tables | Display structured data
Dot Plots | Represent distributions of data
Dumbbell Plots | Visualize changes between two points
Sankey Diagram | Show flow or relationships between entities
Treemap Charts | Represent hierarchical data

Basic Scatterplot

In a scatter plot, markers refer to the individual data points displayed on the chart. Each data point is represented by a marker, which is a visual element such as a dot, symbol, or shape that signifies the value of the data point on both the x and y axes.

            # Basic Scatterplot
            library(ggplot2)
            library(plotly)

            p_1 <- dat %>% ggplot(aes(height, final_weight)) + 
              geom_point(size = 4, shape = 21, stroke = 1.1, color = 'black', fill = rgb(0, 0, 0, 0.5)) +
              papaja::theme_apa() +
              labs(title = 'Height vs Weight', y = 'Weight', x = 'Height')

            ggplotly(p_1)

            # Native plotly version with a loess smoothing line
            # (data sorted by height so the line is drawn left to right)
            dat %>%
              dplyr::arrange(height) %>%
              plot_ly(x = ~height, y = ~final_weight) %>%
              add_markers(name = "Scatter Points") %>%
              add_lines(y = ~fitted(loess(final_weight ~ height)), name = "Smoothing Line") %>%
              layout(title = "Scatter Plot with Smoothing Line")
            

Marker Attributes

Below is a list of some common marker attributes that can be customized in Plotly:

Attribute | Description
size | Sets the marker size.
color | Sets the marker color.
symbol | Sets the marker symbol (e.g., "circle", "square", "diamond").
opacity | Sets the marker opacity.
line | Defines the marker's border (outline) properties, such as color and width.
gradient | Sets a gradient fill (type and color) for the marker.
sizeref | Sets the scale factor used to convert size values into marker sizes.
sizemode | Sets whether size values are mapped to the marker's diameter or area.
showlegend | A trace-level (not marker) attribute that determines whether the trace appears in the legend.
                library(plotly)

                # Each row of markers demonstrates a different marker attribute
                p1 <- plot_ly() %>%
                    add_markers(x = 1:2, y = rep(1, 2), marker = list(size = 20), name = "Default Marker") %>% 
                    add_markers(x = 1:2, y = rep(2, 2), marker = list(size = 30, color = "red"), name = "Custom Size & Color") %>%
                    add_markers(x = 1:2, y = rep(3, 2), marker = list(size = 20, symbol = "square"), name = "Square Markers") %>%
                    add_markers(x = 1:2, y = rep(4, 2), marker = list(size = 20, symbol = "triangle-up", color = "green"), name = "Triangle Markers") %>%
                    add_markers(x = 1:2, y = rep(5, 2), marker = list(size = 20, symbol = "diamond", opacity = 0.7), name = "Custom Opacity") %>%
                    add_markers(x = 1:2, y = rep(6, 2), marker = list(size = 20, opacity = 1, line = list(color = 'black', width = 5)), name = "border") %>%
                    layout(title = "Custom Marker Styles in Plotly", xaxis = list(range = c(0, 4)))
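The sizemode and sizeref attributes matter most for bubble charts. The sketch below uses hypothetical city data and a rule of thumb from plotly's bubble-chart examples, sizeref = 2 * max(size) / (desired maximum marker size in pixels)^2, which applies when sizemode = "area"; the 40-pixel maximum is an arbitrary choice.

```r
library(plotly)

# Hypothetical data: five cities with populations (in thousands)
cities <- data.frame(
  city = c("A", "B", "C", "D", "E"),
  x = 1:5,
  y = c(3, 1, 4, 1, 5),
  pop = c(120, 450, 80, 900, 300)
)

# With sizemode = "area", marker area (not diameter) scales with the value
max_px <- 40
ref <- 2 * max(cities$pop) / max_px^2

p_bubble <- plot_ly(cities, x = ~x, y = ~y, text = ~city) %>%
  add_markers(marker = list(size = cities$pop, sizemode = "area", sizeref = ref,
                            line = list(color = "black", width = 1)))

built_marker <- plotly_build(p_bubble)$x$data[[1]]$marker
```

Mapping values to area rather than diameter avoids visually exaggerating large values, since a doubled diameter would quadruple the perceived size.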
            

Grouped Scatterplots

Grouped scatterplots are a great way to visualize the relationship between two variables across different groups. In the example below, we can see how the relationship between height and mass in the starwars data varies across sexes.

                library(dplyr)
                library(tidyr)    # for drop_na()
                library(ggplot2)
                library(plotly)

                # Exclude the extreme outlier (mass > 300) before fitting the line
                p1 <- starwars %>% filter(mass <= 300) %>% ggplot(aes(height, mass)) +
                  geom_smooth(method = 'lm', se = FALSE) +
                  papaja::theme_apa()

                int1 <- p1 + geom_point(color = 'black', fill = 'red', shape = 21, size = 3.5, alpha = 0.8)
                int2 <- p1 + geom_point(data = starwars %>% drop_na(), aes(fill = sex), color = 'black', shape = 21, size = 3.5, alpha = 0.8) + theme(legend.position = c(0.15, 0.85))
                int3 <- p1 + geom_point(data = starwars %>% drop_na(), aes(shape = sex), color = 'black', size = 3.5, alpha = 0.8) + theme(legend.position = c(0.15, 0.85))

                int_plot <- ggplotly(int3)

                # Native plotly version: one colour per sex
                starwars %>% 
                  filter(mass <= 300) %>%
                  plot_ly() %>%
                  add_trace(type = 'scatter', mode = 'markers', x = ~height, y = ~mass, color = ~sex,
                            colors = "Set1",  # any RColorBrewer palette name or vector of colours
                            marker = list(size = 20, opacity = 0.5, line = list(color = 'black', width = 2)))
            

Line Plots

Line plots, also known as line charts or line graphs, are a type of data visualization that displays information as a series of data points connected by straight lines. They are particularly useful for showing trends and relationships between continuous data points over a continuous interval or time period.

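                library(ggplot2)    # for the economics data
                library(dplyr)
                library(lubridate)  # for year() and month()
                library(plotly)

                # Static line plot of the number of unemployed over time
                economics %>% ggplot(aes(date, unemploy)) +
                  geom_line() +
                  papaja::theme_apa()

                # Interactive version with one line (and legend entry) per decade
                int_plot2 <- (economics %>%
                  mutate(decade = factor(10 * (year(date) %/% 10))) %>% 
                  ggplot(aes(date, unemploy, group = decade, color = decade)) +
                  geom_line() +
                  papaja::theme_apa()) %>% ggplotly()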
            

Above is a simple demonstration of how interactivity can make a line plot far more informative. The static plot shows that the number of unemployed people (in thousands) changes over time, but the change is non-linear and partly reflects population growth and economic conditions. With a few minor tweaks, we can add interactivity that lets users toggle the series for each decade via the legend, giving the visualization an entirely new flavor.

If we wish to identify yearly trends in unemployment by month, examining the entire plot, or a faceted version of it, would be rather tedious. Instead, we can present the plot interactively and let viewers isolate individual years to identify more meaningful trends.

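                # Add year and month columns (assumes dplyr and lubridate are loaded, as above)
                econ <- economics %>%
                  mutate(yr = year(date), mnth = month(date))

                # One line per year, with months on the x-axis
                int_plot3 <- (econ %>% ggplot(aes(mnth, unemploy, col = ordered(yr))) +
                  geom_line(show.legend = FALSE) +
                  scale_x_continuous(breaks = 1:12, labels = month.abb) +
                  papaja::theme_apa()) %>% ggplotly()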
            

Similarly, we can use the ggplot framework to produce all kinds of plots with an interactive twist.

Bar Charts

Bar charts display and compare the number, frequency, or another measure (e.g., the mean) for different discrete categories or groups. The bars can be vertical (sometimes called a column graph) or horizontal, and the height or length of each bar is proportional to the value it represents.

The add_bars() and add_histogram() functions in the plotly R package wrap the plotly.js 'bar' and 'histogram' trace types, respectively. They differ primarily in how they handle data: add_bars() requires both x and y values (the bar heights), while add_histogram() needs only a single variable and lets plotly.js perform the binning in the browser. Both can visualize either numeric or discrete variables; the difference lies in where the binning occurs.

            add_bars(p, x = NULL, y = NULL, ..., data = NULL, inherit = TRUE)

The list below covers the main arguments together with commonly used plotly.js 'bar' trace attributes, which are passed through `...`:
  1. p: A plotly object.
  2. x: Sets the x coordinates.
  3. y: Sets the y coordinates.
  4. text: Sets text elements associated with each (x,y) pair.
  5. hovertext: Sets text elements associated with each (x,y) pair to appear on hover.
  6. hoverinfo: Determines which trace information appears on hover.
  7. hoverlabel: Sets the hover label properties.
  8. marker: Sets the marker properties (color, outline, etc.).
  9. opacity: Sets the opacity of the bars.
  10. base: Sets the base of the bars.
  11. width: Sets the bar width.
  12. x0: Alternate to x.
  13. dx: Sets the x coordinate step.
  14. y0: Alternate to y.
  15. dy: Sets the y coordinate step.
  16. orientation: Sets the orientation of the bars ('v' for vertical, 'h' for horizontal).
  17. name: Sets the trace name.
  18. error_y: Sets the y-axis error bars.
  19. error_x: Sets the x-axis error bars.
  20. ids: Assigns id labels to each datum.
  21. showlegend: Determines whether or not an item corresponding to this trace is shown in the legend.
  22. legendgroup: Sets the legend group for this trace.
  23. stackgroup: Sets the stack group (a scatter-trace attribute; bar traces are stacked via layout(barmode = "stack") instead).
  24. alignmentgroup: Sets the alignment group.
  25. offsetgroup: Sets the offset group.
  26. customdata: Assigns extra data to each datum.
  27. xaxis: Sets a reference to a named x axis.
  28. yaxis: Sets a reference to a named y axis.
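  29. frame: Assigns observations to animation frames (a plotly R extension).
  30. connectgaps: Determines whether or not gaps in the provided data arrays are connected (a scatter-trace attribute, not used by bar traces).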
  31. visible: Determines whether or not this trace is visible.

          library(dplyr)
          library(plotly)

          d_1 <- ISLR2::Auto

          # Mean engine displacement for each number of cylinders
          d_1 %>%
            summarise(displacement = mean(displacement), .by = cylinders) %>%
            mutate(cylinders = factor(cylinders)) %>%
            plot_ly(x = ~cylinders, y = ~displacement, type = 'bar')
        

Creating simple histograms is a straightforward extension within the plotly framework.

          p1 <- plot_ly(diamonds, x = ~price) %>% add_histogram(name = "plotly.js")
        

The add_histogram() function sends all of the observed values to the browser and lets plotly.js perform the binning.
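The alternative is to bin in R and send only the bin counts to the browser with add_bars(). This sketch uses base R's hist() with Freedman-Diaconis breaks, an illustrative choice; the trade-off is a much smaller payload in exchange for bins that are fixed before the plot reaches the viewer.

```r
library(plotly)
library(ggplot2)  # for the diamonds data

# Bin in R: only the bin midpoints and counts are sent to the browser
h <- hist(diamonds$price, breaks = "FD", plot = FALSE)
binned <- data.frame(mid = h$mids, count = h$counts)

p_r_bins <- plot_ly(binned, x = ~mid, y = ~count) %>%
  add_bars(name = "R-side bins")

# Bin in the browser: every price is sent and plotly.js does the binning
p_js_bins <- plot_ly(diamonds, x = ~price) %>%
  add_histogram(name = "plotly.js")

comparison <- subplot(p_r_bins, p_js_bins, nrows = 2, shareX = TRUE)
```

Browser-side binning keeps the raw data available for interactions such as linked selections, which is why it suits exploratory work.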

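Although histograms and bar plots are conceptually different, for a discrete variable add_histogram() and add_bars() produce the same chart: add_histogram() counts the raw values in the browser, whereas add_bars() plots counts computed beforehand in R.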

          # Counts computed in the browser
          p1 <- plot_ly(diamonds, x = ~cut) %>%
            add_histogram()

          # Counts computed in R
          p2 <- diamonds %>%
            count(cut) %>%
            plot_ly(x = ~cut, y = ~n) %>%
            add_bars()

          subplot(p1, p2)

Multiple Interactive distributions

          library(purrr)  # for map()

          diamonds %>%
            group_split(cut) %>%
            map(~ .x %>% plot_ly(x = ~price, type = "histogram", name = paste0("Cut = ", unique(.x$cut)))) %>%
            subplot(nrows = 2, shareX = TRUE, shareY = TRUE, titleX = FALSE) %>%
            hide_legend()
        

The above code splits the dataset into a list of data frames, one per cut, and maps a histogram function over each subset. subplot() then combines the resulting list of plots into a single figure with shared x and y axes.

Grouped Bar Plots

The easiest way to produce grouped bar plots is similar to ggplot2: we group, or color, by a discrete variable.

          plot_ly(diamonds, x = ~cut, color = ~clarity) %>% add_histogram()
        

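The barmode attribute of layout() controls how the bars are arranged. plotly supports several modes:

  1. group: bars for each category are placed side by side (the default).
  2. stack: bars are stacked on top of one another.
  3. overlay: bars are drawn over one another, so some transparency helps keep them visible.
  4. relative: like stack, but negative values are stacked below zero.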
          plot_ly(diamonds, x = ~cut, color = ~clarity) %>% add_histogram() %>% layout(barmode = "stack")
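Because barmode is a layout-level setting, comparing the four modes requires one plot per mode rather than a single subplot. The small data frame below is hypothetical and includes a negative value so that "relative" visibly differs from "stack"; the colors are arbitrary.

```r
library(plotly)
library(purrr)

# Hypothetical data with one negative value
df <- data.frame(
  category = rep(c("a", "b"), times = 2),
  value = c(3, -1, 2, 2),
  group = rep(c("g1", "g2"), each = 2)
)

modes <- c("group", "stack", "overlay", "relative")

# One plot per barmode, stored in a list named by mode
plots <- map(set_names(modes), function(m) {
  plot_ly(df, x = ~category, y = ~value, color = ~group,
          colors = c("steelblue", "tomato")) %>%
    add_bars() %>%
    layout(barmode = m, title = paste("barmode =", m))
})

built_modes <- map_chr(plots, ~ plotly_build(.x)$x$layout$barmode)
```

Printing `plots$stack` and `plots$relative` side by side in an interactive session shows how the negative bar for group g1 is placed differently.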
        

Box and Violin Plots

Box plots and violin plots are used to visualize the distribution of data and compare the distribution of data between different groups or categories. Box plots display the median, quartiles, and outliers of a dataset, while violin plots provide a more detailed view of the data distribution by showing the probability density of the data at different values.

Box Plot

          p1 <- ggplot(diamonds, aes(x = cut, y = price)) + geom_boxplot()
          ggplotly(p1)
        
          p1 <- plot_ly(diamonds, x = ~cut, y = ~price, type = "box")
        

Additionally, we can easily produce violin plots, boxplots, and other combinations as follows -

          p1 <- diamonds %>% mutate(carat = cut_interval(carat, 5)) %>% ggplot(aes(x = carat, y = price)) + geom_boxplot()
          ggplotly(p1)

          p2 <- diamonds %>% mutate(carat = cut_interval(carat, 5)) %>% plot_ly(x = ~carat, y = ~price, type = 'box')

          p3 <- diamonds %>% mutate(carat = cut_interval(carat, 5)) %>% plot_ly(x = ~carat, y = ~price, type = 'violin')

          p4 <- plot_ly(diamonds, x = ~price, y = ~interaction(clarity, cut)) %>% add_boxplot(color = ~clarity) %>%
            layout(yaxis = list(title = ""))
        

2D Frequencies

2D frequency plots show how often combinations of values of two variables occur, whether the variables are categorical or binned numeric ones. They are particularly useful for identifying patterns, trends, and associations between the variables. In plotly, they can be created with add_histogram2d(), which bins the raw data in the browser, or with add_heatmap(), which displays pre-computed frequencies in a grid.

          library(plotly)
          library(ggplot2)  # for the diamonds data

          p <- plot_ly(diamonds, x = ~log(carat), y = ~log(price))

          # Three panels: default bins, smoothed bins, and finer bins
          subplot(
            add_histogram2d(p) %>%
              colorbar(title = "default") %>% layout(xaxis = list(title = "default")),
            add_histogram2d(p, zsmooth = "best") %>%
              colorbar(title = "zsmooth") %>% layout(xaxis = list(title = "zsmooth")),
            add_histogram2d(p, nbinsx = 60, nbinsy = 60) %>%
              colorbar(title = "nbins") %>% layout(xaxis = list(title = "nbins")),
            shareY = TRUE, titleX = TRUE
          )

        

The plotly package offers two functions for displaying rectangular bins: add_heatmap() and add_histogram2d(). For numeric data, add_heatmap() is a 2D analog of add_bars() and requires pre-computed bins, while add_histogram2d() is a 2D analog of add_histogram() and computes bins in the browser, making it more suitable for exploratory purposes. add_histogram2d() features zsmooth for increasing bin numbers through bi-linear interpolation, and nbinsx/nbinsy to set the number of bins in the x and y directions. Additionally, filled contours can be used instead of bins with add_histogram2dcontour().
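The difference between the two functions can be sketched with the cut and color columns of diamonds: add_heatmap() receives counts computed in R, while add_histogram2d() computes the same counts from the raw rows in the browser. The colorbar title is an illustrative choice.

```r
library(plotly)
library(dplyr)
library(ggplot2)  # for the diamonds data

# add_heatmap() expects one value per cell: count each cut x color pair in R
cell_counts <- diamonds %>% count(cut, color)

p_heat <- plot_ly(cell_counts, x = ~cut, y = ~color, z = ~n) %>%
  add_heatmap() %>%
  colorbar(title = "count")

# add_histogram2d() computes the counts in the browser from the raw rows
p_hist2d <- plot_ly(diamonds, x = ~cut, y = ~color) %>%
  add_histogram2d()

both <- subplot(p_heat, p_hist2d, shareY = TRUE)
```

The heatmap version ships at most 35 cells to the browser, whereas the 2D histogram ships all rows but allows the binning to be adjusted interactively.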

Heat-Maps/Contours

Heatmaps and contour plots are used to visualize the distribution of data across two dimensions. Heatmaps display data as a grid of colored cells, where the color intensity represents the value of the data at each cell. Contour plots, on the other hand, display data as a series of contour lines, where each line represents a constant value of the data.

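          library(dplyr)
          library(ggplot2)  # for the diamonds data
          library(plotly)

          # Heatmaps: add_heatmap() expects one z value per cell, so average price first
          mean_price <- diamonds %>%
            group_by(cut, color) %>%
            summarise(price = mean(price), .groups = "drop")

          p1 <- plot_ly(mean_price, x = ~cut, y = ~color, z = ~price) %>% add_heatmap()

          # Contours: histfunc = "avg" makes each bin show mean price rather than a count
          p2 <- plot_ly(diamonds, x = ~cut, y = ~color, z = ~price) %>%
            add_histogram2dcontour(histfunc = "avg")

          g1 <- subplot(p1, p2, nrows = 2)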
        

In plotly, heatmaps and contour plots can be created with the add_heatmap() and add_histogram2dcontour() functions, respectively; the histfunc attribute controls whether add_histogram2dcontour() summarizes counts or another statistic, such as the average of z.

3D Plots

3D plots are used to visualize data in three dimensions and are particularly useful for exploring complex relationships between multiple variables. In plotly, they can be created with add_trace(), but in many cases simply adding a z attribute is enough: plot_ly() then automatically renders markers, lines, and paths in three dimensions.

          p1 <-  plot_ly(diamonds, x = ~carat, y = ~price, z = ~depth) %>% add_markers(size = 0.5)

          p2 <- plot_ly(diamonds, x = ~carat, y = ~price, z = ~cut) %>% add_paths(color = ~cut)

          p3 <- plot_ly(diamonds, x = ~carat, y = ~price, z = ~cut) %>% add_lines(color = ~cut)

          x <- seq_len(nrow(volcano)) + 100
          y <- seq_len(ncol(volcano)) + 500
          p4 <- plot_ly() %>% add_surface(x = ~x, y = ~y, z = ~volcano)

          subplot(p1,p2,p3,p4,nrows = 2)
        

Maps

Plotly offers various ways to create maps, broadly categorized into integrated and custom maps. Integrated maps leverage Plotly's built-in support via Mapbox or d3.js, suitable for quick visualizations without sophisticated geo-spatial representations. Custom maps offer full control over rendering geo-spatial objects, ideal for more complex visualizations.

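Types of Integrated Maps

Plotly supports two types of integrated maps: Mapbox-based maps created with plot_mapbox(), and maps built on plotly.js's own d3-based geo rendering, created with plot_geo().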

Arguments for Integrated Maps

Argument | Description
layout.mapbox.style | Controls the styling of the Mapbox basemap. Examples: "streets", "satellite", "dark".
layout.updatemenus | Creates a dropdown menu to control map styles interactively.
projection | Used in plot_geo() to specify the type of map projection. Examples: "mercator", "orthographic".

Example 1: Bubble Chart with Mapbox

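  library(plotly)

  # plot_mapbox() requires a Mapbox access token, e.g. Sys.setenv(MAPBOX_TOKEN = "your-token")
  plot_mapbox(maps::canada.cities) %>%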
    add_markers(
      x = ~long,
      y = ~lat,
      size = ~pop,
      color = ~country.etc,
      colors = "Accent",
      text = ~paste(name, pop),
      hoverinfo = "text"
    )

Example 2: Flight Paths with plot_geo()

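      # Assumes `air` (airport locations with long, lat, airport, cnt) and `flights`
      # (start_lon, start_lat, end_lon, end_lat, id) have already been loaded
      geo <- list(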
        projection = list(type = 'orthographic', rotation = list(lon = -100, lat = 40, roll = 0)),
        showland = TRUE,
        landcolor = toRGB("gray95"),
        countrycolor = toRGB("gray80")
      )
      plot_geo(color = I("red")) %>%
        add_markers(data = air, x = ~long, y = ~lat, text = ~airport, size = ~cnt, hoverinfo = "text", alpha = 0.5) %>%
        add_segments(data = group_by(flights, id), x = ~start_lon, xend = ~end_lon, y = ~start_lat, yend = ~end_lat, alpha = 0.3, size = I(1), hoverinfo = "none") %>%
        layout(geo = geo, showlegend = FALSE)
      

Choropleths


        density <- state.x77[, "Population"] / state.x77[, "Area"]
        g <- list(scope = 'usa', projection = list(type = 'albers usa'), lakecolor = toRGB('white'))
        plot_geo() %>%
          add_trace(z = ~density, text = state.name, locations = state.abb, locationmode = 'USA-states') %>%
          layout(geo = g)
        

Custom Maps

The sf R package allows for creating custom maps with more control over geo-spatial data. Here's how to create a simple map with sf:


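    library(plotly)
    library(rnaturalearth)
    # Country polygons as an sf data frame
    world <- ne_countries(returnclass = "sf")
    # I() sets constant values: light gray fill, black borders, border width of 1
    plot_ly(world, color = I("gray90"), stroke = I("black"), span = I(1))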

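Types of Cartograms

A cartogram resizes geographic regions in proportion to a variable such as population. The cartogram package implements three kinds: contiguous, or continuous-area, cartograms (cartogram_cont()); non-contiguous cartograms (cartogram_ncont()); and Dorling cartograms, which replace each region with a circle (cartogram_dorling()).

Example: Continuous Area Cartogram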

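    library(plotly)
    library(cartogram)
    library(albersusa)  # provides usa_sf(); installed from GitHub (hrbrmstr/albersusa)
    # Resize each state in proportion to its 2014 population
    us_cont <- cartogram_cont(usa_sf("laea"), "pop_2014")
    plot_ly(us_cont) %>%
      # hoveron = "fills" shows the tooltip anywhere inside a state, not only on its border
      add_sf(color = ~pop_2014, split = ~name, text = ~paste(name, scales::number_si(pop_2014)), hoverinfo = "text", hoveron = "fills") %>%
      layout(showlegend = FALSE) %>%
      colorbar(title = "Population \n 2014")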

Example: Dorling Cartogram


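    library(cartogram)
    library(albersusa)  # GitHub package (hrbrmstr/albersusa) providing usa_sf()
    us <- usa_sf("laea")
    # Each state becomes a circle whose area is proportional to its 2014 population
    us_dor <- cartogram_dorling(us, "pop_2014")
    plot_ly(stroke = I("black"), span = I(1)) %>%
      # gray state outlines as background context
      add_sf(data = us, color = I("gray95"), hoverinfo = "none") %>%
      add_sf(data = us_dor, color = ~pop_2014, split = ~name, text = ~paste(name, scales::number_si(pop_2014)), hoverinfo = "text", hoveron = "fills") %>%
      layout(showlegend = FALSE)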

Linking/Publishing Plots

You can save any widget created with an htmlwidgets-based package (e.g., plotly, leaflet, DT) as a standalone HTML file using htmlwidgets::saveWidget(). By default, this function generates a completely self-contained HTML file that includes all necessary JavaScript and CSS dependencies, which makes the widget easy to share as a single file. To reduce the size of a plotly widget, consider partial_bundle(): it replaces the full plotly.js library with a smaller bundle that still supports the trace types in the plot, which can shrink the file considerably, especially for basic chart types.

Saving Standalone HTML Files

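    library(plotly)
    library(htmlwidgets)  # provides saveWidget()
    data(diamonds, package = "ggplot2")
    # I() keeps the marker size constant instead of mapping it to a data scale
    p1 <- plot_ly(diamonds, x = ~carat, y = ~price, z = ~depth) %>% add_markers(size = I(2))
    saveWidget(p1, "plot1.html")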

To include the saved HTML file in an R Markdown document, embed it in an iframe:

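    ```{r}
    # Embed the file saved above; width and height set the iframe's size on the page
    htmltools::tags$iframe(
      src = "plot1.html",
      width = "100%",
      height = "500",
      scrolling = "no",
      seamless = "seamless",
      frameBorder = "0"
    )
    ```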

Alternatively, htmltools::includeHTML() inserts the contents of the HTML file directly into the R Markdown document.

Arranging Interactive Plots

The subplot() function in plotly combines multiple plotly objects into a single object. It is more flexible than trellis display frameworks such as ggplot2's facet_wrap(), because the plots do not have to be conditioned on a common variable; in this respect it resembles grid.arrange() from the gridExtra package, which arranges multiple ggplot2 or lattice plots in one view. The simplest use of subplot() is to pass plotly objects to it directly. When there are many plots, passing a list of plots reduces repetition; for example, you can create one time series per variable in a dataset and synchronize zoom and pan events across them. Conceptually, subplot() arranges plots into a table whose number of rows is set by the nrows argument. By default, all rows share the same height and all columns the same width, but the heights and widths arguments change these proportions. This flexibility supports visualizations such as joint density plots or the interactive dendrograms created with the heatmaply package.

The figure above diagrams how the heights of rows and the widths of columns are controlled. In this example, five plots are placed in two rows and three columns.

The figure above shows a creative use of subplot(): marginal distributions arranged around a contour plot.
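A minimal sketch of that kind of layout: histograms of x and y act as the marginal distributions, a 2D density contour fills the large bottom-left cell, and heights and widths give the contour most of the space. The data are simulated with rnorm() purely for illustration.

```r
library(plotly)

# Simulated data, for illustration only
set.seed(1)
x <- rnorm(200)
y <- rnorm(200)

s <- subplot(
  # top-left: histogram of x (marginal along the top)
  plot_ly(x = x, color = I("black")) %>% add_histogram(),
  # top-right: intentionally empty cell
  plotly_empty(),
  # bottom-left: 2D density contour of x and y
  plot_ly(x = x, y = y) %>% add_histogram2dcontour(colorscale = "Viridis"),
  # bottom-right: histogram of y, drawn horizontally (marginal along the side)
  plot_ly(y = y, color = I("black")) %>% add_histogram(),
  nrows = 2, heights = c(0.2, 0.8), widths = c(0.8, 0.2), margin = 0,
  shareX = TRUE, shareY = TRUE, titleX = FALSE, titleY = FALSE
)
p <- layout(s, showlegend = FALSE)
```

Because shareX and shareY are TRUE, zooming into the contour also rescales the matching axis of each marginal histogram.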

Faceting in ggplot2 offers another way to build multiple views, and subplot() accepts ggplot2 objects directly, converting each one with ggplotly():

    library(ggplot2)
    library(plotly)
    # Time series per indicator (left) and its distribution as a violin (right)
    gg1 <- ggplot(economics_long, aes(date, value)) + geom_line() + facet_wrap(~variable, scales = "free_y", ncol = 1)
    gg2 <- ggplot(economics_long, aes(factor(1), value)) + geom_violin() +
      facet_wrap(~variable, scales = "free_y", ncol = 1) + theme(axis.text = element_blank(), axis.ticks = element_blank())
    subplot(gg1, gg2)

Animating Views

Animation API

Plotly supports key frame animations through the frame argument or aesthetic in both plot_ly() and ggplotly(). Additionally, the ids argument ensures smooth transitions between objects with the same ID, facilitating object constancy.
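To make the required data shape concrete, here is a minimal sketch with a small hypothetical data set (toy, with made-up columns id, step, x, and y) in long format: one row per object per frame. frame = ~step creates one key frame per value of step, and ids = ~id tells plotly.js which marker in one frame corresponds to which marker in the next.

```r
library(plotly)

# Hypothetical toy data in long format: three objects (id) observed at three steps
toy <- data.frame(
  id   = rep(c("a", "b", "c"), times = 3),
  step = rep(1:3, each = 3),
  x    = c(1, 2, 3, 2, 3, 4, 3, 4, 5),
  y    = c(1, 4, 9, 2, 5, 8, 3, 6, 7)
)

# frame = ~step: one key frame per unique value of step
# ids = ~id: match markers across frames so each one moves smoothly (object constancy)
p <- plot_ly(toy, x = ~x, y = ~y, frame = ~step, ids = ~id) %>%
  add_markers()

# Building the plot exposes the key frames that plotly.js will animate
built <- plotly_build(p)
length(built$x$frames)
```

Printing p shows the animation together with the default play button and slider.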

Example: Gapminder Animation

The famous Gapminder animation demonstrates the relationship between GDP per capita and life expectancy over time.

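    library(ggplot2)
    library(plotly)
    data(gapminder, package = "gapminder")
    # frame and ids are plotly-specific: ggplot2 warns that it ignores them, but ggplotly() uses them
    gg <- ggplot(gapminder, aes(gdpPercap, lifeExp, color = continent)) +
      geom_point(aes(size = pop, frame = year, ids = country)) +
      scale_x_log10()
    ggplotly(gg)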

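Key Frame Animation Components

A key frame animation has three parts: the frames themselves, a play button, and a slider. animation_opts() sets frame duration, transition, and easing, while animation_button() and animation_slider() position and style the button and slider.

Customizing Animation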

The following example modifies the time between frames, transition easing, and the placement of animation controls.

    base <- gapminder %>%
      plot_ly(x = ~gdpPercap, y = ~lifeExp, size = ~pop, text = ~country, hoverinfo = "text") %>%
      layout(xaxis = list(type = "log"))

    base %>%
      add_markers(color = ~continent, frame = ~year, ids = ~country) %>%
      # 1000 ms per frame, elastic easing; redraw = FALSE skips a full redraw after each transition
      animation_opts(1000, easing = "elastic", redraw = FALSE) %>%
      # move the play button to the bottom-right corner
      animation_button(x = 1, xanchor = "right", y = 0, yanchor = "bottom") %>%
      animation_slider(currentvalue = list(prefix = "YEAR ", font = list(color = "red")))

Controlling Frame Order

Frames are ordered numerically or alphabetically by default. Using factors provides more control over frame ordering. In the example below, continents are ordered by average life expectancy.

    # Order continents by their mean life expectancy
    meanLife <- with(gapminder, tapply(lifeExp, INDEX = continent, mean))
    gapminder$continent <- factor(gapminder$continent, levels = names(sort(meanLife)))

    base %>%
      add_markers(data = gapminder, frame = ~continent) %>%
      hide_legend() %>%
      animation_opts(frame = 1000, transition = 0, redraw = FALSE)

Overlaying Frames

You can overlay animated frames on top of a static background for better context.

    base %>%
      # static background: every year at low opacity
      add_markers(color = ~continent, showlegend = FALSE, alpha = 0.2, alpha_stroke = 0.2) %>%
      # animated foreground
      add_markers(color = ~continent, frame = ~year, ids = ~country) %>%
      animation_opts(1000, redraw = FALSE)

Animation Support

Currently, the scatter plotly.js trace type has full support for animation. For other chart types, creative solutions are necessary. For example, to animate a population pyramid (a bar chart), use add_segments() instead of add_bars().

Example: Population Pyramid

Animating U.S. population projections by age and gender from 2018 to 2050.

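    library(idbr)
    library(dplyr)
    library(plotly)
    # The Census Bureau IDB API requires a key: idb_api_key("<your key>")
    us <- bind_rows(
      idb1(country = "US", year = 2018:2050, variables = c("AGE", "NAME", "POP"), sex = "male"),
      idb1(country = "US", year = 2018:2050, variables = c("AGE", "NAME", "POP"), sex = "female")
    )

    # Males (SEX == 1) plot to the right of zero, females to the left
    us <- us %>%
      mutate(POP = if_else(SEX == 1, POP, -POP), SEX = if_else(SEX == 1, "Male", "Female"))

    # Each bar is a horizontal segment from 0 to POP; frame = ~time animates over years
    plot_ly(us, size = I(5), alpha = 0.5) %>%
      add_segments(x = ~POP, xend = 0, y = ~AGE, yend = ~AGE, frame = ~time, color = ~factor(SEX))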

Line Chart Alternative

Visualizing the same data using lines instead of segments.

    plot_ly(us, alpha = 0.5) %>%
      # line.simplify = FALSE keeps every point so lines transition smoothly between frames
      add_lines(x = ~AGE, y = ~abs(POP), frame = ~time, color = ~factor(SEX), line = list(simplify = FALSE)) %>%
      layout(yaxis = list(title = "US population"))

Plotly's animation API provides robust tools for creating dynamic, interactive visualizations. The subplot() function allows for flexible layout arrangements, and animation controls can be customized to enhance the user experience.

Key Functions and Arguments

Function Description
animation_opts() Customize animation settings like frame duration and easing
animation_button() Customize the play/pause button
animation_slider() Customize the slider control

Linking Views

Linking views is a powerful technique for exploring complex data relationships. Plotly provides several methods for linking views, including brushing, highlighting, and filtering. These techniques allow users to interact with one view and see the corresponding changes in other views, providing a more comprehensive understanding of the data.

Graphically Brushing/Highlighting

Graphical brushing and highlighting are interactive techniques that allow users to select data points in one view and see the corresponding changes in other views. This technique is particularly useful for exploring relationships between variables and identifying patterns in the data.

    library(plotly)
    library(dplyr)
    d_1 <- ISLR2::Auto
    # Custom color palette
    custom_colors <- c("#FF5733", "#33FF57", "#3357FF", "#F7FF33", "#FF33A6", "#33FFF7", "#F733FF")

    p <- d_1 %>%
      mutate(year = factor(year)) %>%
      highlight_key(~year) %>%
      plot_ly(
        x = ~mpg,
        y = ~acceleration,
        color = ~year,
        text = ~name,                 # label each point with the car's name
        mode = "markers+text",
        textposition = "top center",  # "top" alone is not a valid plotly.js textposition
        colors = custom_colors
      ) %>%
      highlight(on = "plotly_hover", off = "plotly_doubleclick")

In the plot above, the year column is converted to a factor, and highlight_key() makes year the key for interactive highlighting. plot_ly() creates a scatter plot with mpg on the x-axis and acceleration on the y-axis, coloring the points with the custom palette. Both markers and text labels are displayed, with the text placed above each marker. highlight() adds the interactivity: hovering over a point highlights every car from the same model year, and double-clicking resets the highlighting.

Adding a selector

Adding a selector (a searchable dropdown) lets users pick the group to highlight by name, here a city, instead of clicking on the plot. This is particularly useful when there are too many lines to find one by eye.

    library(plotly)
    library(dplyr)
    data(txhousing, package = "ggplot2")
    # Declare `city` as the SQL 'query by' column; the third argument labels the selector
    tx <- highlight_key(txhousing, ~city, "Select a City")
    # Initiate a plotly object with one group per city
    base <- plot_ly(tx, color = I("black")) %>% group_by(city)
    # Time series of median house price, one thin line per city
    ts <- base %>%
      add_lines(x = ~date, y = ~median, line = list(width = 0.5))

    highlight(ts, on = "plotly_click",
              selectize = TRUE,     # add a searchable dropdown of cities
              dynamic = TRUE,       # add a control for changing the highlight color
              persistent = TRUE,    # keep earlier selections when adding new ones
              color = 'red',
              selected = attrs_selected(line = list(width = 3)))

The plot above adds a selector for choosing which city to highlight. highlight_key() declares city as the key, and ts draws one time series of median house prices per city. highlight() makes the plot interactive: users can select a city either by clicking its line or by typing its name into the dropdown that selectize = TRUE adds. The selected city is drawn in red with its line width increased to 3; dynamic = TRUE adds a control for changing the highlight color, and persistent = TRUE lets selections accumulate instead of replacing one another.

Linking Sub-Plots

Sub-plots are a powerful way to visualize multiple views of the same data. Plotly provides several methods for linking sub-plots, including shared axes, linked brushing, and synchronized zooming. These techniques allow users to interact with one sub-plot and see the corresponding changes in other sub-plots, providing a more comprehensive understanding of the data.

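    # Number of months with a missing median price, for cities with at least one gap
    dot_plot <- base %>%
      summarise(miss = sum(is.na(median))) %>%
      filter(miss > 0) %>%
      add_markers(
        x = ~miss,
        y = ~forcats::fct_reorder(city, miss),  # order cities by missing count
        hoverinfo = "x+y"
      ) %>%
      layout(
        xaxis = list(title = "Number of months missing"),
        yaxis = list(title = "")
      )
    # Both views share the `city` key from highlight_key(), so selecting
    # points in the dot plot highlights the matching lines in ts
    p <- subplot(dot_plot, ts, widths = c(.2, .8), titleX = TRUE) %>%
      layout(showlegend = FALSE) %>%
      highlight(on = "plotly_selected", dynamic = TRUE, selectize = TRUE)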

The plot above links two sub-plots through brushing. dot_plot shows, for each city with missing data, the number of months in which the median house price is missing, and ts shows the time series of median house prices for each city. subplot() combines them, with widths giving the dot plot 20% and the time series 80% of the horizontal space. Because both views come from the same highlight_key() object, highlight() links them: selecting points in dot_plot with the box or lasso tool (the plotly_selected event) highlights the corresponding cities in ts.
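The example above links its sub-plots through brushing alone. Shared axes, the other linking mechanism mentioned in this section, synchronize zooming and panning: with shareX = TRUE, zooming into a date range in one panel applies the same range to every panel. A minimal sketch, using ggplot2's economics data (our choice of data set) and passing subplot() a list of plots, one time series per variable:

```r
library(plotly)

# ggplot2's economics data: a date column plus five economic indicators
economics <- ggplot2::economics
vars <- setdiff(names(economics), "date")

# One line chart per indicator; as.formula() turns a column name
# into the ~column formula that plot_ly() expects
plots <- lapply(vars, function(var) {
  plot_ly(economics, x = ~date, y = as.formula(paste0("~", var))) %>%
    add_lines(name = var)
})

# Stack the panels in one column; shareX = TRUE gives them a single
# x-axis, so zooming or panning any panel updates all of them
p <- subplot(plots, nrows = length(plots), shareX = TRUE, titleX = FALSE)
```

Passing a list avoids writing out each plot by hand, and nrows = length(plots) keeps the layout correct if variables are added or removed.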

Key Functions and Arguments

Function Description
highlight_key() Enable interactive highlighting based on a key variable
highlight() Add interactivity to a plot, allowing users to highlight data points
subplot() Combine multiple plotly objects into a single sub-plot

Filtering with Crosstalk

The crosstalk package filters data across multiple views. Filter widgets such as filter_select(), filter_slider(), and filter_checkbox() operate on a SharedData object (which highlight_key() creates), and every view built from that object updates when the user changes a filter. Because the filtering happens in the browser, the result works without a running R session.

    library(crosstalk)
    library(ggplot2)
    library(plotly)
    # Generally speaking, use a "unique" key for filtering,
    # especially when you have multiple filters!
    tx <- highlight_key(d_1, ~name, "Select a Car")
    # ggplot() accepts the SharedData object returned by highlight_key()
    gg <- ggplot(tx) + geom_point(aes(weight, horsepower, group = name), color = 'red') + theme_minimal()

    # bscols() uses a 12-unit Bootstrap grid; widths of 12 stack the elements vertically
    p <- bscols(
      filter_select("id", "Select a car", tx, ~name),
      ggplotly(gg, dynamicTicks = TRUE),
      widths = c(12, 12)
    )

Dashboards

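Several crosstalk filters and a plot can be combined into a simple dashboard that runs entirely in the browser. All of the widgets below are built on the same SharedData object (tx), so each filter updates the plot.

    library(crosstalk)
    library(plotly)
    # No key column given, so row names identify the cars
    tx <- highlight_key(d_1)

    # Stack three filter widgets in a single column
    widgets <- bscols(
      widths = c(12, 12, 12),
      filter_select("cylinders", "Cylinders", tx, ~cylinders),
      filter_slider("horsepower", "Horsepower", tx, ~horsepower),
      filter_checkbox("year", "Years", tx, ~year, inline = TRUE)
    )

    # Filters in a narrow left column (4/12), plot in a wide right column (8/12)
    p <- bscols(
      widths = c(4, 8), widgets,
      plot_ly(tx, x = ~mpg, y = ~acceleration, showlegend = FALSE) %>%
        add_lines(color = I('red'))
    )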

While interactivity with plotly is a quick way of getting things done, interactive dashboards built with Shiny or JavaScript offer more dynamic and user-friendly ways to visualize and interact with data. Shiny, an R package, lets developers build interactive web applications directly from R, combining R's data analysis capabilities with the interactivity of web technologies. With Shiny, dashboards can respond to user inputs such as sliders, dropdowns, and buttons by dynamically updating visualizations and analyses. JavaScript, on the other hand, is a versatile language for building interactive dashboards, often with libraries like D3.js, Plotly.js, or React. These libraries offer extensive customization and can handle complex interactions and animations, making highly interactive and visually appealing dashboards possible. Both Shiny and JavaScript support real-time data updates, interactive filtering, and responsive design, helping users explore and understand data in an engaging and intuitive way.
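To show the reactive pattern described above, here is a minimal sketch of a Shiny app with a plotly chart. The mtcars data, the input ID xvar, and the output ID scatter are illustrative choices; plotlyOutput() and renderPlotly() are the plotly functions that place a plot in a Shiny UI and render it on the server.

```r
library(shiny)
library(plotly)

# UI: a dropdown that picks the x-axis variable, and a slot for the plot
ui <- fluidPage(
  selectInput("xvar", "X variable", choices = c("wt", "hp", "disp")),
  plotlyOutput("scatter")
)

server <- function(input, output, session) {
  # renderPlotly() is reactive: it re-runs whenever input$xvar changes
  output$scatter <- renderPlotly({
    plot_ly(mtcars, x = mtcars[[input$xvar]], y = ~mpg) %>%
      add_markers() %>%
      layout(xaxis = list(title = input$xvar))
  })
}

app <- shinyApp(ui, server)
# runApp(app)  # launches the app in a browser
```

Unlike the crosstalk dashboard, every change in the dropdown is sent to R, so the server could run any R computation before re-drawing the plot.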

Feature Shiny Plotly
Language Integration Seamless integration with R, enabling direct use of R's data analysis and statistical functions. Can be used with multiple languages including Python, R, and JavaScript, providing flexibility in development.
Customization Extensive customization of user interfaces with various input and output widgets, allowing for complex interactions and layouts. Highly customizable visualizations with extensive styling options and the ability to create complex, interactive charts.
Ease of Use User-friendly for those familiar with R, with a simple syntax for creating interactive elements and linking them to data. Easy to create interactive visualizations with minimal code, especially for users familiar with Plotly’s straightforward syntax.
Real-time Interactivity Supports real-time data updates and dynamic interactions through reactive programming, making it ideal for live dashboards. Provides real-time updates and interactivity, especially useful in web-based visualizations and data exploration tasks.
Complexity Handling Capable of handling complex server-side computations and large data processing tasks through its reactive framework. Efficiently handles complex visualizations and large datasets with smooth rendering and interaction capabilities.
Deployment Easy deployment options through Shiny Server, Shinyapps.io, or Docker, enabling quick sharing and scaling of applications. Flexible deployment on various platforms including web browsers, Jupyter notebooks, and standalone HTML files.
Community and Support Strong community support with extensive documentation, tutorials, and a large number of contributed packages. Wide community support with thorough documentation, examples, and active development in multiple programming languages.

Contact/Resources

Who am I?

I am a second-year PhD student in the Quantitative Methods Department at York University. Although my research primarily revolves around adapting machine learning methodologies to Psychology, I take immense pleasure in improving statistical literacy for everyone involved.

Feel free to reach out to me at arjun10@yorku.ca.

My Website

Resources

Here are some resources that I found particularly useful while creating this document:

  1. Interactive Web-Based Data Visualization with R, plotly, and shiny - Carson Sievert
  2. Plotly Documentation
  3. Plotly for R
  4. R Graph Gallery