Creating Long Vector Plots in R: A Comprehensive Guide to HilbertVis

Long Vector-Plot/Coverage Plot in R

In this article, we will explore how to create a long vector plot with coverage data for chromosomes. We’ll go through the process of creating the plot, modifying its appearance, and adding additional features such as genes that are of special interest.

Introduction to HilbertVis

HilbertVis is a package in R that allows you to create long vector plots. These plots are particularly useful for displaying large datasets with a varying number of data points along each axis. The plotLongVector() function from the HilbertVis package provides an easy way to visualize these types of data.

Setting up the Environment

Before we begin, make sure you have R and the necessary packages installed on your system. You can install the required packages using the following command:

# Install required packages
install.packages("HilbertVis")

Once you’ve installed the package, load it into your R environment:

# Load HilbertVis package
library(HilbertVis)

Creating the Initial Plot

Let’s start by creating a long vector plot using the plotLongVector() function. We’ll create two vectors, one for each chromosome.

# Create random data for chromosomes
chr1 <- abs(makeRandomTestData(len = 1.3e07))
chr2 <- abs(makeRandomTestData(len = 1e07))

# Set up the plot
par(mfcol = c(8, 1), mar = c(1, 1, 1, 1), ylog = TRUE)

# Create a function to plot coverage
plotCoverage <- function(chr, start, end) {
  # Clear the plot
  plot.new()
  
  # Set up the plot window
  plot.window(c(start, length(chr)), c(0, 10))
  
  # Hide axis labels on the x-axis
  axis(1, labels = FALSE)
  
  # Add a y-axis with no title
  axis(4)
  
  # Plot coverage data
  lines(start:end, log(chr[start:end]), type = "l")
}

# Plot coverage for chr1
plotCoverage(chr1, start = 1, end = length(chr1))

# Plot coverage for chr2
plotCoverage(chr2, start = 1, end = length(chr2))

This code creates a long vector plot with two chromosomes. However, there are several issues with the current plot:

  • There is no y-axis label.
  • The x-axis appears to be limited by the length of chr1 instead of going up to its maximum value.
  • The vectors have different lengths.

Modifying the Plot

To address these issues, we need to make some modifications to our code. Let’s start with adding a y-axis label:

# Add y-axis label
axis(4, label = "Chromosome")

Next, let’s modify the plot window so that it goes up to the maximum value of chr1 instead of its length:

# Set plot limits based on data
plot.window(c(min(chr1), max(chr1)), c(0, 10))

Finally, let’s make sure all the vectors have the same length before plotting them. We can do this by padding the shorter vector with zeros:

# Ensure all vectors are of the same length
if (length(chr2) < length(chr1)) {
  chr2 <- c(chr2, rep(0, length(chr1) - length(chr2)))
}

plotCoverage(chr1, start = 1, end = length(chr1))
plotCoverage(chr2, start = 1, end = length(chr2))

Adding Additional Features

Let’s create a new vector that represents genes of special interest. These vectors will have more zeroes than values compared to the chromosome vectors.

# Create random data for genes
genes_chr1 <- abs(makeRandomTestData(len = 1.3e07))
genes_chr2 <- abs(makeRandomTestData(len = 1e07))

if (length(genes_chr2) < length(chr1)) {
  genes_chr2 <- c(genes_chr2, rep(0, length(chr1) - length(genes_chr2)))
}

Now that we have our gene vectors, let’s add a red dot under the long vector plot for each value greater than zero.

# Add dots for positive values in genes
plotCoverage(chr1, start = 1, end = length(chr1)
for (i in seq_along(genes_chr1)) {
  if (genes_chr1[i] > 0) {
    points(i + 1, genes_chr1[i], pch = 16, col = "red")
  }
}

plotCoverage(chr2, start = 1, end = length(chr2))
for (i in seq_along(genes_chr2)) {
  if (genes_chr2[i] > 0) {
    points(i + 1, genes_chr2[i], pch = 16, col = "red")
  }
}

Using ggplot2 for Alternative Plotting

Alternatively, you can use the ggplot2 package to create similar plots. Here’s an example of how to plot coverage data using geom_area:

# Load required packages
require(ggplot2)

# Construct test data with 3 chromosomes and 100 positions
chr <- rep(paste0("chr", 1:3), each = 100)
pos <- rep(1:100, 3)
cov <- sample(0:500, 300)

df <- data.frame(chr, pos, cov)

# Create a plot
p <- ggplot(data = df, aes(x = pos, y = cov)) + geom_area(aes(fill = chr))

# Add facetting for different chromosomes
p + facet_wrap(~ chr, ncol = 1)

This code creates an area plot with geom_area using the ggplot2 package. The fill color is based on the chromosome name.

Conclusion

In this article, we explored how to create long vector plots with coverage data for chromosomes in R. We went through the process of creating a new plot from scratch and modifying its appearance as needed. Additionally, we showed an alternative approach using ggplot2.


Last modified on 2023-09-09