Long Vector-Plot/Coverage Plot in R
In this article, we will explore how to create a long vector plot with coverage data for chromosomes. We’ll go through the process of creating the plot, modifying its appearance, and adding additional features such as genes that are of special interest.
Introduction to HilbertVis
HilbertVis is a package in R that allows you to create long vector plots. These plots are particularly useful for displaying large datasets with a varying number of data points along each axis. The plotLongVector()
function from the HilbertVis package provides an easy way to visualize these types of data.
Setting up the Environment
Before we begin, make sure you have R and the necessary packages installed on your system. You can install the required packages using the following command:
# Install required packages
install.packages("HilbertVis")
Once you’ve installed the package, load it into your R environment:
# Load HilbertVis package
library(HilbertVis)
Creating the Initial Plot
Let’s start by creating a long vector plot using the plotLongVector()
function. We’ll create two vectors, one for each chromosome.
# Create random data for chromosomes
chr1 <- abs(makeRandomTestData(len = 1.3e07))
chr2 <- abs(makeRandomTestData(len = 1e07))
# Set up the plot
par(mfcol = c(8, 1), mar = c(1, 1, 1, 1), ylog = TRUE)
# Create a function to plot coverage
plotCoverage <- function(chr, start, end) {
# Clear the plot
plot.new()
# Set up the plot window
plot.window(c(start, length(chr)), c(0, 10))
# Hide axis labels on the x-axis
axis(1, labels = FALSE)
# Add a y-axis with no title
axis(4)
# Plot coverage data
lines(start:end, log(chr[start:end]), type = "l")
}
# Plot coverage for chr1
plotCoverage(chr1, start = 1, end = length(chr1))
# Plot coverage for chr2
plotCoverage(chr2, start = 1, end = length(chr2))
This code creates a long vector plot with two chromosomes. However, there are several issues with the current plot:
- There is no y-axis label.
- The x-axis appears to be limited by the length of
chr1
instead of going up to its maximum value. - The vectors have different lengths.
Modifying the Plot
To address these issues, we need to make some modifications to our code. Let’s start with adding a y-axis label:
# Add y-axis label
axis(4, label = "Chromosome")
Next, let’s modify the plot window so that it goes up to the maximum value of chr1
instead of its length:
# Set plot limits based on data
plot.window(c(min(chr1), max(chr1)), c(0, 10))
Finally, let’s make sure all the vectors have the same length before plotting them. We can do this by padding the shorter vector with zeros:
# Ensure all vectors are of the same length
if (length(chr2) < length(chr1)) {
chr2 <- c(chr2, rep(0, length(chr1) - length(chr2)))
}
plotCoverage(chr1, start = 1, end = length(chr1))
plotCoverage(chr2, start = 1, end = length(chr2))
Adding Additional Features
Let’s create a new vector that represents genes of special interest. These vectors will have more zeroes than values compared to the chromosome vectors.
# Create random data for genes
genes_chr1 <- abs(makeRandomTestData(len = 1.3e07))
genes_chr2 <- abs(makeRandomTestData(len = 1e07))
if (length(genes_chr2) < length(chr1)) {
genes_chr2 <- c(genes_chr2, rep(0, length(chr1) - length(genes_chr2)))
}
Now that we have our gene vectors, let’s add a red dot under the long vector plot for each value greater than zero.
# Add dots for positive values in genes
plotCoverage(chr1, start = 1, end = length(chr1)
for (i in seq_along(genes_chr1)) {
if (genes_chr1[i] > 0) {
points(i + 1, genes_chr1[i], pch = 16, col = "red")
}
}
plotCoverage(chr2, start = 1, end = length(chr2))
for (i in seq_along(genes_chr2)) {
if (genes_chr2[i] > 0) {
points(i + 1, genes_chr2[i], pch = 16, col = "red")
}
}
Using ggplot2 for Alternative Plotting
Alternatively, you can use the ggplot2
package to create similar plots. Here’s an example of how to plot coverage data using geom_area
:
# Load required packages
require(ggplot2)
# Construct test data with 3 chromosomes and 100 positions
chr <- rep(paste0("chr", 1:3), each = 100)
pos <- rep(1:100, 3)
cov <- sample(0:500, 300)
df <- data.frame(chr, pos, cov)
# Create a plot
p <- ggplot(data = df, aes(x = pos, y = cov)) + geom_area(aes(fill = chr))
# Add facetting for different chromosomes
p + facet_wrap(~ chr, ncol = 1)
This code creates an area plot with geom_area
using the ggplot2
package. The fill color is based on the chromosome name.
Conclusion
In this article, we explored how to create long vector plots with coverage data for chromosomes in R. We went through the process of creating a new plot from scratch and modifying its appearance as needed. Additionally, we showed an alternative approach using ggplot2
.
Last modified on 2023-09-09