Creating a Customized Dotplot for EnrichGO Results with All Ontology Terms on the Same Plot
In this article, we will explore how to create a customized dotplot of enrichGO results using R and the ggplot2 package. The goal is to display all ontology terms on the same plot, arranged by category, with the top five terms for each category displayed in a specific order. We will use a separate data frame for the top five terms of each ontology to achieve this.
Introduction
enrichGO is a function in the Bioconductor package clusterProfiler that performs Gene Ontology (GO) over-representation analysis on a gene list, identifying enriched biological processes (BP), cellular components (CC), and molecular functions (MF). Its output is a table with one row per GO term, including the term description, gene ratio, p-value, adjusted p-value, and the number of input genes annotated to the term (Count). We can use these results to visualize the enrichment of specific ontology terms in our dataset.
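For readers who want to see where such a table comes from, the following is a minimal sketch of an enrichGO call. It assumes the Bioconductor packages clusterProfiler, org.Hs.eg.db, and DOSE are installed, and it uses the example gene list shipped with DOSE; the fold-change cutoff of 2 and the object names genes, ego, and go_results are illustrative choices, not part of the original workflow.

```r
# Sketch only: runs when the Bioconductor packages below are installed
have_pkgs <- all(vapply(
  c("clusterProfiler", "org.Hs.eg.db", "DOSE"),
  requireNamespace, logical(1), quietly = TRUE
))

if (have_pkgs) {
  # Example ranked gene list shipped with DOSE (names are Entrez IDs)
  data(geneList, package = "DOSE")
  genes <- names(geneList)[abs(geneList) > 2]

  # ont = "ALL" tests BP, CC, and MF together and adds an ONTOLOGY column
  ego <- clusterProfiler::enrichGO(
    gene = genes,
    OrgDb = org.Hs.eg.db::org.Hs.eg.db,
    keyType = "ENTREZID",
    ont = "ALL",
    pAdjustMethod = "BH",
    pvalueCutoff = 0.05
  )

  # Convert the result object to a plain data frame for plotting
  go_results <- as.data.frame(ego)
  print(head(go_results[, c("ONTOLOGY", "Description", "GeneRatio", "p.adjust", "Count")]))
}
```

The rest of this article uses simulated values instead, so none of these packages are required to follow along.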
Background
The enrichGO output is typically displayed as a dot plot or a bar plot. In a dot plot, the x-axis represents the gene ratio (the number of input genes annotated to a term divided by the number of input genes that carry any annotation) and the y-axis represents the GO term. The color of each point corresponds to the adjusted p-value, with lower values indicating more significant enrichment, and the point size usually corresponds to the gene count.
However, in this article, we will create a customized dotplot that displays all ontology terms on the same plot, arranged by category. This will allow us to visualize the top five terms for each ontology in a specific order and display their descriptions on the subplot.
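One practical detail about the gene ratio: real enrichGO output stores GeneRatio as a text fraction such as "12/345" rather than as a number, so it usually has to be converted before it can go on a continuous axis. A minimal base-R sketch follows; the helper name parse_ratio and the example values are made up for illustration.

```r
# Hypothetical helper: convert "k/n" fraction strings to numeric ratios
parse_ratio <- function(x) {
  parts <- strsplit(x, "/", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) / as.numeric(p[2]), numeric(1))
}

# Made-up values in the text format used for GeneRatio
gene_ratio_text <- c("12/345", "30/345", "5/345")
gene_ratio <- parse_ratio(gene_ratio_text)

print(round(gene_ratio, 3))
```

The simulated data used below already stores GeneRatio as a number, so this conversion is only needed for real results.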
Creating the Data Frame
First, let’s create a data frame that mimics enrichGO results with simulated values, and then split it into one data frame per ontology.
# Load necessary libraries
library(ggplot2)
# Fix the random seed so the simulated values are reproducible
set.seed(42)
# Create the top_enriched data frame (simulated values, not real enrichGO output)
top_enriched <- data.frame(
  # Each category matches the ontology of the term on the same row
  Category = c(
    "MF", "MF", "MF", "CC", "CC",
    "BP", "BP", "BP", "BP", "CC",
    "BP", "CC", "BP", "BP", "BP"
  ),
  Description = c(
    "DNA binding",
    "Kinase activity",
    "Transporter activity",
    "Cytoskeleton",
    "Plasma membrane",
    "Metabolic process",
    "Cellular nitrogen compound metabolic process",
    "Protein metabolic process",
    "Cellular component organization",
    "Nucleus",
    "Signal transduction",
    "Cytoplasm",
    "Ribosome biogenesis",
    "Mitochondrial organization",
    "Cell cycle"
  ),
  p.adjust = runif(15, 0, 0.05),
  Count = sample(5:20, 15, replace = TRUE),
  GeneRatio = runif(15, 0.5, 1)
)
# Create a separate data frame for each ontology
MF_terms <- top_enriched[top_enriched$Category == "MF", ]
CC_terms <- top_enriched[top_enriched$Category == "CC", ]
BP_terms <- top_enriched[top_enriched$Category == "BP", ]
# Print the number of terms in each data frame
print(paste("Number of MF terms:", nrow(MF_terms)))
print(paste("Number of CC terms:", nrow(CC_terms)))
print(paste("Number of BP terms:", nrow(BP_terms)))
Creating the Customized Dotplot
Now, let’s create the customized dotplot using ggplot2. We will use facet_grid to display all ontology terms on the same plot, arranged by category.
# Create the customized dotplot
ggplot(top_enriched, aes(x = GeneRatio, y = Description, color = p.adjust, size = Count)) +
  geom_point() +
  facet_grid(Category ~ .) +
  scale_color_gradient(low = "blue", high = "red") +
  ylab("GO Term") +
  theme(legend.position = "bottom")
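However, because facet_grid shares the y-axis across panels by default, this code lists the descriptions of all ontology terms in every panel. Passing scales = "free_y" (and optionally space = "free_y") to facet_grid restricts each panel to its own terms. Showing only the top five terms for each ontology is a separate step: the data has to be subset before plotting.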
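One way to keep each panel limited to its own terms is to free the y scale in facet_grid. The sketch below uses a small made-up data frame (demo) so that it runs on its own; the values are illustrative only.

```r
library(ggplot2)

# Small made-up data set: two terms per ontology
demo <- data.frame(
  Category = rep(c("BP", "CC", "MF"), each = 2),
  Description = c("Cell cycle", "Signal transduction",
                  "Nucleus", "Cytoplasm",
                  "DNA binding", "Kinase activity"),
  GeneRatio = c(0.10, 0.08, 0.12, 0.06, 0.09, 0.05),
  p.adjust = c(0.001, 0.020, 0.004, 0.030, 0.010, 0.045),
  Count = c(18, 12, 20, 9, 15, 7)
)

p_facet <- ggplot(demo, aes(x = GeneRatio, y = Description,
                            color = p.adjust, size = Count)) +
  geom_point() +
  # scales = "free_y" drops terms that do not belong to a panel;
  # space = "free_y" sizes each panel by its number of terms
  facet_grid(Category ~ ., scales = "free_y", space = "free_y") +
  scale_color_gradient(low = "blue", high = "red") +
  labs(y = "GO Term")

print(p_facet)
```

With space = "free_y", an ontology with fewer terms gets a shorter panel, which keeps the row spacing uniform across the whole figure.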
Creating a Separate Data Frame for Top Five Terms
Let’s derive a separate data frame with the top five terms (those with the lowest adjusted p-values) for each ontology from top_enriched, so that it keeps the columns the plot needs.
# Create a data frame with the top five terms per ontology, ranked by adjusted p-value
top_five_terms <- do.call(rbind, lapply(
  split(top_enriched, top_enriched$Category),
  function(df) head(df[order(df$p.adjust), ], 5)
))
Creating the Customized Dotplot with Top Five Terms
Now, let’s create the customized dotplot using ggplot2 and the separate data frame for top five terms. Because scale_color_gradient needs a continuous variable, color is mapped to p.adjust rather than to Category.
# Create the customized dotplot with top five terms
ggplot(top_five_terms, aes(x = GeneRatio, y = Description, color = p.adjust)) +
  geom_point() +
  scale_color_gradient(low = "blue", high = "red") +
  labs(y = "GO Term")
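To arrange the terms by category on a single panel, one option is to fix the order of the Description factor levels, since ggplot2 draws a discrete axis in level order (first level at the bottom). The sketch below is self-contained and uses made-up values; the data frame name demo and the choice of sort keys are assumptions for illustration.

```r
library(ggplot2)

# Made-up terms in no particular order
demo <- data.frame(
  Category = c("MF", "BP", "CC", "BP", "MF", "CC"),
  Description = c("DNA binding", "Cell cycle", "Nucleus",
                  "Signal transduction", "Kinase activity", "Cytoplasm"),
  GeneRatio = c(0.09, 0.10, 0.12, 0.08, 0.05, 0.06)
)

# Sort by category, then by gene ratio within each category
demo <- demo[order(demo$Category, demo$GeneRatio), ]

# Freeze that row order as the factor level order used by the y-axis
demo$Description <- factor(demo$Description, levels = demo$Description)

p_ordered <- ggplot(demo, aes(x = GeneRatio, y = Description, color = Category)) +
  geom_point(size = 3) +
  labs(y = "GO Term")

print(p_ordered)
```

Here color is mapped to Category, a discrete variable, so the default discrete color scale is used instead of a gradient.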
Displaying Descriptions on the Subplot
To display the descriptions of the top five terms for each ontology next to their points, mapping Description to the y-axis is not enough, because that only produces axis tick labels. ggplot2 has no labels argument for individual points; text next to a point comes from the label aesthetic of a text layer such as geom_text(), which the following plot does not yet have.
# Create the customized dotplot (descriptions appear on the y-axis only)
ggplot(top_five_terms, aes(x = GeneRatio, y = Description, color = p.adjust)) +
  geom_point() +
  scale_color_gradient(low = "blue", high = "red") +
  labs(y = "GO Term") +
  theme(legend.position = "bottom")
This code therefore shows the descriptions only as y-axis tick labels, not next to the points. To achieve that, we need a text layer.
Using a Separate Data Frame for Descriptions
Let’s create a separate data frame with the descriptions for each point.
# Create a separate data frame for descriptions
descriptions <- rbind(
  data.frame(Category = c("MF", "CC"), Description = c("DNA binding", "Cytoskeleton")),
  data.frame(Category = c("BP", "MF"), Description = c("Metabolic process", "Kinase activity"))
)
Creating the Customized Dotplot with Descriptions
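Now, let’s create the customized dotplot using ggplot2 and the separate data frame for descriptions. Because descriptions has no GeneRatio column, we first merge it with top_enriched to recover the values to plot.
# Look up the gene ratio for each selected description
described_terms <- merge(descriptions, top_enriched[, c("Description", "GeneRatio")], by = "Description")
# Create the customized dotplot with descriptions
ggplot(described_terms, aes(x = GeneRatio, y = Description)) +
  geom_point() +
  labs(y = "GO Term")
However, this code still draws only the points: the descriptions appear on the y-axis but not next to the points. To place each description beside its point, the positions and the text have to be drawn in the same plot.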
Using a Separate Data Frame for Points and Descriptions
Let’s create separate data frames for points and descriptions.
# Create separate data frames for points and descriptions
points <- rbind(
  data.frame(Category = c("MF", "CC"), GeneRatio = c(0.5, 0.7)),
  data.frame(Category = c("BP", "MF"), GeneRatio = c(0.3, 0.9))
)
descriptions <- rbind(
  data.frame(Category = c("MF", "CC"), Description = c("DNA binding", "Cytoskeleton")),
  data.frame(Category = c("BP", "MF"), Description = c("Metabolic process", "Kinase activity"))
)
Creating the Customized Dotplot with Points and Descriptions
Now, let’s create the customized dotplot using ggplot2 and the separate data frames for points and descriptions.
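# Combine the two row-aligned data frames so points and labels share one plot
labelled_points <- cbind(points, Description = descriptions$Description)
# Create the customized dotplot with points and descriptions
ggplot(labelled_points, aes(x = GeneRatio, y = Description, color = Category)) +
  geom_point(size = 3) +
  geom_text(aes(label = Description), vjust = -1, show.legend = FALSE) +
  labs(y = "GO Term") +
  theme(legend.position = "bottom")
This code draws each point and places its description just above it. It relies on the rows of points and descriptions being in the same order.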
Conclusion
In this article, we created a customized dotplot that displays all ontology terms on the same plot, arranged by category. We used separate data frames for the top five terms and for the descriptions to achieve this. The dotplot shows the gene ratio on the x-axis and the GO term on the y-axis, with point color representing the adjusted p-value. The customized dotplot allows us to visualize the enrichment of specific ontology terms in our dataset and display their descriptions on the subplot.
Note: This code is only an example and uses simulated values; you may need to adjust it to suit your specific needs.
Last modified on 2025-02-18