Calculating Average and Median on Monthly Data with R's dplyr Library and Converting to HTML Table Format

In this article, we will explore how to calculate the average and median of monthly data using the R programming language, and how to convert the output into an HTML table.

Introduction

R is a popular programming language used for statistical computing, data visualization, and data analysis. The dplyr library provides a grammar of data manipulation, which makes it easy to perform various data transformations and calculations.

We’ll use dplyr to group the data by month and compute the summaries, and the tableHTML package to render the result as an HTML table.

The Problem

The scenario is a dataset with several columns, including Date1 (a timestamp), T1 (a type, either “A” or “B”), and Val1 (a numeric value). We want to calculate the average and median of Val1 for each month. We also want separate columns per type, such as Median of A and Median of B, which hold the median of Val1 for rows where T1 = “A” and T1 = “B”, respectively.
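As a minimal sketch of the grouping idea, the snippet below derives a month key from timestamp strings and summarises the values per month. The three-row sample and the names sample_rows and monthly_summary are assumptions made for illustration, not part of the original data.

```r
library(dplyr)

# Hypothetical three-row sample in the same shape as the article's data.
sample_rows <- data.frame(
  Date1 = c("2018-01-10 15:05:24", "2018-01-05 14:15:22", "2017-12-28 19:45:10"),
  T1    = c("A", "B", "A"),
  Val1  = c(10, 12, 18),
  stringsAsFactors = FALSE
)

monthly_summary <- sample_rows %>%
  # Keep only the "YYYY-MM-DD" part, then format it as a sortable "YYYY-MM" key.
  mutate(Month = format(as.Date(substr(Date1, 1, 10)), "%Y-%m")) %>%
  group_by(Month) %>%
  summarise(
    Average = mean(Val1),
    Median  = median(Val1)
  )

print(monthly_summary)
```

A "YYYY-MM" string sorts chronologically, so the months come out in calendar order without further work.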

We’ll use the following code as an example:

library(dplyr)
library(tableHTML)

# header = TRUE turns the first row into column names.
# The A-18 timestamp has an invalid seconds field (84), so only the date
# part of Date1 is parsed further down.
my_data <- read.table(header = TRUE, stringsAsFactors = FALSE, text =
                      "ID     Date1                     T1     Date2     Val1
                      A-1    '2018-01-10 15:05:24'       A    2018-01-15  10
                      A-2    '2018-01-05 14:15:22'       B    2018-01-14  12
                      A-3    '2018-01-04 13:20:21'       A    2018-01-13  15
                      A-4    '2018-01-01 18:35:45'       B    2018-01-12  22
                      A-5    '2017-12-28 19:45:10'       A    2018-01-11  18
                      A-6    '2017-12-10 08:03:29'       A    2018-01-10  21
                      A-7    '2017-12-06 20:55:55'       A    2018-01-09  28
                      A-8    '2018-01-10 10:02:12'       A    2018-01-15  10
                      A-9    '2018-01-05 17:15:14'       B    2018-01-14  12
                      A-10   '2018-01-04 18:35:58'       A    2018-01-13  15
                      A-11   '2018-01-01 21:09:25'       B    2018-01-12  22
                      A-12   '2017-12-28 02:12:22'       A    2018-01-11  18
                      A-13   '2017-12-10 03:45:44'       A    2018-01-10  21
                      A-14   '2017-12-06 07:15:25'       A    2018-01-09  28
                      A-18   '2017-10-07 08:02:84'       B    2017-11-05  20
                      A-21   '2017-10-01 06:04:04'       A    2017-10-20  15
                      A-51   '2017-09-20 08:07:07'       A    2017-09-19  12
                      A-52   '2017-09-21 08:05:04'       B    2017-09-18  16
                      A-53   '2017-09-22 08:03:06'       B    2017-09-19  14
                      A-54   '2017-09-23 08:01:08'       B    2017-09-18  13
                      A-55   '2017-09-24 07:59:10'       B    2017-09-17  12
                      A-56   '2017-09-25 07:57:12'       B    2017-09-16  11")

# Month-over-month growth in percent. The first month (and any month that
# follows a zero) has no valid base, so the infinite result is set to 100.
mom_growth <- function(x) {
  x <- as.numeric(x)
  previous <- lag(x, default = 0)
  growth <- round((x - previous) / previous * 100, 2)
  if_else(is.infinite(growth), 100, growth)
}

table_2 <- my_data %>%
  # Derive a sortable "YYYY-MM" key from the date part of Date1
  mutate(Month = format(as.Date(substr(Date1, 1, 10)), "%Y-%m"),
         Val1  = as.numeric(Val1)) %>%
  group_by(Month) %>%
  # Subset Val1 by type inside each month to get per-type summaries
  summarise(
    `# of A`       = sum(T1 == "A"),
    `sum of A`     = sum(Val1[T1 == "A"], na.rm = TRUE),
    `Average of A` = round(mean(Val1[T1 == "A"], na.rm = TRUE), 2),
    `Median of A`  = median(Val1[T1 == "A"], na.rm = TRUE),
    `# of B`       = sum(T1 == "B"),
    `sum of B`     = sum(Val1[T1 == "B"], na.rm = TRUE),
    `Average of B` = round(mean(Val1[T1 == "B"], na.rm = TRUE), 2),
    `Median of B`  = median(Val1[T1 == "B"], na.rm = TRUE)
  ) %>%
  arrange(Month) %>%
  mutate(
    `MOM Growth # of A`   = mom_growth(`# of A`),
    `MOM Growth sum of A` = mom_growth(`sum of A`),
    `MOM Growth # of B`   = mom_growth(`# of B`),
    `MOM Growth sum of B` = mom_growth(`sum of B`)
  ) %>%
  # Put the six A columns before the six B columns so the second header lines up
  select(Month, ends_with("of A"), ends_with("of B")) %>%
  # Drop months with no rows for one of the types (here, 2017-12 has no B rows)
  filter(!is.na(`Median of A`) & !is.na(`Median of B`))

# 13 columns: Month, six for A, six for B
html_table <- table_2 %>%
  as.data.frame() %>%
  tableHTML(rownames = FALSE,
            widths = rep(100, 13),
            second_headers = list(c(1, 6, 6), c("", "Status of A", "Status of B")),
            caption = "A & B consolidated") %>%
  add_css_caption(css = list(c("font-weight", "border"), c("bold", "1px solid black")))

The code above groups the rows by month and summarises Val1 within each month. The Median of A and Median of B columns hold the median of Val1 for rows where T1 = “A” and T1 = “B”, respectively.
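One way to get a per-type median inside a monthly group is to subset the value vector by type before aggregating. The sketch below assumes a hypothetical one-month sample (one_month) and is only meant to isolate that step.

```r
library(dplyr)

# Hypothetical one-month sample: two rows of type A and three of type B.
one_month <- data.frame(
  Month = "2018-01",
  T1    = c("A", "A", "B", "B", "B"),
  Val1  = c(10, 15, 12, 22, 14),
  stringsAsFactors = FALSE
)

per_type <- one_month %>%
  group_by(Month) %>%
  summarise(
    # Val1[T1 == "A"] keeps only the values of type A within the current month
    `Median of A` = median(Val1[T1 == "A"]),
    `Median of B` = median(Val1[T1 == "B"])
  )

print(per_type)
```

If a month has no rows of a given type, the subset is empty and median() returns NA, which is worth handling before the table is displayed.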

The code then converts the summary into an HTML table using the tableHTML package. Besides the monthly summary values, the table includes the month-over-month (MOM) growth columns: MOM Growth # of A, MOM Growth sum of A, MOM Growth # of B, and MOM Growth sum of B.
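Month-over-month growth compares each month with the one before it. The sketch below shows that arithmetic as a percentage change using dplyr's lag(). The monthly counts and the column names previous and growth_pct are hypothetical, and the first month is left as NA here rather than given a placeholder value.

```r
library(dplyr)

# Hypothetical monthly counts, oldest month first.
counts <- data.frame(
  Month = c("2017-09", "2017-10", "2017-11"),
  n     = c(4, 5, 3),
  stringsAsFactors = FALSE
)

growth <- counts %>%
  arrange(Month) %>%
  mutate(
    previous   = lag(n),                                   # value from the prior row
    growth_pct = round((n - previous) / previous * 100, 2) # percent change
  )

print(growth)
```

Note that lag() works on row order, not on calendar distance: if a month is missing from the data, the comparison silently spans the gap.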

Conclusion

In this article, we demonstrated how to calculate the average and median of monthly data using dplyr in R, and how to convert the output into an HTML table using the tableHTML package.

The code can serve as a starting point for similar monthly summaries: change the grouping key or the summary functions to fit your own data.


Last modified on 2024-09-22