Categorizing Movie Renters Based on Frequency: A Step-by-Step SQL Solution

Understanding the Problem and Breaking it Down

The problem involves categorizing customers based on their movie rental frequency. We have three categories: Regulars, Weekenders, and Hoi Polloi (a catch-all for those who don’t fit into the other two). To determine these categories, we need to analyze the customer’s rental history.

Table Structure Overview

We are given three tables: Customer, Movie, and Rental. The Rental table contains information about each rental, including the customer ID, movie ID, rental date, payment date, and amount. We also have a primary key (Rental_id) for each row in this table.

Current SQL Query

The provided SQL query attempts to categorize customers by looking up their IDs in each week. However, there are some issues with the current query:

select 
   r.customer_id
 , concat(c.first_name, ' ', c.last_name) as Customer_Name
 , dayname(r.rental_date) as day_of_rental
 , case 
     when dayname(r.rental_date) in ('Monday','Tuesday','Wednesday','Thursday','Friday')
     then 'Regulars'
     else 'Weekenders'
   end as Customer_Category
from rental r
inner join customer c on r.customer_id = c.customer_id;

Issues with the Current Query

There are a few issues with this query:

  1. The dayname function returns only the first character of the day’s name (e.g., ‘Mon’ for Monday). This means we cannot accurately determine if a rental was on a weekend day without further analysis.
  2. There is no aggregation or grouping to provide a clear picture of each customer’s rental history.

Solution Overview

To solve this problem, we will break it down into several steps:

  1. Calculate the total number of rentals for each customer.
  2. Determine the total number of weeks with a rental and overall (including weekends).
  3. Count the number of weekend days in each week.
  4. Categorize customers based on their rental frequency.

Step 1: Calculating Total Rentals per Customer

We need to calculate the total number of rentals for each customer using an aggregation function like COUNT(*). We will also use the YEARWEEK function to determine which weeks a customer has rented a movie. Here’s how we can do it:

select 
   r.customer_id,
   count(*) as num_rentals,
   count(distinct yearweek(rental_date)) as num_weeks,
   (to_days(max(rental_date)) - to_days(min(rental_date))) / 7 as num_weeks_overall,
   sum(dayname(r.rental_date) in ('Saturday', 'Sunday')) as weekend_rentals
from rental r
group by r.customer_id;

This query will give us the total number of rentals, weeks with a rental, overall weeks (including weekends), and weekend rentals for each customer.

Step 2: Categorizing Customers

To categorize customers, we can use a CASE statement inside an aggregated function. The idea is to look at the threshold values and assign the customer category accordingly:

select 
   r.customer_id,
   (case when num_weeks > 10 and 
          num_weeks >= num_weeks_overall * 0.9
     then 'Regular'
     when weekend_rentals >= 0.8 * num_rentals
     then 'Weekender'
     else 'Hoi Polloi'
  end) as category
from (select r.customer_id,
             count(*) as num_rentals,
             count(distinct yearweek(rental_date)) as num_weeks,
             (to_days(max(rental_date)) - to_days(min(rental_date))) / 7 as num_weeks_overall,
             sum(dayname(r.rental_date) in ('Saturday', 'Sunday')) as weekend_rentals
      from rental r
      group by r.customer_id) r;

This query will assign the customer category based on their rental frequency.

Step 3: Executing the Query

We can now execute the query and get the final result:

select 
   r.customer_id,
   (case when num_weeks > 10 and 
          num_weeks >= num_weeks_overall * 0.9
     then 'Regular'
     when weekend_rentals >= 0.8 * num_rentals
     then 'Weekender'
     else 'Hoi Polloi'
  end) as category
from (select r.customer_id,
             count(*) as num_rentals,
             count(distinct yearweek(rental_date)) as num_weeks,
             (to_days(max(rental_date)) - to_days(min(rental_date))) / 7 as num_weeks_overall,
             sum(dayname(r.rental_date) in ('Saturday', 'Sunday')) as weekend_rentals
      from rental r
      group by r.customer_id) r;

This will give us the final result, categorizing customers based on their movie rental frequency.

Conclusion

In this solution, we have broken down the problem of categorizing customers based on their movie rental frequency. We have used SQL queries to calculate the total number of rentals per customer and determine their rental frequency. The CASE statement is used inside an aggregated function to assign a category based on the threshold values.

Note: This query will return only one row for each customer, since we are grouping by customer_id.


Last modified on 2024-12-12