Understanding the Problem and Breaking it Down
The problem involves categorizing customers based on their movie rental frequency. We have three categories: Regulars, Weekenders, and Hoi Polloi (a catch-all for those who don’t fit into the other two). To determine these categories, we need to analyze the customer’s rental history.
Table Structure Overview
We are given three tables: Customer
, Movie
, and Rental
. The Rental
table contains information about each rental, including the customer ID, movie ID, rental date, payment date, and amount. We also have a primary key (Rental_id
) for each row in this table.
Current SQL Query
The provided SQL query attempts to categorize customers by looking up their IDs in each week. However, there are some issues with the current query:
select
r.customer_id
, concat(c.first_name, ' ', c.last_name) as Customer_Name
, dayname(r.rental_date) as day_of_rental
, case
when dayname(r.rental_date) in ('Monday','Tuesday','Wednesday','Thursday','Friday')
then 'Regulars'
else 'Weekenders'
end as Customer_Category
from rental r
inner join customer c on r.customer_id = c.customer_id;
Issues with the Current Query
There are a few issues with this query:
- The
dayname
function returns only the first character of the day’s name (e.g., ‘Mon’ for Monday). This means we cannot accurately determine if a rental was on a weekend day without further analysis. - There is no aggregation or grouping to provide a clear picture of each customer’s rental history.
Solution Overview
To solve this problem, we will break it down into several steps:
- Calculate the total number of rentals for each customer.
- Determine the total number of weeks with a rental and overall (including weekends).
- Count the number of weekend days in each week.
- Categorize customers based on their rental frequency.
Step 1: Calculating Total Rentals per Customer
We need to calculate the total number of rentals for each customer using an aggregation function like COUNT(*)
. We will also use the YEARWEEK
function to determine which weeks a customer has rented a movie. Here’s how we can do it:
select
r.customer_id,
count(*) as num_rentals,
count(distinct yearweek(rental_date)) as num_weeks,
(to_days(max(rental_date)) - to_days(min(rental_date))) / 7 as num_weeks_overall,
sum(dayname(r.rental_date) in ('Saturday', 'Sunday')) as weekend_rentals
from rental r
group by r.customer_id;
This query will give us the total number of rentals, weeks with a rental, overall weeks (including weekends), and weekend rentals for each customer.
Step 2: Categorizing Customers
To categorize customers, we can use a CASE
statement inside an aggregated function. The idea is to look at the threshold values and assign the customer category accordingly:
select
r.customer_id,
(case when num_weeks > 10 and
num_weeks >= num_weeks_overall * 0.9
then 'Regular'
when weekend_rentals >= 0.8 * num_rentals
then 'Weekender'
else 'Hoi Polloi'
end) as category
from (select r.customer_id,
count(*) as num_rentals,
count(distinct yearweek(rental_date)) as num_weeks,
(to_days(max(rental_date)) - to_days(min(rental_date))) / 7 as num_weeks_overall,
sum(dayname(r.rental_date) in ('Saturday', 'Sunday')) as weekend_rentals
from rental r
group by r.customer_id) r;
This query will assign the customer category based on their rental frequency.
Step 3: Executing the Query
We can now execute the query and get the final result:
select
r.customer_id,
(case when num_weeks > 10 and
num_weeks >= num_weeks_overall * 0.9
then 'Regular'
when weekend_rentals >= 0.8 * num_rentals
then 'Weekender'
else 'Hoi Polloi'
end) as category
from (select r.customer_id,
count(*) as num_rentals,
count(distinct yearweek(rental_date)) as num_weeks,
(to_days(max(rental_date)) - to_days(min(rental_date))) / 7 as num_weeks_overall,
sum(dayname(r.rental_date) in ('Saturday', 'Sunday')) as weekend_rentals
from rental r
group by r.customer_id) r;
This will give us the final result, categorizing customers based on their movie rental frequency.
Conclusion
In this solution, we have broken down the problem of categorizing customers based on their movie rental frequency. We have used SQL queries to calculate the total number of rentals per customer and determine their rental frequency. The CASE
statement is used inside an aggregated function to assign a category based on the threshold values.
Note: This query will return only one row for each customer, since we are grouping by customer_id
.
Last modified on 2024-12-12