Understanding the Shapiro-Wilk Test and its Application in Oracle PL/SQL: A Practical Guide to Analyzing Normality with DBMS_STAT_FUNCS

The Shapiro-Wilk test is a statistical method used to determine whether a set of data comes from a normal distribution. In this article, we will explore how to use the Shapiro-Wilk test in Oracle PL/SQL, specifically using the DBMS_STAT_FUNCS.normal_dist_fit procedure.

Introduction to the Shapiro-Wilk Test

The Shapiro-Wilk test is a test of normality based on the order statistics of the sample, that is, the data values sorted from smallest to largest. It was proposed by Samuel S. Shapiro and Martin B. Wilk in 1965 and has since become one of the most widely used normality tests.

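The test sorts the data from smallest to largest and computes a statistic W: the square of a weighted sum of the sorted values, divided by the sum of squared deviations from the sample mean. The weights are constants derived from the expected values of the order statistics of a standard normal sample of the same size. W lies between 0 and 1, and values well below 1 indicate departure from normality. W is then compared with critical values that depend on the sample size, or converted to a p-value, to decide whether the hypothesis that the data are normally distributed should be rejected.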
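In symbols, with x_(1) ≤ x_(2) ≤ … ≤ x_(n) the sorted sample and x̄ its mean, the standard definition of the statistic is shown below. Here m is the vector of expected values of the order statistics of a standard normal sample of size n and V is their covariance matrix. In practice the a_i are usually computed with published approximations rather than exactly; this article does not say which method Oracle uses.

```latex
W = \frac{\left( \sum_{i=1}^{n} a_i \, x_{(i)} \right)^{2}}
         {\sum_{i=1}^{n} \left( x_i - \bar{x} \right)^{2}},
\qquad
(a_1, \dots, a_n) = \frac{m^{\mathsf{T}} V^{-1}}
                         {\left( m^{\mathsf{T}} V^{-1} V^{-1} m \right)^{1/2}}
```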

Using DBMS_STAT_FUNCS.normal_dist_fit in Oracle PL/SQL

The DBMS_STAT_FUNCS.normal_dist_fit procedure tests how well the values in a column fit a normal distribution; with the test type 'SHAPIRO_WILKS' it runs the Shapiro-Wilk test. In the call used in this article it takes seven arguments, in this order:

  • the owner (schema) of the table or view that holds the data;
  • the name of that table or view;
  • the name of the numeric column to test;
  • the test type, here 'SHAPIRO_WILKS';
  • mn: a variable for the mean of the data;
  • sd: a variable for the standard deviation of the data;
  • sw: a variable for the result of the Shapiro-Wilk test.

The first four are IN parameters. The last three are variables passed to the procedure to receive values (see the discussion of the Oracle documentation below).

Example Usage of DBMS_STAT_FUNCS.normal_dist_fit

Because normal_dist_fit reads its data from a column of a named table or view, rather than from an arbitrary query, using it takes two steps:

  1. Create a view based on our query that produces a column of numbers.
  2. Use the normal_dist_fit procedure to calculate the Shapiro-Wilk test statistic for the view.

Creating a View

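To create the view, we need the CREATE VIEW privilege and the right to query HR.EMPLOYEES, either through SELECT on that table or through the SELECT ANY TABLE system privilege. The privilege on the table must be granted directly to the view's owner, not through a role. The following statement wraps the query in a view: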

create or replace view my_emp (total_comp)
as
    select salary * (1 + nvl(commission_pct, 0)) from hr.employees;
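Before running the test, a quick aggregate query can confirm that the view returns the expected rows and give reference values for the mean and standard deviation. This is a sketch using Oracle's standard COUNT, AVG, and STDDEV functions. STDDEV returns the sample standard deviation; this article does not say whether normal_dist_fit reports the sample or the population figure, so small differences from sd are possible.

```sql
-- Sanity check on the MY_EMP view before running the normality test.
-- STDDEV returns the sample standard deviation.
select count(*)           as n,
       avg(total_comp)    as mean_comp,
       stddev(total_comp) as sd_comp
from   my_emp;
```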

Invoking the Procedure

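Once we have created the view, we can invoke the normal_dist_fit procedure to calculate the Shapiro-Wilk test statistic. MATHGUY is the schema that owns the view in this example; replace it with your own schema name. Run the block with server output enabled, because the procedure prints the W value through DBMS_OUTPUT: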

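set serveroutput on

declare
    mn number;  -- mean of the data
    sd number;  -- standard deviation of the data
    sw number;  -- result of the Shapiro-Wilk test
begin
    dbms_stat_funcs.normal_dist_fit('MATHGUY', 'MY_EMP', 'TOTAL_COMP',
                                    'SHAPIRO_WILKS', mn, sd, sw);
end;
/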

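Output:

W value : .8852586932906502861798487994791857389177

Values of W close to 1 are consistent with normality, and smaller values indicate departure from it. Whether .885 is low enough to reject normality depends on the sample size and on the significance level chosen.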

Discussion of Oracle Documentation

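The Oracle documentation for the DBMS_STAT_FUNCS package is incorrect about this procedure. The first four arguments (the owner, the table or view name, the column name, and the test type) are IN parameters, as expected. The documentation, however, does not describe the mean and standard deviation arguments the way the procedure actually uses them.

In the call above, mn and sd are not supplied as known inputs: they are passed as uninitialized variables, and the procedure returns the mean and standard deviation in them, just as it returns the test result in sw.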
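Because the documentation and the procedure disagree, a simple way to see what comes back in each variable is to print all three after the call. The following is a sketch that reuses the example's schema, view, and column names (MATHGUY, MY_EMP, TOTAL_COMP); a variable the procedure leaves NULL prints as an empty value, which makes it easy to check the behavior on your own release.

```sql
set serveroutput on

declare
    mn number;  -- expected to receive the mean
    sd number;  -- expected to receive the standard deviation
    sw number;  -- receives the test result
begin
    dbms_stat_funcs.normal_dist_fit('MATHGUY', 'MY_EMP', 'TOTAL_COMP',
                                    'SHAPIRO_WILKS', mn, sd, sw);
    -- Print whatever came back; a NULL shows up as an empty value.
    dbms_output.put_line('mean   : ' || mn);
    dbms_output.put_line('stddev : ' || sd);
    dbms_output.put_line('result : ' || sw);
end;
/
```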

Privileges for Creating a View

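When creating a view in Oracle, the view's owner needs the CREATE VIEW privilege and the right to query every table the view reads from, here HR.EMPLOYEES (through SELECT on that table or the SELECT ANY TABLE system privilege). The privilege on the underlying table must be granted directly to the owner, not through a role.

This is because Oracle does not use privileges received through roles when it checks the owner's access for a named object such as a view. A query that runs fine in an interactive session, where roles are enabled, can therefore fail with ORA-01031 (insufficient privileges) when the same query is used in CREATE VIEW.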
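A minimal sketch of the grants for the example schema MATHGUY, run as a user allowed to grant them (for example a DBA; HR can also grant the SELECT privilege on its own table). Both privileges go directly to the user rather than to a role:

```sql
-- Object privilege on the base table, granted directly to the view owner.
grant select on hr.employees to mathguy;

-- System privilege needed to create views in one's own schema.
grant create view to mathguy;
```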

Conclusion

The Shapiro-Wilk test is a statistical method used to determine whether a set of data comes from a normal distribution. In Oracle PL/SQL, we can use the DBMS_STAT_FUNCS.normal_dist_fit procedure to calculate this test statistic for a given dataset.

However, it’s essential to note that the Oracle documentation for this package is incorrect: the mean and standard deviation arguments should be treated as output variables that the procedure fills in, not as inputs.

Additionally, when creating views in Oracle, we must ensure that we have the necessary privileges, namely CREATE VIEW and SELECT on the underlying tables (or SELECT ANY TABLE), and that the privilege on the underlying tables is granted directly to the view's owner rather than through a role.


Last modified on 2024-03-16