Understanding the Shapiro-Wilk Test and its Application in Oracle PL/SQL
The Shapiro-Wilk test is a statistical method used to determine whether a set of data comes from a normal distribution. In this article, we will explore how to use the Shapiro-Wilk test in Oracle PL/SQL, specifically using the DBMS_STAT_FUNCS.normal_dist_fit procedure.
Introduction to the Shapiro-Wilk Test
The Shapiro-Wilk test is a statistical test of the null hypothesis that a sample was drawn from a normal distribution. It was proposed by Samuel Shapiro and Martin Wilk in 1965 and has since become one of the most widely used normality tests.
The test sorts the data points from smallest to largest and computes a statistic, W, that measures how closely the ordered values match what would be expected from a normal sample. W lies between 0 and 1: values close to 1 are consistent with normality, while small values indicate a departure from it. The statistic is then compared to a critical value that depends on the sample size, which indicates whether normality should be rejected.
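For reference, the W statistic in its usual textbook form can be written as below. Here x_(i) is the i-th smallest observation, x-bar is the sample mean, and the a_i are constants derived from the expected order statistics of a standard normal sample of size n. This is the standard definition of the test, not a description of Oracle's internal implementation, which this article does not cover.

```latex
W = \frac{\left( \sum_{i=1}^{n} a_i \, x_{(i)} \right)^{2}}
         {\sum_{i=1}^{n} \left( x_i - \bar{x} \right)^{2}}
```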
Using DBMS_STAT_FUNCS.normal_dist_fit in Oracle PL/SQL
The DBMS_STAT_FUNCS.normal_dist_fit procedure tests how well the values in a column fit a normal distribution. Judging by the call used later in this article, it takes seven arguments:
- The first argument specifies the owner (schema) of the data source.
- The second argument specifies the data source, which can be a table or view.
- The third argument specifies the column of interest from the data source.
- The fourth argument specifies the type of test to run, here 'SHAPIRO_WILKS'.
- The last three arguments are variables that receive the results.
In the example, the three result variables are:
- mn: the mean of the data
- sd: the standard deviation of the data
- sw: the result of the Shapiro-Wilk test
Example Usage of DBMS_STAT_FUNCS.normal_dist_fit
To use the DBMS_STAT_FUNCS.normal_dist_fit procedure, we can follow these steps:
- Create a view based on our query that produces a column of numbers.
- Use the normal_dist_fit procedure to calculate the Shapiro-Wilk test statistic for the view.
Creating a View
To create a view, we need the CREATE VIEW privilege and the right to query the underlying table, either SELECT on hr.employees or the broader SELECT ANY TABLE. We can then use the following SQL command:
create or replace view my_emp (total_comp)
as
select salary * (1 + nvl(commission_pct, 0)) from hr.employees;
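Before running the test, it can help to confirm that the view returns what we expect. The query below is a minimal sketch, assuming the my_emp view above was created in the current schema. The count gives the sample size, and the aggregate mean and standard deviation give figures to compare against the values that, per this article, the procedure places in mn and sd.

```sql
-- Sanity check on the view defined above (assumes it lives in the current schema).
select count(total_comp)  as n,          -- non-null sample size
       avg(total_comp)    as mean_comp,  -- sample mean
       stddev(total_comp) as sd_comp     -- sample standard deviation
from   my_emp;
```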
Invoking the Procedure
Once we have created the view, we can invoke the normal_dist_fit procedure to calculate the Shapiro-Wilk test statistic:
declare
  mn number;
  sd number;
  sw number;
begin
  -- arguments: view owner, view name, column name, test type, then three result variables
  dbms_stat_funcs.normal_dist_fit('MATHGUY', 'MY_EMP', 'TOTAL_COMP',
                                  'SHAPIRO_WILKS', mn, sd, sw);
end;
/
With server output enabled, the call prints:
W value : .8852586932906502861798487994791857389177
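The block above does not display the contents of mn, sd, and sw. A variant that prints them might look like the sketch below. It assumes a client such as SQL*Plus or SQLcl (for `set serveroutput on`) and that MY_EMP is owned by the current schema, so the owner is looked up with sys_context instead of hard-coding 'MATHGUY'. Both are assumptions for illustration, not part of the original example.

```sql
set serveroutput on

declare
  mn number;
  sd number;
  sw number;
begin
  -- Assumption: the view MY_EMP is owned by the current schema,
  -- so the owner is looked up rather than hard-coded.
  dbms_stat_funcs.normal_dist_fit(
    sys_context('USERENV', 'CURRENT_SCHEMA'),
    'MY_EMP', 'TOTAL_COMP', 'SHAPIRO_WILKS', mn, sd, sw);

  -- Show what the procedure placed in the three result variables.
  dbms_output.put_line('mn = ' || mn);
  dbms_output.put_line('sd = ' || sd);
  dbms_output.put_line('sw = ' || sw);
end;
/
```

Printing all three values makes it easy to check mn and sd against an independent query on the view.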
Discussion of Oracle Documentation
The Oracle documentation for the DBMS_STAT_FUNCS package does not match the procedure's actual behavior. As the call above shows, normal_dist_fit takes four input arguments (the owner, the data source, the column of interest, and the test type) followed by three variables.
In particular, the mean and standard deviation are not input-only values. They must be passed as variables, because the procedure returns the computed values through them.
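Rather than relying on the documentation, we can ask the data dictionary how the procedure is actually declared. The query below is a sketch that uses the standard ALL_ARGUMENTS view and assumes the package is owned by SYS, as Oracle-supplied packages normally are, and that the current user can see it.

```sql
-- List the parameters of DBMS_STAT_FUNCS.NORMAL_DIST_FIT as the database sees them.
select position,
       argument_name,
       data_type,
       in_out          -- IN, OUT, or IN/OUT
from   all_arguments
where  owner        = 'SYS'
and    package_name = 'DBMS_STAT_FUNCS'
and    object_name  = 'NORMAL_DIST_FIT'
order  by position;
```

The IN_OUT column shows which arguments must be passed as variables rather than literals.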
Privileges for Creating a View
When creating a view in Oracle, we need the CREATE VIEW privilege and the right to select from the underlying table (for example SELECT on hr.employees, or SELECT ANY TABLE). However, the privileges on the underlying objects must be granted directly to the view owner, not through a role.
This is because Oracle ignores privileges received through roles when it checks whether a user may create a view on another schema's objects.
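As an illustration, direct grants for the example in this article could look like the sketch below. It assumes the statements are run by a suitably privileged user (for example a DBA) and that MATHGUY, the schema name used in the call above, is the user creating the view. Substitute your own user name.

```sql
-- Run as a suitably privileged user (for example a DBA).
-- MATHGUY is the schema used in this article's example; substitute your own.
grant create view to mathguy;
grant select on hr.employees to mathguy;   -- direct object grant, not via a role
```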
Conclusion
The Shapiro-Wilk test is a statistical method used to determine whether a set of data comes from a normal distribution. In Oracle PL/SQL, we can use the DBMS_STAT_FUNCS.normal_dist_fit procedure to calculate this test statistic for a given dataset.
However, it's essential to note that the Oracle documentation for this package is not reliable on this point: the mean and standard deviation arguments must be passed as variables, since the procedure returns values through them.
Additionally, when creating views in Oracle, we must ensure that we have the necessary privileges, namely CREATE VIEW and SELECT on the underlying table (or SELECT ANY TABLE), and that the privileges on the underlying table are granted to us directly rather than through a role.
Last modified on 2024-03-16