Understanding Slow Running U-SQL Jobs due to SqlFilterTransformer
As a data engineer, it’s frustrating to watch a U-SQL job run many times longer than expected for no obvious reason. In this article, we’ll look at one such case in Azure Data Lake Analytics (ADLA), the service that runs U-SQL: a job that slows down sharply and shows SqlFilterTransformer in its vertex view.
What is SqlFilterTransformer?
SqlFilterTransformer is not a feature you switch on. It is an operator name that shows up in the vertex view of a U-SQL job. Judging by its name and where it appears, it corresponds to the step that evaluates the row-level expressions of a SELECT statement, such as filters and computed columns. Seeing it on a slow stage tells you where the job is spending its time, but not by itself why.
The Problem: Slow Running U-SQL Job
We have a U-SQL job that extracts data from two .tsv files, selects some columns, performs simple transformations, and writes CSV/TSV files to Azure Data Lake Store (ADL). When we add further transformations inside the SELECT statements, the job takes far longer to run (10+ minutes versus about 1 minute). We suspect one specific SELECT statement that contains a string concatenation.
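The extraction step that produces @ExtCrime is not shown in this article. As a rough sketch only, it might look like the fragment below. The input path is hypothetical, and the sketch assumes that every column is read as a string and that the file starts with a header row. The column list simply mirrors the names used in the quick job.

```usql
// Hypothetical input path; replace with the real location of the .tsv file.
DECLARE @InCrime string = "/CrimeInput/crime.tsv";

// Assumption: all columns are strings and the first row is a header.
@ExtCrime =
    EXTRACT CrimeID string,
            [Month] string,
            ReportedBy string,
            FallsWithin string,
            Longitude string,
            Latitude string,
            Location string,
            LSOACode string,
            LSOAName string,
            CrimeType string,
            LastOutcome string,
            Context string
    FROM @InCrime
    USING Extractors.Tsv(skipFirstNRows:1);
```

The built-in extractor expects the declared columns to match the file, so the real schema may need different names or types.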
U-SQL Job Example
Let’s examine two versions of the U-SQL job:
Quick Job
@StgCrime =
    SELECT CrimeID,
           [Month],
           ReportedBy,
           FallsWithin,
           Longitude,
           Latitude,
           Location,
           LSOACode,
           LSOAName,
           CrimeType,
           LastOutcome,
           Context
    FROM @ExtCrime;

OUTPUT @StgCrime
TO "CrimeOutput/Crimes.csv"
USING Outputters.Csv(outputHeader:true);
Slow Job
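@StgCrime =
    SELECT CrimeID,
           // Build YearMonth from the year and month parts of [Month].
           String.Concat([Month].Substring(0, 4), [Month].Substring(5, 2)) AS YearMonth,
           ReportedBy AS ForceName,
           Longitude,
           Latitude,
           Location,
           LSOACode,
           CrimeType,
           LastOutcome
    FROM @ExtCrime;

// @OCrime is assumed to be a string variable holding the output path, declared earlier in the script.
OUTPUT @StgCrime
TO @OCrime
USING Outputters.Csv(outputHeader:true);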
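To see what the concatenation expression in the slow job produces, here is a small self-contained sketch that runs it over inline values. It assumes [Month] holds strings in a yyyy-MM form such as "2017-01"; the article does not state the format, and the output path is hypothetical.

```usql
// Inline sample rows standing in for the real [Month] column (assumed yyyy-MM format).
@Sample =
    SELECT *
    FROM (VALUES ("2017-01"), ("2016-12")) AS T([Month]);

// Same expression as in the slow job: characters 0-3 (year) followed by characters 5-6 (month).
@Result =
    SELECT [Month],
           String.Concat([Month].Substring(0, 4), [Month].Substring(5, 2)) AS YearMonth
    FROM @Sample;

// Hypothetical output path, used only to make the script complete.
OUTPUT @Result
TO "/CrimeOutput/YearMonthCheck.csv"
USING Outputters.Csv(outputHeader:true);
```

With input in that format, "2017-01" becomes "201701". Because these are ordinary C# string calls, a null value or one shorter than seven characters would make Substring throw, so it is worth checking the input before blaming the expression for the slowdown.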
Analyzing the Issue
In the slow job, SqlFilterTransformer shows up in the vertex view. That operator name alone does not explain the slowdown, so we need to investigate why the slow job takes so much longer than the quick job.
Understanding Vertex View
The vertex view is part of the Azure Data Lake Analytics job tooling. A U-SQL job is split into vertices, each of which is a unit of work that processes part of the data, and the vertex view shows which operations ran and how the data was processed.
When we compare the two jobs, sketched here as simplified JSON for illustration, we notice a significant difference:
Simple/Quick Job
{
  "nodes": [
    {
      "operation": "SELECT",
      "target": "@StgCrime"
    },
    {
      "operation": "OUTPUT",
      "type": "CSV",
      "output": "CrimeOutput/Crimes.csv"
    }
  ]
}
With Additional Transformation
{
  "nodes": [
    {
      "operation": "SELECT",
      "target": "@StgCrime"
    },
    {
      "operation": "String.Concat",
      "input": "[Month]",
      "output": "YearMonth"
    },
    {
      "operation": "SELECT",
      "target": "@OCrime"
    },
    {
      "operation": "OUTPUT",
      "type": "CSV",
      "output": "@OCrime"
    }
  ]
}
The Solution
The fix for the slow job is a script-level setting, not a change to the SELECT statement. Adding the following statement to the U-SQL script enables input file grouping:
SET @@FeaturePreviews = "InputFileGrouping:on";
This statement tells Azure Data Lake Analytics to group up to 200 files (or 1GB, whichever comes first) into a single vertex. Fewer vertices means less per-vertex startup and scheduling overhead, which can significantly improve performance when a job reads many small files.
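As a sketch of how this fits into a script, the fragment below puts the switch ahead of an EXTRACT that reads a whole folder through a file-set pattern. The folder path, the pattern, and the all-string schema are assumptions for illustration; the column list mirrors the quick job.

```usql
// Preview switch quoted in this article; placed at the top of the script here.
SET @@FeaturePreviews = "InputFileGrouping:on";

// Hypothetical file-set pattern: {*} matches every .tsv file in the folder.
DECLARE @InCrime string = "/CrimeInput/{*}.tsv";

// Assumption: all columns are strings and each file starts with a header row.
@ExtCrime =
    EXTRACT CrimeID string, [Month] string, ReportedBy string, FallsWithin string,
            Longitude string, Latitude string, Location string, LSOACode string,
            LSOAName string, CrimeType string, LastOutcome string, Context string
    FROM @InCrime
    USING Extractors.Tsv(skipFirstNRows:1);

// The SELECT and OUTPUT statements of the job would follow here.
```

As a back-of-the-envelope example using the limits quoted above: 1,000 input files of 1 MB each total about 1 GB, so the 200-file limit is the one that applies and the files would be packed into roughly 1,000 / 200 = 5 groups. If each file would otherwise get its own extract vertex, that is about 5 vertices instead of about 1,000.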
Conclusion
In this article, we’ve explored one possible reason for slow-running U-SQL jobs that show SqlFilterTransformer in their vertex view. By enabling input file grouping using SET @@FeaturePreviews = "InputFileGrouping:on"; we can improve the performance of our jobs. However, it’s essential to understand what the vertex view is telling you and how it applies to your specific use case.
Additional Tips
- Use SET @@FeaturePreviews = "InputFileGrouping:on"; to enable input file grouping for improved performance.
- Analyze your vertex view to see where SqlFilterTransformer appears and how much time those stages take.
- Experiment with different feature previews to find the best configuration for your specific use case.
By applying these tips and understanding what SqlFilterTransformer in the vertex view is telling you, you can optimize the performance of your Azure Data Lake Analytics U-SQL jobs and achieve better results.
Last modified on 2024-10-27