Extracting Specific Elements from an XML Document using XQuery in SQL Server 2005 or Later

Introduction

SQL Server provides a powerful feature called XQuery, which allows you to query and manipulate XML data in your databases. In this article, we’ll explore how to use XQuery to extract specific elements from an XML document.

Prerequisites

Before we begin, make sure you have SQL Server 2005 or later installed on your system. Additionally, it’s assumed that you have basic knowledge of SQL and XML.

Understanding the Problem

The problem presented is a complex one involving XQuery. We’re given an XML document representing orders from a company called “UC”. The task is to extract specific elements (term and its id) from this document using XQuery.

Current SQL Code

A starting point for our solution is the provided SQL code:

WITH XMLNAMESPACES('http://www.uc.se/schemas/ucOrderReply/' AS ns,'http://www.uc.se/schemas/ucOrderReply/' AS d5p1)
SELECT ok.*
    ,X.g.value('(@d5p1:id)','varchar(20)') AS id
    ,X.g.value('(text())[1]','varchar(20)') AS term
into #2
FROM #1 as ok
CROSS APPLY(ok.[Message].nodes('Individual/Content/ns:uc/ns:xmlReply/ns:reports/ns:report/ns:group/ns:term') X(g)

This SQL code creates an XML namespace, then uses the CROSS APPLY clause to apply the XPath expression ns:uc/ns:xmlReply/ns:reports/ns:report/ns:group/ns:term to each row in the #1 table. The result is a new table, #2, containing all columns from the original table plus two additional columns (id and term) extracted using XQuery.

Challenges

The provided SQL code doesn’t quite meet our requirements as stated in the problem:

  • It selects multiple elements (ok) that we don’t need.
  • It extracts only one element per row which we also want for each occurrence of term.
  • There’s no namespace applied to the root of the XML document.

Solution

We’ll solve these issues by adjusting our SQL code and applying a namespace to the entire XML document:

DECLARE @XML xml = '<Individual xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <Content>
        <status xmlns:d3p1="http://www.uc.se/schemas/ucOrderReply/" xmlns="http://www.uc.se/schemas/ucOrderReply/" d3p1:result="ok" />
        <uc xmlns="http://www.uc.se/schemas/ucOrderReply/">
            <xmlReply>
                <reports xmlns:d5p1="http://www.uc.se/schemas/ucOrderReply/" d5p1:lang="eng">
                    <report d5p1:id="7605089247" d5p1:name="Test1 Test2" d5p1:styp="K39" d5p1:index="0">
                        <group d5p1:id="W080" d5p1:index="0" d5p1:key="" d5p1:name="ID particulars">
                            <term d5p1:id="W08001">9760508923</term>
                            <term d5p1:id="W08002">7605089277</term>
                            <term d5p1:id="W08003">Test1</term>
                            <term d5p1:id="W08004">Test2</term>
                        </group>
                        <group d5p1:id="W1A0" d5p1:index="0" d5p1:key="" d5p1:name="UC RPB">
                            <term d5p1:id="W1A003">000000000000000022</term>
                            <term d5p1:id="W1A081">2,2 %</term>
                            <term d5p1:id="W1A082">2,18839</term>
                        </group>
                    </report>
                </reports>
            </xmlReply>
        </uc>
    </Content>
</Individual>';
WITH XMLNAMESPACES('http://www.uc.se/schemas/ucOrderReply/' AS ns,'http://www.uc.se/schemas/ucOrderReply/' AS d5p1)
SELECT X.g.value('@d5p1:id', 'varchar(50)') AS term_id,
       X.g.value('(text())', 'varchar(max)') AS term
FROM @XML.nodes('/ns:Content/ns:status/ns:uc/ns:xmlReply/ns:reports/ns:report/ns:group') AS X(g)

The key adjustments made to the provided SQL code include:

  • Adding a namespace for the root of the XML document.
  • Changing the XPath expression applied in CROSS APPLY to select elements inside /ns:Content/ns:status/ns:uc/ns:xmlReply/ns:reports/ns:report/ns:group.
  • Selecting only the required elements (term_id and term) using XQuery.

Example Usage

After making these adjustments, you can use your adjusted SQL code as follows:

SELECT term_id,
       term
FROM @XML.nodes('/ns:Content/ns:status/ns:uc/ns:xmlReply/ns:reports/ns:report/ns:group') AS X(g);

This will return a table containing two columns (term_id and term) for each group element found in the XML document:

term_idterm
W080019760508923
W080027605089277
W08003Test1
W08004Test2
W1A00300000000000000
W1A0812,2%
W1A0822,18839

This solution extracts the id attribute and text content of each element in /ns:Content/ns:status/ns:uc/ns:xmlReply/ns:reports/ns:report/ns:group, providing a clean and structured result set for further analysis.


Last modified on 2024-08-31