Handling Repeated Decision Ref Nodes in XML to CSV Conversion for Improved Accuracy

The issue appears to be that when a person contains several eahv-iv-2469-000101:decisionRef0 nodes, all of them are appended to that person's single row in the data frame. Because the column names come from the node names, a repeated name leads to extra or misaligned columns. This can be resolved by identifying and handling each decisionRef0 node separately.
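To illustrate the effect, here is a minimal base-R sketch using made-up values (the names id and decisionRef0 and the values "1", "A", "B" are illustrative assumptions, not taken from your file). It repeats the row-building step from the snippet for a person with two decisionRef0 leaves:

```r
# Hypothetical leaf names and values for one person with two decisionRef0 nodes
nms  <- c("id", "decisionRef0", "decisionRef0")
vals <- c("1", "A", "B")

# Same construction as in the conversion code: one row, one column per leaf
row <- data.frame(rbind(setNames(vals, nms)))

# data.frame() makes the repeated name unique by adding a suffix,
# so this person gets a column that other persons will not have
print(names(row))
```

A person with only one decisionRef0 would produce a row with fewer columns, and rbind() refuses to combine data frames whose columns do not match.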

Here’s an updated version of your code snippet, including some adjustments to handle the repeated occurrence of eahv-iv-2469-000101:decisionRef0 nodes:

##################################################################################################
# Conversion from XML to CSV.
##################################################################################################


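  library(xml2)

  # Placeholder path: replace with the location of your XML file
  doc <- read_xml("path/my_file")

  # Define the namespace (the prefix "doc" is used in the XPath below)
  nmsp <- c(doc = "http://www.eahv-iv.ch/xmlns/eahv-iv-2469-000101/2")

  # Read all relevant nodes
  person <- xml_find_all(doc, "//doc:person", ns = nmsp)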
  
  ###############
  # First, read all variables (children & grandchildren) in the "person" node.
  ###############
  
  dataframes <- lapply(seq_along(person), function(p) {
    # Select every leaf element (an element without child elements) at any
    # depth below the p-th person node. Searching relative to person[[p]]
    # avoids the namespace prefix and avoids //person[p], which counts
    # position among siblings rather than across the whole document.
    ch_recs <- xml2::xml_find_all(person[[p]], ".//*[not(*)]")

    # A person without any leaf values yields no row
    if (length(ch_recs) == 0) return(NULL)

    # One row per person: leaf names become column names, leaf text the values
    data.frame(rbind(setNames(
      xml2::xml_text(ch_recs),
      xml2::xml_name(ch_recs)
    )))
  })

  # Combine all rows, keeping only persons that produced an entry.
  # is.null() must be applied to each list element, not to the list itself.
  df_person <- do.call(rbind, Filter(Negate(is.null), dataframes))

The changes made include:

  • Renaming dfs_person to dataframes for clarity.
  • Using do.call(rbind, ...) to combine the one-row data frames built for each person node, after dropping NULL entries, so that only persons with at least one entry end up in the final combined data frame.

This updated version of your code snippet should provide a more accurate and complete output by handling the repeated occurrence of eahv-iv-2469-000101:decisionRef0 nodes.
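
If persons differ in how many decisionRef0 nodes they contain, their rows can still end up with different columns. One possible way to handle this, sketched below in base R, is to collapse repeated names into a single cell and to fill missing columns with NA before binding. The helper names collapse_leaves and bind_fill, the ";" separator, and the sample values are assumptions made for illustration; they are not part of your original code or of xml2.

```r
# Collapse repeated leaf names into one cell per name (hypothetical helper)
collapse_leaves <- function(nms, vals, sep = ";") {
  keys <- unique(nms)  # keep columns in first-seen order
  out <- vapply(
    keys,
    function(k) paste(vals[nms == k], collapse = sep),
    character(1)
  )
  as.data.frame(as.list(out), stringsAsFactors = FALSE)
}

# Bind one-row data frames whose columns may differ (hypothetical helper)
bind_fill <- function(rows) {
  rows <- Filter(Negate(is.null), rows)        # drop empty entries
  cols <- unique(unlist(lapply(rows, names)))  # union of all column names
  rows <- lapply(rows, function(r) {
    for (m in setdiff(cols, names(r))) r[[m]] <- NA_character_
    r[cols]                                    # same column order everywhere
  })
  do.call(rbind, rows)
}

# Made-up leaf names and values for three persons
rows <- list(
  collapse_leaves(c("id", "decisionRef0", "decisionRef0"), c("1", "A", "B")),
  collapse_leaves(c("id", "decisionRef0"), c("2", "C")),
  collapse_leaves(c("id", "name"), c("3", "X"))
)
df_person <- bind_fill(rows)
print(df_person)
```

Inside the lapply() call, the same idea would apply by passing xml2::xml_name(ch_recs) and xml2::xml_text(ch_recs) to collapse_leaves. If each decisionRef0 should instead become its own row, the leaves would need to be split per decisionRef0 node rather than collapsed.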


Last modified on 2024-01-10