Handling Repeated Decision Ref Nodes in XML to CSV Conversion for Improved Accuracy
The issue you’re facing most likely comes from multiple `eahv-iv-2469-000101:decisionRef0` nodes being processed and appended to a single row in your data frame. This can be resolved by identifying and handling each repeated `decisionRef0` node separately instead of letting the occurrences collide in one row.
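To see where the collision comes from, here is a minimal sketch on a made-up document. Only `person` and `decisionRef0` come from your file; the `name` element and all values are hypothetical. Flattening one `person` into a named vector, as your loop does, produces the name `decisionRef0` twice in the same row:

```r
library(xml2)

# Hypothetical input. The namespace is declared as the default here for
# brevity; XPath matches it by URI through nmsp, so the prefix used in the
# real file does not matter.
doc <- read_xml('<root xmlns="http://www.eahv-iv.ch/xmlns/eahv-iv-2469-000101/2">
  <person>
    <name>A</name>
    <decisionRef0>101</decisionRef0>
    <decisionRef0>102</decisionRef0>
  </person>
</root>')
nmsp <- c(doc = "http://www.eahv-iv.ch/xmlns/eahv-iv-2469-000101/2")

# Flatten the first person into one named vector of leaf values
p <- xml_find_first(doc, "//doc:person", ns = nmsp)
leaves <- xml_find_all(p, ".//*[not(*)]")
row <- setNames(xml_text(leaves), xml_name(leaves))

# "decisionRef0" now appears twice among the names of a single row
print(names(row))
```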
Here’s an updated version of your code snippet. It builds one row per `person` node and keeps repeated leaf nodes such as `eahv-iv-2469-000101:decisionRef0` in separate columns:
```r
##################################################################################################
# Conversion from XML to CSV.
##################################################################################################
library(xml2)
library(dplyr)  # for bind_rows()

doc <- read_xml("path/my_file")
# Determine the namespace
nmsp <- c(doc = "http://www.eahv-iv.ch/xmlns/eahv-iv-2469-000101/2")
# Read all relevant nodes
person <- xml_find_all(doc, "//doc:person", ns = nmsp)
###############
# First, read all variables (children, grandchildren, ...) of each "person" node.
###############
dataframes <- lapply(person, function(p) {
  # Every leaf element below this person; ".//" keeps the search inside p
  ch_recs <- xml_find_all(p, ".//*[not(*)]")
  if (length(ch_recs) == 0) return(NULL)
  # One row per person; data.frame() renames repeated names such as
  # decisionRef0 to decisionRef0, decisionRef0.1, ...
  data.frame(rbind(setNames(xml_text(ch_recs), xml_name(ch_recs))))
})
# Combine all rows, only if there is an entry; missing columns become NA
df_person <- bind_rows(Filter(Negate(is.null), dataframes))
```

The changes made include:
- Renaming `dfs_person` to `dataframes` for clarity.
- Loading `xml2` (and `dplyr` for `bind_rows()`) and quoting the file path passed to `read_xml()`.
- Searching relative to each `person` node with `.//*[not(*)]`, which returns every leaf element at any depth in a single expression. The previous `//eahv-iv-2469-000101:person[p]` pattern selects every `person` that is the p-th `person` child of its own parent, so it only matches the p-th person of the document when all `person` nodes are siblings; its deepest branch also lacked the leaf test.
- Dropping empty results with `Filter(Negate(is.null), dataframes)`. The previous `dataframes[!is.null(dataframes)]` tests the list as a whole, yields a single `TRUE`, and so never removed anything.
- Combining the per-person data frames with `bind_rows()` instead of `do.call(rbind, dataframes)`. `rbind()` stops with an error when the data frames have different columns, for example when one person has two `decisionRef0` nodes and another has one; `bind_rows()` fills the missing cells with `NA`. Repeated names within one person are kept apart as `decisionRef0`, `decisionRef0.1`, and so on.
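The combining line `dataframes[!is.null(dataframes)]` and the positional XPath `person[p]` are easy to check in isolation. This sketch uses small made-up inputs (the list `x`, the data frames `df1`/`df2` and the `<g>` wrapper elements are hypothetical). It shows that `!is.null()` on a whole list filters nothing, that `do.call(rbind, ...)` fails on data frames with different columns while `dplyr::bind_rows()` fills the gaps with `NA`, and how `//person[1]` behaves when `person` nodes sit under different parents:

```r
library(xml2)
library(dplyr)

# 1. is.null() inspects the list itself, not its elements
x <- list(data.frame(a = "1"), NULL, data.frame(a = "2"))
kept_by_is_null <- length(x[!is.null(x)])            # single TRUE, keeps all 3
kept_by_filter <- length(Filter(Negate(is.null), x)) # drops the NULL entry

# 2. rbind() needs matching columns; bind_rows() fills missing ones with NA
df1 <- data.frame(name = "A", decisionRef0 = "101", decisionRef0.1 = "102")
df2 <- data.frame(name = "B", decisionRef0 = "201")
rbind_result <- tryCatch(do.call(rbind, list(df1, df2)),
                         error = function(e) NULL)   # NULL means rbind() failed
combined <- bind_rows(df1, df2)

# 3. //person[1] means "first person child of its parent", so it can match
#    several nodes; (//person)[1] is the first person in the whole document
doc <- read_xml("<root><g><person>1</person></g><g><person>2</person></g></root>")
n_pos <- length(xml_find_all(doc, "//person[1]"))
n_first <- length(xml_find_all(doc, "(//person)[1]"))
```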
This updated version of your code snippet should give a more accurate and complete output, with each `person` node flattened into its own row even when it contains several `eahv-iv-2469-000101:decisionRef0` nodes.
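If you would rather have one row per `decisionRef0` node than extra `decisionRef0.N` columns, another layout is to iterate over the `decisionRef0` nodes inside each `person`. The sketch below assumes that every `decisionRef0` carries its value as text and uses `name` as a stand-in for a single-valued person field; `name`, `person_index` and the sample values are hypothetical, so adapt them to your schema:

```r
library(xml2)

# Hypothetical input; the namespace URI matches the one in the question
doc <- read_xml('<root xmlns="http://www.eahv-iv.ch/xmlns/eahv-iv-2469-000101/2">
  <person>
    <name>A</name>
    <decisionRef0>101</decisionRef0>
    <decisionRef0>102</decisionRef0>
  </person>
  <person>
    <name>B</name>
    <decisionRef0>201</decisionRef0>
  </person>
</root>')
nmsp <- c(doc = "http://www.eahv-iv.ch/xmlns/eahv-iv-2469-000101/2")
person <- xml_find_all(doc, "//doc:person", ns = nmsp)

rows <- lapply(seq_along(person), function(i) {
  p <- person[[i]]
  # Each decisionRef0 below this person becomes its own row
  refs <- xml_find_all(p, ".//doc:decisionRef0", ns = nmsp)
  if (length(refs) == 0) return(NULL)
  data.frame(
    person_index = i,  # hypothetical key: position of the person in the file
    name = xml_text(xml_find_first(p, "./doc:name", ns = nmsp)),  # repeated per row
    decisionRef0 = xml_text(refs)
  )
})
# All pieces share the same columns, so base rbind() is enough here
df_decisions <- do.call(rbind, Filter(Negate(is.null), rows))

# tempfile() stands in for your real output path
out <- tempfile(fileext = ".csv")
write.csv(df_decisions, out, row.names = FALSE)
```

This long layout keeps every `decisionRef0` value on its own line, which tends to be easier to filter or join than a varying number of `decisionRef0.N` columns.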
Last modified on 2024-01-10