Extracting Relevant Data from Text Files: A Python Solution for Handling Complex Data Formats

To extract the parts that start with Data-Information and, within each part, match every Points table together with all following lines that contain at least one character (so matching stops at the first empty line), you can use the following Python code:

import re

# Given text
text = """
Data-Information
User:           SUD
Count Segments:         5
Application:            RHEOSTAR
Tool:           CP
Date/Time:          24.10.2021; 13:37
System:         CP25

Constants:
- Csr [min/s]:          2,5421
- Css [Pa/mNm]:         2,54679

Section:            1
Number measuring points:            0

Time limit:         2 measuring points, drop
            Duration 30 s
Measurement profile:
  Temperature           T[-1] = 25 °C

Section:            2
Number measuring points:            30

Time limit:         30 measuring points
            Duration 2 s

Points  Time    Viscosity   Shear rate  Shear stress    Momentum    Status
    [s] [Pa·s]  [1/s]   [Pa]    [mNm]   []
1   62  10,93   100 1.090   4,45    TGC,Dy_
2   64  11,05   100 1.100   4,5 TGC,Dy_
3   66  11,07   100 1.110   4,51    TGC,Dy_
4   68  11,05   100 1.100   4,5 TGC,Dy_
5   70  10,99   100 1.100   4,47    TGC,Dy_
6   72  10,92   100 1.090   4,44    TGC,Dy_

Section:            3
Number measuring points:            0

Time limit:         2 measuring points, drop
            Duration 60 s

Section:            4
Number measuring points:            30

Time limit:         30 measuring points
            Duration 2 s

Points  Time    Viscosity   Shear rate  Shear stress    Momentum    Status
    [s] [Pa·s]  [1/s]   [Pa]    [mNm]   []
*** 1 ***   242 -6,334E+6   -0,0000115  72,7    0,296   TGC,Dy_
2   244 63,94   10,3    661 2,69    TGC,Dy_
3   246 35,56   20,7    736 2,99    TGC,Dy_
4   248 25,25   31  784 3,19    TGC,Dy_
5   250 19,82   41,4    820 3,34    TGC,Dy_

Section:            5
Number measuring points:            300

Time limit:         300 measuring points
            Duration 1 s

Points  Time    Viscosity   Shear rate  Shear stress    Momentum    Status
    [s] [Pa·s]  [1/s]   [Pa]    [mNm]   []
1   301 4,142   300 1.240   5,06    TGC,Dy_
2   302 4,139   300 1.240   5,05    TGC,Dy_
3   303 4,138   300 1.240   5,05    TGC,Dy_
4   304 4,141   300 1.240   5,06    TGC,Dy_
5   305 4,156   300 1.250   5,07    TGC,Dy_
6   306 4,153   300 1.250   5,07    TGC,Dy_
"""

# Split the text into Data-Information parts. Splitting on a zero-width
# lookahead (Python 3.7+) keeps the "Data-Information" header at the start of
# each part, and re.MULTILINE makes "^" and "$" match at every line boundary.
data_info_parts = re.split(r"^(?=Data-Information$)", text, flags=re.MULTILINE)
data_info_parts = [part for part in data_info_parts if part.strip()]

# In every part, match each Points table: the "Points" header line followed by
# all consecutive lines that contain at least one character. ".+" cannot match
# an empty line, so the first empty line ends the table.
points_pattern = re.compile(r"^Points\b.*(?:\n.+)+", re.MULTILINE)
for data_info_part in data_info_parts:
    for points_block in points_pattern.findall(data_info_part):
        print(points_block, end="\n\n")

When you run this code with the given text, it prints the three Points tables, each followed by an empty line:

Points  Time    Viscosity   Shear rate  Shear stress    Momentum    Status
    [s] [Pa·s]  [1/s]   [Pa]    [mNm]   []
1   62  10,93   100 1.090   4,45    TGC,Dy_
2   64  11,05   100 1.100   4,5 TGC,Dy_
3   66  11,07   100 1.110   4,51    TGC,Dy_
4   68  11,05   100 1.100   4,5 TGC,Dy_
5   70  10,99   100 1.100   4,47    TGC,Dy_
6   72  10,92   100 1.090   4,44    TGC,Dy_

Points  Time    Viscosity   Shear rate  Shear stress    Momentum    Status
    [s] [Pa·s]  [1/s]   [Pa]    [mNm]   []
*** 1 ***   242 -6,334E+6   -0,0000115  72,7    0,296   TGC,Dy_
2   244 63,94   10,3    661 2,69    TGC,Dy_
3   246 35,56   20,7    736 2,99    TGC,Dy_
4   248 25,25   31  784 3,19    TGC,Dy_
5   250 19,82   41,4    820 3,34    TGC,Dy_

Points  Time    Viscosity   Shear rate  Shear stress    Momentum    Status
    [s] [Pa·s]  [1/s]   [Pa]    [mNm]   []
1   301 4,142   300 1.240   5,06    TGC,Dy_
2   302 4,139   300 1.240   5,05    TGC,Dy_
3   303 4,138   300 1.240   5,05    TGC,Dy_
4   304 4,141   300 1.240   5,06    TGC,Dy_
5   305 4,156   300 1.250   5,07    TGC,Dy_
6   306 4,153   300 1.250   5,07    TGC,Dy_
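
The behaviour of the Points pattern can be checked on a shortened sample. This sketch shows that the match stops at the first empty line, because `.` never matches a newline and `.+` needs at least one character, and that without `re.MULTILINE` the `^` anchor only matches at the very start of the string, so nothing is found. The sample string is made up for illustration.

```python
import re

# Made-up, shortened sample: a Points table, an empty line, then more text.
sample = (
    "Section:            2\n"
    "Points  Time    Viscosity\n"
    "    [s] [Pa·s]\n"
    "1   62  10,93\n"
    "2   64  11,05\n"
    "\n"
    "Section:            3\n"
)

pattern = r"^Points\b.*(?:\n.+)+"

# Without re.MULTILINE, "^" only matches at position 0 (where "Section"
# starts), so re.search finds nothing and prints None.
print(re.search(pattern, sample))

# With re.MULTILINE, "^" matches after every newline. "^Points\b.*" takes the
# header line, and each "\n.+" adds one more line as long as it is non-empty.
match = re.search(pattern, sample, re.MULTILINE)
print(match.group())
```

A line that contains only spaces still counts as non-empty for `.+`. If such lines should also end a table, `(?:\n.*\S.*)+` is one way to require at least one non-whitespace character per line.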

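The extracted tables still hold text. If numeric values are needed, the sketch below converts one Points block into dictionaries. It assumes the export uses a comma as decimal mark and a dot as thousands separator, which fits the data (a viscosity of 10,93 Pa·s at a shear rate of 100 1/s gives a shear stress of roughly 1090 Pa, printed as `1.090`). The helper names `to_number` and `parse_points_block` are hypothetical, and because the export does not explain the `*** 1 ***` marker, the sketch only records it as a `flagged` field.

```python
import re

# Column names copied from the "Points" header line. The header is not split
# automatically because "Shear rate" and "Shear stress" contain spaces.
COLUMNS = ["Points", "Time", "Viscosity", "Shear rate",
           "Shear stress", "Momentum", "Status"]


def to_number(token):
    """Hypothetical helper: convert "1.090", "10,93" or "-6,334E+6" to float.

    Assumes "." is a thousands separator and "," is the decimal mark.
    """
    return float(token.replace(".", "").replace(",", "."))


def parse_points_block(block):
    """Hypothetical helper: turn one extracted Points block into a list of dicts.

    Assumes one header line, one units line, and complete 7-column data rows.
    """
    rows = []
    for line in block.splitlines()[2:]:  # skip header and units lines
        # Remember whether the point carries the "*** n ***" marker, then
        # reduce the marker to the bare point number.
        flagged = line.lstrip().startswith("***")
        line = re.sub(r"^\s*\*{3}\s*(\d+)\s*\*{3}", r"\1", line)
        tokens = line.split()
        # Numeric columns: Time, Viscosity, Shear rate, Shear stress, Momentum.
        row = {name: to_number(tok) for name, tok in zip(COLUMNS[1:6], tokens[1:6])}
        row["Points"] = int(tokens[0])
        row["Status"] = tokens[6]  # e.g. "TGC,Dy_" stays text
        row["flagged"] = flagged
        rows.append(row)
    return rows


example = (
    "Points  Time    Viscosity   Shear rate  Shear stress    Momentum    Status\n"
    "    [s] [Pa·s]  [1/s]   [Pa]    [mNm]   []\n"
    "*** 1 ***   242 -6,334E+6   -0,0000115  72,7    0,296   TGC,Dy_\n"
    "2   244 63,94   10,3    661 2,69    TGC,Dy_\n"
)
for row in parse_points_block(example):
    print(row)
```

Each extracted Points block can be passed to `parse_points_block` in the same way; if the real export uses tabs between columns, splitting on tabs instead of whitespace would make the column handling more robust.
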
Last modified on 2024-11-17