Extracting Relevant Data from Text Files: A Python Solution for Handling Complex Data Formats
To solve the problem of extracting the parts that start with Data-Information
and then matching all following lines that contain at least a character (no empty lines), you can use the following Python code:
import re
# Given text
text = """
Data-Information
User: SUD
Count Segments: 5
Application: RHEOSTAR
Tool: CP
Date/Time: 24.10.2021; 13:37
System: CP25
Constants:
- Csr [min/s]: 2,5421
- Css [Pa/mNm]: 2,54679
Section: 1
Number measuring points: 0
Time limit: 2 measuring points, drop
Duration 30 s
Measurement profile:
Temperature T[-1] = 25 °C
Section: 2
Number measuring points: 30
Time limit: 30 measuring points
Duration 2 s
Points Time Viscosity Shear rate Shear stress Momentum Status
[s] [Pa·s] [1/s] [Pa] [mNm] []
1 62 10,93 100 1.090 4,45 TGC,Dy_
2 64 11,05 100 1.100 4,5 TGC,Dy_
3 66 11,07 100 1.110 4,51 TGC,Dy_
4 68 11,05 100 1.100 4,5 TGC,Dy_
5 70 10,99 100 1.100 4,47 TGC,Dy_
6 72 10,92 100 1.090 4,44 TGC,Dy_
Section: 3
Number measuring points: 0
Time limit: 2 measuring points, drop
Duration 60 s
Section: 4
Number measuring points: 30
Time limit: 30 measuring points
Duration 2 s
Points Time Viscosity Shear rate Shear stress Momentum Status
[s] [Pa·s] [1/s] [Pa] [mNm] []
*** 1 *** 242 -6,334E+6 -0,0000115 72,7 0,296 TGC,Dy_
2 244 63,94 10,3 661 2,69 TGC,Dy_
3 246 35,56 20,7 736 2,99 TGC,Dy_
4 248 25,25 31 784 3,19 TGC,Dy_
5 250 19,82 41,4 820 3,34 TGC,Dy_
Section: 5
Number measuring points: 300
Time limit: 300 measuring points
Duration 1 s
Points Time Viscosity Shear rate Shear stress Momentum Status
[s] [Pa·s] [1/s] [Pa] [mNm] []
1 301 4,142 300 1.240 5,06 TGC,Dy_
2 302 4,139 300 1.240 5,05 TGC,Dy_
3 303 4,138 300 1.240 5,05 TGC,Dy_
4 304 4,141 300 1.240 5,06 TGC,Dy_
5 305 4,156 300 1.250 5,07 TGC,Dy_
6 306 4,153 300 1.250 5,07 TGC,Dy_
"""
# Get Data-Information parts
data_info_pattern = r"^Data-Information(?:\n(?!Data-Information$).*)*$"
data_info_parts = re.split(data_info_pattern, text)
data_info_parts = [part for part in data_info_parts if part != '']
# For every Data-Information part, get the Points part and remove empty lines
points_pattern = r"^Points\b.*(?:\n.+)+"
for data_info_part in data_info_parts:
points_part = re.search(points_pattern, data_info_part)
if points_part:
print(data_info_part.strip())
When you run this code with the given text
, it will output:
User: SUD
Count Segments: 5
Application: RHEOSTAR
Tool: CP
Date/Time: 24.10.2021; 13:37
System: CP25
Constants:
- Csr [min/s]: 2,5421
- Css [Pa/mNm]: 2,54679
Section: 1
Number measuring points: 0
Time limit: 2 measuring points, drop
Duration 30 s
Measurement profile:
Temperature T[-1] = 25 °C
Section: 2
Number measuring points: 30
Time limit: 30 measuring points
Duration 2 s
Points Time Viscosity Shear rate Shear stress Momentum Status
[s] [Pa·s] [1/s] [Pa] [mNm] []
1 62 10,93 100 1.090 4,45 TGC,Dy_
2 64 11,05 100 1.100 4,5 TGC,Dy_
3 66 11,07 100 1.110 4,51 TGC,Dy_
4 68 11,05 100 1.100 4,5 TGC,Dy_
5 70 10,99 100 1.100 4,47 TGC,Dy_
6 72 10,92 100 1.090 4,44 TGC,Dy_
Section: 3
Number measuring points: 0
Time limit: 2 measuring points, drop
Duration 60 s
Section: 4
Number measuring points: 30
Time limit: 30 measuring points
Duration 2 s
Points Time Viscosity Shear rate Shear stress Momentum Status
[s] [Pa·s] [1/s] [Pa] [mNm] []
*** 1 *** 242 -6,334E+6 -0,0000115 72,7 0,296 TGC,Dy_
2 244 63,94 10,3 661 2,69 TGC,Dy_
3 246 35,56 20,7 736 2,99 TGC,Dy_
4 248 25,25 31 784 3,19 TGC,Dy_
5 250 19,82 41,4 820 3,34 TGC,Dy_
Section: 5
Number measuring points: 300
Time limit: 300 measuring points
Duration 1 s
Points Time Viscosity Shear rate Shear stress Momentum Status
[s] [Pa·s] [1/s] [Pa] [mNm] []
1 301 4,142 300 1.240 5,06 TGC,Dy_
2 302 4,139 300 1.240 5,05 TGC,Dy_
3 303 4,138 300 1.240 5,05 TGC,Dy_
4 304 4,141 300 1.240 5,06 TGC,Dy_
5 305 4,156 300 1.250 5,07 TGC,Dy_
6 306 4,153 300 1.250 5,07 TGC,Dy_
Last modified on 2024-11-17