Splitting Long Text into Name, Title, and Company Columns Using SQL

Splitting a Long Text into Name, Title, and Company on the " - " Separator

Introduction

In this article, we will explore how to split a long text into separate columns for name, title, and company using SQL. We will use the split_part function in Postgres as an example.

Background

This problem is common when dealing with large datasets that contain employee information stored as free text. Each row can hold several values separated by " - " (a hyphen with a space on each side). For instance, if we have a table called employees, it might look like this:

full_text
Do Yun-Kim - Project Manager - Pioneer Windows Mfg. Corp.
Chen-Yi Li - Solutions Consultant - Worldpay
Linda Hager - Presales - Kronos
Ryan Asher
Steve Collins - RVP Sales - Enterprise
Bruce Peck Corolla North Carolina
Phillip Bartling - Managing Partner - Your Fantasy League Partners
Perry Tran - Data Analyst - MobilityWare
Katherine Tran - Principal Quality Assurance Engineer - Western
Wayne Peters - WW Sales - Microsoft
Asrith Inuganti - Account Relationship Manager - Shine.com
Seth Catalli - Regional Vice President - UiPath

And you need to split each row into separate columns for name, title, and company.
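To make the examples reproducible, the sample data can be loaded into a scratch table. The following is a minimal sketch: the table name employees and the column name full_text follow the description above, while the text column type and the choice of rows are assumptions made for illustration.

{< highlight sql >}
-- Hypothetical setup: a single text column holding the unsplit values.
CREATE TABLE employees (
    full_text text
);

-- A few rows from the sample, including ones with no " - " separator.
INSERT INTO employees (full_text) VALUES
    ('Do Yun-Kim - Project Manager - Pioneer Windows Mfg. Corp.'),
    ('Chen-Yi Li - Solutions Consultant - Worldpay'),
    ('Linda Hager - Presales - Kronos'),
    ('Ryan Asher'),
    ('Bruce Peck Corolla North Carolina');
{< /highlight >}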

SQL Solution

Postgres provides a function called split_part which can be used to extract parts of a string. Here’s how we can use it to split the long text into separate columns:

{< highlight sql >}
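-- The separator includes the surrounding spaces so that hyphenated
-- names such as "Do Yun-Kim" are not split in the middle.
SELECT full_text,
       split_part(full_text, ' - ', 1) AS name,
       split_part(full_text, ' - ', 2) AS title,
       split_part(full_text, ' - ', 3) AS company
FROM employees;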
{< /highlight >}

In this SQL query:

  • full_text is the column that contains our long text.
  • We use the split_part function to extract three parts from each row. The first parameter is the string we want to split, the second is the separator, and the third is the 1-based index of the part to return.
  • The AS keyword is used to give aliases to the extracted columns.
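The behavior of the three parameters can be checked on string literals, without any table. This small sketch also shows why the separator matters for hyphenated names: a bare hyphen cuts "Do Yun-Kim" in two, while a hyphen surrounded by spaces does not. The column aliases are made up for illustration.

{< highlight sql >}
SELECT split_part('Do Yun-Kim - Project Manager', ' - ', 1) AS spaced_hyphen, -- 'Do Yun-Kim'
       split_part('Do Yun-Kim - Project Manager', '-', 1)   AS bare_hyphen,   -- 'Do Yun'
       split_part('Linda Hager - Presales - Kronos', ' - ', 2) AS second_part, -- 'Presales'
       split_part('Ryan Asher', ' - ', 2)                   AS missing_part;  -- '' (empty string)
{< /highlight >}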

This query will return a result set that looks like this:

| full_text | name | title | company |
|---|---|---|---|
| Do Yun-Kim - Project Manager - Pioneer Windows Mfg. Corp. | Do Yun-Kim | Project Manager | Pioneer Windows Mfg. Corp. |
| Chen-Yi Li - Solutions Consultant - Worldpay | Chen-Yi Li | Solutions Consultant | Worldpay |
| Linda Hager - Presales - Kronos | Linda Hager | Presales | Kronos |
| Ryan Asher | Ryan Asher | | |
| Steve Collins - RVP Sales - Enterprise | Steve Collins | RVP Sales | Enterprise |
| Bruce Peck Corolla North Carolina | Bruce Peck Corolla North Carolina | | |
| Phillip Bartling - Managing Partner - Your Fantasy League Partners | Phillip Bartling | Managing Partner | Your Fantasy League Partners |
| Perry Tran - Data Analyst - MobilityWare | Perry Tran | Data Analyst | MobilityWare |
| Katherine Tran - Principal Quality Assurance Engineer - Western | Katherine Tran | Principal Quality Assurance Engineer | Western |
| Wayne Peters - WW Sales - Microsoft | Wayne Peters | WW Sales | Microsoft |
| Asrith Inuganti - Account Relationship Manager - Shine.com | Asrith Inuganti | Account Relationship Manager | Shine.com |
| Seth Catalli - Regional Vice President - UiPath | Seth Catalli | Regional Vice President | UiPath |

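Note that split_part returns an empty string when the requested part does not exist. Rows that contain no " - " separator, such as Ryan Asher and Bruce Peck Corolla North Carolina, therefore keep their whole text in name and get an empty string for title and company.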
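If missing parts should show up as NULL rather than as empty strings, one option is to wrap the calls in NULLIF. The sketch below uses an inline VALUES list (aliased as sample, a name chosen here for illustration) so that it runs without any existing table.

{< highlight sql >}
SELECT full_text,
       split_part(full_text, ' - ', 1)             AS name,
       -- NULLIF(x, '') yields NULL when the part is missing.
       NULLIF(split_part(full_text, ' - ', 2), '') AS title,
       NULLIF(split_part(full_text, ' - ', 3), '') AS company
FROM (VALUES
        ('Linda Hager - Presales - Kronos'),
        ('Ryan Asher')
     ) AS sample(full_text);
{< /highlight >}

For the Ryan Asher row, title and company come back as NULL, which makes incomplete rows easy to find with a WHERE title IS NULL filter.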

Alternative Approach

An alternative approach is to use regular expressions, which help when the separator is not perfectly regular (for example, a varying number of spaces around the hyphen). For a fixed separator like " - ", however, split_part is easier to understand and is unlikely to be slower, so it remains the better default.

Regular Expressions

Postgres supports regular expressions natively through functions such as regexp_match, regexp_replace, and regexp_split_to_array. From version 15 onward it also provides regexp_instr, which returns the position at which the Nth match of a pattern starts, or 0 if there is no such match.

Here’s how you could locate the separators using regular expressions:

{< highlight sql >}
-- Requires PostgreSQL 15 or later for regexp_instr.
SELECT full_text,
       regexp_instr(full_text, ' - ')       AS first_sep_pos,  -- first match
       regexp_instr(full_text, ' - ', 1, 2) AS second_sep_pos  -- second match, searching from position 1
FROM employees;
{< /highlight >}

This SQL query returns the 1-based position at which each separator starts, or 0 when the separator is absent:

| full_text | first_sep_pos | second_sep_pos |
|---|---|---|
| Do Yun-Kim - Project Manager - Pioneer Windows Mfg. Corp. | 11 | 29 |
| Chen-Yi Li - Solutions Consultant - Worldpay | 11 | 34 |
| Linda Hager - Presales - Kronos | 12 | 23 |
| Ryan Asher | 0 | 0 |
| Steve Collins - RVP Sales - Enterprise | 14 | 26 |
| Bruce Peck Corolla North Carolina | 0 | 0 |
| Phillip Bartling - Managing Partner - Your Fantasy League Partners | 17 | 36 |
| Perry Tran - Data Analyst - MobilityWare | 11 | 26 |
| Katherine Tran - Principal Quality Assurance Engineer - Western | 15 | 54 |
| Wayne Peters - WW Sales - Microsoft | 13 | 24 |
| Asrith Inuganti - Account Relationship Manager - Shine.com | 16 | 47 |
| Seth Catalli - Regional Vice President - UiPath | 13 | 39 |

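Positions alone are not the final result. To extract the actual values, you would pass these positions to a function such as substring, or use a regular expression function that returns the pieces directly, such as regexp_split_to_array.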
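As a sketch of extracting the values themselves with a regular expression, regexp_split_to_array can split on a hyphen surrounded by any amount of whitespace and return the pieces as an array. The pattern, the inline sample rows, and the aliases sample, t, and parts are assumptions made for this illustration.

{< highlight sql >}
SELECT full_text,
       parts[1] AS name,
       parts[2] AS title,    -- NULL when the array has fewer than 2 elements
       parts[3] AS company   -- NULL when the array has fewer than 3 elements
FROM (
    SELECT full_text,
           -- Split on a hyphen with at least one whitespace character on each side,
           -- so hyphens inside names ("Chen-Yi") are left alone.
           regexp_split_to_array(full_text, '\s+-\s+') AS parts
    FROM (VALUES
            ('Chen-Yi Li - Solutions Consultant - Worldpay'),
            ('Ryan Asher')
         ) AS sample(full_text)
) AS t;
{< /highlight >}

Unlike split_part, which returns an empty string for a missing part, an out-of-range array subscript yields NULL, so rows without a separator get NULL for title and company.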

Conclusion

Splitting a long text into separate columns can be achieved using SQL functions like split_part. This approach is efficient and easy to understand, provided the separator passed to it matches the data exactly. Regular expressions can also be used for this task, but for a fixed separator they add complexity without a clear benefit; they are most useful when the separator varies from row to row.

In this article, we have explored how to split a long text into name, title, and company columns in Postgres. We have discussed both split_part and regular expressions as possible approaches.


Last modified on 2024-11-09