Ranking Search Results in Postgres

=====================================================

Introduction

Postgres is a powerful open-source relational database with built-in full-text search. In this article, we'll explore how to rank search results by relevance while giving precedence to exact matches.

We'll use an example compound table with two text columns: cmpdname (the compound's name) and cmpdsynonym (its synonyms). We'll add a tsvector column that combines both and index it for efficient querying. Our goal is to adjust the ranking so that exact matches in the cmpdname column come first.

Background

The tsvector data type stores normalized words (lexemes) together with their positions and an optional weight label (A, B, C, or D), which the ranking functions can take into account. We'll build the vector column by concatenating a weighted vector for cmpdname with one for cmpdsynonym, wrapping each column in coalesce so that a NULL in either one does not turn the whole vector into NULL.

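-- Create a weighted vector column combining cmpdname and cmpdsynonym
ALTER TABLE compound
ADD COLUMN document_vector_weights tsvector;

-- Names get weight A, synonyms weight D; coalesce keeps NULLs from wiping out the vector
UPDATE compound
SET document_vector_weights = setweight(to_tsvector(coalesce(cmpdname, '')), 'A') ||
    setweight(to_tsvector(coalesce(cmpdsynonym, '')), 'D');

-- GIN index so that @@ searches can use it (a plain CREATE INDEX builds a B-tree,
-- which cannot serve @@)
CREATE INDEX document_weights_index
ON compound USING GIN (document_vector_weights);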
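The coalesce calls matter because to_tsvector, setweight, and the || operator all return NULL when any input is NULL, so a single missing value would leave that row with no vector at all, and it would never match a search. A minimal check follows; the 'simple' text-search configuration is an assumption chosen here only so the result does not depend on stemming or on default_text_search_config:

```sql
-- Without coalesce: a NULL synonym makes the whole concatenated vector NULL.
SELECT setweight(to_tsvector('simple', 'Propylene'), 'A')
    || setweight(to_tsvector('simple', NULL::text), 'D') AS without_coalesce;

-- With coalesce: '' yields an empty tsvector, so the weighted name survives.
SELECT setweight(to_tsvector('simple', 'Propylene'), 'A')
    || setweight(to_tsvector('simple', coalesce(NULL::text, '')), 'D') AS with_coalesce;
-- with_coalesce → 'propylene':1A
```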

Querying with Weighted Ranking

To rank by relevance, we'll use the ts_rank function. Its optional first argument is an array of four weights given in the order {D, C, B, A}. When the array is omitted, Postgres uses {0.1, 0.2, 0.4, 1.0}, that is, D=0.1, C=0.2, B=0.4, and A=1.0. For our specific problem, we'll supply our own values.
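Because the order {D, C, B, A} is easy to get backwards, it can help to rank one lexeme labeled A (as a name would be) against the same lexeme labeled D (as a synonym would be), first with the defaults and then with the array reversed. The tsvector literals and the 'simple' configuration below are hypothetical inputs for this sketch:

```sql
SELECT
    -- Default weights: A counts 1.0, D counts 0.1.
    ts_rank('propylene:1A'::tsvector, plainto_tsquery('simple', 'propylene')) AS a_default,
    ts_rank('propylene:1D'::tsvector, plainto_tsquery('simple', 'propylene')) AS d_default,
    -- Reversed array: D now counts 1.0 and A counts 0.1,
    -- which shows the array is read as {D, C, B, A}.
    ts_rank('{1.0,0.4,0.2,0.1}', 'propylene:1A'::tsvector,
            plainto_tsquery('simple', 'propylene')) AS a_reversed,
    ts_rank('{1.0,0.4,0.2,0.1}', 'propylene:1D'::tsvector,
            plainto_tsquery('simple', 'propylene')) AS d_reversed;
```

With the defaults, a_default comes out about ten times d_default; with the reversed array the relationship flips.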

Adjusting Weighted Ranking

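We'll override the defaults so that matches in cmpdname drive the score. The array '{0,0,0.10,1}' in the query below sets D=0, C=0, B=0.1, and A=1. Because synonyms carry the D label, a term that appears only in cmpdsynonym still satisfies the @@ filter but adds nothing to the rank, so compound names receive all of the weight. (The B and C values have no effect here, since no lexemes are labeled B or C.)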

Here’s an example query that demonstrates how to adjust weighted ranking:

-- Rank search results based on relevance with adjusted weights
SELECT *, ts_rank('{0,0,0.10,1}', document_vector_weights, plainto_tsquery('propylene')) AS rank_a
FROM compound
WHERE document_vector_weights @@ plainto_tsquery('propylene')
ORDER BY rank_a DESC;
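To see what the D=0 setting does, here is the same query run against a throwaway table. The table name compound_demo, its three rows, and the 'simple' configuration are hypothetical choices for this sketch, not part of the original schema:

```sql
-- Hypothetical sample data: an exact name match, a name that contains
-- "propylene", and a compound that mentions it only as a synonym.
CREATE TEMP TABLE compound_demo (
    cmpdname                text,
    cmpdsynonym             text,
    document_vector_weights tsvector
);

INSERT INTO compound_demo (cmpdname, cmpdsynonym) VALUES
    ('propylene',        'propene; methylethylene'),
    ('propylene glycol', '1,2-propanediol'),
    ('polypropylene',    'propylene polymer');

-- Names get weight A, synonyms weight D; coalesce guards against NULLs.
UPDATE compound_demo
SET document_vector_weights =
        setweight(to_tsvector('simple', coalesce(cmpdname, '')), 'A') ||
        setweight(to_tsvector('simple', coalesce(cmpdsynonym, '')), 'D');

-- {D, C, B, A} = {0, 0, 0.10, 1}: synonym-only matches pass the @@ filter
-- but score 0.
SELECT cmpdname,
       ts_rank('{0,0,0.10,1}', document_vector_weights,
               plainto_tsquery('simple', 'propylene')) AS rank_a
FROM compound_demo
WHERE document_vector_weights @@ plainto_tsquery('simple', 'propylene')
ORDER BY rank_a DESC;
```

Under these assumptions, 'propylene' and 'propylene glycol' receive the same rank: each contains the lexeme once with weight A, and ts_rank's default normalization ignores document length. 'polypropylene' matches only through its synonym and ranks 0. That tie is what an explicit exact-match test has to break.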

Handling Exact Matches

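Full-text search already matches whole words (lexemes), so no regular expression is needed to avoid partial-word hits. Note that plainto_tsquery treats its input as plain text, not as a regex: in a pattern like '\\bpropylene\\b' the backslashes are treated as separators, so the resulting query does not look for "propylene" at all. To give exact matches in cmpdname top priority, we'll sort first on an exact-match test and only then on the weighted rank.

-- Put exact (case-insensitive) name matches first, then sort by weighted rank
SELECT *, ts_rank('{0,0,0.10,1}', document_vector_weights, plainto_tsquery('propylene')) AS rank_a
FROM compound
WHERE document_vector_weights @@ plainto_tsquery('propylene')
ORDER BY lower(cmpdname) = lower('propylene') DESC NULLS LAST, rank_a DESC;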
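Sorting on an exact-match test before the rank puts the row whose name is exactly the search term ahead of rows that merely contain it. A sketch on hypothetical data follows; the table compound_exact_demo, its rows, and the 'simple' configuration are assumptions for illustration, and lower() on both sides makes the comparison case-insensitive:

```sql
-- Hypothetical sample data; the NULL-name row matches only through its synonym.
CREATE TEMP TABLE compound_exact_demo (
    cmpdname                text,
    cmpdsynonym             text,
    document_vector_weights tsvector
);

INSERT INTO compound_exact_demo (cmpdname, cmpdsynonym) VALUES
    ('propylene glycol', '1,2-propanediol'),
    ('propylene',        'propene; methylethylene'),
    (NULL,               'propylene'),
    ('polypropylene',    'propylene polymer');

UPDATE compound_exact_demo
SET document_vector_weights =
        setweight(to_tsvector('simple', coalesce(cmpdname, '')), 'A') ||
        setweight(to_tsvector('simple', coalesce(cmpdsynonym, '')), 'D');

-- Exact name matches first; NULLS LAST keeps rows whose name is NULL
-- (so the comparison yields NULL) from sorting ahead of them.
SELECT cmpdname,
       lower(cmpdname) = lower('propylene') AS exact_name_match,
       ts_rank('{0,0,0.10,1}', document_vector_weights,
               plainto_tsquery('simple', 'propylene')) AS rank_a
FROM compound_exact_demo
WHERE document_vector_weights @@ plainto_tsquery('simple', 'propylene')
ORDER BY exact_name_match DESC NULLS LAST, rank_a DESC;
```

Here 'propylene' sorts first even though 'propylene glycol' has the same ts_rank score. The NULLS LAST clause matters because a descending sort puts NULLs first by default in Postgres.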

Additional Considerations

When fine-tuning your ranking system, keep the following points in mind:

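Adjust weight values: Tune the {D, C, B, A} array to suit your specific use case, keeping each value between 0 and 1.
Regular expressions: Full-text matching is already word-based. If you do need a regex on the raw column, Postgres uses \m and \M for the start and end of a word (\y for either); \b means backspace, not a word boundary.
Weight distribution: With D=0, every synonym-only match ranks 0, so those rows come back in no particular order. A small nonzero D keeps them ordered among themselves while compound names still dominate.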
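PostgreSQL's regex word-boundary escapes differ from the Perl-style \b that many other engines use. A quick check, assuming standard_conforming_strings is on (the default since PostgreSQL 9.1), so that backslashes in ordinary string literals reach the regex engine unchanged:

```sql
-- \m = start of word, \M = end of word, \y = either; \b is a backspace character.
SELECT 'propylene glycol' ~* '\mpropylene\M' AS whole_word,   -- true
       'polypropylene'    ~* '\mpropylene\M' AS inside_word,  -- false: starts mid-word
       'propylene glycol' ~* '\ypropylene\y' AS either_edge,  -- true
       'propylene glycol' ~* '\bpropylene\b' AS backslash_b;  -- false: \b is backspace
```

Applied to the table, a filter such as WHERE cmpdname ~* '\mpropylene\M' matches "propylene" as a whole word anywhere in the name.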

Conclusion

Postgres provides an effective way to rank search results by relevance. By building a weighted tsvector column, adjusting the ts_rank weights, and sorting exact name matches first, we can prioritize compound names in the database. With an understanding of how Postgres parses, weights, and ranks text, you can build a robust ranking system for your specific use case.

Last modified on 2024-10-17