SQL Where String Contains Using MySQL



In MySQL, a query can be performed using an exact match of a string via the != and/or = operators. But what if you want to match only a portion of a string or some other pattern? That’s where the LIKE operator and REGEXP_LIKE() function comes in.

LIKE Operator

The LIKE operator is a case-insensitive pattern matching operator in SQL that allows for wildcards in the match. The associated wildcards are:

  • _ – Underscore will match on any single character. Multiple underscores can be chained together as well. For example, __ will match on any two characters or ___ on any three characters.
  • % – 0 or more of any character.

REGEXP_LIKE() Function

REGEXP_LIKE() accepts three parameters: the value to apply the pattern to (e.g., column, variable, etc.), the regular expression (regex) itself, and an optional match type parameter.

The optional match type argument can be:

  • i - Case-insensitive search.
  • c – Case-sensitive search.
  • m – Multiple-line mode.
  • n – The “.” character matches line terminators.
  • u – Unix-only line endings.

Note that all the details around regular expressions will not be discussed in this article. Rather a more basic example will be shown. If looking for more details, then a good article on this can be found at MySQL - Regex

With some knowledge on what the LIKE operator is and the wildcards that are available in addition to the REGEXP_LIKE() function, we can now dive into an example. The full example will first be shown followed by a breakdown of the individual pieces.

Example

/* Create the tables and add rows */ 
CREATE TABLE TopSearch
(
	Id INT AUTO_INCREMENT,
	Term VARCHAR(100),	
	CONSTRAINT PK_SearchTest PRIMARY KEY (Id)
);

INSERT INTO TopSearch(Term)
VALUES 
	('Activities in April'),
	('List of March sports'),
	('April holidays'),
	('December snow accumulation in Boise Idaho'),
	('Top 10 April recipes'),
	('Trends to follow in March and April'),
	('Top 15 steps to marketing property'),
	('Almost 90% of homeowners are forgetting this'),
	('There has been a nearly 900% increase in this career');

/* Queries */
SELECT Term FROM TopSearch WHERE Term LIKE '%March%';
SELECT Term FROM TopSearch WHERE Term LIKE '%March sport_';
SELECT Term FROM TopSearch WHERE Term LIKE '%April%' AND Term NOT LIKE '%March%';
SELECT Term FROM TopSearch WHERE Term LIKE '%90!%%' ESCAPE '!';
SELECT * FROM TopSearch WHERE REGEXP_LIKE(Term, 'Top [0-9]{2} ', 'i');

/* Cleanup */
DROP TABLE IF EXISTS TopSearch;

Create the test table and add rows

In the first of the script, a table is being created to hold top search terms. Rows are then added to this table. There isn’t much to mention here other than that IDENTITY(1,1) is being used to create an auto-incrementing Id so that it won't have to manually be specified per row.

LIKE Search for “March” anywhere in Term

SELECT Term FROM TopSearch WHERE Term LIKE '%March%';

This query is looking for “March” with any character(s) before or after “March” via the % wildcard. The terms retrieved with this query are:

  • List of March sports
  • Trends to follow in March and April

LIKE Search for “March sport” with any character after sport

SELECT Term FROM TopSearch WHERE Term LIKE '%March sport_';

This query will retrieve any term that includes “March sport” at the end of the term where “sport” must have one character after it. In this case, only one term is retrieved: “List of March sports”.

LIKE Search for all terms containing “April” but not “March”

SELECT Term FROM TopSearch WHERE Term LIKE '%April%' AND Term NOT LIKE '%March%';

This query will retrieve any term that includes “April” but does not include “March”. The % wildcard is used for both the “April” and “March” pattern match but notice that NOT is included for the “March” pattern. NOT LIKE will exclude rather than include all matching items. The matching terms are

  • Activities in April
  • April holidays
  • Top 10 April recipes

LIKE Search using a wildcard with ESCAPE

SELECT Term FROM TopSearch WHERE Term LIKE '%90!%%' ESCAPE '!';

In some cases, you may need to match on a pattern where one or more of the wildcards needs to be used as part of the string rather than as a wildcard. In this example, we’re doing just that by searching for 90% anywhere in the string.

To allow this, we can use an escape character so that % won’t be treated as a wildcard. The escape character being used in this example is !. So, !% will match up to % in the string. The matching Term is “Almost 90% of homeowners are forgetting this”.

Regex Search for “Top” with two numbers and a space

SELECT * FROM TopSearch WHERE REGEXP_LIKE(Term, 'Top [0-9]{2} ', 'i');

This query will retrieve any term that includes “Top” at the beginning followed by a space and two numbers in the 0-9 range. It does so using a regex of “Top [0-9]{2} ”. This regex is looking for a pattern that starts with “Top” and is followed by 2 numbers in the 0-9 range followed by a space.

The “i” parameter in the REGEXP_LIKE function tells the search to run case insensitive. Note that this parameter was not technically needed here as it will be case insensitive by default. “i” was really only added to show how optional match type params can be specified.

Two terms are retrieved from this search:

  • Top 10 April recipes
  • Top 15 steps to marketing property

Cleanup

DROP TABLE IF EXISTS TopSearch;

Lastly, we are cleaning up the TopSearch table by dropping it. There isn’t anything too interesting to mention here.