Basic guide to understanding Regex

Kevin Huang
5 min read · Nov 7, 2020


It’s been a little more than a month since I started my time at Flatiron, and like many beginners I end up at stackoverflow.com whenever I run into a problem with my code. Most of the time I come across some crazy expression there, and I sit looking at it, trying to make sense of it. One particular expression was a regex to validate an email address. This is what it looked like:

/\A[\w+\-.]+@[a-z\d-.]+\.[a-z]+\z/i

I’m going to try to break this complicated-looking regex down, given my basic knowledge of Regex.

What is Regex?

Regex is short for Regular Expressions. Its purpose is to help you find specific patterns inside a string, and it is not only for Ruby: many programming languages support Regex.

In Ruby, you define a regex by putting an expression between two forward slashes. Let’s consider the following code:
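Here is a minimal sketch of that code, reconstructed from the description that follows. The method name and the ‘apple’ argument come from the text; the ‘cherry’ call is my own addition for contrast.

```ruby
# Returns true if str contains a lowercase "a", false otherwise.
def contains_a(str)
  # =~ gives the index of the first match, or nil if there is none;
  # !! turns that result into a plain true or false.
  !!(str =~ /a/)
end

p("apple" =~ /a/)        # => 0
p contains_a("apple")    # => true
p contains_a("cherry")   # => false
```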

I created a method called ‘contains_a’ that takes a string as an argument. Inside the method is the expression !!(str =~ /a/). Let’s break this down, starting with what’s inside the parentheses. Here I used the =~ operator, which matches the regular expression against the string provided (in this case, ‘apple’) and returns the index of the first match, or nil if there is no match. The output would be 0 in this case, because the ‘a’ in ‘apple’ is the first character that matches. The ‘!!’ converts the return value to a boolean: an index becomes true and nil becomes false.

However, Regex is far more powerful than you might imagine. You are not limited to matching a single letter; Regex also lets you match ranges of characters. If I wanted to check whether a string contains any digits, I could simply do the following:
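Something along these lines would work. This is only a sketch, and the method name contains_number is my own placeholder:

```ruby
# Returns true if str contains at least one digit, false otherwise.
def contains_number(str)
  # [0-9] is a range: it matches any single character from 0 to 9.
  !!(str =~ /[0-9]/)
end

p contains_number("route 66")  # => true
p contains_number("apple")     # => false
```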

Here are some useful ranges:

[0-9] matches any digit from 0 to 9
[a-z] matches any lowercase letter from a to z

Regex also provides shortcuts for these ranges.

using \w is the same as using [0-9a-zA-Z_] (letters, digits, and the underscore)
using \d is the same as using [0-9]
using \s matches any whitespace character, including spaces, tabs, and newlines
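To see the shortcuts in action, here is a quick sketch using the same =~ operator as before. The sample strings are made up for illustration.

```ruby
# \d finds the first digit: "user_42" has its first digit at index 5.
p("user_42" =~ /\d/)      # => 5

# \s finds the first whitespace character (the space at index 5).
p("hello world" =~ /\s/)  # => 5

# \w needs a letter, digit, or underscore, so punctuation alone gives nil.
p("!!!" =~ /\w/)          # => nil

# scan collects every match; note that the underscore counts as \w.
p("a_1".scan(/\w/))       # => ["a", "_", "1"]
```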

There are many more regular expression constructs, far too many to list here. Check out a cheatsheet for regular expressions here!

Going back to the email regex

Let’s go back to the regular expression that was introduced at the beginning. We’re going to break it down into blocks so it doesn’t look so intimidating!

Let’s break it down visually so we can look at it block by block:

/\A[\w+\-.]+@[a-z\d-.]+\.[a-z]+\z/i

We know that a regex starts with a / and ends with another / . With that in mind, let’s rewrite our expression with some spacing so we can see it more clearly.

The breakdown \A [\w+\-.] +@ [a-z\d-.] +\. [a-z] +\z i

If we look up \A in the cheatsheet, we see that it marks the start of the string.

[\w+\-.]

This is the first part of the email. If I had the email “example@domain.com”, this would be the “example” portion.

The square brackets define a set of allowed characters. The \w means any letter, number, or underscore. The +, \- and . that follow it inside the brackets are literal characters: a plus sign, a hyphen (escaped so it isn’t read as part of a range), and a period. So this block matches a single character that is a letter, a number, an underscore, a plus sign, a hyphen, or a period.

+@

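The + here is a quantifier, and it applies to the block right before it: it means one or more of the characters allowed by [\w+\-.]. The @ is a literal character that has to appear exactly once at this point, and as we all know, you can’t have an email address without the @ sign.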
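Here is a small sketch of just this first part of the pattern on its own. The variable name local_part and the sample strings are my own, and the pattern is simply the start of the email regex up to the @ sign.

```ruby
# One or more allowed characters at the start of the string, then a literal @.
local_part = /\A[\w+\-.]+@/

p("example@domain.com" =~ local_part)         # => 0
p("first.last+tag@domain.com" =~ local_part)  # => 0

# Nothing before the @, so "one or more" is not satisfied.
p("@domain.com" =~ local_part)                # => nil

# A space is not in the allowed set, and there is no @ at all.
p("not an email" =~ local_part)               # => nil
```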

[a-z\d-.]

This block validates the domain portion of the email address. It allows letters (a-z), digits (\d), hyphens, and periods.

+\.

The + again means one or more of the characters from the block before it, and \. is the period that comes after the domain name. You might be wondering why there is an escape character before the . here. That is because, outside of square brackets, an unescaped . is a wildcard that matches any single character (except a newline, by default). Writing \. explicitly says we are looking for a literal "." .
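The difference between . and \. is easy to check for yourself. A quick sketch, with made-up strings:

```ruby
# Unescaped . matches any character, so the "X" is accepted.
p("domainXcom" =~ /domain.com/)   # => 0

# Escaped \. only matches a real period, so the "X" is rejected.
p("domainXcom" =~ /domain\.com/)  # => nil
p("domain.com" =~ /domain\.com/)  # => 0
```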

\z

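Before it, [a-z]+ matches one or more letters, such as the “com” at the end of the address. Then, just as \A anchors the match to the start of the string, \z anchors it to the end of the string.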
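To get a feel for what \A and \z do, here is a short sketch with a simpler pattern. The word “apple” is just an example.

```ruby
# With both anchors, the whole string has to be exactly "apple".
p("apple" =~ /\Aapple\z/)      # => 0
p("an apple" =~ /\Aapple\z/)   # => nil (extra text before)
p("apple pie" =~ /\Aapple\z/)  # => nil (extra text after)

# Without anchors, the pattern can match anywhere in the string.
p("an apple" =~ /apple/)       # => 3
```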

So what about the i after the closing slash? Putting the i at the end tells the regex to be case-insensitive, meaning it treats lowercase and uppercase letters as the same.
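Putting all the blocks together, a validator could be sketched like this. The method name valid_email? is my own placeholder, and I have escaped the hyphen in the domain block too (\-), so that it is unambiguously a literal hyphen.

```ruby
# Returns true if email matches the pattern from this post, false otherwise.
def valid_email?(email)
  # start, local part, @, domain, literal period, letters, end; i = ignore case
  !!(email =~ /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i)
end

p valid_email?("example@domain.com")  # => true
p valid_email?("EXAMPLE@DOMAIN.COM")  # => true  (thanks to the i flag)
p valid_email?("example.domain.com")  # => false (no @ sign)
p valid_email?("example@domain")      # => false (no period after the domain)
```

This only checks the general shape of an address; it does not prove the address actually exists.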

As a note:

The given regex is not the only way to validate an email. There are plenty of different ways to set up your regex. You can try creating your own email (or any other) regex here!

In all, I’ve barely scratched the surface of Regex. As someone who just started learning to code, I find Regex an amazing tool (and also a very complicated and confusing one), and I cannot wait to use it more in the future. It’ll definitely take more practice to get used to it and start writing expressions myself. I highly encourage you to spend some time learning more about it, as it’s used a lot in professional settings. There are a lot of resources out there!
