Custom Email Validation in Python (Function for email validation)

Michael Otu - Feb 28 '22 - Dev Community

What should we consider a valid email? Look at the emails below and decide whether they are valid. Would you accept them even if they are?

1. "very.unusual.@.unusual.com"@example.com
2. "very.(),:;<>[]\".VERY.\"very@\ \"very\".unusual"@strange.example.com
3. #!$%&'*+-/=?^_`{}|~@example.org
4. "()<>[]:,;@"!#$%&\'-/=?^_`{}| ~.a"@example.org

It turns out they are valid, but should we accept them? You can also read more about this here.

Some Valid Emails

Let's have a look at some valid emails. These emails are from here.

  • prettyandsimple@example.com
  • very.common@example.com
  • disposable.style.email.with+symbol@example.com
  • other.email-with-dash@example.com
  • x@example.com
  • "much.more unusual"@example.com
  • "very.unusual.@.unusual.com"@example.com
  • "very.(),:;<>[]\".VERY.\"very@\ \"very\".unusual"@strange.example.com
  • example-indeed@strange-example.com
  • admin@mailserver1 (local domain name with no top-level domain)
  • #!$%&'*+-/=?^_`{}|~@example.org
  • "()<>[]:,;@\\"!#$%&'-/=?^_`{}| ~.a"@example.org
  • " "@example.org
  • example@localhost
  • example@s.solutions
  • user@com
  • user@localserver
  • user@[IPv6:2001:db8::1]

Some Invalid Emails

  • Abc.example.com - there is no @ character
  • A@b@c@example.com - only one @ is allowed outside quotation marks
  • a"b(c)d,e:f;gi[j\k]l@example.com - none of the special characters in this local part are allowed outside quotation marks
  • just"not"right@example.com - quoted strings must be dot separated or the only element making up the local part
  • this is"not\allowed@example.com - spaces, quotes, and backslashes may only exist when within quoted strings and preceded by a backslash
  • this\ still\"not\allowed@example.com - even if escaped (preceded by a backslash), spaces, quotes, and backslashes must still be contained by quotes
  • john..doe@example.com - there are two dots before @
  • john.doe@example..com - there are two dots after @
  • a valid address with a leading space
  • a valid address with a trailing space

Let's Set Limitations On The Emails

A typical email looks like username@domain.com. It has three parts:

  • username - account name or the local part
  • @ - at sign
  • domain.com - domain or server name with an extension

Local part

The username part of the email address may use any of these ASCII characters:

  • Uppercase and lowercase English letters - [a-zA-Z]
  • Digits - [0-9]
  • Special Characters - ! # $ % & ' * + - / = ? ^ _ ` { | } ~ .
    • We will accept only - _ .
    • The dot character, ., must not be the first or last character
    • It must not appear two or more times consecutively
  • All characters will be changed to lower case, so HellO@gmail.com will be the same as hello@gmail.com
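
The restricted rules above can also be written as a single pattern. This is only a minimal sketch and not the approach built in the rest of this article: the name `is_simple_local` and the regular expression are assumptions made for illustration, and quoted local parts are deliberately left out.

```python
import re

# Hypothetical pattern: runs of letters, digits, "-" or "_",
# separated by single dots (no leading, trailing or double dots).
SIMPLE_LOCAL = re.compile(r"[a-z0-9_-]+(?:\.[a-z0-9_-]+)*")


def is_simple_local(local):
    """Return True if the lowercased local part follows the restricted rules."""
    return SIMPLE_LOCAL.fullmatch(local.lower()) is not None


print(is_simple_local("HellO"))      # True
print(is_simple_local("john.doe"))   # True
print(is_simple_local("john..doe"))  # False
print(is_simple_local(".john"))      # False
```

Lowercasing before matching mirrors the last rule in the list, so HellO and hello are treated the same.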

The Function

It's time to write some code.

We will start with the function definition:

def is_email_valid(email):
    pass

Allowed Valid Characters

We will accept '-', '_' and '.' as valid characters. Any other punctuation or whitespace character is rejected unless the local part is enclosed in double quotes.

Let's import punctuation and whitespace from the string module.

from string import punctuation, whitespace

def is_email_valid(email):
    valid_chars = {'-', '_', '.'}
    invalid_chars = set(punctuation + whitespace) - valid_chars

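To see what ends up in the two sets, here is a small standalone sketch that builds them outside the function and probes a few characters. Nothing here is new beyond the names already used above.

```python
from string import punctuation, whitespace

valid_chars = {"-", "_", "."}
# Everything in punctuation or whitespace except the three accepted characters.
invalid_chars = set(punctuation + whitespace) - valid_chars

print("@" in invalid_chars)  # True
print(" " in invalid_chars)  # True
print("." in invalid_chars)  # False
print(len(invalid_chars))    # 35
```

`string.punctuation` has 32 characters and `string.whitespace` has 6, so removing the three accepted characters leaves 35.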

No Leading Or Ending Space

We strip (trim) any leading or trailing whitespace from the email.

def is_email_valid(email=""):
    ...

    stripped_email = email.strip()

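A quick illustration of what `strip()` does and does not remove. The sample strings are made up for this sketch.

```python
raw_email = "  user@example.com \n"
stripped_email = raw_email.strip()

# Only leading and trailing whitespace is removed.
print(repr(stripped_email))  # 'user@example.com'

# Whitespace inside the address is left untouched.
inner_space = " user name@example.com ".strip()
print(repr(inner_space))     # 'user name@example.com'
```

Note the design choice: an address with a leading or trailing space appears in the invalid list above, but this function tolerates it by trimming the input rather than rejecting it.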

Email Must Have Only One @ Character

An email must have an @ and there must be only one @.

def is_email_valid(email=""):
    ...

    # email must have @
    if "@" not in stripped_email:
        return False

    # there must be one @
    if stripped_email.count("@") != 1:
        return False

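Since a count of zero is also different from one, the two checks could be collapsed into a single comparison. A minimal sketch, using a hypothetical helper name `has_single_at`:

```python
def has_single_at(email):
    """Return True when the text contains exactly one "@"."""
    return email.count("@") == 1


print(has_single_at("Abc.example.com"))    # False, no @ at all
print(has_single_at("A@b@c@example.com"))  # False, three @ characters
print(has_single_at("x@example.com"))      # True
```

Keeping both checks, as the article does, costs nothing and makes each rule explicit.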

Get The Local And Domain

We have to split the email at the @ character to get the local and the domain.

def is_email_valid(email=""):
    ...
    # split the email into local and domain part
    local, domain = stripped_email.split("@")

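Unpacking `split("@")` into two names is safe here only because the earlier check guarantees exactly one @. As an alternative that this article does not use, one could split on the last @ with `str.rpartition`, which would keep a quoted @ inside the local part. The helper name `split_at_last` is hypothetical.

```python
def split_at_last(email):
    """Split on the last "@" and return (local, domain)."""
    local, _, domain = email.rpartition("@")
    return local, domain


print(split_at_last("user@example.com"))
# ('user', 'example.com')
print(split_at_last('"very.unusual.@.unusual.com"@example.com'))
# ('"very.unusual.@.unusual.com"', 'example.com')
```

When there is no @ at all, `rpartition` returns an empty local part and puts the whole string in the domain, so a caller would still need to check for that case.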

Invalid characters must be in quotes

Apart from -, _ and ., any other punctuation or whitespace character is accepted only when the local part starts and ends with a double quote.

def is_email_valid(email=""):
    ...
    for char in invalid_chars:
        if (
            char in local
            and (not local.startswith('"')
                 or not local.endswith('"'))):
            return False

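The same rule can be read as two questions: does the local part contain a rejected character, and is it wrapped in quotes? Below is a standalone sketch of that reading. The helper name `has_unquoted_invalid_char` and the module-level constant are assumptions for illustration.

```python
from string import punctuation, whitespace

INVALID_CHARS = set(punctuation + whitespace) - {"-", "_", "."}


def has_unquoted_invalid_char(local):
    """Return True if a rejected character appears in an unquoted local part."""
    is_quoted = local.startswith('"') and local.endswith('"')
    has_invalid = any(char in INVALID_CHARS for char in local)
    return has_invalid and not is_quoted


print(has_unquoted_invalid_char("john.doe"))    # False
print(has_unquoted_invalid_char("john doe"))    # True
print(has_unquoted_invalid_char('"john doe"'))  # False
print(has_unquoted_invalid_char("john+doe"))    # True
```

Only the first and last characters are inspected, so a local part such as `"a"b"` would still pass. That is one of the simplifications this function makes.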

No consecutive dots

Two dots must not appear side by side anywhere in the local part, so we search for the substring "..". Checking only the first dot would miss a later pair such as a.b..c.

def is_email_valid(email=""):
    ...
    # ".." anywhere in the local part means consecutive dots
    if ".." in local:
        return False

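Consecutive dots can appear anywhere in the local part, not only at the first dot. A minimal sketch, with a hypothetical helper name, that checks the whole string with a substring test:

```python
def has_consecutive_dots(text):
    """Return True if two dots appear side by side anywhere in the text."""
    return ".." in text


print(has_consecutive_dots("john.doe"))   # False
print(has_consecutive_dots("john..doe"))  # True
print(has_consecutive_dots("a.b..c"))     # True, the pair comes after the first dot
```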

Must not start or end with dot

The local must not start or end with a dot.

def is_email_valid(email=""):
    ...
    if local.startswith(".") or local.endswith("."):
        return False


Domain Must Not Start Nor End with a Hyphen or Dot

The domain cannot have a dot or a hyphen as its first or last character.

def is_email_valid(email):
    ...
    if domain.startswith("-") or domain.endswith("-"):
        return False

    if domain.startswith(".") or domain.endswith("."):
        return False
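`str.startswith` and `str.endswith` also accept a tuple of prefixes or suffixes, so the two conditions could be folded into one. A small sketch, with the hypothetical helper name `has_bad_domain_edges`:

```python
def has_bad_domain_edges(domain):
    """Return True if the domain starts or ends with a hyphen or a dot."""
    bad_edges = ("-", ".")
    return domain.startswith(bad_edges) or domain.endswith(bad_edges)


print(has_bad_domain_edges("example.com"))   # False
print(has_bad_domain_edges("-example.com"))  # True
print(has_bad_domain_edges("example.com."))  # True
```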

This is enough

Finally, we can return True since every check has passed. This validation will reject some valid emails and accept some invalid ones. The function is for the sake of education and fun.

def is_email_valid(email):
    ...
    return True

Conclusion

At its core, this function is built around valid_chars = {'-', '_', '.'}. You could say it checks for invalid emails rather than valid ones, and that is okay.

If you can not work hard to move forward, work harder not to move back.

The snippet we now have is:

from string import punctuation, whitespace


def is_email_valid(email):
    valid_chars = {'-', '_', '.'}
    invalid_chars = set(punctuation + whitespace) - valid_chars

    stripped_email = email.strip()

    # email must have @
    if "@" not in stripped_email:
        return False

    # there must be one @
    if stripped_email.count("@") != 1:
        return False

    # split the email into local and domain part
    local, domain = stripped_email.split("@")

    # other punctuation and whitespace are allowed only in a quoted local part
    for char in invalid_chars:
        if (
            char in local
            and (not local.startswith('"')
                 or not local.endswith('"'))):
            return False

    # dots in the local part must not be consecutive
    if ".." in local:
        return False

    # local, . can't be first or last char
    if local.startswith(".") or local.endswith("."):
        return False

    # domain, - can't be first or last char
    if domain.startswith("-") or domain.endswith("-"):
        return False

    # domain, . can't be first or last char
    if domain.startswith(".") or domain.endswith("."):
        return False

    # dots in the domain must not be consecutive
    if ".." in domain:
        return False

    return True
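One way to try a validator against the lists of valid and invalid emails from earlier is a small table of cases. The sketch below is an assumption-laden illustration: `find_mismatches` is a hypothetical helper, and `only_one_at` is a stand-in validator used only to keep the example self-contained. Swap in `is_email_valid` to check the real function.

```python
def find_mismatches(validator, cases):
    """Return the (email, expected) pairs where the validator disagrees."""
    return [(email, expected) for email, expected in cases
            if validator(email) != expected]


# Stand-in validator for illustration only; replace with is_email_valid.
def only_one_at(email):
    return email.count("@") == 1


cases = [
    ("prettyandsimple@example.com", True),
    ("Abc.example.com", False),
    ("A@b@c@example.com", False),
    ("john..doe@example.com", False),
]

print(find_mismatches(only_one_at, cases))
# [('john..doe@example.com', False)]
```

The stand-in misses the double-dot case, which is exactly the kind of gap a table like this is meant to expose.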

Watch out for the next two articles on refactoring and testing this function for validating emails.

I also have a three-part series on password validation that you can check out.
