Generate Sanitized Email Hash (as Integer)

James Moberg - Mar 21 '23 - - Dev Community

While reviewing the logs of failed contact form submissions, I identified a couple email address variations that were exploiting some Gmail features in an attempt to bypass our filters. (Gmail has a "plus" feature and ignores periods in addresses.) A SQL query using REPLACE to remove all periods revealed that this comment form spammer had performed 279 attempts using 162 variations of their 15 character gmail username in an effort to circumvent our filters. We log the full email address that was posted and, when matching via SQL solely using the email addresses, it appeared as each email address was only used 2-4 times... versus the 279 obfuscated attempts.

To better identify & highlight abusers via SQL queries, an EmailHash (INT) column has been added to the database table. When searching or logging the email address, the value is sanitized (remove + string and . from the username) and then a java hashCode is generated. Using integers to join database records is much faster than using varchar and has lower storage requirements.

While there's a very slight risk of collision, I'm not aware of any other numeric hashing solution. I've tested this on multiple ColdFusion/Lucee installations and Java's hashCode has returned consistent results regardless of platform and/or JDK. I've used hashCode in some other ColdFusion applications and even have some other blog posts regarding it:

Here's the source code. Enjoy!

https://gist.github.com/JamoCA/b02a8e86f8f082b28ecb494d910e092d

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .