Find MX Records in large batches with Python

Python

In this article we will explore how to quickly find the MX records of a large number of domains using Python. In recent months we have often found ourselves turning to Python for a variety of tasks, in our case mostly to process and manipulate data quickly. Python is a great language: easy and fun to learn, yet very powerful. You can also extend its capabilities with the numerous libraries that are available. I will put references to some good learning resources at the end of the article.

MX Records

Mail Exchange (MX) records are DNS records that allow mail to be routed to the right receiving server. When someone sends email to your address from a mail client (Outlook, Android Mail or even a web client), the sending mail server has to ‘know’ where your mailbox resides, and it finds out by looking up the MX records of your domain. This could be Exchange Online (the best choice by the way, wink wink) or any other service.
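A domain can publish several MX records, each with a preference value, and senders try the host with the lowest value first. A minimal sketch of that selection is below. The records belong to a made-up example.com, and the helper names `pick_primary_mx` and `delivery_order` are illustrative only:

```python
# Illustrative only: (preference, exchange) pairs as they might appear
# in the MX record set of a hypothetical domain.
mx_records = [
    (20, "backup-mx.example.com."),
    (10, "primary-mx.example.com."),
    (30, "last-resort-mx.example.com."),
]


def pick_primary_mx(records):
    """Return the exchange host with the lowest preference value."""
    _, exchange = min(records, key=lambda record: record[0])
    return exchange


def delivery_order(records):
    """Return all exchange hosts in the order a sender would try them."""
    ordered = sorted(records, key=lambda record: record[0])
    return [exchange for _, exchange in ordered]


print(pick_primary_mx(mx_records))
print(delivery_order(mx_records))
```

The lowest number wins, so a record with preference 10 is tried before one with preference 20.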

By manually analyzing the MX record for a domain with a tool like dig or nslookup, or an online tool like MxToolbox, you can also find out where the mailbox resides. At Kernel IT we often do this to validate a cutover, for example as the last step when we migrate accounts from G Suite to Exchange Online.
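A cutover check like this can be sketched as a simple suffix comparison on the MX host name. The `mail.protection.outlook.com` suffix matches the Exchange Online host in the sample output later in this article. The function name and the matching rule are assumptions for illustration, not an official detection method:

```python
# Assumption: Exchange Online MX hosts end in this suffix, as in the
# sample output of this article (kernel-sr.mail.protection.outlook.com.).
EXCHANGE_ONLINE_SUFFIX = "mail.protection.outlook.com"


def points_to_exchange_online(mx_host):
    """Return True if the MX host name looks like an Exchange Online endpoint."""
    # MX host names are case-insensitive and often carry a trailing dot.
    normalized = mx_host.strip().rstrip(".").lower()
    return normalized.endswith("." + EXCHANGE_ONLINE_SUFFIX)


print(points_to_exchange_online("kernel-sr.mail.protection.outlook.com."))
print(points_to_exchange_online("aspmx.l.google.com."))
```

After a migration, every domain in the batch should pass this check. Any that still point elsewhere need another look.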

Coding and Infrastructure

Another interesting trend we notice in our projects is that the lines between infrastructure and application development have blurred. Even if you’re a hardcore engineer who likes physical networking and servers, it is now necessary to read or construct JSON, XML, PowerShell scripts or Windows batch files. I mention all of those in one breath, but be assured that they are wildly different tools and objects.

But back to the Python script:

 

import dns.resolver
import pandas as pd
import time
import sys


# Bypass the system resolver configuration and query Google's public DNS (8.8.8.8) directly
dns.resolver.default_resolver = dns.resolver.Resolver(configure=False)
dns.resolver.default_resolver.nameservers = ['8.8.8.8']


filename = 'emails.txt'
# Load the input file: a plain text file with one email address per line.
try:
    print("Trying to open file", filename)
    with open(filename) as f:
        # Skip blank lines so they are not treated as addresses
        domains = [line.strip() for line in f if line.strip()]
except OSError:
    print("Error while loading", filename)
    sys.exit("IO error")
else:
    print(len(domains), "addresses loaded...starting mx lookup.\n\n")

time.sleep(1)
    
mxRecords = []
emailAddresses = []


# domain.split("@", 1)[1] separates the domain from the email address.
# The try/except keeps the run going when a lookup fails.
for domain in domains:
    try:
        # resolve() replaces query(), which is deprecated since dnspython 2.0
        answers = dns.resolver.resolve(domain.split("@", 1)[1], 'MX')
    except Exception:
        print("some error")
        mxRecord = "some error"
    else:
        # Pick the MX host with the lowest preference value (highest priority)
        mxRecord = min(answers, key=lambda r: r.preference).exchange.to_text()
    finally:
        mxRecords.append(mxRecord)
        emailAddresses.append(domain)
        print(domain)
        # A 200 ms pause between lookups, for good measure
        time.sleep(.200)

# The rest of the program uses pandas to export everything neatly to CSV. It takes the two lists "mxRecords" and "emailAddresses" and converts them to a DataFrame.

df = pd.DataFrame({"EmailAddress":emailAddresses,
                  "MXRecords":mxRecords})

print ("\n", str(len(emailAddresses)), "records processed") 

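# Write to a separate file so the input list is not overwritten
# (the output file name is just an example)
df.to_csv('mx_records.csv', index=False)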

The most interesting part of this project was the library “dnspython”. This library does all the lookups and you don’t have to worry about system calls.
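In a large batch many addresses share a domain, so one option is to look up each domain only once and reuse the answer. Below is a minimal sketch of that idea, not part of the script above. The `lookup` parameter is a hypothetical stand-in for whatever performs the real DNS query (for example a small wrapper around dnspython), and `fake_lookup` exists only so the example runs without network access:

```python
def lookup_mx_per_domain(email_addresses, lookup):
    """Map each address to an MX host, querying every domain only once.

    `lookup` is any callable that takes a domain name and returns an MX
    host name (hypothetical here; in practice it would wrap a DNS query).
    """
    cache = {}    # domain -> MX host (or error marker)
    results = []  # (address, MX host) pairs in input order
    for address in email_addresses:
        # Everything after the first "@" is treated as the domain
        _, separator, domain = address.partition("@")
        if not separator or not domain:
            results.append((address, "invalid address"))
            continue
        domain = domain.lower()
        if domain not in cache:
            try:
                cache[domain] = lookup(domain)
            except Exception:
                cache[domain] = "some error"
        results.append((address, cache[domain]))
    return results


# Stand-in for a real resolver so the sketch runs offline.
calls = []


def fake_lookup(domain):
    calls.append(domain)
    return "mx." + domain + "."


addresses = ["info@example.com", "sales@example.com", "info@example.org", "broken"]
results = lookup_mx_per_domain(addresses, fake_lookup)
for address, mx_host in results:
    print(address, mx_host)
print(len(calls), "lookups for", len(addresses), "addresses")
```

Fewer queries also means fewer pauses between lookups, which adds up quickly on a list with thousands of addresses.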

Input:

info@kernel.sr
info@amazon.com
info@google.com

 

Output:

EmailAddress,MXRecords
info@kernel.sr,kernel-sr.mail.protection.outlook.com.
info@amazon.com,amazon-smtp.amazon.com.
info@google.com,aspmx.l.google.com.
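
Once the CSV exists, a quick tally per MX host shows where the mailboxes in a batch live. Here is a small sketch using only the standard library, with the sample output inlined as a string instead of read from disk (the variable names are illustrative):

```python
import csv
import io
from collections import Counter

# The sample output from this article, inlined so the example is self-contained.
sample_csv = """EmailAddress,MXRecords
info@kernel.sr,kernel-sr.mail.protection.outlook.com.
info@amazon.com,amazon-smtp.amazon.com.
info@google.com,aspmx.l.google.com.
"""

reader = csv.DictReader(io.StringIO(sample_csv))
# Count how many addresses resolve to each MX host
hosts = Counter(row["MXRecords"] for row in reader)

for host, count in hosts.most_common():
    print(count, host)
```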

 

So in conclusion, Python is a fantastic language for data processing, with intuitive constructs. Combine it with existing libraries and you can easily build powerful applications.

References:
