Spaces: devusman / email

devusman committed on Nov 11, 2025

Commit

d455ad5

1 Parent(s): 02c03a4

oyeah

Files changed (14)

README.md +37 -19
__pycache__/popular_domains.cpython-311.pyc +0 -0
__pycache__/popular_domains.cpython-313.pyc +0 -0
__pycache__/source_code.cpython-311.pyc +0 -0
__pycache__/source_code.cpython-313.pyc +0 -0
__pycache__/suggestion.cpython-311.pyc +0 -0
__pycache__/suggestion.cpython-313.pyc +0 -0
app.py +205 -0
popular_domains.py +26 -0
requirements.txt +7 -3
source_code.py +152 -0
src/streamlit_app.py +0 -40
style.css +17 -0
suggestion.py +64 -0

README.md CHANGED

@@ -1,19 +1,37 @@
----
-title: Email
-emoji: 🚀
-colorFrom: red
-colorTo: red
-sdk: docker
-app_port: 8501
-tags:
-- streamlit
-pinned: false
-short_description: Streamlit template space
----
-# Welcome to Streamlit!
-Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
-If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
-forums](https://discuss.streamlit.io).

+# Email Validation Tool
+This is a Streamlit application for email validation. The tool allows users to verify the validity of email addresses and analyze email-related data.
+## Features:
+- **Single Email Verification**: Enter an email address to check its validity. The tool validates the syntax, looks up MX records, attempts an SMTP connection, and checks whether the domain is a temporary (disposable) one. The result is shown with metrics for syntax validity, MX record status, and temporary-domain status.
+- **Bulk Email Processing**: Upload a CSV, XLSX, or TXT file containing a list of email addresses. The tool runs the same checks on each address as in single email verification. After processing, users can download the results, including email addresses and their validation labels.
+- **Domain Information**: For email addresses whose domain passes the MX check, the tool shows additional domain information (registrar, WHOIS server, and country) from the WHOIS database.
+## Installation
+To run the Email Validation Tool locally, follow these steps:
+```
+# Clone the repository:
+git clone https://github.com/sathish-1804/email_validation_tool.git
+# Change the directory to the cloned repository:
+cd email_validation_tool
+# Install the required dependencies:
+pip install -r requirements.txt
+```
+To launch the Email Validation Tool, run the following command:
+```
+streamlit run app.py
+```
+The tool will be accessible at http://localhost:8501 in your web browser.
+**Note**:
+- The tool performs email validation using various methods such as syntax validation, MX record checks, and SMTP connection.
+- Bulk email processing allows users to upload a file and process multiple email addresses in one run; addresses are checked one after another.
+- Domain information retrieval is provided through the WHOIS database for valid email addresses.
+- The tool's functionality can be customized or extended based on specific requirements.
+- Bulk files are read without a header row: CSV and XLSX files must have the email addresses in the first column, and TXT files must contain one address per line.

__pycache__/popular_domains.cpython-311.pyc ADDED

Binary file (2.11 kB).

__pycache__/popular_domains.cpython-313.pyc ADDED

Binary file (2.08 kB).

__pycache__/source_code.cpython-311.pyc ADDED

Binary file (5.72 kB).

__pycache__/source_code.cpython-313.pyc ADDED

Binary file (5.41 kB).

__pycache__/suggestion.cpython-311.pyc ADDED

Binary file (4.5 kB).

__pycache__/suggestion.cpython-313.pyc ADDED

Binary file (3.78 kB).

app.py ADDED

@@ -0,0 +1,205 @@

+import csv
+from tempfile import NamedTemporaryFile
+import shutil
+import pandas as pd
+import source_code as sc
+from suggestion import suggest_email_domain
+import whois
+from popular_domains import emailDomains
+import streamlit as st
+from streamlit_extras.metric_cards import style_metric_cards
+st.set_page_config(
+    page_title="Email verification",
+    page_icon="✅",
+    layout="centered",
+)
+def label_email(email):
+    if not sc.is_valid_email(email):
+        return "Invalid"
+    if not sc.has_valid_mx_record(email.split('@')[1]):
+        return "Invalid"
+    if not sc.verify_email(email):
+        return "Unknown"
+    if sc.is_disposable(email.split('@')[1]):
+        return "Risky"
+    return "Valid"
+def label_emails(input_file):
+    file_extension = input_file.name.split('.')[-1].lower()
+    if file_extension == 'csv':
+        df = process_csv(input_file)
+    elif file_extension == 'xlsx':
+        df = process_xlsx(input_file)
+    elif file_extension == 'txt':
+        df = process_txt(input_file)
+    else:
+        st.warning("Unsupported file format. Please provide a CSV, XLSX, or TXT file.")
+        return None
+    return df
+def process_csv(input_file):
+    # Read the uploaded file as a DataFrame
+    if input_file:
+        # pd.read_csv accepts both a path string and an uploaded file object
+        df = pd.read_csv(input_file, header=None)
+        # Create a list to store the results
+        results = []
+        # Process each row in the input DataFrame
+        for index, row in df.iterrows():
+            email = row[0].strip()
+            label = label_email(email)
+            results.append([email, label])
+        # Create a new DataFrame for results
+        result_df = pd.DataFrame(results, columns=['Email', 'Label'])
+        result_df.index = range(1, len(result_df) + 1)  # Starting index from 1
+        return result_df
+    else:
+        return pd.DataFrame(columns=['Email', 'Label'])
+def process_xlsx(input_file):
+    df = pd.read_excel(input_file, header=None)
+    results = []
+    for index, row in df.iterrows():
+        email = row[0].strip()
+        label = label_email(email)
+        results.append([email, label])
+    result_df = pd.DataFrame(results, columns=['Email', 'Label'])
+    result_df.index = range(1, len(result_df) + 1)  # Starting index from 1
+    # Display the results in a table and return them to the caller
+    st.dataframe(result_df)
+    return result_df
+def process_txt(input_file):
+    input_text = input_file.read().decode("utf-8").splitlines()
+    # Create a list to store the results
+    results = []
+    for line in input_text:
+        email = line.strip()
+        label = label_email(email)
+        results.append([email, label])
+    # Create a DataFrame for the results
+    result_df = pd.DataFrame(results, columns=['Email', 'Label'])
+    result_df.index = range(1, len(result_df) + 1)  # Starting index from 1
+    # Display the results in a table and return them to the caller
+    st.dataframe(result_df)
+    return result_df
+def main():
+    with open('style.css') as f:
+        st.markdown(f'<style>{f.read()}</style>', unsafe_allow_html=True)
+    st.title("Email Verification Tool", help="This tool verifies the validity of an email address.")
+    st.info("The result may not be accurate. However, it has 90% accuracy.")
+    t1, t2= st.tabs(["Single Email", "Bulk Email Processing"])
+    with t1:
+        # Single email verification
+        email = st.text_input("Enter an email address:")
+        if st.button("Verify"):
+            with st.spinner('Verifying...'):
+                result = {}
+                # Syntax validation
+                result['syntaxValidation'] = sc.is_valid_email(email)
+                if result['syntaxValidation']:
+                    domain_part = email.split('@')[1] if '@' in email else ''
+                    if not domain_part:
+                        st.error("Invalid email format. Please enter a valid email address.")
+                    else:
+                        # Additional validation for the domain part
+                        if not sc.has_valid_mx_record(domain_part):
+                            st.warning("Not valid: MX record not found.")
+                            suggested_domains = suggest_email_domain(domain_part, emailDomains)
+                            if suggested_domains:
+                                st.info("Suggested Domains:")
+                                for suggested_domain in suggested_domains:
+                                    st.write(suggested_domain)
+                            else:
+                                st.warning("No suggested domains found.")
+                        else:
+                            # MX record validation (already confirmed by the check above)
+                            result['MXRecord'] = True
+                            # SMTP validation
+                            result['smtpConnection'] = sc.verify_email(email)
+                            # Temporary domain check
+                            result['is Temporary'] = sc.is_disposable(domain_part)
+                            # Determine validity status and message
+                            is_valid = (
+                                result['syntaxValidation']
+                                and result['MXRecord']
+                                and result['smtpConnection']
+                                and not result['is Temporary']
+                            )
+                            st.markdown("**Result:**")
+                            # Display metric cards with reduced text size
+                            col1, col2, col3 = st.columns(3)
+                            col1.metric(label="Syntax", value=result['syntaxValidation'])
+                            col2.metric(label="MxRecord", value=result['MXRecord'])
+                            col3.metric(label="Is Temporary", value=result['is Temporary'])
+                            # Show SMTP connection status as a warning
+                            if not result['smtpConnection']:
+                                st.warning("SMTP connection not established.")
+                            # Show domain details in an expander
+                            with st.expander("See Domain Information"):
+                                try:
+                                    dm_info = whois.whois(domain_part)
+                                    st.write("Registrar:", dm_info.registrar)
+                                    st.write("Server:", dm_info.whois_server)
+                                    st.write("Country:", dm_info.country)
+                                except Exception:
+                                    st.error("Domain information retrieval failed.")
+                            # Show validity message
+                            if is_valid:
+                                st.success(f"{email} is a valid email")
+                            else:
+                                st.error(f"{email} is an invalid email")
+                                if result['is Temporary']:
+                                    st.text("It is a disposable email")
+    with t2:
+        # Bulk email processing
+        st.header("Bulk Email Processing")
+        input_file = st.file_uploader("Upload a CSV, XLSX, or TXT file", type=["csv", "xlsx", "txt"])
+        if input_file:
+            st.write("Processing...")
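+            if input_file.type == 'text/plain':
+                process_txt(input_file)
+            elif input_file.name.lower().endswith('.xlsx'):
+                # Excel files need pd.read_excel; process_xlsx renders its own table
+                process_xlsx(input_file)
+            else:
+                df = process_csv(input_file)
+                st.success("Processing completed. Displaying results:")
+                st.dataframe(df)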
+if __name__ == "__main__":
+    main()
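
The labelling order used by `label_email` above can be sketched as a pure function that takes the four check results as booleans. This is only an illustration: `label_from_checks` is a hypothetical name that does not appear in the commit, and unlike `label_email` it does not short-circuit the expensive network checks.

```python
def label_from_checks(syntax_ok: bool, mx_ok: bool, smtp_ok: bool, disposable: bool) -> str:
    """Map the four check results to a label, in the same order as label_email."""
    if not syntax_ok:
        return "Invalid"   # malformed address
    if not mx_ok:
        return "Invalid"   # domain has no MX (or A) record
    if not smtp_ok:
        return "Unknown"   # mail server did not confirm the mailbox
    if disposable:
        return "Risky"     # mailbox exists but the domain is a throwaway provider
    return "Valid"
```

One consequence of this order is that a disposable domain whose mail server refuses the SMTP probe is reported as "Unknown" rather than "Risky", because the SMTP check runs first.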

popular_domains.py ADDED

@@ -0,0 +1,26 @@

+import numpy as np
+emailDomains = np.array([
+  "aol.com", "att.net", "comcast.net", "facebook.com", "gmail.com", "gmx.com", "googlemail.com",
+  "google.com", "hotmail.com", "hotmail.co.uk", "mac.com", "me.com", "mail.com", "msn.com",
+  "live.com", "sbcglobal.net", "verizon.net", "yahoo.com", "yahoo.co.uk",
+  "email.com", "fastmail.fm", "games.com" , "gmx.net", "hush.com", "hushmail.com", "icloud.com",
+  "iname.com", "inbox.com", "lavabit.com", "love.com", "outlook.com", "pobox.com", "protonmail.ch", "protonmail.com", "tutanota.de", "tutanota.com", "tutamail.com", "tuta.io",
+ "keemail.me", "rocketmail.com" , "safe-mail.net", "wow.com", "ygm.com" ,
+  "ymail.com" , "zoho.com", "yandex.com",
+  "bellsouth.net", "charter.net", "cox.net", "earthlink.net", "juno.com",
+  "btinternet.com", "virginmedia.com", "blueyonder.co.uk", "live.co.uk",
+  "ntlworld.com", "orange.net", "sky.com", "talktalk.co.uk", "tiscali.co.uk",
+  "virgin.net", "bt.com",
+  "sina.com", "sina.cn", "qq.com", "naver.com", "hanmail.net", "daum.net", "nate.com", "yahoo.co.jp", "yahoo.co.kr",
+ "yahoo.co.id", "yahoo.co.in", "yahoo.com.sg", "yahoo.com.ph", "163.com", "yeah.net", "126.com", "21cn.com", "aliyun.com", "foxmail.com",
+  "hotmail.fr", "live.fr", "laposte.net", "yahoo.fr", "wanadoo.fr", "orange.fr", "gmx.fr", "sfr.fr", "neuf.fr", "free.fr",
+  "gmx.de", "hotmail.de", "live.de", "online.de", "t-online.de" , "web.de", "yahoo.de",
+  "libero.it", "virgilio.it", "hotmail.it", "aol.it", "tiscali.it", "alice.it", "live.it", "yahoo.it",
+  "email.it", "tin.it", "poste.it", "teletu.it",
+  "bk.ru", "inbox.ru", "list.ru", "mail.ru", "rambler.ru", "yandex.by", "yandex.kz", "yandex.ru", "yandex.ua", "ya.ru",
+  "hotmail.be", "live.be", "skynet.be", "voo.be", "tvcablenet.be", "telenet.be",
+  "hotmail.com.ar", "live.com.ar", "yahoo.com.ar", "fibertel.com.ar", "speedy.com.ar", "arnet.com.ar",
+  "yahoo.com.mx", "live.com.mx", "hotmail.es", "hotmail.com.mx", "prodigy.net.mx",
+  "yahoo.ca", "hotmail.ca", "bell.net", "shaw.ca", "sympatico.ca", "rogers.com",
+  "yahoo.com.br", "hotmail.com.br", "outlook.com.br", "uol.com.br", "bol.com.br", "terra.com.br", "ig.com.br", "r7.com"
+    , "zipmail.com.br", "globo.com", "globomail.com", "oi.com.br"])

requirements.txt CHANGED

@@ -1,3 +1,7 @@
-altair
-pandas
-streamlit

+python-whois
+dnspython
+requests
+streamlit
+streamlit-extras
+jellyfish
+numpy
+pandas
+openpyxl

source_code.py ADDED

@@ -0,0 +1,152 @@

+import re
+import dns.resolver
+import smtplib
+import requests
+import threading
+import queue
+import dns.reversename
+CACHE_TTL = 600
+# Initialize a DNS resolver with caching enabled
+resolver = dns.resolver.Resolver(configure=False)
+resolver.nameservers = ['8.8.8.8']
+resolver.cache = dns.resolver.Cache()
+# def is_valid_email(email):
+#     # Check if "@" is present in the email
+#     if "@" not in email:
+#         return False
+#     local_part, domain_part = email.split('@')
+#     # Check for consecutive dots, hyphens, or underscores in the local part
+#     if re.search(r'\.{2}|-{2}|_{2}', local_part):
+#         return False
+#     # Check for consecutive dots, hyphens, or underscores in the domain part
+#     if re.search(r'\.{2}|-{2}|_{2}', domain_part):
+#         return False
+#     # Check for two consecutive dots, hyphens, or underscores anywhere in the email
+#     if re.search(r'\.\-|\-\.|\.\.|\_\-|\-\_|\_\_|\.\.|--', email):
+#         return False
+#     # Validate email syntax
+#     pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
+#     return re.match(pattern, email) is not None
+def is_valid_email(email):
+    # Comprehensive regex for email validation
+    pattern = r'''
+        ^                         # Start of string
+        (?!.*[._%+-]{2})          # No consecutive special characters
+        [a-zA-Z0-9._%+-]{1,64}    # Local part: allowed characters and length limit
+        (?<![._%+-])              # No special characters at the end of local part
+        @                         # "@" symbol
+        [a-zA-Z0-9.-]+            # Domain part: allowed characters
+        (?<![.-])                 # No special characters at the end of domain
+        \.[a-zA-Z]{2,}$           # Top-level domain with minimum 2 characters
+    '''
+    # Match the entire email against the pattern
+    return re.match(pattern, email, re.VERBOSE) is not None
+# MX record validation: DNS lookup helper with a short timeout
+def query_dns(record_type, domain):
+    try:
+        # resolver.resolve() consults resolver.cache on its own, so no manual cache lookup is needed
+        record_name = domain if record_type == 'MX' else f'{domain}.'
+        resolver.timeout = 2
+        resolver.lifetime = 2
+        resolver.resolve(record_name, record_type)
+        return True
+    except dns.resolver.NXDOMAIN:
+        # The domain does not exist
+        return False
+    except dns.resolver.NoAnswer:
+        # No record of the requested type was found
+        return False
+    except dns.resolver.Timeout:
+        # The query timed out
+        return False
+    except Exception:
+        # An unexpected error occurred
+        return False
+def has_valid_mx_record(domain):
+    # Define a function to handle each DNS query in a separate thread
+    def query_mx(results_queue):
+        results_queue.put(query_dns('MX', domain))
+    def query_a(results_queue):
+        results_queue.put(query_dns('A', domain))
+    # Start multiple threads to query the MX and A records simultaneously
+    mx_queue = queue.Queue()
+    a_queue = queue.Queue()
+    mx_thread = threading.Thread(target=query_mx, args=(mx_queue,))
+    a_thread = threading.Thread(target=query_a, args=(a_queue,))
+    mx_thread.start()
+    a_thread.start()
+    # Wait for both threads to finish and retrieve the results from the queues
+    mx_thread.join()
+    a_thread.join()
+    mx_result = mx_queue.get()
+    a_result = a_queue.get()
+    return mx_result or a_result
+# smtp connection
+def verify_email(email):
+    # Split the email address into username and domain parts
+    domain = email.split('@')[1]
+    # Check the domain MX records
+    try:
+        mx_records = dns.resolver.resolve(domain, 'MX')
+    except dns.exception.DNSException:
+        # Covers NoAnswer as well as NXDOMAIN, timeouts, and unreachable nameservers
+        return False
+    # Connect to the SMTP server and perform the email verification
+    for mx in mx_records:
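+        try:
+            # A timeout keeps an unresponsive mail server from blocking the app indefinitely
+            smtp_server = smtplib.SMTP(str(mx.exchange), timeout=10)
+            smtp_server.ehlo()
+            smtp_server.mail('')
+            code, message = smtp_server.rcpt(str(email))
+            smtp_server.quit()
+            if code == 250:
+                return True
+        except Exception:
+            # Connection or protocol failure: try the next MX host
+            pass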
+    return False
+# temporary domain
+def is_disposable(domain):
+    blacklists = [
+        'https://raw.githubusercontent.com/andreis/disposable-email-domains/master/domains.txt',
+        'https://raw.githubusercontent.com/wesbos/burner-email-providers/master/emails.txt'
+    ]
+    for blacklist_url in blacklists:
+        try:
+            blacklist = set(requests.get(blacklist_url, timeout=10).text.split())
+            if domain in blacklist:
+                return True
+        except Exception as e:
+            print(f'Error loading blacklist {blacklist_url}: {e}')
+    return False
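
`is_disposable` above downloads both blocklists on every call, which becomes expensive in bulk processing. A minimal sketch of one way to fetch each list only once per process is shown here. `BlocklistCache`, `parse_blocklist`, and the injected `fetch` callable are hypothetical names that do not appear in the commit; in the app, `fetch` could be something like `lambda url: requests.get(url, timeout=10).text`.

```python
from typing import Callable, Dict, FrozenSet, Iterable


def parse_blocklist(text: str) -> FrozenSet[str]:
    """Turn a newline-separated domain list into a lowercase set, skipping blank lines."""
    return frozenset(line.strip().lower() for line in text.splitlines() if line.strip())


class BlocklistCache:
    """Hypothetical helper: downloads each blocklist URL at most once per process."""

    def __init__(self, fetch: Callable[[str], str]):
        self._fetch = fetch                          # function that returns the list text for a URL
        self._lists: Dict[str, FrozenSet[str]] = {}  # url -> parsed domain set

    def contains(self, domain: str, urls: Iterable[str]) -> bool:
        domain = domain.lower()
        for url in urls:
            if url not in self._lists:
                self._lists[url] = parse_blocklist(self._fetch(url))
            if domain in self._lists[url]:
                return True
        return False


# Demo with a fake fetcher so the sketch runs without network access.
fetch_calls = []

def fake_fetch(url: str) -> str:
    fetch_calls.append(url)
    return "mailinator.com\r\n\r\nTempMail.example\n"

cache = BlocklistCache(fake_fetch)
first = cache.contains("mailinator.com", ["list-a"])
second = cache.contains("gmail.com", ["list-a"])
```

Injecting the fetch function keeps the cache testable offline and leaves timeout and error handling to the caller.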

src/streamlit_app.py DELETED

@@ -1,40 +0,0 @@
-import altair as alt
-import numpy as np
-import pandas as pd
-import streamlit as st
-"""
-# Welcome to Streamlit!
-Edit `/streamlit_app.py` to customize this app to your heart's desire :heart:.
-If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
-forums](https://discuss.streamlit.io).
-In the meantime, below is an example of what you can do with just a few lines of code:
-"""
-num_points = st.slider("Number of points in spiral", 1, 10000, 1100)
-num_turns = st.slider("Number of turns in spiral", 1, 300, 31)
-indices = np.linspace(0, 1, num_points)
-theta = 2 * np.pi * num_turns * indices
-radius = indices
-x = radius * np.cos(theta)
-y = radius * np.sin(theta)
-df = pd.DataFrame({
-    "x": x,
-    "y": y,
-    "idx": indices,
-    "rand": np.random.randn(num_points),
-})
-st.altair_chart(alt.Chart(df, height=700, width=700)
-    .mark_point(filled=True)
-    .encode(
-        x=alt.X("x", axis=None),
-        y=alt.Y("y", axis=None),
-        color=alt.Color("idx", legend=None, scale=alt.Scale()),
-        size=alt.Size("rand", legend=None, scale=alt.Scale(range=[1, 150])),
-    ))

style.css ADDED

@@ -0,0 +1,17 @@

+.css-1xarl3l {
+  font-size: 1.25rem;
+  padding-bottom: 0.25rem;
+}
+/* Move block container higher */
+div.block-container.css-18e3th9.egzxvld2 {
+margin-top: -5em;
+}
+#MainMenu {visibility: hidden;}
+footer {visibility: hidden;}
+div.block-container.css-z5fcl4.e1g8pov64{
+margin-top: -5em;
+}
+/* Path: static\style.css */

suggestion.py ADDED

@@ -0,0 +1,64 @@

+from popular_domains import emailDomains
+import jellyfish
+from typing import List
+from concurrent.futures import ThreadPoolExecutor
+import numpy as np
+class TrieNode:
+    def __init__(self, char: str):
+        self.char = char
+        self.children = {}
+        self.word_end = False
+class Trie:
+    def __init__(self):
+        self.root = TrieNode('')
+    def add(self, word: str):
+        node = self.root
+        for char in word:
+            if char not in node.children:
+                node.children[char] = TrieNode(char)
+            node = node.children[char]
+        node.word_end = True
+    def search(self, word: str) -> bool:
+        node = self.root
+        for char in word:
+            if char not in node.children:
+                return False
+            node = node.children[char]
+        return node.word_end
+def suggest_email_domain(domain: str, valid_domains: List[str]) -> List[str]:
+    # Build a trie with valid domains
+    trie = Trie()
+    for valid_domain in valid_domains:
+        trie.add(valid_domain)
+    # Calculate distances using a faster string distance metric
+    distances = {}
+    with ThreadPoolExecutor(max_workers=max(1, min(16, len(valid_domains)))) as executor:
+        for valid_domain, distance in zip(valid_domains, executor.map(lambda x: jellyfish.damerau_levenshtein_distance(domain, x), valid_domains)):
+            if distance <= 2:
+                if distance in distances:
+                    if valid_domain not in distances[distance]:
+                        distances[distance].append(valid_domain)
+                else:
+                    distances[distance] = [valid_domain]
+    # Choose the most similar domains based on alphabetical order
+    sorted_domains = []
+    if distances:
+        min_distance = min(distances.keys())
+        sorted_domains = sorted(distances[min_distance])
+        sorted_domains = [d for d in sorted_domains if trie.search(d)]
+    # Check for phonetic similarity using Soundex
+    soundex_domain = jellyfish.soundex(domain)
+    phonetically_similar_domains = [d for d in valid_domains if jellyfish.soundex(d) == soundex_domain and d not in sorted_domains]
+    # Combine and return the results
+    return sorted_domains + phonetically_similar_domains
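
The edit-distance step of `suggest_email_domain` can be illustrated without third-party packages. The sketch below is a stand-in, not the commit's code: it uses the optimal string alignment variant of Damerau-Levenshtein, which can differ from `jellyfish.damerau_levenshtein_distance` on some inputs, and it leaves out the Soundex step. `osa_distance` and `closest_domains` are hypothetical names.

```python
from typing import List, Sequence


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: edits plus adjacent transpositions."""
    prev2: List[int] = []
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
            # Adjacent transposition, e.g. "ia" -> "ai"
            if i > 1 and j > 1 and ca == b[j - 2] and a[i - 2] == cb:
                cur[j] = min(cur[j], prev2[j - 2] + 1)
        prev2, prev = prev, cur
    return prev[len(b)]


def closest_domains(domain: str, candidates: Sequence[str], max_distance: int = 2) -> List[str]:
    """Return the candidates at the smallest distance (<= max_distance), sorted alphabetically."""
    scored = [(osa_distance(domain, c), c) for c in set(candidates)]
    within = [(d, c) for d, c in scored if d <= max_distance]
    if not within:
        return []
    best = min(d for d, _ in within)
    return sorted(c for d, c in within if d == best)


suggestions = closest_domains("gmial.com", ["gmail.com", "gmx.com", "yahoo.com"])
```

Returning a plain list in every branch keeps a caller's `if suggested_domains:` check well defined, including when nothing is close enough.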