Understanding Class Benefits and Building a WHOIS Domain Checker in Python

This article demonstrates the benefits of using classes in Python and walks through a practical example: building a WHOIS domain availability checker. Along the way, we’ll explore how object-oriented programming concepts apply to real-world problems.

Key Concepts Demonstrated:

Procedural vs. Class-Based Approaches

First, let’s compare two programming paradigms:

  1. Procedural Programming: Functions that operate on data passed as parameters
  2. Object-Oriented Programming: Classes that bundle data and methods together

Procedural Approach

In the procedural approach, we define separate functions that operate on data passed as parameters:

def connect_to_server(server, port, timeout):
    """
    Connect to a server using procedural style

    Args:
        server (str): Server address to connect to
        port (int): Port number for the connection
        timeout (int): Connection timeout in seconds

    Returns:
        str: A simulated connection identifier
    """
    print(f"Connecting to {server}:{port} with timeout {timeout}")
    return f"connection_to_{server}"  # Simulated connection

def check_domain_procedural(connection, domain):
    """
    Check a domain using procedural style

    Args:
        connection (str): Connection identifier from connect_to_server
        domain (str): Domain name to check

    Returns:
        bool: True if domain check was successful (simulated)
    """
    print(f"Checking {domain} using {connection}")
    return True  # Simulated result

def disconnect_procedural(connection):
    """
    Disconnect from a server using procedural style

    Args:
        connection (str): Connection identifier to disconnect
    """
    print(f"Disconnecting from {connection}")

With the procedural approach, you need to manually track state. Each function call requires passing the connection as a parameter:

conn = connect_to_server("whois.dns.pl", 43, 10)  # Establish connection
result1 = check_domain_procedural(conn, "example1.pl")  # Check domain with connection
result2 = check_domain_procedural(conn, "example2.pl")  # Check another domain
disconnect_procedural(conn)  # Close the connection
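
Manual state tracking also means the caller is responsible for cleanup. The sketch below re-declares trimmed versions of the three functions so it runs on its own and wraps the calls in try/finally so the connection is released even when a check fails. The `events` list and the rule that domains starting with "bad" raise an error are assumptions made up for this illustration.

```python
events = []  # hypothetical log of what happened, in order


def connect_to_server(server, port, timeout):
    events.append(f"connect {server}:{port}")
    return f"connection_to_{server}"  # simulated connection, as in the article


def check_domain_procedural(connection, domain):
    if domain.startswith("bad"):
        # Made-up failure rule, only here to trigger the cleanup path
        raise ValueError(f"cannot check {domain}")
    events.append(f"check {domain}")
    return True


def disconnect_procedural(connection):
    events.append(f"disconnect {connection}")


conn = connect_to_server("whois.dns.pl", 43, 10)
try:
    check_domain_procedural(conn, "example1.pl")
    check_domain_procedural(conn, "bad-input.pl")
except ValueError as error:
    events.append(f"error: {error}")
finally:
    # Runs on success and on failure, so the connection is always released
    disconnect_procedural(conn)

print(events)
```

Every caller has to remember this pattern, because nothing ties the connection to the code that owns it.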

Class-Based Approach

In the class-based approach, we define a class that bundles data and methods together:

class DomainChecker:
    """
    Class-based approach demonstrating encapsulation

    This class represents a domain checking system that can connect to WHOIS servers,
    check domain availability, and manage connections. It bundles the data (server,
    port, timeout, connection) with the methods that operate on that data.

    Attributes:
        server (str): The WHOIS server address
        port (int): The port number for the connection
        timeout (int): Connection timeout in seconds
        connection (str or None): Current connection identifier, None if disconnected
    """

    def __init__(self, server="whois.dns.pl", port=43, timeout=10):
        """
        Initialize the DomainChecker with server details

        The constructor (__init__) method is called when creating a new instance
        of the class. It sets up the initial state of the object.

        Args:
            server (str): WHOIS server address (default: "whois.dns.pl")
            port (int): Port number for connection (default: 43)
            timeout (int): Connection timeout in seconds (default: 10)
        """
        # These are instance attributes - each instance of the class gets its own copy
        self.server = server      # Store the server address
        self.port = port          # Store the port number
        self.timeout = timeout    # Store the timeout value
        self.connection = None    # Initially, no connection exists

    def connect(self):
        """
        Connect to the server using instance variables

        This method uses the instance's stored server, port, and timeout values
        to establish a connection. The connection state is maintained in the
        instance variable self.connection.

        Returns:
            bool: True indicating successful connection (simulated)
        """
        print(f"Connecting to {self.server}:{self.port} with timeout {self.timeout}")
        # Store the connection identifier in the instance
        self.connection = f"connection_to_{self.server}"
        return True

    def check_domain(self, domain):
        """
        Check a domain using the instance's stored connection

        This method checks if a connection exists before attempting to check
        a domain. It uses the connection maintained by this instance.

        Args:
            domain (str): Domain name to check for availability

        Returns:
            bool or None: True if domain check was successful, None if no connection
        """
        # Check if we have an active connection before proceeding
        if not self.connection:
            print("No connection! Please connect first.")
            return None  # Return None to indicate failure due to no connection

        print(f"Checking {domain} using {self.connection}")
        return True  # Simulated result indicating successful domain check

    def disconnect(self):
        """
        Disconnect from the server using instance state

        This method closes the connection maintained by this instance and
        resets the connection state to None.
        """
        # Only disconnect if there's an active connection
        if self.connection:
            print(f"Disconnecting from {self.connection}")
            # Reset the connection to None to indicate disconnection
            self.connection = None

With the class approach, state is managed automatically by the object. The DomainChecker instance maintains its own connection state internally:

checker = DomainChecker()  # Create a new DomainChecker instance
checker.connect()  # Connection state stored in the instance automatically
result1 = checker.check_domain("example1.pl")  # Uses stored connection automatically
result2 = checker.check_domain("example2.pl")  # Uses stored connection automatically
checker.disconnect()  # Uses stored connection automatically
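
Because the object owns its connection state, it can also take over cleanup. One possible extension, not part of the class above, is to add `__enter__` and `__exit__` so the checker works with the `with` statement. The class name `ManagedDomainChecker` is hypothetical, and the connection is simulated in the same way as in the article.

```python
class ManagedDomainChecker:
    """Hypothetical variant of DomainChecker that supports the `with` statement."""

    def __init__(self, server="whois.dns.pl", port=43, timeout=10):
        self.server = server
        self.port = port
        self.timeout = timeout
        self.connection = None

    def connect(self):
        self.connection = f"connection_to_{self.server}"  # simulated connection
        return True

    def check_domain(self, domain):
        if not self.connection:
            return None
        return True  # simulated result

    def disconnect(self):
        self.connection = None

    def __enter__(self):
        # Called when the `with` block starts
        self.connect()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Called when the `with` block ends, even if it raised
        self.disconnect()
        return False  # do not suppress exceptions


with ManagedDomainChecker() as checker:
    inside = checker.connection
    result = checker.check_domain("example1.pl")

after = checker.connection
print(inside, result, after)
```

The caller no longer needs to call `disconnect()` explicitly; leaving the `with` block does it.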

Multiple Instances Example

One of the advantages of the class-based approach is that it’s easy to create multiple instances with different configurations. Each instance maintains its own independent state:

# Easy to create multiple instances with different configurations
# Each instance maintains its own independent state
checker_pl = DomainChecker(server="whois.dns.pl", port=43)  # Polish WHOIS server
checker_com = DomainChecker(server="whois.verisign-grs.com", port=43)  # .com WHOIS server

# Both instances can operate independently
checker_pl.connect()  # Connect to Polish server
checker_com.connect()  # Connect to .com server

# Each instance uses its own server configuration
checker_pl.check_domain("example.pl")  # Check Polish domain
checker_com.check_domain("example.com")  # Check .com domain

# Disconnect each instance separately
checker_pl.disconnect()
checker_com.disconnect()
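
To see that the state really is independent, the following sketch uses a trimmed-down copy of the simulated `DomainChecker` (only the parts needed here) and disconnects one instance while the other stays connected.

```python
class DomainChecker:
    """Trimmed-down copy of the simulated DomainChecker shown above."""

    def __init__(self, server="whois.dns.pl", port=43, timeout=10):
        self.server = server
        self.port = port
        self.timeout = timeout
        self.connection = None

    def connect(self):
        self.connection = f"connection_to_{self.server}"  # simulated connection
        return True

    def disconnect(self):
        self.connection = None


checker_pl = DomainChecker(server="whois.dns.pl")
checker_com = DomainChecker(server="whois.verisign-grs.com")

checker_pl.connect()
checker_com.connect()

# Disconnecting one instance leaves the other untouched
checker_pl.disconnect()

print(checker_pl.connection)
print(checker_com.connection)
```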

Practical Application: WHOIS Domain Availability Checker

Now let’s look at a practical implementation of a WHOIS domain checker that demonstrates these OOP concepts:

#!/usr/bin/env python3
"""
A WHOIS domain availability checker that reads domains from a file and checks them in batches.

This script implements a WHOIS client that connects to a WHOIS server and checks the
availability of domain names. It checks many domains with one checker object and
pauses between queries to respect server resources. WHOIS servers normally close the
connection after each reply (RFC 3912), so reconnecting is part of normal operation.

The script handles network errors and reconnects automatically when the server
closes the connection.

Features:
- Batch processing of domains from a file
- One checker object reused for all queries
- Automatic reconnection on failure
- Error handling for connection and file problems
- Progress output for each domain checked
"""

import socket  # Import the socket module for network communication
import time  # Import time for adding delays between requests


class WhoisChecker:
    """
    A class to handle WHOIS queries to check domain availability.
    Reuses one checker object for many domains, reconnecting when the server closes the connection.

    WHOIS Protocol Overview:
    - WHOIS is a query and response protocol that's widely used to query databases
    - These databases store registered users or assignees of domain names
    - The standard WHOIS port is 43
    - Clients connect to the server, send a query (domain name), and receive a response
    """

    def __init__(self, server="whois.dns.pl", port=43, timeout=10):
        """
        Initialize the WhoisChecker with server details.

        Args:
            server (str): The WHOIS server to connect to (default: "whois.dns.pl")
                          whois.dns.pl is the Polish domain registry WHOIS server
            port (int): The port to connect to (default: 43)
                        Port 43 is the standard WHOIS port
            timeout (int): Connection timeout in seconds (default: 10)
                           Prevents hanging connections if the server doesn't respond
        """
        # Store the WHOIS server address - this is where we'll connect to make queries
        self.server = server

        # Store the port number - WHOIS services typically operate on port 43
        self.port = port

        # Store the timeout value - this prevents the program from hanging indefinitely
        # if the server doesn't respond
        self.timeout = timeout

        # Initialize socket variable as None - we're not connected initially
        # A socket is an endpoint for sending and receiving data across a network
        self.sock = None

    def connect(self):
        """
        Establish a connection to the WHOIS server.

        This method creates a TCP socket connection to the WHOIS server specified
        during initialization. It sets a timeout to prevent hanging connections
        and handles any exceptions that might occur during the connection process.

        Socket Programming Concepts:
        - AF_INET: Address Family for IPv4 addresses
        - SOCK_STREAM: Socket type for TCP connections (reliable, ordered delivery)
        - settimeout(): Sets a timeout for blocking socket operations

        Returns:
            bool: True if connection successful, False otherwise
        """
        try:
            # Create a new TCP socket object
            # socket.AF_INET means we're using IPv4 addresses
            # socket.SOCK_STREAM means we're using TCP protocol (reliable, ordered delivery)
            self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

            # Set a timeout so we don't wait forever if the server doesn't respond
            # This is important to prevent the program from hanging indefinitely
            self.sock.settimeout(self.timeout)

            # Actually connect to the WHOIS server at the specified address and port
            # The connect() method establishes a connection to the server
            self.sock.connect((self.server, self.port))

            # If we reach this point, the connection was successful
            return True
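        except Exception as e:
            # If there's any error during connection, print it and return False
            # This catches any exception that might occur during the connection process
            print(f"Failed to connect to {self.server}:{self.port} - {e}")

            # Discard the unconnected socket so later `if not self.sock` checks
            # do not mistake it for a live connection
            if self.sock:
                self.sock.close()
                self.sock = None
            return False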

    def disconnect(self):
        """
        Close the connection to the WHOIS server if it exists.

        This method properly closes the socket connection to free up system resources.
        It's important to close connections when done to prevent resource leaks.

        Resource Management:
        - Closing sockets releases network resources
        - Setting the reference to None prevents accidental reuse
        - This follows the principle of cleaning up after ourselves
        """
        # Check if we have an active socket connection
        # If sock is None, no connection exists, so nothing to close
        if self.sock:
            # Close the socket connection to free up resources
            # This releases the network connection and associated system resources
            self.sock.close()

            # Set the socket reference to None since it's closed
            # This prevents accidental attempts to use the closed socket
            self.sock = None

    def check_domain(self, domain):
        """
        Check if a .pl domain is available via WHOIS using existing connection.

        This method sends a domain name to the WHOIS server and analyzes the response
        to determine if the domain is available or already registered. It handles
        the low-level socket communication and response parsing.

        WHOIS Response Analysis:
        - Available domains typically return messages indicating "not found" or similar
        - Registered domains return detailed registration information
        - Different WHOIS servers may use slightly different response formats

        Args:
            domain (str): The domain name to check (e.g., "example.pl")

        Returns:
            bool or None: True if domain is available, False if registered, None if error
        """
        # Make at most two attempts: one on the current connection and, if that
        # fails, one more on a fresh connection
        for _ in range(2):
            # WHOIS servers normally close the connection after each reply
            # (RFC 3912), so reconnect if an earlier query left us without a socket
            if not self.sock and not self.connect():
                print(f"Not connected to WHOIS server; could not check {domain}.")
                return None

            try:
                # Send the domain name followed by a carriage return and newline,
                # which terminates a WHOIS query
                # encode() converts the string to bytes, which sockets require;
                # sendall() keeps writing until the whole query has been sent
                self.sock.sendall(f"{domain}\r\n".encode())

                # Initialize an empty bytes object to store the complete response
                response = b""

                # WHOIS responses may arrive in multiple chunks
                while True:
                    # Receive up to 4096 bytes, a common buffer size
                    data = self.sock.recv(4096)

                    # recv() returns empty bytes once the server has closed the
                    # connection, which marks the end of the response
                    if not data:
                        break

                    # Append the received chunk to the response
                    response += data

            # OSError covers timeouts, connection resets, and broken pipes
            except OSError as e:
                print(f"Connection error checking {domain}: {e}")

                # Drop the broken socket; the next pass of the loop reconnects
                self.disconnect()
                continue

            # The server has closed its side, so release our socket as well
            self.disconnect()

            # An empty reply means the connection was already stale, so retry
            # instead of reporting the domain as registered
            if not response:
                continue

            # Convert the byte response to a string for processing
            # Undecodable bytes are ignored so odd characters cannot crash the check
            response_text = response.decode('utf-8', errors='ignore')

            # Check whether the response says the domain is NOT registered,
            # which means it is available
            # Different WHOIS servers use different phrases for this, including:
            # - "No information available"
            # - "not found" (case-insensitive)
            # - "not registered" (case-insensitive)
            # - "No data is found"
            if ("No information available" in response_text or
                "not found" in response_text.lower() or
                "not registered" in response_text.lower() or
                "No data is found" in response_text):

                # Domain is available (not found in the registry)
                return True
            else:
                # Domain is registered (the response contains registration information)
                return False

        # Both attempts failed, so the status is unknown
        return None

    def check_domains_batch(self, domains, delay=1):
        """
        Check multiple domains in one call, reusing this checker's connection state.

        This method implements batch processing of domain availability checks.
        The caller hands over a list of domains, and the method queries them one
        by one, pausing between requests. Note that WHOIS servers normally close
        the connection after each reply (RFC 3912), so a new connection is
        usually needed per query; the pause between requests, not connection
        reuse, is what keeps the load on the server low.

        Batch Benefits:
        - One call checks a whole list of domains
        - The delay between requests is applied consistently
        - Results are collected in a single dictionary
        - Better compliance with rate limiting

        Args:
            domains (list): List of domain names to check
            delay (int): Delay in seconds between requests to avoid rate limiting
                         Helps prevent overwhelming the server with too many requests

        Returns:
            dict: Dictionary mapping domain names to their availability status
                  {domain_name: True/False/None}
                  True = Available, False = Registered, None = Error/Unknown
        """
        # If we're not connected, try to connect first
        # This ensures we have a connection before attempting to check domains
        if not self.sock and not self.connect():
            # Return an empty dictionary if we can't connect
            # This indicates that no checks were performed
            return {}

        # Create an empty dictionary to store our results
        # This will map domain names to their availability status
        results = {}

        # Loop through each domain in the list
        # Using enumerate to get both index and value
        for i, domain in enumerate(domains):
            # Check the current domain and store the result
            # The check_domain method handles the actual WHOIS query
            status = self.check_domain(domain)

            # Store the result in our dictionary with the domain as the key
            results[domain] = status

            # Print the result for immediate feedback
            # This provides real-time updates on the checking process
            print(f"Checked {domain}: {'AVAILABLE' if status else 'REGISTERED' if status is not None else 'ERROR'}")

            # Add a delay between requests to avoid overwhelming the server
            # Rate limiting helps prevent being blocked by the server
            # Only add delay if this isn't the last domain in the list
            if i < len(domains) - 1:  # Don't delay after the last request
                # Wait for the specified number of seconds
                # This gives the server time to process and helps with rate limiting
                time.sleep(delay)

        # Return the complete dictionary of results
        # Contains all domain names mapped to their availability status
        return results

    def check_domains_batch_with_reconnect(self, domains, delay=1, max_retries=3):
        """
        Check multiple domains with automatic reconnection if connection drops.

        This method enhances the basic batch checking by adding robust reconnection
        logic. WHOIS servers sometimes close connections unexpectedly, especially
        when processing multiple requests. This method handles such cases by
        automatically reconnecting and retrying failed checks.

        Robustness Features:
        - Automatic reconnection when connection is lost
        - Configurable retry attempts per domain
        - Proper cleanup of failed connections
        - Detailed logging of retry attempts

        Args:
            domains (list): List of domain names to check
            delay (int): Delay in seconds between requests
                         Helps prevent overwhelming the server with too many requests
            max_retries (int): Maximum number of reconnection attempts per domain
                               Prevents infinite retry loops on persistent failures

        Returns:
            dict: Dictionary mapping domain names to their availability status
                  {domain_name: True/False/None}
                  True = Available, False = Registered, None = Error/Unknown
        """
        # Initialize an empty dictionary to store results
        # This will hold the final status for each domain
        results = {}

        # Iterate through each domain in the input list
        # Using enumerate to get both index and domain name
        for i, domain in enumerate(domains):
            # Track the number of retry attempts for this domain
            retry_count = 0

            # Initialize status as None (unknown)
            status = None

            # Continue trying until we get a result or exceed max retries
            while retry_count <= max_retries:
                try:
                    # Check if we have a connection, if not, try to connect
                    # This handles cases where the connection was dropped
                    if not self.sock:
                        if not self.connect():
                            # If we can't reconnect, mark the domain as unknown
                            print(f"Could not reconnect to check {domain}")
                            status = None
                            # Break out of the retry loop since we can't proceed
                            break

                    # Try to check the domain using the existing connection
                    # This calls the check_domain method which handles the actual query
                    status = self.check_domain(domain)

                    # If we got a definitive result (not None), break the retry loop
                    # A None result typically indicates a connection or query error
                    if status is not None:
                        # We have a valid result, so exit the retry loop
                        break
                    else:
                        # Increment the retry counter for connection issues
                        retry_count += 1
                        if retry_count <= max_retries:
                            # Log the retry attempt
                            print(f"Retrying {domain} ({retry_count}/{max_retries})...")
                            # Brief pause before retry to allow server recovery
                            time.sleep(2)

                # Catch any exceptions that occur during the checking process
                except Exception as e:
                    # Log the exception that occurred
                    print(f"Exception while checking {domain}: {e}")

                    # Increment the retry counter
                    retry_count += 1

                    # Check if we still have retries left
                    if retry_count <= max_retries:
                        # Log the retry attempt
                        print(f"Retrying {domain} ({retry_count}/{max_retries})...")

                        # Disconnect to clean up the current connection
                        # This helps ensure we start fresh on the next attempt
                        self.disconnect()

                        # Brief pause before retry to allow server recovery
                        time.sleep(2)
                    else:
                        # If we've exceeded max retries, set status to None
                        status = None

            # Store the final result for this domain in our results dictionary
            results[domain] = status

            # Print the final result for this domain
            # This provides immediate feedback on the outcome
            print(f"Final result for {domain}: {'AVAILABLE' if status else 'REGISTERED' if status is not None else 'ERROR'}")

            # Add delay between requests to avoid overwhelming the server
            # Only add delay if this isn't the last domain in the list
            if i < len(domains) - 1:
                # Wait for the specified number of seconds before the next request
                time.sleep(delay)

        # Return the complete dictionary of results for all domains
        return results

    def check_domains_from_file(self, filename, delay=1):
        """
        Check domains from a file using a single connection.

        This method provides a convenient way to check domain availability by reading
        domain names from a text file. Each line in the file should contain a single
        domain name. Empty lines are automatically filtered out.

        File Format Expected:
        - One domain name per line
        - Leading and trailing whitespace is stripped from each line
        - Empty lines are ignored
        - UTF-8 encoding is assumed

        Args:
            filename (str): Path to the file containing domain names (one per line)
            delay (int): Delay in seconds between requests
                         Helps prevent overwhelming the server with too many requests

        Returns:
            dict: Dictionary mapping domain names to their availability status
                  {domain_name: True/False/None}
                  True = Available, False = Registered, None = Error/Unknown
        """
        try:
            # Open the specified file in read mode with UTF-8 encoding
            # Using 'with' ensures the file is properly closed even if an error occurs
            with open(filename, "r", encoding="utf-8") as f:
                # Read all lines from the file and process them
                # Strip whitespace from each line and filter out empty lines
                # This creates a list of clean domain names ready for checking
                domains = [line.strip() for line in f.readlines() if line.strip()]

            # Pass the list of domains to the batch checking method
            # This reuses the connection for all domains in the file
            return self.check_domains_batch(domains, delay)

        # Handle the case where the specified file doesn't exist
        except FileNotFoundError:
            # Inform the user that the file wasn't found
            print(f"File {filename} not found.")
            # Return an empty dictionary to indicate no checks were performed
            return {}

        # Handle any other exceptions that might occur during file operations
        except Exception as e:
            # Log the error that occurred during file reading
            print(f"Error reading file {filename}: {e}")
            # Return an empty dictionary to indicate no checks were performed
            return {}


# Test the code when this script is run directly (not imported)
# The __name__ == "__main__" guard ensures this code only runs when the script
# is executed directly, not when it's imported as a module
if __name__ == "__main__":
    # Create an instance of our WhoisChecker class
    # This initializes the checker with default server settings
    checker = WhoisChecker()

    # Try to connect to the WHOIS server
    # The connect() method establishes the initial network connection
    if checker.connect():
        print("Connected to WHOIS server")

        # Read domains from file and check them with reconnection capability
        # This approach allows for batch processing of domains from a file
        try:
            # Attempt to open and read the domains.txt file
            # This file should contain one domain name per line
            with open("domains.txt", "r", encoding="utf-8") as f:
                # Process the file: strip whitespace from each line and filter out empty lines
                # This creates a clean list of domain names ready for checking
                domains = [line.strip() for line in f.readlines() if line.strip()]

            # Check domains with automatic reconnection capability
            # This method handles connection drops and retries automatically
            results = checker.check_domains_batch_with_reconnect(domains, delay=2, max_retries=2)
        except FileNotFoundError:
            # Handle the case where domains.txt doesn't exist
            print("domains.txt file not found. Using default test domains.")
            # Fall back to a predefined list of test domains
            results = checker.check_domains_batch_with_reconnect(
                ["xke.pl", "abc.pl", "nonexistentdomain12345.pl"],
                delay=2,
                max_retries=2
            )

        print("\nResults:")
        # Print the results for each domain with appropriate symbols
        # ✓ for available domains
        # ✗ for registered domains
        # ? for domains that couldn't be checked
        for domain, status in results.items():
            if status is True:
                print(f"✓ {domain} is AVAILABLE")
            elif status is False:
                print(f"✗ {domain} is REGISTERED")
            else:
                print(f"? Could not check {domain}")

        # Always disconnect when done to free up resources
        # This is important for proper resource management
        checker.disconnect()
    else:
        # Handle the case where the initial connection to the WHOIS server fails
        print("Failed to connect to WHOIS server")
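
The two core steps of `check_domain`, reading until the server closes the connection and matching marker phrases, can be tried without touching a real WHOIS server. The sketch below is an illustration under stated assumptions, not the script's actual code: the helper names `read_until_closed`, `classify_whois_response`, and `query_whois` are hypothetical, the reply text is made up, and all marker phrases are compared case-insensitively for simplicity. `query_whois` opens a fresh connection per query, which matches how WHOIS servers normally behave (RFC 3912); it is defined but not called here.

```python
import socket

# Marker phrases taken from the check_domain method above, lowercased
AVAILABLE_MARKERS = (
    "no information available",
    "not found",
    "not registered",
    "no data is found",
)


def read_until_closed(sock):
    """Collect bytes from sock until the peer closes its side (recv returns b'')."""
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)


def classify_whois_response(text):
    """Return True (available), False (registered), or None (empty reply)."""
    if not text.strip():
        return None
    lowered = text.lower()
    return any(marker in lowered for marker in AVAILABLE_MARKERS)


def query_whois(domain, server="whois.dns.pl", port=43, timeout=10):
    """Send one query on a fresh connection and return the decoded reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(f"{domain}\r\n".encode())
        return read_until_closed(sock).decode("utf-8", errors="ignore")


# Offline demonstration: a local socket pair stands in for client and server
client, server_side = socket.socketpair()
server_side.sendall(b"No information available ")
server_side.sendall(b"about domain name example.pl\r\n")  # made-up reply text
server_side.close()  # the server closing its side ends the reply

reply = read_until_closed(client).decode("utf-8", errors="ignore")
client.close()

print(classify_whois_response(reply))
```

Keeping the classification in a pure function makes it easy to add phrases for other registries and to test them with saved responses.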

Summary of Benefits

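1. Encapsulation: the class bundles data (server, port, timeout, connection) with the methods that operate on it.

2. State Management: each instance keeps track of its own connection, so callers do not have to pass it around.

3. Reusability: the same class can be instantiated with different servers and configurations.

4. Maintainability: connection, query, and cleanup logic live in one place, so a change is made once.

5. Abstraction: callers use connect(), check_domain(), and disconnect() without handling sockets directly.

6. Organization: related functionality is grouped under one name instead of being spread across loose functions.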

The object-oriented approach with classes provides a more structured and maintainable way to organize code, especially as programs grow in complexity. Classes allow us to model real-world concepts more naturally and create reusable, modular code. The WHOIS domain checker example demonstrates how these concepts apply to real-world networking applications.
