Two Useful PHP Validation Functions

by on June 4, 2011

Data validation is an important aspect of form processing. In this article we'll present two PHP functions for data validation which may be useful to WordPress plugin authors: email validation with domain checking and URL validation with domain checking. Both of these are rather comprehensive tests.

Email Validation

If all you want to do is verify that an email address has the correct syntax, you can use the WordPress function is_email(). It's in: wp-includes/formatting.php. We do this exact validation in three lines of code using preg_match(). The WP function uses 50+ lines to accomplish the task in a peculiar, roundabout manner.

Also, WordPress used to offer a function that additionally determined if the host was valid and was set up to process email accounts. That function was deprecated in 3.0 and completely removed … even from the deprecated.php file.

Here's our function that does both … checks the syntax and validates the host.

Code: PHP (plus WordPress) wcs_is_valid_email()

function wcs_is_valid_email($email='', $check_mx=true)
{
    // result: 1 is returned if good (check for >0 or ==1)
    // result: 0 is returned if syntax is incorrect
    // result: -1 is returned if syntax is correct, but email address does not exist
 
    // check syntax
    $email = trim($email);
    $regex = '/^([*+!.&#$|\'\\%\/0-9a-z^_`{}=?~:-]+)@(([0-9a-z-]+\.)+[0-9a-z]{2,4})$/i';
    $is_valid = preg_match($regex, $email, $matches);
 
    // NOTE: Windows servers do not offer checkdnsrr until PHP 5.3.
    // So we create the function, if it doesn't exist.
    if(!function_exists('checkdnsrr'))
    {
        function checkdnsrr($host_name='', $rec_type='')
        {
            if(!empty($host_name))
            {
                if(!$rec_type) {$rec_type = 'MX';}
                exec('nslookup -type=' . escapeshellarg($rec_type) . ' ' . escapeshellarg($host_name), $result);
 
                // Check each line to find the one that starts with the host name.
                foreach ($result as $line)
                {
                    if(preg_match('/^' . preg_quote($host_name, '/') . '/i', $line))
                    {
                        return true;
                    }
                }
                return false;
            }
            return false;
        }
    }
 
    // check that the server exists and is setup to handle email accounts
    if (($is_valid) && ($check_mx))
    {
        $at_index = strrpos($email, '@');
        $domain = substr($email, $at_index + 1);
        if (!(checkdnsrr($domain, 'MX') || checkdnsrr($domain, 'A')))
        {
            $is_valid = -1;
        }
    }
 
    // exit
    return $is_valid;
 
    /**********************************************************************
     Copyright © 2011 Gizmo Digital Fusion (http://wpCodeSnippets.info)
     you can redistribute and/or modify this code under the terms of the
     GNU GPL v2: http://www.gnu.org/licenses/gpl-2.0.html
    **********************************************************************/
}

As you can see from the comments at the top of the source code, there are three possible return values. A positive result means that the syntax is correct and the host can process emails. A return value of zero means that the provided email address syntax was invalid.

And, if the 2nd (optional) parameter is set to true (which it is by default), the return value can be -1. This indicates that the syntax was correct but the host either doesn't exist or it is not set up to process email accounts.
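One thing to watch with three return values: -1 is truthy in PHP, so a bare if ($result) would treat a failed host lookup as success. Here's a minimal sketch of one way to handle the result explicitly. The helper name describe_email_result() is hypothetical and not part of the original code.

```php
<?php
// Hypothetical helper: translate the return value of wcs_is_valid_email()
// into a human-readable message. Only the three documented values are handled.
function describe_email_result($result)
{
    if ($result === -1) {
        return 'Syntax is fine, but the host does not appear to accept email.';
    }
    if ($result > 0) {
        return 'Address looks valid.';
    }
    return 'Address syntax is invalid.';
}

// Typical call site (assumes wcs_is_valid_email() from above is loaded):
// echo describe_email_result(wcs_is_valid_email($_POST['email']));
```

The -1 case is tested first, with a strict comparison, so it can never be mistaken for a pass.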

In lines 8 – 10 (the "check syntax" block), we validate the email address syntax. From this point, we use the PHP function checkdnsrr() to validate the host. That function name stands for Check DNS Resource Record. However, the function is not available on Windows servers before PHP 5.3. So … we create the function if it doesn't already exist in lines 14 – 35.
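To make the host check easier to picture (and to try without a network connection), here's a small sketch of the "MX first, then A" lookup on its own. The function name domain_accepts_mail() and the $lookup parameter are assumptions added for illustration; by default the lookup is PHP's built-in checkdnsrr().

```php
<?php
// Sketch: report whether a domain publishes an MX record or, failing that,
// an A record. $lookup is an assumption added here so the DNS call can be
// swapped out; by default it is PHP's built-in checkdnsrr().
function domain_accepts_mail($domain, $lookup = 'checkdnsrr')
{
    if ($domain === '') {
        return false;
    }
    // Try MX first, then fall back to the address record.
    return (bool) call_user_func($lookup, $domain, 'MX')
        || (bool) call_user_func($lookup, $domain, 'A');
}

// Offline stand-in for DNS: only "example.test" has an A record.
$fake_dns = function ($host, $type) {
    return $host === 'example.test' && $type === 'A';
};

var_dump(domain_accepts_mail('example.test', $fake_dns)); // bool(true)
var_dump(domain_accepts_mail('nowhere.test', $fake_dns)); // bool(false)
```

In real use you would call domain_accepts_mail($domain) with no second argument, which performs live DNS queries.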

Finally, we actually check the host records in lines 38 – 46. And, we return the result on line 49. We do all of this in fewer lines than the WP function is_email(). ;-)

URL Validation

PHP includes the filter_var() function, which offers URL validation using the FILTER_VALIDATE_URL filter. However, it has limitations: (1) a known bug with dashes, (2) no support for multi-byte international URLs, (3) it doesn't handle some real-world URL structures, and (4) it doesn't handle URLs that use IP addresses instead of a top-level-domain (TLD) name.
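For reference, here's a quick sketch of filter_var() in action, mainly to illustrate limitation (2). The sample URLs are taken from this article; the variable names are just for illustration.

```php
<?php
// FILTER_VALIDATE_URL returns the URL string on success and false on failure.
$plain = filter_var('http://wpcodesnippets.com/blog', FILTER_VALIDATE_URL);
$idn   = filter_var('http://llšctžýáírdnäô.com', FILTER_VALIDATE_URL);
$bare  = filter_var('www.wpcodesnippets.com', FILTER_VALIDATE_URL);

var_dump($plain !== false); // bool(true):  ASCII URL with a scheme is accepted
var_dump($idn !== false);   // bool(false): non-ASCII host is rejected
var_dump($bare !== false);  // bool(false): missing scheme is rejected
```

Note that the filter also makes no attempt to repair input, so a visitor who types a URL without http:// simply fails validation.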

Our function (below) addresses each of these issues … plus it offers bonuses. First, it optionally verifies that the URL and/or file exists. Second, it auto-corrects a couple of common mistakes that visitors make when completing a form.

It's also RFC 3986 compliant. Plus, it checks for both IPv4 and IPv6 addresses.

Let's take a look…

Code: PHP (plus WordPress) wcs_is_valid_url()

function wcs_is_valid_url(&$url, $check_exists=true)
{
    // result: 1 is returned if good (check for >0 or ==1)
    // result: 0 is returned if syntax is incorrect
    // result: -1 is returned if syntax is correct, but url/file does not exist
 
    // add http:// (here AND in the referenced $url), if needed
    if (!$url) {return false;}
    if (strpos($url, ':') === false) {$url = 'http://' . $url;}
    // auto-correct backslashes (here AND in the referenced $url)
    $url = str_replace('\\', '/', $url);
 
    // convert multi-byte international url's by stripping multi-byte chars
    $url_local = urldecode($url) . ' ';
    $len = mb_strlen($url_local);
    if ($len !== strlen($url_local))
    {
        $convmap = array(0x0, 0x2FFFF, 0, 0xFFFF);
        $url_local = mb_decode_numericentity($url_local, $convmap, 'UTF-8');
    }
    $url_local = trim($url_local);
 
    // now, process pre-encoded MBI's
    $regex = '#&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);#i';
    $url_test = preg_replace($regex, '$1', htmlentities($url_local, ENT_QUOTES, 'UTF-8'));
    if ($url_test != '') {$url_local = $url_test;}
 
    // test for bracket-enclosed IP address (IPv6) and modify for further testing
    preg_match('#(?<=\[)(.*?)(?=\])#i', $url, $matches);
    if (!empty($matches[0]))
    {
        $ip = $matches[0];
        if (!preg_match('/^([0-9a-f\.\/:]+)$/', strtolower($ip))) {return false;}
        if (substr_count($ip, ':') < 2) {return false;}
        $octets = preg_split('/[:\/]/', $ip);
        foreach ($octets as $i) {if (strlen($i) > 4) {return false;}}
        $ip_adj = 'x' . str_replace(':', '_', $ip) . '.com';
        $url_local = str_replace('[' . $ip . ']', $ip_adj, $url_local);
    }
 
    // test for IP address (IPv4)
    $regex = "^(https?|ftp|news|file)\:\/\/";
    $regex .= "([0-9]{1,3}\.[0-9]{1,3}\.)";
    if (preg_match('~' . $regex . '~i', $url_local))
    {
        $regex = "^(https?|ftps)\:\/\/";
        $regex .= "([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})";
        if (!preg_match('~' . $regex . '~i', $url_local)) {return false;}
        $seg = preg_split('/[.\/]/', $url_local);
        if (($seg[2] > 255) || ($seg[3] > 255) || ($seg[4] > 255) || ($seg[5] > 255)) {return false;}
    }
 
    // patch for wikipedia which can have a 2nd colon in the url
    if (strpos(strtolower($url_local), 'wikipedia'))
    {
        $pos = strpos($url_local, ':');
        $url_left = substr($url_local, 0, $pos + 1);
        $url_right = substr($url_local, $pos + 1);
        $url_right = str_replace(':', '_', $url_right);
        $url_local = $url_left . $url_right;
    }
 
    // construct the REGEX for standard processing
    // scheme
    $regex = "^(https?|ftp|news|file)\:\/\/";
    // user and password (optional)
    $regex .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?";
    // hostname or IP address
    $regex .= "([a-z0-9+\$_-]+\.)*[a-z0-9+\$_-]{2,4}";
    // port (optional)
    $regex .= "(\:[0-9]{2,5})?";
    // dir/file path (optional)
    $regex .= "(\/([a-z0-9+\$_-]\.?)+)*\/?";
    // query (optional)
    $regex .= "(\?[a-z+&\$_.-][a-z0-9;:@/&%=+\$_.-]*)?";
    // anchor (optional)
    $regex .= "(#[a-z_.-][a-z0-9+\$_.-]*)?\$";
 
    // test it
    $is_valid = preg_match('~' . $regex . '~i', $url_local) > 0;
 
    // final check for a TLD suffix
    if ($is_valid)
    {
        $url_test = str_replace('-', '_', $url_local);
        $regex = '#^(.*?//)*([\w\.\d]*)(:(\d+))*(/*)(.*)$#';
        preg_match($regex, $url_test, $matches);
        $is_valid = preg_match('#^(.+?)\.+[0-9a-z]{2,4}$#i', $matches[2]) > 0;
    }
 
    // check if the url/file exists
    if (($check_exists) && ($is_valid))
    {
        $status = array();
        $url_test = str_replace(' ', '%20', $url);
        $handle = curl_init($url_test);
        curl_setopt($handle, CURLOPT_HEADER, true);
        curl_setopt($handle, CURLOPT_NOBODY, true);
        curl_setopt($handle, CURLOPT_FAILONERROR, true);
        curl_setopt($handle, CURLOPT_SSL_VERIFYHOST, false);
        curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($handle, CURLOPT_FOLLOWLOCATION, false);
        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
        preg_match('/HTTP\/.* ([0-9]+) .*/', curl_exec($handle) , $status);
        if (isset($status[1]) && $status[1] == 200) {$is_valid = true;}
        else {$is_valid = -1;}
    }
 
    // exit
    return $is_valid;
}
 
    /**********************************************************************
     Copyright © 2011 Gizmo Digital Fusion (http://wpCodeSnippets.info)
     you can redistribute and/or modify this code under the terms of the
     GNU GPL v2: http://www.gnu.org/licenses/gpl-2.0.html
    **********************************************************************/

As with our email validation function (top of the page), there are three possible return values. A result of true means that the syntax is correct and (if checked) the url/file exists. A result of false, which loosely compares equal to zero, means that the provided URL syntax was invalid.

And, if the 2nd (optional) parameter is set to true (which it is by default), the return value can be -1. This indicates that the syntax was correct but the url/file doesn't exist.

Notice that the $url parameter has an ampersand in front of it (in the function definition). This is a “by reference” technique which allows us to actually change the incoming variable, so the caller must pass a variable rather than a string literal. We need this feature so that we can correct a couple of common form entry errors.

On lines 7 – 9, if the scheme is missing, we assume it should be http:// and prefix the url with it … both inside the function AND in the user variable itself. On line 11, we auto-correct backslashes into forward slashes … both inside the function AND in the user variable itself.
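If by-reference parameters are new to you, this stripped-down sketch shows just the two corrections and their effect on the caller's variable. The function name autocorrect_url() is hypothetical; the logic mirrors the corresponding lines of wcs_is_valid_url().

```php
<?php
// Sketch of the two by-reference corrections described above.
// The function name is hypothetical; the logic mirrors the article's code.
function autocorrect_url(&$url)
{
    // No colon at all means no scheme, so assume http://
    if (strpos($url, ':') === false) {
        $url = 'http://' . $url;
    }
    // Visitors sometimes type Windows-style backslashes
    $url = str_replace('\\', '/', $url);
}

$input = 'www.wpcodesnippets.com\blog';
autocorrect_url($input);   // must be a variable, not a string literal
echo $input;               // http://www.wpcodesnippets.com/blog
```

After the call, the corrected string is what you would store or redisplay in the form.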

In lines 13 – 21, we decode the url and any numeric character entities in a local copy. Remember, at this point we're only validating syntax … so it's okay to alter these characters. Then, in lines 23 – 26, we reduce accented characters to their unaccented base letters (for example, ã becomes a) by way of their HTML entity names.
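The entity trick in lines 23 – 26 is easier to follow in isolation. Here's a sketch that wraps it in a function; the name fold_accents() is hypothetical, while the regex is the one used in wcs_is_valid_url().

```php
<?php
// Sketch: fold accented Latin letters to their base letters by round-tripping
// through HTML entities (e.g. "ã" -> "&atilde;" -> "a"). The function name is
// hypothetical; the regex is the one used in wcs_is_valid_url().
function fold_accents($text)
{
    $regex = '#&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);#i';
    $entities = htmlentities($text, ENT_QUOTES, 'UTF-8');
    return preg_replace($regex, '$1', $entities);
}

echo fold_accents('http://pt.wikipedia.org/wiki/Guimarães');
// http://pt.wikipedia.org/wiki/Guimaraes
```

Only entities whose names end in one of the listed suffixes are folded; anything else that htmlentities() encodes (an ampersand, for instance) stays encoded in the result.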

In lines 28 – 39, we validate IPv6 addresses. In order to allow the remaining tests to function correctly, we locally transform the address to match an imaginary TLD.
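As a point of comparison, PHP's filter extension can validate the bracketed address directly. The sketch below is an alternative technique, not what our function does, and the function name is hypothetical. It is also stricter: an address such as 2001:db8:0:1 (four groups, no double colon) is rejected by filter_var(), whereas our function accepts it.

```php
<?php
// Sketch: pull a bracketed host out of a URL and validate it as IPv6 with
// PHP's filter extension. Returns null when the URL has no bracketed host.
// The function name is hypothetical.
function bracketed_ipv6_is_valid($url)
{
    if (!preg_match('#\[(.*?)\]#', $url, $matches)) {
        return null;
    }
    return filter_var($matches[1], FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) !== false;
}

var_dump(bracketed_ipv6_is_valid('http://[2001:db8::1]:443/good.html'));  // bool(true)
var_dump(bracketed_ipv6_is_valid('http://[2001:db8:85a3:8d3:1319:8a2e:370:734g]/index.html')); // bool(false)
var_dump(bracketed_ipv6_is_valid('http://wpcodesnippets.com')); // NULL
```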

In lines 41 – 51, we validate IPv4 addresses. In lines 53 – 61, we handle the peculiar situation in which Wikipedia can have a second colon in the URL. In lines 64 – 77, we construct the regex to test the URL. On line 80, we test it.
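The IPv4 test boils down to "four numeric octets, none larger than 255". Here's that idea as a standalone sketch that works on a bare host string rather than a full URL; the function name is hypothetical.

```php
<?php
// Sketch: accept a host only if it is four dot-separated decimal octets,
// each in the range 0-255. The function name is hypothetical.
function ipv4_host_is_valid($host)
{
    if (!preg_match('/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/', $host, $m)) {
        return false;
    }
    // $m[1]..$m[4] hold the four captured octets
    foreach (array_slice($m, 1) as $octet) {
        if ((int) $octet > 255) {
            return false;
        }
    }
    return true;
}

var_dump(ipv4_host_is_valid('72.18.130.89'));  // bool(true)
var_dump(ipv4_host_is_valid('72.18.130.256')); // bool(false)
var_dump(ipv4_host_is_valid('72.18.130'));     // bool(false)
```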

However, there are a couple of possible exceptions with our first (and quite lengthy) regular expression that could skip correct processing of TLD extensions. So … in lines 83 – 89 we correct this. We couldn't do this in the first validation pass without consequences to other validations.
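To show what this second pass is checking, here's a simplified sketch that isolates the host with parse_url() and applies the same "dot plus 2 to 4 alphanumerics" suffix rule. Using parse_url() is a substitution made for brevity (our function uses its own regex to extract the host), and the function name is hypothetical.

```php
<?php
// Sketch: check that a URL's host ends in a dot plus 2-4 alphanumerics,
// the same suffix rule the article's function applies. Uses parse_url()
// rather than the article's regex; the function name is hypothetical.
function host_has_tld_suffix($url)
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false;
    }
    return preg_match('#^.+?\.[0-9a-z]{2,4}$#i', $host) === 1;
}

var_dump(host_has_tld_suffix('http://wpcodesnippets.com/blog'));   // bool(true)
var_dump(host_has_tld_suffix('http://wpcodesnippets'));            // bool(false)
var_dump(host_has_tld_suffix('http://wpcodesnippets.comtoolong')); // bool(false)
```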

In lines 92 – 107, we use cURL to make sure that the url/file exists. And, we return the result on line 110.
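If you'd rather not parse the header text yourself, cURL can report the status code directly. The sketch below is a variation on the existence check, not the code used above. Both function names and the 10-second timeout are assumptions, and url_responds_ok() needs the cURL extension when it is called.

```php
<?php
// Sketch: HEAD-style existence check that reads the status code through
// curl_getinfo() instead of parsing the header text. Function names are
// hypothetical; requires the cURL extension when url_responds_ok() is called.
function status_means_exists($http_code)
{
    return (int) $http_code === 200;
}

function url_responds_ok($url)
{
    $handle = curl_init(str_replace(' ', '%20', $url));
    curl_setopt($handle, CURLOPT_NOBODY, true);          // headers only
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);  // do not echo output
    curl_setopt($handle, CURLOPT_FOLLOWLOCATION, false); // redirects count as "not 200"
    curl_setopt($handle, CURLOPT_TIMEOUT, 10);           // assumption: 10 s is acceptable
    curl_exec($handle);
    $code = curl_getinfo($handle, CURLINFO_HTTP_CODE);   // 0 if no response arrived
    return status_means_exists($code);
}
```

As in our function, only a 200 counts as "exists", so a redirect (301/302) is reported as missing.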

And … here are some sample URL strings that you can pass to the function along with their return values. In this case, we are NOT checking if the url/file actually exists … the 2nd function parameter was set to false.

Code: PHP (plus WordPress)

$url = 'http://wpcodesnippets.com'; // TRUE
$url = 'https://wpcodesnippets.com'; // TRUE
$url = 'htt://wpcodesnippets.com'; // FALSE
$url = 'ftp://wpcodesnippets.com'; // TRUE
$url = 'news://wpcodesnippets.com'; // TRUE
$url = 'file://wpcodesnippets.com'; // TRUE
 
$url = 'http://wpcode-snippets.mainsite.com'; // TRUE
$url = 'http://www.wpcodesnippets.com'; // TRUE
$url = 'www.wpcodesnippets.com'; // TRUE (and the string is prefixed with http://)
$url = 'http://wpcodesnippets.comtoolong'; // FALSE
$url = 'http://wpcodesnippets.com/'; // TRUE
$url = 'http://wpcodesnippets.com/blog'; // TRUE
$url = 'http://wpcodesnippets.com\blog'; // TRUE (and the string is corrected)
$url = 'http://wpcodesnippets'; // FALSE
$url = 'http://wpcodesnippets#$%.com'; // FALSE
$url = 'http://wpcode-snippets.com'; // TRUE
$url = 'http://wpcodesnippets.info/blog/some_page.php#pagelink'; // TRUE
$url = 'http://wpcodesnippets.info/blog/some_dir/#pagelink'; // TRUE
$url = 'http://wpcodesnippets.info/blog/4-nifty-string-functions.html'; // TRUE
 
$url = 'http://72.18.130.89'; // TRUE
$url = 'http://72.18.130.256'; // FALSE
$url = 'http://999.18.130.255'; // FALSE
$url = 'http://72.18.130'; // FALSE
$url = 'http://72.18.130.89/blog/somefile.php'; // TRUE
 
$url = 'http://some_url.com/?feed=rss&test=1'; // TRUE
$url = 'http://some_url.com/?feed=rss&test=1;more=37'; // TRUE
$url = 'http://some_url.com/?feed=rss&test=1$^'; // FALSE
 
$url = 'http://abc'; // FALSE
$url = 'http://'; // FALSE
$url = 'http://.com'; // FALSE
$url = 'some_cool_site.com'; // TRUE (and the string is prefixed with http://)
 
$url = 'http://pt.wikipedia.org/wiki/Guimarães'; // TRUE
$url = 'http://pt.wikipedia.org/wiki/Ajuda:Guia_de_edi%C3%A7%C3%A3o/Como_come%C3%A7ar_uma_p%C3%A1gina'; // TRUE
$url = 'http://llšctžýáírdnäô.com'; // TRUE
 
$url = 'http://[2001:db8:85a3:8d3:1319:8a2e:370:7348]/index.html'; // TRUE
$url = 'http://[2001:db8:85a3:8d3:1319:8a2e:370:73481]/index.html'; // FALSE
$url = 'http://[2001:db8:85a3:8d3:1319:8a2e:370:734g]/index.html'; // FALSE
$url = 'http://[FE80:0000:0000:0000:0202:B3FF:FE1E:8329]'; // TRUE
$url = 'http://[2001:db8:0:1]:8080/this_dir/that_file.php'; // TRUE
$url = 'http://[2001:db8::1]:443/good.html'; // TRUE

 


Comments

7 Responses to “Two Useful PHP Validation Functions”
  1. 1
    TJ says:

    Good 2 C U have IPv6 validation built into this function. Nice.

  2. 2
    Marlene Marino says:

    cool. i’ve been looking for a while. these email and url validators are the most extensive i’ve come across.

  4. 4
    Steve says:

    Awesome, thanks for sharing this code!

  5. 5
    Felix says:

    Thanks a lot for this function, finally a decent URL validation! I’m not that into regex, so could someone maybe tell me how to modify the above code to allow ‘localhost’ URLs? I use this function in my testing environment, and it always returns false (I suppose because of ‘localhost’). That would be awesome, thank you in advance!

  6. 6
    Luke America says:

    Felix:
    Try changing line #69 to this:

    $regex .= "([a-z0-9+\$_-]+\.)*[a-z0-9+\$_-]{2,4}|localhost";


    Make sure the first and last character are actual double quotes; not the WP conversion character for double quotes.
