PHP Regex, short for PHP Regular Expressions, is a powerful tool for pattern matching within strings. PHP provides it through its PCRE-based preg_* functions, such as preg_match() and preg_replace(). Regular expressions (regex) are sequences of characters that form search patterns, which can be used to match, search for, and replace text in strings.
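As a minimal sketch of the three operations named above, the snippet below uses preg_match() to test whether a pattern matches, preg_match_all() to find every occurrence, and preg_replace() to replace matches. The sample string and patterns are illustrative only.

```php
<?php
$text = 'Order 66 shipped on 2024-05-17; order 67 is pending.';

// Matching: does the string contain a date in YYYY-MM-DD form?
$hasDate = preg_match('/\d{4}-\d{2}-\d{2}/', $text) === 1;

// Searching: collect every number that follows the word "order" (case-insensitive)
preg_match_all('/order (\d+)/i', $text, $matches);
$orderIds = $matches[1]; // capture group 1 of each match

// Replacing: mask every digit with "#"
$masked = preg_replace('/\d/', '#', $text);

var_dump($hasDate);                  // bool(true)
echo implode(', ', $orderIds), "\n"; // 66, 67
echo $masked, "\n";                  // Order ## shipped on ####-##-##; order ## is pending.
```

preg_match() returns 1 or 0, so comparing with === 1 turns the result into a plain boolean.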
PHP REGEX - EXAMPLE
This code fetches a webpage with cURL, checks for errors during the request, and then uses a regular expression to find and print a specific <div> element.
If no matching <div> is found, it prints "Not found". This is a common approach to simple web scraping in PHP, although regular expressions are fragile on real-world HTML (for example, when <div> tags are nested).
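<?php
// Start a cURL session for the page to scrape (the URL is a placeholder)
$curl = curl_init('https://yourdomain.com');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE); // return the page as a string instead of printing it
$page = curl_exec($curl);
if (curl_errno($curl)) { // check for execution errors
    echo 'Scraper error: ' . curl_error($curl);
    curl_close($curl);
    exit;
}
curl_close($curl);

// Capture the contents of the first <div class="classname"> on the page
$regex = '/<div class="classname">(.*?)<\/div>/s';
if (preg_match($regex, $page, $list)) {
    echo $list[0]; // the whole matched element; $list[1] holds only the inner content
} else {
    print "Not found";
}
?>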
Here's a detailed explanation of each part of the code:
"curl_init('https://yourdomain.com');"
This function initializes a new cURL session for the URL https://yourdomain.com and returns a handle, which is stored in $curl.
"curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);"
This function sets an option on the cURL handle. Setting CURLOPT_RETURNTRANSFER to TRUE tells cURL to return the response as a string from curl_exec() instead of outputting it directly.
"curl_exec($curl)"
This function executes the cURL session and returns the response (the content of the webpage), which is stored in the variable $page. On failure it returns false.
"curl_errno($curl)"
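This function returns the error number of the last cURL operation on the handle, or 0 if no error occurred, so it can be used directly as an if condition.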
"curl_error($curl)"
This function returns a string describing the last error for the current session, or an empty string if no error occurred.
If there is an error, it prints an error message and exits the script.
"curl_close($curl)"
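This function closes the cURL session and frees its resources. Since PHP 8.0 it no longer has any effect, because the handle is freed automatically once it is no longer referenced.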
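The cURL calls described above can be wrapped in a small helper so that the error check and cleanup live in one place. This is a sketch, not part of the original example: the function name fetchPage() is hypothetical, it checks the return value of curl_exec() for false, and it throws a RuntimeException instead of calling exit so that callers can decide how to handle failures. It assumes the cURL extension is enabled.

```php
<?php
// Hypothetical helper: fetch a URL and return the response body as a string.
function fetchPage(string $url): string
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($curl, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds
    $page = curl_exec($curl);

    if ($page === false) { // curl_exec() returns false on failure
        $message = sprintf('cURL error %d: %s', curl_errno($curl), curl_error($curl));
        curl_close($curl);
        throw new RuntimeException($message);
    }

    curl_close($curl);
    return $page;
}

// Usage: the URL is the same placeholder as in the original example
try {
    $page = fetchPage('https://yourdomain.com');
    echo strlen($page), " bytes received\n";
} catch (RuntimeException $e) {
    echo 'Scraper error: ', $e->getMessage(), "\n";
}
```

Throwing an exception keeps the helper reusable inside larger scripts, where stopping the whole process with exit is rarely what you want.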
$regex = '/<div class="classname">(.*?)<\/div>/s';
This line defines a regular expression to match the content within a <div> tag with a class of "classname".
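The pattern is wrapped in / delimiters, which is why the slash in the closing tag has to be written as <\/div>. As a sketch of two common alternatives, the snippet below uses # as the delimiter so the slash needs no escaping, and uses preg_quote() to escape a literal class name before embedding it in a pattern. The sample HTML and the $class variable are hypothetical.

```php
<?php
$html = '<div class="price-box">19.99</div>';

// Same shape of pattern, but with # as the delimiter: "/" no longer needs escaping
$withHash = '#<div class="price-box">(.*?)</div>#s';

// Building the pattern from a variable: preg_quote() escapes regex metacharacters
// and, when given the delimiter as its second argument, the delimiter as well
$class = 'price-box'; // hypothetical class name for illustration
$built = '/<div class="' . preg_quote($class, '/') . '">(.*?)<\/div>/s';

preg_match($withHash, $html, $m1);
preg_match($built, $html, $m2);
echo $m1[1], ' ', $m2[1], "\n"; // 19.99 19.99
```

Using preg_quote() matters whenever the class name comes from user input or configuration, since characters such as . or + would otherwise be interpreted as regex syntax.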
"/<div class="classname">(.*?)<\/div>/s"
The pattern is explained as follows:
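"<div class="classname">"
Matches the opening <div> tag with exactly this class attribute; a tag with extra classes or other attributes before class will not match.
"(.*?)"
The parentheses form a capturing group for the content inside the <div>. The .* part matches any sequence of characters, and the ? makes it non-greedy, meaning it stops at the first occurrence of </div> instead of the last. If the element contains a nested <div>, the match therefore ends at the inner closing tag.
"<\/div>"
Matches the closing </div> tag. The forward slash is escaped with a backslash because / is also used as the pattern delimiter.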
"/s"
The s modifier makes the dot . match newlines as well, allowing the pattern to match content that spans multiple lines.
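To make the pattern parts above concrete, the sketch below runs variations of the pattern against small, made-up HTML strings. It compares a greedy (.*) and a non-greedy (.*?) capture, shows how a nested <div> cuts the non-greedy match short, and shows that without /s the dot cannot cross the line breaks inside the element.

```php
<?php
// Made-up sample HTML with two matching <div> elements
$html = '<div class="classname">first</div><div class="classname">second</div>';

// Greedy: .* runs to the LAST </div>, swallowing both elements
preg_match('/<div class="classname">(.*)<\/div>/s', $html, $greedy);

// Non-greedy: .*? stops at the FIRST </div>
preg_match('/<div class="classname">(.*?)<\/div>/s', $html, $lazy);

echo $greedy[1], "\n"; // first</div><div class="classname">second
echo $lazy[1], "\n";   // first

// Pitfall: with a nested <div>, the first </div> belongs to the inner element
$nested = '<div class="classname">outer <div>inner</div> tail</div>';
preg_match('/<div class="classname">(.*?)<\/div>/s', $nested, $cut);
echo $cut[1], "\n";    // outer <div>inner

// The /s modifier: without it, . cannot match the newlines inside the <div>
$multiline = "<div class=\"classname\">\n  line one\n  line two\n</div>";
$withoutS = preg_match('/<div class="classname">(.*?)<\/div>/', $multiline);
$withS    = preg_match('/<div class="classname">(.*?)<\/div>/s', $multiline);
var_dump($withoutS, $withS); // int(0), then int(1)
```

The nested case is the main reason regex-based scraping breaks on complex pages: the pattern has no notion of which </div> closes which <div>.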
if ( preg_match($regex, $page, $list) )
echo $list[0];
else
print "Not found";
"preg_match($regex, $page, $list)"
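This function searches the string $page for the pattern defined by $regex and returns 1 if it finds a match, 0 if it does not, or false if an error occurs.
If a match is found, preg_match() fills the array $list: $list[0] contains the entire matched string, including the <div> tags, and $list[1] contains only the text captured by (.*?). The code then prints $list[0].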
If no match is found, it prints "Not found".
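preg_match() stops after the first match. As a sketch on made-up HTML, the snippet below prints both $list[0] and the capture group $list[1], then uses preg_match_all() to collect the inner content of every matching <div>.

```php
<?php
$page = '<div class="classname">A</div><p>skip</p><div class="classname">B</div>';
$regex = '/<div class="classname">(.*?)<\/div>/s';

if (preg_match($regex, $page, $list)) {
    echo $list[0], "\n"; // <div class="classname">A</div>  (whole match)
    echo $list[1], "\n"; // A  (capture group 1 only)
}

// preg_match_all() returns the number of matches; $all[1] holds every group-1 capture
$count = preg_match_all($regex, $page, $all);
echo $count, ': ', implode(', ', $all[1]), "\n"; // 2: A, B
```

Printing $list[1] instead of $list[0] is the usual choice when only the text inside the element is needed.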