  • Finding Banned Words On A Page And Not Within Other Words!


    PHP Lovers!

    I am NOT searching for banned words inside other words. I want to find banned words as whole words anywhere in a loaded page (meta tags and content).

    For example, if I am looking for the word "cock", the word "cockerel" should not trigger the filter.
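    To illustrate the whole-word requirement: in a PCRE pattern, `\b` matches only at a boundary between a word character and a non-word character, so `/\bcock\b/i` matches "cock" on its own but not inside "cockerel". A minimal sketch; `contains_whole_word()` is a hypothetical helper name and the sample sentences are made up:

```php
<?php
// Hypothetical helper: true if $word occurs in $text as a whole word.
function contains_whole_word(string $text, string $word): bool
{
    // preg_quote() escapes regex metacharacters in $word;
    // \b on both sides restricts the match to whole words; 'i' = case-insensitive.
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
    return preg_match($pattern, $text) === 1;
}

var_dump(contains_whole_word('The cockerel crowed at dawn.', 'cock')); // bool(false)
var_dump(contains_whole_word('A cock crowed at dawn.', 'cock'));        // bool(true)
```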

    I tested the code below and, as expected, it works, but it burns a lot of CPU. One moment the page loads; the next, it goes grey and shows signs that it is taking too long to load. And all this on localhost! I can imagine what my web host would do.
    So we need a better solution. Any ideas?
    How about not making the script check the loaded page for every banned word? How about making the script halt as soon as one banned word is found, and echo which banned word was found and where on the page it appears (meta tags, body content, etc.)?
    Any code suggestions?
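    One way to get the halt-on-first-match behavior described above, sketched under assumptions: the page HTML is already in a string, PHP 7.4+ with the DOM extension is available, "meta tags" means the `content` attribute of `<meta>` elements, and `find_first_banned_word()` is a hypothetical helper name. It parses the page once with `DOMDocument`, checks the meta values and then the body text against a single whole-word pattern, and returns as soon as something matches:

```php
<?php
// Hypothetical helper: returns the first banned word found and where it was found,
// or null if the page is clean. Assumes PHP 7.4+ and the DOM extension.
function find_first_banned_word(string $html, array $banned_words): ?array
{
    if ($banned_words === [] || $html === '') {
        return null; // Nothing to search for, or nothing to search in.
    }

    // One whole-word alternation; preg_quote() escapes regex metacharacters.
    $quoted  = array_map(fn(string $w): string => preg_quote($w, '/'), $banned_words);
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';

    // Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // 1) Check <meta content="..."> values first and stop at the first hit.
    foreach ($dom->getElementsByTagName('meta') as $meta) {
        if (preg_match($pattern, $meta->getAttribute('content'), $m) === 1) {
            return ['word' => $m[0], 'where' => 'meta tag'];
        }
    }

    // 2) Then the text inside <body> (this includes any inline <script> text).
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body !== null && preg_match($pattern, $body->textContent, $m, PREG_OFFSET_CAPTURE) === 1) {
        return ['word' => $m[0][0], 'where' => 'body', 'offset' => $m[0][1]];
    }

    return null;
}

$html = '<html><head><meta name="description" content="A page about nuts"></head>'
      . '<body><p>Blow out the candles.</p></body></html>';
print_r(find_first_banned_word($html, ['blow', 'nut']));
```

    The returned `offset` is a byte offset into the body's text, not into the raw HTML. Because the whole list goes into one pattern, each piece of text is scanned once rather than once per banned word.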

    Here is what I have so far:

     

    [code]

    <?php
     
    /*
    ERROR HANDLING
    */
     
    // 1). $curl is going to be data type curl resource.
    $curl = curl_init();
     
    // 2). Set cURL options.
    curl_setopt($curl, CURLOPT_URL, 'https://www.buzzfeed.com/mjs538/the-68-words-you-cant-say-on-tv?utm_term=.xlN0R1Go89#.pbdl8dYm3X');
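    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false); // skips TLS certificate checks: tolerable for a localhost test, not for production
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);  // return the page as a string instead of printing it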
     
    // 3). Run cURL (execute http request).
    $result = curl_exec($curl);
    $response = curl_getinfo( $curl );
     
    if( $response['http_code'] == '200' )
       {
        //Set banned words.
        $banned_words = array("Prick","Dick","***");
     
        // Split the fetched page into words on every space.
        $word = explode(" ", $result);
        
       //var_dump($word);
     
       for($i = 0; $i < count($word); $i++) // <, not <=: the last valid index is count($word) - 1
          {
          foreach ($banned_words as $ban) 
             {
             if (strtolower($word[$i]) == strtolower($ban))
                {
                 echo "word: $word[$i]<br />";
                 echo "Match: $ban<br>";
                }
             else
                {
                 echo "word: $word[$i]<br />";
                 echo "No Match: $ban<br>";  
                }
             }
          }
       }  
     
    // 4). Close cURL resource.
    curl_close($curl);

    [/code]
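    Much of the slowdown in this first attempt likely comes from the output rather than the matching: the inner `else` echoes two lines for every (word, banned word) pair, so a long page produces an enormous amount of HTML. A sketch that keeps the split-into-words idea but prints nothing for non-matches, looks each word up in a hash set, and stops at the first hit; `first_banned_token()` is a hypothetical helper name. Note that `strip_tags()` also discards attribute values, so this variant does not check meta `content`:

```php
<?php
// Hypothetical helper: returns the first token of $text that is a banned word, or null.
function first_banned_token(string $text, array $banned_words): ?string
{
    // array_flip() turns the list into array keys, so isset() is a constant-time lookup.
    $banned = array_flip(array_map('strtolower', $banned_words));

    // Drop tags, then split on runs of non-word characters so punctuation
    // does not stick to words ("nut." becomes "nut").
    $tokens = preg_split('/\W+/', strip_tags($text), -1, PREG_SPLIT_NO_EMPTY);

    foreach ($tokens as $token) {
        if (isset($banned[strtolower($token)])) {
            return $token; // Halt as soon as one banned word is found.
        }
    }
    return null;
}

echo first_banned_token('<p>Crack a nutshell, then a nut.</p>', ['nut']) ?? 'none', "\n"; // nut
```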

    I was told to do it like this:

    **Load the page into a string.
    Use preg_match with "word boundaries" on the loaded string and loop through your banned words.**
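    The advice above, sketched literally: loop over the banned words, run `preg_match()` with `\b` word boundaries on the page string, and break on the first match. The sample HTML is made up and stands in for the fetched page (what `curl_exec()` returns when `CURLOPT_RETURNTRANSFER` is set); `PREG_OFFSET_CAPTURE` is used so the script can also report where the match is:

```php
<?php
// Sample page standing in for the cURL result.
$page = '<meta name="keywords" content="poultry">'
      . '<p>The cockerel is a young male chicken. A hen is female.</p>';
$banned_words = ['cock', 'hen'];

$found = null;
foreach ($banned_words as $ban) {
    // \b on both sides = whole words only; preg_quote() escapes characters like "*".
    $pattern = '/\b' . preg_quote($ban, '/') . '\b/i';
    if (preg_match($pattern, $page, $match, PREG_OFFSET_CAPTURE) === 1) {
        // $match[0][0] is the matched text; $match[0][1] is its byte offset in $page.
        $found = ['word' => $match[0][0], 'offset' => $match[0][1]];
        break; // Halt at the first banned word found.
    }
}

echo $found === null
    ? "No banned words found.\n"
    : "Found \"{$found['word']}\" at byte offset {$found['offset']}.\n";
```

    Here "cockerel" and "chicken" do not trigger the filter, because neither "cock" nor "hen" stands alone inside them.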
    Here's the update:

    [code]

    <?php
    
    /*
    ERROR HANDLING
    */
    declare(strict_types=1);
    ini_set('display_errors', '1');
    ini_set('display_startup_errors', '1');
    error_reporting(E_ALL);
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);
    
    
    // 1). Set banned words.
    $banned_words = array("Prick","Dick","***");
    
    // 2). $curl is going to be data type curl resource.
    $curl = curl_init();
    
    // 3). Set cURL options.
    curl_setopt($curl, CURLOPT_URL, 'https://www.buzzfeed.com/mjs538/the-68-words-you-cant-say-on-tv?utm_term=.xlN0R1Go89#.pbdl8dYm3X');
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true );
    
    // 4). Run cURL (execute http request).
    $result = curl_exec($curl);
    $response = curl_getinfo( $curl );
    
    if($response['http_code'] == '200' )
         {
              // Escape each banned word so characters such as "*" are matched literally.
              $quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $banned_words);
              // (?:...) groups the alternation so \b applies to every word; final i makes it case insensitive.
              $regex = '/\b(?:' . implode('|', $quoted) . ')\b/i';
              $substitute = '****';
              $cleanresult = preg_replace($regex, $substitute, $result);
              echo $cleanresult;
         }
    
    curl_close($curl);
    
    ?>

    [/code]

 1 Answer(s)

  • I was having a word-wrapping problem in Notepad++. Sorted now.
    This edited code is working.

    [code]
    <?php
    /*
    ERROR HANDLING
    */
    // 1). Set banned words.
    $banned_words = array("blow", "nut", "asshole");
    // 2). $curl is going to be data type curl resource.
    $curl = curl_init();
    // 3). Set cURL options.
    curl_setopt($curl, CURLOPT_URL, 'https://www.buzzfeed.com/mjs538/the-68-words-you-cant-say-on-tv?utm_term=.xlN0R1Go89#.pbdl8dYm3X');
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true );
    // 4). Run cURL (execute http request).
    $result = curl_exec($curl);
    if (curl_errno($curl)) {
        echo 'Error:' . curl_error($curl);
    }
    $response = curl_getinfo( $curl );
    if($response['http_code'] == '200' )
    {
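        // Escape regex metacharacters in each banned word.
        $quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $banned_words);
        // Whole words only (\b), case-insensitive (i).
        $regex = '/\b(?:' . implode('|', $quoted) . ')\b/i';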
        $substitute = '****';
        $cleanresult = preg_replace($regex, $substitute, $result);
        echo $cleanresult;
    }
    curl_close($curl);
    ?>
    [/code]
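    A possible refinement of the answer's pattern, as a sketch: passing each word through `preg_quote()` keeps characters such as `*` from breaking the regex, wrapping the alternation in `(?:...)` puts one `\b` pair around the whole list, and `preg_replace()`'s fifth argument reports how many words were replaced. `build_banned_regex()` is a hypothetical helper name, and the sample sentence is made up:

```php
<?php
// Hypothetical helper: one case-insensitive, whole-word pattern for the whole list.
function build_banned_regex(array $banned_words): string
{
    $quoted = array_map(fn(string $w): string => preg_quote($w, '/'), $banned_words);
    return '/\b(?:' . implode('|', $quoted) . ')\b/i';
}

$regex = build_banned_regex(['blow', 'nut']);
$page  = 'Blow the whistle; a nut, not a nutshell.';

// -1 = no limit on replacements; $count receives the number actually made.
$clean = preg_replace($regex, '****', $page, -1, $count);

echo $clean, "\n";              // **** the whistle; a ****, not a nutshell.
echo "Replacements: $count\n";  // Replacements: 2
```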

    Original code for newbies to grab:
    http://phpfiddle.org/main/code/0trx-6fng