Yesterday I once again ran into a problem that seemed unsolvable at first glance. A customer had received a warning letter from a lawyer stating that he is no longer allowed to use a certain word on his website.
Now you might think: no problem, WordPress has a search function for pages and posts. That is true, but unfortunately the search does not cover sliders, meta tags or similar content that plugins manage in the database. Hard-coded content in the sidebar, in the footer or in any PHP theme files is not covered either.
Important note: I am not offering any legal advice or the like. For that, please contact a lawyer. My guide is just a technical solution for finding a specific word in a WordPress blog that you want to remove or change.
A Google search with a query like "site:www.sir-apfelot.de bad word" is unfortunately of no help, because it only returns indexed data that is a few days old. No type of Google search can confirm that you have not overlooked anything right after changing the website. You would have to wait until the Google bot has re-indexed all pages and then search again.
Unfortunately, with letters from attorneys, you rarely have enough time to force Google to re-index the page in order to check the pages again using a Google search. So another way of searching the website has to be found.
And the second challenge is: if you make the same mistake again after receiving a warning, it usually gets really expensive. For this reason I have to make doubly sure that the term is not overlooked somewhere on a subpage.
My plan was to first download the entire website, including all subpages, onto my Mac and then let BBEdit search through it with its multi-file search.
However, if a website is based on a CMS like WordPress, you cannot simply download the pages via FTP, as these are dynamically built from template files and content from the database.
Help comes from a free tool called "SiteSucker" (App Store link), which runs on the Mac and lets you save a complete website with all subpages, graphics and other files. SiteSucker was originally intended as a kind of offline reader. You load the content of a website onto your Mac and the program rewrites the URLs so that you can also view the website locally while offline. That used to be useful on vacation, but nowadays you have WiFi almost everywhere and rarely need such programs.
For my purpose, SiteSucker is the perfect tool, because I want the entire website to be available locally so that I can search through the source code.
Since many WordPress sites now run security plugins that block automated requests, you should configure SiteSucker to always wait 3 to 5 seconds between page loads so that no plugin is triggered. Many hosters also run server-side firewalls that notice when a bot sends several requests per second.
If you let SiteSucker go without restrictions on a website, there is a high chance that your own IP will be blocked and you will not be able to access the corresponding website for a few minutes.
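If you ever want to script such a download yourself instead of using SiteSucker, the pacing idea can be sketched in a few lines of Python. This is only a minimal sketch and not how SiteSucker works internally; the function name `fetch_politely`, its parameters and the default of 4 seconds are made up for illustration.

```python
import time
import urllib.request


def fetch_politely(urls, delay_seconds=4.0, fetch=None, sleep=time.sleep):
    """Fetch each URL in turn, pausing between requests.

    `fetch` and `sleep` can be swapped out, so the pacing logic can be
    tried without touching the network. By default urllib is used.
    """
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url, timeout=30) as response:
                return response.read()

    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            sleep(delay_seconds)
        pages[url] = fetch(url)
    return pages
```

With a delay of 3 to 5 seconds, a site with a few hundred pages takes a while to download, but the chance of getting your IP blocked drops considerably.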
You can also limit the requests by excluding - in my case - files of the type JS, JPEG, GIF, PNG and CSS. This should actually be very easy to do via the settings, but it didn't work for me.
At some point I worked with a few regular expressions that can also be used to exclude files and URLs. If you want to do that too, you can find the appropriate screenshot with the necessary entries here:
Now click the start button and watch how SiteSucker works its way through the website and saves all files one by one. You can also quickly see whether the regular expressions (regex) are working correctly, because all files are displayed live in the list. If you see JPG files there, something in the filter did not work.
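To illustrate what such an exclusion filter does, here is a small Python sketch. It is not the SiteSucker configuration itself, and SiteSucker's regex dialect may differ from Python's; the pattern and the function name `should_download` are assumptions for illustration only.

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern: file types that cannot contain the searched text.
EXCLUDED = re.compile(r"\.(js|jpe?g|gif|png|css)$", re.IGNORECASE)


def should_download(url):
    """Return True if the URL does not point at an excluded file type."""
    # Only look at the path, so query strings like "?ver=5.8" are ignored.
    path = urlparse(url).path
    return EXCLUDED.search(path) is None
```

Testing the pattern against the path rather than the whole URL matters for WordPress, which likes to append version parameters to script and stylesheet URLs.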
Once SiteSucker is through, you have a folder with all the HTML files in which the relevant "bad word" could be hidden. I searched these with the multi-file search function of BBEdit, because it opens the files and searches for the word in the source text.
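If you don't have BBEdit at hand, a comparable multi-file search can be sketched in Python. This is a rough stand-in and not what BBEdit does internally; the function name `find_term` and the assumption that the pages are saved as UTF-8 `.html` or `.htm` files are mine.

```python
from pathlib import Path


def find_term(root, term, suffixes=(".html", ".htm")):
    """Return (file path, line number) pairs for every line containing term."""
    term_lower = term.lower()
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        # errors="replace" keeps the search going if a file is not valid UTF-8.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if term_lower in line.lower():
                hits.append((str(path), number))
    return hits
```

Because the search runs over the raw source text, it also finds the word in places a browser would not show, such as alt attributes or meta tags.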
I couldn't test whether Spotlight would work here too, because my Spotlight has been acting up for weeks (a clean install is due in the next few days!). But it worked with BBEdit without any problems, and I think Spotlight can in principle handle HTML content as well. The only question is whether it would also find words in the source text (image tags, etc.).
When searching in HTML files you have to keep in mind that you will miss words with umlauts on some websites because the HTML special character may have been used for the corresponding umlaut (see SelfHTML).
An example: instead of "Schmörebröt", the source text may contain "Schm&ouml;rebr&ouml;t".
If you know that, you can change the search accordingly and should then find all occurrences. With this "list of references" I then went to the WordPress admin to clean up all the pages.
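One way to avoid searching for every spelling separately is to decode the HTML entities before comparing. A minimal Python sketch, assuming the function name `contains_term` (made up for illustration) and using the standard library's `html.unescape`:

```python
import html
import unicodedata


def contains_term(source, term):
    """Check for term in HTML source after decoding entities like &ouml;."""
    def normalize(text):
        # NFC makes composed and decomposed umlauts compare equal.
        return unicodedata.normalize("NFC", text).casefold()

    return normalize(term) in normalize(html.unescape(source))
```

This way "Schm&ouml;rebr&ouml;t" and the numeric form "Schm&#246;rebr&#246;t" are both found when you search for "Schmörebröt".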
I only noticed two possible problem areas in the matter later: The word searched for can also be hidden in image file names or even in the image itself. The customer couldn't tell me whether just naming a file with the problematic term would be enough to cause problems again, so we decided to defuse everything.
The search for files with the corresponding name was also done locally, in the WordPress "Uploads" folder. Because of my broken Spotlight I had to use the tool Find Any File by Thomas Tempelmann, which completed the job in a fraction of a second.
The last open issue is the search in image content. That means photographs, banners or the like in which the searched word has been embedded with image editing software. These occurrences must also be removed. However, no tool I am familiar with helps here, and you simply have to "scroll" through the graphics by hand in Preview.
I then "cleaned up" the problematic graphics and photographs in Photoshop and uploaded them to the server again via FTP. Since I didn't want to revise every thumbnail, I only revised and replaced the "large" image versions and then had all thumbnails regenerated with the plugin "Force Regenerate Thumbnails" by Pedro Elsner.
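The same file name search can be sketched in Python if no search tool is available. This is not how Find Any File works; the function name `find_files_named` is an assumption, and the folder you pass in is whatever local copy of the uploads directory you have.

```python
import os


def find_files_named(root, term):
    """Return all files below root whose name contains term (case-insensitive)."""
    term_lower = term.lower()
    matches = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if term_lower in name.lower():
                matches.append(os.path.join(folder, name))
    return sorted(matches)
```

Keep in mind that WordPress stores several resized copies of each upload, so one problematic image usually shows up multiple times in the result.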
On the whole, the matter could be resolved in a reasonable time despite the many subpages and graphics. If you are struggling with similar problems and don't know how to solve a certain task semi-automatically, write a short comment or email me directly. Maybe I can help you!
Jens has been running the blog since 2012. He appears as Sir Apfelot for his readers and helps them with problems of a technical nature. In his free time he drives electric unicycles, takes photos (preferably with his iPhone, of course), climbs around in the Hessian mountains or hikes with the family. His articles deal with Apple products, news from the world of drones or solutions for current bugs.