Catching Online Content Scrapers

Content scrapers are all over the Internet. They steal your content and use it on their own blogs without your permission. Some scrapers merely copy the content from your blog, but many take it and present it as new material.

It is disconcerting to see your content appear, word for word, on someone else’s website when you had nothing to do with it (aside from actually writing the content) and certainly did not give anyone permission to use it without proper (or any) attribution. On the other hand, if a person doesn’t change your article, gives you credit and links back to the original, that is okay.

Catching content scrapers in the act

Most likely, you don’t know where to begin when it comes to figuring out exactly who is stealing your content. Several websites and tools can help you find out.

Copyscape: Copyscape is a search engine for duplicate content. Enter the full URL of the page where your content lives, and it will tell you whether and where duplicates exist. The basic search is free. The premium service allows you to check up to 10,000 pages.
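If you already have a suspect page in hand and want to confirm a word-for-word copy yourself, a rough check can be sketched in a few lines. This is only an illustration and says nothing about how Copyscape works internally; the function names `shingles` and `overlap`, and the choice of five-word runs, are assumptions made for this example.

```python
import re

def shingles(text, size=5):
    """Return the set of consecutive word runs ("shingles") of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def overlap(original, suspect, size=5):
    """Fraction of the original's shingles that also appear in the suspect text."""
    original_shingles = shingles(original, size)
    if not original_shingles:
        # Text shorter than one shingle: nothing to compare.
        return 0.0
    shared = original_shingles & shingles(suspect, size)
    return len(shared) / len(original_shingles)
```

A score close to 1.0 suggests that most of your article appears verbatim in the other page; a score near 0.0 suggests the two texts merely share a topic. You would paste in the plain text of both pages yourself.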

WordPress trackbacks: Trackbacks let you see when someone includes your content in their blog. If they don’t change the article, give you credit and link to the original, that is fine; it is not scraping. If the person puts their own name on your article, it can be considered plagiarism.

Webmaster Tools: In Webmaster Tools, look under “Your site on the web” and click on “Links to your site.” Columns will appear showing the linked pages. A website that links to a large number of your posts, and that isn’t a social media website, a social bookmarking website or a loyal fan, is very possibly a content scraper. To verify this, click on the domain to see exactly which pages on your website it links to, then visit that website to look at the pages in question.
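If you have a long list of inbound links, the "links to a large number of your posts" test can be roughed out in code. The sketch below assumes you have already collected the links as (linking page, your page) pairs by whatever means you have available; the function name `suspicious_domains`, the `min_posts` threshold and the `ignore` set are all hypothetical choices for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse

def suspicious_domains(links, min_posts=5, ignore=()):
    """Rank linking domains by how many distinct posts of yours they link to.

    links: iterable of (source_url, target_url) pairs.
    ignore: domains you already trust (social media, bookmarking sites, fans).
    """
    targets_by_domain = defaultdict(set)
    for source_url, target_url in links:
        domain = urlparse(source_url).netloc.lower()
        if domain.startswith("www."):
            domain = domain[4:]
        if domain in ignore:
            continue
        targets_by_domain[domain].add(target_url)

    # Keep only domains that link to at least min_posts different posts.
    flagged = {d: len(t) for d, t in targets_by_domain.items() if len(t) >= min_posts}
    return sorted(flagged.items(), key=lambda item: item[1], reverse=True)
```

The output is only a shortlist. A domain that shows up here still needs to be checked by hand, since a loyal fan can link to many of your posts too.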

Using Google Alerts: If you don’t post a high volume of content and you aren’t interested in tracking who mentions your business and how often, you can create a Google Alert that matches the titles of your posts verbatim. You do this by putting quotation marks around each title. You can set the alerts up so that they come to you automatically every day.
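Putting quotation marks around a handful of titles by hand is easy enough, but if you keep your titles in a list, a tiny helper can prepare the exact-match search strings for you. This is a minimal sketch; the function name `alert_queries` is made up for this example, and it only builds the text. Creating each alert in Google Alerts remains a manual step.

```python
def alert_queries(titles):
    """Wrap each post title in double quotes so the search matches it verbatim."""
    queries = []
    for title in titles:
        # Collapse stray whitespace and drop embedded double quotes,
        # which would otherwise end the quoted phrase early.
        cleaned = " ".join(title.split()).replace('"', "")
        if cleaned:
            queries.append(f'"{cleaned}"')
    return queries
```

For example, passing in the title of this article would give you the string "Catching Online Content Scrapers" with the quotation marks included, ready to paste into the alert box.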

Once you have established that your content is being scraped: You can still get credit for the posts that have been scraped. If you use WordPress, you can try the RSS footer plugin, which lets you put your own text (or at least a portion of it) at the top or bottom of the RSS feed. An attribution line will appear with your title, you as the author and a list of social media channels where people can connect with you. This is an excellent way to counteract the theft and still get something for your business. It is a lot better than being a sitting duck while scrapers come along and take whatever they wish.
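To make the idea concrete, here is roughly what adding an attribution line to a feed item amounts to. This is not the plugin’s code, just a sketch of the concept; the function name `add_attribution`, its parameters and the wording of the attribution line are all assumptions for illustration.

```python
from html import escape

def add_attribution(content_html, title, url, author, position="bottom"):
    """Return the feed item's HTML with an attribution line added.

    position: "top" puts the line before the content, anything else after it.
    """
    line = (
        f'<p><a href="{escape(url, quote=True)}">{escape(title)}</a> '
        f"was originally published by {escape(author)}.</p>"
    )
    if position == "top":
        return line + "\n" + content_html
    return content_html + "\n" + line
```

Because scrapers usually republish the feed as it is, the link back to your original article travels with the stolen copy.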