Duplicate content is a high-level SEO topic. Site owners dread the very mention of it.
Ask anyone about duplicate content, including some of the self-proclaimed marketing gurus, and most of them will have you believe your website or blog is a veritable time bomb courtesy of duplicate content. It’s just a matter of time before you’re slapped with a Google penalty.
But how much of this statement is true?
Very little – while duplicate content can still affect SEO, it most certainly won’t get your site penalised, except for a few extreme cases.
What’s Duplicate Content?
Duplicate content is the term used to describe identical or very similar content that appears at multiple locations or URLs on the web. It’s confusing because search engines can’t tell which of the URLs to display.
This might hurt how a page is ranked. Worse is when other websites start linking to the different versions of the content, confusing search engines even further.
Let’s Illustrate this Using a Real-life Example
Duplicate content is like a crossroads with different road signs pointing in different directions, but all leading to the same destination.
Which direction should you follow?
Worse, the destinations are different too, but only slightly so. As a reader, this isn’t much of an issue because it’s the same content. But search engines have to choose which one to display in the search engine result pages and make sure that they do not show the same piece of content twice.
Will Google Penalise You for Duplicate Content?
Duplicate content isn’t the same as copied content. While Google will not hesitate to penalise you for copied content, they will not penalise you for having duplicate content on your site.
While copied content is deliberate, duplicate content is mostly caused by technical faults.
Google is clear on this. They won’t penalise your site for duplicate content. But in extreme cases, where hundreds of your pages have been duplicated, you’re hanging by a thread.
Google favours websites with original, high-quality content. Suppose you try to manipulate other people’s content and republish it on your site, spinning sentences around and sprinkling in a few keywords here and there. In that case, Google will assume that you’re trying to game their system and, as such, drop your rank in their result pages.
How Much Duplicate Content is Acceptable?
The best thing is to have no duplicate content at all.
Always strive to publish original content. And if you must post duplicate content, then the least you could do is to be smart about it.
Go through your content sentence by sentence and reword everything. As a rough rule of thumb, try to keep duplicate content to about 10% or less.
Here’s what Matt Cutts, the former head of search quality at Google, had to say about duplicate content:
According to Matt, 25% to 30% of the web consists of duplicate content. He went on to add that Google doesn’t treat duplicate content as spam. They won’t penalise your site for it unless it turns out you’re using it to manipulate search engines for a higher ranking.
Can a Duplicated Article Outrank the Original?
Yes, it can.
But in rare cases, and only when the website duplicating your content has a higher authority.
Must You Block Google from Indexing Duplicate Content?
There’s no need for this.
Google also has an interesting post on this (on how to handle the identical posts on your site).
Google advises you against blocking identical or similar content on your site, whether it’s by robots.txt or any other method.
Does Google Penalise Sites for Syndicating Content?
No. Google has made it clear that it does not penalise sites that syndicate content.
Here’s what Google has to say about syndicating your content to other sites (paraphrased statement):
“Be careful with how you syndicate your content. Google will analyse all the versions of the content, and choose the one they think is most valuable to the user. You’re reminded that the version they choose to serve the user might not be the version you preferred.
You may also want to make sure the content you syndicate links back to the original content on your site.
Better, ask those syndicating your materials to use the noindex meta tag so that no search engine ends up indexing syndicated versions of your content (Google 2020).”
The Problem with Syndicating Your Content
The problem with content syndication is that you’ll never be sure if that will ultimately affect your organic traffic.
Since the content is on other people’s sites, they’re the ones benefiting from all the positive SEO signals the content generates – not you.
Asking them to link back to your original post might help. But let’s not forget that those links might be considered unnatural.
So, What’s the Best Way Forward?
Ask the sites that republished your articles to add a “rel=canonical” link element that points to the original article on your site. That way, your site gets to enjoy all the SEO benefits generated by the republished articles.
Can Google Penalise You for Thin Content?
Yes. Google will penalise you for thin content.
Here’s what they have to say about thin content (rephrased):
“Be careful with publishing stubs. Users don’t like it when your site has so many empty pages. You want to avoid using placeholders where possible.
You’re not to publish a page for which you have no real content. And if you must create a placeholder page, then be sure to use the noindex metatag to stop Google from indexing it.”
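If you want to confirm that a placeholder page really carries the noindex directive, a small script can read the page’s robots meta tag. The snippet below is only a sketch using Python’s standard library; the sample HTML and the function name `is_noindex` are made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def is_noindex(html: str) -> bool:
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical placeholder page used only for this example.
placeholder = (
    '<html><head><meta name="robots" content="noindex, follow"></head>'
    "<body>Coming soon</body></html>"
)
print(is_noindex(placeholder))
```

Run it against the HTML of each stub page before you publish it.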
Types of Duplicate Content
No two websites have the same set of characteristics. In other words, not every website is bound to experience the same duplicate content issues.
A static website is small and only has a limited number of pages. On the other hand, a CMS website has a lot of customised and automated features that might trigger duplicate content.
You might also be dealing with a more prominent site with millions of pages and dozens.
In this section of the post, we’ll try to break down different types of duplicate content and group them accordingly:
Internal Technical Duplicate Content Issues
This is where you have the same content appearing in multiple URLs on your website.
It includes issues like:
- Duplicated homepage at html.php and index.html
- Flash microsites, orphaned or broken
- Same content duplicated across multiple pages on your site
- Excessive reuse of snippets or content in a paginated series
- Faceted navigation
- Analytics tracking parameter on your internal page links
- Session ID parameters
- Duplicate content triggered by inbound links
- Inconsistent URL
- Inconsistent use of trailing slashes
- Numerous similar articles
- Thin content or no content on some of the pages
- Repetitive boilerplate snippets
Duplicate Content that’s Very Specific to the Types of Website
These issues are specific to particular types of websites, especially ecommerce ones.
- Reuse of review copy
- Duplicated titles and meta descriptions
- Blank category pages
- Product copy distributed across marketplaces and affiliate sites such as Amazon or eBay
- Repeated content in the tabbed sections of your product pages, delivery terms or terms and conditions
Hosting-related Duplicate Content
These issues are mostly caused by server misconfiguration.
- No HTTP-to-HTTPS redirects, so your site can be accessed on both protocols
- Site available on both non-www and www
- Indexed staging site
- Indexed load balancer on alternative subdomains, e.g., the IP Address or www3
External 3rd Party Duplicate Content
This occurs when a third-party website copies part of your blog or content.
- Lazy syndication of a press release that was initially posted in the news section of your website
- Scrapers republishing your posts through your RSS feed
- Sites that directly copy your posts and publish them on their blogs and websites
- Sites that lightly rewrite your content and pass it off as their own
Own External Duplicate Content
This is where you duplicate your content on other sites or blogs.
- Similar versions or copies of your content on the other sites or blogs you own
- The separate mobile version of your website, without rel-alternate declaration or canonical header in the header of the primary site
- Official syndicators
- Misconfigured geo-IP detection
- International domains, subfolders, or subdomains, without hreflang annotations
Most SEO experts flinch at the mention of “duplicate content penalty.” Online marketers who have little or no SEO experience love using this term even though most of them are unaware of Google guidelines on duplicate content. They assume that if an article or just a paragraph appears twice online, Google penalties must be close behind.
Today, we will debunk three common myths about duplicate content that have been misleading people for years.
Myth 1# Unoriginal Content on Your Site will Compromise your Rankings Across your Domain
Ever since I started offering SEO services, I’m yet to see real evidence that non-original content affects site ranking except for one extreme case. In this case, a new website was launched, and one of the personnel at the contracted public relations company copy-pasted the home page text into a press release and distributed it to thousands of platforms thereby creating hundreds of versions of the original page. This move caught the attention of Google who manually blacklisted the domain.
It was ugly since we were the web development company that had been hired to develop the site. We were blamed for the misfortune, but luckily the domain was re-indexed after we filed a reconsideration request and explained the situation to Google moderators.
Based on this example, there are three points to note:
- Volume: There were thousands of the same texts on the web
- Timing: All the content was duplicated and published online at the same time
- Context: The content was for a home page of a brand new domain
But this is not what people mean when they use the phrase “duplicate content.” A 1,000-word article on a page of a well-established site is not enough to trigger Google to blacklist the site. Many sites, including authority blogs, periodically repost articles that were first published on other sites. Sure, they do not expect the content to rank, but they also know that it will not adversely affect the credibility of the domain.
Myth 2# Scrapers Will Compromise your Site
One of my friends, who is a blogger, is very keen on making sure that he does not violate Google’s webmaster guidelines. Whenever a scraper site copies one of his blog posts, he quickly disavows any links to his site to avoid hurting the credibility of his domain. He is yet to read Google’s guidelines for disavows and duplicate content.
In the past, I have checked the analytics of several major blogs, and surprisingly, their content gets scraped multiple times per day. The thought that they have a full-time employee whose role is to watch GWT and disavow links is outrageous. They know that duplicate content will not affect their credibility.
The bottom line is, scrapers will not help or hurt your domain or brand name. Most scrapers copy-paste the entire article together with the links. Even though the links in the scraped version of the article will not pass authority to your site, you may get occasional referrals.
However, if the scraper outranks your site, you need to report the case to Google. Submit the complaint using their Scraper Report Tool.
Digitally signing your content using Google Authorship will help the search engine to know that you are the original owner of the content. No matter how many times an article is scraped, it will still be linked back to you if you signed it.
It is also important to note that there is a difference between copyright infringement and scraped content. Someone might decide to copy your entire site content and claim it to be their own creation.
Plagiarism is the practice of using someone’s work and passing it off as your own. Scrapers will rarely do that, but some could decide to sign their name on your content. That’s illegal and is the main reason why you need to have a copyright symbol in your footer.
Myth 3# Republishing Your Guest Posts on Your Site Will Hurt its Ranking
I write hundreds of guest posts per month, and it is highly unlikely that my audience sees all of them. So, I often republish the posts on my blog to reach as many readers as possible. Personally, I make sure that the content is 100% original, not because of fear of a penalty, but out of a desire to consistently offer value to my users.
Have you ever written an article for an authority blog? I have, and they usually ask me to republish the post on my site a few weeks after it’s published. Some may even ask you to add a small HTML tag to the post: the rel=“canonical” tag.
Canonical is a term that is used to mean the “official version.” When you republish an article that was posted on other sites, you can inform search engines of the particular site where the article was originally posted by using a canonical tag.
Apply the Evil Twin Tactic
If the original article that you are considering republishing is a “how-to” post, you can change it into a “how not to” post. Base the contents on the original research and concept, but make sure that you use different examples and offer more value to the readers. The “evil twin” will look similar to the first one, but it will still be original.
Duplicate content is one of the issues that SEOs and Singapore webmasters have to deal with on a daily basis. Over the years, Google and other search engines have put in place stringent rules to prevent this vice. Sure, your site’s current ranking and credibility can be lowered if a section of the content on your website is not unique. However, there are specific steps that you can take to prevent such a scenario.
Before we look at these tips, it is important to note that Google has in the past stated that duplicate content on a site does not attract a penalty unless it appears that the intent of publishing the material was to manipulate search engine results.
There are three categories of duplicate content, namely:
- Exact duplicates: Two URLs with identical content
- Near-duplicates: Content that has small differentiators
- Cross-domain duplicates: Multiple domains that have exact-match or near-duplicate content
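To make the three categories concrete, here is a rough Python sketch that labels a pair of pages. It relies on the standard library’s `difflib` for a similarity score. The 0.9 threshold is purely an assumption for illustration, not a figure published by Google, and the URLs and product copy in the usage lines are invented.

```python
from difflib import SequenceMatcher
from urllib.parse import urlsplit

NEAR_DUPLICATE_THRESHOLD = 0.9  # assumed cut-off, not a Google figure


def classify_pair(url_a, text_a, url_b, text_b):
    """Label two pages as exact, near, cross-domain duplicate, or distinct."""
    similarity = SequenceMatcher(None, text_a, text_b).ratio()
    if similarity < NEAR_DUPLICATE_THRESHOLD:
        return "distinct"
    # Same or almost the same text on two different hosts.
    if urlsplit(url_a).netloc != urlsplit(url_b).netloc:
        return "cross-domain duplicate"
    return "exact duplicate" if text_a == text_b else "near-duplicate"


# Hypothetical product copy that differs only in the colour.
blue = "Blue cotton shirt with short sleeves, available in sizes S to XL."
red = "Red cotton shirt with short sleeves, available in sizes S to XL."
print(classify_pair("https://shop.example.com/blue", blue,
                    "https://shop.example.com/red", red))
```

A character-level comparison like this is slow on long pages, so treat it as a way to reason about the categories rather than a production checker.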
Consequences of Duplicate Content
1# Wasted Crawls
A search bot lands on your website with a crawl budget. If you have duplicate content, you will waste the bot’s crawl budget, and fewer of your essential pages that do not have duplicate content will be crawled and indexed.
2# Wasted Link Equity
It is possible for pages with duplicate content to gain link authority and PageRank. However, Google will not rank the content, and so you will waste the link authority from such pages.
3# Wrong Listing in Search Engine Results Pages
No one outside the search engines knows exactly how search algorithms work. If you have pages with exact-match or near-duplicate information, you have no control over which pages are ranked or filtered out. Therefore, the pages that you want to rank may be suppressed by other, less relevant pages.
When to Worry About Duplicate Content
If the duplicate content on your site or blog isn’t malicious, then you have absolutely nothing to worry about.
Let’s hear it straight from the horse’s mouth, from the mother of all search engines herself – Google.
Here’s what Google has to say:
“Duplicate content on a website is only grounds for action on the site if its intention is to manipulate and deceive search engine results. When Google runs across duplicate content, they’ll rank the most authoritative version of the content.”
Google knows how to handle duplicate content. The search engine does a commendable job in sorting out duplicate content.
Still, it’s always best to manually get involved instead of waiting for search engines to sort out the issue.
Why Make Duplicate Content Such a Bad Thing?
Duplicate content won’t get your site penalised, save for a few extreme cases, as we said. But that’s not to say it won’t hamper your SEO effort.
Again, as we said, duplicate content confuses search engines. How? Because search engines can’t decide which page is most relevant to what’s queried.
Search engines are programmed never to display the same piece of content twice. The user wants to be served with options, and it’s the job of search engines to make sure that they do not see the same search result twice.
Duplicate content dilutes your authority, especially when different websites start linking to different versions of your website content.
Causes of Duplicate Content
Not all cases of duplicate content are deliberate. Most of them occur by accident.
One deliberate example of duplicate content is when you create a print version of a webpage. The print version carries the same content as the original page, and when it gets indexed, it becomes a duplicate of that page.
That’s one example of how people deliberately create duplicate content. But there are a few other situations when it’s created unintentionally.
Here are some common causes of duplicate content:
Session IDs
A session ID is a string of randomly generated characters that web servers assign to website visitors to track their site activity. Session IDs are common in shopping carts.
Here’s how one looks:
The problem with the session IDs created is that they result in hundreds or thousands of duplicates.
How to Resolve Them?
The best way to resolve this problem is by storing your session IDs in cookies. But take your time to read about EU laws on cookies.
Sorting Options
Sorting options aren’t limited to product catalogues, where buyers can sort products based on price, type, date, and so on. The sorting function is common on almost any kind of website, including a simple blog.
Usually, it looks something like:
The URL with this sorting option is exactly the same as the original page. It carries the same content, only that it’s sorted differently.
Affiliate Codes
Affiliate codes are all over the place. Web owners use them to identify individual referrers and reward them every time they bring in a new customer or visitor.
An affiliate code looks something like this:
Once again, the codes create a replica or duplicate of the original page, which ends up affecting your SEO effort.
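Session IDs, sorting options, and affiliate codes all produce extra URLs for the same page. One way to see how many of your URLs collapse into a single page is to strip those parameters and compare what is left. The Python sketch below does this with the standard library. The parameter names (`sessionid`, `sort`, `aff`, and so on) and the example.com URLs are hypothetical, so swap in whatever your own site uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; substitute the ones your own site uses.
NON_CONTENT_PARAMS = {"sessionid", "sid", "sort", "order", "aff", "ref"}


def strip_non_content_params(url: str) -> str:
    """Drop query parameters that change the URL but not the page content."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_CONTENT_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Made-up variants of one catalogue page.
variants = [
    "https://www.example.com/shoes?sessionid=8f3a2c",
    "https://www.example.com/shoes?sort=price",
    "https://www.example.com/shoes?aff=partner42",
    "https://www.example.com/shoes?colour=red&sort=price",
]
for variant in variants:
    print(variant, "->", strip_non_content_params(variant))
```

Parameters that genuinely change the content, such as the `colour` filter in the last variant, are left alone.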
Domains are another culprit in this.
When not handled with care, they can prove problematic.
Here are two types of domains to look at:
Search engines have advanced a great deal. But for some reason, they still find this confusing.
Both URLs lead to the same page (homepage), but since they both look different, they’re sometimes interpreted as two different pages.
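To check how many protocol and www variants of a page are floating around, you can map each URL to one preferred form and group the results. This is only a sketch: it assumes the site prefers https and the www host and that you are dealing with a single main domain (it would wrongly prefix other subdomains with www). The example.com URLs are placeholders.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def preferred_form(url: str) -> str:
    """Map protocol and www variants of a URL to one preferred form.

    Assumes the site prefers https and the www host; flip these if yours differs.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host
    path = parts.path or "/"  # treat a bare domain as the homepage path
    return urlunsplit(("https", host, path, parts.query, ""))


def group_variants(urls):
    """Group URLs that collapse to the same preferred form."""
    groups = defaultdict(list)
    for url in urls:
        groups[preferred_form(url)].append(url)
    return dict(groups)


homepage_variants = [
    "http://example.com",
    "https://example.com/",
    "http://www.example.com/",
    "https://www.example.com",
]
for target, sources in group_variants(homepage_variants).items():
    print(target, "<-", sources)
```

Any group with more than one entry is a candidate for a redirect or a canonical link.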
Comment Pagination
When many users comment on your posts, some of the comments may spill over onto additional pages. Each of those pages shows the same post content, but with a different page of comments.
Geotargeting Users with Different Content
Suppose your website targets users from the US, Australia, and the UK. The content you create for each region will be largely the same, only localised to target the different groups of users.
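A common way to tell search engines that regional pages are alternates of one another is the hreflang annotation. The Python sketch below builds the link tags for the three regions mentioned above. The URLs and the `REGIONAL_VERSIONS` mapping are hypothetical; the language-region codes (`en-us`, `en-au`, `en-gb`) follow the usual hreflang format.

```python
from html import escape

# Hypothetical regional URLs for the same page; replace with your own.
REGIONAL_VERSIONS = {
    "en-us": "https://www.example.com/us/pricing",
    "en-au": "https://www.example.com/au/pricing",
    "en-gb": "https://www.example.com/uk/pricing",
}


def hreflang_tags(versions):
    """Build one <link rel="alternate"> tag per regional version."""
    return [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}">'
        for code, url in versions.items()
    ]


for tag in hreflang_tags(REGIONAL_VERSIONS):
    print(tag)
```

Each regional page would carry the full set of tags in its head, including the one that points to itself.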
Image Pages
It’s not uncommon to come across a website where each image has its own webpage.
Of course, you can still access the images on your content page, but upon clicking on them, they’re enlarged and open in their separate pages, thus creating duplicate content.
Category Pages
This is common with content management systems.
When you assign a post to more than one category, the CMS creates duplicate content.
Unless you choose a primary page, the category pages will be marked as duplicates.
Copied Content and Duplicate Content
Duplicate content can also occur when you copy content from another page and publish it elsewhere. This is the textbook definition of duplicate content, but it doesn’t have to be that direct.
Here are a few ways people copy content without knowing they’re creating duplicate content:
When Creating Dedicated Landing Pages for Paid Searches: When creating a dedicated landing page for paid search, most of the time you’ll be creating a page that’s almost identical to the original page. Most people only tweak words to accommodate specific keywords, without doing much to make the content unique.
Other Websites Pull Content Off Your Website or Blog: Unfortunately, as soon as you hit the publish button, other websites and blogs can pull the information you share and post it on their own blogs or websites. The problem comes when the website that does this has a higher domain authority than yours. They rank better than you, giving search engines even more reason to prefer their version of the post over yours.
Using Content from Another Website: Any attempt to copy someone else’s content will not only hamper your ranking but also taint the relationship you have with other bloggers and web owners.
How to Proactively Address the Duplicate Content Issue?
Luckily for you, there are a lot of ways to optimise duplicate content. Here’s what you can do:
- Delete them: Weigh up the point of having duplicate content on your site in the first place. Of what use is the content? If none, then don’t hesitate to delete it.
- Update duplicate content: If the content has to be there, you can always rewrite it. Replace the content with something original.
- Redirect the content: Instead of having two similar pieces of content on two different pages, why not redirect one of the pages to the other? That way, only one of the links ends up containing the content, whereas the rest of the links redirect to it.
- Use the canonical link element to specify content authority.
- Use 301 Redirects: After restructuring your website, use 301 redirects to redirect users, search engine bots, and other spiders.
In Apache, this can quickly be done with a .htaccess file. In IIS, you can easily edit this via the administrative console.
- Be Consistent with Your Internal Linking: Decide on the type of linking you intend to use, and stick with it all through.
In which case, you’re not to link to the following type of links:
- Use Country-specific TLDs When Serving Geotargeted Content:
When serving geotargeted content, you want to use country-specific top-level domains to serve the content.
- Use https://www.abcdef.de for Germany
- https://www.abcdef.au for Australia
- https://www.abcdef.us for the US
and so on
- Minimise Boilerplate Repetition: Instead of writing a very long boilerplate text and copying it on every page, write a summary that you can link to a dedicated page with more details.
You also want to use Google’s Parameter tool to specify how you want their search engine to treat your URL parameters.
How Can You Avoid Duplicate Content
Use a robots.txt Block
A robots.txt file can help you to block pages that have duplicate content from being crawled. Google, however, does not recommend this approach because if the engine is unable to crawl such pages, it cannot tell whether the URLs point to the same content and will, therefore, have no option but to treat them as unique and separate pages.
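If you are not sure whether your duplicate pages are already blocked from crawling, Python’s built-in `urllib.robotparser` can tell you. The sketch below parses a hypothetical rule set that blocks a `/print/` section and checks two made-up URLs against it. In practice you would paste in the contents of your own file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules that block a print-friendly duplicate section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_blocked(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules above stop this crawler from fetching the URL."""
    return not parser.can_fetch(user_agent, url)


print(is_blocked("https://www.example.com/print/article-1"))
print(is_blocked("https://www.example.com/article-1"))
```

Given Google’s advice, a blocked duplicate is usually a sign that a canonical link or a redirect would serve you better than the block.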
Use 301 Redirects
If you are planning to get rid of duplicate content from your site, a 301 redirect is an ideal approach. If some of the pages have received links, redirecting them to the correct URL will ensure that you still profit from the links. This move will also help the search bots to know where to find the proper content.
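Before you load redirect rules onto your server, it helps to sanity-check them. The Python sketch below models a redirect map as a plain dictionary and flags chains, where one redirect points at another redirect. The paths are invented, and the `resolve` function only imitates what a server would answer. It is not a server configuration.

```python
# Hypothetical map of retired duplicate URLs to the page that should rank.
REDIRECTS = {
    "/index.html": "/",
    "/shoes-old": "/shoes",
    "/print/shoes": "/shoes",
}


def resolve(path: str):
    """Return (status, location) the way a server with these rules would."""
    if path in REDIRECTS:
        return 301, REDIRECTS[path]
    return 200, path


def find_chains(redirects):
    """List sources whose target is itself redirected, i.e. a redirect chain."""
    return [src for src, dst in redirects.items() if dst in redirects]


print(resolve("/shoes-old"))
print(find_chains(REDIRECTS))
```

Pointing every old URL straight at its final destination keeps the map free of chains.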
Use the rel="canonical" Link Element
The rel="canonical" link element helps search bots to know which version of the content is the true or original one. All you need to do is add the link to the <head> of the duplicate article or page. For example:
<link rel="canonical" href="https://mytruecontent.com">
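Once the tag is in place, it is worth checking that it actually points where you expect. The following Python sketch pulls the canonical URL out of a page’s HTML using the standard library parser. The sample page and the `/original-post` path are made up for illustration.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if "canonical" in attr_map.get("rel", "").lower().split():
            self.canonical = attr_map.get("href") or None


def canonical_url(html: str):
    """Return the canonical URL declared in the page, or None if absent."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical


# Hypothetical republished page that points back to the original post.
republished = (
    '<html><head><link rel="canonical" '
    'href="https://mytruecontent.com/original-post"></head><body></body></html>'
)
print(canonical_url(republished))
```

The same check works for syndicated copies on other sites, where you want the canonical to point back to your original.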
Should you worry?
From MediaOne’s experience – Google does not penalise duplicate content. Here are some reasons why:
- if you are a subsidiary or reseller, you may not be allowed by your principal to vary the specifications and description of your product or service – penalising you would be grossly unfair
- if you have an e-commerce website – it’s virtually impossible to vary your content because you have thousands of products from hundreds of suppliers – Google understands that e-commerce sites have this innate business challenge
- if you have multiple branches in different countries – you really wouldn’t want to say things too differently in different countries if you can help it as it can cause significant corporate branding and product/service variations
So what Google does is not penalise but it will not award points either.
Then How Do I Get Around This Issue?
Therefore in order to rank, you will need to vary your content if you are allowed to; OR add more content to reduce the duplication. Here is an illustration on how this is done:
Try to remember the real core reason why Google will want to put you on the 1st page. It’s because it thinks that what you are saying is USEFUL + ORIGINAL. “Useful” because Google wants to be the Oracle Of Everything, so that you learn to depend on it from the moment you wake to the moment you close your eyes. “Original” because if you are simply copying or rehashing what others are saying, why should you be entitled to the 1st page? The 1st page is where Google wants to put the BEST answers. To think of it another way: if everybody is called “Simon”, why should “Simon #9” be promoted above all the other Simons?
5 Duplicate Content Checkers
To avoid possible SEO curses triggered by duplicate content, you’re advised to take a few precautionary measures, both within your own website and across other websites.
There are a few duplicate checkers to help you out with this:
- Copyscape: Copyscape is a premium, paid plagiarism checker that lets you identify which parts of your website content are similar to other articles on the web. It’s efficient and fast, and can quickly point out any duplicate content and even provide an exact percentage of how much of your content is already floating on the web.
- Grammarly: Grammarly is a free tool designed to detect punctuation mistakes, poor word choice, spelling errors, and poor grammar. Its premium account provides suggestions on how best to improve your writing style. Other than that, it lets you check for plagiarism against billions of websites on the internet.
- Duplichecker: Duplichecker will check your article for originality. A free account allows you to run up to 50 searches per day once registered.
- Siteliner: Siteliner allows you to run monthly check-ups for duplicates or plagiarised content on your site. Other than that, the tool can also help you to identify broken links and which part of your website isn’t performing well in the SERPs.
- Small SEO Tools: Small SEO Tools is more of a plagiarism checker. It lets you identify which part of your content is not original or has been duplicated from the content that’s already available on the internet.
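The tools above do the heavy lifting, but for a quick check inside your own site you can also fingerprint each page’s text and look for collisions. The Python sketch below is a rough illustration, not how any of the listed tools work. It only catches text that is identical after whitespace and letter case are normalised, and the page paths and copy are invented.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing and collapsing whitespace."""
    normalised = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def duplicate_groups(pages):
    """Return lists of URLs whose body text is identical after normalisation."""
    by_hash = defaultdict(list)
    for url, text in pages.items():
        by_hash[fingerprint(text)].append(url)
    return [urls for urls in by_hash.values() if len(urls) > 1]


# Hypothetical pages keyed by path.
pages = {
    "/delivery": "Free delivery on all orders.",
    "/shipping": "free  delivery on\nall orders.",
    "/returns": "Return any item within 30 days.",
}
print(duplicate_groups(pages))
```

Near-duplicates with small wording changes will slip past a hash like this, which is where the dedicated checkers earn their keep.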
Vital Statistics on Duplicate Content
29% of webpages on the internet have duplicate content (Raven Tools).
80% of websites aren’t using microdata.
One of the biggest SEO pitfalls uncovered is the issue of duplicate content.
Of the pages with duplicate content, 22% of title tags and 17% of meta descriptions have duplicate content.
Schema microdata is all the rage. But only 20% of websites have successfully implemented it.
Only 36% of the results in the SERPs display schema mark-up.
83.13% of websites use Google Analytics to track their online performance.
An average site has about 4500 SEO-related issues – 250 of which are link-related, and 3672 have to do with the images used.
The Bottom Line
Googlebot crawls sites multiple times per day, so it can tell where the original article was published if it finds a copied version of the article a week later on another website. But does it get angry and impose a penalty on the site? No. That’s basically everything that you need to know about duplicate content.
Even though duplicate content can affect your site’s ranking on search engine results pages, it is not as scary as most people perceive. Unless the reason you posted the content is to manipulate search results, search engines will not typically impose a penalty. That does not mean that there are no adverse consequences of having such content on your site. It is advisable to crawl your site and resolve such issues to be on the safe side.
Here at MediaOne we will assess your site, reduce the duplicate content where necessary and create strategies to add in new original content to help you score. Give us a call at 6789 9852 today!