There is a lot of confusion about duplicate content. To get some clarity on the issue, the first step is to look at what the major search engines themselves say about it. I chose Google and Yahoo because, at present, they appear to have the most users.
Before we dive into the main discussion, there are a few things you should know. First, this article covers only duplicate content across sites/domains. The term "duplicate content" in this article always refers to that type of duplication, not to duplicate content within a single site/domain.
Second, although I have tried my best to interpret what Google and Yahoo say, this is still an interpretation and involves some subjectivity on my part. I therefore suggest that you also check the references used in this article. The complete list of references can be found at the end of the article.
Having said that, let’s find out what Google and Yahoo say about duplicate content…
According to them, what is duplicate content?
Google defines duplicate content as "substantive blocks of content…that either completely match other content or are appreciably similar." The definition does not include "occasional snippets" such as quotes, nor different language versions of the same content (see "Duplicate content" and "Deftly dealing with duplicate content").
As for Yahoo, they do not define duplicate content explicitly. However, they mention three types of unwanted pages/sites that can reasonably be classed as duplicate content: "multiple sites or pages offering substantially the same content," "pages that rely heavily on content…created for another website" and "pages that harm…diversity…of search results" (see "Yahoo! Search Content Quality Guidelines").
According to them, what are the consequences of having duplicate content on your site?
Yahoo says little about this. They state only that they may take any action necessary to ensure the quality of their index, which may include excluding sites/pages that violate their site guidelines or removing such sites/pages from the index. Since the three types of unwanted pages/sites mentioned above can be considered violations of those guidelines, all of them risk being excluded or removed from Yahoo's index.
As for Google, they divide duplicate content into two types: malicious and non-malicious. The former is content duplicated in order to manipulate search engine results; the latter is content duplicated without any intention of manipulating them.
Google does not penalize non-malicious duplicate content. It does, however, apply filtering: for a given search, Google shows only the one version of the duplicated content it considers most appropriate and moves the rest into the omitted results.
But how does Google decide which version is the most appropriate to show for a given search? There are at least two factors Google uses: (1) the authority of the site and (2) the number of links pointing to the page that carries the duplicated content.
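To make the filtering idea concrete, here is a toy model of how a cluster of duplicate pages might be reduced to one shown result plus omitted results, using the two factors above. This is a hypothetical sketch, not Google's algorithm: Google does not publish how it measures authority or how it weighs the two factors, so the `Page` fields, the 0 to 1 authority score, the link normalization and the equal weights in `filter_duplicates` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    site_authority: float  # hypothetical authority score between 0 and 1
    inbound_links: int     # number of links pointing to this page


def filter_duplicates(pages, authority_weight=0.5, link_weight=0.5):
    """Split one cluster of duplicate pages into (shown, omitted).

    Toy model only: the weights and the normalization are assumptions,
    not values any search engine has disclosed.
    """
    if not pages:
        return None, []
    # Normalize link counts within the cluster; "or 1" avoids dividing by zero.
    max_links = max(p.inbound_links for p in pages) or 1

    def score(p):
        # Combine the two factors into one hypothetical score.
        return (authority_weight * p.site_authority
                + link_weight * p.inbound_links / max_links)

    ranked = sorted(pages, key=score, reverse=True)
    # The best-scoring version is shown; the rest go to the omitted results.
    return ranked[0], ranked[1:]


if __name__ == "__main__":
    cluster = [
        Page("https://original.example/article", 0.9, 40),
        Page("https://scraper.example/copy", 0.2, 3),
        Page("https://syndicate.example/article", 0.6, 12),
    ]
    shown, omitted = filter_duplicates(cluster)
    print(shown.url)
    print([p.url for p in omitted])
```

In this made-up cluster, the original article wins on both authority and links, so it is the one shown, while the syndicated copy and the scraped copy are filtered into the omitted results. Note that nothing is penalized here: every page stays in the index, which matches how Google describes its treatment of non-malicious duplicates.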
As for malicious duplicate content, Google will apply a penalty, which may take the form of lowering the site's ranking or removing the site from their index entirely. Unfortunately, they do not give us much information on how they determine whether a given piece of duplicate content is malicious.
References:
From Yahoo:
Yahoo! Search Content Quality Guidelines
Why can't I find my web pages in your search engine?
My site used to be in your database, but it is no longer showing in search results. What’s wrong?
From Google:
Deftly dealing with duplicate content
Duplicate content summit at SMX Advanced
Duplicate content due to scrapers
Demystifying the “duplicate content penalty”
