Friday, September 12, 2008 at 8:30 AM
Duplicate content. There's just something about it. We keep writing about it, and people keep asking about it. In particular, I still hear a lot of webmasters worrying about whether they may have a "duplicate content penalty."
Let's put this to bed once and for all, folks: There's no such thing as a "duplicate content penalty." At least, not in the way most people mean when they say that.
There are some penalties that are related to the idea of having the same content as another site—for example, if you're scraping content from other sites and republishing it, or if you republish content without adding any additional value. These tactics are clearly outlined (and discouraged) in our Webmaster Guidelines:
- Don't create multiple pages, subdomains, or domains with substantially duplicate content.
- Avoid... "cookie cutter" approaches such as affiliate programs with little or no original content.
- If your site participates in an affiliate program, make sure that your site adds value. Provide unique and relevant content that gives users a reason to visit your site first.
(Note that while scraping content from others is discouraged, having others scrape you is a different story; check out this post if you're worried about being scraped.)
But most site owners whom I hear worrying about duplicate content aren't talking about scraping or domain farms; they're talking about things like having multiple URLs on the same domain that point to the same content, for example www.example.com/skates.asp?color=black&brand=riedell and www.example.com/skates.asp?brand=riedell&color=black. Having this type of duplicate content on your site can potentially affect your site's performance, but it doesn't cause penalties. From our article on duplicate content:
Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results. If your site suffers from duplicate content issues, and you don't follow the advice listed above, we do a good job of choosing a version of the content to show in our search results.
This type of non-malicious duplication is fairly common, especially since many CMSs don't handle this well by default. So when people say that having this type of duplicate content can affect your site, it's not because you're likely to be penalized; it's simply due to the way that web sites and search engines work.
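For the parameter-order case described above, one mitigation a site can apply on its own side is to always emit (or redirect to) a single canonical ordering of query parameters. The following is a minimal Python sketch, assuming alphabetical ordering is an acceptable canonical form for your URLs; the `canonicalize` helper is hypothetical and not part of any CMS or Google API.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonicalize(url: str) -> str:
    """Return `url` with its query parameters sorted, so that URLs differing
    only in parameter order collapse to one string (hypothetical helper)."""
    parts = urlsplit(url)
    # Sort (name, value) pairs; keep_blank_values preserves params like "sale=".
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    # The fragment (#...) is never sent to the server, so drop it.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(params), ""))


a = canonicalize("http://www.example.com/skates.asp?color=black&brand=riedell")
b = canonicalize("http://www.example.com/skates.asp?brand=riedell&color=black")
print(a == b)  # True
print(a)       # http://www.example.com/skates.asp?brand=riedell&color=black
```

A CMS could use a function like this when generating internal links, and could 301-redirect any non-canonical variant to the canonical one, so that only one URL per page is ever discovered.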
Most search engines strive for a certain level of variety; they want to show you ten different results on a search results page, not ten different URLs that all have the same content. To this end, Google tries to filter out duplicate documents so that users experience less redundancy. You can find details in this blog post, which states:
- When we detect duplicate content, such as through variations caused by URL parameters, we group the duplicate URLs into one cluster.
- We select what we think is the "best" URL to represent the cluster in search results.
- We then consolidate properties of the URLs in the cluster, such as link popularity, to the representative URL.
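The three steps can be illustrated with a toy model. This is not how Google actually detects duplicates or chooses representatives; it is a minimal Python sketch that assumes exact-match content fingerprints, uses an inbound-link count as a hypothetical stand-in for "link popularity", and picks the representative with an arbitrary heuristic (most links, then shortest URL).

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    content: str
    inbound_links: int  # hypothetical proxy for "link popularity"


@dataclass
class Cluster:
    representative: str  # the URL shown in results (step 2)
    members: list[str]   # every URL grouped together (step 1)
    total_links: int     # consolidated signal (step 3)


def fingerprint(content: str) -> str:
    """Exact-match fingerprint after collapsing whitespace.
    Real systems use fuzzier near-duplicate detection."""
    normalized = " ".join(content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def cluster_pages(pages: list[Page]) -> list[Cluster]:
    # Step 1: group URLs whose content fingerprints match.
    groups: dict[str, list[Page]] = defaultdict(list)
    for page in pages:
        groups[fingerprint(page.content)].append(page)

    clusters = []
    for members in groups.values():
        # Step 2: pick a representative (arbitrary heuristic: most links, then shortest URL).
        best = min(members, key=lambda p: (-p.inbound_links, len(p.url), p.url))
        # Step 3: consolidate the members' signals onto the representative.
        clusters.append(
            Cluster(
                representative=best.url,
                members=sorted(p.url for p in members),
                total_links=sum(p.inbound_links for p in members),
            )
        )
    return clusters
```

The sketch also shows how consolidation can fall short: if two duplicate URLs differ slightly (say, a timestamp in the footer), the exact-match fingerprint puts them in separate clusters and their link counts are never combined.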
Here's how this could affect you as a webmaster:
- In step 2, Google's idea of what the "best" URL is might not be the same as your idea. If you want to have control over whether www.example.com/skates.asp?color=black&brand=riedell or www.example.com/skates.asp?brand=riedell&color=black gets shown in our search results, you may want to take action to mitigate your duplication. One way of letting us know which URL you prefer is by including the preferred URL in your Sitemap.
- In step 3, if we aren't able to detect all the duplicates of a particular page, we won't be able to consolidate all of their properties. This may dilute the strength of that content's ranking signals by splitting them across multiple URLs.
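Listing only the preferred URL for each page in your Sitemap is one way to signal that preference. Here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`, assuming you already know which URL you prefer for each page; the `build_sitemap` name is hypothetical, and the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(preferred_urls: list[str]) -> str:
    """Return a minimal Sitemap with one <url><loc> entry per preferred URL."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url in preferred_urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes "&" in query strings as "&amp;", which the protocol requires.
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap(["http://www.example.com/skates.asp?brand=riedell&color=black"]))
```

The non-preferred variant (`?color=black&brand=riedell`) is simply left out of the list.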
In most cases Google does a good job of handling this type of duplication. However, you may also want to consider content that's being duplicated across domains. In particular, deciding to build a site whose purpose inherently involves content duplication is something you should think twice about if your business model is going to rely on search traffic, unless you can add a lot of additional value for users. For example, we sometimes hear from Amazon.com affiliates who are having a hard time ranking for content that originates solely from Amazon. Is this because Google wants to stop them from trying to sell Everyone Poops? No; it's because how the heck are they going to outrank Amazon if they're providing the exact same listing? Amazon has a lot of online business authority (most likely more than a typical Amazon affiliate site does), and the average Google search user probably wants the original information on Amazon, unless the affiliate site has added a significant amount of additional value.
Lastly, consider the effect that duplication can have on your site's bandwidth. Duplicated content can lead to inefficient crawling: when Googlebot discovers ten URLs on your site, it has to crawl each of those URLs before it knows whether they contain the same content (and thus before we can group them as described above). The more time and resources that Googlebot spends crawling duplicate content across multiple URLs, the less time it has to get to the rest of your content.
In summary: Having duplicate content can affect your site in a variety of ways; but unless you've been duplicating deliberately, it's unlikely that one of those ways will be a penalty. This means that:
- You typically don't need to submit a reconsideration request when you're cleaning up innocently duplicated content.
- If you're a webmaster of beginner-to-intermediate savviness, you probably don't need to put too much energy into worrying about duplicate content, since most search engines have ways of handling it.
- You can help your fellow webmasters by not perpetuating the myth of duplicate content penalties! The remedies for duplicate content are entirely within your control. Here are some good places to start.


52 comments:
I blog about Colorado Hiking, Biking, and Camping and wrote some Knols with basically the same content. Do my Knols put me at risk for getting my blog penalized for duplicate content?
Thanks.
I ran into a similar content issue when Classmates wanted to expose its user profiles but didn't want the users' information exposed on the page. So how do you make over 40 million pages unique when all you have to work with is a name and the school the person attended?
I talked to an SEO expert about it, and she suggested writing unique content for each page: "just block out x amount of pages each week to be written." Seriously? On 40 million pages?
I ran a test to find out what counts as unique content, to see whether we could pull information from the database into the content of each page in a way that would make the pages unique enough not to be filtered out.
You can read about it here.
What about a blog portal website that aggregates RSS content from multiple blogs or websites on a similar theme?
For instance, what if I have a website that aggregates and republishes blog posts from blogs by Amish farmers, and each aggregated post receives its own permanent URL on the aggregating website, which adds information about related posts from other aggregated blogs and cross-blog tags?
Some of the blogs post full-content feeds and so their post is substantially duplicated on the portal.
The portal version clearly attributes the source of the post to the originating blog and includes a link back to it.
No advertising is displayed on content from other blogs.
Will the portal get flagged for duplicating content?
This is a real-world scenario. I currently operate such a portal (though not for Amish farmer blogs, if there are any :)).
What about content scraped from, for example, wikipedia.org? I've seen some sites with 1:1 copies of Wikipedia articles. Is there a penalty in those cases?
Question: what about stuff like this:
http://googlewebmastercentral.blogspot.com.tynted.net/2008/09/demystifying-duplicate-content-penalty.html
what happens when this shows up in search results like...
http://www.google.com/search?hl=en&q=site%3Atynted.net
So is it ok to use services like blogburst that use my content on other sites?
OK... say I have multiple domains with the same content on both URLs. I guess a redirect would be a better idea?
I'm assuming it's bad if I own both domains (for instance, and not a real example, bikeridingsomething.com and bikeridingsomethingelse.com) and they have basically the same data? Ideas on best practice? Redirect? Leave it as is? Muh...
1. Duplicate content is often necessary due to the geolocalisation of SERPs (for instance, information posted in French on a Canadian site cannot be found by people in France who restrict their search to pages from France).
2. The Google Directory is a duplicate of the DMOZ directory.
No penalty. Why? Why?
Hi, I'm a new blogger. Oh my... now I know whether duplicate content can lead to a penalty. Thanks for the information.
I often rewrite the ads for programs I'm an affiliate of. I do this because the ads are often poorly written and I can't get clicks through them. Sometimes my ads keep the same testimonial, or I include one of the good sentences that are unique to the original ad. Is this considered duplicate content?
Sorry! This isn't about technique,
but I have a request for you:
Please add some help or tips on the front page about hurricanes, such as:
- Catarinas
- A Google logo, as for the Olympics
- The most popular searches about them (this could help show what is happening and maybe forecast events...)
Thanks
What about host headers? I hear these are also considered to be duplicate content.
Excellent article, but it misses an important source of site duplication. As @jasmin pointed out, some duplication comes from geotargeted domains. I mean, "domain.com/french" may have the same content as "domain.fr". The issue here is that we can't redirect from .fr to .com because of geolocation: the .fr domain is intended for people living in France, so they have different products. BUT although they have SOME different products, 90% of the site is the same.
Thanks for writing this post. But I am still very concerned about how sites are **still** disappearing from Google for their pages' main search phrases when other websites copy their meta descriptions, or content that includes the search phrase.
You did not write a response to my post in popular picks http://groups.google.com/group/Google_Webmaster_Help-Indexing/browse_thread/thread/e213eec10610a481, but I hope that this post of yours is a try at a response. So thanks.
BUT: you say that we should not be concerned about scraper websites copying meta descriptions and the content around search phrases on our pages. How, then, can I have a client website where people copied my client's content, I made it unique again, and the homepage reappeared on Google??? How can it happen a second time, and why, even after I made the content unique again, is the homepage still not appearing for the main search phrase?
This seems to have been first a simple duplicate content penalty that I was able to fix by making the content unique, then a more draconian penalty that is keeping the homepage from ranking for the phrase despite the now-unique content.
Full case study on my blog:
http://www.searchmasters.co.nz/articles/160/sites-disappearing-from-google/
I have filed a reinclusion request for the clients site to try and get it back. However, I would rather the issue did not exist in the first place.
Google - please get your heads out of the sand. The issue STILL exists, despite your protestations to the contrary.
My site showed a lot of duplicate title and description meta tags, because I created many pages by copying existing ones as templates, even though the content on each page was very different.
I have been updating and correcting these errors, but the dashboard diagnostics still show the old tags. I was wondering how often the site is crawled and when the changed tags would show up instead of the old ones.
The dashboard doesn't show when the site was last crawled or when it will be crawled next. Just wondering.
Amen to all of that, but I've noticed that scrapers, if they are well-established sites, can take the blog content I've written, reproduce it verbatim, and rank instead of me.
In some cases I've noticed plagiarists have taken articles from me, put them on their sites, and embedded links to the images from the article too. Not only are they stealing my content, they are also stealing my bandwidth, and they still rank instead of me.
Nicely put. Well, I still have a few questions. First, what is Google's perspective on sites that have to use almost identical content, e.g. cheat sites? Cheat codes cannot be altered to make them unique without losing their meaning, and many cheats are submitted by users who post them on several sites, which makes it impossible to make every page unique.
Secondly, sometimes when I publish content on my site, it is instantly copied by a few blogs, which then get indexed well before me.
Since I have thousands of pages of content, I wonder how to locate each page that is being copied and each site copying my content, and file DMCA notices against them.
I would greatly appreciate it if you could at least reply to the first part of my question, as my site has lost its SERP rankings and your answer might put me in the right direction to fix them.
The general definition of "duplicate content" that affects rankings for a certain phrase:
- the first instance of a search phrase in text should have unique content in the two words before and after that search phrase (it's my educated guess at "two"; one might do, or it might be "three", but I do it for two and it seems to work.)
i.e. if the search phrase is "blue widgets"
then you need to make sure that your phrase:
"the best blue widgets in America" is not on any other website.
So it's easy enough to make even your cheat code pages unique: just have some machine-generated text before and after the exact search phrase,
e.g. "bestcheatcodes.com warcraft3 cheat codes for free", and make sure that phrase is unique.
To make sure a phrase is unique, Google it with quotes around it.
And the great thing about machine-generating the before and after text is that when other people copy your site's content, you make just the one change in your template, and your thousands of pages are unique again.
Simple, apart from when Google bans your page because it's been copied too many times; or that is my take based on some case studies.
Spanish translation of this article here
Hi: A fraudster has set up a website which duplicates all our pages. As soon as we upload a new item to our site it appears also on the fraudulent website. The website uses our names to make it look genuine. What can we do? How can we inform Google? Any advice would be appreciated. Thanks
I have duplicate content on my new blog (www . webmasterinter . net) and on my Blogspot blog.
That's because I didn't have a good domain when I decided to start.
I'm happy to see that I'm not going to be penalized much, because the new blog has an updated version of that old post.
Anyway, when will we get to read about the Google sandbox?
This is very useful information, thank you. I don't think I'll worry about getting these 'penalties' so much anymore :)
You see people "warning" about duplicate content all the time when talking about articles submitted to article directories and the like. I've been reminding people that if "duplicate content" the way they are saying is a problem then the article directories would have died out long ago. And what about the sites that syndicate the articles? Wouldn't they have disappeared as well?
My query relates to my site “http://youpark.com” and feedurmobile.com
The problem is that my site's URLs were query-based; I have now made them SEO-friendly and written a redirection rule. When browsing normally, all links are valid, but in Webmaster Tools I see the following type of link being reported. I don't know how crawlers are generating these from my application.
http://www.feedurmobile.com/o2-xda-phone&manufacturerName=o2/MobileApplications/WL/147/2022/portalbrowsebycategorycb?categoryName=synchronization&devicename=o2-xda-phone&manufacturername=o2-software&portalID=147
The URL above, as reported in Webmaster Tools, is a combination of two URLs. It should be either the current, SEO-friendly URL:
http://www.feedurmobile.com/o2-xda-phone&manufacturerName=o2/MobileApplications/WL/147/2022/
or the old URL:
http://www.feedurmobile.com/portalbrowsebycategorycb?categoryName=synchronization&devicename=o2-xda-phone&manufacturername=o2-software&portalID=147
or
http://www.youpark.com/Symbian/IM for Skype for Symbian S60 v.3/25800/Product/
It should be:
http://www.youpark.com/Symbian/IM for Skype for Symbian S60 v.3/25800/ (without the word "product" at the end).
Please help me dig into this.
My site http://zaxarius.altervista.org/ seems to be considered a duplicate of other websites that use the same CMS (phpNuke). You can spot them here, in 1st and 2nd place: http://www.google.com/search?q=5+clone. The first title is taken from the ODP, in this category.
How is this possible?
The three sites actually DO NOT look the same at first glance. I guessed that the similar header and table structure generated by the CMS would make this possible, but from the user's side I think this should not happen, because the pages ARE NOT similar! They are completely different websites! Can anyone explain this?
I have several sites on different medical topics, and I license syndicated content from several sources. I know this content is not "unique" per se, but it is not scraped or "duplicated"; it is content I pay a lot of money for. I know this content appears on many other sites, and in many cases I cannot alter the text, as that would be a breach of contract. Do you have any ideas on how I can avoid "duplicate" content penalties for this? Thanks
I have uploaded a website, http://www.utsav-collections.com
BUT I haven't found it in Google search results. I uploaded it 20-35 days ago, and I have also uploaded a sitemap, but it is still not found. What is the reason?
I work for a company that is creating 200+ local pages, and we've stressed to them the importance of creating unique copy for each page. As we know, large companies don't want to put that much time into creating so many unique pages. The company and services are the same in all areas, so to them all the pages should be the same. The pages will be under a root domain and subfolders, such as "www.root-domain.com/DMA-city/product", where the DMA would be the only thing that changes.
Also, the meta data and titles would differ only by the DMA, and the content on each page, let's say 500 words, speaks to the product and DMA (which would be the only thing changed).
These are all under the same root domain and are duplicate content. How can they build all 200+ pages and make them indexable while avoiding duplicate content? Would changing the DMA in the 500 words, meta data, and title be sufficient if the DMA is listed at least 10-15 times on the page?
It seems to me that they are willing to create 200+ unique pages when the products are the same for every market, so having individual local pages for each market would, under the current method, give them duplicate content issues. I know the guidelines refer to large blocks of content that are the same or similar; if these were individual URLs or subdomains, e.g. "DMA.root-domain.com" or "www.root-domain-DMA.com", would that resolve the issue?
Hello, Google!
I have these two blog sites, and I hope Google crawls and ranks them well, but I'm not a web lady, so I'd appreciate it if anyone can help with http://gifts-parasayo.spaces.live.com and http:www.linkedin.com/in/philippineslifestyle
Thank you,
Regards!
gem
Would laborpains.org/, server1.laborpains.org, and http://www.unionfacts.com/blog/, which are all carbon copies of the same pages, be good examples of bad ways to duplicate your site?
@j. max wilson:
If your aggregation site only republishes the feed content and doesn't add significant value, it's possible that it will be filtered out in search results (we may show the original articles instead of your aggregation site, depending on what a user is searching for). However, if your site adds some type of significant value, you may be able to rank for that aggregated content. This article may shed more light on it (the tone is a bit wrong for your situation, since you say you're not scraping, but the recommendations at the bottom of the article may be helpful).
@aroedl:
That kind of scraping could be eligible for a penalty.
@paisley:
That type of stolen content may still appear in a site: search, but you'll notice that if you search for any portion of the actual article, our blog (the originator of the content) shows up first and all other results are hidden with the message "We have omitted some entries very similar to the 1 already displayed." That's what we mean when we talk about grouping and filtering: we may index duplicate content, but we try to group it and only show 1 version when we actually serve search results. Searching for text will give different results than a site: search.
@steve:
If you own more than one domain and they all have the same content, I'd definitely recommend doing a 301 redirect from each page on domain A to the corresponding page on domain B. This makes sure that people will end up at the right place no matter which URL they enter, but only one version will be indexed.
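The page-by-page 301 redirect recommended above can be sketched as follows. This is a minimal Python example built on the standard library's `http.server`, not production server configuration; the host names are hypothetical placeholders, with www.example.com standing in for the one version to be indexed and the others for duplicate domains.

```python
from http.server import BaseHTTPRequestHandler
from typing import Optional

CANONICAL_HOST = "www.example.com"  # hypothetical "domain B": the version to index
ALIAS_HOSTS = {"example.com", "example.net", "www.example.net"}  # hypothetical duplicates


def redirect_target(host_header: str, path: str) -> Optional[str]:
    """Return the canonical URL for a request to an alias host, or None."""
    hostname = host_header.split(":", 1)[0].lower()  # drop any ":port"
    if hostname in ALIAS_HOSTS:
        # Keep the path and query so each page maps to its own counterpart,
        # not everything to the home page.
        return f"http://{CANONICAL_HOST}{path}"
    return None


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        location = redirect_target(self.headers.get("Host", ""), self.path)
        if location is not None:
            self.send_response(301)  # Moved Permanently
            self.send_header("Location", location)
            self.end_headers()
            return
        body = b"canonical page content\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, `HTTPServer(("", 8000), RedirectHandler).serve_forever()` (with `HTTPServer` imported from `http.server`) would serve it; a request for `/skates.asp` with `Host: example.net` would get a 301 to `http://www.example.com/skates.asp`. On a real site the same rule usually lives in the web server's configuration rather than in application code.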
@susan moskwa
Thanks... I appreciate the input. I was thinking that I was going to have to do something of that nature. Again, thanks.
@Susan Moskwa
Thanks for your response. I have carefully read every page that I could find related to duplicate content in the webmaster knowledge base, and it is still unclear.
It seems to me that determining whether or not a site "adds some type of significant value" can be pretty subjective. Is that determination made by an automated algorithm or does it involve human evaluation?
Would showing potentially related posts from other blogs, found by analyzing links and keywords, count as significant added value?
I think that it should.
In either case, I appreciate your input and will see what I can do to make sure that my aggregation website adds valuable additional information such as community ratings.
Thanks!
@dijkstra:
In the example you give (example.com/french vs. example.fr), instead of serving the same content on both of those domains, you can redirect one to the other and then use Google's geographic targeting tool to target example.com/french to France.
@searchmasters:
I made an open call here for people to send in examples where they still feel we're not handling duplicate content correctly. I'll pass your case study along to the right folks.
@chris:
If you're displaying the same content that many other sites do (through a syndication deal), then that content is duplicated; not maliciously, in your case, but it is still being duplicated (the same content appears in multiple places). As I mentioned in my article, if you want to rank for that content, you'll need to add a lot of additional value for users that isn't available on any of those other syndicating sites. It's also fine to syndicate content without adding value if your site's business model doesn't rely on search engine traffic.
For those of you concerned about scrapers, you can read this post and/or file a DMCA takedown request.
@j. max wilson:
I agree it's a bit subjective, but it's hard to give specific examples because what "adds value" will be different for every site. Usually it means having something on your site that can't be found anywhere else. Ask yourself: why would someone come to my site rather than to the site that originated this content? What kinds of content or features would make them want to bookmark my site and share it with friends?
You may want to experiment with a website testing/optimization tool to see what users respond well to.
@Susan Moskwa - thanks
The client's website is still penalised (now a month later) after being copied twice within a month.
I have other clients sites that have been similarly penalised, so I am very keen for your engineers to look at this with all urgency.
Appreciated.
@ Susan Moskwa
Thanks again for your helpful feedback.
Not to be a pest, but I am still curious about whether the "adds value" determination is made by algorithm, a person, or a combination of the two?
Thanks for all your work!
@J. Max Wilson
Please refer to my post - http://googlewebmastercentral.blogspot.com/2008/09/demystifying-duplicate-content-penalty.html?showComment=1221566400000#c5859499633564154704
Get unique words around your search phrase for starters. Then experiment with how much additional unique content per page you need, to be able to get the page ranking for that search phrase.
Yes, my post was a little simplistic, since Google is able to see past the opening paragraph and see that the remainder of the page is the same, i.e. Google can see that many article submission websites have the same article, even when there are unique words around search phrases in opening paragraphs.
I have often used what I call "semi random" - Creating say 5 variations of an opening paragraph that I plug search phrases into for database generated pages. Then based on say the record number, I always use say the 4th variation for a certain page, and the 3rd variation for another page.
I have very successfully used this method for directories say of the local outlets of a retailer on their website, or jobs/categories on a jobs website.
The "adding value" is an algorithmic check, so get your algorithm working and you can get pages ranked.
OMG, Google is revealing information the common man can understand. After years of trying to explain this, thankfully your company came out of the closet. Not bad; it only took a decade!
@ J. Max Wilson said...
Not to be a pest, but I am still curious about whether the "adds value" determination is made by algorithm, a person, or a combination of the two?
Thanks for all your work!
With billions of webpages do you think there is much human hand work mucking about in the writing of the search algorithm??
I would say, most everything is done by algorithm, to enable Google to secure their environment as much as possible, with the max productivity and without the need for continual human intervention.
;->
Italian translation:
http://blog.imevolution.it/91/cose-veramente-la-sanzione-del-contenuto-duplicato/
I duplicate content to places where I know I have viewers who strictly stay inside their own "walled community," like Blogger, for example (where you are required to have a login in order to comment) or LiveJournal. Will I be penalized for duplicate content when my intent is to reach people who won't visit my site and prefer to stay in their own community?
Wow this is the most helpful tool I have ever found on duplicate content so far!
I have a small service company in Phoenix, AZ, and I am pretty sure my site has been punished for duplicate content.
I created a site, windowcleaneraz.com. However, I am new to this whole web development thing; I learned that domains containing your keywords do better in search results, so I acquired a better domain, windowcleaningscottsdale.com. I developed the new site, and all of a sudden one day my traffic for both sites went from about 3 or 4 visits a day to zero, and it has been zero ever since. I found both sites on the 6th page of results when they used to be #1 and #2 for my search of "window cleaning scottsdale".
While this was happening, I had just finalized the purchase of a new domain, windowcleaningphoenix.com, for $600, which I think is the best I could have for my service and area, and I have now finished the design of that site.
I really don't even want the other two domains, windowcleaneraz.com and windowcleaningscottsdale.com, but I own them for a year. I have since done a little research and configured a 301 redirect from both old domains to the new windowcleaningphoenix.com, but I fear it may be too late. The damage has already been done.
Is there anything else I can do to help the matter and get back in the results again?
I have done extensive research, and all I seem to find is people's opinions; this blog may be just what I need.
I really wish I could just remove the two old sites from Google's index altogether and let the domain registrations run out.
Thanks for any advice/help/ideas
it is much appreciated
Ryan
Where does eBay fall into all this? They list duplicate content all the time that also appear on other eCommerce sites. They also publish content based off of SKUs.
I was wondering what would happened if someone stole your content and re-published it on their own site. Would you be penalized for this?
@Gods Princess:
Check out this article.
A couple of previous comments have pointed out a problem similar to mine. I would like to have 2 similar sites running alongside each other for the .com and .co.uk top-level domains. Most of the content would be the same except for some portfolio examples and the contact details. Google searches within the UK would probably rank www.mydigitalpartner.co.uk higher, while the "generic" mydigitalpartner.com does quite well when not selecting "Pages within UK".
So what do I do?
Redirect the .co.uk domain to .com, which would impact negatively on UK searches.
Upload the same content to .co.uk domain, taking the risk of being penalised?
How do other companies create country specific pages when the content is written in English and over 90% is the same?
Many thanks in advance,
Regards,
Greg
Suppose I publish pages as they were originally posted somewhere else, but I add some remarks/comments on my site. Would Google penalize this?
Very nice post. I wish there could be a little more info coming from Google about how they are interacting with your site. The newest updates to Google Webmaster Tools are great, although it isn't updated as often as I'd like. I'm on there every day! We have content that comes from our supplier for our products, and it appears we got dinged for it, but I can't be sure. Since then we've written unique descriptions for about 600 products. I wish I could know for certain if this was the cause. I'll keep reading this blog. :)
- John
Dog Supplies
RSS aggregators such as Bloggapedia or Blogcatalog will take the legitimate site out of the search results, including for quoted text (I think in the range of the first 200+ letters). This is a fact, not fiction. It is clearly duplicate (aggregated) content, but their site will be the only one in the search results.
So whatever you do, do not give your feeds to aggregators. I have resorted to putting IP bans on their bots.
All my web sites are set up to deliver the same page for the following URLs:
http://www.mydomain.com
http://www.mydomain.com/index.html
http://mydomain.com
http://mydomain.com/index.html
I'm guessing from this post that there is no penalty for any of these. Is that correct?
I only publish the mydomain.com on printed materials but some people still put the www. on the front out of habit.
Why does anybody use the www. on the front anymore? It seems like a waste of four keystrokes and bandwidth.
www. is actually a subdomain and could have different content. Usually it doesn't.