Feb 28
Tricks For Avoiding Duplicate Content Penalties
I've had quite a bit of experience handling duplicate content problems, both in A Little Google DeIndexing Puts Things In Perspective, An Accident Discovers The Cause Of My Google Deindexing, and Site Pages Reindexed, and from running WM Media. Here are a few tips to help bloggers (and other website owners) keep Google from assigning a duplicate content penalty to their sites.
www vs non www – If your website is accessible via both www.domain.com and domain.com, then Google will end up indexing both at some point. To fix it, you should redirect everything (using a 301 redirect) to www.domain.com. Just modify the .htaccess file of your public_html folder and use the following code:
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} !^www\.yourdomain\.com [NC]
RewriteRule ^(.*) http://www.yourdomain.com/$1 [L,R=301]
slash vs no slash – If any of your website's addresses is accessible both with and without a trailing slash (for example, www.whatithinkabout.com/do-you-get-money and www.whatithinkabout.com/do-you-get-money/), Google also treats those as two separate webpages. Again, you'll have to redirect the no-slash version to the slash version. For WordPress, simply use the redirect plugin.
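If you would rather not rely on a plugin, the same redirect can probably be done in .htaccess as well. The snippet below is only a minimal sketch: it assumes mod_rewrite is enabled, that www.yourdomain.com stands in for your own domain (as in the other snippets), and that these lines sit above any WordPress rewrite rules. Test it on a copy of your site first.

```apache
RewriteEngine On
RewriteBase /
# Leave real files (images, CSS, scripts, index.php) alone
RewriteCond %{REQUEST_FILENAME} !-f
# Any other path that does not end in a slash gets one appended
RewriteRule ^(.*[^/])$ http://www.yourdomain.com/$1/ [R=301,L]
```

The file check is there so that internal rewrites to index.php and requests for static files are not redirected.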
index.php vs no index.php – The same page can also be indexed twice if it is available via the index file as well. For example, www.whatithinkabout.com and www.whatithinkabout.com/index.php. In this case, you should redirect index.php to www.whatithinkabout.com via the following snippet in your .htaccess file:
RewriteEngine On
RewriteBase /
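# Match only when the visitor asked for /index.php directly (THE_REQUEST is the raw request line),
# so internal rewrites to index.php do not cause a redirect loop
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/index\.php[?\s]
RewriteRule ^index\.php$ http://www.yourdomain.com/? [R=301,L]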
Categories & Archives – Make sure you add a noindex tag to these. Since these pages are all snippets of your other pages, it’s all just duplicate content organized differently. Therefore, you don’t want Google to index these pages and possibly devalue your article page(s)! Here’s another handy plugin for this: No index plugin
Search Box – The search box is another dangerous place where duplicate content may be indexed. For example, www.whatithinkabout.com/?s=development would be indexed if someone linked to this search result from another blog. Therefore, I recommend disabling the built-in search altogether (by redirecting anything with parameters) and putting in a Google search box instead (you make more money from that anyway!)
To disable it, you can probably just redirect every request that has a query string (since you should have permalinks for your posts anyway), leaving the /wp paths alone:
RewriteEngine On
RewriteBase /
RewriteCond %{QUERY_STRING} .
RewriteCond %{REQUEST_URI} !^/wp
RewriteRule .* http://www.yourdomain.com/? [R=301,L]
Title And Description Tags – If all your pages have the same title and description, then you’re making Google work that much harder to decipher what your pages are about. It would be wise to add a different description tag to every one of your pages, just in case the content seems similar. Besides, search engine visitors need to find your description compelling to click through to your blog! If you don’t have one, Google may not be so great at picking one for you! To do this, just install the description plugin!
Why Avoid Duplicate Content?
There are three good reasons to avoid duplicate content.
1. The ranking power of your page is diluted if it is split across several URLs. For example, incoming links to www.whatithinkabout.com and www.whatithinkabout.com/index.php could all have been pointing to a single page. Split in two, neither URL ranks as highly on Google as one combined page would.
2. Possible duplicate content penalties. Since Google’s algorithm isn’t completely open and they need to prevent spam, it’s logical to assume that they may assign some sort of penalty to your website if there’s too much duplicate content. Imagine if tons of your blog’s search pages got indexed (such as www.whatithinkabout.com/?s=blog), in which case you may have 1000s of pages of duplicate content!
3. The number of indexed pages is capped. Google has some limits on how many pages of a website can appear in their index. For example, site:www.google.com only shows 30 million pages or so indexed, whereas it probably has trillions of pages. Therefore, if you have 10 duplicate pages indexed, they might just crowd out all those important pages you spent hours writing!
The Wrath Of Duplicate Content
That’s about it for duplicate content. It might not seem like a big deal, but it actually makes the difference between your blog being virtually unnoticed in search engines vs. your blog being one of the best ranked ones out there!
Just think about all the points above! Let’s say you don’t do any of them. Then Google may index all of these URLs:
http://www.whatithinkabout.com
http://www.whatithinkabout.com/
http://www.whatithinkabout.com/index.php
http://www.whatithinkabout.com/index.php/
http://whatithinkabout.com
http://whatithinkabout.com/
http://whatithinkabout.com/index.php
http://whatithinkabout.com/index.php/
Additionally, let’s say you have one category and 5 pages of content. You’ll also have these pages indexed:
http://www.whatithinkabout.com/category
http://www.whatithinkabout.com/category/
http://www.whatithinkabout.com/post-1
http://www.whatithinkabout.com/post-1/
http://www.whatithinkabout.com/post-2
http://www.whatithinkabout.com/post-2/
http://www.whatithinkabout.com/post-3
http://www.whatithinkabout.com/post-3/
http://www.whatithinkabout.com/post-4
http://www.whatithinkabout.com/post-4/
http://www.whatithinkabout.com/post-5
http://www.whatithinkabout.com/post-5/
http://www.whatithinkabout.com/2008/02/post-1
Plus other archive links
…
Plus the non www versions of these
…
Not to mention if someone links to www.whatithinkabout.com/?s=post-1 in a search, that’s another 10 or 20 links.
Yet, how much content do you actually have here? Just 5 posts! What if Google indexes say only 90% of your website? With this setup, it’s possible that Google indexes all the duplicate content pages and none of your real content pages get indexed! Compare that to five pages ranking really well!
It seems like such a small thing, but it’s such a huge difference!
Hi An INTJ,
Just a question. How much should I change the content so as not to be considered duplicate. Would a 30% change be enough?
Thanks,
Horacio
Hey Horacio,
Yeah, probably. Although, best to check with Copyscape.
Thanks,
Warren
The duplicate content penalty does not exist.. to an extent.
“There’s no such thing as a “duplicate content penalty.” At least, not in the way most people mean when they say that.”
With that being said, they also note that such a penalty does exist, but only under certain situations. Examples that could get you penalised for duplicate content, and their advice for avoiding it, are as follows.
- If you are caught leaching or scraping information from other sites and republish it without adding any additional value to the information.
- Don’t create multiple pages.. subdomains.. or domains with substantially duplicate content.
- Avoid… “cookie cutter” approaches such as affiliate programs with little or no original content.
- If your site participates in an affiliate program make sure that your site adds value. Provide unique and relevant content that gives users a reason to visit your site first.
“Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results. If your site suffers from duplicate content issues, and you don’t follow the advice listed above, we do a good job of choosing a version of the content to show in our search results.”
You can read more about it @ http://www.domainstructure.com/2008/09/duplicate-content-penalty/
and also @ http://googlewebmastercentral.blogspot.com/2008/09/demystifying-duplicate-content-penalty.html
These are great suggestions. Modifying the htaccess file is a wonderful idea that I think most people do not implement. Thanks, Tim
Some interesting points…
It would be really great if we could get some ‘official’ person, say from Google, to spill the beans on how all this works.
But, as they say, the algorithms change all the time. I suppose a lot of the duplicate content is not intentional and I’m sure it would be long before google (and others) have a way around this. At the end of the day, consistently good content which develops support and links should be the logical way to go… I get the impression that Google is always trying to make things easier and reduce SEO cheating. In an ideal world, can’t we just publish content and that’s it???
Having read a lot about this, I would say it is true that the “duplicate content penalty” does not exist in the way a lot of people use the phrase.
But! Duplicate content makes your pages rank lower in the search results because the ranking factors are divided between the seemingly duplicate pages.
So I would say that a “duplicate content penalty” exists, just not in the way most people use the phrase. Therefore I advise everyone to make use of these suggested changes in the .htaccess file. At least it cannot do any harm.
Thanks for such nice help. I have a question: how do I block all the queries on the search page except a few in robots.txt?
http://bharatclick.com
Good article and important advice, and thanks for calling a spade a spade. Google’s stand on this is just spin IMO. If it finds two web pages with the same content, it shows one and drops the other. Fair enough, there is too much redundant content on the web, and no-one wants to see successive search results that are all basically the same. But Google tries to say it’s not a penalty. That’s just playing with words, ISTM. If you got sent to the back of the queue at the airport because your bag was the same colour as the person in front, I think you would feel you’d been penalised…
Good advice to any fellow blogger. I would also recommend keeping 80%+ of your content as unique as possible. So many bloggers are trying to use duplicate content, and you’re ruining your own site and possibly hurting others who work hard at writing quality content for their readers.
Thanks for the great post and the great ideas of How To Avoid Google’s Duplicate Content. By this post I get great info about it. Thanks for sharing.
Thanks for this informative article. I implemented everything you suggested (aside from the second plugin, cuz I have one already that does that, thanks tho). Let’s hope my sites do well with these little tweaks.
Thanks again!
Take care!
Thank you, this is a very helpful post.
I changed my blog URL to premium and changed my blog title, but my new blog URL and new title are not indexed by Google... I’m very sad