Wednesday, June 27, 2012

How to reduce invalid URL crawling and indexing: Website optimization

On many websites, especially B2C sites, the product filtering system (choices of product brand, price, size, performance, parameters, and so on) produces a large number of invalid URLs. "Invalid" here is only from the SEO perspective: these URLs cannot play any SEO role and actually have a negative effect, so it is better that they are not crawled or indexed. The reasons include:
A large number of filter-condition pages have duplicate or very similar content (a lot of duplicated content lowers the overall quality of the site).
A large number of filter-condition pages correspond to no products and have no content (for example, selecting "42 inch LED TV under $100" may return nothing).
The vast majority of filter-condition pages have no ranking ability (they rank far below the category page), yet they still waste some link weight.
These filter-condition pages are not a necessary channel for getting product pages indexed (product pages should have other internal links to help them get crawled and indexed).
Crawling a large number of filter-condition pages wastes a great deal of spider time, reducing the chance that useful pages get crawled and indexed (the number of filter-condition combinations is enormous).
So how do we try to keep these URLs from being crawled, indexed, and included? A post a few days ago, on how hidden content can also become an SEO issue, discussed a similar problem; these filter pages are one of the kinds of content one wants to hide. Unfortunately, I cannot think of a perfect solution. Cloud Morning Watch proposed some methods, but I don't think they solve it perfectly either.
Steps / methods:
The first method is to keep these URLs as dynamic URLs, or even deliberately make them more dynamic, to deter crawling and indexing. However, search engines are now able to crawl and index dynamic URLs, and technically this is less and less of a problem. Although many parameters do hinder indexing to a certain extent, URLs with four or five parameters are usually still indexed. We cannot be sure how many parameters are needed to block indexing, so this cannot be relied on as a method. Also, these URLs still receive internal links yet have no ranking ability, so they still waste some weight.
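To make the idea concrete, here is a hypothetical filter URL (all parameter names are invented for illustration) of the deliberately dynamic kind this method relies on, compared with a rewritten, static-looking form that spiders now index just as readily:

    http://www.example.com/tv/list.php?brand=sony&size=42&price=0-100&sort=price&page=2
    http://www.example.com/tv/sony/42-inch/under-100/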

The second method is to forbid inclusion with robots.txt. Again, these URLs receive internal links, and thus receive weight; but if the robots file forbids crawling them, the received weight can never be passed out (the search engine does not crawl the page, so it does not know what outbound links are on it), and the pages become black holes that only take weight in without passing it on.
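For example, if the filter pages all carry query parameters (a hypothetical URL pattern; adjust the rules to your own structure), the robots.txt rules might look like this:

    User-agent: *
    Disallow: /*?brand=
    Disallow: /*?price=
    Disallow: /*?size=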
Even putting nofollow on the links to these URLs is not perfect, and is similar to the robots ban: the effect of nofollow in Google is that the URL does not receive weight, but the weight is not reassigned to other links either, so it is still wasted. Baidu is said to support nofollow, but how it handles the weight is unknown.
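A nofollow'd filter link looks like this (the URL and anchor text are invented for illustration):

    <a href="/tv/list.php?brand=sony" rel="nofollow">Sony</a>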
Putting these URL links in Flash or JavaScript doesn't help either: search engines can already crawl links in Flash and JS, and will presumably only get better at it. A point many SEOs overlook is that JS links can not only be crawled, they can also pass weight, the same as normal links.
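Even a link buried in an onclick handler, such as this made-up snippet, can be discovered by today's spiders:

    <span onclick="window.location.href='/tv/list.php?brand=sony'">Sony</span>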
The filter links can also be made into AJAX form: after the user clicks, no new URL is visited; the browser stays on the original URL, only appending a fragment starting with #, which is not treated as a different URL. As with JS, though, search engines are actively trying to crawl and parse AJAX content, so this method is not safe either.
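A minimal sketch of such an AJAX filter (the endpoint /tv/filter, the element id product-list, and the loadFilter helper are all invented for illustration):

    <a href="#brand=sony" onclick="loadFilter('brand=sony')">Sony</a>
    <div id="product-list"></div>
    <script>
    // Fetch the filtered product list and swap it into the page;
    // the address bar only gains a # fragment, not a new URL.
    function loadFilter(query) {
      var xhr = new XMLHttpRequest();
      xhr.open('GET', '/tv/filter?' + query, true);
      xhr.onload = function () {
        document.getElementById('product-list').innerHTML = xhr.responseText;
      };
      xhr.send();
    }
    </script>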

Putting a noindex+follow tag in the page head means: do not index this page, but follow the links on it. This solves the duplicate-content problem and the weight black-hole problem (weight still flows out with the page's outbound links to other pages), but it cannot solve the waste of spider crawling time: these pages still have to be crawled (only then can the spider see the noindex+follow tag in the page HTML). For some sites the number of filter pages is huge, and crawling them can leave the spider without enough time for the useful pages.
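The tag itself is one line in the page's head section:

    <meta name="robots" content="noindex,follow">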

One can also consider cloaking, that is, using a program to detect the visitor: if it is a search engine spider, return the page with these filter links removed; if it is a user, return the normal page with the filtering conditions. This is a relatively ideal solution; the only problem is that it may be counted as cheating. The supreme principle search engines preach for judging whether something is cheating is: would you do this if search engines did not exist? In other words, is a certain method done only for the search engines? Obviously, using cloaking to hide URLs you don't want crawled is done for the search engines, not for users. Although the intent behind this kind of cloaking is good and not malicious, it carries risk, so only the daring should try it.
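A minimal sketch of the mechanism, written as a Node.js handler (the bot list, URLs, and markup are all invented for illustration; real user agents can also be spoofed, which is part of the risk):

    var http = require('http');

    // Very rough spider check; a production list would be much longer.
    function isSpider(ua) {
      return /Googlebot|Baiduspider|bingbot/i.test(ua || '');
    }

    http.createServer(function (req, res) {
      // Spiders get the category page with the filter links stripped out;
      // ordinary users get the full filtering UI.
      var filters = isSpider(req.headers['user-agent'])
        ? ''
        : '<a href="/tv/list.php?brand=sony">Sony</a>';
      res.writeHead(200, { 'Content-Type': 'text/html' });
      res.end('<h1>42 inch LED TVs</h1>' + filters + '<p>...products...</p>');
    }).listen(8080);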

Another method is the canonical tag. The biggest problem is that it is unknown whether Baidu supports it, and the canonical tag is a suggestion to the search engine, not a directive; it is said that search engines may choose not to obey it, in which case it is useless. Besides, the canonical tag is meant to point to the canonical version of duplicate URLs, and whether it applies to pages with filtering conditions is doubtful, since the content of those pages is often different.
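If used, the tag on each filter page would point back at the plain category page (URLs invented for illustration):

    <link rel="canonical" href="http://www.example.com/tv/" />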

A better method is iframe plus a robots ban: put the filtering part of the code inside an iframe, which amounts to calling in the content of another file. To the search engine, this part does not belong to the current page; in other words, the content is hidden. But not belonging to the current page does not mean it doesn't exist: search engines can still discover the content and links inside the iframe and may crawl those URLs, so the robots file must also forbid crawling them. The links in the iframe still leak some weight, but because they are not on the current page, they divert weight only from the called file, not from the current page itself, so the loss is smaller. Beyond headaches with layout and browser compatibility, a potential problem with the iframe method is the risk of being judged as cheating. Search engines currently do not generally regard iframes as cheating, and many ads are placed in iframes, but hiding a pile of links is subtly different from hiding an ad. Going back to the general principle for judging cheating, it is hard to deny this is done specifically for the search engines. And remember that Matt Cutts has said Google may change how it handles iframes; they still hope to see, within one page, all the content an ordinary user sees.
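Putting the two halves together might look like this (the file path /widgets/filters.html and the URL patterns are invented for illustration). On the category page:

    <iframe src="/widgets/filters.html" width="100%" height="200" frameborder="0"></iframe>

And in robots.txt, keep spiders out of both the called file and the filter URLs it links to:

    User-agent: *
    Disallow: /widgets/
    Disallow: /*?brand=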
Thanks for reading the article; if you like it, please share it. My website: http://www.allbatteryshop.com. Glad to make friends with fellow SEOers.
