Thursday, 13 March 2014

Affiliate Sites & Dynamic URLs

Affiliate Sites & Dynamic URLs




One good tip is that you should prepare a crawler page (or pages) and submit this to the search engines. This page should have no text or content except for links to all the important pages that you wished to be crawled. When the spider reaches this page it would crawl to all the links and would suck all the desired pages into its index. You can also break up the main crawler page into several smaller pages if the size becomes too large. The crawler shall not reject smaller pages, whereas larger pages may get bypassed if the crawler finds them too slow to be spidered.

You do not have to be concerned that the result may throw up this “site-map” page and would disappoint the visitor. This will not happen, as the “site-map” has no searchable content and will not get included in the results, rather all other pages would. We found the site wired.com had published hierarchical sets of crawler pages. The first crawler page lists all the category headlines, these links lead to a set of links with all story headlines, which in turn lead to the news stories.

Another way to avoid such dynamic URLs is to rewrite these URLs using a syntax that is accepted by the crawler and also understood as equivalent to the dynamic URL by the application server. The Amazon site shows dynamic URLs in such syntax. If you are using Apache web server, you can use Apache rewrite rules to enable this conversion.

One good tip is that you should prepare a crawler page (or pages) and submit this to the search engines. This page should have no text or content except for links to all the important pages that you wished to be crawled. When the spider reaches this page it would crawl to all the links and would suck all the desired pages into its index. You can also break up the main crawler page into several smaller pages if the size becomes too large. The crawler shall not reject smaller pages, whereas larger pages may get bypassed if the crawler finds them too slow to be spidered.

No comments:

Post a Comment