Googlebot will eventually crawl through HTML forms
by AJ on April 11, 2008
ReadWriteWeb (RWW) wrote a very interesting article today about Google’s plans to crawl through HTML forms.
Here’s an excerpt:
“For text boxes, our computers automatically choose words from the site that has the form; for select menus, check boxes, and radio buttons on the form, we choose from among the values of the HTML. Having chosen the values for each input, we generate and then try to crawl URLs that correspond to a possible query a user may have made,” explained Jayant Madhavan and Alon Halevy in a blog post. “If we ascertain that the web page resulting from our query is valid, interesting, and includes content not in our index, we may include it in our index much as we would include any other web page.”
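For the curious, here’s a rough sketch of the idea in that quote: take candidate values for each form input and turn every combination into a URL a user could have submitted. This is only my own illustration, not Google’s code. It assumes a simple GET form, and the `candidate_urls` helper, the `example.com` action URL and the input names are all hypothetical.

```python
from itertools import product
from urllib.parse import urlencode


def candidate_urls(action, inputs):
    """Yield one GET URL per combination of input values.

    action: the form's action URL (hypothetical in the example below).
    inputs: maps each input name to a list of candidate values, e.g. the
            <option> values of a select menu, or words picked from the site
            for a text box.
    """
    names = sorted(inputs)  # fixed order so the generated URLs are stable
    for combo in product(*(inputs[name] for name in names)):
        # Pair each input name with one chosen value and encode as a query string
        yield action + "?" + urlencode(list(zip(names, combo)))


if __name__ == "__main__":
    form = {"make": ["ford", "honda"], "q": ["sedan"]}
    for url in candidate_urls("http://example.com/search", form):
        print(url)
```

With two select values and one text-box word, you get two URLs: `...search?make=ford&q=sedan` and `...search?make=honda&q=sedan`. The number of combinations grows fast, which is presumably why Google says it only keeps results it judges “valid, interesting” and new.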
Does this mean we now need to grant Googlebot additional access if we want it to index our dynamic form results? What about forms with a CAPTCHA? And what if we don’t want Googlebot to access our form results but forget to block it in our robots.txt file? Will it be fast and easy to request removal from Google’s SERPs?
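If you do want to keep form results out, it’s worth double-checking your robots.txt rule before Googlebot finds those URLs. Here’s a minimal sketch using Python’s built-in `urllib.robotparser`. It assumes your form results live under a hypothetical `/search` path, and Python’s parser is not Googlebot’s own matcher, so treat it as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps Googlebot out of form-result pages under /search
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /search
"""


def is_blocked(url, agent="Googlebot"):
    """Return True if ROBOTS_TXT disallows `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse rules from text, no network fetch
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(is_blocked("http://example.com/search?make=ford"))  # form result page
    print(is_blocked("http://example.com/about"))             # ordinary page
```

The first URL should come back blocked and the second allowed. Since the rule names only Googlebot, other crawlers are not affected by it.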
I’d guess most webmasters already know that getting pages removed from Google’s SERPs takes time: sometimes days, even weeks.
There are so many questions right now about this news. I guess only time will tell what will really happen once it does.
What’s your take?
–aj



One comment
Read more about it from Matt Cutts:
http://www.mattcutts.com/blog/solved-another-common-site-review-problem/
And of course, the official Google Webmaster Central:
http://googlewebmastercentral.blogspot.com/2008/04/crawling-through-html-forms.html
by aj on April 11, 2008 at 7:44 PM.