Recent Events for MainPageDiary (Blog)


Updates since 1970-01-01 00:00 UTC

(for 2006-11-05_Web_Indexing_and_Bot_Behavior only)



  • 08:45 UTC (diff) 2006-11-05 Web Indexing and Bot Behavior . . . . Update 2007-04-14: Dave Winer posted an interesting blog entry in Scripting News about sitemaps and described a similar approach he took in 1997. The idea was a simple, readable file listing the changes of a specific web site in reverse chronological order. I hope that version 2.0 of the sitemap protocol will include a similar approach, so that crawlers can efficiently grab only the content that has changed.
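
To illustrate why such a changes file helps a crawler, here is a minimal sketch in Python. The file format (one ISO 8601 timestamp and one URL per line, newest first) and the function name `urls_changed_since` are assumptions made for this example; they are not taken from Dave Winer's entry or from the sitemap protocol.

```python
from datetime import datetime, timezone


def urls_changed_since(changes_text, last_crawl):
    """Return URLs modified after last_crawl from a newest-first changes file.

    Assumed (hypothetical) format: '<ISO 8601 timestamp> <url>' per line.
    """
    changed = []
    for line in changes_text.splitlines():
        line = line.strip()
        if not line:
            continue
        stamp, url = line.split(None, 1)
        modified = datetime.fromisoformat(stamp)
        if modified <= last_crawl:
            # The file is newest-first, so every remaining entry is older.
            break
        if url not in changed:
            changed.append(url)
    return changed


if __name__ == "__main__":
    sample = (
        "2007-04-14T08:45:00+00:00 http://example.org/a\n"
        "2007-03-27T21:05:00+00:00 http://example.org/b\n"
        "2006-12-02T21:10:00+00:00 http://example.org/c\n"
    )
    last = datetime(2007, 1, 1, tzinfo=timezone.utc)
    print(urls_changed_since(sample, last))
```

Because the newest entries come first, the crawler can stop reading at the first entry older than its previous visit instead of re-downloading every page.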



  • 21:05 UTC (diff) 2006-11-05 Web Indexing and Bot Behavior . . . . Update 2007-03-27: I have contacted Voila (France Telecom) about the strange behavior of their bot (always downloading robots.txt, downloading documents that never change, looping on URLs). The funny part is that they don't understand what the Voila bot is, and their general answer is "Your computer is infected by virus"... arf arf.
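
For contrast, the three problems listed above each have a simple counterpart in a better-behaved crawler. The sketch below is only an illustration: the helper names and the one-day robots.txt cache lifetime are assumptions made here, not anything Voila or another search engine is known to use. The `If-None-Match` and `If-Modified-Since` headers are the standard HTTP validators for conditional requests.

```python
import time

ROBOTS_TTL = 24 * 3600  # assumed cache lifetime for robots.txt, in seconds


def robots_needs_refresh(fetched_at, now=None, ttl=ROBOTS_TTL):
    """Re-download robots.txt only once the cached copy is older than ttl."""
    now = time.time() if now is None else now
    return (now - fetched_at) >= ttl


def conditional_headers(cached):
    """Build conditional-request headers from cached response metadata.

    `cached` is a dict with optional 'etag' and 'last_modified' values;
    a server answering 304 Not Modified spares a full download.
    """
    headers = {}
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    return headers


def should_visit(url, seen):
    """Loop guard: visit each URL at most once per crawl."""
    if url in seen:
        return False
    seen.add(url)
    return True
```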


  • 21:10 UTC (diff) 2006-11-05 Web Indexing and Bot Behavior . . . . adulau Update 2006-12-02: It looks like Google, Yahoo! and Microsoft agreed in mid-November on the sitemaps protocol proposed by Google. For more information []. The only missing part (IMHO) is a simple META entry to specify the sitemaps location in the root document of a web site. That could ease the crawler's job, since it would learn the location while fetching the root page, and spare the webmaster from pinging each search engine.
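
A minimal sketch of how a crawler could pick up such a META entry while fetching the root page. The tag name used here, `<meta name="sitemap" content="...">`, is purely hypothetical: it is the kind of entry suggested above, not part of the sitemaps protocol. The class and function names are also made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SitemapMetaFinder(HTMLParser):
    """Collect the content of hypothetical <meta name="sitemap"> entries."""

    def __init__(self):
        super().__init__()
        self.locations = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values can be None when written without a value.
        if (attr.get("name") or "").lower() == "sitemap" and attr.get("content"):
            self.locations.append(attr["content"])


def find_sitemaps(html, base_url):
    """Return absolute sitemap URLs declared in the page's META entries."""
    finder = SitemapMetaFinder()
    finder.feed(html)
    return [urljoin(base_url, loc) for loc in finder.locations]


if __name__ == "__main__":
    page = '<html><head><meta name="sitemap" content="/sitemap.xml"></head></html>'
    print(find_sitemaps(page, "http://example.org/"))
```

With an entry like this, the crawler discovers the sitemap during a request it makes anyway, so no separate ping from the webmaster is needed.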