Normalizing URLs (or URIs), also called URL canonicalization, is often important when you want to store and query a database containing URLs. The main question is how URL normalization is defined, so I'm trying to gather the various definitions available.
"Returns a normalized version of the URI. The rules for normalization are scheme-dependent. They usually involve lowercasing the scheme and Internet host name components, removing the explicit port specification if it matches the default port, uppercasing all escape sequences, and unescaping octets that can be better represented as plain characters."
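A minimal Java sketch of the four rules in the quote above, using java.net.URI for parsing. It is not the implementation behind that description. The class CanonicalUrl and its method canonical are hypothetical names. The sketch assumes absolute http/https URLs with a host, knows only the default ports 80 and 443, and reads "octets that can be better represented as plain characters" as the RFC 3986 unreserved set.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;

public class CanonicalUrl {

    // Assumption: this sketch only knows the default ports of http and https.
    private static final Map<String, Integer> DEFAULT_PORTS = Map.of("http", 80, "https", 443);

    public static String canonical(String input) {
        URI uri = URI.create(input); // throws IllegalArgumentException on invalid syntax
        if (uri.getScheme() == null || uri.getHost() == null) {
            throw new IllegalArgumentException("expected an absolute URL with a host: " + input);
        }
        // Rule 1: lowercase the scheme and the host name.
        String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
        String host = uri.getHost().toLowerCase(Locale.ROOT);

        // Rule 2: drop the port if it equals the scheme's default port.
        int port = uri.getPort();
        Integer defaultPort = DEFAULT_PORTS.get(scheme);
        if (defaultPort != null && defaultPort == port) {
            port = -1;
        }

        StringBuilder sb = new StringBuilder(scheme).append("://");
        if (uri.getRawUserInfo() != null) {
            sb.append(uri.getRawUserInfo()).append('@');
        }
        sb.append(host);
        if (port != -1) {
            sb.append(':').append(port);
        }
        // Rules 3 and 4 operate on the still-escaped path, query and fragment.
        sb.append(normalizeEscapes(uri.getRawPath()));
        if (uri.getRawQuery() != null) {
            sb.append('?').append(normalizeEscapes(uri.getRawQuery()));
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(normalizeEscapes(uri.getRawFragment()));
        }
        return sb.toString();
    }

    // Rule 3: uppercase the hex digits of every %XX escape.
    // Rule 4: decode escapes of unreserved characters (ALPHA, DIGIT, "-", ".", "_", "~").
    static String normalizeEscapes(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length() && isHex(s.charAt(i + 1)) && isHex(s.charAt(i + 2))) {
                String hex = s.substring(i + 1, i + 3);
                char decoded = (char) Integer.parseInt(hex, 16);
                if (isUnreserved(decoded)) {
                    out.append(decoded);
                } else {
                    out.append('%').append(hex.toUpperCase(Locale.ROOT));
                }
                i += 2; // skip the two hex digits just consumed
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    static boolean isUnreserved(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
    }
}
```

For example, `HTTP://Example.COM:80/%7euser/a%2fb?q=%3d` comes out as `http://example.com/~user/a%2Fb?q=%3D`: the escaped `/` and `=` stay escaped because they are reserved characters, while `%7e` becomes a plain `~`.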
public static java.net.URL normalize(java.net.URL url): "if the url points to a file then make sure we cleanup '..', '.' etc."
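That signature maps fairly directly onto the JDK: java.net.URI.normalize() removes "." segments and resolves ".." segments against the segment before them. A minimal sketch, assuming only file: URLs should be touched (as the quoted comment says) and that a URL which cannot be converted to a URI is returned unchanged; FileUrlNormalizer is a hypothetical class name.

```java
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;

public class FileUrlNormalizer {

    public static URL normalize(URL url) {
        // Only file URLs are cleaned up; everything else is returned as-is.
        if (!"file".equalsIgnoreCase(url.getProtocol())) {
            return url;
        }
        try {
            // URI.normalize() drops "." segments and collapses "segment/.." pairs.
            return url.toURI().normalize().toURL();
        } catch (URISyntaxException | MalformedURLException e) {
            // Assumption: on conversion failure, keep the original URL.
            return url;
        }
    }
}
```

With this, `file:/tmp/a/./b/../c.txt` becomes `file:/tmp/a/c.txt`, while an http: URL is returned as the very same object.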
http://www.phpclasses.org/browse/package/1844.html: "Normalization consists in making the pages be served under an URL without any query parameters that usually follow the question mark in the original URLs. The normalized URLs make the query parameters appear as if they are directory path names of site page virtual files."
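To illustrate moving query parameters into the path, here is a sketch in Java. The layout is hypothetical and not necessarily the one the PHP package uses: each name=value pair becomes its own path segment after the script name, so `/list.php?category=books&page=2` turns into `/list.php/category=books/page=2`. Any `/` or `?` inside a pair is percent-encoded so it cannot be mistaken for a segment or query delimiter; QueryToPath is a hypothetical class name.

```java
import java.net.URI;

public class QueryToPath {

    // Hypothetical layout: http://host/list.php?category=books&page=2
    //                   -> http://host/list.php/category=books/page=2
    public static String rewrite(String input) {
        URI uri = URI.create(input);
        StringBuilder sb = new StringBuilder();
        sb.append(uri.getScheme()).append("://").append(uri.getRawAuthority());
        sb.append(uri.getRawPath());

        String query = uri.getRawQuery();
        if (query != null && !query.isEmpty()) {
            for (String pair : query.split("&")) {
                if (pair.isEmpty()) {
                    continue; // skip empty pairs such as the middle of "a=1&&b=2"
                }
                // Avoid a double slash when the path already ends with "/".
                if (sb.charAt(sb.length() - 1) != '/') {
                    sb.append('/');
                }
                // Escape characters that would otherwise split or end the path.
                sb.append(pair.replace("/", "%2F").replace("?", "%3F"));
            }
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return sb.toString();
    }
}
```

Serving the rewritten URL still requires the server to map those extra segments back to parameters (in PHP, typically through PATH_INFO or a rewrite rule), which this sketch does not show.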
" A Go package to normalize a URL as per
This package is not overly aggressive and errs on the side of preserving a working URL. "
Currently, the rating of a web page applies to the page as a whole. The lookup mechanism likewise only provides an aggregated rating for the complete page, so there is no point (yet) in processing anchor links (URL fragments). In short, we just remove them for now.
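Removing the anchor, i.e. the fragment after `#`, can be done without re-encoding the rest of the URL. A minimal Java sketch, assuming the input is a syntactically valid URI (java.net.URI.create rejects anything else with an IllegalArgumentException); FragmentStripper is a hypothetical class name.

```java
import java.net.URI;

public class FragmentStripper {

    public static String stripFragment(String input) {
        URI uri = URI.create(input); // validates the syntax
        if (uri.getRawFragment() == null) {
            return input; // nothing to remove
        }
        // In a valid URI, '#' can only appear as the fragment delimiter,
        // so cutting at the first '#' keeps everything before the fragment.
        return input.substring(0, input.indexOf('#'));
    }
}
```

Cutting the original string, rather than rebuilding it from decoded components, keeps the rest of the URL byte-for-byte identical, which matters if the result is used as a database key.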