Dangerous paths - URI Design

ASP.NET is essentially an advanced request-processing framework. Naturally, the URI is the most important part of any request (or should be). URIs should be well designed, and should represent the request content accurately and succinctly. Unfortunately, they are frequently misused, which causes browsers, users, and search engines no end of trouble.

Some misuse URIs by making them too generic; some sites expose only one address, that of the home page. Flash, AJAX, and frames are the biggest culprits here, as they can make big changes to the current content of the page without affecting the address bar. Users of this type of site are frustrated because bookmarking a buried page only records the address of the home page. The back button also betrays them: it no longer undoes their last action, but drops them off the site entirely. Search engines dislike these sites because either (1) they can't access buried content due to its form (JavaScript or Flash) or (2) they can access it, but every keyword is diluted by the massive amount of content available on one page.

Some developers take the misuse to the opposite extreme. They feel that the address bar is the perfect place to store all variables, interface state data, and user preferences. They, too, cause problems for both users and search engines. Users bookmarking or e-mailing such links often find that they no longer work after their session has expired, or after a change was made on the site. The length and complexity of these URIs also make them hard to understand, which matters because many users depend on the address bar to understand where they are on the site. Search engines find them confusing, because they see (and rank) each URI as a separate page, and dilute the ranking accordingly.

So, you ask, what makes a good URI?
Further reading (written by Tim Berners-Lee): http://www.w3.org/Provider/Style/URI

Bad examples:
Better examples:
Best:
URIs in the HTTP protocol

Let's look at how a URI is sent to the server using HTTP. Here is a basic GET request. The first line consists of the HTTP method, followed by a root-relative path, then the protocol version. The subsequent lines contain the header collection, in the form of simple name-value pairs. The two parts of the URI here are the path (/blog?page=2) and the Host header (youngfoundations.org). We know that the scheme is probably "http" since we are communicating using the HTTP protocol. IIS tells us which port the request arrived on, so between the pieces we can reconstruct the original URI somewhat accurately.

Note: there are lots of schemes out there that use the HTTP protocol, like firefoxurl://, etc.

Note: The Host header is important, since some servers host dozens of domains, and this allows IIS to forward the request to the appropriate application in shared hosting situations. Multiple domains (hostnames) can be pointed to a single application.

The path and the query are divided by the first question mark.

GET /blog?page=2 HTTP/1.1[CRLF]

Content can accompany any request, although it usually only accompanies the POST method. The header collection is separated from the request body by the character sequence CRLFCRLF (two consecutive line breaks, which leaves one blank line). The content in the request body is described by the Content-Type and Content-Length HTTP headers.

POST /blog HTTP/1.1[CRLF]

The HTTP response generated by your ASP.NET application looks slightly different from the request that prompted it. The general format remains, but the first line is now [HTTP Version] [Status-code] [Status code description]. HTTP status codes are very important, but are beyond the scope of this article. See http://en.wikipedia.org/wiki/List_of_HTTP_status_codes for more information.

HTTP/1.1 301 Moved Permanently [CRLF]

Important note: If you have multiple domains pointing to one website, make sure they are all 301 redirected to precisely one host name.
Otherwise you will sabotage your search engine placement by (1) diluting your page rank, and (2) being penalized for duplicate content.

URIs versus URLs

The term URL (Uniform Resource Locator) has been considered obsolete for a long time. In its place stands the URI (the Uniform Resource Identifier). Strictly speaking, a URL must provide all of the information required to locate and retrieve a resource, while a URI is only required to identify it in relation to the current context. Thus, a URL is a URI that "in addition to identifying a resource, [provides] a means of locating the resource by describing its primary access mechanism (e.g., its network 'location')." In common usage, however, the two terms are treated as synonyms.

It is important, however, to differentiate 'complete' URIs (such as URLs) from incomplete, or relative, URIs. For example, the following URI is also a URL:

http://www.mysite.com:54321/folder/virtualfolder/default.aspx?param1=thisisatest&param2=test2

However, these are not:

../css/shared.css
[URI relative to the location of the parent document]

#requirements
[URI fragment relative to the current document]

Fragments describe a section, place, or entity in the current document. In HTML, they usually refer to a certain anchor tag (by name or ID), and the window is usually scrolled to the location of that anchor. Fragments are never sent to the server, and only function as a display instruction to the client. If a fragment isn't understood, it is ignored. Fragments are pretty much free-form. If the current document is http://mysite.com/home.html and a link to http://mysite.com/home.html#part3 is clicked, the browser (or user-agent) is not supposed to ask the server for http://mysite.com/home.html again, but older clients may. Relative fragments like #part3 are handled better.

Now let us dissect the following URL:

http://www.mysite.com:54321/folder/virtualfolder/default.aspx?param1=thisisatest&param2=test2

http
The scheme (protocol).
The protocol determines how the client should talk to the server (basically the language, or grammar).

www.mysite.com
The computer the resource is located on (identified by a DNS name, WINS name, or IP address).

:54321
The port number to communicate with on that computer. Ports let the server route incoming packets to the right application without having to inspect and sort them. Certain default ports are assumed for some protocols: HTTP requests are sent to port 80 by default, HTTPS requests to port 443, and FTP requests to port 21. If no application is listening on that port (or the request packets are blocked by a firewall), no response will be given. Additional sorting is sometimes performed, as in the case of WCF (.NET 3.0) port sharing, or when multiple sites are hosted on a single server.

When an HTTP request is sent to a server, it is accompanied by the original hostname from the address bar. Any number of DNS (Domain Name System) names can point to a single computer, which is convenient for web hosting providers. IIS (Internet Information Services) can be configured to look at this host header, and forward the request to whichever site is configured to receive requests for that particular hostname (DNS address). For information about DNS, read http://en.wikipedia.org/wiki/Domain_name_system.

Super-simplified view of DNS

DNS addresses are hierarchical, and levels (domains) are separated by a period. Reading left to right, domains progress from most specific to least specific. For example, in resolving www.mysite.com, the following steps would be taken:

(1) Ask computer 'COM' where computer 'MYSITE' is (what its IP address is).
(2) Ask computer 'MYSITE' where computer 'WWW' is.

DNS is used for a whole lot more than just web browsing, so the company at mysite.com might have a whole bunch of computers, such as ftp.mysite.com, mail.mysite.com, pop.mysite.com, telnet.mysite.com, as well as www.mysite.com. WWW usually points to the web server for the company.
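The hierarchy can be sketched in a few lines of Python (chosen here purely for illustration; it is not part of the ASP.NET stack this article discusses). The hypothetical helper lookup_order lists the domains in the order the super-simplified lookup walks them, from least specific to most specific. It is a toy that only manipulates strings and performs no real DNS queries.

```python
def lookup_order(hostname):
    """Return the domains of `hostname` from least to most specific.

    Hypothetical helper for illustration only: it splits the name on
    periods and rebuilds it one level at a time, right to left.
    """
    # Drop an optional trailing dot and normalise case before splitting.
    labels = hostname.rstrip(".").lower().split(".")
    # Start with the last label (the TLD) and add one label per step.
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


print(lookup_order("www.mysite.com"))
# ['com', 'mysite.com', 'www.mysite.com']
```

Each entry corresponds to one level of the hierarchy: the first is answered by the 'COM' computer, the next by 'MYSITE', and so on.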
Please note that the WWW part is completely unnecessary; it is just a commonly followed convention.

Note: In www.mysite.com, "com" is a TLD (top-level domain), and "mysite" is an SLD (second-level domain). SLDs usually cost a registration fee, as the poor owner of the "COM" computer has tremendous bandwidth bills. Third-level domains can be freely created if the parent SLD is under your control.

Paths

In the URI http://microsoft.com/default.aspx?tabid=2, the path is the portion of the URL that begins at the third forward slash and runs up to the first question mark (here, /default.aspx).
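As a rough sketch of the dissection described in this article, the snippet below uses Python's standard urllib.parse module to split a URL into scheme, host, port, path, and query. Python is a choice made here for illustration, not the article's ASP.NET stack, and the helper name dissect is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def dissect(url):
    """Split an absolute URL into the parts discussed in this article."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,           # e.g. "http"
        "host": parts.hostname,           # DNS name or IP address
        "port": parts.port,               # None when the default port is implied
        "path": parts.path,               # starts at the third forward slash
        "query": parse_qs(parts.query),   # everything after the first "?"
        "fragment": parts.fragment,       # never sent to the server
    }


sample = ("http://www.mysite.com:54321/folder/virtualfolder/default.aspx"
          "?param1=thisisatest&param2=test2")
for name, value in dissect(sample).items():
    print(name, "=", value)
```

For http://microsoft.com/default.aspx?tabid=2 the same helper reports a path of /default.aspx and no explicit port, since port 80 is implied by the http scheme.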