
Why do some websites have "www." at the start, but others have "http://"?

Here's the secret: the "www." is a subdomain.  So "www.google.com" is actually a subdomain of, and in theory distinct from, the actual domain "google.com".  However, by convention, and to avoid confusion, the subdomain "www.google.com" is set up to take you to the same place as the actual domain "google.com".
Incidentally, this is also why there is almost never a "www" in your email address: for example, you can have the email "johndoe@example.com" but it's almost never "johndoe@www.example.com" (in reality, a web admin could set up email addresses with a "www." subdomain in them if they wanted, but that would just be weird).
To give you a physical analogy, imagine you've arrived at the office complex of "google.com".  The office complex consists of several departments, each of which occupies its own building.  You see the "images" building (whose full address is "images.google.com") and a "maps" building (whose full address is "maps.google.com").  There is also a building called "www" (whose full address is "www.google.com").
You can see that the "www.google.com" building is not much different from the "maps.google.com" or "images.google.com" buildings - it's just another building inside the complex.  However, it is clearly important because it was built front and center of the entire complex, such that when you arrive at the "google.com" office complex, unless you had another department in mind, you would enter the "www.google.com" building.
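To make the "building inside a complex" idea concrete, here is a minimal Python sketch that splits a hostname into its building (subdomain) and its complex (domain).  The function name split_host is made up for illustration, and it assumes the domain is always the last two labels, which works for names like "google.com" but not for suffixes such as ".co.uk"; real software consults the Public Suffix List instead.

```python
def split_host(hostname: str) -> tuple[str, str]:
    """Split a hostname into (subdomain, domain).

    Assumption: the registered domain is the last two labels, which holds
    for names like google.com but not for suffixes such as .co.uk.
    """
    labels = hostname.lower().rstrip(".").split(".")
    domain = ".".join(labels[-2:])      # the office complex
    subdomain = ".".join(labels[:-2])   # the building, empty for the bare domain
    return subdomain, domain


for host in ("google.com", "www.google.com", "maps.google.com"):
    subdomain, domain = split_host(host)
    print(f"{host}: building={subdomain or '(main gate)'}, complex={domain}")
```

Note that "www" comes out of this exactly the same way "maps" does: nothing in the name itself makes it special.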
This is good because it avoids confusion: going to http://google.com is functionally the same as going to http://www.google.com.  However, some websites aren't so neatly set up, and their admins forget to do this.  By analogy, this is like putting the "www." building somewhere in the office complex, but having the main entrance of the office complex lead to an empty lot.  People who arrive at "example.com" find themselves in the empty lot, and have to remember to go through a side-door that leads to "www.example.com" to get to what they want.
A more common error that you might see is that the web admins forget that the "www." subdomain is technically distinct from the main domain, and forget to set up login sessions that work across both.  This means if you logged in on "example.com", and then went to "www.example.com", it wouldn't know that you were logged in.  This is like arriving at the office complex and picking up your security pass at the main gate, then going to the "www.example.com" building, and the security guard telling you that your security pass is not valid for the "www" building, and that you need to log in again to get a second security pass.  To fix this, web admins have to specifically instruct the website that "main domain security passes and www subdomain security passes are interchangeable".
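Login sessions are usually carried in cookies, so the "security pass" problem can be sketched as a cookie-scoping check.  The following is a simplified Python illustration of the domain-matching idea from the cookie standard (RFC 6265), not a full implementation; the function names are invented for this example, and it ignores paths, expiry and IP addresses.

```python
from typing import Optional


def domain_matches(host: str, cookie_domain: str) -> bool:
    """True if host is the cookie domain itself or one of its subdomains."""
    host = host.lower()
    cookie_domain = cookie_domain.lower().lstrip(".")
    return host == cookie_domain or host.endswith("." + cookie_domain)


def pass_is_accepted(request_host: str, issued_by: str,
                     domain_attribute: Optional[str] = None) -> bool:
    """Decide whether a cookie issued by one host is sent to another.

    With no Domain attribute the cookie is "host-only": it is valid only
    for the exact building that issued it.
    """
    if domain_attribute is None:
        return request_host.lower() == issued_by.lower()
    return domain_matches(request_host, domain_attribute)


# Pass issued at the main gate without a Domain attribute: www rejects it.
print(pass_is_accepted("www.example.com", "example.com"))
# Pass issued with Domain=example.com: valid in every building.
print(pass_is_accepted("www.example.com", "example.com", "example.com"))
```

In other words, the "instruction" the admin has to give is typically just a Domain attribute on the session cookie that covers both names.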
Finally, you might often see a "www2" instead of a "www".  This can be done when websites are undergoing maintenance: they'd set up a second building named "www2" and simply redirect visitors to it while renovation work is carried out on the "www" building.  Alternatively, a website may need to distribute users across multiple servers to handle the load; that would be like setting up multiple identical buildings, and evenly distributing the visitors between them so that no single building gets overcrowded.
Basically, the "www" subdomain, by convention, contains the main website.  By convention it's the same as going to the bare domain, except when things aren't set up correctly due to an error or ignorance, which ends up confusing users.
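As a rough illustration of "evenly distributing the visitors", here is a tiny round-robin sketch in Python.  The hostnames and visitor names are hypothetical, and real sites usually do this with DNS or a dedicated load balancer rather than application code like this.

```python
from itertools import cycle

# Hypothetical identical buildings that can each serve the same website.
buildings = cycle(["www.example.com", "www2.example.com"])


def next_building() -> str:
    """Hand out buildings in turn so that no single one gets overcrowded."""
    return next(buildings)


visitors = ["alice", "bob", "carol", "dave"]
assignments = {visitor: next_building() for visitor in visitors}

for visitor, building in assignments.items():
    print(f"{visitor} -> {building}")
```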
The "http:" part on the other hand is always there, whether it's "http://google.com" or "http://www.google.com".  That part tells your browser what kind of connection to make to the website.  HTTP is in effect the "language" that your browser uses to talk to the server to request the website.  These days, you'll often see "https:" instead of "http:"; HTTPS is the secure version of HTTP.  When your browser accesses an "https:" site, it sets up an encrypted connection with the server, which prevents anyone monitoring your internet traffic from seeing your data.  This is particularly important if you're making a purchase with a credit card.
There are a lot of these protocols in use over the internet.  The part of the address that names the protocol is known as the URI scheme, and you can find a list of the "official" ones here: Uniform Resource Identifier (URI) Schemes.  To give you another example of a URI scheme other than http: or https:, you may also come across ftp:, which is used for file transfer (I pick this as an example because web browsers have traditionally supported it, although most current browsers have since dropped that support; there are many more protocols that need their own software).  For example, ftp://ftp.ucsb.edu/ is UCSB's publicly accessible FTP portal.  The analogy for this is: if http: is an instruction to enter the visitors' entrance of the building (and https: is like an extra-safe way, where you enter the visitors' entrance via an unmarked black car and with a private security detail scanning your surroundings to make sure you weren't followed), then ftp: is like an instruction to drive a truck over to the loading bay of the building to load or unload goods.
Bonus material:
Let's talk about ports.  These are often hidden from users by the web browser, but in reality, a building can have many different entrances, and each entrance has a number (but not all entrances are open, and not all entrances have something behind them ready to receive visitors).  Entrance 80 (i.e. port 80) is the default entrance for web pages.  When you tell your browser you want an http connection, the browser will by default attempt to connect to port 80 unless you tell it otherwise.  Similarly, https defaults to port 443, and ftp to port 21.
You can specify a custom port number by doing this: "http://www.google.com:80" (though in the case of Google, they've got it set up to redirect you over to the https version of the site).
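The "default door unless told otherwise" rule can be sketched in a few lines of Python using the standard library's urllib.parse.  The DEFAULT_PORTS table and the effective_port helper are written for this illustration and only cover the three schemes mentioned here.

```python
from urllib.parse import urlsplit

# Default doors for the schemes discussed above (illustrative, not exhaustive).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def effective_port(url: str) -> int:
    """Return the port a client would connect to for this URL."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port               # an explicit ":port" always wins
    return DEFAULT_PORTS[parts.scheme]  # otherwise use the scheme's default


for url in ("http://www.google.com",
            "https://www.google.com",
            "http://www.google.com:80",
            "http://www.google.com:443"):
    print(url, "->", effective_port(url))
```

Notice that the last URL resolves to door 443 while still asking to speak plain http, which is exactly the kind of mismatch a server will refuse.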
Want to confuse the server?  Tell your browser to go to an incorrect port, for example "http://www.google.com:443".  Here, 443 is the port for encrypted https connections, but you're telling your browser to use plain old http: you are attempting to make an unencrypted http connection to a port that is waiting for an encrypted connection.  You'll usually get a "connection reset" message, which means the server decided that your browser is not talking in a language that it can (or is willing to) understand, and has closed the connection with you.  This is like you arriving at the English-speaking entrance of the building and yelling Spanish at them until they close the door on you.  Note that going to "https://www.google.com:443" works, because you're telling your browser to make an encrypted https connection to the 443 port, which is what is expected.
There are many reasons a different and non-standard port might be used.  One common reason is security.  For example, port 22 is used for the ssh protocol, which is used to remotely administer servers.  This is like the private employee entrance that has a keypad on the door.  However, there are certain criminals who go looking for these doors and try to guess the password, hoping to gain unauthorized access to the building (perhaps to steal information).  This is like going up to door 22 and trying different passwords on the keypad until the right one is found.  Various countermeasures are available, such as having the keypad lock people out after multiple failed attempts, or switching to a key-card entry (the employee must present a key-card rather than enter a password).  Such countermeasures are essential for security, but don't solve the problem of criminals rattling the handles every few seconds.  To help with that, one possibility is simply moving the employee entrance to a different door (a different port).  This way door 22 doesn't even have a handle to try - as long as the crims don't know where the employee entrance was moved to, they would simply encounter a bricked-over entryway where door 22 used to be.
So to conclude, if your address bar says something like "https://www.quora.com", it's doing this:
- Address: "quora.com" domain (which, by the way, lives in the ".com" top-level domain)
- Department: "www"
- Language to speak: HTTPS
- Door: 443 (the default door to go to for HTTPS speakers)
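The breakdown above can be reproduced with a short Python sketch built on urllib.parse from the standard library.  The describe function and its dictionary keys are invented to mirror the list, and it assumes (naively) that the address is the last two labels of the hostname.

```python
from urllib.parse import urlsplit

# Default doors per language; only the schemes covered in this answer.
DEFAULT_DOORS = {"http": 80, "https": 443, "ftp": 21}


def describe(url: str) -> dict:
    """Break a URL into the address/department/language/door analogy."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    return {
        "address": ".".join(labels[-2:]),            # assumes a two-label domain
        "department": ".".join(labels[:-2]) or None,  # None for the bare domain
        "language": parts.scheme.upper(),
        "door": parts.port or DEFAULT_DOORS[parts.scheme],
    }


for key, value in describe("https://www.quora.com").items():
    print(f"{key}: {value}")
```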
The things your computer does to get you a webpage are fascinating and complex: we haven't even talked about IP addresses and DNS, exactly how your computer knows how to reach a website, HTTP methods, or HTTP status codes yet.

Answered by - Yuan Gao on Quora.
