URLs
Some friends of mine recently shared a link through social media that gave the offer of free airline tickets. Sounds great, right? Surprise! They were being fooled by a fake website. They could have avoided being caught by this scam if they had known how a world-wide web address works. See the image below for the link that they clicked on and shared:
This is an example of social engineering, where some kind of bad actor is using people’s emotions (the desire to get free things) to both spread the link around and collect personal data for fraudulent use. The people who participated in this could have saved themselves and others time and trouble if they had known how to read a URL.
What is a URL?
URL stands for “Uniform Resource Locator.” It is the address that leads to any website, webpage, file or other resource on the World Wide Web. It is also used for email, file transfer, database access, and other services.
What are the parts of a URL?
The basic part of a URL is fairly simple. It’s what you see in advertisements and use when you tell people about a website. The main part is the domain or host. The domain of this website is conquercomputerscience or, more fully, conquercomputerscience.org. People know these sorts of domains, and that knowledge was used by the “free tickets” scam because people recognized the airline name and the dot com ending.
The details of a URL get more complicated and for a Computer Science student, it’s important to know the details. (If my friends had known this, they would have immediately seen that the URL was not trustworthy!)
Let’s look at an illustration of a complete (but made-up) URL:
I’ve broken the URL down into sections using colours to highlight each part.
Protocol
The first part (highlighted in yellow) is the scheme or protocol, which determines how a resource is accessed. In the example, https is used, which is a standard website protocol. HTTPS stands for HyperText Transfer Protocol Secure, which is an encrypted way to transfer a web page (constructed using hypertext) from a server to another device. The insecure form (HTTP) is not more dangerous, but is not encrypted so can be read by anybody who tries to intercept the information as it goes from the server to the requesting device.
There are other types of protocols used on the worldwide web, most commonly the mail transfer protocol used for emails, but also file transfer protocol, and more.
When people advertise or tell others about a website, and also when they try to connect to one, they often leave out the protocol. Web browsers and other applications will try to use a default protocol (such as HTTPS) if people don’t enter one, so if you tell your friends about this website, you don’t need to say that it’s “https conquercomputerscience.org”!
Authority
The second line in the example is the most commonly understood part, the part people call the domain. It is specifically called the authority or host. The orange-highlighted part is the actual host and has three parts, separated by dots:
- The first part (which can be called a leaf host) is often www but can be something else. This is a specific section of the main domain.
- The middle part is the main domain name, which is technically the second-level domain.
- The final part (with red letters in the example) is the top-level domain, which specifies a “region” of the world-wide web that the domain is part of. Originally there were mainly three-letter domain names (such as com or org and others) and two-letter country domain names (ie for Ireland, pt for Portugal, and so on). Now there are many more, including email, europe and lots of others.
The first part (highlighted in grey) is an optional username that can be used to log in to the URL. It’s not commonly used, but it certainly can be. (The URL my friends were fooled by included a username that looked like the domain for a famous airline company!)
The last part (highlighted in the example in red and separated by a colon) is the port that is used to connect to the host. It’s often left out, and a default port is used. (80 is one of the default ports used for websites.) Some types of services (for example, a web database) may require a port number.
Path
The part highlighted in green is the path that leads to a particular resource. Different directories (or folders) are separated by slashes, and the path might end in a specific file name (such as myfile.html or somedoc.pdf or image.jpg) but if it doesn’t, a default webpage called index.html will be shown (if it exists).
Queries and Fragments
The final section (highlighted in purple) are optional components:
Queries are begun with a ? symbol, and are used for various purposes: to be used as searches through a database (especially in search engines), to track user information, to enter data into forms, etc. The most common use, web forms, is shown by the strings such as “key1=value1” which would enter “value1” into a form field called “key1.” Separate elements are separated by an & (“ampersand”) symbol.
Fragments are begun with a # symbol and is usually used to indicate a specific element of a web page that should be shown. This is also called an anchor and could be a section of the web page or any other part the web page creator wants to highlight.
For more information on URLs, check out my online course “Conquer Computer Science: Networks and Data Transmission”
Leave a Reply