I was recently reminded of the insights I gained in my younger years into how the web works by talking to a web server by hand. At the time (early 2000s), HTTP/1.1 was the de facto standard, embraced as an improvement over HTTP/1.0. HTTP versions before 2 are plain text, which makes them simple to use when interacting with web servers directly.
Making a Request
With a tool like Telnet or Netcat, we can open a connection to a web server and start speaking HTTP/1.1 to it.
I will use Netcat (nc).
$ nc google.com 80
With a connection established, we can start by making a GET request for the root URL using HTTP/1.1:
GET / HTTP/1.1
Next we need to specify the host that the web request is for. A single web server may handle requests for multiple hosts; the Host header tells it which one we are addressing.
Host: google.com
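To make the role of the Host header concrete, here is a minimal sketch of how a server hosting several sites might choose between them. The site table, the hostnames and the function names are all made up for illustration; real servers do considerably more (ports, defaults, validation).

```python
from typing import Optional

# Hypothetical table of sites served from one machine (illustrative only).
SITES = {
    "example.com": "main site",
    "blog.example.com": "blog",
}


def host_from_request(raw: bytes) -> Optional[str]:
    """Return the value of the Host header from a raw HTTP/1.x request."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("ascii")
    # Skip the request line, then scan the header lines.
    for line in head.split("\r\n")[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":  # header names are case-insensitive
            return value.strip().lower()
    return None


def pick_site(raw: bytes) -> str:
    """Pick which site should answer, falling back to a default."""
    return SITES.get(host_from_request(raw), "default site")
```

Two requests that are identical except for the Host header would be routed to different sites by `pick_site`, even though they arrive on the same IP address and port.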
Finally, the default behaviour in HTTP/1.1 is to keep the connection open after the request is finished. To instruct the web server to close the connection we can use the Connection header.
Connection: close
To signal that we have no further headers to add, we send an empty line.
Once the blank line has been sent, the web server replies.
$ nc google.com 80
GET / HTTP/1.1
Host: google.com
Connection: close

HTTP/1.1 301 Moved Permanently
Location: http://www.google.com/
Content-Type: text/html; charset=UTF-8
Cache-Control: public, max-age=2592000
Server: gws
Content-Length: 219
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
Connection: close

<HTML><HEAD><meta http-equiv= .....
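The same exchange can be scripted. The following is a rough Python sketch, not a replacement for a real HTTP client: `build_request` and `fetch` are names invented here, and the code assumes the server closes the connection once it has finished responding (which is what Connection: close asks for). Note that HTTP specifies CRLF (`\r\n`) line endings, so the sketch writes them explicitly.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Build the three-line GET request used above, ending with a blank line."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # the empty line that ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, port: int = 80, path: str = "/") -> bytes:
    """Send the request over a plain TCP socket and read until the server closes."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        received = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read means the server closed the connection
                break
            received.append(data)
    return b"".join(received)
```

Calling `fetch("google.com")` should return the raw bytes of a response like the one shown above, status line and headers included; the exact headers will vary.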
The response has a 301 status code (Moved Permanently), which tells us the resource now lives at a different URL. The Location header gives that URL: http://www.google.com/. The Content-Type header describes how the client should interpret the response body, and the Content-Length header gives the length of the response body in bytes.
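Reading a response programmatically follows the same structure we read by eye: a status line, header lines, a blank line, then the body. Here is a small sketch; `parse_head` is a made-up helper, and the sample bytes are a shortened copy of the 301 response with a placeholder body.

```python
def parse_head(raw: bytes):
    """Split a raw HTTP/1.x response into status code, reason, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # blank line separates head from body
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line looks like: HTTP/1.1 301 Moved Permanently
    _version, status, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise to lower case.
        # Simplification: a repeated header (e.g. Set-Cookie) overwrites the earlier one.
        headers[name.strip().lower()] = value.strip()
    return int(status), reason, headers, body


# Shortened sample response; the body here is a placeholder, not Google's real HTML.
sample = (
    b"HTTP/1.1 301 Moved Permanently\r\n"
    b"Location: http://www.google.com/\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"Content-Length: 11\r\n"
    b"Connection: close\r\n"
    b"\r\n"
    b"placeholder"
)

status, reason, headers, body = parse_head(sample)
```

With the sample above, `status` is 301, `headers["location"]` is the URL to retry, and `int(headers["content-length"])` matches `len(body)`.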
Let's go ahead and take the suggestion, making the request to www.google.com instead of google.com:
$ nc www.google.com 80
GET / HTTP/1.1
Host: www.google.com
Connection: close

HTTP/1.1 200 OK
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
Server: gws
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
Set-Cookie: NID=215=UePS...
Accept-Ranges: none
Vary: Accept-Encoding
Connection: close
Transfer-Encoding: chunked

5206
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" ....
Here we can see an ensemble of new headers. There is a multitude of documentation available on the web to cross-reference what each header is for and what it does. MDN Web Docs is one such resource.
Transfer-Encoding: chunked indicates that the response body is sent in ‘chunks’ or batches. Each chunk is preceded by a line giving its length in hexadecimal, 5206 (20998 bytes) for the first chunk here. This mechanism is useful for large responses or for cases where the length of the response may not be known up front.
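As an illustration of the chunked format, here is a minimal decoder sketch. `decode_chunked` is a name invented for this example; it assumes the whole body is already in memory and ignores optional trailer headers after the final chunk.

```python
def decode_chunked(body: bytes) -> bytes:
    """Reassemble a chunked HTTP/1.1 body that is fully held in memory."""
    decoded = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The size line is hexadecimal; anything after ';' is a chunk extension.
        size_field = body[pos:line_end].split(b";", 1)[0]
        size = int(size_field.decode("ascii"), 16)
        if size == 0:  # a zero-length chunk marks the end of the body
            break
        start = line_end + 2
        decoded += body[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data
    return bytes(decoded)


# Two chunks of 4 and 5 bytes, followed by the terminating zero-length chunk.
example = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
result = decode_chunked(example)  # → b"Wikipedia"

# The size line from the response above, read as hexadecimal.
first_chunk_size = int("5206", 16)  # → 20998
```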
Set-Cookie: NID=.... is potentially the most interesting header in terms of web application development. When a web server response includes a Set-Cookie header, a browser will typically include the cookie in a Cookie header on subsequent requests to the same domain. This is how being ‘logged in’ typically works. The client submits a username and password in a POST request. The web server responds with a cookie. When the client then makes requests that carry the cookie, the web server can cross-reference the cookie value against its list of sessions and treat the request as coming from a logged-in user. This usage is typically referred to as a session cookie. There are lots of other uses of cookies, from tracking to recording user preferences.
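The session-cookie round trip can be sketched in a few lines of Python using the standard library's `http.cookies`. Everything here is illustrative: the in-memory `SESSIONS` table, the cookie name `session` and the function names are assumptions, and the password check is omitted entirely.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical in-memory session table: session id -> username.
SESSIONS = {}


def login(username: str) -> str:
    """Server side: create a session and return the Set-Cookie header line."""
    session_id = secrets.token_hex(16)  # random, hard-to-guess identifier
    SESSIONS[session_id] = username
    cookie = SimpleCookie()
    cookie["session"] = session_id
    return cookie.output(header="Set-Cookie:")


def cookie_header_from(set_cookie_line: str) -> str:
    """Client side: turn a Set-Cookie line into the Cookie line to send back."""
    jar = SimpleCookie()
    jar.load(set_cookie_line.split(":", 1)[1].strip())
    pairs = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
    return "Cookie: " + pairs


def user_for_request(cookie_value: str):
    """Server side: look up the logged-in user for a Cookie header value."""
    jar = SimpleCookie()
    jar.load(cookie_value)
    morsel = jar.get("session")
    return SESSIONS.get(morsel.value) if morsel else None
```

A request carrying the cookie issued by `login("alice")` resolves to `"alice"`, while a request with an unknown or missing session value resolves to `None` and would be treated as anonymous.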
Check out RFC 2616, the HTTP/1.1 specification of that era (since superseded by RFC 9110 and RFC 9112), and experiment!