HTTP

The Hypertext Transfer Protocol (HTTP) is an application-level protocol with the lightness and speed needed for distributed, collaborative, hypermedia information systems. It is a generic, stateless, object-oriented protocol which can be used for many tasks, such as name servers and distributed object management systems, through extension of its request methods (commands). HTTP was not the first hypertext system: Apple's HyperCard, programmed with the HyperTalk scripting language, predates it. A feature of HTTP is the typing and negotiation of data representation, allowing systems to be built independently of the data being transferred. HTTP has been in use by the World-Wide Web global information initiative since 1990.

Terminology

  • client: an application program that establishes connections for the purpose of sending requests.

  • user agent: the client which initiates a request. These are often browsers, editors, spiders (web-traversing robots), or other end user tools.

  • server: an application program that accepts connections in order to service requests by sending back responses.

  • proxy: an intermediary program which acts as both a server and a client for the purpose of making requests on behalf of other clients. Requests are serviced internally or by passing them, with possible translation, on to other servers. A proxy must interpret and, if necessary, rewrite a request message before forwarding it. Proxies are often used as client-side portals through network firewalls and as helper applications for handling requests via protocols not implemented by the user agent.

  • gateway: a server which acts as an intermediary for some other server. Unlike a proxy, a gateway receives requests as if it were the origin server for the requested resource; the requesting client may not be aware that it is communicating with a gateway. Gateways are often used as server-side portals through network firewalls and as protocol translators for access to resources stored on non-HTTP systems.

  • tunnel: a tunnel is an intermediary program which is acting as a blind relay between two connections. Once active, a tunnel is not considered a party to the HTTP communication, though the tunnel may have been initiated by an HTTP request. The tunnel ceases to exist when both ends of the relayed connections are closed. Tunnels are used when a portal is necessary and the intermediary cannot, or should not, interpret the relayed communication.

  • cache: a program's local store of response messages and the subsystem that controls its message storage, retrieval, and deletion. A cache stores cacheable responses in order to reduce the response time and network bandwidth consumption on future, equivalent requests. Any client or server may include a cache, though a cache cannot be used by a server while it is acting as a tunnel.

Any given program may be capable of being both a client and a server; our use of these terms refers only to the role being performed by the program for a particular connection, rather than to the program's capabilities in general. Likewise, any server may act as an origin server, proxy, gateway, or tunnel, switching behavior based on the nature of each request.

Syntax

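HTTP request:

<method> <request-target> <HTTP-version>
<header-name>: <header-value>
...
(empty line)
<optional body>

HTTP response:

<HTTP-version> <status-code> <reason-phrase>
<header-name>: <header-value>
...
(empty line)
<optional body>

The start line and every header line end with CRLF, and the empty line separates the headers from the body.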
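To make the request layout concrete, here is a minimal Python sketch that splits a raw request into its parts. It assumes the complete message is already in memory as bytes and that any body is framed by Content-Length; `parse_request` is a hypothetical helper (not a standard-library function) that deliberately ignores chunked bodies and repeated headers.

```python
from typing import Dict, Tuple


def parse_request(raw: bytes) -> Tuple[str, str, str, Dict[str, str], bytes]:
    """Split a raw HTTP/1.1 request into method, target, version, headers and body."""
    # The head (request line + header lines) ends at the first empty line.
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Request line: <method> <request-target> <HTTP-version>
    method, target, version = lines[0].split(" ", 2)

    # Header lines: "<name>: <value>". Names are case-insensitive, so store them
    # lower-cased; a repeated header simply overwrites the earlier one here.
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # Assumption: the body, if any, is framed by Content-Length.
    length = int(headers.get("content-length", "0"))
    return method, target, version, headers, rest[:length]


raw = (
    b"POST /login HTTP/1.1\r\n"
    b"Host: www.example.com\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"user=a&pass=b"
)
method, target, version, headers, body = parse_request(raw)
print(method, target, version)  # POST /login HTTP/1.1
print(headers["host"], body)    # www.example.com b'user=a&pass=b'
```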

Main HTTP headers

Header          | Purpose                                           | Common use
----------------|---------------------------------------------------|-----------------------------------------
Cache-Control   | Specifies caching behavior                        | Cache management
Connection      | Controls the connection (keep-alive or close)     | Connection persistence
Content-Type    | Defines the media type of the body                | Specifies data format (JSON, HTML, etc.)
Content-Length  | Specifies the body length in bytes                | Size of the request or response body
Authorization   | Contains authentication credentials               | API or HTTP authentication
User-Agent      | Identifies the client software making the request | Browser or client identification
Set-Cookie      | Sets cookies on the client                        | Managing sessions and user state
Location        | Target URL of a redirect (3xx responses)          | Redirect responses
ETag            | Opaque identifier of a resource version           | Cache validation
X-Frame-Options | Controls iframe embedding                         | Security (prevents clickjacking)
Accept-Encoding | Compression methods the client accepts            | Handling compressed responses

Connection Header

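Value      | Description
-----------|--------------------------------------------------------------------------
keep-alive | Keep the connection open for subsequent requests (the default in HTTP/1.1; must be requested explicitly in HTTP/1.0).
close      | Close the connection after the current request/response is completed.
Upgrade    | Sent together with an Upgrade header to ask the server to switch protocols (e.g., to WebSocket); a server that agrees replies 101 Switching Protocols.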
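The persistence rules behind these values can be sketched as a small decision function. This is a simplified reading of the HTTP/1.1 rules (an explicit close always wins, HTTP/1.1 defaults to persistent, HTTP/1.0 persists only with keep-alive); `connection_is_persistent` is a hypothetical name, and proxy-specific cases are ignored.

```python
from typing import Optional


def connection_is_persistent(version: str, connection: Optional[str]) -> bool:
    """Decide whether the connection stays open after the current exchange."""
    # Connection is a comma-separated list of case-insensitive tokens.
    tokens = {t.strip().lower() for t in (connection or "").split(",") if t.strip()}
    if "close" in tokens:
        return False                 # an explicit close always wins
    if version == "HTTP/1.1":
        return True                  # HTTP/1.1 connections are persistent by default
    return "keep-alive" in tokens    # HTTP/1.0 persists only when asked to


print(connection_is_persistent("HTTP/1.1", None))          # True
print(connection_is_persistent("HTTP/1.0", None))          # False
print(connection_is_persistent("HTTP/1.0", "Keep-Alive"))  # True
```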

Main HTTP methods

GET

A GET request normally has no body: a body in a GET request has no defined semantics, and many servers ignore or reject it.

GET /index.html HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Connection: keep-alive
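A request like the one above can be sent with Python's standard `http.client`. To keep the sketch self-contained, it starts a throwaway local server (a stand-in for www.example.com; the handler and its HTML reply are hypothetical) and sends a body-less GET. `http.client` adds the Host and Accept-Encoding headers itself.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # answer as HTTP/1.1 (persistent connections)

    def do_GET(self):
        # Reply with a tiny HTML page that echoes the Host header we received.
        body = f"<p>Host was {self.headers['Host']}</p>".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


# Port 0 asks the OS for any free port; the server runs in a background thread.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
# A GET is sent without a body; only headers follow the request line.
conn.request("GET", "/index.html", headers={"Accept": "text/html"})
resp = conn.getresponse()
status, content_type, body = resp.status, resp.getheader("Content-Type"), resp.read()
conn.close()
server.shutdown()
server.server_close()

print(status, content_type)  # 200 text/html; charset=UTF-8
print(body)
```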

POST

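The length of the body is signalled in one of two ways, depending on which request header is present:

  1. Content-Length: It specifies the exact size of the body in bytes.

Content-Length: <body_bytes_size>

  2. Transfer-Encoding: It specifies the encoding mechanisms used to transfer the message body. The most common encoding is chunked, introduced with HTTP/1.1: the body is sent as a series of chunks, each prefixed by its size in hexadecimal, and ends with a zero-size chunk, so the total length does not need to be known in advance. A message must not carry both headers.

Transfer-Encoding: chunked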
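The chunked framing can be sketched in a few lines of Python. This assumes the whole body fits in memory; `encode_chunked` and `decode_chunked` are hypothetical helper names, and the decoder ignores chunk extensions and trailer fields.

```python
def encode_chunked(parts) -> bytes:
    """Frame an iterable of byte strings as a chunked HTTP/1.1 body."""
    out = bytearray()
    for part in parts:
        if not part:
            continue  # a zero-size chunk would end the body early, so skip empty parts
        # Each chunk: size in hexadecimal, CRLF, the data, CRLF.
        out += f"{len(part):X}\r\n".encode("ascii") + part + b"\r\n"
    out += b"0\r\n\r\n"  # last chunk (size 0) plus the empty line that ends the body
    return bytes(out)


def decode_chunked(data: bytes) -> bytes:
    """Reassemble a chunked body (chunk extensions and trailers are ignored)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The size line may carry ";extension" after the hex size; drop it.
        size = int(data[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            return bytes(body)
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF


wire = encode_chunked([b"Hello, ", b"world!"])
print(wire)                  # b'7\r\nHello, \r\n6\r\nworld!\r\n0\r\n\r\n'
print(decode_chunked(wire))  # b'Hello, world!'
```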

Caching Mechanism

  • Expires header

    1. The client requests the resource from the server, which replies with the resource and an "Expires" header. The server uses this header to state when the resource will be considered stale.

    2. The client stores a copy of the resource in its local cache.

    3. Before sending a new request, the client checks whether it already has the resource. If it does, it compares the expiration date sent by the server in step 1 with its own real-time clock. One problem with this method is that the server needs to know in advance when the page will change. So the "Expires" value sent by the server must be either:

      • exactly known in advance, for periodic changes (e.g. a daily paper), or

      • statistically estimated (from the probability of an update and a known lower bound on the resource's lifetime).

       The other problem with this method is that the server and client clocks need to be synchronized, so some form of date correction and compensation between the two systems is required.

  • Last-Modified header

    1. The client requests the resource as before, but now stores it in the cache together with its "Last-Modified" header value.

    2. To check whether its copy is obsolete, the client requests only the headers of the resource from the server, using the "HEAD" method.

    3. The client compares the "Last-Modified" value received from the server with the value stored alongside the cached copy. If the stored date is older, the client requests the full resource again; otherwise it uses the cached copy. The drawback of this method is that, in the worst case, the same resource costs two requests (although the first one, a HEAD request, is lighter).

  • If-Modified-Since header

    1. The client requests the resource as before, storing it in the cache together with its "Last-Modified" header value.

    2. When the client needs the resource again, it sends the request with an "If-Modified-Since" header set to the stored Last-Modified date.

    3. If the server finds that the resource's Last-Modified date is more recent than the date in the request, it sends back the new version of the resource. Otherwise, it replies with "HTTP/1.0 304 Not Modified" and no body, and the client uses its cached copy. Compared with the previous method, this needs only one request.

  • ETag header

    1. The client requests the resource from the server and stores it in the cache together with its "ETag" header value.

    2. The client checks whether its copy is obsolete by requesting only the headers of the resource, using the "HEAD" method.

    3. The client compares the "ETag" value received from the server with the one stored alongside the cached copy. The server generates a new ETag every time the resource changes (often, though not necessarily, a hash of its content). If the two values differ, the client requests the resource again; otherwise it uses the cached copy.
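The mechanisms above can be sketched in Python. `is_fresh` applies the Expires rule the way HTTP/1.1 caches do it: the lifetime is Expires minus the response's own Date header (both taken from the server's clock), which sidesteps most of the clock-synchronization problem, while the age is measured on the client's clock. `conditional_get` is a hypothetical server-side handler for If-Modified-Since and for If-None-Match, the request header a client uses to send back a stored ETag so the server can answer 304 without a separate HEAD request. The SHA-256-based ETag is one common choice, not a requirement, and the sketch ignores the Age header, Cache-Control, and network delay.

```python
import hashlib
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def is_fresh(headers: dict, received_at: datetime, now: datetime) -> bool:
    """Client side: is a cached response still fresh according to Expires?"""
    # Expires and Date both come from the server's clock, so their difference is
    # the freshness lifetime even when the client's clock is off.
    lifetime = parsedate_to_datetime(headers["Expires"]) - parsedate_to_datetime(headers["Date"])
    # The age is measured on the client's clock only.
    return now - received_at < lifetime


def make_etag(body: bytes) -> str:
    """One common choice of validator: a quoted hash of the content."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(body: bytes, last_modified: datetime, request_headers: dict):
    """Server side: return (status, headers, body), using 304 when the copy is current.

    last_modified must be a timezone-aware UTC datetime.
    """
    last_modified = last_modified.replace(microsecond=0)  # HTTP dates have 1 s resolution
    etag = make_etag(body)
    headers = {"ETag": etag, "Last-Modified": format_datetime(last_modified, usegmt=True)}

    inm = request_headers.get("If-None-Match")
    if inm is not None:
        # If-None-Match takes precedence; tags are compared weakly (ignoring "W/").
        tags = set()
        for tag in inm.split(","):
            tag = tag.strip()
            tags.add(tag[2:] if tag.startswith("W/") else tag)
        return (304, headers, b"") if "*" in tags or etag in tags else (200, headers, body)

    ims = request_headers.get("If-Modified-Since")
    if ims is not None:
        try:
            since = parsedate_to_datetime(ims)
        except (TypeError, ValueError):
            since = None  # an unparseable date is ignored
        if since is not None and last_modified <= since:
            return 304, headers, b""
    return 200, headers, body


# Usage (dates, bodies and the client clock are made up for illustration):
cached = {"Date": "Mon, 07 Apr 2025 10:00:00 GMT", "Expires": "Mon, 07 Apr 2025 11:00:00 GMT"}
got = datetime(2030, 1, 1, 12, 0)  # client clock, deliberately unrelated to the server's
print(is_fresh(cached, got, got + timedelta(minutes=30)))  # True
print(is_fresh(cached, got, got + timedelta(hours=2)))     # False

page = b"<h1>news</h1>"
lm = datetime(2025, 4, 7, 10, 0, tzinfo=timezone.utc)
status, hdrs, _ = conditional_get(page, lm, {})
print(status, hdrs["Last-Modified"])  # 200 Mon, 07 Apr 2025 10:00:00 GMT
print(conditional_get(page, lm, {"If-Modified-Since": hdrs["Last-Modified"]})[0])  # 304
print(conditional_get(page, lm, {"If-None-Match": hdrs["ETag"]})[0])              # 304
```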

HTTP Authentication is a way of securing web resources by requiring the client (browser, API consumer, etc.) to authenticate itself before accessing certain resources. HTTP supports several authentication mechanisms. Below are some common HTTP authentication types, along with examples:

Cookies

Cookies are small pieces of data sent from the server to the client and stored on the client side. These cookies are sent back to the server with each subsequent HTTP request to the same domain, allowing the server to remember and maintain state across multiple requests.

Cookies are closely tied to the fact that HTTP is a stateless protocol: by adding cookies, the communication between the client and the server becomes stateful.

Server to Client: The server sends a cookie to the client in the Set-Cookie header in the HTTP response and the browser stores the cookie locally and includes it in future HTTP requests to the same domain.

HTTP/1.1 200 OK
Set-Cookie: session_id=abc123; Expires=Wed, 09 Apr 2025 10:18:14 GMT; Path=/; Secure; HttpOnly

Client to Server: When the client makes subsequent HTTP requests to the same server, it sends the stored cookies back to the server in the Cookie header.

GET /profile HTTP/1.1
Host: example.com
Cookie: session_id=abc123
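Python's standard `http.cookies` module can build and parse both headers in the exchange above. A minimal sketch: the `theme` cookie is a hypothetical extra, and the order in which attributes appear in the generated Set-Cookie value is chosen by the library.

```python
from http.cookies import SimpleCookie

# First response: the server builds the Set-Cookie header value.
outgoing = SimpleCookie()
outgoing["session_id"] = "abc123"
outgoing["session_id"]["path"] = "/"
outgoing["session_id"]["secure"] = True
outgoing["session_id"]["httponly"] = True
set_cookie_value = outgoing["session_id"].OutputString()
print("Set-Cookie:", set_cookie_value)

# Later request: the server parses the Cookie header the browser sent back.
incoming = SimpleCookie()
incoming.load("session_id=abc123; theme=dark")
print(incoming["session_id"].value)  # abc123
print(incoming["theme"].value)       # dark
```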

Use Cases

  • Session Management

  • Personalization (User Preferences)

  • Tracking and Analytics

  • Targeted Advertising

  • Cross-Domain Authentication (Single Sign-On)

Cookie Attributes

  • Name=Value: The cookie’s name and value.

    • Example: session_id=abc123

  • Expires: The date and time when the cookie will expire. If not set, the cookie is a session cookie and will expire when the browser is closed.

    • Example: Expires=Wed, 09 Apr 2025 10:18:14 GMT

  • Max-Age: Similar to Expires, but specifies the number of seconds from the current time until the cookie expires.

    • Example: Max-Age=3600 (cookie will expire in 1 hour)

  • Domain: Specifies the domain for which the cookie is valid. If not set, the cookie is sent only to the exact host that set it (not to its subdomains).

    • Example: Domain=example.com (cookie will be sent to example.com and all its subdomains)

  • Path: Specifies the URL path for which the cookie is valid. If not set, it defaults to the directory of the request URL that set the cookie.

    • Example: Path=/dashboard (cookie is only sent for requests to /dashboard and paths below it, such as /dashboard/stats)

  • Secure: Ensures the cookie is only sent over HTTPS (secure connections).

    • Example: Secure (cookie is only sent over HTTPS, not HTTP)

  • HttpOnly: Indicates that the cookie is not accessible via JavaScript (document.cookie), which limits the damage of attacks like XSS.

    • Example: HttpOnly (cookie can't be accessed by JavaScript)

  • SameSite: Controls whether the cookie is sent with cross-origin requests. There are three possible values:

    • Strict: Cookie is only sent for same-site requests.

    • Lax: Cookie is sent for same-site requests and some cross-site requests (e.g., top-level navigation).

    • None: Cookie is sent with all cross-site requests, but must also be Secure.

    • Example: SameSite=Lax
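How a client applies Domain, Path and Secure when deciding which cookies to attach can be sketched as follows. This is a simplified reading of the domain- and path-matching rules of RFC 6265 (no expiry, SameSite, or public-suffix checks); the `StoredCookie` class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StoredCookie:
    name: str
    value: str
    domain: str        # from Domain=, or the exact host that set the cookie
    host_only: bool    # True when the Set-Cookie had no Domain attribute
    path: str          # from Path=, or the default path
    secure: bool = False


def domain_matches(host: str, cookie: StoredCookie) -> bool:
    host, domain = host.lower(), cookie.domain.lower().lstrip(".")
    if cookie.host_only:
        return host == domain                             # no Domain: exact host only
    return host == domain or host.endswith("." + domain)  # Domain= also covers subdomains


def path_matches(request_path: str, cookie_path: str) -> bool:
    if request_path == cookie_path:
        return True
    # "/dashboard" matches "/dashboard/stats" but not "/dashboards".
    return request_path.startswith(cookie_path) and (
        cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    )


def cookie_header(cookies, host: str, path: str, is_https: bool) -> str:
    """Build the Cookie request header value for one request."""
    sent = [
        f"{c.name}={c.value}"
        for c in cookies
        if domain_matches(host, c)
        and path_matches(path, c.path)
        and (is_https or not c.secure)  # Secure cookies only travel over HTTPS
    ]
    return "; ".join(sent)


jar = [
    StoredCookie("session_id", "abc123", "example.com", host_only=False, path="/", secure=True),
    StoredCookie("theme", "dark", "example.com", host_only=True, path="/dashboard"),
]
print(cookie_header(jar, "app.example.com", "/dashboard", is_https=True))     # session_id=abc123
print(cookie_header(jar, "example.com", "/dashboard/stats", is_https=False))  # theme=dark
print(cookie_header(jar, "example.com", "/dashboards", is_https=True))        # session_id=abc123
```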

HTTP Authentication

When the client tries to retrieve a protected resource without valid credentials, the server usually replies with the following response:

HTTP/1.1 401 Unauthorized
WWW-Authenticate: <authentication_type>

where <authentication_type> is the authentication type required by the server to access the resource.

Then the client (for example, a browser after prompting the user for credentials) sends a new HTTP request with the following format:

GET /protected-resource HTTP/1.1
Host: example.com
Authorization: <authorization_string>

where the <authorization_string> depends on the <authentication_type> required by the server.

1. Basic Authentication

Basic Authentication is the simplest authentication method. It involves sending a username and password encoded in the Authorization header as base64.

Server Response:

WWW-Authenticate: Basic realm="Protected Area"

Client Request:

Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=

where dXNlcm5hbWU6cGFzc3dvcmQ= is the base64-encoded string of username:password.

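Note: base64 is an encoding, not encryption, so anyone who intercepts the header can recover the password. Basic authentication is therefore not secure unless used over HTTPS.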
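The encoding is easy to reproduce, which is exactly why Basic authentication needs HTTPS. A minimal sketch, assuming UTF-8 credentials; the function names are hypothetical.

```python
import base64


def basic_authorization(username: str, password: str) -> str:
    """Build the Authorization header value for HTTP Basic authentication."""
    # "user:password" is base64-encoded; this is reversible, so it is NOT encryption.
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def parse_basic_authorization(value: str):
    """Recover (username, password) from a Basic Authorization header value."""
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "basic":
        return None
    # Split on the first colon only: the password may itself contain colons.
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


header = basic_authorization("username", "password")
print(header)                             # Basic dXNlcm5hbWU6cGFzc3dvcmQ=
print(parse_basic_authorization(header))  # ('username', 'password')
```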

2. Digest Authentication

Digest Authentication is more secure than Basic Authentication because it hashes the credentials and uses a challenge-response mechanism, preventing the password from being sent in plaintext.

Server Response:

WWW-Authenticate: Digest realm="Protected Area", qop="auth", nonce="dcd98b7102dd2f0e8b5e3f8b8b45e4a8", opaque="5ccc069c403ebaf9f0171e9517f40e41"

Client Request:

GET /protected-resource HTTP/1.1
Host: example.com
Authorization: Digest username="admin", realm="Protected Area", qop=auth, nc=00000001, cnonce="0a4f113b", nonce="dcd98b7102dd2f0e8b5e3f8b8b45e4a8", uri="/protected-resource", response="<MD5 digest computed by the client>", opaque="5ccc069c403ebaf9f0171e9517f40e41"

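Here the response value is an MD5 hash computed from the password and other parameters (realm, nonce, nc, cnonce, method, URI), so the password itself never crosses the network. This is more secure than Basic Authentication over plain HTTP, but MD5-based Digest is considered weak today, so HTTPS is still recommended.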
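A sketch of how the client derives the response value, following the MD5 / qop=auth computation of RFC 2617: HA1 = MD5(username:realm:password), HA2 = MD5(method:uri), response = MD5(HA1:nonce:nc:cnonce:qop:HA2). The password below is hypothetical; the result can be checked against the worked example in RFC 2617, section 3.5.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, password: str, realm: str, method: str,
                    uri: str, nonce: str, nc: str, cnonce: str, qop: str = "auth") -> str:
    """Compute the Digest 'response' field (RFC 2617, MD5 algorithm, qop=auth)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # binds user, realm and password
    ha2 = md5_hex(f"{method}:{uri}")                  # binds the request being made
    # nonce comes from the server's challenge; nc and cnonce are chosen by the client.
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


# Hypothetical password; the other values mirror the example above.
print(digest_response("admin", "secret", "Protected Area", "GET", "/protected-resource",
                      "dcd98b7102dd2f0e8b5e3f8b8b45e4a8", "00000001", "0a4f113b"))
```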

3. Bearer Token Authentication (OAuth 2.0)

Bearer token authentication is commonly used in modern web applications, especially with APIs. This method is often used in conjunction with OAuth 2.0. Instead of sending a username and password, the client sends a token that grants access to the requested resource.

Server Response: The server issues a 401 Unauthorized response with a Bearer challenge, for example:

WWW-Authenticate: Bearer realm="example"

Client Request: The server expects the client to send the token in the Authorization header.

Authorization: Bearer <access_token>

OAuth 2.0 lets clients access resources on a user's behalf without handling the user's credentials. Tokens are typically short-lived and can be refreshed, which limits the damage if one leaks; however, anyone holding a bearer token can use it, so it must only be sent over HTTPS.
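On the server side, handling a bearer token starts with extracting it from the Authorization header and answering 401 with a challenge when it is missing or rejected. A minimal sketch: the realm name and the in-memory set of valid tokens are hypothetical stand-ins for real token validation (for example, checking a signed token's signature and expiry).

```python
from typing import Dict, Optional, Set, Tuple

CHALLENGE = 'Bearer realm="example"'  # hypothetical realm name


def extract_bearer_token(authorization: Optional[str]) -> Optional[str]:
    """Return the token from 'Authorization: Bearer <token>', or None."""
    if not authorization:
        return None
    scheme, _, token = authorization.strip().partition(" ")
    # The scheme name is case-insensitive; the token is used verbatim.
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


def authorize(headers: Dict[str, str], valid_tokens: Set[str]) -> Tuple[int, Dict[str, str]]:
    """Return (status, response headers) for a request to a protected resource."""
    token = extract_bearer_token(headers.get("Authorization"))
    if token is None:
        return 401, {"WWW-Authenticate": CHALLENGE}  # no credentials: plain challenge
    if token not in valid_tokens:
        # RFC 6750 lets the server explain why the token was refused.
        return 401, {"WWW-Authenticate": CHALLENGE + ', error="invalid_token"'}
    return 200, {}


print(authorize({}, {"t0k3n"}))                                  # (401, {'WWW-Authenticate': 'Bearer realm="example"'})
print(authorize({"Authorization": "Bearer t0k3n"}, {"t0k3n"}))  # (200, {})
```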

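4. HMAC Request Signing (Nonces and Timestamps)

HMAC (Hash-based Message Authentication Code) signing takes the nonce idea of Digest Authentication further: the client computes a keyed hash over request data (such as method, URI, nonce, and timestamp) using a secret shared with the server, and the server recomputes it to verify the request. Unlike Digest, this is not a single standardized scheme; each API defines its own header format (AWS Signature Version 4 is a well-known example). This method is commonly used in REST APIs and web services.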

Server Response: The server issues a 401 Unauthorized response with a challenge.

Client Request (the header layout varies by API; this example reuses the Digest format):

GET /api/resource HTTP/1.1
Host: api.example.com
Authorization: Digest username="admin", realm="Example API", nonce="xyz123", uri="/api/resource", response="d41d8cd98f00b204e9800998ecf8427e"
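As a concrete illustration, the sketch below signs a request with HMAC-SHA256 using Python's `hmac` module. The canonical string (method, URI, nonce and timestamp joined by newlines), the shared key, and the function names are all hypothetical; each real API specifies its own canonical form and header layout.

```python
import hashlib
import hmac


def canonical_string(method: str, uri: str, nonce: str, timestamp: str) -> str:
    # Hypothetical canonical form: the signed fields joined by newlines.
    return "\n".join([method.upper(), uri, nonce, timestamp])


def sign_request(secret: bytes, method: str, uri: str, nonce: str, timestamp: str) -> str:
    """Client side: HMAC-SHA256 of the canonical string, hex-encoded."""
    message = canonical_string(method, uri, nonce, timestamp).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(secret: bytes, method: str, uri: str, nonce: str,
                   timestamp: str, signature: str) -> bool:
    """Server side: recompute the HMAC and compare in constant time."""
    # A real server would also reject stale timestamps and reused nonces (replays).
    expected = sign_request(secret, method, uri, nonce, timestamp)
    return hmac.compare_digest(expected, signature)


secret = b"shared-secret"  # hypothetical key known to both client and server
sig = sign_request(secret, "GET", "/api/resource", "xyz123", "1712484000")
print(verify_request(secret, "GET", "/api/resource", "xyz123", "1712484000", sig))  # True
print(verify_request(secret, "GET", "/api/other", "xyz123", "1712484000", sig))     # False
```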

5. NTLM Authentication (Windows Authentication)

NTLM is a Microsoft protocol used for authentication in Windows environments. It is commonly used for accessing Windows-based resources and is supported by servers like IIS (Internet Information Services).

Server Response:

WWW-Authenticate: NTLM

Client Request:

GET /protected-resource HTTP/1.1
Host: example.com
Authorization: NTLM TlRMTVNTUAADAAAAGAAYAFgAA...
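NTLM is a multi-step handshake: the client sends a NEGOTIATE message, the server answers 401 with "WWW-Authenticate: NTLM" followed by a CHALLENGE message, and the client replies with an AUTHENTICATE message. Because NTLM authenticates the connection rather than each request, the connection must stay open for the whole exchange. The sketch below (function name hypothetical) identifies which message a header carries from its fixed 8-byte "NTLMSSP" signature and 4-byte little-endian type field.

```python
import base64
import struct

NTLM_SIGNATURE = b"NTLMSSP\x00"
MESSAGE_TYPES = {1: "NEGOTIATE", 2: "CHALLENGE", 3: "AUTHENTICATE"}


def ntlm_message_type(header_value: str) -> str:
    """Identify which NTLM handshake message a header value carries."""
    scheme, _, token = header_value.partition(" ")
    if scheme.upper() != "NTLM":
        raise ValueError("not an NTLM header value")
    raw = base64.b64decode(token)
    # Every NTLM message starts with the 8-byte signature, followed by a
    # 4-byte little-endian message type.
    if raw[:8] != NTLM_SIGNATURE:
        raise ValueError("missing NTLMSSP signature")
    (message_type,) = struct.unpack("<I", raw[8:12])
    return MESSAGE_TYPES.get(message_type, f"unknown ({message_type})")


# The first 16 base64 characters of the client request above:
print(ntlm_message_type("NTLM TlRMTVNTUAADAAAA"))  # AUTHENTICATE
```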

6. Kerberos Authentication

Kerberos is a network authentication protocol that uses secret-key cryptography to provide strong authentication. It's often used in enterprise environments, such as when accessing corporate services.

Server Response:

WWW-Authenticate: Negotiate

Client Request:

GET /protected-resource HTTP/1.1
Host: example.com
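Authorization: Negotiate YIIK...

where the value is a base64-encoded SPNEGO (GSS-API) token that normally wraps a Kerberos ticket; Negotiate can also fall back to NTLM.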

7. Custom Authentication

Some applications or services implement custom authentication schemes. This could involve sending specific headers or tokens that are validated by the server.

Client Request:

GET /api/resource HTTP/1.1
Host: api.example.com
X-Auth-Token: your-api-token-here

HTTP Proxy

An HTTP proxy acts as an intermediary between the client (usually a web browser) and the server. When a client sends a request to access a resource, the proxy server forwards that request to the destination server and then relays the server's response back to the client.

HTTP Proxy Requests

When a client makes a request to a web server through an HTTP proxy, the request is first sent to the proxy server. The proxy server then forwards the request to the destination server.

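  1. Client to Proxy Request: The client sends the HTTP request to the proxy, using the full absolute URL as the request target so the proxy knows where to forward it. For example:

    GET http://www.example.com/index.html HTTP/1.1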
    Host: www.example.com
    User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
  2. Proxy Request to Destination Server: The proxy server connects to the host named in the absolute URL (which matches the Host header), rewrites the request line to the usual path-only form, and forwards the request. For example:

    GET /index.html HTTP/1.1
    Host: www.example.com
    User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
    X-Forwarded-For: 192.168.1.1

    The X-Forwarded-For header is often added by the proxy to indicate the client's IP address. The proxy could also modify headers like User-Agent or Referer.
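A forward proxy's rewriting step can be sketched as a pure function. It assumes a plain-HTTP request in absolute form (HTTPS through a proxy uses the CONNECT method instead, not shown); `forward_request` and the proxy name are hypothetical.

```python
from typing import Dict, Tuple
from urllib.parse import urlsplit


def forward_request(request_line: str, headers: Dict[str, str], client_ip: str,
                    proxy_name: str = "proxy.example.com"
                    ) -> Tuple[Tuple[str, int], str, Dict[str, str]]:
    """Turn a request received by a forward proxy into the one it sends upstream.

    Returns ((host, port), new_request_line, new_headers).
    """
    method, target, version = request_line.split(" ", 2)
    url = urlsplit(target)                  # absolute form: http://host[:port]/path?query
    path = url.path or "/"
    if url.query:
        path += "?" + url.query

    upstream = dict(headers)
    upstream["Host"] = url.netloc           # the URL's authority decides the destination
    # Append the client to any X-Forwarded-For chain added by earlier proxies.
    prior = upstream.get("X-Forwarded-For")
    upstream["X-Forwarded-For"] = f"{prior}, {client_ip}" if prior else client_ip
    # Record this hop in Via: "<protocol-version> <proxy name>".
    hop = f"{version.split('/', 1)[1]} {proxy_name}"
    upstream["Via"] = f"{upstream['Via']}, {hop}" if "Via" in upstream else hop
    # A real proxy would also drop hop-by-hop headers such as Connection and
    # Proxy-Authorization before forwarding.
    return (url.hostname, url.port or 80), f"{method} {path} {version}", upstream


dest, line, hdrs = forward_request("GET http://www.example.com/index.html HTTP/1.1",
                                   {"Host": "www.example.com"}, "192.168.1.1")
print(dest)                                  # ('www.example.com', 80)
print(line)                                  # GET /index.html HTTP/1.1
print(hdrs["X-Forwarded-For"], hdrs["Via"])  # 192.168.1.1 1.1 proxy.example.com
```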


HTTP Proxy Responses

Once the destination server responds to the proxy server, the proxy forwards the server’s response to the client.

  1. Server → Proxy Response: The server sends an HTTP response to the proxy. For example:

    HTTP/1.1 200 OK
    Content-Type: text/html; charset=UTF-8
    Content-Length: 1234
    Date: Sun, 06 Apr 2025 10:00:00 GMT
  2. Proxy Response to Client: The proxy server forwards this response to the client. For example:

    HTTP/1.1 200 OK
    Content-Type: text/html; charset=UTF-8
    Content-Length: 1234
    Date: Sun, 06 Apr 2025 10:00:00 GMT
    Via: 1.1 proxy.example.com

    The proxy may add the Via header to show the request was handled by an intermediary (the proxy server).

  3. Caching by the Proxy: Some proxies cache the response (especially for frequently requested resources) to improve performance and reduce the load on the destination server. If the requested resource is cached, the proxy will serve the cached content instead of forwarding the request to the destination server again.
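Whether a proxy may cache a response, and for how long, is driven mainly by Cache-Control. A simplified sketch of that decision for a shared cache: it ignores Expires, heuristic freshness, and the field-name forms of private and no-cache, and the function names are hypothetical.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split 'public, max-age=60' into {'public': None, 'max-age': '60'}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def shared_cache_ttl(cache_control: str) -> Optional[int]:
    """Seconds a shared (proxy) cache may reuse a response without revalidating.

    None means the proxy must not store the response at all.
    """
    d = parse_cache_control(cache_control)
    if "no-store" in d or "private" in d:
        return None   # private responses are meant for the browser's own cache only
    if "no-cache" in d:
        return 0      # may be stored, but must be revalidated before every reuse
    for name in ("s-maxage", "max-age"):  # s-maxage overrides max-age for shared caches
        arg = d.get(name)
        if arg is not None and arg.isdigit():
            return int(arg)
    return 0          # no explicit lifetime: treat as immediately stale (simplification)


print(shared_cache_ttl("public, max-age=60"))        # 60
print(shared_cache_ttl("max-age=60, s-maxage=300"))  # 300
print(shared_cache_ttl("no-store"))                  # None
```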

Types of Proxy Requests

  1. Forward Proxy:

    • A forward proxy sits between the client and the internet. The client sends requests to the proxy server, which then forwards those requests to the internet.

    • Commonly used in corporate environments or for content filtering and privacy protection.

  2. Reverse Proxy:

    • A reverse proxy sits between the client and one or more web servers, but unlike a forward proxy it is set up on the server side: clients send their requests to it as if it were the web server itself.

    • Commonly used for load balancing, SSL termination, or caching. It appears as the web server to the client but forwards requests to one or more backend servers.

  3. Transparent Proxy:

    • A transparent proxy intercepts requests and responses without modifying them. The client may not even be aware that the proxy is involved.

    • Commonly used for monitoring or caching requests.

  4. Anonymous Proxy:

    • An anonymous proxy hides the client’s IP address, allowing the client to request resources anonymously. The server only sees the IP of the proxy, not the actual client.


Important HTTP Proxy Headers

  • X-Forwarded-For: This header is used by proxies to pass along the client’s original IP address. This is particularly useful for reverse proxies or when a request is routed through multiple proxies.

    • Example: X-Forwarded-For: 203.0.113.195

  • Via: This header indicates that the message passed through one or more intermediaries. Each proxy appends the protocol version it received and its own name (host name or pseudonym).

    • Example: Via: 1.1 proxy.example.com

  • X-Real-IP: Another header used to pass the real client IP address. This is often used by proxies that need to provide the original client IP address to the destination server.

    • Example: X-Real-IP: 203.0.113.195

  • Cache-Control: Specifies caching behavior. Proxies can use this header to determine how responses should be cached.

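    • Example: Cache-Control: no-store (forbids proxies and other caches from storing the response; no-cache, by contrast, still allows storage but forces revalidation before each reuse)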

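  • Proxy-Authorization: When the proxy itself requires credentials, it replies 407 Proxy Authentication Required with a Proxy-Authenticate challenge, and the client repeats the request with its credentials in this header. (The Authorization header, by contrast, carries credentials for the origin server and is passed through by the proxy.)

    • Example: Proxy-Authorization: Basic YWxhZGRpbjpvcGVuYXV0aA==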
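Because clients can send their own X-Forwarded-For header, a server behind proxies should only trust the entries its own proxies appended. A sketch of that rule, assuming a known set of trusted proxy addresses (the addresses and the function name are hypothetical): walk the chain back from the nearest hop and take the first address that is not one of your proxies.

```python
from typing import Optional, Set


def client_ip(remote_addr: str, x_forwarded_for: Optional[str], trusted_proxies: Set[str]) -> str:
    """Best guess at the original client IP behind a chain of proxies."""
    # X-Forwarded-For lists addresses oldest-first; the TCP peer is the newest hop.
    chain = [a.strip() for a in (x_forwarded_for or "").split(",") if a.strip()]
    chain.append(remote_addr)
    # Walk back from the nearest hop. The first address that is not one of our
    # own proxies is the client; anything further left could have been forged.
    for addr in reversed(chain):
        if addr not in trusted_proxies:
            return addr
    return chain[0]  # every hop is trusted: the oldest entry is the client


proxies = {"10.0.0.5"}  # hypothetical address of our own reverse proxy
print(client_ip("10.0.0.5", "203.0.113.195", proxies))           # 203.0.113.195
print(client_ip("10.0.0.5", "1.2.3.4, 203.0.113.195", proxies))  # 203.0.113.195
print(client_ip("203.0.113.7", "1.2.3.4", proxies))              # 203.0.113.7
```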


Conclusion

An HTTP proxy acts as an intermediary between the client and the server: it forwards the client’s request to the destination server and then relays the server's response back to the client. The primary use cases for proxies include improving performance through caching, filtering content, hiding the client’s identity, load balancing, and enhancing security. Proxies can add or modify headers, and use features like caching and authentication to optimize the flow of data between the client and server.
