HTTP
The Hypertext Transfer Protocol (HTTP) is an application-level protocol with the simplicity and speed needed for distributed, collaborative, hypermedia information systems. It is a generic, stateless, object-oriented protocol that can be used for many tasks, such as name servers and distributed object management systems, through extension of its request methods (commands). HTTP did not invent hypertext: earlier hypertext systems existed, such as Apple's HyperCard, which was scripted in HyperTalk (HyperTalk is a scripting language, not a network protocol). A feature of HTTP is the typing of data representation, which allows systems to be built independently of the data being transferred. HTTP has been in use by the World-Wide Web global information initiative since 1990.
client: an application program that establishes connections for the purpose of sending requests.
user agent: the client which initiates a request. These are often browsers, editors, spiders (web-traversing robots), or other end user tools.
server: an application program that accepts connections in order to service requests by sending back responses.
proxy: an intermediary program which acts as both a server and a client for the purpose of making requests on behalf of other clients. Requests are serviced internally or by passing them, with possible translation, on to other servers. A proxy must interpret and, if necessary, rewrite a request message before forwarding it. Proxies are often used as client-side portals through network firewalls and as helper applications for handling requests via protocols not implemented by the user agent.
gateway: a machine which acts as an intermediary for some other server. Unlike a proxy, a gateway receives requests as if it were the origin server for the requested resource; the requesting client may not be aware that it is communicating with a gateway. Gateways are often used as server-side portals through network firewalls and as protocol translators for access to resources stored on non-HTTP systems.
tunnel: an intermediary program that acts as a blind relay between two connections. Once active, a tunnel is not considered a party to the HTTP communication, though the tunnel may have been initiated by an HTTP request. The tunnel ceases to exist when both ends of the relayed connections are closed. Tunnels are used when a portal is necessary and the intermediary cannot, or should not, interpret the relayed communication.
cache: a program's local store of response messages and the subsystem that controls its message storage, retrieval, and deletion. A cache stores cachable responses in order to reduce the response time and network bandwidth consumption on future, equivalent requests. Any client or server may include a cache, though a cache cannot be used by a server while it is acting as a tunnel.
Any given program may be capable of being both a client and a server; our use of these terms refers only to the role being performed by the program for a particular connection, rather than to the program's capabilities in general. Likewise, any server may act as an origin server, proxy, gateway, or tunnel, switching behavior based on the nature of each request.
| Header | Purpose | Common Use |
| --- | --- | --- |
| Cache-Control | Specifies caching behavior | Cache management |
| Connection | Controls the connection (keep-alive or close) | Connection persistence |
| Content-Type | Defines the media type of the body | Specifies data format (JSON, HTML, etc.) |
| Content-Length | Specifies the body length in bytes | Defines the size of the response body |
| Authorization | Contains authentication credentials | API or HTTP authentication |
| User-Agent | Identifies the client software making the request | Browser or client identification |
| Set-Cookie | Sets cookies on the client | Managing sessions and user state |
| Location | Redirect URL for 3xx responses | Redirect responses |
| ETag | Resource identifier (used for caching) | Cache validation |
| X-Frame-Options | Controls iframe embedding | Security (prevents clickjacking) |
| Accept-Encoding | Specifies acceptable compression methods | Handling compressed responses |
Connection
| Header Value | Description |
| --- | --- |
| keep-alive | Keep the connection open for subsequent requests. |
| close | Close the connection after the current request/response is completed. |
| Upgrade | Indicates that the client wants to switch protocols (e.g., WebSockets). |
The body is always empty.
The body can be delimited in two different ways, depending on which header the message specifies:
Content-Length: specifies the exact size of the body in bytes.
Transfer-Encoding: specifies the encoding mechanisms used to transfer the message body. The most common encoding is chunked, introduced with HTTP/1.1.
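The two framing styles can be compared with a short Python sketch. The helper names `encode_chunked` and `decode_chunked` are hypothetical, and the decoder ignores chunk extensions and trailers, so this is an illustration under those assumptions rather than a complete parser.

```python
def encode_chunked(body: bytes, chunk_size: int = 8) -> bytes:
    """Frame a body using the chunked transfer coding."""
    out = bytearray()
    for start in range(0, len(body), chunk_size):
        chunk = body[start:start + chunk_size]
        # Each chunk is: size in hexadecimal, CRLF, data, CRLF.
        out += f"{len(chunk):X}\r\n".encode("ascii") + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # a zero-length chunk terminates the body
    return bytes(out)


def decode_chunked(data: bytes) -> bytes:
    """Reassemble a chunked body (chunk extensions and trailers are ignored)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size = int(data[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            break
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(body)


body = b"hello world"
# With Content-Length the receiver knows the size up front.
content_length_header = f"Content-Length: {len(body)}"
# With chunked encoding the size of each piece is announced as it is sent.
chunked_body = encode_chunked(body, chunk_size=5)
```

Content-Length requires the sender to know the full size before sending the headers, while chunked encoding lets a server start streaming a response whose final size is not yet known.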
Expires header
The client requests the resource from the server, which replies with the resource and adds an "Expires" header. The server uses this header to specify when the resource will be considered obsolete.
The client stores a copy of the resource in its local cache.
Before sending a new request, the client checks whether it already holds the resource it is about to ask the server for. If it does, it compares the expiration date, specified by the server in the first step, with its real-time clock, and reuses the cached copy if that date has not yet passed.
A problem with this method is that the server needs to know in advance when the page will change. So the "Expires" value sent by the server must be either:
exactly known in advance, for periodic changes (e.g. a daily paper)
statistically computed (by evaluating the probability of a refresh and knowing a lower bound on the lifetime of the resource)
The other problem with this method is that the server and client clocks need to be synchronized. Hence, some form of date correction and compensation between the two systems is needed.
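The freshness check and the clock compensation described above can be sketched in Python. The function names are hypothetical, and estimating the skew from the server's Date header is one possible compensation strategy, assumed here for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def parse_http_date(value: str):
    """Return an aware UTC datetime, or None if the header cannot be parsed."""
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def estimate_skew(date_header: str, local_receive_time: datetime) -> timedelta:
    """Estimate how far the local clock is ahead of the server clock."""
    server_time = parse_http_date(date_header)
    if server_time is None:
        return timedelta(0)
    return local_receive_time - server_time


def is_fresh(expires_header: str, local_now: datetime,
             clock_skew: timedelta = timedelta(0)) -> bool:
    """True if the cached copy can be reused without contacting the server."""
    expires = parse_http_date(expires_header)
    if expires is None:
        return False  # treat an invalid date as already expired
    # Translate the local time into the server's time frame before comparing.
    return (local_now - clock_skew) < expires
```

A client would call `estimate_skew` once when the response arrives and pass the result to `is_fresh` on later lookups.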
Last-Modified header
The client requests the resource from the server as before, but now it stores the resource in the cache together with its "Last-Modified" header value.
The client checks whether its copy of the resource is obsolete by asking the server for only the headers of the resource. This type of request uses the "HEAD" method.
The client looks at the "Last-Modified" value returned by the server and compares it with the "Last-Modified" value stored alongside the cached resource. If the stored date is older than the new one, the client sends a new request for the resource to the server. Otherwise, it uses the resource in the cache.
The problem with this method is that, in the worst case, the same resource is requested twice (even if the first request, made with the "HEAD" method, is lighter).
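A minimal Python sketch of this HEAD-based check follows. The function names are hypothetical; `fetch_last_modified` uses the standard `http.client` module and is shown only to illustrate the request, so it is defined but not called here.

```python
import http.client
from email.utils import parsedate_to_datetime


def fetch_last_modified(host: str, path: str):
    """Send a HEAD request and return the Last-Modified header (or None)."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        response.read()  # a HEAD response carries headers only
        return response.getheader("Last-Modified")
    finally:
        conn.close()


def cache_is_stale(cached_last_modified: str, server_last_modified) -> bool:
    """Compare the stored date with the one the server just reported."""
    if server_last_modified is None:
        return True  # no date to compare against: refetch to be safe
    cached = parsedate_to_datetime(cached_last_modified)
    current = parsedate_to_datetime(server_last_modified)
    return cached < current
```

When `cache_is_stale` returns True the client still has to issue a second, full GET request, which is the double round trip mentioned above.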
If-Modified-Since header
The client requests the resource from the server as before, storing the resource in the cache together with its "Last-Modified" header value.
When the client needs the resource again, it sends the request to the server and sets the "If-Modified-Since" header to the stored date.
If the server, looking at the resource, sees that its Last-Modified value is more recent than the date specified by the client in the request, it sends back the newer resource. Otherwise, it replies to the client with "HTTP/1.0 304 Not Modified" and no body.
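The server-side decision can be sketched as follows. The function `respond` and its return shape (status, headers, body) are assumptions made for this example, not part of any particular framework.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def respond(if_modified_since: Optional[str], last_modified: datetime,
            body: bytes) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a conditional GET.

    last_modified must be an aware datetime in UTC.
    """
    # HTTP dates have one-second resolution, so drop microseconds first.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    since = None
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: fall back to a full response
    if since is not None and since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    if since is not None and last_modified <= since:
        return 304, headers, b""  # the client's copy is still current
    headers["Content-Length"] = str(len(body))
    return 200, headers, body
```

Compared with the HEAD-based approach, the check and the transfer happen in a single request, so an up-to-date cache costs one small 304 response.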
ETag header
The client requests the resource from the server and stores it in the cache together with its "ETag" header value.
The client checks whether its copy of the resource is obsolete by asking the server for only the headers of the resource. This type of request uses the "HEAD" method.
The client looks at the "ETag" value returned by the server and compares it with the "ETag" value stored alongside the cached resource. The comparison works because the tag, typically a hash of the content, is recomputed every time the file changes. If the stored tag differs from the one received, the client sends a new request for the resource to the server. Otherwise, it uses the resource in the cache.
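As an illustration, an ETag can be derived from a content hash and compared on the client. Servers are free to generate tags in other ways; the use of a truncated SHA-256 digest and the function names below are assumptions made for this sketch.

```python
import hashlib


def compute_etag(content: bytes) -> str:
    """Derive a tag from the content, so it changes whenever the content does."""
    digest = hashlib.sha256(content).hexdigest()
    return f'"{digest[:16]}"'  # ETag values are sent as quoted strings


def cache_is_current(cached_etag: str, server_etag: str) -> bool:
    """The cached copy is usable only if both tags match exactly."""
    return cached_etag == server_etag


old_tag = compute_etag(b"<h1>version 1</h1>")
new_tag = compute_etag(b"<h1>version 2</h1>")
```

Unlike a date comparison, this check does not depend on either clock, only on whether the two tags are identical.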
HTTP Authentication is a way of securing web resources by requiring the client (browser, API consumer, etc.) to authenticate itself before accessing certain resources. HTTP supports several authentication mechanisms. Below are some common HTTP authentication types, along with examples:
Cookies are small pieces of data sent from the server to the client and stored on the client side. These cookies are sent back to the server with each subsequent HTTP request to the same domain, allowing the server to remember and maintain state across multiple requests.
Cookies exist because HTTP is a stateless protocol: by adding cookies, the communication between the client and the server becomes stateful.
Server to Client: The server sends a cookie to the client in the Set-Cookie header of the HTTP response; the browser stores the cookie locally and includes it in future HTTP requests to the same domain.
Client to Server: When the client makes subsequent HTTP requests to the same server, it sends the stored cookies back to the server in the Cookie header.
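The exchange can be traced with Python's standard `http.cookies` module. The variable names are illustrative, and a real browser applies additional rules (domain, path, expiry) before sending a cookie back, which this sketch leaves out.

```python
from http.cookies import SimpleCookie

# Server side: build the value of the Set-Cookie response header.
response_cookie = SimpleCookie()
response_cookie["session_id"] = "abc123"
set_cookie_value = response_cookie["session_id"].OutputString()

# Client side: store the cookie received from the server.
client_jar = SimpleCookie()
client_jar.load(set_cookie_value)

# Client side: build the Cookie request header for the next request.
cookie_value = "; ".join(
    f"{name}={morsel.value}" for name, morsel in client_jar.items()
)

# Server side: parse the Cookie header to recognise the returning client.
request_cookie = SimpleCookie()
request_cookie.load(cookie_value)
session_id = request_cookie["session_id"].value
```

The server never has to keep the connection open: the identifier travelling in the Cookie header is what links one request to the next.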
Session Management
Personalization (User Preferences)
Tracking and Analytics
Targeted Advertising
Cross-Domain Authentication (Single Sign-On)
Name=Value: The cookie’s name and value. Example: session_id=abc123
Expires: The date and time when the cookie will expire. If not set, the cookie is a session cookie and will expire when the browser is closed. Example: Expires=Wed, 09 Apr 2025 10:18:14 GMT
Max-Age: Similar to Expires, but specifies the number of seconds from the current time until the cookie expires. Example: Max-Age=3600 (cookie will expire in 1 hour)
Domain: Specifies the domain for which the cookie is valid. If not set, the cookie is only valid for the domain that set it. Example: Domain=example.com (cookie will be sent to all subdomains of example.com)
Path: Specifies the URL path for which the cookie is valid. If not set, the cookie is valid for the entire domain. Example: Path=/dashboard (cookie is only sent for requests to the /dashboard path)
Secure: Ensures the cookie is only sent over HTTPS (secure connections). Example: Secure (cookie is only sent over HTTPS, not HTTP)
HttpOnly: Indicates that the cookie is not accessible via JavaScript (to prevent attacks like XSS). Example: HttpOnly (cookie can't be accessed by JavaScript)
SameSite: Controls whether the cookie is sent with cross-origin requests. There are three possible values:
Strict: Cookie is only sent for same-site requests.
Lax: Cookie is sent for same-site requests and some cross-site requests (e.g., top-level navigation).
None: Cookie is sent with all cross-site requests, but must also be Secure.
Example: SameSite=Lax
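The attributes above combine into a single Set-Cookie header value. The helper `build_set_cookie` below is a hypothetical function written for this example; it also enforces the rule that SameSite=None requires Secure.

```python
from typing import Optional


def build_set_cookie(name: str, value: str, *,
                     expires: Optional[str] = None,
                     max_age: Optional[int] = None,
                     domain: Optional[str] = None,
                     path: Optional[str] = None,
                     secure: bool = False,
                     http_only: bool = False,
                     same_site: Optional[str] = None) -> str:
    """Assemble a Set-Cookie header value from the individual attributes."""
    if same_site is not None and same_site not in ("Strict", "Lax", "None"):
        raise ValueError("SameSite must be Strict, Lax, or None")
    if same_site == "None" and not secure:
        raise ValueError("SameSite=None requires the Secure attribute")
    parts = [f"{name}={value}"]
    if expires is not None:
        parts.append(f"Expires={expires}")
    if max_age is not None:
        parts.append(f"Max-Age={max_age}")
    if domain is not None:
        parts.append(f"Domain={domain}")
    if path is not None:
        parts.append(f"Path={path}")
    if secure:
        parts.append("Secure")  # flag attributes carry no value
    if http_only:
        parts.append("HttpOnly")
    if same_site is not None:
        parts.append(f"SameSite={same_site}")
    return "; ".join(parts)


header_value = build_set_cookie(
    "session_id", "abc123",
    max_age=3600, path="/dashboard",
    secure=True, http_only=True, same_site="Lax",
)
```

For a session identifier, combining Secure, HttpOnly, and a SameSite value is a common defensive choice, since each attribute closes a different exposure.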
When the client tries to retrieve a protected resource, the server usually replies with a 401 Unauthorized response that carries a WWW-Authenticate: <authentication_type> header, where <authentication_type> is the authentication type required by the server to access the resource.
The browser then sends a new HTTP request that carries an Authorization: <authentication_type> <authorization_string> header, where the <authorization_string> depends on the <authentication_type> required by the server.
Basic Authentication is the simplest authentication method. It involves sending a username and password, encoded as base64, in the Authorization header.
Server Response: 401 Unauthorized with a WWW-Authenticate: Basic challenge.
Client Request: Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=, where dXNlcm5hbWU6cGFzc3dvcmQ= is the base64-encoded string of username:password.
Note: Basic authentication is not secure unless used over HTTPS, because base64 is an encoding, not encryption.
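The encoding step can be reproduced with Python's standard `base64` module. The two helper names are hypothetical; the credentials are the placeholder `username:password` pair used above.

```python
import base64


def basic_authorization(username: str, password: str) -> str:
    """Build the value of the Authorization header for Basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic(header_value: str):
    """Recover (username, password) from an Authorization header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    # Split on the first colon only: the password may itself contain colons.
    username, _, password = decoded.partition(":")
    return username, password


header_value = basic_authorization("username", "password")
```

`decode_basic` shows why the note above matters: anyone who can read the header can recover the password without any key.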
Digest Authentication is more secure than Basic Authentication because it hashes the credentials and uses a challenge-response mechanism, preventing the password from being sent in plaintext.
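Server Response: 401 Unauthorized with a WWW-Authenticate: Digest challenge that carries, among other parameters, a realm and a server-generated nonce.
Client Request: an Authorization: Digest header that carries the username, realm, nonce, uri, and response parameters.
In this case, the response value is a hash of the password and other parameters (nonce, URI, etc.), providing a more secure method of authentication than Basic Authentication.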
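The response value can be computed as in the sketch below, which assumes the MD5 algorithm with qop=auth as defined in RFC 2617. The function names are hypothetical, and the `nc` (nonce count) and `cnonce` (client nonce) parameters are part of that qop=auth variant.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, password: str,
                    method: str, uri: str, nonce: str,
                    nc: str, cnonce: str, qop: str = "auth") -> str:
    """Compute the 'response' parameter of a Digest Authorization header."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # who is asking
    ha2 = md5_hex(f"{method}:{uri}")                 # what is being asked
    # The server nonce ties the hash to this particular challenge.
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
```

The server repeats the same computation with its own copy of the credentials and compares the results, so the password itself never crosses the network.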
Bearer token authentication is commonly used in modern web applications, especially with APIs. This method is often used in conjunction with OAuth 2.0. Instead of sending a username and password, the client sends a token that grants access to the requested resource.
Server Response: The server issues a 401 Unauthorized response with a challenge.
Client Request: The server expects the client to send the token in the Authorization header, in the form Authorization: Bearer <token>.
OAuth 2.0 allows clients to authenticate without needing to expose user credentials. Tokens are typically short-lived and can be refreshed, making them more secure than basic credentials.
Digest Authentication can be further secured with HMAC (Hash-based Message Authentication Code), where a hashed value is generated based on a shared secret and request data (like nonce, timestamp, URI, etc.). This method is commonly used in REST APIs and web services.
Server Response: The server issues a 401 Unauthorized response with a challenge.
Client Request:
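HMAC-based schemes differ from one API to another, so the sketch below shows one hypothetical way to sign a request. The choice of SHA-256, the fields included in the signed message, and their newline-separated layout are all assumptions made for this example.

```python
import hashlib
import hmac


def sign_request(secret: bytes, method: str, uri: str,
                 timestamp: str, nonce: str) -> str:
    """Sign the request data with the shared secret."""
    message = "\n".join([method, uri, timestamp, nonce]).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(secret: bytes, method: str, uri: str,
                   timestamp: str, nonce: str, signature: str) -> bool:
    """Recompute the signature on the server and compare in constant time."""
    expected = sign_request(secret, method, uri, timestamp, nonce)
    return hmac.compare_digest(expected, signature)


shared_secret = b"example-shared-secret"  # placeholder value
signature = sign_request(shared_secret, "GET", "/api/items",
                         "2025-04-09T10:18:14Z", "n-0001")
```

Because the timestamp and nonce are part of the signed message, a captured signature cannot be reused for a different request, and the server can reject stale or repeated ones.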
NTLM is a Microsoft protocol used for authentication in Windows environments. It is commonly used for accessing Windows-based resources and is supported by servers like IIS (Internet Information Services).
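Server Response: 401 Unauthorized with a WWW-Authenticate: NTLM challenge.
Client Request: an Authorization: NTLM header followed by a base64-encoded NTLM message; the handshake takes more than one round trip.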
Kerberos is a network authentication protocol that uses secret-key cryptography to provide strong authentication. It's often used in enterprise environments, such as when accessing corporate services.
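Server Response: 401 Unauthorized with a WWW-Authenticate: Negotiate challenge.
Client Request: an Authorization: Negotiate header followed by a base64-encoded token that wraps the Kerberos ticket.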
Some applications or services implement custom authentication schemes. This could involve sending specific headers or tokens that are validated by the server.
Client Request:
An HTTP proxy acts as an intermediary between the client (usually a web browser) and the server. When a client sends a request to access a resource, the proxy server forwards that request to the destination server and then relays the server's response back to the client.
When a client makes a request to a web server through an HTTP proxy, the request is first sent to the proxy server. The proxy server then forwards the request to the destination server.
Client to Proxy Request: The client sends an HTTP request to the proxy. With a plain HTTP proxy, the request line carries the absolute URL of the resource rather than just its path.
Proxy Request to Destination Server: The proxy server takes the client request and forwards it to the destination server, which it identifies from the Host header.
The X-Forwarded-For header is often added by the proxy to indicate the client's IP address. The proxy could also modify headers like User-Agent or Referer.
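The rewriting a proxy performs on the request can be sketched as follows. The function `to_origin_request` is hypothetical, headers are modelled as a plain dictionary, and the host name and IP address are example values.

```python
from urllib.parse import urlsplit


def to_origin_request(request_line: str, headers: dict, client_ip: str):
    """Turn a client-to-proxy request into a proxy-to-server request."""
    method, target, version = request_line.split(" ", 2)
    url = urlsplit(target)
    # The destination server expects only the path (and query) in the request line.
    path = url.path or "/"
    if url.query:
        path += "?" + url.query
    forwarded = dict(headers)
    forwarded["Host"] = url.netloc or headers.get("Host", "")
    # Append to an existing list when the request already crossed another proxy.
    previous = forwarded.get("X-Forwarded-For")
    forwarded["X-Forwarded-For"] = f"{previous}, {client_ip}" if previous else client_ip
    return f"{method} {path} {version}", forwarded


origin_line, origin_headers = to_origin_request(
    "GET http://example.com/index.html HTTP/1.1",
    {"Host": "example.com", "User-Agent": "demo-client"},
    "203.0.113.195",
)
```

The original headers dictionary is copied rather than modified, so the proxy keeps an untouched record of what the client actually sent.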
Once the destination server responds to the proxy server, the proxy forwards the server’s response to the client.
Server → Proxy Response: The server sends an HTTP response to the proxy.
Proxy Response to Client: The proxy server forwards this response to the client.
The proxy may add the Via header to show that the message was handled by an intermediary (the proxy server).
Caching by the Proxy: Some proxies cache the response (especially for frequently requested resources) to improve performance and reduce the load on the destination server. If the requested resource is cached, the proxy will serve the cached content instead of forwarding the request to the destination server again.
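A toy caching proxy makes the response path concrete. The class `CachingProxy`, the injected `fetch_from_origin` callable, and the proxy name are all hypothetical; the sketch caches 200 responses indefinitely and ignores expiration and revalidation, which a real proxy must handle.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str], bytes]  # (status, headers, body)


class CachingProxy:
    def __init__(self, fetch_from_origin: Callable[[str], Response],
                 name: str = "proxy.example.com") -> None:
        self._fetch = fetch_from_origin
        self._name = name
        self._cache: Dict[str, Response] = {}

    def get(self, url: str) -> Response:
        cached = self._cache.get(url)
        if cached is None:
            # Cache miss: forward the request to the destination server.
            cached = self._fetch(url)
            status, headers, _ = cached
            if status == 200 and "no-store" not in headers.get("Cache-Control", ""):
                self._cache[url] = cached
        status, headers, body = cached
        # Mark the relayed response so the client can see an intermediary was involved.
        relayed = dict(headers)
        entry = f"1.1 {self._name}"
        previous = relayed.get("Via")
        relayed["Via"] = f"{previous}, {entry}" if previous else entry
        return status, relayed, body
```

Passing the origin fetcher in as a parameter keeps the sketch free of network code and makes it easy to observe how many requests actually reach the destination server.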
Forward Proxy:
A forward proxy sits between the client and the internet. The client sends requests to the proxy server, which then forwards those requests to the internet.
Commonly used in corporate environments or for content filtering and privacy protection.
Reverse Proxy:
A reverse proxy sits in front of one or more web servers: clients send their requests to the reverse proxy as if it were the web server, and the reverse proxy forwards them to the backend servers.
Commonly used for load balancing, SSL termination, or caching. It appears as the web server to the client but forwards requests to one or more backend servers.
Transparent Proxy:
A transparent proxy intercepts requests and responses without modifying them. The client may not even be aware that the proxy is involved.
Commonly used for monitoring or caching requests.
Anonymous Proxy:
An anonymous proxy hides the client’s IP address, allowing the client to request resources anonymously. The server only sees the IP of the proxy, not the actual client.
X-Forwarded-For: This header is used by proxies to pass along the client’s original IP address. This is particularly useful for reverse proxies or when a request is routed through multiple proxies. Example: X-Forwarded-For: 203.0.113.195
Via: This header indicates the proxy server's involvement in handling the message. It contains the protocol version and the host name (or a pseudonym) of each proxy that handled it. Example: Via: 1.1 proxy.example.com
X-Real-IP: Another header used to pass the real client IP address. This is often used by proxies that need to provide the original client IP address to the destination server. Example: X-Real-IP: 203.0.113.195
Cache-Control: Specifies caching behavior. Proxies can use this header to determine how responses should be cached. Example: Cache-Control: no-cache (a cache may store the response but must revalidate it with the destination server before reusing it; no-store is the directive that forbids storing it)
Authorization: Carries the client's credentials (like username and password) for HTTP authentication with the destination server. Credentials meant for the proxy itself are sent in the Proxy-Authorization header instead. Example: Authorization: Basic YWxhZGRpbjpvcGVuYXV0aA==
HTTP proxies act as intermediaries between the client and the server. The proxy server forwards the client’s request to the destination server and then relays the server's response back to the client. The primary use cases for proxies include improving performance through caching, filtering content, hiding the client’s identity, load balancing, and enhancing security. Proxies can add or modify headers, and use features like caching and authentication to optimize the flow of data between the client and server.