All about HTTP

Internet Network
- IP protocol has several fall-offs. Even if the receiver is out of service or non-existent, the packet is still delivered. Also, packet might not be delivered in sequence as intended because it is too big and so it is cut into pieces and delivered separately. Also, in the middle of this packet delivery, some nodes might be closed so it may cause a loss of packet. Finally, if there are 2 applications using the same IP address, there is no way to differentiate the 2.
- There is IP, TCP (3 way handshake), UDP
- IP is hotel. Port is hotel room number.
- DNS solves problems of remembering long IP addresses and when IP addresses change.
- Uniform Resource Locator(URL) or Uniform Resource Name(URN)
- URL is made up of scheme, user info(not rly used), host (either domain name or ip address), port, resource path (hierarchial structure), ?query parameter (in key-value format) and #fragment (normally in html).
HTTP basic
- HyperText Transfer Protocol (HTTP)
- Stateless means server does not contain the client’s condition/details at that point of time. Stateful is problem because each server serves a specific purpose and if one goes down, the others cannot serve the purpose that the down server was serving. Stateless is good for like when 6pm chicken discount even, thousands of people connect at same time. But still hard to resolve this issue so maybe display a static HTML page first for them to read and play around first before the event connect button.
- Connectionless means cuts TCP/IP connection once client request has processed. But TCP/IP connections needs to be made again for another request, so extra time is taken for 3 way handshake. This is solved by HTTP Persistent Connections.
- HTTP message is either request or response.
- HTTP message structure is made up of start-line, header, empty line (CRLF) and message body.
- For HTTP request by client, start-line is also called request-line. request-line = method + request-target (path) + HTTP version.
- HTTP methods - GET is searching or asking for resource in server while POST is giving some data to server for server to do something about it.
- Path, or absolute path, starts with / (backslash) right after the HTTP methods. absolute-path?query
- For HTTP response by server, start-line is called status-line. status-line = HTTP-version + space + status-code + space + reason-phrase + CRLF
- 200 is success, 400 is client request is wrong, 500 is server error
- reason-phrase like OK after status-code is just for humans to easily understand it better than the number status-code
- HTTP header: header-field = field-name + “:” + OWS field-value OWS (OWS means allows space in between)
- field-name can be caps on or off. There should not be a space between field-name and “:”
- Message body can be HTML document, image, video, JSON, or basically anything byte can display
HTTP method
- For naming URI, need to differentiate resources/concepts like members. Finding members by id is not resource but member itself is a resource. The method of resource will be taken care of by HTTP methods
- GET - data that client wants to get from server is through ?query
- POST - data delivered through message body, It is for starting a new process, making new resource or changing resource
- PUT is substitute resource and if there is no resource, make it (e.g. putting file in folder and if there is no such file, make it and if there is an existing file, new file substitutes/overrides 덮어버려 the old one). Diff between PUT and POST is that PUT client knows the resource location and sets the URI but for POST, client does not know. However, if there is a missing member detail like PUT id only instead of id and age, the resulting change deletes the age condition as well and only the id remains. So if certain features of resource need to be changed, PATCH is better.
- PATCH is changing a part of resource (e.g. changing member’s id). But some very few servers don’t accept PATCH so in those cases, use the ultimate solution POST
- DELETE is deleting a resource
- Safe - resource doesn’t change when it is called like GET, not POST. Doesn’t consider overloading logs in server
- Idempotent - f(f(x)) = f(x) call once or 100 times the result is the same like GET, PUT , DELETE, not POST. When client wants DELETE but server gives TIMEOUT and is unable to give a response, can the client re-request it? Yes because it is idempotent. Doesn’t consider external changes made (e.g. GET something but there is a PUT that changes that, GET will get that changed details)
- Cacheable - Storing large data like images in browser cache. GET, HEAD normally used. POST, PATCH also can but caches require matching to specific keys and these 2 methods are hard to do that
- Client sends data server through 2 main methods: 1) Query parameter (?query) with GET 2) Message bodies (POST,PUT,PATCH)
- For static content, no need query parameter so just URL with GET works.
- For non-static content, need query parameter.
- HTML Form generates HTTP message via POST (GET also can) and the FORM’s data is sent through message body in key-value format (query parameter) and is url encoded (abc김 ->abc%EAFSAF%&)
- Content-TYPE:multipart/form-data can send binary data like file upload and other forms of data
- HTTP API - POST, PUT, PATCH, GET and Content-Type:application/json
- In HTTP API (Collection), Resource = Collection in member system, where server takes care of resource directory. When POST, server MAKES a URL like members/100 and gives it as response to client
- in HTTP API (Store), Resource = Store in file uploading system, where client takes care of the resource storage. When uploading file via PUT, client knows URL file/{filename}
- Since HTML FORM can only use GET and POST, need to use controller URI like /new, /edit, /delete
HTTP status code
- 1xx(Informational) - request being processed (but not rl used)
- 2xx(Successful) - request successful
- 3xx(Redirection) - Need more info by user agent (client) to complete request
- 4xx(Client Error) - Client error, client at fault
- 5xx(Server Error) - Server at fault
- 2xx(Successful): 200 OK, 201 Created (POST), 202 Accepted, 204 No Content. For 201, created resource is in response’s header field like Location: /members/100. For 202, request is accepted but has not been processed yet by server. For 204, saving button when editing document and server completes request successfully but does not give response of any other data cuz not needed.
- 3xx(Redirection): 300 Multiple Choices but not used, 301 Moved Permanently when a resource’s URI is permanently moved like /members to /users or /event to /new-event but if user uses POST, the message is lost as 301 redirects the request to GET the new page so may have to fill up the messsage again on the new page (info lost)
- 302 Found is like 301 that when redirects, request method may be changed to GET but in most cases, it is changed to GET
- 303 See Other is like 301 , but request method is ALWAYS changed to GET instead of maybe like 302
- 304 Not modified (very much used) - redirects to cache stored in local PC. Tells client that resource that it is seeking is not changed so client uses the cache stored in local PC. Response does not invlove message body because it must use local cache
- 307 Temporary Redirect is like 302 but does not change the request method (POST remains POST), when after purchase, go to purchase details,
- 308 Permanent Redirect when same as 301 but POST is redirected with POST so that it conserves the client’s info/message to the new page
- According to 3xx, if there is a location in the header, it redirects to that location
- PRG: Post/Redirect/Get - when I made a purchase via POST and reload the web browser purchase page, browser requests POST again so may purchase double again
- After purchasing with POST, the purchase result page is redirected with GET. Even after reloading, the purchase result page is displayed again with GET
- 400 Bad Request - Client sends wrong request parameter or API data format is wrong so client has to resend the correct one
- 401 Unauthorsed - Client needs to authenticate onself (login), returns this WWWW-Authenticate header that lists info of client needed for the authentication
- 403 Forbidden - Server understands the client request but the client is not authorised to access that particular resource
- 404 Not Found - That resource is not in the server or if want to hide that resource from the client but dont want to send the 403 Forbidden message
- 500 Internal Server Error - for general server problems
- 503 Service Unavailable - For temporary shut-down of server due to upgrades/work so requests cannot be carried out at that time
HTTP Basic Header
- header-field = field-name + “:” + OWS field-value OWS (OWS means space allowed)
- header-field contains all info required for HTTP methods, custom header-fields can be made too
- Representation = representation metadata (that describes the representation data) + representation data
- Content-Type: representation data’s type
- Content-Length: byte length of the representation data. Cannot use this when using Transfer-Encoding
- Content Negotation: Providing what preferred representation client is requesting to server
- Accept: client’s preferred media type
- Quality values(q) in Content Negotiation: Value from 0 to 1 where 1 is the highest priority. If value is 1, no need write number 1 for ko-KR. E.g. Accept-Language: ko-KR,ko;q=0.9,en-US;q=0.8. Also, if q is long, it is higher in priority. (e.g.text/palin;format=flowed)
- Content-Length, Content-Type, Transfer-Encoding, Content-Range
- Referer - it gives the previous URL of current request (e.g. moving A to B, referer is A)
- User-Agent - client application (user application)’s information like web browser
- Host: this header (e.g. google.com) is a MUST to be included and if multiple domains are registered to 1 same IP address, this host header helps differentiate
- Allow: this displays HTTP methods that are allowed and if a banned method is tried, 405 Method Not Allowed should be responded by server
Cookie
- 2 types of Cookie: Set-Cookie is cookie that server sends to client as response. Cookie is one that client gets from the server and saves it and when client requests HTTP, this cookie is sent to server.
- HTTP is stateless protocol, whose connection ends when client and server is done with request and response. When client requests again, server does not remember the previous request.
- Set-cookie:user=leejh and this cookie is saved in cookie storage in web browser
- expires, max-age (in seconds)
- Session cookie: if expires date is not inputted, this session cookie exists until browser exit
- If domain is mentioned in cookie, the subdomain can also be entered. But if domain is not mentioned, only the domain can be entered.
- Root path is normally assigned.
- Secure,HttpOnly,SameSite = security features of cookie
Cache
- Big data files like image are stored in browser cache.
- Last-Modified or ETag header (both are called Validator) tells when data has last been changed by server
- 304 Not Modified is sent by server and there is no HTTP body, which saves a lot of data since only the HTTP header is sent by server. 304 redirection to cache
- If-Modified-Since/If-Unmodified-Since is used with Last-Modified
- If there indeed is data change, 200 OK with header body is sent by server
- Entity Tag (ETag) stores a version instead of date into the cache data
- If-None-Match/If-Match is used with ETag
- Cache-Control, Pragma, Expires
- Cache-Control: max-age (in seconds how long this cache will last), no-cache (can cache data but always must check with origin server (not proxy server) with Last-Modified or If-None-Match), no-store(sensitive data so save in memory and delete ASAP)
- If absolutely don’t want cache, Cache-Control: no-cache, no-store, must-revalidate and Pragma:no-cache
- When no-cache + ETag and checking with origin server, if the origin server is suddenly down or sudden network disconnection between proxy and origin server, the proxy server may just send Error 200 OK instead of saying origin server is down
- must-revalidate + ETag prevents this by always getting the error 504 Gateway Timeout as response