widgeo.net

Tuesday, 29 July 2014

CDN - Content Delivery Networks

A CDN (Content Delivery Network) is a global cluster of caches that can serve as local caches for static files (objects).

If a visitor of a website or a user of an application requests certain files (e.g. images, PDFs, JavaScript, CSS files, etc.), the CDN serves them instead of the hosting server. Because a CDN takes the geo-location of the user into account, the file will be served from the caching node closest to the visitor. This means data can reach the user very quickly, independent of the location of the website or application.

If the caching node assigned to serve the visitor doesn't have the requested object available, it will ask other nodes in the CDN first. If the object is not in the CDN, it will be requested from the Origin Server (the server or object store container where the source files reside).

In order for a CDN provider to know which objects belong to which request, a hostname / Origin Server combination needs to be registered with the CDN provider. A hostname is a subdomain of a domain controlled by the customer, for instance images.mydomain.com. The Origin Server is the internet address where the objects to be served from the CDN can be found. After both pieces of information are registered, the CDN knows which objects belong to which subdomain.

At the end of the registration process for the hostname / Origin Server combination, a CNAME is sent back by the CDN provider. A CNAME links one subdomain (the hostname) to another subdomain (an entry point into the CDN) at the DNS level. To actually make the connection, the acquired CNAME must be added as a CNAME record to the DNS zone of the domain (mydomain.com).

If the CNAME returned by the CDN is abcdefghijk.a.b.cdn.com and the hostname is images.mydomain.com, then the CNAME record in mydomain.com should be similar to:

images.mydomain.com. IN CNAME abcdefghijk.a.b.cdn.com.

The link with the CDN is ready only after the DNS has been updated and the name servers have picked up the change. From then on, a request for http://images.mydomain.com/ will be served by the CDN.
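
A common mistake in zone files is leaving off the trailing dots, in which case the name is typically interpreted relative to the zone. The following minimal Python sketch formats the record shown above; the function names fqdn and cname_record are hypothetical helpers, and the hostnames are the example values from this post.

```python
def fqdn(name: str) -> str:
    """Return the name as a fully qualified domain name (with trailing dot)."""
    return name if name.endswith(".") else name + "."


def cname_record(hostname: str, target: str) -> str:
    """Format a zone-file CNAME line that points hostname at target."""
    return f"{fqdn(hostname)} IN CNAME {fqdn(target)}"


print(cname_record("images.mydomain.com", "abcdefghijk.a.b.cdn.com"))
```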

If we continue the example for http://images.mydomain.com/logo.png, the first time the CDN receives a request for logo.png, it will collect it from the Origin Server and put it in its cache. Every subsequent request for logo.png will be significantly faster.

Be aware that, in order to gain the maximum benefit of the CDN, some changes to your site could be needed. Static files that can be used in the CDN might be combined and dynamic elements might have to be separated and loaded at a later stage.

Request Flow with a CDN





1. Client requests logo.png on images.mydomain.com
2. DNS follows the CNAME and resolves the hostname to the CDN, so the request is sent to the CDN
3. If logo.png is not found or expired in the CDN, it is requested from the Origin Server and put in the CDN
4. The CDN responds to the Client request with the file logo.png

As long as logo.png is in the CDN and has not expired, step 3 is skipped and the response will be significantly faster.
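
The request flow above can be illustrated with a toy model of a single caching node. This is only a sketch under simplifying assumptions (one node, a fixed lifespan for every object, an in-memory dictionary as the Origin Server); the class name EdgeCache and its parameters are made up for this example and do not reflect how any particular CDN is implemented.

```python
import time


class EdgeCache:
    """Toy model of one CDN caching node (not a real CDN implementation)."""

    def __init__(self, fetch_from_origin, ttl_seconds, clock=time.monotonic):
        self.fetch_from_origin = fetch_from_origin  # callable: path -> bytes
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.store = {}  # path -> (body, time at which the copy expires)

    def get(self, path):
        """Return (body, source), where source is 'cache' or 'origin'."""
        entry = self.store.get(path)
        if entry is not None and self.clock() < entry[1]:
            return entry[0], "cache"  # step 3 is skipped
        body = self.fetch_from_origin(path)  # step 3: missing or expired
        self.store[path] = (body, self.clock() + self.ttl_seconds)
        return body, "origin"


origin_files = {"/logo.png": b"PNG bytes"}
cache = EdgeCache(origin_files.__getitem__, ttl_seconds=900)
print(cache.get("/logo.png")[1])  # first request
print(cache.get("/logo.png")[1])  # repeat request within the lifespan
```

The clock parameter is only there so the expiry behaviour can be tried out without waiting for real time to pass.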


CDN Cache control and Invalidations

A CDN stores a copy of your objects in a global network of caching servers. Upon request your objects will be served from these caches. Objects that are not requested for a period of a year will be cleaned out of the caching servers, but if there is a continuous demand for certain objects then they could remain in the cache indefinitely.

The problem with this is that new versions of these objects on the Origin Server do not automatically replace the old versions in the cache. For that reason we advise assigning a maximum lifespan to each object. This can be done with Cache-Control headers.

Cache-Control HTTP Header

Cache-Control headers are HTTP headers that tell the cache how long an object should remain cached for. These headers are normally set by the Origin server, by Apache or Nginx in the case of a hosting server or by values in the meta-data for an object in the case of an object store.

RFC 2616 (section 14.9) has the full details on the Cache-Control header. Below we list the values that are most important in determining the lifespan of an object.

max-age=$seconds

The max-age value is the most important value for the cache-control header to determine the maximum lifespan of an object in the cache. It will always overrule any directives set by an Expires header for example (see RFC 2616 section 14.9.3).
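
As an illustration of that precedence, the sketch below computes the lifespan a cache would derive from a set of response headers: max-age wins when present, and Expires (relative to Date) is only a fallback. The function name freshness_lifetime is hypothetical, header names are assumed to be spelled exactly as shown, and real caches handle more cases (s-maxage, heuristics, the Age header) than this does.

```python
import re
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers):
    """Return the maximum lifespan in seconds, or None if none is given.

    max-age takes precedence; Expires is only used as a fallback.
    """
    cache_control = headers.get("Cache-Control", "")
    match = re.search(r"\bmax-age=(\d+)", cache_control, re.IGNORECASE)
    if match:
        return int(match.group(1))
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - date).total_seconds()))
    return None


response_headers = {
    "Date": "Tue, 29 Jul 2014 12:00:00 GMT",
    "Expires": "Tue, 29 Jul 2014 13:00:00 GMT",
    "Cache-Control": "max-age=900",
}
print(freshness_lifetime(response_headers))
```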

The assigned variable is the age in seconds, so when an object should remain in the cache for a maximum of 15 minutes, the header should read: Cache-Control: max-age=900. We recommend maximum lifespans between one minute and one hour depending on the situation.

must-revalidate

We also recommend adding the must-revalidate flag to the Cache-Control header. This flag tells the cache to strictly follow the maximum lifespan you give an object: once the object has expired, the cache must check with the Origin Server before serving it again. We like to include this because caches sometimes take liberties with the maximum lifespans of objects.

public

This flag marks an object as cacheable by any cache, including shared caches such as a CDN, even if it normally would not be cached.


Conclusion

When we want an object to be cached for a maximum of 15 minutes, the best header to set is:

Cache-Control: public, max-age=900, must-revalidate
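
If the lifespan differs per type of object, the header value can be generated rather than typed by hand. A small sketch, assuming lifespans are given in minutes; the function name cache_control and its parameters are hypothetical:

```python
def cache_control(max_age_minutes, public=True, must_revalidate=True):
    """Build a Cache-Control header value from a lifespan in minutes."""
    directives = []
    if public:
        directives.append("public")
    # max-age is expressed in seconds.
    directives.append(f"max-age={int(max_age_minutes * 60)}")
    if must_revalidate:
        directives.append("must-revalidate")
    return ", ".join(directives)


print("Cache-Control: " + cache_control(15))
```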


Configuring the Cache-Control headers

Below are examples of where and how to set the Cache-Control headers. For web servers we show an example for Apache; please read your web server's documentation for implementation details. For the object store we show how to set the headers through a command line API call.

Apache2 web server

In order to be able to set the cache-control header, the mod_headers module must be installed and loaded into the Apache web server. Together with the FilesMatch directive, you can control the cache control header for specific file extensions. 

The example below is valid in either a .htaccess file or a VirtualHost section.

<FilesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif|js|css|swf)$">
  Header set Cache-Control "public, max-age=900, must-revalidate"
</FilesMatch>
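
To get a feel for which files the FilesMatch pattern above selects, the same regular expression can be tried out in Python. This is only a quick check, assuming Python's re module treats this simple pattern the same way Apache does; the function name gets_cache_header is hypothetical. Note that the pattern is case-sensitive, so a file named LOGO.PNG is not matched.

```python
import re

# Same pattern as in the FilesMatch directive above.
CACHEABLE = re.compile(r"\.(ico|pdf|flv|jpg|jpeg|png|gif|js|css|swf)$")


def gets_cache_header(filename):
    """Return True if the file name matches the FilesMatch pattern."""
    return CACHEABLE.search(filename) is not None


for name in ["logo.png", "style.css", "index.html", "LOGO.PNG"]:
    print(name, gets_cache_header(name))
```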


The ObjectStore (API)

In order to update HTTP headers via the API, a POST request must be sent to the full path of the object you wish to update. A POST request replaces the current headers on the object, so it is imperative to resend important headers. One important header to resend is Content-Type: it tells a client such as a browser what type the object is (image/jpeg, video/mp4, text/plain, etc.) and how to display it. In the example below we update a JPEG image from the command line using cURL:

curl -X POST -H "X-Auth-Token: $keystone_token"  \
-H "Cache-Control: public, max-age=900, must-revalidate" \
-H "Content-type: image/jpeg" \
https://$container.$projectid.objectstore.eu/photo-1.jpg
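
The same request can be prepared from a script. The sketch below uses only Python's standard library and builds the request without sending it; the function name build_header_update is hypothetical, and the URL and token are placeholders standing in for the $container, $projectid and $keystone_token values of the cURL example.

```python
import urllib.request


def build_header_update(object_url, token, content_type, cache_control):
    """Prepare (but do not send) the POST request that updates the headers."""
    return urllib.request.Request(
        object_url,
        method="POST",
        headers={
            "X-Auth-Token": token,
            "Cache-Control": cache_control,
            "Content-Type": content_type,  # resent so it is not lost
        },
    )


request = build_header_update(
    "https://container.projectid.objectstore.eu/photo-1.jpg",  # placeholder URL
    token="example-token",  # placeholder, not a real Keystone token
    content_type="image/jpeg",
    cache_control="public, max-age=900, must-revalidate",
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```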


