Daily Power Ups

Wednesday, September 30, 2015

Fewer Requests vs. Response Sizes

It is well known that, when it comes to requests, fewer on a page means better performance.  However, what if our attempt to follow this rule creates very large responses?  In that situation, is it still a good idea to stick to the rule, or should we split a request into multiple asynchronous parallel requests that return smaller responses?  That was the question that came up the other day as I was testing a site and noticed that one of our javascript libraries was creeping up in size even though it was minified and gzipped.  To answer it we need to keep in mind several other rules that deal with returning the smallest response we can.  In situations like this, where performance rules collide, which ones should we give priority to?  Should we ever break the rule of fewer requests?  To answer that, let's dive into the rules...

Fewer Requests

A simple rule - fewer requests = better performance.  This is a well-known performance rule, considered a staple in every performance engineer's checklist, and is usually listed as the #1 rule - https://developer.yahoo.com/performance/rules.html.  Why is this considered a "primary" rule?  Here are several of the performance costs associated with making multiple requests:
  • Limited # of parallel connections allowed from the browser
  • Cost of connection setup
  • If HTTPS, cost of SSL negotiation
  • Back-end load cost due to increased threads to handle a single page
Examining the above costs gives a clear picture of why this can be considered the #1 or "primary" rule of front-end performance tuning.  However, in order to answer the proposed question we need to look at the other rules too.
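
As a rough illustration of what following this rule looks like in practice, here is a minimal Node.js sketch (the file names are hypothetical) that concatenates a few small scripts into a single bundle at build time, so the page makes one request instead of three:

    var fs = require('fs');

    // Combine several small scripts into one file so the page issues a
    // single request instead of one per script.
    var bundle = ['jquery.js', 'common.js', 'page.js']
      .map(function (name) { return fs.readFileSync('js/' + name, 'utf8'); })
      .join('\n;\n');   // the stray semicolon guards against files missing a trailing terminator

    fs.writeFileSync('js/combined.js', bundle);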

Minimizing Response Sizes

I'd like to refer to this as a wrapper rule, as there are several individual rules that address it.  In detail they are:
  • Gzip components
  • Minify css and javascript
  • Optimize images 
  • The newer rule - keep components under 25k  
The spirit of this rule should be considered one of the top 5 rules; Gzip components is #4 on the list, https://developer.yahoo.com/performance/rules.html.  The benefits of this rule show up in several performance factors, some of which can be mitigated by other rules higher up the priority chain.  However, just because another rule can mitigate some of the cost this rule addresses does not mean we should skip it.  Some of the performance factors to consider for this rule are:
  • Content download time
  • Bandwidth
  • iPhone cache size restrictions - limited to 25k
  • Memory - evaluation of larger javascript files could lead to potential memory issues or unnecessary GC pauses
  • Parsing/Evaluation time
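
If you want to see how much the "smallest response" rules actually buy you on your own pages, browsers that implement Resource Timing Level 2 expose both the wire size and the decoded size of every response.  A minimal console sketch (no specific files assumed):

    // transferSize = bytes that came over the wire (gzipped);
    // decodedBodySize = what the browser actually has to parse after decompression.
    performance.getEntriesByType('resource').forEach(function (r) {
      if (r.decodedBodySize > 0) {
        var saved = 100 - (r.transferSize / r.decodedBodySize) * 100;
        console.log(r.name + ': ~' + Math.round(saved) + '% smaller on the wire');
      }
    });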

Sizing it up

Now that we understand the significance of each rule and its impact on performance, we can consider the question - should we ever break the "primary" rule in favor of smaller response sizes?  The general rule of thumb, and gut feeling, is to adhere to the "primary" rule and never break it.  However, in my experience I've found that it depends entirely on your application.

Consider the following use case:  

Your application consists of a lightweight HTML container that depends entirely on CSS and javascript to execute its functionality.  The javascript for your site consists of custom code and 3rd party libraries, some of which is needed throughout the site and some of which is very page specific.  If you're thinking I'm describing your site, that's because this is a very common use case for the modern site.  In this case would you create one large javascript file to adhere to the "primary" rule?  Chances are NO!  Instead you would create three distinct javascript files, loaded by three requests - libraries, common code and page-specific code - or something along those lines.  Why?  The answer is obvious: for this application we want to take advantage of long-term caching of the libraries, which rarely change; load and cache the common logic that all pages use and that changes at a more moderate rate; and keep the page-specific logic in a small, lightweight file that may change frequently.  In doing this we are attempting to minimize our performance cost through the effective use of caching.
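
For reference, the three-file split described above might look something like this in the page (the file names are hypothetical):

    <!-- 3rd party libraries: rarely change, so they can be cached for a long time -->
    <script src="js/libs.js"></script>
    <!-- common custom code shared by all pages: changes at a moderate rate -->
    <script src="js/common.js"></script>
    <!-- page-specific logic: small, lightweight, changes frequently -->
    <script src="js/checkout.js"></script>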

We've just shown that there are reasons to break the "primary" rule, but we haven't answered our question yet: "should we split a request into multiple asynchronous parallel requests that return smaller responses?"  Knowing that the answer depends on our application, we need to consider our architecture and user base.  Once we have a theory on what is best, we need to prove it by conducting tests.

My Use Case

As I mentioned in the opening, I started to ponder this question because of the cost I saw in downloading one of our javascript files as it continued to get larger.  This particular file is now ~130k after being minified and gzipped, so the only options left to reduce the response further are to either remove functionality or split it into multiple files/requests.  Before testing which option would be better, I decided to collect more information, starting with the pertinent data for the performance factors of the rules we are reviewing.

As this request is for a javascript file that I'm considering breaking up, I considered the following factors to be the vital ones:
  • Cost of connection 
  • Content Download Time
  • Evaluation Time
  • Memory - Optional (not reviewed in our case)

Cost of connection 

To capture this, open your Chrome developer tools and initiate the page load.  Then click on the request in question and review the timing object (seen below).
Resource network timing graph
In my application's case, this request breaks down as follows (a sketch for pulling the same numbers from the Resource Timing API follows this list):
  • Stalled - 20ms (This will vary and is out of our control for this request, but again when considering adding requests it must be considered)
  • DNS Lookup - 0ms (Our application uses DNS prefetch so this price was paid already)
  • Initial Connection - 0ms
  • SSL - 65ms (Our application is not using HTTPS today, however, it will be in the near future so we must consider this and test with it)
  • TTFB - 101ms (This speaks more to your network and connection; however, when considering adding requests it is important.)
  • Total Cost of Connection - 186ms
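
If you would rather pull these phases out programmatically than read them off the timing tab, the Resource Timing API exposes the same breakdown.  A minimal sketch, assuming the file is served from your own origin at /js/app.js (a hypothetical path; cross-origin resources need a Timing-Allow-Origin header for the detailed timings):

    var entry = performance.getEntriesByName(location.origin + '/js/app.js')[0];
    if (entry) {
      console.log('DNS Lookup        : ' + (entry.domainLookupEnd - entry.domainLookupStart).toFixed(1) + 'ms');
      console.log('Initial Connection: ' + (entry.connectEnd - entry.connectStart).toFixed(1) + 'ms');
      // secureConnectionStart is 0 when the request did not use HTTPS
      var ssl = entry.secureConnectionStart > 0 ? entry.connectEnd - entry.secureConnectionStart : 0;
      console.log('SSL               : ' + ssl.toFixed(1) + 'ms');
      console.log('TTFB              : ' + (entry.responseStart - entry.requestStart).toFixed(1) + 'ms');
      console.log('Content Download  : ' + (entry.responseEnd - entry.responseStart).toFixed(1) + 'ms');
    }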

Content Download Time

To capture this, return to your Chrome developer tools and initiate the page load.  Then click on the request in question and review the timing object.  You can also see this within the network tab.

  • CD time: 189ms
Now that we know the download time we can calculate our average time per TCP segment by assuming the standard MSS of 1440 bytes: Segment Time (ST) = CD time / # of segments, where # of segments = request size in bytes / 1440.  Why is this important?  It will help us predict the potential content download time of each file if we split our javascript file up (a short sketch of this arithmetic follows the list).
  • Using our example we get ST = 189 / (132kb / 1440b) = 189 / 92 = 2.05ms.  
    • NOTE:  These are estimates, for actuals you could use a packet sniffer like Wireshark and watch the communication/traffic between your application and the server 
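
Here is that arithmetic as a small sketch, using the numbers above (a ~132kb gzipped file today, split into two ~66kb halves):

    // Average time per TCP segment, assuming the standard MSS of 1440 bytes.
    function segmentTime(downloadMs, transferBytes) {
      var segments = Math.ceil(transferBytes / 1440);
      return downloadMs / segments;
    }

    var st = segmentTime(189, 132 * 1000);                  // ~2.05ms per segment
    var predictedHalf = st * Math.ceil(66 * 1000 / 1440);   // ~95ms to download one ~66kb half
    console.log(st.toFixed(2) + 'ms per segment, ~' + Math.round(predictedHalf) + 'ms per half');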

Evaluation Time

For a quick view of this, open Chrome developer tools and review the network panel, looking at the "parser" time.  You can also run a profile of the application with Chrome developer tools and view the particular javascript's evaluation for more detail on how the time is allocated.  (A rough way to measure this yourself is sketched after the numbers below.)


  • The above indicates an evaluation time of 281ms.
    • Note:  The uncompressed size (non-gzipped) is 407kb, which is what the parser evaluates.
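
If you want a rough number outside of the profiler, one option is to fetch the script body yourself and time how long it takes to evaluate.  A minimal sketch, assuming a same-origin file (the path /js/common.js is hypothetical); it will not match DevTools' parser timing exactly, but it gives a comparable ballpark:

    fetch('/js/common.js')
      .then(function (response) { return response.text(); })
      .then(function (source) {
        var start = performance.now();
        (0, eval)(source);   // indirect eval runs the code in global scope, much like a script tag
        console.log('evaluation took ~' + (performance.now() - start).toFixed(1) + 'ms');
      });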

Predicting Costs and Validation

Prediction Time
Now that we have the data we can predict the potential cost of splitting the file into two requests of roughly equal size (the arithmetic is collected in a short sketch after this list).
  • 1 Request = 470ms (CD time + evaluation time, seen above)
  • 2 Requests taking advantage of asynchronous requests
    • As these will be asynchronous requests we should expect that the total time spent retrieving and evaluating both scripts would be something like this:
      • Initial Cost + Longest CD time + evaluation time for both (remember javascript evaluation is single-threaded, so the two evaluations cannot overlap and both are added in).
      • Initial Cost: 186ms (this should not change)
      • CD Time: For our app we are looking at splitting the content in half, so our CD time would be predicted at: Segment Time (ST) * (New File Size / 1440); 2.05ms * (66000 / 1440) = 2.05 * ~46 ≈ 95ms.
      • Evaluation Time:  Considering we are splitting the file roughly in half, we might also halve the eval time - this is in no way a guarantee, just something to use for estimating...  That would give us a time of ~140ms.
      • Predicted total time: 186 + 95 + 140 = 421ms; yielding a savings of ~50ms.
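
Collecting those estimates in one place (the values are the numbers gathered above, not measurements of an actual split):

    var initialCost = 186;   // cost of connection, assumed unchanged for the parallel requests
    var longestCd   = 95;    // predicted content download time of one ~66kb half
    var evalBoth    = 140;   // rough guess: evaluation roughly halved per file, and both files still run
    var predictedTotal = initialCost + longestCd + evalBoth;   // 421ms vs. ~470ms today, a ~50ms savings
    console.log('predicted total: ' + predictedTotal + 'ms');
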
Testing our theory
To test this we can create a simple HTML page that contains just the request we are interested in, and test it against another HTML page where the file is split into two requests returning ~equally sized javascript files (a sketch of both pages appears after the results).  I did not test with HTTPS; remember, we are assuming that the initial cost is the same for each request to the specified domain.  Another caveat: I was using a webserver without gzip enabled, so my numbers will vary from the times above.  Here were the results of my test:

Test 1 - the control
  • Initial Connection Cost + CD Time (453ms) + parsing (31ms) = Total Time: 484ms
Test 2 - Split requests
  • Initial Connection Cost + Longest CD time + evaluation time for both = 293 + 29 + 31 = Total Time: 353ms
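
For reference, here is roughly what the two test pages looked like.  The file names are hypothetical and the split files are roughly equal halves of the original; the async attribute lets the browser fetch both halves in parallel (execution order is not guaranteed, which is fine for a timing test):

    <!-- Test 1 - the control: the single combined file -->
    <!DOCTYPE html>
    <html>
      <body>
        <script src="js/combined.js"></script>
      </body>
    </html>

    <!-- Test 2 - split requests: two halves fetched in parallel -->
    <!DOCTYPE html>
    <html>
      <body>
        <script src="js/part1.js" async></script>
        <script src="js/part2.js" async></script>
      </body>
    </html>
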
Results
Our prediction said we would see ~50ms of savings, which would not justify a change given that it takes at least 100ms for a person to notice.  However, our tests showed a savings of ~130ms, which would suggest that we make the change.  Before we go about doing this, we need to consider a few other things:
  • Is this repeatable?  What's the average?
    • Repeated tests showed the above to be close and that the average was closer to ~100ms.
  • How would this change affect our architecture? - remember the tuning needs to consider the application needs.
    • Changing this would break our model of loading common/core objects in a single file, loading them in two files instead.  While not ideal, this can be mitigated, as our particular file contains ~50% libraries and ~50% custom code, giving us a natural split point.  However, our architecture is also "mobile first", so we need to consider the limitations of a mobile device above all else, and mobile browsers are more limited in the number of simultaneous connections they can handle.  Finally, being mobile first we would also need to consider the browser-side caching limitations of the iPhone, which is restricted to 25k per object.
  • If we spend the time making the change will the user truly notice?
    • ~100ms is considered the threshold for a user to perceive a change - which this change would gain us - however, when viewed as part of the application as a whole, will the user actually notice?  According to further research it would actually take a 20% overall shift for a user to truly perceive the change - see "The 20% Rule" referenced by Denys Mishunov in his article "Why Performance Matters: The Perception Of Time".  For my application the home page takes an average of ~3.5s to load, so a 100ms gain yields only a ~2.9% improvement.  That's a long way from 20%, so chances are our users will not notice the improvement.
Armed with all of our data, we decided that this was not the best change for our application.  While the change does seem to improve performance, we are a mobile-first application and it may actually hurt our performance there.  At this time we elected not to make the change, but instead to focus on other issues.  In the future we may revisit this option and reverse the decision...

Conclusion

In conclusion, there are times when you may want to trade the performance gains of a reduced request count for multiple requests with smaller responses.  However, before you do, always consider your application's needs and validate the change with tests.

1 comment:

  1. Great post! Makes sense and as you say - always test, test, test!
