First, let's clarify that in this article I'm only speaking about a page's initial load/render performance. While I believe that each indicator has its own place in the world of performance, there are a few key indicators that we use at StubHub to track our pages' performance. We have two main indicators that we track against company KPI goals. The first is Time To Interactive (TTI) and the second is Full Page Time (FPT). I know what you're thinking: "But wait, you just said that Full Page Time is not a good indicator of performance." While this is generally true, the FPT indicator is a good reference for keeping 3rd party and non-critical functionality in check. Other indicators that we use, which are not company KPI goals, deal with Time To First Byte (TTFB) and the loading of ads.
You might be asking, "Why don't you look at several of the other indicators, given that you've stated that each indicator is important?" The reality is that we do use many of the other factors (page weight, request count, etc.); however, we use these to help us set our KPI goals and to tune our pages. I've found it especially useful to use these other indicators when giving the design teams a rough set of guidelines around page design and performance impact, which I might discuss in another article.
Our Testing System
Before we jump into how we use each of the key indicators I mentioned above, I want to spend a moment on how we measure and collect them. For our collection process we've set up a custom monitoring system that uses Webpagetest (http://www.webpagetest.org/), Jenkins (https://jenkins-ci.org/) and HARStorage (https://github.com/pavel-paulau/harstorage).
Jenkins
We use Jenkins as the glue that wires the whole system together. This is our controller that will execute the tests, upload the results to our collector and broadcast periodic test reports.
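To make the controller's first and last steps concrete, here is a minimal sketch of the two URLs such a job could request: one to queue a Webpagetest run and one to download the resulting HAR file for upload to the collector. The helper names, the `options` shape and the example values are hypothetical and are not taken from our Jenkins jobs; `runtest.php` and `export.php` are the standard Webpagetest endpoints.

```javascript
// Hypothetical helpers, for illustration only.

// Build the URL that queues a test run on a Webpagetest server.
function buildRunTestUrl(server, pageUrl, options) {
  var opts = options || {};
  var params = [
    'url=' + encodeURIComponent(pageUrl),
    'f=json' // ask for a JSON response instead of a redirect
  ];
  // The API key is needed on the public instance; private instances may not require one.
  if (opts.apiKey) { params.push('k=' + encodeURIComponent(opts.apiKey)); }
  // Test location (agent/browser) as configured on the server.
  if (opts.location) { params.push('location=' + encodeURIComponent(opts.location)); }
  // Number of runs to execute for this test.
  if (opts.runs) { params.push('runs=' + encodeURIComponent(opts.runs)); }
  return server.replace(/\/+$/, '') + '/runtest.php?' + params.join('&');
}

// Build the URL that exports a finished test as a HAR file.
function buildHarExportUrl(server, testId) {
  return server.replace(/\/+$/, '') + '/export.php?test=' + encodeURIComponent(testId);
}
```

A controller would request the first URL, poll until the test completes, then fetch the second URL and hand the HAR to the collector.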
Webpagetest
For Webpagetest we have a public API key that we use to execute tests on public nodes, and we have multiple private instances installed in several of our data centers across the country so that we can execute tests against both our test and production servers.
HARStorage
HARStorage is an open source Python/Pylons/MongoDB-based application for parsing HAR files and creating graphs and trends from that data. We've cloned this repo and modified the source for our own needs.
Our KPIs
Time To First Byte (TTFB)
This is a great indicator of overall environment health as well as internet routing health. Since our primary request is not cached on the edge today, we can use this to track how well we are responding to requests from across the network. If we see this number climbing, we know we have to review our back-end for potential stalls, as well as the network traversal paths for any potential issues within the backbones and/or the CDN nodes that are handing off to the origin. Ideally, if nothing in the primary request is dynamically generated, it should be cached and served from a CDN, which is something we are working towards; until then we can use this indicator to help alert on potential issues early. Also included here is the cost of DNS lookup and SSL negotiation, so again an alarm here allows us to make sure these vital components are behaving and that paths are tuned properly.
Use Case: We added a new server to our pool to handle specific subdomain traffic, and we started to see the TTFB along this path increase at the 90th percentile, causing us to investigate. What we found was that for some of the requests coming in over SSL, the negotiation was taking a long time. The cause was that the paths for these new subdomains had not been added to our CDN's DNS/SSL negotiation service, so any calls from the east coast were forced to traverse to our west coast server locations to do the necessary lookups/negotiations.
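Because TTFB bundles DNS lookup, connection setup, SSL negotiation and server wait time into one number, it helps to split it apart when an alarm fires. Below is a minimal sketch that does this from a Navigation Timing object such as `window.performance.timing`. The function name and the shape of the returned object are illustrative assumptions, not part of our monitoring code.

```javascript
// Hypothetical helper: split TTFB into its components (all values in ms).
// `timing` is expected to look like window.performance.timing.
function breakDownTtfb(timing) {
  // secureConnectionStart is 0 when the request did not go over SSL/TLS.
  var usedSsl = timing.secureConnectionStart > 0;
  return {
    // Time spent resolving the host name.
    dns: timing.domainLookupEnd - timing.domainLookupStart,
    // Time spent establishing the connection (includes any SSL handshake).
    connect: timing.connectEnd - timing.connectStart,
    // Portion of the connection time spent on SSL negotiation.
    ssl: usedSsl ? timing.connectEnd - timing.secureConnectionStart : 0,
    // Time between sending the request and receiving the first byte.
    wait: timing.responseStart - timing.requestStart,
    // Overall time to first byte, measured from the start of navigation.
    ttfb: timing.responseStart - timing.navigationStart
  };
}
```

In a browser this could be called as `breakDownTtfb(window.performance.timing)` once the response has started. A large `ssl` value next to a normal `wait` value points at negotiation rather than the back-end, which is the pattern described in the use case above.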
Time To Interactive (TTI)
This is one of the most important indicators, as it speaks to when the page is actually ready for the user to interact with and conduct the business the page is intended for. If you use Webpagetest as your testing platform you can get a sense of this by reviewing the XXX. However, to get an accurate number for our app we created a custom timing object that uses the Performance User Timing spec. This spec isn't supported by all browsers yet, so the code has to handle non-supporting browsers; for our synthetic tests, however, we use Chrome, which does support the spec, allowing us to capture, display, graph and alert on the data.
Our Custom object
SHCustomLoadTiming = window.SHCustomLoadTiming || {};
....
....
....
SHCustomLoadTiming.uxLoadTime;

SHCustomLoadTiming.setUxLoadTimeToNow = function() {
    if (!SHCustomLoadTiming.isSupported) {
        return;
    }
    var now = new Date().getTime();
    SHCustomLoadTiming.uxLoadTime = now - performance.timing.navigationStart;
    SHCustomLoadTiming.setPerformanceUserTimeMark('user-ready');
    SHCustomLoadTiming.setPerformanceUserTimeMeasure('user-ready', 'navigationStart');
    console.debug('SHCustomLoadTiming: New uxTimer: ' + SHCustomLoadTiming.uxLoadTime);
};

SHCustomLoadTiming.setPerformanceUserTimeMark = function(name) {
    if (!SHCustomLoadTiming.isSupported) {
        return;
    }
    //Clear previous marks for inbound mark
    var markName = 'mark-' + name;
    window.performance.clearMarks(markName);
    //Create the new mark
    window.performance.mark(markName);
};

SHCustomLoadTiming.setPerformanceUserTimeMeasure = function(name, startMark) {
    if (!SHCustomLoadTiming.isSupported) {
        return;
    }
    //Clear previous measures for inbound mark
    var markName = 'mark-' + name;
    var measureName = 'measure-' + name;
    window.performance.clearMeasures(measureName);
    //Create the new measure
    window.performance.measure(measureName, startMark, markName);
    //Track custom SH measurements
    if (SHCustomLoadTiming.measures.indexOf(name) < 0) {
        SHCustomLoadTiming.measures.push(name);
    }
};
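The listing above omits how `isSupported` gets set. As an illustration only, a feature check for the User Timing calls the object relies on could look like the sketch below. The function name `supportsUserTiming` is hypothetical, and this is an assumption about one reasonable approach, not the omitted code.

```javascript
// Hypothetical feature check, not the omitted original code.
// Returns true only if every API used by the timing object is present.
function supportsUserTiming(win) {
  var perf = win && win.performance;
  // performance.timing is needed for the navigationStart baseline.
  if (!perf || !perf.timing) {
    return false;
  }
  // User Timing methods called by the mark/measure helpers.
  var required = ['mark', 'measure', 'clearMarks', 'clearMeasures'];
  for (var i = 0; i < required.length; i++) {
    if (typeof perf[required[i]] !== 'function') {
      return false;
    }
  }
  return true;
}
```

In a browser it would be called as `supportsUserTiming(window)`. Passing the window object in, rather than reading the global directly, keeps the check easy to exercise with a stub.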
Calling the custom object
//The application publishes an event once it finishes rendering its critical path elements
this.publishEvent('app:render-ready');

//The core captures the event and triggers the custom object timer
delayLoader.once('app:render-ready', function() {
    SHCustomLoadTiming.setUxLoadTimeToNow();
    if (location.search.indexOf('commonDelay=false') === -1) {
        require(['common-delay'], function(CommonDelay) {
            console.log('common delay is loaded');
        });
    }
});
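Once the `user-ready` measure has been recorded, it can be read back through the standard `getEntriesByName` call. The sketch below shows one way to do that; the function name is hypothetical, and the measure name `measure-user-ready` follows from the `'measure-' + name` convention in the custom object above.

```javascript
// Hypothetical reader for the 'measure-user-ready' entry (result in ms).
// `perf` is expected to look like window.performance.
function readUserReadyTime(perf) {
  if (!perf || typeof perf.getEntriesByName !== 'function') {
    return null; // User Timing entries are not available
  }
  var entries = perf.getEntriesByName('measure-user-ready', 'measure');
  if (!entries.length) {
    return null; // the application has not reported render-ready yet
  }
  // Use the most recent entry and round to whole milliseconds.
  return Math.round(entries[entries.length - 1].duration);
}
```

In a browser this would be called as `readUserReadyTime(window.performance)` after the `app:render-ready` event has fired.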
While this is a slower-than-normal test, you can see from the WPT screenshots how TTI is roughly 2x that of FPT.
Given the importance of this indicator to a user's perception of a page's performance, we have set this as an engineering KPI. The nice thing about using this indicator within engineering is that the engineers don't feel helpless when trying to achieve the goals. This is mainly because we prioritize all critical path content to render first, which pushes all 3rd party content, typically outside the developers' ability to tune, to load after this indicator fires.
Full Page Time (FPT)
I mentioned above that FPT is not a "true" indicator of a page's performance. This is because we are generally loading a bunch of 3rd party non-critical items that affect the full page load time. However, this is still an important indicator to track, if only to give the business a sense of how the additional non-critical items are affecting the page in totality. At StubHub we track this as a KPI with a company goal set against it. By tracking this and comparing it to the TTI above, we can tell immediately if we have badly behaving 3rd party content on the site. Once we dive in, we can see if it's a badly behaving vendor or if our business teams have added too much page bloat in the form of managed content, tracking pixels, ads, etc.
Here is an example report showing, at the 90th percentile, the delta between TTI and FPT for several pages.
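The comparison behind such a report can be sketched in a few lines. The example below assumes each test run yields a `{ tti, fpt }` pair in milliseconds and uses the nearest-rank method for the percentile. The sample shape, the function names and the choice of percentile method are all assumptions for illustration, not a description of how HARStorage computes its trends.

```javascript
// Nearest-rank percentile: the smallest value such that at least p percent
// of the samples are less than or equal to it.
function percentile(values, p) {
  if (!values.length) {
    return null;
  }
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  var rank = Math.ceil(p * sorted.length / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical report row: TTI and FPT at the given percentile, plus the gap.
// `samples` must contain at least one run.
function ttiFptDelta(samples, p) {
  var tti = percentile(samples.map(function (s) { return s.tti; }), p);
  var fpt = percentile(samples.map(function (s) { return s.fpt; }), p);
  return { tti: tti, fpt: fpt, delta: fpt - tti };
}
```

Note that this subtracts two independently computed percentiles, which is not the same as taking the percentile of the per-run differences. Either can be useful, but the first matches a report that shows both indicators side by side.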