To prepare for our big jump in WordPress versions coming up soon [ from 3.1.4 to 3.6.1 ], we needed to review the changes that have occurred across the intervening major releases (3.2 through 3.6). Below is a summary of these changes for WordPress 3.2, broken down into “changes” and “new features”. Changes are any deprecated pieces of code or modifications to how a piece of code functions; new features are self-explanatory. For our purposes, the list of changes is the more important one, to ensure nothing breaks.
Version 3.2 added a new default theme, a distraction free writing mode and other admin improvements. The major change with 3.2 was an update to the PHP and MySQL requirements, from a minimum of PHP 4.3 & MySQL 4.1.2 to a minimum of PHP 5.2.4 & MySQL 5.0.15. We meet both of these minimum requirements currently so it is not a concern for us. Additionally WordPress dropped support for IE6 in this version. These changes plus improvements to the codebase resulted in better performance and faster page loads.
Below is a list of relevant changes and new features introduced in WordPress 3.2 for developers. The most useful new features include a WP_Meta_Query class similar to WP_Query, a new is_multi_author() template tag, and hooks into wp-admin/update-core.php.
- IE6 no longer supported; start of the end-of-life period for IE7; uses Browse Happy to prompt browser upgrades
- Show the sticky posts checkbox (“Stick this post to the front page”) only when the author has the ‘edit_others_posts’ capability
- Rename duplicate ‘delete_post’ and ‘deleted_post’ actions to ‘before_delete_post’ and ‘after_delete_post’
- Themes sub-menu under Appearance [ hide for certain users? ]
- new is_multi_author() template tag
- Allow plugins to hook into wp-admin/update-core.php
- Allow custom author elements such as email
- Add option_page_capability_$option_page filter
- Allow taxonomies to be queried by $_GET parameters on non-taxonomy URLs
- Add a per-post-type nav menu items filter for plugin control
- Add .ics / text/calendar to the whitelist of allowed file types
- Add cache_domain argument to get_terms() to allow caching to a unique set of cache buckets; useful when taxonomy queries have been modified via filters and need their own cache space
- Allow get_pages() to support multiple post statuses
- Allow WP_Query ‘post_status’ parameter to accept an array, as well as a singular value and comma separated list
- Introduce WP_Meta_Query and relation support
Version 3.2.1 was also reviewed but only contained a bug fix and is not summarized here.
It’s official — the next WordPress upgrade at BU has (finally) begun!
We are using the latest stable release — 3.6.1 — as there is not yet a release candidate for 3.7. It’s worth mentioning that the core team has decided to follow in the footsteps of many other open source projects and shorten their development cycles. According to their roadmap, 3.7 will be out some time in October, followed by 3.8 in December.
BU devs and designers — you can get yourself set up with a shiny 3.6.1 sandbox today:
Comments are now enabled on this blog. This is the requisite test post :-)
Problem: Unrealistic Load Tests
Solution: Load URLs using Real Browsers
I’ve created a load-urls.py script in the bu-toolkit repo which loads URLs from a file in multiple browsers at the same time. The script waits until the onload event fires on the page before considering it ready. It uses Selenium’s webdriver library to facilitate this work and supports browsers such as PhantomJS (the default), Firefox, and Chrome. The benefit of using PhantomJS is that it is headless by default, whereas the other browsers require special headless versions and/or virtual displays to work optimally.
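The core pattern can be sketched as a thread pool with a pluggable page loader. This is a minimal illustration, not the actual script; load_urls and make_loader are hypothetical names, and the Selenium usage shown in the comment assumes PhantomJS is installed:

```python
import threading

def load_urls(urls, num_threads, make_loader):
    """Spawn num_threads workers; each worker loads every URL in turn.

    make_loader is a factory returning a callable(url). In the real
    script this would wrap a Selenium webdriver, whose get() call
    blocks until the page's onload event has fired.
    """
    loaded = []
    lock = threading.Lock()

    def worker():
        loader = make_loader()          # one browser per thread
        for url in urls:
            loader(url)                 # blocks until the page is ready
            with lock:
                loaded.append(url)      # record completion thread-safely

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return loaded

# With Selenium installed, a loader factory might look like:
#   from selenium import webdriver
#   make_loader = lambda: webdriver.PhantomJS().get
```

Because the loader is injected, the concurrency harness can be exercised without any browser at all, which is also how the thread count maps directly to simultaneous in-flight page loads.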
Running some ad hoc URL lists against our TEST servers, I was able to make the following observations:
| Threads | Load Average on each TEST server |
It should be noted that I was not able to run 20 threads on the malahide server (the server used to run load-urls.py) for very long, as its own load average climbed up to 15. Running the script for too long at even 10 threads may cause the server to become unresponsive, as it did to malahide. So we need to use multiple servers set at 10 threads each to generate greater load for a short period of time; if the load needs to be maintained for a longer period, the task should be subdivided even further.
As we approach the Fall semester, traffic is starting to ramp up.
With the greater amount of data we can more confidently draw some conclusions. While the addition of CPUs to the application servers had a definite, measurable positive effect on system load, this did not translate into much, if any, of an impact on request response times, which are still creeping up.
A reminder: we had several site launches around that response time spike which resulted in a rather odd distribution of items in our memcached layer. It’s possible the additional CPU did have an effect on response time, but it is not apparent because it is merely mitigating the degradation caused by the uneven cache distribution.
A quick update on application response times. If you recall, we recently added additional CPU to our application servers:
- On 8/8, Scott assigned two additional CPUs to ist-wp-app-prod02
- On 8/11, Scott assigned two additional CPUs to ist-wp-app-prod01 and ist-wp-app-prod03
Here’s an updated chart with some additional data.
Whereas our last post detailing response times suffered because we didn’t have enough data after the adjustment to the environment, this post clearly highlights how not having enough data before the adjustment is making it difficult to draw conclusions.
It does look promising, though. Performance is slightly better overall, and does not appear to be spiking heavily from day to day.
I am going to pull additional logs from prior to the adjustment and create an updated chart later today.
My last post ended with a few questions. I realized as I was writing them that I had an easy way to answer one of them — “How much (if any) would response times improve by moving memcached to servers that were not overloaded serving web requests?”.
This graph is similar to this one from my previous post, but adds a fourth server to the memcached pool: ist-wp-app-prod04. When I ran these benchmarks, app-prod04 was not yet accepting web requests, and as such is representative of response times we might see if the memcached instance was not running on a busy app server.
Response times from app-prod04 are nearly 50% quicker than those from the other remote instances (app-prod02 and 03). Again, the benchmark was run from ist-wp-app-prod01, so response times from that instance are still (understandably) the quickest.
Seems like a good argument for moving memcached off of the app servers…
We used the memcached benchmark script to sample response times of common cache operations (get, set and delete) against different server configurations with data generated from actual WP requests.
We generated sample cache data using 20 different pages across four of our more popular sites. Most of these page requests were for administrative pages (/wp-admin/), which make much heavier use of memcached in order to cache data structures returned from frequent / expensive database queries.
| URL | Cache Items | Size in Bytes |
Pages without /wp-admin/ are front-end page requests. The majority of these have one or two cache items — this is due to the Batcache plugin, which stores fully generated HTML pages for front-end page requests by unauthenticated users.
We ran the benchmark script against two different environments:
- Systems Test / Development
For each environment we spun up isolated memcached instances to handle client requests from the benchmark script.
There were some notable variances between test environments.
# Systems Test / Development
ist-wp-app-syst01$ memcached -d -p 11212 -m 64 -c 1024
ist-wp-app-devl01$ memcached -d -p 11212 -m 64 -c 1024

# Production
ist-wp-app-prod01$ memcached -d -p 11214 -m 128 -c 1024
ist-wp-app-prod02$ memcached -d -p 11214 -m 128 -c 1024
ist-wp-app-prod03$ memcached -d -p 11214 -m 128 -c 1024
Where -p is the port to accept requests on, -c is the maximum amount of concurrent connections to accept, and -m is the amount of memory to allocate. Note that the sample data did not exceed 64M in size, and as such this variance did not make any difference — there were no cache evictions reported during test runs.
The benchmark scripts were run from ist-wp-app-syst01 for Systems Test / Development, and ist-wp-app-prod01 for production.
While we did run the prod benchmarks during a low-traffic period, system usage was still vastly different from SYST / DEVL due to the volume of web requests being handled on the boxes running memcached. Systems test and development served a combined 68,565 requests over the 24-hour period during which the benchmarks were run. In contrast, the production servers handled 4,957,761 requests during that same period.
15-minute load average across the 3 prod app servers averaged 2.1 during benchmark tests, with CPU usage hovering around 40%. This is compared to a 0.1 load average on ist-wp-app-devl01, with 5% CPU usage. (And even less on ist-wp-app-syst01).
Needless to say, we were not expecting the same results between environments.
The benchmark script does the following, starting from an empty cache:
- Set all cache items for a given page request
- Get all cache items for a given page request
- Delete all cache items for a given page request
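The three steps above can be sketched as a small timing harness. This is illustrative only: benchmark_page and its client interface are assumed names, not the actual script's API; the real script would use a memcached client with the same set/get/delete methods.

```python
import time

def benchmark_page(client, items):
    """Time set, get, and delete of all cache items for one page request.

    client needs memcached-style set/get/delete methods; items maps
    cache keys to values captured from a real WordPress request.
    Returns total elapsed seconds per operation, starting from an
    empty cache (sets happen first).
    """
    timings = {}
    for op in ("set", "get", "delete"):
        start = time.time()
        for key, value in items.items():
            if op == "set":
                client.set(key, value)
            elif op == "get":
                client.get(key)
            else:
                client.delete(key)
        timings[op] = time.time() - start
    return timings
```

Timing an operation over a page's whole item set, rather than per key, keeps the numbers comparable across pages with very different item counts.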
It carries out these operations sequentially in two modes — pooled, and once per server.
With this test approach in mind, we were looking for answers to the following:
- How long do set, get and delete operations typically take in our environments?
- How do response times vary for these operations when the memcached server is local to the client? On the same blade? On a different blade?
- How do response times vary when the system is under load?
Here are some graphs generated from the benchmark results. Click through to the attachment pages for some commentary.
Results pretty much confirmed what you would expect. In summary:
- Set is slower than get is slower than delete.
- Local requests are much faster than remote requests. For production, requests on the same blade (prod01 -> prod02) were marginally faster than requests across blades (prod01 -> prod03).
- System load greatly impacts response times for all memcached operations.
And some more questions…
- How much (if any) would response times improve by moving memcached to servers that were not overloaded serving web requests?
- How expensive are evictions? Default TTL for cache items is indefinite in the WordPress object cache plugin — should we consider expiring items at the cost of more frequent set operations?
More on that first point in a follow-up post, coming soon.
As part of the effort to put together better load tests, I’ve written a pair of simple shell scripts for performing some access log analysis. Use fetch.sh to obtain log files from the production servers, and analyze.sh <directory containing files> to perform the analysis. Note that even a single day of log files takes up a fair bit of space.
After running this script there will be new files in the directory showing wp-admin and admin-ajax requests ordered by frequency for the day.
Fetch scripts from github.
So far after looking at several days of results, some preliminary items of note are:
- Most wp-admin requests are to admin-ajax.php.
- Roughly half of those admin-ajax.php requests are POSTs.
- Roughly half of those POSTs to admin-ajax.php (and, surprisingly, some of the GETs) have no action in the URL that we can use to identify what’s being done. This is a significant amount of our admin traffic.
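Identifying which requests carry an identifiable action comes down to parsing the query string. A minimal sketch (ajax_action is a hypothetical helper, not part of the scripts):

```python
from urllib.parse import urlparse, parse_qs

def ajax_action(request_url):
    """Return the 'action' parameter from an admin-ajax.php request URL,
    or None when the URL carries no action. POSTs often send the action
    in the request body, which access logs do not record, so those come
    back as None even though WordPress did receive an action."""
    parsed = urlparse(request_url)
    if not parsed.path.endswith("/admin-ajax.php"):
        return None
    return parse_qs(parsed.query).get("action", [None])[0]
```

This is why a chunk of the POST traffic (and any GETs built without an action in the URL) is unidentifiable from the logs alone.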
I’m compiling a list of use cases for test planning here.
I’ve created a GitHub repo:
… for housing tools and utilities we create for our performance analysis. Right now it contains only one script, bu-response-percentiles.py, which I used to calculate the response time percentiles referenced in my last post.
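For context, a percentile over response times can be computed with the nearest-rank method — the smallest sample such that at least the given percentage of samples fall at or below it. This sketch is an assumption about the approach, not the actual contents of bu-response-percentiles.py:

```python
def percentile(values, pct):
    """Nearest-rank percentile: the smallest value such that at least
    pct percent of the samples are less than or equal to it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    # nearest rank is ceil(pct/100 * n); -(-a // b) is integer ceil division
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[int(rank) - 1]
```

On 100 evenly spread samples this gives the intuitive result: the 95th percentile of 1..100 is 95.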
The “scripts” directory is intended to contain executable, installable scripts, that is, scripts that we might want to install to /usr/local/bin on our workstations.
The “sample” directory is intended to contain executable scripts that we might want to use passively. For example, a script that executes periodically and samples data about memory or CPU usage of a particular process group.