Tickets for the Wellington 2017 edition of the New Zealand International Film Festival went on sale this morning at 10:00. As with 2014 and 2015, the online ticketing worked... rather poorly for the first hour or so after tickets went on sale, leading to various tweets calling for patience -- and an apology from the NZIFF Festival Director on Facebook. I had fairly high hopes at 10:00 this morning, after being told by other Festival regulars that 2016 had been better than previous years -- but they were quickly dashed. After trying for the first half hour and getting nowhere I gave up until about 11:15, and then eventually managed to buy the tickets I wanted gradually, mostly one at a time, over the next hour.

As I have said previously, ticketing is a hard problem. Given a popular event, and limited (good) seats, there will always be a rush for the (best) seats as soon as the sales start. The demand in the first day will always be hundreds of times higher than the demand two weeks later, and the demand in the first hour will be 75% of the demand in the first day. That is just part of the business, so what you need to do is plan for that to happen.

The way that NZIFF (and/or their providers) have set up their online ticketing appears, even four years in, to not properly plan for efficiently handling a large number of buyers all wanting to buy at once. Some of the obvious problems with their implementation include:

  • putting the tickets for about 500 (five hundred) events on sale at the exact same moment -- so instead of a moderate sized stampede for tickets to one event, there are many stampedes for many events, all competing for the same server/network resources.

  • only collecting information about the types of tickets required at ticket purchase time, rather than collecting it in the "wishlist" in advance.

  • only collecting details of the purchaser at ticket purchase time, rather than collecting them in advance (eg, "create account profile" as part of building the wishlist), requiring more round trips to the server, and storing more data in the database during the contentious ticket sale period.

  • relying on a "best available seat" algorithm that has no user control, and typically picks at best a mediocre seat, thus forcing many more users through the "choose my own seat" process which requires more intensive server interaction

  • not collecting money in advance (eg, selling "movie bucks" credits), which means that the period where the seat allocations are conditional, waiting on payment, is extended much longer. That both delays confirming which seats are taken and which are free for the next buyer to choose from, and requires more writes to the database

Less obviously, it appears as if there are some other technical problems:

  • Not designing to automatically "scale out" rapidly to more servers when the site is busy

  • Not pre-scaling to a large enough size, and pre-warming the servers (so they have everything they need in RAM) before opening up ticket sales

  • Breaking the web pages up into too many small requests and stages, increasing the client/server interaction (and thus both load on the server and points at which the process could go wrong) dramatically

  • Writing too much to disk during the processing

  • Reading too much from disk during the interactions

  • Not offloading enough to third party services (eg, CDNs)

and behind all of these is inadequate load testing to simulate the thundering herd of requests that come as an inevitable part of the ticket sales problem, leading to false confidence that "this year we will be okay", only to have those hopes crushed in the first 10 minutes.

So how do we make this ticket sales problem more manageable:

  • Stagger the event sales -- with 500+ events over 15+ days there is no good reason to put all of the events on sale at exactly the same time. It just makes the ticket sales problem two orders of magnitude worse than a single popular event. So break up the ticket releases into stages -- open up sales for the first few days of the festival at 10:00 on the first day, then open up sales for the next few days of the festival in the afternoon or the next day. Clearly having many many days with new tickets going on sale is impractical, but staggering the ticket sales opening over 2-5 days is fairly easily achieved, and instantly halves (or better) the "sales open" server load.

  • Collect every possible piece of information you can in advance of ticket sales, and write it to the database in advance. This would include all the name and contact details needed to complete the sale, a confirmation the user has read the terms and conditions, and details of how many tickets of which types the user wants. All of this can be part of the account profile and "wishlist". Ideally the only thing left for ticket sales time is seat allocation.

  • Preferably also collect the user's preferred seat, or a way to hint to the seat allocation policy where to pick. Many regular movie goers (and almost all of the early sales will be regulars) will know the venues like the back of their hand, and can probably name off the top of their head their favourite seat. Obviously you cannot guarantee the exact seat will still be available when they buy their ticket, but if your seat selection algorithm is choosing "seat nearest to the user desired one" rather than "arbitrary seat the average person might not hate", then there is a good chance the user will not have to interact with the "choose my seat" screen at all. (For about half the films I booked this morning pre-entering my preferred seat would have just worked to give me the perfect seat. But since I had no way to pre-enter it, I had to go through the "I want to choose a better seat than the automatic one" on every single movie session.)

  • Ideally, collect the user's money in advance. Many of the most eager purchasers will be literally spending hundreds of dollars, and going to dozens of sessions. Most of them would probably be willing to pre-purchase, say, a block of "10 tickets of type FOO" to be allocated to sessions later, if it sped up their ticket purchasing process. Having the money in advance both saves the users from typing in their credit card details over and over again, and also means the server can go directly from "book my session" to "session confirmed" with no waiting -- avoiding writing details to the database at any intermediate step. (This also potentially decreases the festival expenses on credit card transaction fees by an order of magnitude.)

  • Maintain an in-RAM cache of "hot" information, such as the seats which are available/sold/in the process of being sold for each active session. Use memcached or other similar products. Make the initial decisions about which seats to offer the user from those tables, only accessing the database to store a permanent reservation once the seats are found.

  • Done completely you end up with a process that is:

    • User ticks one or more sessions for which they want to finalise their ticket purchase

    • The website returns a page saying "use your credit to buy these seats for these sessions", pre-populated with the nearest seat to the ones they pre-indicated they wanted. It saves a temporary seat reservation to the database with a short timeout, and marks it in the RAM cache as "sale in progress". These writes to the database can be very short (and thus quick) because they are just a 4-tuple of (userid, eventid, seatid, expiry time).

    • User clicks "yes, complete sale", their single interaction if the seat they wanted (or a "close enough" one) is available.

    • The website marks the temporary seat reservations as final (by writing "never" in the expiry time), and writes the new credit balance to the database, and returns a page saying "you have these seats, and this credit left, tickets to follow via email".

    Occasionally a user might need to dive into the seat selection page to try to find a better choice, but for users in that critical first hour there is a pretty good chance that they will get something close to the seat they wanted. And the users will rapidly decide the algorithm is doing as well as is possible when they dive into the seat selection page and find all the ones nearer their preferred seat are gone already.

  • Organise the website so as much as possible is static content -- all images, styling (CSS), descriptions of films, etc, is cache-friendly static content. That both allows the browsers not to even ask for it again, and for any checks for whether it has changed to be met with a very quickly answered "all good, you have the latest version".
    Redirect all that static content to an external CDN to keep it away from the sales transaction process.

  • For data that has to be dynamically loaded (eg, seats available) send it in the most compact form possible, and unpack it on the client. CPU time on the client is effectively free in this process as there are many many client CPUs, and relatively few server resources. Try to offload as much work as possible to the browser CPUs, and make them rely on as little as possible coming from the central server.

  • By getting the sales process down to "are you sure?" / "yes", very few server interactions are required, so users get through the process quicker and go away (reducing load) happy. It also means that there is very little to write to the database, so the database contention is dramatically reduced. Done properly almost nothing has to be read from the database.

  • The quick turn around then makes it possible to do things like, eg, keep an HTTPS connection open from the browser to the load balancer to the back end webserver for the 15-30 seconds it takes to complete the sale, avoiding a bunch of network congestion and setup time. This also dramatically reduces the risk of the sales process failing at step 7 of 10, and the user having to start again (which means the load generated by all previous steps was wasted load on the server and means the user is frustrated). By taking the payment out of line from the seat allocation/finalisation process, the web browser only needs to interact with a single server, maximising the chances of keeping the connection "hot", ready for when the user eagerly clicks the "yes, perfect, I want those" button, which completes the transaction as quickly as possible.

  • The quick turn around would also encourage users to purchase multiple sessions at once, rather than resorting to purchasing one ticket at a time just to have a chance of anything working. And users purchasing, eg, 10 sessions at a time will get all the tickets they were after much quicker, then leave the site -- leaving more server resources available for all other users.

  • Host the server as close as possible to the actual users, so that the web connection set up time is as small as possible, and the data transfers to the user happen as fast as possible. Having connections stall for long periods due to packet loss and long TCP timeouts (scaled to expect long delays due to distance) just ties up server resources and makes the users frustrated.

  • Pre-start lots of additional capacity in advance of the "on sale now" time, and pre-warm it by running a bunch of test transactions through, so the servers are warmed up and ready to go. A day or two later you can manually (or automatically) scale back to a more realistic "sale days 2-20" capacity. With most "cloud" providers you will pay a few hundred dollars extra on the first few hours, or days, in exchange for many happy customers and thus many sales. The extra sales possible as a result may well pay for the extra "first day" hosting costs. And most "cloud" providers will allow you to return that extra capacity on the second day at no extra cost -- so it is a single day cost.
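
To make the "nearest seat, temporary hold, then confirm" flow described above a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the Session class, the hold and confirm method names, the rectangular seat grid, and the 120 second timeout are all hypothetical, not anything NZIFF or their providers actually use. It only models the in-RAM seat state; the short database write of (userid, eventid, seatid, expiry time) and the credit balance update are left out.

```python
from dataclasses import dataclass, field

NEVER = float("inf")  # expiry value meaning "this reservation is final"


@dataclass
class Session:
    """In-RAM seat state for one film session (hypothetical structure)."""
    rows: int
    cols: int
    # (row, col) -> (user_id, expiry_time); a missing key means the seat is free
    holds: dict = field(default_factory=dict)

    def is_free(self, seat, now):
        hold = self.holds.get(seat)
        # A seat whose temporary hold has expired counts as free again.
        return hold is None or hold[1] <= now

    def nearest_free(self, preferred, now):
        """Return the free seat closest to the preferred one, or None."""
        free = [(r, c) for r in range(self.rows) for c in range(self.cols)
                if self.is_free((r, c), now)]
        if not free:
            return None
        pr, pc = preferred
        # Squared distance; ties are broken by seat position so the
        # result is deterministic.
        return min(free, key=lambda s: ((s[0] - pr) ** 2 + (s[1] - pc) ** 2, s))

    def hold(self, user_id, preferred, now, timeout=120):
        """Step 2: temporarily reserve the nearest free seat."""
        seat = self.nearest_free(preferred, now)
        if seat is not None:
            self.holds[seat] = (user_id, now + timeout)
        return seat

    def confirm(self, user_id, seat, now):
        """Step 4: make an unexpired hold final by setting expiry to NEVER."""
        hold = self.holds.get(seat)
        if hold is None or hold[0] != user_id or hold[1] <= now:
            return False
        self.holds[seat] = (user_id, NEVER)
        return True


# Usage: two users both want seat (1, 1) in a tiny 3x3 venue.
session = Session(rows=3, cols=3)
alice_seat = session.hold("alice", (1, 1), now=0)  # gets (1, 1)
bob_seat = session.hold("bob", (1, 1), now=1)      # gets a neighbouring seat
session.confirm("alice", alice_seat, now=10)       # Alice's seat is now final
```

A real implementation would need the hold to be atomic across web servers (which is what a shared cache or the database write is for); this sketch is single-process and only shows the shape of the logic.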
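
The "all good, you have the latest version" answer for static content mentioned above is, in HTTP terms, a conditional request answered with 304 Not Modified. As a rough sketch of the server-side decision (the serve_static function, the truncated SHA-256 hash used as the ETag, and the one day max-age are my own assumptions, and in practice the web server or CDN does this for you):

```python
import hashlib
from typing import Optional


def serve_static(body: bytes, if_none_match: Optional[str]):
    """Return (status, headers, payload) for a static resource.

    `if_none_match` is the value of the request's If-None-Match header,
    or None if the browser did not send one.
    """
    # Derive a validator from the content, so it changes only when the
    # content does. Truncating the hash is an arbitrary choice here.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {
        "ETag": etag,
        # Let browsers and CDNs reuse the response for a day without asking.
        "Cache-Control": "public, max-age=86400",
    }
    if if_none_match == etag:
        # The browser already has this version: send headers only.
        return 304, headers, b""
    return 200, headers, body


# Usage: the first request gets the full body, the revalidation gets a 304.
status, headers, payload = serve_static(b"film description", None)
status_again, _, payload_again = serve_static(b"film description", headers["ETag"])
```

The point is that the second response carries no body at all, so even the requests that do reach the server during the rush cost almost nothing.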
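
As one illustration of "send it in the most compact form possible, and unpack it on the client", seat availability can be packed down to one bit per seat. The sketch below is an assumption about how that might look, not a description of any real ticketing system; the pack_seats and unpack_seats names are hypothetical, and in a real browser the unpacking side would be written in JavaScript rather than Python.

```python
import base64


def pack_seats(available):
    """Server side: pack a list of booleans (True = seat free) into base64."""
    bits = bytearray((len(available) + 7) // 8)
    for i, free in enumerate(available):
        if free:
            # Seat i lives in byte i // 8, bit i % 8 (least significant first).
            bits[i // 8] |= 1 << (i % 8)
    return base64.b64encode(bytes(bits)).decode("ascii")


def unpack_seats(payload, seat_count):
    """Client side: turn the base64 string back into a list of booleans."""
    bits = base64.b64decode(payload)
    return [bool((bits[i // 8] >> (i % 8)) & 1) for i in range(seat_count)]


# Usage: a 300 seat venue where every third seat is already sold.
venue = [i % 3 != 0 for i in range(300)]
payload = pack_seats(venue)  # 38 bytes of bitmap, sent as 52 base64 characters
restored = unpack_seats(payload, len(venue))
```

The client needs to know the seat count and the seat ordering, but both of those are static data that can be cached along with the venue layout.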

Implementing any of this would help. Implementing all of it would make a dramatic difference to the first day experience of the most enthusiastic customers. I for one would be extremely grateful even just to avoid having to type in my name and contact details several dozen times (between failed and successful one-ticket-at-a-time attempts), or to avoid having to type my credit card details in a couple of dozen times in a rush to try to "complete the sale while the site is still talking to me".