A Sirius Pitch:
As user demand scales for intelligent personal assistants (IPAs) such as Apple’s Siri, Google’s Google Now, and Microsoft’s Cortana, we are approaching the computational limits of current datacenter system architectures. It is an open question how future server architectures should evolve to support this emerging class of applications, and the lack of an open, representative workload is an obstacle our community faces in addressing this question.
In this tutorial, we present the design of Sirius, an open end-to-end IPA web-service application that accepts queries in the form of voice and images and responds in natural language. We then show how this workload can be used to investigate five points in the design space of future accelerator-based server architectures, spanning GPUs, manycore throughput coprocessors such as the Intel Xeon Phi, and FPGAs. To investigate future server designs for Sirius, we decompose Sirius into a suite of seven benchmarks (Sirius Suite) comprising its computationally intensive bottlenecks. We port Sirius Suite to each of the five accelerator platforms and use the performance and power trade-offs across these platforms to perform a total cost of ownership (TCO) analysis of various server design points.
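To make the TCO comparison concrete, the sketch below shows one common way such an analysis is structured: lifetime cost as capital cost plus energy cost (scaled by a datacenter PUE factor), normalized by sustained throughput. This is a minimal illustration of the general methodology, not the specific TCO model or measurements from the Sirius study; all numbers and parameter values are hypothetical placeholders.

```python
# Hedged sketch of an accelerator-server TCO comparison.
# All costs, wattages, and throughputs are illustrative assumptions,
# NOT measurements from Sirius or Sirius Suite.

def tco(server_cost, power_watts, lifetime_years=3.0,
        energy_cost_per_kwh=0.10, pue=1.5):
    """Total cost of ownership: capital cost plus lifetime energy cost.

    PUE (power usage effectiveness) inflates server power draw to
    account for datacenter overheads such as cooling.
    """
    hours = lifetime_years * 365 * 24
    energy_kwh = power_watts / 1000.0 * hours * pue
    return server_cost + energy_kwh * energy_cost_per_kwh

def tco_per_qps(server_cost, power_watts, queries_per_sec, **kw):
    """Cost-efficiency metric: dollars of TCO per query/second sustained."""
    return tco(server_cost, power_watts, **kw) / queries_per_sec

# Two hypothetical design points: a CPU baseline vs. a GPU-accelerated node.
# A costlier, hungrier accelerator node can still win on TCO per throughput
# if its speedup on the workload's bottlenecks is large enough.
baseline = tco_per_qps(server_cost=2000, power_watts=250, queries_per_sec=10)
gpu_node = tco_per_qps(server_cost=4500, power_watts=550, queries_per_sec=60)
print(f"baseline: ${baseline:.0f}/QPS, GPU node: ${gpu_node:.0f}/QPS")
```

The key design choice this captures is that raw performance or raw power alone is misleading: an accelerator that doubles power draw but delivers a several-fold speedup on the dominant kernels can lower the cost per unit of service delivered.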