This repository contains the source code for the SOSP '24 paper "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving".
Please note that the arXiv version is not up to date with our SOSP submission. We will update the arXiv paper once the camera-ready version is finalized.
Apparate is implemented in Python. We have tested Apparate on Ubuntu 22.04 with Python 3.8.13.
Detailed instructions on how to reproduce the main results from our SOSP paper are in EXPERIMENTS.md.
@article{dai2023apparate,
title={Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving},
author={Dai, Yinwei and Pan, Rui and Iyer, Anand and Li, Kai and Netravali, Ravi},
journal={arXiv preprint arXiv:2312.05385},
year={2023}
}