YT – Distributed Data Processing Revisited
YT is a new platform for distributed data processing.
Over the course of three years we planned, developed and implemented YT, a new platform for storing and processing large volumes of data. It was created as an alternative to the MapReduce-like system that Yandex has been using since 2008, and our goal was to improve on that system's efficiency, accessibility and scalability. The task was complicated by the huge amount of legacy client code with which we had to maintain compatibility, as well as by the existence of widely accepted open alternatives (e.g. the Hadoop platform). Because YT was conceived on the principle of being “bigger than MapReduce”, a set of reusable components stands out in its design: a subsystem for distributed consensus and state replication, a metadata tree, blob storage, and others. In this talk, I will give a brief overview of the new system’s architecture, describe several key components, and share the experience we gained during development and implementation. In conclusion, I will outline the priority directions for YT’s further development.