---
layout: markdown_page
title: "Database Sharding Working Group"
---

## On this page
{:.no_toc}

- TOC
{:toc}

## Attributes

| Property        | Value           |
|-----------------|-----------------|
| Date Created    | February 11, 2020 |
| Target End Date | August 11, 2020 |
| Slack           | [#wg_database-sharding](https://gitlab.slack.com/archives/CTNSZFHEZ) (only accessible from within the company) |
| Google Doc      | [Database Sharding Working Group Agenda](https://docs.google.com/document/d/1_sI-P2cLYPHlzDiJezI0YZHjWAC4BSKJ8aL0cNduDlo/edit#) (only accessible from within the company) |
| Issue Board     | TBD             |

## Business Goal

Database sharding and partitioning will improve availability, scalability and performance. Sharding will also give us a path forward for data isolation. As we continue to investigate implementations and technologies, we will test our hypotheses and add more detail about improvements in the areas below, listed in priority order.

1. Availability - the database will no longer be a single point of failure as it is today. Sharding will allow us to spread data across multiple servers and better isolate database outages
1. Scalability - sharding will allow us to horizontally scale at the database tier
1. Performance - partitioning will provide performance enhancements in several identified areas such as search and audit log tables

## Development Plan

The rollout of PostgreSQL 11 to GitLab.com is being done in parallel with the research and development for partitioning and, subsequently, sharding.

Our approach is to first deliver an MVC implementation of partitioning and use the lessons learned to better inform our sharding approach. In PostgreSQL, sharding is built on top of partitioning. By starting with partitioning we can set aside the infrastructure concerns and focus on the implementation details and potential complications that we may encounter. Once we have a working partitioning implementation we can advance to a sharding solution.

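
To make the relationship between partitioning and sharding more concrete, the following is a minimal Python sketch of how a row could be routed under range, list and hash partitioning, and how a hash result could then be mapped to a server. It is an illustration only: it is not the working group's implementation and not how PostgreSQL routes rows internally. The bounds, partition names, shard host names and the use of `namespace_id` as a key are all hypothetical.

```python
import bisect
import zlib
from datetime import date

# Hypothetical range bounds. Partition 0 holds dates before the first bound;
# partition i holds dates in [RANGE_BOUNDS[i-1], RANGE_BOUNDS[i]), that is,
# lower bound inclusive and upper bound exclusive.
RANGE_BOUNDS = [date(2020, 1, 1), date(2020, 4, 1), date(2020, 7, 1)]

# Hypothetical list partitioning: explicit key values mapped to partitions.
LIST_PARTITIONS = {
    "create": "p_writes",
    "update": "p_writes",
    "destroy": "p_deletes",
}

# Hypothetical shard hosts, used only to illustrate the final mapping step.
SHARD_HOSTS = ["db-shard-0.example", "db-shard-1.example"]


def range_partition(created_on: date) -> int:
    """Return the index of the range partition that would hold this date."""
    return bisect.bisect_right(RANGE_BOUNDS, created_on)


def list_partition(action: str) -> str:
    """Return the partition for an exact key value, or a default partition."""
    return LIST_PARTITIONS.get(action, "p_default")


def hash_partition(key: int, modulus: int) -> int:
    """Return a partition index in [0, modulus) derived from a hash of the key.

    CRC32 is a stand-in chosen for determinism; PostgreSQL uses its own
    hash functions for hash partitioning.
    """
    return zlib.crc32(str(key).encode("utf-8")) % modulus


def shard_for(namespace_id: int) -> str:
    """Map a (hypothetical) namespace_id to a shard host via its hash partition."""
    return SHARD_HOSTS[hash_partition(namespace_id, len(SHARD_HOSTS))]


if __name__ == "__main__":
    print(range_partition(date(2020, 2, 11)))  # 1
    print(list_partition("destroy"))           # p_deletes
    print(shard_for(42) in SHARD_HOSTS)        # True
```

The first three functions only decide which partition a row belongs to, which is the part that can be built and measured on a single server. `shard_for` adds the one extra step that sharding needs, assigning partitions to servers, which is why the partitioning work can inform the sharding design.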
### Development Plan Caveats

- Data Isolation - our initial investigations into how to partition data are focused on [Range, List and Hash Partitioning](https://www.postgresql.org/docs/11/ddl-partitioning.html#DDL-PARTITIONING-OVERVIEW). We are identifying a [tenancy model](https://gitlab.com/gitlab-org/gitlab/-/issues/196224) in support of the partitioning models listed above; however, we are not currently exploring data isolation per tenant.
- CockroachDB - CockroachDB has been mentioned as a possible technology solution for our eventual sharding implementation. It is not currently considered a viable option for our needs. There are concerns about scale and support for features we use in PostgreSQL. More details and comments are in this issue: [Support CockroachDB as a backing store](https://gitlab.com/gitlab-org/gitlab/-/issues/24143)

## Exit Criteria

- Infra: [PostgreSQL 11 deployed on GitLab.com](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/106) - April
- Distribution: [Add support for PostgreSQL 11](https://gitlab.com/groups/gitlab-org/-/epics/2414) (13.0)
- Deploy MVC partition (Dependent on PG11 Deployment)
  - Define partition key (may be different than [tenancy model](https://gitlab.com/gitlab-org/gitlab/-/issues/196224) for MVC)
  - Identify MVC [candidate](https://gitlab.com/gitlab-org/gitlab/-/issues/201871) for partitioning implementation
  - Implement partitioning MVC
  - Document process to enable backend teams to implement their own partitioning solution going forward
  - Measure results
- Implement sharding strategy (Informed by partitioning implementation)
  - [Explore CitusDB as a sharding solution](https://gitlab.com/gitlab-org/gitlab/issues/207833)
  - Identify shard key (e.g.
    [Range, List, Hash](https://www.postgresql.org/docs/12/ddl-partitioning.html#DDL-PARTITIONING-OVERVIEW))
  - Implement and Demonstrate POCs
  - Gather feedback and metrics from POCs
  - Roll out sharding implementation

## Specific Lines of Enquiry

- [Upgrade to PostgreSQL 11 timeline](https://gitlab.com/groups/gitlab-org/-/epics/2184)
- [Infrastructure - Upgrade to PostgreSQL 11](https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/6505)
- Testing PostgreSQL 11 upgrade
- [Database Partitioning](https://gitlab.com/groups/gitlab-org/-/epics/2023)
- [Database Sharding](https://gitlab.com/groups/gitlab-org/-/epics/1854)

## Roles and Responsibilities

| Working Group Role                       | Person                          | Title                                    |
|------------------------------------------|---------------------------------|------------------------------------------|
| Executive Stakeholder                    | Christopher Lefelhocz           | Senior Director of Development           |
| Facilitator                              | Craig Gomes                     | Engineering Manager, Database            |
| DRI for Database Sharding                | Craig Gomes                     | Engineering Manager, Database            |
| Functional Lead                          | Nailia Iskhakova                | Software Engineer in Test                |
| Functional Lead                          | Josh Lambert                    | Senior Product Manager, Geo              |
| Functional Lead                          | Gerardo "Gerir" Lopez-Fernandez | Engineering Fellow, Infrastructure       |
| Functional Lead                          | Stan Hu                         | Engineering Fellow, Development          |
| Functional Lead                          | Andreas Brandl                  | Staff Backend Engineer, Database         |
| Member                                   | Chun Du                         | Director of Engineering, Enablement      |
| Member                                   | Pat Bair                        | Senior Backend Engineer, Database        |
| Member                                   | Joanna Shih                     | Quality Engineering Manager, Ops/CI/CD   |