---
layout: handbook-page-toc
title: "Performance and Scalability"
---

## On this page
{:.no_toc .hidden-md .hidden-lg}

- TOC
{:toc .hidden-md .hidden-lg}

The Quality Department focuses on measuring and improving the performance of GitLab, as well as on creating and validating reference architectures that self-managed customers can rely on as performant configurations.

## Reference Architectures

To ensure that self-managed customers have performant, reliable, and scalable on-premise configurations, the Quality Department is creating several reference architectures. Our goal is to provide tested and verified examples that customers can use to achieve good performance and to understand what changes are needed as their organizations scale.

| Users | Status | Link to more info |
|-------|------------------|-------------------|
| 2.5k | To Do (Q4) | [Issue link](https://gitlab.com/gitlab-org/quality/performance/issues/58) |
| 10k | Complete | [Documentation link](https://docs.gitlab.com/ee/administration/high_availability/README.html#reference-architecture) |
| 25k | In Progress (Q3) | [Issue link](https://gitlab.com/gitlab-org/quality/performance/issues/57) |
| 50k | In Progress (Q3) | [Issue link](https://gitlab.com/gitlab-org/quality/performance/issues/66) |

## Performance Toolkit

We have created the [GitLab Performance Toolkit](https://gitlab.com/gitlab-org/quality/performance), which measures the performance of various endpoints under load as well as web rendering performance using SiteSpeed. The Toolkit is used internally at GitLab, but it is also available for self-managed customers to set up and run in their own environments. If you have a self-managed instance and would like to use the Toolkit to test its performance, please see the documentation in the [Toolkit's README file](https://gitlab.com/gitlab-org/quality/performance/blob/master/README.md).

### Daily Testing Process

Once a day, the GitLab Performance Toolkit is run against the existing reference architecture using a recent or the latest release of GitLab. This allows us to catch and triage degradations early so that fixes can be implemented before a new release is created. If problems are found, issues are created for the degraded endpoints and are then prioritized during the weekly [Availability & Performance Grooming](../../#availability-and-performance-grooming) meeting.

### Testing Results

The latest results against our various testing environments are automatically posted to [a wiki page in the Performance project](https://gitlab.com/gitlab-org/quality/performance/wikis/Benchmarks/Latest).

In Q3, the Quality Department has a goal of automating the testing process so that each new monthly release is tested and compared to the release before it. Work on this project is ongoing and is prioritized after the creation of the 25k and 50k reference environments described above. You can track progress on this quarterly goal in [our OKR issue](https://gitlab.com/gitlab-com/www-gitlab-com/issues/4852).

### Expanding the Toolkit

The endpoint coverage of the load tests in our Toolkit is not yet comprehensive. We have reviewed our common endpoints with an eye towards spotting the most highly used ones as well as the slowest ones.
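As a rough illustration of that kind of review, the sketch below (not part of the Toolkit, and not the exact analysis we ran) reads a JSON-lines request log such as `production_json.log` and ranks endpoints by request count and mean duration. The field names used here (`controller`, `action`, `duration_s`) are assumptions about GitLab's structured logging format and may differ between versions.

```python
import json
from collections import defaultdict

def summarize_endpoints(log_path, top_n=10):
    """Aggregate request counts and durations per endpoint from a JSON-lines log."""
    stats = defaultdict(lambda: {"count": 0, "total_s": 0.0})
    with open(log_path) as log:
        for line in log:
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip any non-JSON lines
            # Field names assumed from GitLab's structured logging; adjust per version.
            endpoint = f"{entry.get('controller', '?')}#{entry.get('action', '?')}"
            duration = float(entry.get("duration_s") or 0.0)
            stats[endpoint]["count"] += 1
            stats[endpoint]["total_s"] += duration

    by_count = sorted(stats.items(), key=lambda kv: kv[1]["count"], reverse=True)
    by_mean = sorted(stats.items(),
                     key=lambda kv: kv[1]["total_s"] / kv[1]["count"], reverse=True)

    print("Most used endpoints:")
    for endpoint, s in by_count[:top_n]:
        print(f"  {endpoint}: {s['count']} requests")
    print("Slowest endpoints (mean duration):")
    for endpoint, s in by_mean[:top_n]:
        print(f"  {endpoint}: {s['total_s'] / s['count']:.2f}s over {s['count']} requests")

if __name__ == "__main__":
    summarize_endpoints("production_json.log")
```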
Issues have been created for our team to add these to the Toolkit, and we expect that adding some of them will surface degraded endpoints that we'll need to send through performance grooming as defined in the [Daily Testing Process](#daily-testing-process). Additionally, the analysis we performed was ad hoc, and we would like to define a process for conducting this review on a regular cadence, whether that is after every release, once a quarter, or some other timing.

Because GitLab is constantly expanding and evolving, we need to iterate on our coverage in tandem. We've created [an epic](https://gitlab.com/groups/gitlab-org/quality/-/epics/10) to track the initial expansion as well as the work of defining our recurring process for analyzing endpoints and verifying that our coverage is adequate.

## Performance Playbook

We have developed a playbook of initial steps for investigating performance issues that self-managed customers are experiencing or suspect they are experiencing. The first step is requesting logs. We use a tool called `fast-stats` in conjunction with the following log artifacts; a rough sketch of the kind of summary we look for follows the list. These logs should either be rotated, or be logs from a peak day after peak time.

- `production_json.log`
- `api_json.log`
- Gitaly logs: `/var/log/gitlab/gitaly/current`
- Sidekiq logs: `/var/log/gitlab/sidekiq/current`
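To show what "a peak day after peak time" means in practice, here is a minimal sketch (not `fast-stats` itself) that finds the busiest hour in a `production_json.log` and lists the slowest requests within it. The field names (`time`, `duration_s`, `path`) are assumptions about GitLab's JSON log format and may vary by version.

```python
import json
from collections import Counter, defaultdict

def peak_hour_report(log_path, top_n=10):
    """Find the busiest hour in a JSON-lines request log and list its slowest requests."""
    requests_per_hour = Counter()
    requests_by_hour = defaultdict(list)

    with open(log_path) as log:
        for line in log:
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip any non-JSON lines
            # Field names assumed from GitLab's structured logging; adjust per version.
            timestamp = entry.get("time", "")  # e.g. "2019-09-10T14:23:05.371Z"
            hour = timestamp[:13]              # truncate to "YYYY-MM-DDTHH"
            duration = float(entry.get("duration_s") or 0.0)
            requests_per_hour[hour] += 1
            requests_by_hour[hour].append((duration, entry.get("path", "?")))

    if not requests_per_hour:
        print("No parseable requests found.")
        return

    peak_hour, count = requests_per_hour.most_common(1)[0]
    print(f"Peak hour: {peak_hour} ({count} requests)")
    print("Slowest requests in that hour:")
    for duration, path in sorted(requests_by_hour[peak_hour], reverse=True)[:top_n]:
        print(f"  {duration:.2f}s  {path}")

if __name__ == "__main__":
    peak_hour_report("production_json.log")
```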