---
layout: handbook-page-toc
title: "Production Architecture"
---
Our GitLab.com core infrastructure is primarily hosted in Google Cloud Platform's (GCP) `us-east1` region (see [Regions and Zones](https://cloud.google.com/compute/docs/regions-zones/))—and we use GCP iconography in our diagrams to represent GCP resources. We do have dependencies on other cloud providers for separate functions. Some of the dependencies are legacy fragments from our migration from Azure, and others are deliberate to separate concerns in the event of cloud provider service disruption. We're currently working to implement a [Disaster Recovery](/handbook/engineering/infrastructure/library/disaster-recovery/) solution that redesigns our failure scenarios across multi-zone, multi-region, and multi-cloud architectures.
This document does not cover servers that are not integral to the public-facing operations of GitLab.com.
## On this page
{:.no_toc .hidden-md .hidden-lg}
- TOC
{:toc .hidden-md .hidden-lg}
## Other Related Pages
- [Application Architecture documentation](https://docs.gitlab.com/ee/development/architecture.html)
- [GitLab.com Settings](/gitlab-com/settings/)
- [Monitoring of GitLab.com](/handbook/engineering/monitoring/)
- [GitLab performance monitoring documentation](https://docs.gitlab.com/ee/administration/monitoring/performance/introduction.html)
- [Performance of the Application](/handbook/engineering/performance/)
- [Gemnasium Service Production Architecture](/handbook/engineering/dev-backend/production-architecture/gemnasium-service.html)
- [CI Service Architecture](ci-architecture.html)
- [dev.gitlab.org Architecture](supporting-architecture.html#dev-gitlab-org)
- [ops.gitlab.net Architecture](supporting-architecture.html#ops-gitlab-net)
- [version.gitlab.com Architecture](../../dev-backend/production-architecture/version-architecture.html)
## Current Architecture
{: #infra-current-archi-diagram}
[Source](https://www.draw.io/?state=%7B%22ids%22:%5B%221JVdcF29Nb3fYTZ57YL7xIbFlWm6fW9lC%22%5D,%22action%22:%22open%22,%22userId%22:%22109983779601367693333%22%7D#G1JVdcF29Nb3fYTZ57YL7xIbFlWm6fW9lC)
### Service Architecture
[Source](https://www.draw.io/?state=%7B%22ids%22:%5B%221MopYWzl0XDDUmEibEpTOAj2Jhto3dULp%22%5D,%22action%22:%22open%22,%22userId%22:%22109983779601367693333%22%7D#G1MopYWzl0XDDUmEibEpTOAj2Jhto3dULp)
### Database Architecture
[Source](https://docs.google.com/drawings/d/1BWb1Q-hJzCZs8krvYwi5V9F_hJe-4CJdtIORfVGWJLo/edit), GitLab internal use only
### Storage Architecture
[Source](https://docs.google.com/drawings/d/1ObQRiSf1n6xLTTQdwt8OETP77EIrQzjl4-BhdZ8zp3U/edit), GitLab internal use only
### Monitoring Architecture
[Source](https://docs.google.com/drawings/d/1ffCoExVFbSOLu-aOVbM1I0FQ9EmU-vGDtGWQHM9-uj0/edit), GitLab internal use only
### Network Architecture
[Source](https://drive.google.com/file/d/19-IMmcJHVUz_bWOXU7_1NoYOdQJEZ3lM/view?usp=sharing), GitLab internal use only
Our network infrastructure consists of a network for each class of server, as
defined in the Current Architecture diagram. Each network applies a similar
firewall ruleset.
We currently peer our ops network with the other networks. Most of our
monitoring infrastructure lives inside this network, and we allow InfluxDB and
Prometheus data to flow across the peering in order to populate our metrics
systems.
For alert management, we peer all of our networks together so that we have a
cluster of Alertmanagers, ensuring alerts are delivered even if an individual
environment fails.
No application or customer data flows through these network peers.
### DNS & WAF
We host our DNS with Cloudflare (gitlab.com, gitlab.net) and Route 53 (gitlab.io and others).
Route53:
[Source](https://drive.google.com/file/d/1NPr1NI-pU1UKr4zJTyPwKJoFoLIL2Yda/view?usp=sharing), GitLab internal use only
Cloudflare:
- GitLab.com zone
- GitLab.com
- GitLab.com SSH
- GitLab.com AltSSH
[Source](https://gitlab.com/gitlab-com/gl-infra/readiness/-/blob/6f92124563835415e5c6e59f40b32e7307d3fb67/cloudflare/README.md#with-cloudflare)
### Chef Architecture
[Source](https://drive.google.com/file/d/1T2EXku49xlZ5cctatG_ygnCTfj3vGsZn/view?usp=sharing), GitLab internal use only
## Host Naming Standards
### Hostnames
A hostname shall be constructed from the service offered by that node, followed by a dash and a two-digit incrementing number.
e.g.: `sidekiq-NN`, `git-NN`, `web-NN`
Service-specific identifiers, when they connote a difference in build or function, are identified as `-specific` and precede the two-digit number.
e.g.: `sidekiq-realtime-01`
### Service Tiers
Following the hostname shall be the service tier that the node belongs in:
- `sv` for Service
- `lb` for Load Balancer
- `db` for Database Nodes
- `inf` for Infrastructure Nodes
### Environments
Following the service tier shall be the environment:
- `gprd` for Production
- `gstg` for Staging
- `pre` for PreProd
- `ops` for Operations
- `dev` for Development
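Putting the three parts together, a full node name is the hostname (service plus two-digit number), followed by the service tier, followed by the environment. The following sketch illustrates the convention; the `build_node_name` helper and its validation sets are assumptions for illustration, not an official tool:

```python
from typing import Optional

# Service tiers and environments as defined in this section.
TIERS = {"sv", "lb", "db", "inf"}
ENVIRONMENTS = {"gprd", "gstg", "pre", "ops", "dev"}


def build_node_name(service: str, number: int, tier: str, env: str,
                    specific: Optional[str] = None) -> str:
    """Build a node name such as 'web-01-sv-gprd'.

    A service-specific identifier, when given, precedes the two-digit
    number, e.g. 'sidekiq-realtime-01-sv-gprd'.
    """
    if tier not in TIERS:
        raise ValueError(f"unknown service tier: {tier}")
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env}")
    host = f"{service}-{specific}" if specific else service
    return f"{host}-{number:02d}-{tier}-{env}"
```

For example, `build_node_name("web", 1, "sv", "gprd")` yields `web-01-sv-gprd`.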
### TLD Zones
When it comes to DNS names, all services providing GitLab as a service shall be in the `gitlab.com` domain; ancillary services in support of GitLab (e.g. Chef, ChatOps, VPN, logging, monitoring) shall be in the `gitlab.net` domain.
## Internal Networking Scheme
We make heavy use of VPCs. You can see how we configure them for each of our environments and servers in our [terraform repo](https://gitlab.com/gitlab-com/gitlab-com-infrastructure).
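As a rough illustration of the kind of per-environment VPC definition kept in that repo (the resource names, CIDR range, and subnet layout here are hypothetical, not our actual configuration):

```hcl
# Hypothetical per-environment VPC with one subnet; names are illustrative only.
resource "google_compute_network" "gprd" {
  name                    = "gprd"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "gprd_web" {
  name          = "gprd-web"
  network       = google_compute_network.gprd.self_link
  ip_cidr_range = "10.0.0.0/24"
  region        = "us-east1"
}
```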
### Remote Access
Access is granted only to those who need access to production. At this point
in time we utilize bastion hosts. Instructions for requesting and using the
bastion hosts can be found in the `*-bastions.md` files in our
[runbooks](https://gitlab.com/gitlab-com/runbooks/-/tree/master/docs/bastions).
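Once access is granted, reaching an internal node typically means jumping through a bastion. A minimal `~/.ssh/config` sketch of that pattern (the hostnames below are hypothetical; the real ones are in the runbooks):

```
# Hypothetical bastion jump configuration; consult the runbooks for real hosts.
Host bastion.gprd.example-gitlab.net
  User yourname

Host *.gprd.internal
  User      yourname
  ProxyJump bastion.gprd.example-gitlab.net
```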
## Secrets Management
GitLab utilizes two different secret management approaches: GKMS for machines inside of Google Cloud Platform (GCP), and Chef encrypted data bags for all other host secrets.
### GKMS Secrets
Secrets are divided up based upon the Chef role that requires them
(e.g. load balancers, Sidekiq, storage) and are arranged in JSON files.
The JSON files are encrypted and stored in Google Cloud Storage (GCS), with access
restricted to the environment consuming the keys (e.g. production
servers only have access to the production GCS storage bucket).
The JSON files are encrypted with keys managed by Google Cloud KMS (GKMS).
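For illustration, a role-scoped secrets file might look like the following before encryption; the attribute names and structure here are hypothetical, not our actual schema:

```json
{
  "omnibus-gitlab": {
    "gitlab_rails": {
      "db_password": "REDACTED",
      "redis_password": "REDACTED"
    }
  }
}
```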
#### Node Secret Execution
When a node performs a Chef run, it pulls the encrypted JSON file out of GCS and
requests, as the node, that the GKMS system decrypt the object with the
corresponding key. Since the node has permission, the JSON file is decrypted and
read into the memory of the current Chef process, making it available for Chef
parsing, where the secrets are applied to templates and scripts. Keys are
auto-rotated every 90 days.
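The fetch-and-decrypt step can be sketched with the standard `gsutil` and `gcloud kms` tooling; the bucket, keyring, and key names below are hypothetical:

```shell
# Fetch the encrypted role secrets from the environment's bucket (names hypothetical).
gsutil cp gs://gprd-secrets/role-secrets.json.enc /tmp/role-secrets.json.enc

# Ask Cloud KMS to decrypt it; this only succeeds if the node's service
# account has decrypt permission on the key.
gcloud kms decrypt \
  --location=global \
  --keyring=gitlab-secrets \
  --key=gprd \
  --ciphertext-file=/tmp/role-secrets.json.enc \
  --plaintext-file=/tmp/role-secrets.json
```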
### Chef Encrypted Data Bags
Secrets are again divided up based upon the Chef role that requires them
and are arranged in JSON-structured files. These files are then encrypted and signed with
the individual Chef administrator keys and the keys of the client nodes that need access.
#### Node Secret Execution
During a Chef run, the client node requests the encrypted data bag from the Chef
server, uses its own private key to decrypt the contents, and then applies them
to the configuration templates and scripts. Keys are manually rotated roughly every
90 days or whenever we make a change to the Chef administrators, whichever comes first.
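Administrators can inspect an item with the standard `knife` tooling. A hedged sketch, assuming chef-vault-style per-client encryption as described above (the bag and item names are hypothetical):

```shell
# Hypothetical bag/item names. With chef-vault-style encryption, each
# authorized administrator and client key can unwrap the data bag item.
knife vault show credentials redis
```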
## Azure
Azure is where our lingering infrastructure lives. The remaining servers exist
there for a variety of reasons:
* Testing/Comparisons with old infrastructure changes
* [Dev](https://dev.gitlab.org)
* [Customers](https://customers.gitlab.com)
## Digital Ocean
Digital Ocean houses several servers that do not need to directly interact with our main infrastructure.
* Chef Configuration Management Servers
* Kerberos
* Our backup environment for CI Runners
* [Forum](https://forum.gitlab.com)
* [Quality Insights](http://quality-dashboard.gitlap.com)
## AWS
We host our DNS with Route 53, and we have several EC2 instances for various purposes. The servers you will interact with most are listed below:
* [License](https://license.gitlab.com)
* [Package](https://packages.gitlab.com)
* [Redash](https://redash.gitlab.com)
* [Version](https://version.gitlab.com)
## Monitoring
To see how GitLab.com is doing, and for more information on how we monitor it, visit the [monitoring handbook](/handbook/engineering/monitoring/).
## Technology at GitLab
We use a lot of cool ([but boring](/handbook/values/)) technologies here at GitLab. Below is a non-exhaustive list of tech we use here.
* [Chef](https://www.chef.io/chef/)
* [Consul](https://www.consul.io)
* [ELK Stack](https://www.elastic.co/products) - Running as [managed Elasticsearch on GCP](https://www.elastic.co/gcp)
* [PostgreSQL](https://www.postgresql.org/)
* [Prometheus](https://prometheus.io/)
* [Redis](https://redis.io/)
* [Ruby](https://www.ruby-lang.org/) (probably goes without saying)
* [Terraform](https://www.terraform.io)
## Proposed Cloud Native Architecture
{: #infra-proposed-cloud-native}
We are working on running GitLab.com on Kubernetes by containerizing all the different services and components that are necessary to run GitLab-EE at GitLab.com scale.
This is the proposed architecture to move from what we are running in static VMs to a container orchestration managed world.
### Pods Definition
{: #infra-proposed-archi-pods}
[Source](https://docs.google.com/drawings/d/1BL9hjUUvnZarjO-f_ENoCKdlSTX_MGWXVbZSjkjEd04/edit), GitLab internal use only