---
layout: markdown_page
title: "Category Direction - Gitaly"
---

- TOC
{:toc}

## Gitaly

| Section | Stage | Maturity | Last Reviewed |
| --- | --- | --- | --- |
| [Dev](/direction/dev/) | [Create](https://about.gitlab.com/stages-devops-lifecycle/create/) | Non-marketable | 2020-03-12 |

## Introduction and how you can help

The Gitaly direction page belongs to the [Gitaly](/handbook/product/categories/#source-code-group) group of the [Create](/direction/create) stage, and is maintained by [James Ramsay](https://gitlab.com/jramsay).

This strategy is a work in progress, and everyone can contribute. Please comment and contribute in the linked issues and epics. Sharing your feedback directly on GitLab.com is the best way to contribute to our strategy and vision.

- [Issue List](https://gitlab.com/groups/gitlab-org/-/issues?scope=all&utf8=%E2%9C%93&state=opened&label_name[]=group%3A%3Agitaly)
- [Epic list](https://gitlab.com/groups/gitlab-org/-/epics?label_name[]=group%3A%3Agitaly)

## Overview

Gitaly is a Git RPC service for handling all the Git calls made by GitLab.

Until mid 2018, the GitLab application relied on direct disk access to Git repositories, performing Git operations either with Rugged (a libgit2 wrapper) or by shelling out to Git directly. At scale, this meant using NFS to make the repositories available to every application server. NFS adds latency and has opaque failure modes which are hard to debug in production. Furthermore, using multiple interfaces for Git makes instrumentation and caching difficult.

In late 2016 GitLab began building Gitaly, a gRPC service that would become the interface through which the GitLab application interacts with Git repositories, and in mid 2018 GitLab completed this process for GitLab.com and unmounted NFS from GitLab.com application servers.

### Target Audience

**Systems Administrators** directly interact with Gitaly when installing, configuring, and managing a GitLab server, particularly when high availability is a requirement. Today, systems administrators must create and manage an NFS cluster to configure a high availability GitLab instance, and manually manage the failover to new Gitaly nodes mounted on the same NFS cluster. Once HA Gitaly reaches minimal viability, it will be possible to eliminate the NFS cluster from the architecture and rely on Gitaly for replication. As HA Gitaly continues to mature, automatic failover, automatic Gitaly node rebalancing, and horizontal scaling of read access across replicas will deliver 99.999% uptime (five 9's) and improved performance without regular intervention. Systems Administrators will have fewer applications to manage as other version control systems are retired once the last projects are migrated to GitLab.

**Developers** will benefit from increasing performance for repositories of all shapes and sizes, on the command line and in the GitLab application, as performance improvements continue. Once support for monolithic repositories reaches minimal maturity and continues maturing, developers will no longer be split between Git and legacy version control systems, as projects consolidate increasingly on Git. Developers that heavily use binary assets, like **Game Developers**, will at long last be able to switch to Git and eliminate Git LFS by adopting native large file support in Git.

## Where we are Headed

Gitaly is responsible for access to, and the availability of, Git repositories, and the performance of Gitaly directly influences the experience of using GitLab. This includes performing code reviews, browsing repositories, the speed of CI jobs, and the performance of push and fetch Git operations.

The performance of Gitaly is reliably good in many situations, but poor disk performance, very large repositories, and poor Git access patterns are a problem (GitLab is working to address known performance regressions when using NFS, which are exacerbated by bad Git access patterns). Many exciting opportunities to significantly improve performance exist through improving how we use Git (configuration), improving Git itself, implementing features like deduplicated forks, caching, and improving Git access patterns. Performance improvements to Gitaly benefit both the Git interface and the GitLab application. Native support for high availability will also allow horizontally scaling Git read operations for better distributed CPU usage and further performance improvements.

The performance and availability of Gitaly is a matter of importance for GitLab Administrators, who are responsible to their organizations for the performance and availability of GitLab, of which Gitaly is a critical component. The inability to access Git repositories on a GitLab server is an outage event, and for a large instance would prevent thousands of people from doing their job. Today Gitaly depends on external systems, like NFS, to achieve high availability, but in the future Gitaly will be natively highly available, replicating repositories to many Gitaly nodes, and will be able to recover automatically from node- and repository-level failures, preventing extended outages caused by disk failures, server failures, or zone outages.

Git is the market leading Version Control System (VCS), but many organizations with extremely large projects continue to use centralized version control systems like CVS, SVN, and Perforce. Many of these same organizations also use Git for many of their projects, but have been unable to standardize on Git for these extremely large repositories.
Gitaly and GitLab will make it possible to standardize on Git for extremely large repositories with native support for monolithic repositories and native large file support (eliminating the need for Git LFS), allowing organizations to consolidate on one VCS: Git.

- [HA for Gitaly](https://gitlab.com/groups/gitlab-org/-/epics/842)
- [Git for enormous repositories](https://gitlab.com/groups/gitlab-org/-/epics/773)
- [Performance monitoring and optimization](https://gitlab.com/groups/gitlab-org/-/epics/290)

### What's Next & Why

- **In progress:** [High Availability Gitaly MVC](https://gitlab.com/groups/gitlab-org/-/epics/842)

  Currently there is no way to run GitLab in an HA configuration without NFS. This is preventing GitLab from being in the AWS marketplace and from running in an HA configuration in Kubernetes.

- **In progress:** [Improve Gitaly shard management](https://gitlab.com/groups/gitlab-org/-/epics/2307)

  Large instances, like GitLab.com, require many Gitaly shards. However, the management tools are not sufficiently reliable or fast, which is a problem whenever rebalancing is required.

- **In progress:** [Partial Clone for large files](https://gitlab.com/groups/gitlab-org/-/epics/958)

  Partial Clone will replace [Git LFS](https://git-lfs.github.com/), and allow other kinds of enormous repositories to use Git. Support for very large repositories will help customers migrate them from Perforce, SVN and CVS to Git.

- **Next:** [Strong Consistency for Gitaly HA](https://gitlab.com/groups/gitlab-org/-/epics/1189)

  After shipping the eventually consistent first iteration of Gitaly HA, improving consistency is of utmost importance.

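
To make the Partial Clone item above more concrete, here is a minimal sketch of what a partial clone looks like from the client side. It only builds the `git clone` command line using Git's existing `--filter` option. The helper name `partial_clone_command` and the repository URL are hypothetical, chosen for illustration, and the server has to permit filtering (Git's `uploadpack.allowFilter` setting) for the filter to take effect.

```python
from typing import List, Optional


def partial_clone_command(url: str, blob_limit: Optional[str] = None) -> List[str]:
    """Build a `git clone` argv that requests a partial clone.

    blob_limit=None  -> omit all blobs until they are needed (blob:none)
    blob_limit="1m"  -> omit only blobs of 1 MiB or larger (blob:limit=1m)
    """
    spec = "blob:none" if blob_limit is None else f"blob:limit={blob_limit}"
    return ["git", "clone", f"--filter={spec}", url]


# Hypothetical repository URL, used only for illustration.
cmd = partial_clone_command("https://gitlab.example.com/group/big-repo.git", "1m")
print(" ".join(cmd))
```

With a filter like this, large blobs are fetched on demand rather than up front, which is the behavior that could make a separate large file mechanism unnecessary.
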
### What is Not Planned Right Now

- [VFS for Git](https://gitlab.com/groups/gitlab-org/-/epics/93)

  GitLab is supporting the direction of the Git project to address the performance problems of working with extremely large projects through [partial clone and promisor packfiles](https://gitlab.com/groups/gitlab-org/-/epics/915). We also want to add [native large file support to Git](https://gitlab.com/groups/gitlab-org/-/epics/958). We have been supporting this work in the Git project for quite a while and it is close to reaching a point where it can be used.

  We do not want to split our attention between Microsoft's VFS for Git protocol and the native Git implementation, nor do we want to build support for a feature that is not in mainline Git and requires custom driver/kernel extensions. We prefer boring solutions, like using native Git and supporting its direction.

### Maturity Plan

Gitaly is a **non-marketable** category, and is therefore not assigned a maturity level.

## Competitive Landscape

Important competitors are [GitHub.com](https://github.com) and [Perforce](https://perforce.com) which, in relation to Gitaly, compete with GitLab in terms of raw Git performance and support for enormous repositories respectively.

Customers and prospects evaluating GitLab (GitLab.com and self-hosted) benchmark GitLab's performance against GitHub.com, including Git performance. The Git performance of GitLab.com for easily benchmarked operations like cloning, fetching and pushing shows that GitLab.com is similar to GitHub.com. When comparing GitHub Enterprise to a self-hosted GitLab instance, it is important to compare like-for-like configurations, particularly the use of NFS, because NFS is known to significantly reduce Git performance. Gitaly is planned to provide high availability without NFS in 2020, providing both high performance and high availability. GitHub Enterprise does not currently offer true high availability.

- [HA for Gitaly](https://gitlab.com/groups/gitlab-org/-/epics/842)

Perforce competes with GitLab primarily on its ability to support enormous repositories, whether their size comes from binary files or from monolithic repositories with extremely large numbers of files and history. This competitive advantage comes naturally from its centralized design, which means only the files immediately needed by the user are downloaded. Given sufficient support in Git for partial clone, and sufficient performance in GitLab for enormous repositories, existing customers are waiting to migrate to GitLab.

- [Git for enormous repositories](https://gitlab.com/groups/gitlab-org/-/epics/773)

## Business Opportunity

The version control systems market is expected to be valued at close to US$550mn in the year 2021 and is estimated to reach US$971.8mn by 2027 according to [Future Market Insights](https://www.futuremarketinsights.com/reports/version-control-systems-market), which is broadly consistent with revenue estimates of GitHub ([$250mn ARR](https://www.owler.com/company/github)) and Perforce ([$130mn ARR](https://www.owler.com/company/perforce)). The opportunity for GitLab to grow with the market, and grow its share of the version control market, is significant.

Git is the market leading version control system, demonstrated by the [2018 Stack Overflow Developer Survey](https://insights.stackoverflow.com/survey/2018/#work-_-version-control) where over 88% of respondents use Git. Although there are alternatives to Git, Git remains dominant in open source software, usage by developers continues to grow, it is installed by default on macOS and Linux, and the project itself continues to adapt to meet the needs of larger projects and enterprise customers who are adopting Git, like the Microsoft Windows project.

According to a [2016 Bitrise survey](https://blog.bitrise.io/state-of-app-development-2016#self-hosted) of mobile app developers, 62% of apps hosted by a SaaS provider were hosted on GitHub, and 95% of apps are hosted by a SaaS provider. These numbers provide an incomplete view of the industry, but broadly represent the large opportunity for growth in SaaS hosting on GitLab.com, and in self-hosted, where GitLab is already very successful.

## Analyst Landscape

- [Native support for large files](https://gitlab.com/groups/gitlab-org/-/epics/958) is important to companies that need to version large binary assets, like game studios. These companies primarily use Perforce because Git LFS provides a poor experience, with complex commands and careful workflows needed to avoid large files entering the repository. GitLab has been supporting work to provide a more native large file workflow based on promisor packfiles, which will be very significant to analysts and customers when the feature is ready.

## Top Customer Success/Sales issue(s)

- [High Availability Gitaly](https://gitlab.com/groups/gitlab-org/-/epics/842) is needed to allow customers to avoid needing NFS to achieve a highly available GitLab instance. The network latency of any network-based file system, like NFS, EFS, or Gluster, will negatively impact Git performance because of Git's disk access requirements. It is important to customers who want to run an HA GitLab instance that we provide a better way.
- [Gitaly HA: transactional writes](https://gitlab.com/groups/gitlab-org/-/epics/1189) will allow high availability and maximum consistency by guaranteeing a quorum of Gitaly nodes have accepted write operations before reporting a success to the client. This will make automatic failover possible with a high degree of confidence that no data loss will occur.
- The lack of [native support for extremely large repositories](https://gitlab.com/groups/gitlab-org/-/epics/915) prevents existing customers and prospects from being able to migrate enormous repositories from Perforce or SVN to Git. It is frequently requested, and many organizations want to standardize on a single version control system and tool like GitLab across all projects.

## Top user issue(s)

Users do not see Gitaly as a distinct feature or interface of GitLab. Git performance is the most significant user-facing area where improvements are frequently requested; however, the source of the performance problem can vary significantly.

## Top internal customer issue(s)

- [High Availability Gitaly](https://gitlab.com/groups/gitlab-org/-/epics/842) is important to the Distribution team so that we can offer a GitLab Helm chart that supports high availability. It is also important to the Production team so that we can consider deploying GitLab.com in Kubernetes.
- [Gitaly HA: transactional writes](https://gitlab.com/groups/gitlab-org/-/epics/1189) will allow high availability and maximum consistency by guaranteeing a quorum of Gitaly nodes have accepted write operations before reporting a success to the client. This will make automatic failover possible with a high degree of confidence that no data loss will occur.

## Top Vision Item(s)

- [Gitaly HA: transactional writes](https://gitlab.com/groups/gitlab-org/-/epics/1189) will allow high availability and maximum consistency by guaranteeing a quorum of Gitaly nodes have accepted write operations before reporting a success to the client. This will make automatic failover possible with a high degree of confidence that no data loss will occur.
- The lack of [native support for large files](https://gitlab.com/groups/gitlab-org/-/epics/958) prevents existing customers and prospects from being able to migrate repositories with large files to Git.
  Git LFS isn't a sufficient solution for these organizations in comparison with the native support of other version control systems. The most pressing problems are having to download enormous amounts of data, and having to remember to use different commands for different files so as not to make life worse for everyone.
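
The quorum rule described in the transactional writes item above can be sketched as follows. This is an illustrative model only, not Gitaly's implementation: the `Node` class, its `vote` and `commit` methods, and the choice of a strict majority as the quorum are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical stand-in for a Gitaly node; not Gitaly's real API."""
    name: str
    healthy: bool = True
    refs: Dict[str, str] = field(default_factory=dict)

    def vote(self, ref: str, sha: str) -> bool:
        # Assumption: an unhealthy node cannot accept the write.
        return self.healthy

    def commit(self, ref: str, sha: str) -> None:
        # Apply the reference update on this node.
        self.refs[ref] = sha


def quorum_write(nodes: List[Node], ref: str, sha: str) -> bool:
    """Apply a write only if a strict majority of nodes accept it."""
    needed = len(nodes) // 2 + 1  # assumed quorum: strict majority
    voters = [node for node in nodes if node.vote(ref, sha)]
    if len(voters) < needed:
        return False  # no node applies the change; the client sees a failure
    for node in voters:
        node.commit(ref, sha)
    return True  # success is reported only after a quorum has accepted


nodes = [Node("gitaly-1"), Node("gitaly-2"), Node("gitaly-3", healthy=False)]
ok = quorum_write(nodes, "refs/heads/main", "abc123")
print(ok)
```

Because success is only reported once a majority holds the write, any later failover to a node from that majority would find the acknowledged data present, which is the property that makes automatic failover safe in this model.
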