---
layout: markdown_page
title: Product Direction - Telemetry
---

## On this page
{:.no_toc}

- TOC
{:toc}

## Direction for telemetry

Telemetry manages a variety of technologies that are important for GitLab's understanding of how our users use our products. These technologies include, but are not limited to, Snowplow Analytics and an in-house tool called Usage Ping, which is hosted on `version.gitlab.com` and includes a separate service called [Version Check](https://docs.gitlab.com/ee/user/admin_area/settings/usage_statistics.html#version-check-core-only).

If you'd like to discuss this vision directly with the Product Manager for Telemetry, feel free to reach out to Sid Reddy via [e-mail](mailto:sreddy@gitlab.com).

The primary purpose of telemetry is to help us build a better GitLab. Data about how GitLab is used is collected to better understand which parts of GitLab need improvement and what features to build next. Telemetry also helps our team better understand why people use GitLab, and with this knowledge we are able to make better product decisions.

The overall vision for the Telemetry group is to ensure that we have a robust, consistent, and modern telemetry data framework in place to best serve our internal Product, Finance, Sales, and Customer Success teams. The group also ensures that GitLab has the best visualization and analysis tools in place, allowing the best possible insights into the data provided through the various collection tools we use.

| Category | Description |
| ------ | ------ |
| [🧺 Collection](/direction/telemetry/collection) | The structure(s) and platform(s) of how we collect Telemetry data |
| [🔍 Analysis](/direction/telemetry/analysis) | Manages GitLab's internal needs for an analysis tool that serves the Product department |

## Dashboards

Dashboards at GitLab have been created for the team to have better insights into how various parts of the company are performing.
Today in Periscope we have dashboards that range from GitLab's overarching company KPIs, financial performance, Customer Success, and Sales forecasts to Product-focused SMAU dashboards.

### How to: Build Your Own Dashboard in Periscope

Please reference the “[Self-Serve Analysis in Periscope](https://about.gitlab.com/handbook/business-ops/data-team/periscope/#self-serve-analysis-in-periscope)” handbook page for a thorough step-by-step guide.

Quick steps to get started:

* The first step to building your own Periscope dashboard is checking that you have the correct permissions.
* After logging in to Periscope using Okta, you should see a New Chart button in the top right corner. If you don’t see it, you only have View-only access and should follow the instructions in the handbook page linked above to gain Editor access.
* Once you can see New Chart, you can start creating your own dashboards! Find Dashboards on the left side nav bar and click the + icon to build one. Make sure to give it a name in order to keep our directory organized.
* Once your dashboard is built and named, you can start adding charts by clicking New Chart in the top right. Now you’re ready to start writing queries.

### How To: Request A Dashboard

* To request a new dashboard or report, please use [this link](https://gitlab.com/gitlab-data/analytics/issues/new). An issue will be added to the data team’s issue board. Please label the issue with ‘Growth’ and mention ‘Eli Kastelein’ and ‘Mathieu Peychet’ for visibility and prioritization.

### Data Fields Available for Tracking Today

A variety of data is available for reporting today.

* On GitLab.com, Snowplow is set up to track frontend interactions such as page views, clicks and sessions.
Engineers can also add custom Snowplow tracking through either the frontend (JavaScript) or the backend (Ruby). [Read more about event tracking here](https://docs.gitlab.com/ee/development/fe_guide/event_tracking.html#tracking-in-raw-javascript).
* Also for GitLab.com, the Postgres database is loaded into our data warehouse, which allows us to analyze the usage of all of GitLab’s great features.
* [Usage Ping](https://docs.gitlab.com/ee/user/admin_area/settings/usage_statistics.html) gives us high-level activity data about our self-managed instances on a weekly cadence.

### What do I do if the event I need to track is not available today?

If you do not see an event in the list above, we have outlined a best practices guide for implementing tracking [here](https://docs.gitlab.com/ee/development/fe_guide/event_tracking.html#event-tracking). In this guide we cover:

* Custom Event tracking
* Tracking in HAML or Vue templates
* Tracking in raw JavaScript
* Toggling tracking on or off

## Tracking and Instrumentation Overview

Up until now, the GitLab codebase has been optimized for the application. Now, we need to optimize the codebase for analytics. Today we are using two systems to track users and usage in our product: Snowplow and Usage Ping. Below we’ve broken down the best way to gain insights on both GitLab.com and the Self-hosted versions of GitLab.

### What do we have in place today and where are we headed?

| **Driver** | **GitLab.com** | **Self-Managed** |
| :---: | :---: | :---: |
| % of Revenue | 10% | 90% |
| MAU | ~750K | Millions (paid and CE ~5.5M) |
| Ease of Data Collection | Easy | Complicated |
| Data Sources Today | GitLab.com db, Snowplow (basic) | Usage Ping |
| Data Sources in Future | GitLab.com db, Snowplow (enhanced) | Usage Ping |
| Opt-out? | TBD | Yes |

### GitLab.com Instrumentation

The primary system to extract insights from GitLab.com is Snowplow. This system allows us to track user-level events, which include frontend, backend and custom events. Snowplow also provides a level of flexibility for us to manage historical data.

### Self-hosted Instrumentation

The primary system used to extract insights from GitLab's Self-hosted offering is Usage Ping. This system uses high-level data to help our product, support, and sales teams. It does not send any project names, usernames, or any other specific or personal data. The information from Usage Ping is not anonymous; it is linked to the hostname of the instance. Sending Usage Ping is optional.

## Telemetry Technologies & Services

In this section we explain the technologies and services we use to provide data insights and visualizations that help tell a story about a product's usage and answer questions pertinent to building world-class products. We will break down Usage Ping, Snowplow, Snowflake and Pendo.

### Usage Ping

_Status: in production, ready for use_

_Impacts Self-hosted and GitLab.com_

GitLab sends a weekly payload containing usage data to GitLab Inc. The Usage Ping uses high-level data to help our product, support, and sales teams. It does not send any project names, usernames, or any other specific data. The information from the Usage Ping is not anonymous; it is linked to the hostname of the instance. Sending Usage Ping is optional, and any instance can disable analytics.

The usage data is primarily composed of row counts for different tables in the instance’s database. By comparing these counts month over month (or week over week), we can get a rough sense of how an instance is using the different features within the product. In addition to row counts, there are many boolean flags indicating which features the instance has enabled.
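
As a rough illustration of the comparison described above, the following Python sketch splits two weekly payloads into row counts and boolean flags, then computes the week-over-week change in each count. The payload keys (`issues`, `merge_requests`, `container_registry_enabled`, `ldap_enabled`), the hostname, and the helper names are hypothetical and chosen for illustration only; they do not reflect the actual Usage Ping schema.

```python
from typing import Any, Dict, List, Tuple


def split_payload(payload: Dict[str, Any]) -> Tuple[Dict[str, int], Dict[str, bool]]:
    """Separate integer row counts from boolean feature flags."""
    counts: Dict[str, int] = {}
    flags: Dict[str, bool] = {}
    for key, value in payload.items():
        # bool is a subclass of int in Python, so check for bool first.
        if isinstance(value, bool):
            flags[key] = value
        elif isinstance(value, int):
            counts[key] = value
        # Other values (e.g. the hostname string) are ignored here.
    return counts, flags


def count_deltas(previous: Dict[str, Any], current: Dict[str, Any]) -> Dict[str, int]:
    """Week-over-week change for every counter in the current payload.

    Counters missing from the previous payload are treated as 0 (an assumption).
    """
    prev_counts, _ = split_payload(previous)
    curr_counts, _ = split_payload(current)
    return {key: curr_counts[key] - prev_counts.get(key, 0) for key in curr_counts}


def enabled_features(payload: Dict[str, Any]) -> List[str]:
    """Names of the boolean flags that are switched on, sorted alphabetically."""
    _, flags = split_payload(payload)
    return sorted(key for key, enabled in flags.items() if enabled)


# Hypothetical payloads from the same instance, one week apart.
last_week = {
    "hostname": "gitlab.example.com",
    "issues": 120,
    "merge_requests": 45,
    "container_registry_enabled": False,
}
this_week = {
    "hostname": "gitlab.example.com",
    "issues": 150,
    "merge_requests": 52,
    "container_registry_enabled": True,
    "ldap_enabled": False,
}

print(count_deltas(last_week, this_week))
print(enabled_features(this_week))
```

Because the payload carries only aggregate counts, a delta like this shows that a feature is being used more or less over time, but not which users or projects account for the change.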
The payload also tells us what version of GitLab the instance is currently running and how many users are active on the instance.

Related Usage Ping Links:

* [Feature Instrumentation](https://about.gitlab.com/handbook/product/feature-instrumentation/)
* [usage_data.rb](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/lib/gitlab/usage_data.rb)

#### Usage Ping limitations

* Usage Ping does not track frontend events such as page views, link clicks, or user sessions; it focuses only on aggregated backend events.
* Because of these limitations, we recommend instrumenting your products with Snowplow for more detailed analytics on GitLab.com and with Usage Ping to track aggregated backend events on Self-Hosted.

### Snowplow

_Status: in production, ready for use_

_Impacts GitLab.com only_

Snowplow is an enterprise-strength marketing and product analytics platform. It does three things:

1. Identifies your users, and tracks the way they engage with your website or application
2. Stores your users' behavioral data in a scalable "event data warehouse" you control: in Amazon S3 and (optionally) Amazon Redshift or Postgres (we use Postgres)
3. Lets you leverage the biggest range of tools to analyze that behavioral data, including big data tools (e.g. Spark) via EMR or more traditional tools such as Looker, Mode, Superset, and Re:dash (we use Periscope)

_Snowplow technology 101_

The repository structure follows the conceptual architecture of Snowplow, which consists of six loosely-coupled sub-systems connected by five standardized data protocols/formats:

![snowplow_flow](/uploads/eef8d839ff7a91e509fe277f2736cbaf/snowplow_flow.png)

To briefly explain these six sub-systems:

* Trackers fire Snowplow events. Currently Snowplow has 12 trackers, covering web, mobile, desktop, server and IoT
* Collectors receive Snowplow events from trackers.
Currently we have three different event collectors, sinking events either to Amazon S3, Apache Kafka or Amazon Kinesis
* Enrich cleans up the raw Snowplow events, enriches them and puts them into storage. Currently we have a Hadoop-based enrichment process, and a Kinesis- or Kafka-based process
* Storage is where the Snowplow events live. Currently we store the Snowplow events in a flat file structure on S3, and in the Redshift and Postgres databases
* Data modeling is where event-level data is joined with other data sets and aggregated into smaller data sets, and business logic is applied. This produces a clean set of tables which make it easier to perform analysis on the data. We have data models for Redshift and Looker
* Analytics are performed on the Snowplow events or on the aggregate tables.

### Snowflake

_Status: in production, ready for use_

_Houses both data from GitLab.com and Self-Hosted_

Snowflake is an analytic data warehouse provided as Software-as-a-Service (SaaS). Snowflake provides a data warehouse that is faster, easier to use, and far more flexible than traditional data warehouse offerings. Snowflake’s data warehouse is not built on an existing database or “big data” software platform such as Hadoop. The Snowflake data warehouse uses a new SQL database engine with a unique architecture designed for the cloud. To the user, Snowflake has many similarities to other enterprise data warehouses, but also has additional functionality and unique capabilities.

## Priorities

### Collection Priorities

#### Current focus

| Priority | Focus | Why? |
| :------: | ------ | ------ |
| 1️⃣ | [Enhance Usage Ping to Measure AMAU/SMAU](https://gitlab.com/gitlab-org/telemetry/issues/308) | After this epic is closed, we should be able to measure usage activity from Usage Ping and have an internally consistent view of that data between self-managed and .com. |

#### Next up

| Priority | Focus | Why? |
| :------: | ------ | ------ |
| 2️⃣ | [Harden Usage Ping](https://gitlab.com/gitlab-org/telemetry/-/issues/335) | As our organization grows, we require better data to inform our product, marketing, and sales teams as they make decisions to grow the business and realize our strategic goals. Usage Ping is a key provider of that data, but Usage Ping in its current state is fragile. If another stage team adds a complex counter to Usage Ping, we need to ensure the entire Usage Ping does not break. This parent issue will serve as the aggregation of issues required to improve the performance and stability of Usage Ping, so we can have a world-class data platform at GitLab. |
| 3️⃣ | [Telemetry Documentation & Guides](https://gitlab.com/gitlab-org/telemetry/issues/307) | It is important that as we roll out new changes and develop processes and workflows, we clearly and transparently document everything in a way that is easily discoverable and digestible by both GitLab team members and users/customers. |

### Analysis Priorities

#### Current focus

| Priority | Focus | Why? |
| :------: | ------ | ------ |
| 1️⃣ | [AMAU/SMAU Dashboards](https://gitlab.com/groups/gitlab-org/-/epics/1325) | Parallel to enabling the measurement of [AMAU/SMAU](https://gitlab.com/gitlab-org/telemetry/issues/308), it's important that we are capturing the right metrics for each stage so that each Product Manager has visibility into how users are using their stage and stage categories. |

#### Next up

| Priority | Focus | Why? |
| :------: | ------ | ------ |
| 2️⃣ | [Explore Alternate Analytics Tools for Product](https://gitlab.com/gitlab-org/telemetry/issues/303) | Product team members have a hard time getting their needs met by Periscope; the data needs to be easy to explore without writing SQL or having a dashboard built. |

## How we prioritize

We follow the same prioritization guidelines as the [product team at large](https://about.gitlab.com/handbook/product/product-management/process/#prioritization). Issues tend to flow from having no milestone, to being added to the backlog, to a directional milestone (e.g. Next 3-4 releases), and are finally assigned a specific milestone.

Our entire public backlog for Telemetry can be viewed [here](https://gitlab.com/groups/gitlab-org/-/issues?label_name%5B%5D=group%3A%3Atelemetry&scope=all&sort=popularity&state=opened&utf8=%E2%9C%93), and can be filtered by labels or milestones. If you find something you are interested in, you're encouraged to jump into the conversation and participate. At GitLab, everyone can contribute!