Microsoft isn’t the only company that’s interested in scaling Git.
That plan appears to be going well. Yesterday, the company announced that GitHub was adopting its modifications and that the two would be working together to bring suitable clients to macOS and Linux.
Microsoft wanted to move to Git because of Git’s features, like its easy branching and its popularity among developers. But the transition faced three problems. Git wasn’t designed for such vast numbers of developers—more than 20,000 actively working on the codebase. Also, Git wasn’t designed for a codebase that was so large, either in terms of the number of files and version history for each file, or in terms of sheer size, coming in at more than 300GB. When using standard Git, working with the source repository was unacceptably slow. Common operations (such as checking which files have been modified) would take multiple minutes.
The company’s solution was to develop Git Virtual File System (GVFS). With GVFS, a local replica of a Git repository is virtualized such that it contains metadata and only the source code files that have been explicitly retrieved. By eliminating the need to replicate every file (and, hence, check every file for modifications), both the disk footprint of the repository and the speed of working with it were greatly improved. Microsoft modified Git to handle this virtual file system. The client was altered so that it didn’t needlessly try to access files that weren’t available locally, and a new transfer protocol was added for selectively retrieving individual files from a remote repository.
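To make the idea concrete, here is a minimal Python sketch of on-demand retrieval. It is a toy model built on stated assumptions, not Microsoft's implementation: the class `VirtualWorkingTree`, the `fetch_blob` callback, and the `blob_id` helper are all hypothetical names invented for illustration, and a plain SHA-1 of the file contents stands in for a real Git object ID.

```python
import hashlib
from typing import Callable, Dict, List


def blob_id(data: bytes) -> str:
    """Content hash standing in for a Git object ID.

    Assumption: real Git hashes a small header plus the content;
    this toy version hashes the content alone.
    """
    return hashlib.sha1(data).hexdigest()


class VirtualWorkingTree:
    """Toy model: metadata for every path, contents only for paths that were read."""

    def __init__(self, index: Dict[str, str], fetch_blob: Callable[[str], bytes]):
        self.index = index                    # path -> expected blob ID (metadata only)
        self.fetch_blob = fetch_blob          # hypothetical remote call: blob ID -> bytes
        self.hydrated: Dict[str, bytes] = {}  # paths whose contents exist locally

    def read(self, path: str) -> bytes:
        # The first access downloads the file; later accesses are served locally.
        if path not in self.hydrated:
            self.hydrated[path] = self.fetch_blob(self.index[path])
        return self.hydrated[path]

    def write(self, path: str, data: bytes) -> None:
        # Writing stores the contents locally without contacting the remote.
        self.hydrated[path] = data

    def modified(self) -> List[str]:
        # Only locally present files can differ from the index,
        # so only they are hashed, however large the repository is.
        return sorted(
            path
            for path, data in self.hydrated.items()
            if blob_id(data) != self.index.get(path)
        )
```

In this sketch, the cost of `modified()` grows with the number of files a developer has actually opened rather than with the size of the whole repository, which is the effect the article describes for status checks.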
Internally, this proved successful, with Windows development being substantially migrated to Git in May of this year. But what of the broader Git community?
Microsoft says that, so far, about half of its modifications have been accepted upstream, with upstream Git developers broadly approving of the approach the company has taken to improve the software’s scaling. Redmond also says that it has been willing to make changes to its approach to satisfy the demands of upstream Git. The biggest complexity is that Git has a very conservative approach to compatibility, requiring that repositories remain compatible across versions.
GitHub’s interest and involvement are motivated by the company’s desire to address the needs of enterprise customers. The open source, free GitHub hosting doesn’t need the scaling work Microsoft has done; obviously, if someone is using standard Git today, then standard Git must be good enough for their development process. But on the paid, enterprise side, the situation can be a little different. Certain industries have large repositories that pose problems with Git; for example, game repositories are often physically large not because they have millions of files and decades of history, but because of their large number of graphics and other assets. The scaling improvements that Microsoft has made to Git are useful for this kind of large repository, too. As such, having the same family of improvements available in GitHub will enable the company to better serve these communities.
Microsoft itself has had similar demands from enterprise; the company told us that Siemens wanted to move away from the Team Foundation Server version control to using Git instead. But Siemens will only be able to do this once the scaling improvements have been made; right now, TFS version control scales better.
As the name would imply, GVFS requires a file system driver to work. The Windows division worked with the engineering team to add features to Windows to make this efficient. The intent is to eventually turn this capability into a supported, extensible API and, at some point, move systems such as the new OneDrive placeholders to use the same API.
Microsoft and GitHub are also working to bring similar capabilities to other platforms, with macOS coming first, and later Linux. The obvious way to do this on both systems is to use FUSE, an infrastructure for building file systems that run in user mode rather than kernel mode (desirable because user-mode development is easier and safer than kernel mode). However, the companies have discovered that FUSE isn’t fast enough for this—a lesson Dropbox also learned when developing a similar capability, Project Infinite. Currently, the companies believe that tapping into a macOS extensibility mechanism called Kauth (or KAuth) will be the best way forward.