
Parallelizing Git Checkout: a case study of I/O parallelism on desktop applications

Bernardino, Matheus Tavares

Biblioteca Digital de Teses e Dissertações da USP; Universidade de São Paulo; Instituto de Matemática e Estatística 2022-07-13

Online access. The library also holds printed copies.

  • Title:
    Parallelizing Git Checkout: a case study of I/O parallelism on desktop applications
  • Author: Bernardino, Matheus Tavares
  • Advisor: Lejbman, Alfredo Goldman Vel
  • Subjects: Git; Version Control Systems; Parallel I/O; Network File Systems; Parallel Programming
  • Notes: Dissertation (Master's)
  • Description: A version control system (VCS) is a tool that tracks and manages the changes made to a set of files over time. More broadly, VCS tools can also help to shape and manage collaboration flows, find and fix bugs, remember the motivations behind a given code change, etc. Although these tools can typically track any type of data, version control systems bring huge benefits to software projects and, as a result, have become standard practice in this field. Among the VCS tools available today, Git is the most popular among developers. This tool is currently being used to version control a variety of repositories, from small personal projects of a few megabytes in size to massive corporate repositories with more than 300 GB and 3.5 million files. For that reason, speed and scalability are among the top priorities for the Git development community. However, the performance of the tool sometimes falls short of what is desired on networked file systems (NFS), where input and output (I/O) operations tend to be more costly. In particular, one Git operation that suffers from these costs is checkout, which is responsible for restoring files from specific versions of a project. Various optimizations were applied to code related to the checkout operation over the years, but the sequential processing of files still carried a large time penalty for NFS, as well as being suboptimal for local file systems on SSDs. In this project, we worked to parallelize the Git checkout machinery, resulting in speedups of up to 4.5x on NFS and 3.6x on SSDs. We also study how parallelism affects the I/O tasks performed by the checkout operation on different machines and storage devices. The parallel checkout feature was incorporated into the upstream Git repository and made available to all users of the tool since version 2.32.0, which was released in June 2021.
  • DOI: 10.11606/D.45.2022.tde-31082022-210254
  • Publisher: Biblioteca Digital de Teses e Dissertações da USP; Universidade de São Paulo; Instituto de Matemática e Estatística
  • Creation/publication date: 2022-07-13
  • Format: Adobe PDF
  • Language: English
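The core idea in the description, replacing one-file-at-a-time writes with concurrent writes so that slow I/O (as on NFS) can overlap, can be illustrated with a small sketch. This is a hypothetical Python illustration, not the dissertation's or Git's implementation (Git's is written in C and differs in detail). The function names `write_entry` and `checkout`, the thread pool, and the `workers`/`threshold` parameters are assumptions for illustration; they loosely mirror Git's `checkout.workers` and `checkout.thresholdForParallelism` settings, under which small checkouts stay sequential.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_entry(root, path, data):
    """Write one file below root, creating parent directories as needed."""
    full = os.path.join(root, path)
    # exist_ok=True tolerates another worker creating the same directory first.
    os.makedirs(os.path.dirname(full), exist_ok=True)
    with open(full, "wb") as f:
        f.write(data)
    return path


def checkout(root, entries, workers=4, threshold=100):
    """Hypothetical sketch: write (path, data) entries into root.

    With workers <= 1, or fewer entries than threshold, files are written
    sequentially; otherwise the writes are spread over a thread pool so
    that their I/O latencies can overlap. Returns the written paths in
    input order.
    """
    if workers <= 1 or len(entries) < threshold:
        return [write_entry(root, p, d) for p, d in entries]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of its input iterable.
        return list(pool.map(lambda e: write_entry(root, *e), entries))
```

For example, `checkout(tempfile.mkdtemp(), entries, workers=8)` with 250 entries takes the parallel path, while a call with a single entry falls back to sequential writes because it is below the threshold.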
