HEP Computing Tools, Grid and Supercomputers for Genome Sequencing Studies

De, K ; Klimentov, A ; Maeno, T ; Mashinistov, R ; Novikov, A ; Poyda, A ; Tertychnyy, I ; Wenaus, T

Journal of physics. Conference series, 2017-10, Vol.898 (5), p.52018 [Peer-reviewed journal]

Bristol: IOP Publishing

  • Subjects: Bioinformatics ; Computer networks ; Data processing ; Deoxyribonucleic acid ; Distributed processing ; DNA ; Gene sequencing ; Genomes ; Leadership ; Physics ; Software ; Software development tools ; Supercomputers
  • Description: The PanDA (Production and Distributed Analysis) Workload Management System was developed to address the data processing and analysis challenges of the ATLAS experiment at the LHC. Recently PanDA has been extended to run HEP scientific applications on Leadership Class Facilities and supercomputers. The success of projects using PanDA beyond HEP and the Grid has drawn attention from other compute-intensive sciences such as bioinformatics. Recent advances in Next Generation Genome Sequencing (NGS) technology have led to increasing streams of sequencing data that need to be processed, analysed and made available to bioinformaticians worldwide. Analysis of genome sequencing data with the popular software pipeline PALEOMIX can take a month even when run on a powerful computing resource. In this paper we describe the adaptation of the PALEOMIX pipeline to run in a distributed computing environment powered by PanDA. To run the pipeline, we split the input files into chunks, which are processed on different nodes as separate PALEOMIX inputs, and finally merge the outputs; this is very similar to what ATLAS does to process and simulate data. Automated job (re)submission and brokering within PanDA dramatically decreased the total walltime. Using software tools developed initially for HEP and the Grid can reduce payload execution time for mammoth DNA samples from weeks to days.
  • Language: English
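
The split, process and merge approach mentioned in the description can be illustrated with a minimal Python sketch. This is not the authors' implementation and calls neither PanDA nor PALEOMIX: the function names `split_fastq`, `read_lengths` and `scatter_gather` are hypothetical, the input is assumed to be FASTQ text with four lines per record, and a local thread pool stands in for jobs brokered to separate nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

FASTQ_LINES_PER_RECORD = 4  # header, sequence, '+', quality string


def split_fastq(lines: List[str], records_per_chunk: int) -> List[List[str]]:
    """Split FASTQ lines into chunks that each contain only whole records."""
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be at least 1")
    if len(lines) % FASTQ_LINES_PER_RECORD != 0:
        raise ValueError("input does not contain whole FASTQ records")
    step = records_per_chunk * FASTQ_LINES_PER_RECORD
    return [lines[i:i + step] for i in range(0, len(lines), step)]


def read_lengths(chunk: List[str]) -> List[str]:
    """Hypothetical stand-in payload: report 'read_id<TAB>length' per record.

    In the paper's setting the payload would be a pipeline run on one chunk.
    """
    return [
        f"{chunk[i][1:]}\t{len(chunk[i + 1])}"
        for i in range(0, len(chunk), FASTQ_LINES_PER_RECORD)
    ]


def scatter_gather(
    lines: List[str],
    records_per_chunk: int,
    payload: Callable[[List[str]], List[str]] = read_lengths,
    workers: int = 4,
) -> List[str]:
    """Run the payload on each chunk independently, then merge the outputs."""
    chunks = split_fastq(lines, records_per_chunk)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so the merged output
        # follows the order of the reads in the original file.
        partial_results = list(pool.map(payload, chunks))
    merged: List[str] = []
    for part in partial_results:
        merged.extend(part)
    return merged
```

Because each chunk is processed independently, a failed chunk could be resubmitted on its own without repeating the rest of the run, which is the property that automated (re)submission relies on.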
