Ransomware Combined Structural Feature Dataset

Moreira, Caio

Mendeley Data 2024

  • Title:
    Ransomware Combined Structural Feature Dataset
  • Author: Moreira, Caio
  • Subjects: Cybersecurity ; Data Analytics Cybersecurity ; Malware Mitigation
  • Notes: 10.17632/yzhcvn7sj5
    RelationTypeNote: IsVersionOf -- 10.17632/yzhcvn7sj5
  • Description: This dataset contains several structural features extracted from 2675 binary executable samples. The training and validation set consists of 2157 samples (80%): 1023 ransomware samples belonging to 25 relevant families and 1134 goodware samples. The testing set consists of 518 samples (20%): 385 ransomware samples belonging to 15 recent families and 133 goodware samples.
    The CSV file columns are the sample ID, the filename, the target class (GR), the family ID, and the numerical feature columns, as follows:
    | Subset              | ID             | filename          | GR | family   | Features           |
    |---------------------|----------------|-------------------|----|----------|--------------------|
    | Training goodware   | 10000 to 11133 | Original name.exe | 0  | 0        | Numerical features |
    | Testing goodware    | 12000 to 12132 | Original name.exe | 0  | 0        | Numerical features |
    | Training ransomware | 20000 to 21022 | SHA-256 hash      | 1  | 1 to 25  | Numerical features |
    | Testing ransomware  | 22000 to 22384 | SHA-256 hash      | 1  | 26 to 40 | Numerical features |
    Options:
    1) The dataset is split into individual feature types without preprocessing, including headers, imported DLLs, function calls, section entropy, and 3-, 4-, and 5-gram opcode frequencies.
    2) The combined datasets include the header, imported DLL, function call, and section entropy feature sets, with and without the 3-gram feature set, after the preprocessing step described in our research paper "A Comprehensive Analysis Combining Structural Features for Detection of New Ransomware Families," published in the Journal of Information Security and Applications.
    Note: The preprocessing step mainly involved merging similar APIs, feature selection, and min-max normalization of the features, with the minimum and maximum values computed on the training data only. Some features occur only in the test data and have zero occurrences in the training samples; for accurate testing, it is advisable to exclude these features from the training set.
    Family IDs (training, 1 to 25): Avaddon 1, Babuk 2, Blackmatter 3, Conti 4, Darkside 5, Dharma 6, Doppelpaymer 7, Exorcist 8, Gandcrab 9, Lockbit 10, Makop 11, Maze 12, Mountlocker 13, Nefilim 14, Netwalker 15, Phobos 16, Pysa 17, Ragnarok 18, RansomeXX 19, Revil 20, Ryuk 21, Stop 22, Thanos 23, Wastedlocker 24, Zeppelin 25.
    Family IDs (testing, 26 to 40): AvosLocker 26, BianLian 27, BlackBasta 28, BlackByte 29, BlackCat 30, BlueSky 31, Clop 32, Hive 33, HolyGhost 34, Karma 35, Lorenz 36, Maui 37, Night Sky 38, PlayCrypt 39, Quantum 40.
  • Publisher: Mendeley Data
  • Creation/publication date: 2024
  • Language: English
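The ID ranges, GR labels, and family ranges in the description are enough to sanity-check a row before it is used. Below is a minimal Python sketch of that check. The names `Subset`, `SUBSETS`, `subset_for_id`, and `check_row` are hypothetical helpers, not part of the dataset. It assumes the `ID`, `GR`, and `family` values are integers, as the description suggests.

```python
from typing import NamedTuple, Tuple


class Subset(NamedTuple):
    name: str                      # hypothetical label, e.g. "train_goodware"
    id_range: Tuple[int, int]      # inclusive ID bounds from the description
    gr: int                        # target class: 0 = goodware, 1 = ransomware
    family_range: Tuple[int, int]  # inclusive family ID bounds (0 for goodware)


# Encoding of the documented layout table (hypothetical helper structure).
SUBSETS = [
    Subset("train_goodware", (10000, 11133), 0, (0, 0)),
    Subset("test_goodware", (12000, 12132), 0, (0, 0)),
    Subset("train_ransomware", (20000, 21022), 1, (1, 25)),
    Subset("test_ransomware", (22000, 22384), 1, (26, 40)),
]


def size(subset: Subset) -> int:
    """Number of sample IDs in an inclusive ID range."""
    lo, hi = subset.id_range
    return hi - lo + 1


def subset_for_id(sample_id: int) -> Subset:
    """Return the subset whose ID range contains sample_id."""
    for subset in SUBSETS:
        lo, hi = subset.id_range
        if lo <= sample_id <= hi:
            return subset
    raise ValueError(f"ID {sample_id} is outside the documented ranges")


def check_row(sample_id: int, gr: int, family: int) -> bool:
    """True if GR and family agree with the subset implied by the ID."""
    subset = subset_for_id(sample_id)
    lo, hi = subset.family_range
    return gr == subset.gr and lo <= family <= hi


if __name__ == "__main__":
    for s in SUBSETS:
        print(s.name, size(s))
    train = sum(size(s) for s in SUBSETS if s.name.startswith("train"))
    test = sum(size(s) for s in SUBSETS if s.name.startswith("test"))
    print("train", train, "test", test, "total", train + test)
```

The ranges reproduce the stated counts: 1134 + 1023 = 2157 training samples and 133 + 385 = 518 testing samples, for 2675 in total. To apply `check_row` to a CSV file, one option is to iterate with `csv.DictReader` and convert each field with `int()`. The exact header spellings in the published files are an assumption here.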

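The preprocessing note implies an order of operations. First, decide which features to keep and fit the scaling on training rows only. Then apply the same transformation to the test rows. Below is a minimal pure-Python sketch under stated assumptions:

- Feature matrices are lists of rows.
- "Zero occurrences in the training samples" means a column that is zero in every training row.
- The helper names are hypothetical.

It covers only dropping training-absent features and min-max scaling. It does not reproduce the API merging or feature selection described in the paper.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def drop_train_absent(train: Matrix, test: Matrix, names: List[str]) -> Tuple[Matrix, Matrix, List[str]]:
    """Drop feature columns that are zero in every training row."""
    keep = [j for j in range(len(names)) if any(row[j] != 0 for row in train)]

    def select(m: Matrix) -> Matrix:
        return [[row[j] for j in keep] for row in m]

    return select(train), select(test), [names[j] for j in keep]


def fit_min_max(train: Matrix) -> Tuple[List[float], List[float]]:
    """Per-feature minimum and maximum, computed on training rows only."""
    columns = list(zip(*train))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(m: Matrix, mins: List[float], maxs: List[float]) -> Matrix:
    """Scale with training statistics; constant training columns map to 0.0."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, mins, maxs)]
        for row in m
    ]


if __name__ == "__main__":
    # Toy values for illustration only; not taken from the dataset.
    names = ["test_only_feature", "feature_a", "feature_b"]
    train = [[0, 2, 5], [0, 4, 5]]
    test = [[3, 6, 5]]

    train, test, names = drop_train_absent(train, test, names)
    mins, maxs = fit_min_max(train)
    print(names)                             # ['feature_a', 'feature_b']
    print(apply_min_max(train, mins, maxs))  # [[0.0, 0.0], [1.0, 0.0]]
    print(apply_min_max(test, mins, maxs))   # [[2.0, 0.0]]
```

Because the minimum and maximum come from the training data only, scaled test values can fall outside [0, 1], as the 2.0 above shows. Whether to clip them is a separate choice that this sketch leaves open.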