S1 Fig: Comparison of different regression models. In panels A and C, heatmaps show the percentage of datasets for which the model listed along the row outperforms the model along the column.

Panel A shows a heatmap in which each square contains the number of datasets for which the regressor on the left (wins) performed better than the regressor on the bottom (losses). For example, by mapping the color of the square between adaboost (on the y-axis) and linear regression (LR, on the x-axis) to the adjacent color scale, we conclude that the adaboost regressor performs better than the linear regressor on 75–80 of the 112 datasets. The subplot also compares the regressor categories: the ensemble regressors perform better on average than the other categories, which include linear, tree, and nearest-neighbor regressors.

Panel B compares the running time and accuracy of the different regressors. The running time of a regressor on a dataset is the sum of the training and validation times for the best regression model, and we compute the average running time of each regressor over all 112 datasets. Regressors such as xgboost, gradient boosting, and extra trees achieve r-squared scores above 0.80, but the extra trees regressor requires significantly more time to finish than the other two. Regressors such as linear regression, huber, and elastic net are fast, but their accuracy is low. Decision and extra tree regressors are also fast, and their accuracy (r-squared score above 0.7) is better than that of the linear regressors.

Panel C shows the r-squared scores of each regressor for all datasets. The linear regressors at the bottom-left of the subplot achieve lower scores than ensemble regressors such as xgboost and gradient boosting at the top-left. We can also see that for a few datasets, none of the regressors perform well.

Panel D shows the importance of tuning the hyperparameters of the regressors for each dataset: it is not recommended to compare the performance of predictive algorithms over multiple datasets using the same or default values of their hyperparameters.

Author contributions: Qiang Gu (Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing); Anup Kumar (Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft); Simon Bray (Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft); Allison Creason (Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Visualization, Writing – original draft); Alireza Khanteymoori (Data curation, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft); Vahid Jalili (Data curation, Formal analysis, Methodology, Validation, Visualization, Writing – original draft); Björn Grüning (Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft); Jeremy Goecks (Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing)
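As a concrete illustration of how the pairwise win counts shown in panel A can be derived from a table of per-dataset r-squared scores, here is a minimal sketch in Python. The score table below is randomly generated and the regressor names are assumptions for the example, not the benchmark's actual results.

```python
import numpy as np
import pandas as pd

# Hypothetical r-squared scores: one row per dataset (112 in the benchmark),
# one column per regressor.
rng = np.random.default_rng(0)
regressors = ["xgboost", "gradient_boosting", "extra_trees",
              "adaboost", "linear_regression"]
scores = pd.DataFrame(rng.uniform(0.0, 1.0, size=(112, len(regressors))),
                      columns=regressors)

# Pairwise win counts: entry (row a, column b) is the number of datasets
# on which regressor a achieved a strictly higher r-squared score than
# regressor b. Plotting this matrix as a heatmap gives a panel-A-style figure.
wins = pd.DataFrame(0, index=regressors, columns=regressors)
for a in regressors:
    for b in regressors:
        if a != b:
            wins.loc[a, b] = int((scores[a] > scores[b]).sum())
```

Because ties between continuous scores are effectively impossible here, each pair of off-diagonal entries sums to the total number of datasets.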
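The running-time measurement described for panel B (training time plus validation time for the fitted model on one dataset) can be sketched as follows. The dataset is synthetic and the choice of regressor is illustrative only; the benchmark's actual pipeline is not reproduced here.

```python
import time

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# A synthetic dataset standing in for one of the 112 benchmark datasets.
X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = GradientBoostingRegressor(random_state=0)

# Training time.
start = time.perf_counter()
model.fit(X_train, y_train)
train_time = time.perf_counter() - start

# Validation time: scoring the fitted model on held-out data (r-squared).
start = time.perf_counter()
r2 = model.score(X_val, y_val)
validation_time = time.perf_counter() - start

# Running time for this dataset; averaging this quantity over all datasets
# gives the per-regressor average running time plotted in panel B.
running_time = train_time + validation_time
```

Repeating this loop over every dataset and regressor, and averaging `running_time` per regressor, yields the time axis of the panel-B comparison.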