
Error Corrections and Suggestions for Improvement (6 items) #15

@iembry


Hi Russ, I want to say thank you for the continuous improvement on the NumericEnsembles and related packages. I have some error corrections & suggestions for improvement.

  1. In the NumericEnsembles Introduction vignette, you have the following code examples:

Example 1

> library(NumericEnsembles)
> Numeric(data = MASS::Boston,
        colnum = 14,
        numresamples = 2,
        remove_VIF_above = 5.00,
        remove_ensemble_correlations_greater_than = 100,
        scale_all_predictors_in_data = "N",
        data_reduction_method = 0,
        ensemble_reduction_method = 0,
        how_to_handle_strings = 0,
        predict_on_new_data = "N",
        stratified_random_sampling = "N",
        save_all_trained_models = "N",
        set_seed = "Y",
        save_all_plots = "N",
        use_parallel = "Y",
        train_amount = 0.60,
        test_amount = 0.20,
        validation_amount = 0.20)

which results in this error:

# Error in Numeric(data = MASS::Boston, colnum = 14, numresamples = 2, remove_VIF_above = 5,  :
#  unused argument (stratified_random_sampling = "N")

Example 2

> library(NumericEnsembles)
> Numeric(data = ISLR::Carseats,
+         colnum = 1,
+         numresamples = 2,
+         remove_VIF_above = 5.00,
+         remove_ensemble_correlations_greater_than = 1.00,
+         scale_all_predictors_in_data = "N",
+         data_reduction_method = 0,
+         ensemble_reduction_method = 0,
+         how_to_handle_strings = 2,
+         predict_on_new_data = "N",
+         save_all_trained_models = "N",
+         set_seed = "Y",
+         save_all_plots = "N",
+         use_parallel = "Y",
+         train_amount = 0.60,
+         test_amount = 0.20,
+         validation_amount = 0.20)
Which integer would you like to use for the seed? 12345

which results in this error:

Error in Numeric(data = ISLR::Carseats, colnum = 1, numresamples = 2,  :
  argument "stratified_random_column" is missing, with no default
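
Both errors come down to a mismatch between the argument names in the call and the ones the installed function accepts. As a rough sketch of how that can be checked, the snippet below compares a call's argument names with a function's formals. `numeric_stub` is a hypothetical stand-in with an abbreviated signature, not the real `Numeric()`:

```r
# Hypothetical stand-in whose signature only mimics the arguments at issue;
# it is NOT the real NumericEnsembles::Numeric().
numeric_stub <- function(data, colnum, stratified_random_column) NULL

# Argument names used in a vignette-style call (abbreviated for illustration).
vignette_args <- c("data", "colnum", "stratified_random_sampling")

# Names the function actually accepts.
accepted <- names(formals(numeric_stub))

# Names in the call that the function does not accept
# (these trigger "unused argument" errors).
unused <- setdiff(vignette_args, accepted)

# Names in the signature that the call does not supply
# (these must be given unless they have a default).
missing_from_call <- setdiff(accepted, vignette_args)

print(unused)
print(missing_from_call)
```

With the package installed, `names(formals(NumericEnsembles::Numeric))` should list the real argument names in the same way.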

  2. When I set stratified_random_column = 0 in the Numeric function, with xgboost version 1.7.5.1 installed, and run the following code:
> library(NumericEnsembles)

> Numeric(data = MASS::Boston,
+         colnum = 14,
+         numresamples = 2,
+         remove_VIF_above = 5.00,
+         remove_data_correlations_greater_than = 0.99,
+         remove_ensemble_correlations_greater_than = 100,
+         scale_all_predictors_in_data = "N",
+         data_reduction_method = 0,
+         ensemble_reduction_method = 0,
+         how_to_handle_strings = 0,
+         predict_on_new_data = "N",
+         set_seed = "N",
+         save_all_trained_models = "N",
+         save_all_plots = "N",
+         use_parallel = "Y",
+         stratified_random_column = 0,
+         train_amount = 0.60,
+         test_amount = 0.20,
+         validation_amount = 0.20)

Resampling number 1 of 2,

Working on Bagging
Working on BayesGLM
Working on BayesRNN
Number of parameters (weights and biases) to estimate: 30
Nguyen-Widrow method
Scaling factor= 0.7015619
gamma= 29.2179   alpha= 4.7623   beta= 19818.09
Working on Cubist
Working on Earth
Working on Elastic
Working on Generalized Additive Models with Smoothing Splines
Working on Gradient Boosted
Using 100 trees...

Using 100 trees...

Using 100 trees...

Using 100 trees...

Using 100 trees...

Using 100 trees...

Working on Lasso
Working on Linear
Working on Neuralnet
# weights:  13
initial  value 579835.644333
iter  10 value 309875.230199
final  value 7381.250996
converged
Working on Partial Least Squares
Working on Principal Components Regression
Working on Ridge
Working on RPart
Working on Support Vector Machines
Working on Trees
Working on XGBoost


I get the following error:

Error: 'xgb.params' is not an exported object from 'namespace:xgboost'
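
The error suggests that the installed xgboost does not export `xgb.params`. A minimal sketch of how a package could test for an export before calling it follows; `exports_object` is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: check whether a namespace exports a given object.
# Returns NA when the package is not installed.
exports_object <- function(pkg, name) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(NA)
  name %in% getNamespaceExports(pkg)
}

# Sanity check with a base package: stats exports median().
print(exports_object("stats", "median"))

# The case from this report; the result depends on the installed xgboost version.
if (requireNamespace("xgboost", quietly = TRUE)) {
  cat("xgboost version:", as.character(utils::packageVersion("xgboost")), "\n")
  cat("exports xgb.params:", exports_object("xgboost", "xgb.params"), "\n")
}
```

A check like this could guard the XGBoost step, or the DESCRIPTION file could state a minimum xgboost version, whichever fits the package better.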

  3. You list R packages in the Depends field rather than the Imports field of the DESCRIPTION file. The following articles explain the difference between the two:

https://stackoverflow.com/questions/8637993/better-explanation-of-when-to-use-imports-depends
r - Better explanation of when to use Imports/Depends - Stack Overflow

https://r-pkgs.org/dependencies-in-practice.html
"R Packages" (2e): 11 Dependencies: In Practice
By Hadley Wickham and Jennifer Bryan
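
As a small illustration of the Imports style, the sketch below calls a function through its namespace with `::` and confirms that nothing is attached to the search path. It uses the base `tools` package purely as an example:

```r
# Imports-style usage: call a function through its namespace without attaching
# the package to the user's search path.
before <- search()
ext <- tools::file_ext("report.Rmd")
after <- search()

print(ext)

# The search path is unchanged, so the user's session is left untouched.
print(identical(before, after))
```

Packages listed under Depends are attached when the package itself is attached, which changes the user's search path; packages under Imports are only loaded.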

  4. In addition to predicting on "new data", would you also add the ability to forecast?

Your ForecastingEnsembles package does not work on daily data. I have had success training and validating daily data (with a numeric time element) using NumericEnsembles; however, it does not do any forecasting.

  5. Instead of using message(), you may want to consider using either cat() or print(). See below:
> tempdir1 <- tempdir()
> tempdir1
[1] "/tmp/RtmpJtdnMX"

> message("The trained models are temporariliy saved in this directory: tempdir1. This directory is automatically deleted at the end of the R session.\n          You may save the trained models before you end this session if you chose to do so.")
The trained models are temporariliy saved in this directory: tempdir1. This directory is automatically deleted at the end of the R session.
          You may save the trained models before you end this session if you chose to do so.
          
> cat("Current tempdir(): ", tempdir(), "\n")
Current tempdir():  /tmp/RtmpJtdnMX

> print(tempdir())
[1] "/tmp/RtmpJtdnMX"
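
In the message() call above, the text tempdir1 sits inside the quotes, so it is printed literally rather than replaced by the path. A minimal sketch of one way to include the actual directory (the shortened message wording here is only illustrative):

```r
tempdir1 <- tempdir()

# A variable name inside the quotes is printed literally ("tempdir1").
literal_msg <- "The trained models are temporarily saved in this directory: tempdir1."

# paste0() builds the string with the actual path;
# message("...: ", tempdir1, ".") would produce the same text.
path_msg <- paste0("The trained models are temporarily saved in this directory: ",
                   tempdir1, ".")

# message() writes to the message stream (stderr) and can be silenced
# with suppressMessages().
message(path_msg)

# cat() writes to standard output instead.
cat(path_msg, "\n")
```

Either function shows the path once the variable is passed outside the quotes; the main difference is which output stream is used.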

  6. To run the Numeric function within a Markdown document with set_seed and predict_on_new_data, would it be possible to provide a version of Numeric that accepts an integer seed directly in the function call, and likewise accepts the data URL for predict_on_new_data directly, rather than prompting through readline()?
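
A rough sketch of the kind of interface meant here, covering the seed only; the same pattern could apply to a new-data URL argument. `resolve_seed` and `sample_with_seed` are hypothetical names, not part of NumericEnsembles:

```r
# Hypothetical helper: use the seed passed in the call when given, and only
# prompt with readline() when running interactively.
resolve_seed <- function(seed = NULL) {
  if (is.null(seed)) {
    if (!interactive()) stop("Supply `seed` when running non-interactively.")
    seed <- readline("Which integer would you like to use for the seed? ")
  }
  seed <- as.integer(seed)
  if (is.na(seed)) stop("`seed` must be an integer.")
  seed
}

# Hypothetical wrapper illustrating the requested interface.
sample_with_seed <- function(n, seed = NULL) {
  set.seed(resolve_seed(seed))
  runif(n)
}

# Two calls with the same seed give identical draws, with no prompt needed.
a <- sample_with_seed(3, seed = 12345)
b <- sample_with_seed(3, seed = 12345)
print(identical(a, b))
```

Because the seed arrives as an argument, a call like this can run unattended when a Markdown document is knitted.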

Thank you.

Irucka
