Description
Hi Russ, thank you for continuing to improve NumericEnsembles and the related packages. I have some error reports and suggestions for improvement.
- In the NumericEnsembles Introduction vignette, you have the following code examples:
Example 1
> library(NumericEnsembles)
> Numeric(data = MASS::Boston,
colnum = 14,
numresamples = 2,
remove_VIF_above = 5.00,
remove_ensemble_correlations_greater_than = 100,
scale_all_predictors_in_data = "N",
data_reduction_method = 0,
ensemble_reduction_method = 0,
how_to_handle_strings = 0,
predict_on_new_data = "N",
stratified_random_sampling = "N",
save_all_trained_models = "N",
set_seed = "Y",
save_all_plots = "N",
use_parallel = "Y",
train_amount = 0.60,
test_amount = 0.20,
validation_amount = 0.20)
which results in this error:
# Error in Numeric(data = MASS::Boston, colnum = 14, numresamples = 2, remove_VIF_above = 5, :
# unused argument (stratified_random_sampling = "N")
Example 2
> library(NumericEnsembles)
> Numeric(data = ISLR::Carseats,
+ colnum = 1,
+ numresamples = 2,
+ remove_VIF_above = 5.00,
+ remove_ensemble_correlations_greater_than = 1.00,
+ scale_all_predictors_in_data = "N",
+ data_reduction_method = 0,
+ ensemble_reduction_method = 0,
+ how_to_handle_strings = 2,
+ predict_on_new_data = "N",
+ save_all_trained_models = "N",
+ set_seed = "Y",
+ save_all_plots = "N",
+ use_parallel = "Y",
+ train_amount = 0.60,
+ test_amount = 0.20,
+ validation_amount = 0.20)
Which integer would you like to use for the seed? 12345
which results in this error:
Error in Numeric(data = ISLR::Carseats, colnum = 1, numresamples = 2, :
argument "stratified_random_column" is missing, with no default
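Both vignette errors come down to the argument names in the call not matching the function's current signature. As a rough diagnostic, the sketch below compares a set of supplied argument names against a function's formals. `check_call_args` and `toy_numeric` are hypothetical names used only for illustration; `toy_numeric` is a stand-in, not the real signature of `Numeric`.

```r
# Report supplied argument names that a function does not accept, and
# arguments without defaults that were not supplied.
check_call_args <- function(fun, supplied) {
  fmls <- formals(fun)
  accepted <- names(fmls)
  # An argument with no default is stored as the empty symbol.
  no_default <- vapply(
    seq_along(fmls),
    function(i) identical(fmls[[i]], quote(expr = )),
    logical(1)
  )
  required <- setdiff(accepted[no_default], "...")
  list(
    unused  = setdiff(supplied, accepted),
    missing = setdiff(required, supplied)
  )
}

# Hypothetical stand-in signature, for illustration only.
toy_numeric <- function(data, colnum, stratified_random_column,
                        train_amount = 0.60) NULL

# Argument names loosely modelled on the vignette's Example 1.
str(check_call_args(
  toy_numeric,
  c("data", "colnum", "stratified_random_sampling", "train_amount")
))
```

With the package installed, passing `NumericEnsembles::Numeric` as `fun` and the vignette's argument names as `supplied` should show which names need updating.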
- When I call the Numeric function with stratified_random_column = 0 and xgboost version 1.7.5.1 installed, using the following code:
> library(NumericEnsembles)
> Numeric(data = MASS::Boston,
+ colnum = 14,
+ numresamples = 2,
+ remove_VIF_above = 5.00,
+ remove_data_correlations_greater_than = 0.99,
+ remove_ensemble_correlations_greater_than = 100,
+ scale_all_predictors_in_data = "N",
+ data_reduction_method = 0,
+ ensemble_reduction_method = 0,
+ how_to_handle_strings = 0,
+ predict_on_new_data = "N",
+ set_seed = "N",
+ save_all_trained_models = "N",
+ save_all_plots = "N",
+ use_parallel = "Y",
+ stratified_random_column = 0,
+ train_amount = 0.60,
+ test_amount = 0.20,
+ validation_amount = 0.20)
Resampling number 1 of 2,
Working on Bagging
Working on BayesGLM
Working on BayesRNN
Number of parameters (weights and biases) to estimate: 30
Nguyen-Widrow method
Scaling factor= 0.7015619
gamma= 29.2179 alpha= 4.7623 beta= 19818.09
Working on Cubist
Working on Earth
Working on Elastic
Working on Generalized Additive Models with Smoothing Splines
Working on Gradient Boosted
Using 100 trees...
Using 100 trees...
Using 100 trees...
Using 100 trees...
Using 100 trees...
Using 100 trees...
Working on Lasso
Working on Linear
Working on Neuralnet
# weights: 13
initial value 579835.644333
iter 10 value 309875.230199
final value 7381.250996
converged
Working on Partial Least Squares
Working on Principal Components Regression
Working on Ridge
Working on RPart
Working on Support Vector Machines
Working on Trees
Working on XGBoost
I get the following error:
Error: 'xgb.params' is not an exported object from 'namespace:xgboost'
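The error indicates that the installed xgboost does not export `xgb.params`. A small check along these lines might help confirm which xgboost version is installed and whether it provides that function. `has_export` is a hypothetical helper name, not part of either package.

```r
# TRUE when `name` is exported by the installed package `pkg`;
# FALSE when it is not exported or the package is not installed.
has_export <- function(pkg, name) {
  requireNamespace(pkg, quietly = TRUE) && name %in% getNamespaceExports(pkg)
}

if (requireNamespace("xgboost", quietly = TRUE)) {
  cat("xgboost version:", as.character(utils::packageVersion("xgboost")), "\n")
  cat("exports xgb.params:", has_export("xgboost", "xgb.params"), "\n")
} else {
  cat("xgboost is not installed\n")
}
```

If the function is missing, declaring a minimum xgboost version in the DESCRIPTION file, or guarding the call with a check like this one, may avoid the failure.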
- You list R packages in the Depends field rather than the Imports field of the DESCRIPTION file. The following articles explain the difference between the two:
https://stackoverflow.com/questions/8637993/better-explanation-of-when-to-use-imports-depends
r - Better explanation of when to use Imports/Depends - Stack Overflow
https://r-pkgs.org/dependencies-in-practice.html
"R Packages" (2e): 11 Dependencies: In Practice
By Hadley Wickham and Jennifer Bryan
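To illustrate the practical difference: a package listed under Imports is loaded but not attached, so its functions are reached with `::` and the user's search path is left alone. The sketch below mimics that with the base package tools, chosen only because it ships with R; it is a stand-in for any imported package.

```r
# Loading a namespace (what Imports does) makes its functions reachable
# with `::` without attaching the package to the search path.
loaded <- requireNamespace("tools", quietly = TRUE)

# Namespace-qualified call, as package code would write it under Imports.
ext <- tools::file_ext("trained_model.rds")

# Whether the package is attached; loading alone does not attach it.
attached <- "package:tools" %in% search()

cat("loaded:", loaded, " extension:", ext, " attached:", attached, "\n")
```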
- In addition to predicting on "new data", would you consider adding the ability to forecast?
Your ForecastingEnsembles package does not work on daily data. I have had success training and validating on daily data (with the time element as a numeric column) using NumericEnsembles; however, it does not do any forecasting.
- Instead of using message(), you may want to consider using either cat() or print(); see below:
> tempdir1 <- tempdir()
> tempdir1
[1] "/tmp/RtmpJtdnMX"
> message("The trained models are temporariliy saved in this directory: tempdir1. This directory is automatically deleted at the end of the R session.\n You may save the trained models before you end this session if you chose to do so.")
The trained models are temporariliy saved in this directory: tempdir1. This directory is automatically deleted at the end of the R session.
You may save the trained models before you end this session if you chose to do so.
> cat("Current tempdir(): ", tempdir(), "\n")
Current tempdir(): /tmp/RtmpJtdnMX
> print(tempdir())
[1] "/tmp/RtmpJtdnMX"
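In the message() call above, tempdir1 sits inside the quoted string, so the literal text "tempdir1" is printed instead of the path. If keeping message() is preferred, passing the variable as a separate argument should also work, since message() concatenates its arguments. A minimal sketch, where `temp_dir_notice` is a hypothetical helper name:

```r
# Build the notice text with the actual directory path substituted in.
temp_dir_notice <- function(path) {
  paste0(
    "The trained models are temporarily saved in this directory: ", path, ".\n",
    "This directory is automatically deleted at the end of the R session."
  )
}

tempdir1 <- tempdir()

# message() writes to stderr and can be silenced with suppressMessages().
message(temp_dir_notice(tempdir1))

# Equivalent direct form: message() pastes its arguments together.
message("Current tempdir(): ", tempdir1)
```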
- To run the Numeric function within a Markdown document with set_seed and predict_on_new_data, would it be possible for Numeric to accept an integer seed and the new-data URL directly in the function call, rather than prompting through readline()?
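One way this could look, as a sketch only: take the value from an argument when it is given, and fall back to the prompt only in an interactive session. `resolve_seed`, `resolve_new_data`, and their parameters are hypothetical names, not part of NumericEnsembles, and reading the new data with `read.csv` is an assumption about its format.

```r
# Use the supplied seed if given; otherwise prompt only when interactive.
resolve_seed <- function(seed = NULL) {
  if (!is.null(seed)) {
    return(as.integer(seed))
  }
  if (interactive()) {
    answer <- readline("Which integer would you like to use for the seed? ")
    return(as.integer(answer))
  }
  stop("Supply `seed` when running non-interactively.")
}

# Accept a data frame as-is, or a file path / URL to read (CSV assumed).
resolve_new_data <- function(new_data = NULL) {
  if (is.data.frame(new_data)) {
    return(new_data)
  }
  if (is.character(new_data) && length(new_data) == 1L) {
    return(utils::read.csv(new_data))
  }
  stop("Supply `new_data` as a data frame, a file path, or a URL.")
}

# Non-interactive usage, as it might appear in a Markdown chunk.
set.seed(resolve_seed(12345))
```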
Thank you.
Irucka