Feature Selection

Bill W

April 27, 2023 16:50

Most often, a user does not know, before running a machine learning model, how many features are important, especially when the number of features gets very large. It would be great for the hybrid solver to be able to determine how many features are most important and return only that list, so it could be used as input to further machine learning modeling.

Comments

Tanjid I
May 05, 2023 15:45

Hi Bill,

Thank you for submitting your feature request.

Would this satisfy your requirements for determining key features: https://github.com/dwavesystems/dwave-scikit-learn-plugin?

Please let us know if it doesn't, and how your needs differ from what this library can achieve.

Best Regards,

Tanjid

Bill W
May 05, 2023 16:26

Thanks, but that just points me to the plugin documentation. My question is very specific. For feature selection, I have to tell the plugin how many features I want to retain. What if I don't know that? It would be great if the plugin could determine the OPTIMAL NUMBER of features to return. With a dataset of 1000 features, I might not know the optimal number beforehand, so instead of picking an arbitrary number and then rerunning over and over with different values until I get a good model, it would be good if this were done automatically for the user.

Bill W
May 18, 2023 16:15

Just wondering if anyone took a look at this.

Tanjid I
May 19, 2023 21:14

Hi Bill,

Thank you for elaborating further.

It is not possible to have the hybrid solver determine the optimal number of features as part of the selection process.

To find the optimal number of features, the performance of a machine learning model trained on the selected features needs to be measured. Because the classifier itself cannot be uploaded to the hybrid solver, this has to be done iteratively: select k features, train and score the model, and repeat for other values of k. One approach is a binary search over k, assuming model performance is unimodal in k (it rises to a single peak and then falls). scikit-learn's hyperparameter search tools, such as GridSearchCV, can also be used for this purpose.
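The binary search over the number of features can be sketched in plain Python as below. This is a minimal sketch, not part of the plugin: `best_num_features` and the `evaluate(k)` callback are hypothetical names. `evaluate(k)` is assumed to run feature selection for k features, train the downstream model, and return a validation score where higher is better. The search assumes the score has a single peak with no flat stretch before it; each step compares k with k + 1 to decide which side of the peak it is on.

```python
from typing import Callable, Dict


def best_num_features(evaluate: Callable[[int], float], lo: int, hi: int) -> int:
    """Return the k in [lo, hi] that maximizes evaluate(k).

    Assumes evaluate is unimodal in k (rises to one peak, then falls), so the
    search needs roughly 2 * log2(hi - lo) evaluations instead of hi - lo + 1.
    """
    cache: Dict[int, float] = {}

    def score(k: int) -> float:
        # Each evaluation means a selection run plus model training, so memoize.
        if k not in cache:
            cache[k] = evaluate(k)
        return cache[k]

    while lo < hi:
        mid = (lo + hi) // 2
        if score(mid) < score(mid + 1):
            lo = mid + 1  # still climbing: the peak lies to the right of mid
        else:
            hi = mid      # not climbing: the peak is at mid or to its left
    return lo


# Synthetic stand-in for a real model score, peaking at k = 37.
print(best_num_features(lambda k: -(k - 37) ** 2, 1, 1000))
```

With 1000 candidate feature counts this probes about 20 values of k rather than all 1000. Real cross-validated scores are noisy, so in practice the unimodal assumption only holds approximately and the result is best treated as a starting point.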

We have updated the plugin documentation to include a snippet demonstrating how to tune the number of variables using scikit-learn's hyperparameter optimizers: https://github.com/dwavesystems/dwave-scikit-learn-plugin#tuning
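A grid search over a few candidate feature counts with scikit-learn's `Pipeline` and `GridSearchCV` might look like the sketch below. To keep it runnable without D-Wave access, it uses scikit-learn's `SelectKBest` as an assumed stand-in for the plugin's `SelectFromQuadraticModel`; with the plugin, that step would be swapped in and tuned through its own feature-count parameter (check the plugin documentation linked above for the exact name). Unlike the binary search, a grid makes no assumption about the shape of the score curve, but it costs one fit per candidate per fold, and with the plugin each fit would be a hybrid solver call.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # small bundled dataset, 30 features

# SelectKBest is a stand-in selector; the plugin's SelectFromQuadraticModel
# would take this step's place in the pipeline.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("model", LogisticRegression(max_iter=1000)),
])

# Candidate numbers of features to keep, addressed as <step name>__<parameter>.
param_grid = {"select__k": [1, 5, 10, 20, 30]}

search = GridSearchCV(pipe, param_grid=param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

`search.best_estimator_` is then refit on all the data with the winning feature count, so it can be used directly for further modeling.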

Best Regards,
Tanjid
