Feature Selection
Most often, a user does not know before running a machine learning model how many features are important, especially when the number of features gets very large. It would be great if the hybrid solver could determine how many features are most important and return only that list, so it could be used as input to further machine learning modeling.
Comments
Hi Bill,
Thank you for submitting your feature request.
Would this satisfy your requirements for determining key features: https://github.com/dwavesystems/dwave-scikit-learn-plugin?
If it doesn't, please let us know how your request differs from what this library can achieve.
Best Regards,
Tanjid
Thanks, but that is just pointing me to the plugin documentation. My question is very specific. For the feature selection, I have to tell it how many features I want to retain. What if I don't know that information? It would be great if this plugin could determine the OPTIMAL NUMBER of features to return. If I have a dataset of 1000 features, I might not know the optimal number beforehand. Instead of giving an arbitrary number and then running over and over again with different values until I get a good model, it would be good if this was done in an unsupervised way for the user.
Just wondering if anyone took a look at this.
Hi Bill,
Thank you for elaborating further.
It is not possible to have the hybrid solver determine the optimal number of features as part of the selection process.
To find the optimal number of features, the performance of machine learning models trained on the selected features needs to be measured. As it is not possible to upload the actual classifier to the hybrid solver, this must be done with an iterative process on the client side. One approach could be to use a binary search over the number of features, assuming ML performance has a single peak as a function of the number of features selected. scikit-learn provides hyperparameter optimizers that can be used for this purpose.
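As a rough illustration of the iterative search described above, here is a minimal sketch. It is not the plugin's method. It makes several assumptions: scikit-learn's `SelectKBest` stands in for the hybrid-solver feature selector so the example runs locally, the dataset is synthetic, and `search_num_features` and `cv_score` are hypothetical helper names. Because finding a maximum needs two probes per step, the sketch uses a ternary-search variant of binary search. It is only reliable if the score really has a single peak, which noisy cross-validation scores may not guarantee.

```python
from functools import lru_cache

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline


def search_num_features(score, low, high):
    """Integer ternary search for the k in [low, high] that maximizes score(k).

    Assumes score(k) rises to a single peak and then falls (unimodal).
    """
    while high - low > 2:
        step = (high - low) // 3
        m1, m2 = low + step, high - step
        if score(m1) < score(m2):
            low = m1 + 1   # the peak lies to the right of m1
        else:
            high = m2 - 1  # the peak lies to the left of m2
    # At most three candidates remain; evaluate them directly.
    return max(range(low, high + 1), key=score)


# Synthetic data (assumption): 30 features, only 5 of them informative.
X, y = make_classification(
    n_samples=200, n_features=30, n_informative=5, random_state=0
)


@lru_cache(maxsize=None)  # each k is evaluated only once
def cv_score(k):
    """Mean 5-fold cross-validated accuracy when keeping k features."""
    pipe = Pipeline([
        # Stand-in selector (assumption); a hybrid-solver selector would go here.
        ("feature_selection", SelectKBest(f_classif, k=k)),
        ("classification", LogisticRegression(max_iter=1000)),
    ])
    return cross_val_score(pipe, X, y, cv=5).mean()


best_k = search_num_features(cv_score, 1, X.shape[1])
print("Selected number of features:", best_k)
```

Each call to `cv_score` refits the selector, so with a hybrid solver every probe would be a separate solver submission. The search keeps the number of probes logarithmic in the number of features, at the cost of relying on the single-peak assumption.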
We have updated the plugin documentation to include a snippet demonstrating how to tune the number of variables using scikit-learn's hyperparameter optimizers: https://github.com/dwavesystems/dwave-scikit-learn-plugin#tuning
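For readers who want a sense of what such tuning looks like, here is a minimal sketch using scikit-learn's `GridSearchCV`, which does an exhaustive search over a small grid rather than a binary search. It is not a copy of the plugin's snippet. `SelectKBest` and its `k` parameter are stand-ins so the example runs without solver access, and the dataset is synthetic. With the plugin, the selector step would presumably be replaced by the plugin's own selector, and the grid key would name its number-of-features parameter.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic data (assumption): 30 features, only 5 of them informative.
X, y = make_classification(
    n_samples=200, n_features=30, n_informative=5, random_state=0
)

pipe = Pipeline([
    # Stand-in selector (assumption); swap in the plugin's selector here.
    ("feature_selection", SelectKBest(f_classif)),
    ("classification", LogisticRegression(max_iter=1000)),
])

# Six evenly spaced candidate feature counts between 1 and all features.
candidate_counts = np.linspace(1, X.shape[1], num=6, dtype=int).tolist()

# Cross-validate the whole pipeline once per candidate count.
search = GridSearchCV(
    pipe,
    param_grid={"feature_selection__k": candidate_counts},
    cv=5,
)
search.fit(X, y)

best_k = search.best_params_["feature_selection__k"]
print("Best number of features:", best_k)
```

A coarse grid like this keeps the number of selector runs small. The best value it finds can then be refined with a finer grid around it if needed.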
Best Regards,
Tanjid