Qboost question

I have been using the Qboost demo template to solve some binary classification problems. I have two questions:

1) Is there any way to check which items in my test dataset are being classified correctly and which aren't? It only gives me a percent accuracy, and my attempts to extract any lower-level data have been unsuccessful.

2) Is there any way to "save" the model that is created by the training dataset in order to readily classify future items that aren't currently in my testing dataset? That is, do I need to feed in the same training data every time I want to re-run the machine learning algorithm to classify a new point?


Comments

1 comment
  • Hello, 

    This will all depend on the approach you take. 
    The code in the demo would need to be modified.

    There are several models in the demo file.
    For QBoost in particular, the model is currently named clf3.

    I believe the model can be saved using Python's pickle module:
    https://docs.python.org/3/library/pickle.html

    You would just need to dump() it and then load() it.

    Here's an example:

    import pickle

    clf3 = QBoostClassifier(n_estimators=NUM_WEAK_CLASSIFIERS, max_depth=TREE_DEPTH)
    clf3.fit(X_train, y_train, emb_sampler, lmd=lmd, **DW_PARAMS)

    ...

    # Save the trained model to disk
    with open("clf3.model", "wb") as file:
        pickle.dump(clf3, file)

    ...

    # Later, load the saved model back
    with open("clf3.model", "rb") as file:
        loaded_clf3_model = pickle.load(file)

    In the demo, the model is created inside one of the function calls, so you would have to restructure some of that code and add error handling around checking whether the saved file exists, etc.
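    As a rough illustration, here is a minimal load-or-train sketch. The MODEL_PATH name and the exists-check are just one way to structure it; the other variable names are reused from the demo:

    import os
    import pickle

    MODEL_PATH = "clf3.model"  # illustrative file name

    if os.path.exists(MODEL_PATH):
        # Reuse the previously trained and saved model
        with open(MODEL_PATH, "rb") as f:
            clf3 = pickle.load(f)
    else:
        # Train from scratch, then save for future runs
        clf3 = QBoostClassifier(n_estimators=NUM_WEAK_CLASSIFIERS, max_depth=TREE_DEPTH)
        clf3.fit(X_train, y_train, emb_sampler, lmd=lmd, **DW_PARAMS)
        with open(MODEL_PATH, "wb") as f:
            pickle.dump(clf3, f)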

    This seems like the simplest way of saving a model.
    There might be a way to save just the weights, but pickling the whole object seemed like the easiest place to start.

    For the other question, we should look at these two lines:

        print('accu (train): %5.2f' % (metric(y_train, y_train_dw)))

        print('accu (test): %5.2f' % (metric(y_test, y_test_dw)))

    These two lines report the accuracy of the predictor on the training and test data, respectively, where y_train_dw and y_test_dw are the predicted labels.
     
    You can just compare the actual values to the predicted values and print the output.
    I did the following to compare the +/- 1 labels: multiplying an actual value by its predicted value gives a negative product if they differ and a positive product if they agree.

    for i in range(len(y_train)):
        print('(train): %s' % ((y_train[i] * y_train_dw[i]) > 0))

    for i in range(len(y_test)):
        print('(test): %s' % ((y_test[i] * y_test_dw[i]) > 0))
    I confirmed that the number of True values over the total number of samples matches the accuracy reported by the metric calls above.
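
    If you specifically want to see which test items were misclassified (your first question), you can collect the indices where the product is negative instead of printing every comparison. A small sketch, reusing the same y_test and y_test_dw arrays:

    # Indices of test items the model got wrong (product of +/-1 labels is negative)
    wrong = [i for i in range(len(y_test)) if y_test[i] * y_test_dw[i] < 0]
    print('misclassified test indices: %s' % wrong)

    # The fraction correct should match the metric output above
    print('accu (test): %5.2f' % (1 - len(wrong) / len(y_test)))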
     
    I hope this was helpful.
    Please let us know if you have any more questions!
     
