ValueError("no embedding found") in QBoost for Binary Classification
I have a fairly large dataset with 350000 rows and 9418 features, which I want to train and classify using QBoost. I reduced the number of features using PCA and ran QBoost with a range of feature counts, e.g., [25, 50, ..., 225, 259], to understand the quality of the prediction results. I stop at 259 features based on the explained variance ratio. Inside QBoost, in the function below,
def _minimize_squared_loss_binary(H, y, lam):
    """Minimize squared loss using binary weight variables."""
    bqm = _build_bqm(H, y, lam)
    # Submit the problem to a QPU solver
    sampler = *** HERE ***
    results = sampler.sample(bqm, label=label)
    weights = np.array(list(results.first.sample.values()))
    energy = results.first.energy
    return weights, energy
I tried the samplers listed below, running cross-validation to find the optimum value of lambda for the training process.
1. LeapHybridSampler()
2. EmbeddingComposite(DWaveSampler())
3. FixedEmbeddingComposite(DWaveSampler())
4. LazyFixedEmbeddingComposite(DWaveSampler())
With (1), I had no problems completing all the runs for training and testing the dataset. Training finished quickly, including cross-validation. Given that LeapHybridSampler() is a hybrid sampler that uses QPUs only for specific tasks, the performance is satisfactory.
With (2), using 175 features from PCA, I get the error below during cross-validation in the following code:
normalized_lambdas = np.linspace(0.0, 1.75, 10)
lambdas = normalized_lambdas / n_features
print('Performing cross-validation using {} '
      'values of lambda, this may take several minutes...'.format(len(lambdas)))
clf_qboost, lam = qboost_lambda_sweep(X, y, lambdas, verbose=True)
print('Best Classifier: ', clf_qboost)
print('Best lambda value: ', lam)
Traceback (most recent call last):
File ".../main.py", line 454, in <module>
main()
File ".../main.py", line 407, in main
model, train_time, sampler_name = train_classify_Qboost(X_train, y_train, True) # use False to run without cv (harcoded cv value)
File ".../main.py", line 212, in train_classify_Qboost
clf_qboost, lam = qboost_lambda_sweep(X, y, lambdas, verbose=True)
File ".../qboost/qboost.py", line 361, in qboost_lambda_sweep
qb = QBoostClassifier(X_train, y_train, lam, **kwargs)
File ".../qboost/qboost.py", line 296, in __init__
weights, self.energy, self.sampler_name = _minimize_squared_loss_binary(H, y, lam)
File ".../qboost/qboost.py", line 249, in _minimize_squared_loss_binary
results = sampler.sample(bqm, label=label)
File ".../lib/python3.10/site-packages/dwave/system/composites/embedding.py", line 239, in sample
raise ValueError("no embedding found")
ValueError: no embedding found
What might be the issue? What am I not getting here?
With (3), it failed straight away, even with 25 features, raising TypeError("either embedding or source_adjacency must be provided"). Can you explain how this sampler should be used for this problem, or whether it can be used at all for such problems? The stack trace is below:
Traceback (most recent call last):
File ".../main.py", line 454, in <module>
main()
File ".../main.py", line 407, in main
model, train_time, sampler_name = train_classify_Qboost(X_train, y_train, True)
File ".../main.py", line 212, in train_classify_Qboost
clf_qboost, lam = qboost_lambda_sweep(X, y, lambdas, verbose=True)
File ".../qboost/qboost.py", line 361, in qboost_lambda_sweep
qb = QBoostClassifier(X_train, y_train, lam, **kwargs)
File ".../qboost/qboost.py", line 296, in __init__
weights, self.energy, self.sampler_name = _minimize_squared_loss_binary(H, y, lam)
File ".../qboost/qboost.py", line 241, in _minimize_squared_loss_binary
sampler = FixedEmbeddingComposite(DWaveSampler())
File ".../lib/python3.10/site-packages/dwave/system/composites/embedding.py", line 549, in __init__
raise TypeError("either embedding or source_adjacency must be "
TypeError: either embedding or source_adjacency must be provided
With (4), the code ran fine up to 150 features, but while processing 175 features it raised ValueError("no embedding found") after two rounds of cross-validation. The stack trace did not give me any useful information; it is included here for reference.
Traceback (most recent call last):
File ".../main.py", line 454, in <module>
main()
File ".../main.py", line 407, in main
model, train_time, sampler_name = train_classify_Qboost(X_train, y_train, True) # use False to run without cv (harcoded cv value)
File ".../main.py", line 212, in train_classify_Qboost
clf_qboost, lam = qboost_lambda_sweep(X, y, lambdas, verbose=True)
File ".../qboost/qboost.py", line 361, in qboost_lambda_sweep
qb = QBoostClassifier(X_train, y_train, lam, **kwargs)
File ".../qboost/qboost.py", line 296, in __init__
weights, self.energy, self.sampler_name = _minimize_squared_loss_binary(H, y, lam)
File ".../qboost/qboost.py", line 249, in _minimize_squared_loss_binary
results = sampler.sample(bqm, label=label)
File ".../lib/python3.10/site-packages/dwave/system/composites/embedding.py", line 239, in sample
raise ValueError("no embedding found")
ValueError: no embedding found
My questions are below:
- What did I miss in (3) that is not missing in (1), (2) and (4)? What is the main issue here?
- Why am I getting the "no embedding found" error in (2) and (4)? Why does the code run with 150 features but fail with more? Does that have something to do with the size of the problem? Should I try a smaller dataset or fewer features?
- Am I correct to interpret that (1) is a hybrid sampler, whereas the rest are completely QPU-based?
Thanks in advance for your patience and any help.
Comments
Hello,
Thank you for reaching out to us with your questions!
To answer your questions:
The EmbeddingComposite accepts arbitrary problems and maps them to the architecture of the QPU. The FixedEmbeddingComposite is similar to the EmbeddingComposite, but requires an embedding to be provided. That is why (3) fails immediately: FixedEmbeddingComposite(DWaveSampler()) is constructed without an embedding or a source adjacency, which is exactly what the TypeError reports. Check out this community post for more information: https://support.dwavesys.com/hc/en-us/community/posts/360016737274-Save-time-Reuse-your-embedding-when-possible
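As a rough illustration of providing an embedding, the sketch below finds one with minorminer and passes it to FixedEmbeddingComposite. The helper name build_fixed_sampler is hypothetical and not part of the Ocean SDK, the imports are kept inside the function so the file loads without the SDK, and the code assumes a configured API token and that every variable in the BQM appears in at least one quadratic term.

```python
def build_fixed_sampler(bqm):
    """Find an embedding for bqm once and wrap the QPU sampler with it.

    Hypothetical helper for illustration; requires the Ocean SDK and
    a configured D-Wave API token when it is actually called.
    """
    import minorminer
    from dwave.system import DWaveSampler, FixedEmbeddingComposite

    qpu = DWaveSampler()
    # Source graph: the interacting variable pairs of the problem.
    # Target graph: the couplers available on the selected QPU.
    source_edges = list(bqm.quadratic)
    embedding = minorminer.find_embedding(source_edges, qpu.edgelist)

    # minorminer returns an empty dict when it cannot find an embedding.
    if not embedding:
        raise ValueError("no embedding found; the problem may be too large")

    return FixedEmbeddingComposite(qpu, embedding=embedding)
```

If the structure of the BQM stays the same across the lambda sweep, the sampler returned here could be created once and reused for every call to sampler.sample(bqm, label=label), rather than being rebuilt inside _minimize_squared_loss_binary each time.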
Embedding, or minor embedding, is the process of mapping an arbitrary problem onto the architecture or topology of a given QPU. If the problem is too big to be run on the QPU directly using these methods, you will get a "no embedding found" error. That is what happens in (2) and (4) once the number of features grows past what can be embedded. There's a bit more information about minor embedding and topologies in the Getting Started with D-Wave Solvers guide.
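To get a feel for why the feature count matters, the sketch below counts the logical variables and pairwise couplings of a fully connected problem. It assumes one binary weight per feature and a nonzero coupling between every pair of weights, which may not match your exact _build_bqm; dense_problem_size is a hypothetical helper name, not part of the Ocean SDK.

```python
def dense_problem_size(n_variables):
    """Return (variables, couplings) for a fully connected problem.

    Assumes every pair of variables interacts, so the number of
    couplings is n * (n - 1) / 2.
    """
    couplings = n_variables * (n_variables - 1) // 2
    return n_variables, couplings


# Feature counts taken from the question above.
for n in (25, 150, 175, 259):
    variables, couplings = dense_problem_size(n)
    print(f"{variables} variables -> {couplings} pairwise couplings")
```

On the QPU, each logical variable of a dense problem is typically represented by a chain of several physical qubits, so the hardware requirement grows faster than these counts alone suggest. Reducing the number of features is one way to get back under the limit.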
That is correct, (1) uses a Hybrid Sampler and (2), (3), (4) use the QPU directly (with an embedding as mentioned above).
I hope this was helpful!