ValueError("no embedding found") in QBoost for Binary Classification

Anibrata P

June 03, 2024 09:40

I have a fairly large dataset with 350,000 rows and 9,418 features that I want to classify using QBoost. I reduced the number of features with PCA and ran QBoost over a range of feature counts, e.g., [25, 50, ..., 225, 259], to see how the quality of the predictions changes. I stop at 259 features based on the explained variance ratio. In the QBoost function below, I substituted different samplers at the line marked HERE:

import numpy as np

def _minimize_squared_loss_binary(H, y, lam):
    """Minimize squared loss using binary weight variables."""
    bqm = _build_bqm(H, y, lam)

    # Submit the problem to a QPU solver
    sampler = ...  # HERE: replaced by each of the samplers listed below
    results = sampler.sample(bqm, label=label)
    weights = np.array(list(results.first.sample.values()))
    energy = results.first.energy

    return weights, energy
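For reference, the squared-loss objective this function minimizes can be written as a BQM with one binary variable per weak classifier. The sketch below assumes the objective ||H w - y||^2 + lam * sum(w) with w in {0, 1}; `squared_loss_bqm` and `energy` are hypothetical helpers for illustration, not the `_build_bqm` from qboost.py. It shows that every pair of weak classifiers with a nonzero entry in H^T H gets its own coupling, so the problem graph is essentially complete: assuming one weak classifier per feature, 175 features give up to 175 * 174 / 2 = 15,225 couplings.

```python
import itertools

import numpy as np


def squared_loss_bqm(H, y, lam):
    """Hypothetical helper: coefficients of E(w) = ||H @ w - y||^2 + lam * sum(w), w binary.

    Returns (linear, quadratic, offset) with E(w) = offset + sum_i linear[i] * w_i
    + sum_{i<j} quadratic[(i, j)] * w_i * w_j.
    """
    HtH = H.T @ H  # Gram matrix of the weak-classifier outputs, shape (n, n)
    Hty = H.T @ y  # correlation of each weak classifier with the labels, shape (n,)
    n = H.shape[1]
    # For binary w, w_i**2 == w_i, so the diagonal of H^T H folds into the linear terms
    linear = {i: HtH[i, i] - 2 * Hty[i] + lam for i in range(n)}
    # Each off-diagonal pair contributes 2 * (H^T H)_ij; only exact zeros are dropped
    quadratic = {
        (i, j): 2 * HtH[i, j]
        for i, j in itertools.combinations(range(n), 2)
        if HtH[i, j] != 0
    }
    offset = float(y @ y)  # constant term y^T y
    return linear, quadratic, offset


def energy(linear, quadratic, offset, w):
    """Evaluate the BQM energy for a binary vector w."""
    return (
        offset
        + sum(b * w[i] for i, b in linear.items())
        + sum(J * w[i] * w[j] for (i, j), J in quadratic.items())
    )
```

Because H and y only enter through H^T H and H^T y, the number of training rows changes the coefficient values but not the number of variables or couplings.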

I tried the following samplers, running cross-validation to find the optimal value of lambda for training:

(1) LeapHybridSampler()
(2) EmbeddingComposite(DWaveSampler())
(3) FixedEmbeddingComposite(DWaveSampler())
(4) LazyFixedEmbeddingComposite(DWaveSampler())

With (1), all training and testing runs completed without any problems, and both training and cross-validation finished quickly. Since LeapHybridSampler() is a hybrid sampler that uses the QPU only for specific parts of the problem, this performance is satisfactory.

With (2), using 175 PCA features, the following code raised an error during cross-validation:

normalized_lambdas = np.linspace(0.0, 1.75, 10)
lambdas = normalized_lambdas / n_features
print('Performing cross-validation using {} '
      'values of lambda, this may take several minutes...'.format(len(lambdas)))
clf_qboost, lam = qboost_lambda_sweep(X, y, lambdas, verbose=True)
print('Best Classifier: ', clf_qboost)
print('Best lambda value: ', lam)

Traceback (most recent call last):
  File ".../main.py", line 454, in <module>
    main()
  File ".../main.py", line 407, in main
    model, train_time, sampler_name = train_classify_Qboost(X_train, y_train, True)  # use False to run without cv (harcoded cv value)
  File ".../main.py", line 212, in train_classify_Qboost
    clf_qboost, lam = qboost_lambda_sweep(X, y, lambdas, verbose=True)
  File ".../qboost/qboost.py", line 361, in qboost_lambda_sweep
    qb = QBoostClassifier(X_train, y_train, lam, **kwargs)
  File ".../qboost/qboost.py", line 296, in __init__
    weights, self.energy, self.sampler_name = _minimize_squared_loss_binary(H, y, lam)
  File ".../qboost/qboost.py", line 249, in _minimize_squared_loss_binary
    results = sampler.sample(bqm, label=label)
  File ".../lib/python3.10/site-packages/dwave/system/composites/embedding.py", line 239, in sample
    raise ValueError("no embedding found")
ValueError: no embedding found

What might be the issue? What am I not getting here?

With (3), it failed immediately, even with 25 features, raising TypeError("either embedding or source_adjacency must be provided"). Can you help me understand how this sampler should be used for this problem, or whether it can be used for such problems at all? The stack trace is below:

Traceback (most recent call last):
  File ".../main.py", line 454, in <module>
    main()
  File ".../main.py", line 407, in main
    model, train_time, sampler_name = train_classify_Qboost(X_train, y_train, True)  
  File ".../main.py", line 212, in train_classify_Qboost
    clf_qboost, lam = qboost_lambda_sweep(X, y, lambdas, verbose=True)
  File ".../qboost/qboost.py", line 361, in qboost_lambda_sweep
    qb = QBoostClassifier(X_train, y_train, lam, **kwargs)
  File ".../qboost/qboost.py", line 296, in __init__
    weights, self.energy, self.sampler_name = _minimize_squared_loss_binary(H, y, lam)
  File ".../qboost/qboost.py", line 241, in _minimize_squared_loss_binary
    sampler = FixedEmbeddingComposite(DWaveSampler()) 
  File ".../lib/python3.10/site-packages/dwave/system/composites/embedding.py", line 549, in __init__
    raise TypeError("either embedding or source_adjacency must be "
TypeError: either embedding or source_adjacency must be provided

With (4), the code ran fine up to 150 features, but with 175 features it raised ValueError("no embedding found") after two rounds of cross-validation. The stack trace is not very informative, but here it is for reference:

Traceback (most recent call last):
  File ".../main.py", line 454, in <module>
    main()
  File ".../main.py", line 407, in main
    model, train_time, sampler_name = train_classify_Qboost(X_train, y_train, True)  # use False to run without cv (harcoded cv value)
  File ".../main.py", line 212, in train_classify_Qboost
    clf_qboost, lam = qboost_lambda_sweep(X, y, lambdas, verbose=True)
  File ".../qboost/qboost.py", line 361, in qboost_lambda_sweep
    qb = QBoostClassifier(X_train, y_train, lam, **kwargs)
  File ".../qboost/qboost.py", line 296, in __init__
    weights, self.energy, self.sampler_name = _minimize_squared_loss_binary(H, y, lam)
  File ".../qboost/qboost.py", line 249, in _minimize_squared_loss_binary
    results = sampler.sample(bqm, label=label)
  File ".../lib/python3.10/site-packages/dwave/system/composites/embedding.py", line 239, in sample
    raise ValueError("no embedding found")
ValueError: no embedding found

My questions are:

1. What did I miss in (3) that is not missing in (1), (2) and (4)? What is the main issue here?
2. Why am I getting the "no embedding found" error in (2) and (4)? Why does the code run with 150 features but fail with more? Is that related to the size of the problem? Should I try a smaller dataset or fewer features?
3. Am I correct to interpret that (1) is a hybrid sampler, whereas the rest are purely QPU-based?

Thanks in advance for your patience and any help.

Comments

David J
June 04, 2024 01:51
Hello,

Thank you for reaching out to us with your questions!

To answer your questions:
1. What did I miss in (3), which is not missing in (1), (2) and (4)? What is the main issue here?
  
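  The EmbeddingComposite accepts arbitrary problems and maps them to the architecture of the QPU. The FixedEmbeddingComposite is similar to the EmbeddingComposite, but it does not compute an embedding itself: you must provide one (via the embedding or source_adjacency argument) when you construct it. That is why FixedEmbeddingComposite(DWaveSampler()) raised the TypeError in (3), while the LazyFixedEmbeddingComposite in (4) worked, since it computes an embedding the first time it samples and reuses it afterwards. Check out this community post for more information: https://support.dwavesys.com/hc/en-us/community/posts/360016737274-Save-time-Reuse-your-embedding-when-possible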
2. Why am I getting the "no embedding found" error in (2) and (4)? Why is the code running with 150 features while failing with more features, is that something to do with the size of the problem? Shall I try with a smaller dataset or number of features?
  
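  Embedding, or minor embedding, is the process of mapping an arbitrary problem onto the architecture (topology) of a given QPU. If the problem is too big to run on the QPU directly, these composites cannot find an embedding and raise the "no embedding found" error. What matters is the number of BQM variables and how densely they are coupled; in QBoost that is set by the number of weak classifiers (here driven by the number of features), not by the number of rows, so reducing the number of features is what helps. Because the embedding search is heuristic, problems close to the limit may embed on some attempts and fail on others. There's a bit more information about minor embedding and topologies in the Getting Started with D-Wave Solvers guide.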
3. Am I correct to interpret that (1) is a Hybrid sampler, whereas the rest are completely QPU based?
  
  That is correct: (1) is a hybrid sampler, while (2), (3) and (4) use the QPU directly (via an embedding, as mentioned above).
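When a feature count is too large for a direct QPU embedding, one possible workaround is to catch that specific error and resubmit the same BQM to the hybrid sampler, which handled every run in (1). This is a minimal sketch: `sample_with_fallback` is a hypothetical helper, the samplers are passed in by the caller (e.g. `EmbeddingComposite(DWaveSampler())` and `LeapHybridSampler()`), and matching on the message text is an assumption based on the tracebacks in the question.

```python
def sample_with_fallback(bqm, qpu_sampler, hybrid_sampler, **sample_kwargs):
    """Hypothetical helper: try the direct-QPU sampler first; if it reports
    "no embedding found", resubmit the same BQM to the hybrid sampler.

    Returns (sampleset, name_of_sampler_used).
    Example call (not executed here):
        sample_with_fallback(bqm, EmbeddingComposite(DWaveSampler()),
                             LeapHybridSampler(), label="QBoost")
    """
    try:
        return qpu_sampler.sample(bqm, **sample_kwargs), "qpu"
    except ValueError as err:
        # Only handle the embedding failure; re-raise any other ValueError
        if "no embedding found" not in str(err):
            raise
    # Reached only when no embedding could be found for the direct-QPU sampler
    return hybrid_sampler.sample(bqm, **sample_kwargs), "hybrid"
```

Returning the sampler name alongside the result makes it easy to record which sampler was actually used for each feature count during the sweep.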
I hope this was helpful!