Is it possible to retry sample_qubo?
I am developing a solver for a large QUBO problem that repeatedly generates a small sub-QUBO problem and solves it with sample_qubo. To obtain a good solution, the solver is expected to call sample_qubo about 5000 times. However, in most runs it stops with the following message in fewer than 500 iterations. Presumably sample_qubo terminates abnormally because of a connection error. Is there any way to retry sample_qubo? I am using DWaveCliqueSampler on Advantage_system4.1 and calling sample_qubo on a 177-bit QUBO (the sampler's largest_clique_size) with num_reads=300. I am running my solver on an Ubuntu server in my lab.
----
Traceback (most recent call last):
  File "/home/nakano/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 706, in urlopen
    chunked=chunked,
  File "/home/nakano/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/home/nakano/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.6/http/client.py", line 1373, in getresponse
    response.begin()
  File "/usr/lib/python3.6/http/client.py", line 311, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.6/http/client.py", line 280, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nakano/.local/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/home/nakano/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 756, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/home/nakano/.local/lib/python3.6/site-packages/urllib3/util/retry.py", line 532, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/nakano/.local/lib/python3.6/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/home/nakano/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 706, in urlopen
    chunked=chunked,
  File "/home/nakano/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/home/nakano/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.6/http/client.py", line 1373, in getresponse
    response.begin()
  File "/usr/lib/python3.6/http/client.py", line 311, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.6/http/client.py", line 280, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
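A minimal sketch of one way to retry, assuming the failing call can simply be repeated: wrap the call in a function that catches the error, waits with exponential backoff, and tries again a bounded number of times. The name call_with_retry and its parameters are hypothetical. Catching OSError is also an assumption: http.client.RemoteDisconnected and requests.exceptions.ConnectionError both derive from it, but the exception your version of the D-Wave client actually raises may differ, so check the last line of your own traceback. Because sample_qubo may return before the result arrives, a connection error can surface only when the result is resolved, so the wrapped function should include that step.

```python
import time


def call_with_retry(fn, max_attempts=5, base_delay=1.0,
                    retry_on=(OSError,), sleep=time.sleep):
    """Call fn() and retry with exponential backoff if it raises retry_on.

    retry_on=(OSError,) is an assumption: adjust it to the exception types
    your client library actually raises. Other exceptions propagate at once.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and propagate the last error
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical usage with the sampler and a sub-QUBO Q from the question:
# def sample_once():
#     sampleset = sampler.sample_qubo(Q, num_reads=300)
#     sampleset.resolve()  # surface connection errors inside the retry
#     return sampleset
# sampleset = call_with_retry(sample_once)
```

Passing sleep as a parameter keeps the backoff testable without real waiting; in production the default time.sleep is used.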
Comments
Koji,
Without seeing the code or a network trace, I cannot say for sure, but the D-Wave servers, or some other network device along the way, may be interpreting your high number of repeated requests as a denial-of-service attack or some other abuse of the system. You might try throttling your requests by inserting a one- or two-second delay every 100 or 200 requests.
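The throttling idea can be sketched as a generator that pauses before every `every`-th item after the first batch. The helper name throttled and its defaults (100 items, 1 second) are illustrative assumptions based on the numbers suggested here, not documented limits of the service.

```python
import time


def throttled(items, every=100, pause=1.0, sleep=time.sleep):
    """Yield items, sleeping `pause` seconds after each batch of `every` items.

    The pause happens before item index 100, 200, ... (0-based), so there is
    no pointless wait after the final item.
    """
    for i, item in enumerate(items):
        if i and i % every == 0:
            sleep(pause)
        yield item


# Hypothetical usage, where sub_qubos is an iterable of sub-QUBO dicts:
# for Q in throttled(sub_qubos, every=100, pause=1.0):
#     sampleset = sampler.sample_qubo(Q, num_reads=300)
```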
Also, if you are repeating runs only to obtain a sample that contains the result you are looking for, try increasing num_reads per sample_qubo call (for example, 5000 instead of 300), which reduces the number of API calls you need.
I hope this helps.
Ed
Hello,
If you could provide a simple code example showing the order in which the functions are called, it would help us better understand where exactly the issue is occurring.
We have, for instance, seen errors when many instances of DWaveSampler are being created consecutively.
Usually this happens because the user has put the DWaveSampler() constructor call inside the loop along with the sample_qubo() call.
To fix this issue, the DWaveSampler can be instantiated before the loop, and then the sample_qubo() calls can be made inside of the loop.
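The construct-once pattern can be sketched as follows. The sampler factory is passed in so the sketch runs without QPU access; with real hardware it might be `lambda: DWaveCliqueSampler(solver='Advantage_system4.1')`, as the asker describes. The function name solve_subproblems and the list of sub-QUBOs are hypothetical.

```python
def solve_subproblems(make_sampler, subproblems, num_reads=300):
    """Create the sampler once, then reuse it for every sample_qubo call."""
    sampler = make_sampler()  # constructed once, outside the loop
    best = []
    for Q in subproblems:
        # Only the sampling call is repeated inside the loop.
        sampleset = sampler.sample_qubo(Q, num_reads=num_reads)
        best.append(sampleset.first)  # lowest-energy sample of this call
    return best
```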
Once we have a few more details, it will be easier to determine the source of the issue and how to resolve it.
Thank you very much for the prompt reply.
My solver first calls DWaveCliqueSampler(solver='Advantage_system4.1') and then calls sample_qubo repeatedly, so the DWaveCliqueSampler constructor is called only once.
Since my solver's source code contains technical content that we do not want to reveal, I will write a simpler version that reproduces the error and post it here.
I found a flaw in my solver that could make it use far more memory than the host server has. Since I fixed it, the solver has not terminated abnormally. The flaw was using a defaultdict to store the values of the large input QUBO matrix.
Since the target QUBO matrices are sparse and most of their elements are zero, I used a defaultdict so that only non-zero elements would be stored. However, reading a key that is not in a defaultdict inserts a new item {key: 0}. My solver reads many zero elements of the QUBO matrix when it builds each small QUBO matrix to be solved on the D-Wave quantum annealer, so it kept generating new {key: 0} items. I modified the solver to store the QUBO matrix in a plain dict and to check whether a key exists before reading it. With this change, no {key: 0} items are created, and performance is much better. So far, the solver has not terminated abnormally in several experiments.
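The defaultdict pitfall is easy to reproduce: indexing a defaultdict with a missing key calls its default factory and stores the result, whereas dict.get returns a default without modifying the dictionary. The toy QUBO values and the helper extract_sub_qubo are hypothetical illustrations, not the asker's code.

```python
from collections import defaultdict

# Sparse toy QUBO: only non-zero coefficients are stored.
Q_sparse = defaultdict(int, {(0, 0): -1, (0, 1): 2, (1, 1): -1})
Q_plain = dict(Q_sparse)  # ordinary dict with the same 3 entries

# Indexing a defaultdict with a missing key inserts {key: 0} as a side effect.
_ = Q_sparse[(5, 7)]
print(len(Q_sparse))  # 4


def extract_sub_qubo(Q, variables):
    """Hypothetical helper: copy the non-zero terms of Q among `variables`.

    Q.get() reads without inserting, so zero entries are never materialized.
    """
    sub = {}
    for i in variables:
        for j in variables:
            value = Q.get((i, j), 0)
            if value != 0:
                sub[(i, j)] = value
    return sub


print(extract_sub_qubo(Q_plain, [0, 1, 2]))  # {(0, 0): -1, (0, 1): 2, (1, 1): -1}
print(len(Q_plain))  # 3: reading zero entries added nothing
```

Note that defaultdict.get also does not insert; only the `Q[key]` indexing form triggers the default factory.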
Hello,
It's great to hear that you were able to find the issue that was causing the slowdown.
Are you still seeing the connection error?
If this is a new issue, it might be a good idea to start a new thread that is specifically about this issue.
We will still need a simple code example that reproduces the issue you are seeing.
If you could clarify where the process gets stuck and does not terminate, it will help us better understand what is happening.
Thank you for your communication and patience.
Thank you very much for your comments.
Since fixing the misuse of defaultdict, I have not seen the connection error in several experiments. Because the error occurs only intermittently, I am still not sure whether this misuse caused it.
Below is sample code extracted from my solver. The actual solver is much more complicated, but DWaveCliqueSampler and sample_qubo are called in the same way.
-----
Hello,
Thank you for sharing your code example.
This all looks like it should be ok.
If you do see the issue again, please let us know.
Accessing result.first resolves the result before proceeding, but the timing information here will be a bit off, because the sample functions are asynchronous.
You can call result.resolve() before rtime = time.perf_counter()-start so that the measured time includes waiting for the result.
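A minimal timing sketch, assuming the returned sampleset exposes resolve() as described here: start the clock, submit, resolve, then stop the clock. timed_sample_qubo is a hypothetical helper name; start and rtime mirror the variables in the asker's code.

```python
import time


def timed_sample_qubo(sampler, Q, **kwargs):
    """Return (sampleset, seconds), timing until the result has arrived.

    sample_qubo may return before the result is available; resolve()
    blocks until it is, so the measured time includes that wait.
    """
    start = time.perf_counter()
    sampleset = sampler.sample_qubo(Q, **kwargs)
    sampleset.resolve()  # wait for the asynchronous result
    rtime = time.perf_counter() - start
    return sampleset, rtime


# Hypothetical usage:
# sampleset, rtime = timed_sample_qubo(sampler, Q, num_reads=300)
```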
Please let us know if you have any more questions or if you see any of the issues you were seeing before.
I will report back if I see the same connection error again.
I did not know that the sample functions are non-blocking. Thank you very much for the information; this is very important for evaluating performance.