Use of scipy in SDK

Hello,
I have been working towards documenting how to compile DWave's SDK for FreeBSD-13.1, using the native llvm-based toolchain and only adding parts as needed.  This has moved along well enough until I hit scipy and its F77 code.  I started with LFortran (after adding it as a compiler to the build system) but it is missing some key features (COMMON and SAVE).  I then attempted to use f2c (David M. Gay's Fortran to C converter).  f2c threw up some warnings but emitted C code, which clang rejected the C code for multiple reasons.  Digging deep into why, I found some  seriously dubious code in scipy.

To quote the authors of code that f2c warned about and clang rejected:

"The Fortran source codes in this distribution pass real*8 variables as integer variables, integers as real*8’s, real*8’s as complex*16’s, and so on. This is common practice in numerical codes, and is not an error;" 

For example, in the arrays transforming some of the vectors, there is this code

scipy/scipy/linalg/src/id_dist/src/id_rtrans.f

        subroutine idd_random_transf_init(nsteps,n,w,keep)
        implicit real *8 (a-h,o-z)
        save
        dimension w(*)
   ...     

        w(1)=ialbetas+0.1
        w(2)=iixs+0.1
        w(3)=nsteps+0.1
        w(4)=iww+0.1
        w(5)=n+0.1
        
For those unfamiliar with Fortran (I am a newbie myself and had to 1) look stuff up and 2) consult with a Fortran expert), w is declared to be an array of real *8 (single precision) and any variable starting with i to n inclusive are implicitly declared to be integers.  Clearly the elements of w are real and not integers.  However, much later in the code

        subroutine idd_random_transf_inverse(x,y,w)
        implicit real *8 (a-h,o-z)
        save
        dimension x(*),y(*),w(*)
...
        ialbetas=w(1)
        iixs=w(2)
        nsteps=w(3)
        iww=w(4)
        n=w(5)
...
        call idd_random_transf0_inv(nsteps,x,y,n,w(iww),
     1      w(ialbetas),w(iixs))
       
where the real values obtained from w are called as indexes into arrays.

I do know from personal experience and consulting with the aforementioned Fortran expert that this is not "common practice."  I am quite concerned about any compiler that would accept the above practice.  An index of 1.5000000 is not prohibited in the above code, and as far as I can tell, at least one open source Fortran compiler accepts this.  As a real number the value should be in floating point registers, but when used as an integer, it has to go through conversion to go into integer registers and vice-versa.  When stored in memory, these will have radically different values.  I believe under some conditions, the results would simply be unpredictable, even if regression testing is not revealing any flaws in the logic.

In my opinion, even with or without warnings and flags to allow it, any compiler that accepts the above is suspect.  It suggests to me that other dubious coding practices would also pass, as this is merely a cursory examination of Fortran code.  Even with some loose type inherent to Fortran 77, I would question if the abovecode meets F77 standards, and the file is clearly an F77 file.

I have had discussions with other people more familiar with GitHub than I am and they confirm this is recently written and committed code, from 2013 (not legacy code).  In my opinion, this points to a decline in standards within the scipy community, as the above code should never have made it into the tree for such a critically important module.  The above code is not the only example of these coding practices.

I post this because I believe with quantum computing being in its infancy and with DWave using quantum annealing instead of quantum gates, the use of scipy and a compiler that accepts the above coding practices is a potential target for criticism if not outright failure.

Perhaps DWave could post (or has posted) the particular parts of scipy the SDK uses, so that we could review that code for any possible issues and extract the subset necessary for SDK functionality, with code fixes and/or replacement as needed.

Sincerely,
Gregory

0

Comments

6 comments
  • I do not seem to be able to edit, so I will comment to my own post.

     

    According to the SunSoft document Fortran 77 version 4.0

    https://docs.oracle.com/cd/E19957-01/802-2998/802-2998.pdf

    real *8  are double precision, not single

     

    There is the comment on page 16

    "f77 allows the BYTE type as an array index, just as it allows the REAL type, but it does not allow BYTE as a DO loop index (where it allows only INTEGER, REAL, and DOUBLE PRECISION). Wherever FORTRAN 77 makes an explicit check for INTEGER, it does not allow BYTE."

    It also says on page 180

    1. Do not use INTEGER*8 variables or 8-byte constants or expressions when indexing arrays, otherwise, only 4 low-order bytes are taken into account. This action can cause unpredictable results in your program if the index value exceeds the range for 4-byte integers."

    2.  

    Therefore, a compiler that sticks to the F77 standard may have different interpretations of 64 bit indices.  The authors of the scipy code explicitly note that reals*8 are passed as integers, so therefore I believe it is reasonable to conclude this might get extended to integer*8 in the compiler.

     

    Someone helping me research this has tested the following code, compiled with gcc:

    ~/Documents/tmp$ cat nasty.f
          PROGRAM NASTY
          INTEGER K(100)
          DOUBLE PRECISION D
          DO 1 J=1,100
            K(J)=J
        1 CONTINUE
          READ(5,*) D
          WRITE(6,*) D,K(D)
          STOP

          END

     

    ~/Documents/tmp$ ./nasty
    4.78
       4.7800000000000002                4

     

    While it may be possible to get consistent results with the same compiler, from my experience a compiler that relies on hardware conversions between integer and floating point will have results dependent on that hardware.  When reviewing some documentation on RISC-V, I see they use rounding, which means it is possible the above code would yield 5, not 4.

    From a historical perspective, the use of reals as an index was likely acceptable because array indexes did not need to be contiguous.  According to

    https://www.h-schmidt.net/FloatConverter/IEEE754.html

    the integer 4 is 0x40800000 in 32 bits and according to

    https://binaryconvert.com/result_double.html?decimal=052

    is 0x4010000000000000 in 64 bit double precision.  In theory, when taken literally, both are legitimate indexes.  However, when taken as actual values, the results are potentially different.  Also, the integer 5 is 0x4014000000000000 in 64 bit double precision.  A very different index.

     

    I have to say, I think this needs to be reviewed for its impact on the DWave SDK.

    Sincerely,

    Gregory

     

    0
    Comment actions Permalink
  • Thank you for reaching out to us.
    Have you reached out to SciPy to create an issue in their GitHub?
    https://github.com/scipy/scipy/issues
    All usage of SciPy can be found in our open-source Ocean Tools GitHub repository.

    0
    Comment actions Permalink
  • I am not a member of Microsoft's GitHub, but I reached out to one of the lead developers of the LFortran effort, as they have been trying to compile scipy as well.  He located the original PR and posted his thoughts on the commit:

     

    https://github.com/scipy/scipy/pull/2616

     

    At the time of the commit, it was flagged by rgommers as not properly compiling.  They were told to ignore the warnings and enable flags that would allow it to compile unsafely.

     

    I will review the Ocean Tools repository and see if I can discern if it calls any of the questionable code in scipy.  However, it is possible that it does so indirectly through hierarchical calls, and I am not familiar with Python or its tools.

    0
    Comment actions Permalink
  • Although I am not an expert on Microsoft's GitHub, it looks like there is no direct calling into scipy, but instead of a dependency on pyqubo, which depends on scipy.  Please confirm.

     

    Thank you,

    Gregory

    0
    Comment actions Permalink
  • I found that in addition to minorminer and penaltymodel depending on scipy, dwave-system requires it.  From the origination of the questionable code, it appears to have been merged into scipy.linalg.interpolative.  Please confirm this module is not used.

     

    Thank you,

    Gregory

    0
    Comment actions Permalink
  • We have not seen any issues with SciPy, but will continue to monitor.
    Since this is a third party library, we kindly ask you to direct code-related concerns to the SciPy library owner.
    If you create a free GitHub account you will be able to create an issue in the SciPy repo and comment on existing issues.

    0
    Comment actions Permalink

Post is closed for comments.

Didn't find what you were looking for?

New post