s_gd2 typeerror #95

Open
scottgigante opened this issue Jun 30, 2020 · 13 comments

@scottgigante
Collaborator

TypeError                                 Traceback (most recent call last)
<ipython-input-1-9418f70a3d50> in <module>
      1 import phate
----> 2 Y = phate.PHATE(knn_dist='precomputed').fit_transform(A)

/mnt/eider_environments/EiderPython/local/apollo/env/EiderPython/python3.7/lib/python3.7/site-packages/phate/phate.py in fit_transform(self, X, **kwargs)
    939         with _logger.task("PHATE"):
    940             self.fit(X)
--> 941             embedding = self.transform(**kwargs)
    942         return embedding
    943 

/mnt/eider_environments/EiderPython/local/apollo/env/EiderPython/python3.7/lib/python3.7/site-packages/phate/phate.py in transform(self, X, t_max, plot_optimal_t, ax)
    908                         n_jobs=self.n_jobs,
    909                         seed=self.random_state,
--> 910                         verbose=max(self.verbose - 1, 0),
    911                     )
    912             if isinstance(self.graph, graphtools.graphs.LandmarkGraph):

/mnt/eider_environments/EiderPython/local/apollo/env/EiderPython/python3.7/lib/python3.7/site-packages/phate/mds.py in embed_MDS(X, ndim, how, distance_metric, solver, n_jobs, seed, verbose)
    228         try:
    229             # use sgd2 if it is available
--> 230             Y = sgd(X_dist, n_components=ndim, random_state=seed, init=Y_classic)
    231             if np.any(~np.isfinite(Y)):
    232                 _logger.warning("Using SMACOF because SGD returned NaN")

</mnt/eider_environments/EiderPython/local/apollo/env/EiderPython/lib/python3.7/site-packages/decorator.py:decorator-gen-157> in sgd(D, n_components, random_state, init)

/mnt/eider_environments/EiderPython/local/apollo/env/EiderPython/python3.7/lib/python3.7/site-packages/scprep/utils.py in _with_pkg(fun, pkg, min_version, *args, **kwargs)
     81         check_version(pkg, min_version=min_version)
     82         __imported_pkgs.add((pkg, min_version))
---> 83     return fun(*args, **kwargs)
     84 
     85 

/mnt/eider_environments/EiderPython/local/apollo/env/EiderPython/python3.7/lib/python3.7/site-packages/phate/mds.py in sgd(D, n_components, random_state, init)
     82     D = squareform(D)
     83     # Metric MDS from s_gd2
---> 84     Y = s_gd2.mds_direct(N, D, init=init, random_seed=random_state)
     85     return Y
     86 

/mnt/eider_environments/EiderPython/local/apollo/env/EiderPython/python3.7/lib/python3.7/site-packages/s_gd2/s_gd2.py in mds_direct(n, d, w, etas, num_dimensions, random_seed, init)
     82 
     83     # do mds
---> 84     cpp.mds_direct(X, d, w, etas, random_seed)
     85     return X
     86 

TypeError: Array of type 'double' required.  A 'unknown type' was given
@scottgigante scottgigante self-assigned this Jun 30, 2020
@trberg

trberg commented Feb 3, 2021

Is there a resolution to this error? I keep running into this problem. I've been using pandas dataframes and I've tried changing data types with the same result.

Thanks!

@scottgigante
Collaborator Author

Could you post the data and code you're using that produces the error? I'm having a hard time reproducing it.

In the meantime, you can avoid the error by using mds_solver='smacof'.

@trberg

trberg commented Feb 3, 2021

I can't post all the data, but I've included a small printout of the data below.

data = pd.read_csv("path/to/data.csv", nrows=100)
data = data.set_index("sample_id")
data = data.astype(np.float64)

data_phate = phate_op.fit_transform(data)

Here is the error this code outputs.

           002  003  004  005  006  007  008  009  010  ...  44786754  44786774  44786872  44787062  44816559   45771331  46234829  46235085  46235338
sample_id                                               ...                                                                                           
1    0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...       0.0       0.0       0.0       0.0       0.0  82.975610       0.0       0.0       0.0
2    0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...       0.0       0.0       0.0       0.0       0.0  91.886364       0.0       0.0       0.0
3    0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...       0.0       0.0       0.0       0.0       0.0  85.580645       0.0       0.0       0.0
4    0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...       0.0       0.0       0.0       0.0       0.0  89.466667       0.0       0.0       0.0
5    0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...       0.0       0.0       0.0       0.0       0.0   0.000000       0.0       0.0       0.0
...        ...  ...  ...  ...  ...  ...  ...  ...  ...  ...       ...       ...       ...       ...       ...        ...       ...       ...       ...
96   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...       0.0       0.0       0.0       0.0       0.0  97.828571       0.0       0.0       0.0
97   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...       0.0       0.0       0.0       0.0       0.0  97.408163       0.0       0.0       0.0
98   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...       0.0       0.0       0.0       0.0       0.0  97.040816       0.0       0.0       0.0
99   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...       0.0       0.0       0.0       0.0       0.0  94.113924       0.0       0.0       0.0
100  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...       0.0       0.0       0.0       0.0       0.0  88.694444       0.0       0.0       0.0

[100 rows x 3335 columns]
Calculating PHATE...
  Running PHATE on 100 observations and 3335 variables.
  Calculating graph and diffusion operator...
/data/users/trberg/anaconda3/lib/python3.7/site-packages/graphtools/graphs.py:121: UserWarning: Building a kNNGraph on data of shape (100, 3335) is expensive. Consider setting n_pca.
  UserWarning,
    Calculating KNN search...
    Calculated KNN search in 0.11 seconds.
    Calculating affinities...
  Calculated graph and diffusion operator in 0.20 seconds.
  Calculating optimal t...
    Automatically selected t = 9
  Calculated optimal t in 0.04 seconds.
  Calculating diffusion potential...
  Calculating metric MDS...
  Calculated metric MDS in 0.01 seconds.
Calculated PHATE in 0.26 seconds.
Traceback (most recent call last):
  File "feature_reduction.py", line 74, in <module>
    data_phate = phate_op.fit_transform(data)
  File "/data/users/trberg/anaconda3/lib/python3.7/site-packages/phate/phate.py", line 941, in fit_transform
    embedding = self.transform(**kwargs)
  File "/data/users/trberg/anaconda3/lib/python3.7/site-packages/phate/phate.py", line 910, in transform
    verbose=max(self.verbose - 1, 0),
  File "/data/users/trberg/anaconda3/lib/python3.7/site-packages/phate/mds.py", line 230, in embed_MDS
    Y = sgd(X_dist, n_components=ndim, random_state=seed, init=Y_classic)
  File "</data/users/trberg/anaconda3/lib/python3.7/site-packages/decorator.py:decorator-gen-146>", line 2, in sgd
  File "/data/users/trberg/anaconda3/lib/python3.7/site-packages/scprep/utils.py", line 83, in _with_pkg
    return fun(*args, **kwargs)
  File "/data/users/trberg/anaconda3/lib/python3.7/site-packages/phate/mds.py", line 84, in sgd
    Y = s_gd2.mds_direct(N, D, init=init, random_seed=random_state)
  File "/data/users/trberg/anaconda3/lib/python3.7/site-packages/s_gd2/s_gd2.py", line 84, in mds_direct
    cpp.mds_direct(X, d, w, etas, random_seed)
TypeError: Array of type 'double' required.  A 'unknown type' was given

@scottgigante
Collaborator Author

Could you please run the following:

data = pd.read_csv("path/to/data.csv", nrows=100)
data = data.set_index("sample_id")
data = data.astype(np.float64)
data.to_pickle("data.pickle.gz")

and then drag data.pickle.gz into your reply? That should be small enough to post.

@trberg

trberg commented Feb 4, 2021

The issue isn't the size of the data; it's sensitive biomedical data that I don't have permission to upload in full.

@trberg

trberg commented Feb 4, 2021

But what you're seeing in my comment above is pretty much what it looks like.

@scottgigante
Collaborator Author

Unfortunately, if I'm unable to view the data, it's going to be difficult to diagnose. I tried to replicate data like yours and it runs fine.

>>> import numpy as np
>>> import pandas as pd
>>> import phate
>>> data = pd.DataFrame(np.random.normal(0, 1, (100, 3335)))
>>> data.index.name = "sample_id"
>>> data = data.astype(np.float64)
>>> phate_op = phate.PHATE()
>>> data_phate = phate_op.fit_transform(data)
Calculating PHATE...
  Running PHATE on 100 observations and 3335 variables.
  Calculating graph and diffusion operator...
/home/scottgigante/.local/lib/python3.8/site-packages/graphtools/graphs.py:118: UserWarning: Building a kNNGraph on data of shape (100, 3335) is expensive. Consider setting n_pca.
  warnings.warn(
    Calculating KNN search...
    Calculated KNN search in 0.08 seconds.
    Calculating affinities...
    Calculated affinities in 0.01 seconds.
  Calculated graph and diffusion operator in 0.10 seconds.
  Calculating optimal t...
    Automatically selected t = 3
  Calculated optimal t in 0.02 seconds.
  Calculating diffusion potential...
  Calculating metric MDS...
  Calculated metric MDS in 0.01 seconds.
Calculated PHATE in 0.14 seconds.

Some diagnostics that might help:

import numpy as np
import phate
import s_gd2
print(phate.__version__)
print(s_gd2.__version__)

print(np.all([d == np.dtype('float64') for d in data.dtypes]))
print(data.sum(axis=0).tolist())
print(data.sum(axis=1).tolist())
print(np.all(np.isfinite(data)))

@trberg

trberg commented Feb 4, 2021

So here are some results from this code.

print(phate.__version__)         1.0.4
print(s_gd2.__version__)         1.7

print(np.all([d == np.dtype('float64') for d in data.dtypes]))      True
print(np.all(np.isfinite(data)))                                    True
print (data.values.min(), data.values.max())                        0.0     10000000.0

@scottgigante
Collaborator Author

The first thing I would do is upgrade both of those packages and try again. If you're still having trouble, you could send me just the PHATE kernel, which wouldn't contain any identifying information from your original data:

import pickle
import gzip
with gzip.open('kernel.pickle.gz', 'wb') as f: 
    pickle.dump(phate_op.graph.kernel, f)

@trberg

trberg commented Feb 4, 2021

So the update didn't fix the issue, and when I ran the zipping and pickling code, I got this error.

Traceback (most recent call last):
  File "feature_reduction.py", line 94, in <module>
    get_phate_transform(data)
  File "feature_reduction.py", line 62, in get_phate_transform
    pickle.dump(phate_op.graph.kernel, f)
AttributeError: 'NoneType' object has no attribute 'kernel'

@scottgigante
Collaborator Author

Oops, sorry -- you'll need to run phate_op.fit(data) first.
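
The AttributeError above comes from phate_op.graph still being None before fitting. A minimal sketch of a guarded export follows; dump_kernel and load_kernel are hypothetical helper names (not part of phate), and the only assumption about phate is the graph.kernel attribute already used in this thread:

```python
import gzip
import pickle


def dump_kernel(phate_op, path="kernel.pickle.gz"):
    """Pickle phate_op.graph.kernel into a gzip-compressed file.

    Hypothetical helper, not part of phate. It assumes only that
    phate_op.graph is None until fit() has been called.
    """
    graph = getattr(phate_op, "graph", None)
    if graph is None:
        # Fail early with a hint instead of the bare AttributeError.
        raise RuntimeError("No graph found; call phate_op.fit(data) first.")
    with gzip.open(path, "wb") as f:
        pickle.dump(graph.kernel, f)
    return path


def load_kernel(path="kernel.pickle.gz"):
    """Load a kernel previously written by dump_kernel."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)
```

Intended usage would be phate_op.fit(data) followed by dump_kernel(phate_op); the resulting file is what gets attached to the reply.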

@trberg

trberg commented Feb 4, 2021

Here is the kernel.
kernel.pickle.gz

@scottgigante
Collaborator Author

I've tested this on Python 3.6 on Windows Subsystem for Linux, Python 3.7 (Anaconda) on Windows, and Python 3.8 on Arch Linux. All work fine.

>>> import phate
>>> import pickle
>>> import gzip
>>> with gzip.open("kernel.pickle.gz") as f:
...     K = pickle.load(f)
>>> phate_op = phate.PHATE(knn_dist='precomputed_affinity')
>>> phate_op.fit_transform(K)

Can you check the versions of the following packages? (You'll need to run this in PowerShell and double the backslashes if you're on Windows.)

python -VV
pip freeze | grep "^\(cycler\|decorator\|Deprecated\|future\|graphtools\|joblib\|kiwisolver\|matplotlib\|numpy\|packaging\|pandas\|phate\|Pillow\|PyGSP\|pyparsing\|python\-dateutil\|pytz\|s\-gd2\|scikit\-learn\|scipy\|scprep\|six\|tasklogger\|threadpoolctl\|wrapt\)=="
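
If running the shell pipeline is inconvenient (for example on Windows), roughly the same information can be collected from Python itself. This is only a sketch: it assumes Python 3.8+ for importlib.metadata, and report_versions is a name made up for this example. The package list is copied from the grep pattern above.

```python
import sys
from importlib import metadata

# Package names taken from the grep pattern in this comment.
PACKAGES = [
    "cycler", "decorator", "Deprecated", "future", "graphtools", "joblib",
    "kiwisolver", "matplotlib", "numpy", "packaging", "pandas", "phate",
    "Pillow", "PyGSP", "pyparsing", "python-dateutil", "pytz", "s-gd2",
    "scikit-learn", "scipy", "scprep", "six", "tasklogger", "threadpoolctl",
    "wrapt",
]


def report_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print(sys.version)
    for name, version in report_versions().items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Unlike the grep pipeline, this also lists packages that are missing, which can matter when comparing environments.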

My versions, for reference:

On Arch virtualenv:

Python 3.8.5 (default, Sep  5 2020, 10:50:12)
[GCC 10.2.0]

cycler==0.10.0
decorator==4.4.2
Deprecated==1.2.11
future==0.18.2
graphtools==1.5.2
joblib==1.0.0
kiwisolver==1.3.1
matplotlib==3.3.4
numpy==1.20.0
packaging==20.9
pandas==1.2.1
phate==1.0.6
Pillow==8.1.0
PyGSP==0.5.1
pyparsing==2.4.7
python-dateutil==2.8.1
pytz==2021.1
s-gd2==1.8
scikit-learn==0.24.1
scipy==1.6.0
scprep==1.0.12
six==1.15.0
tasklogger==1.0.0
threadpoolctl==2.1.0
wrapt==1.12.1

On Arch:

Python 3.8.5 (default, Sep  5 2020, 10:50:12)
[GCC 10.2.0]

cycler==0.10.0
decorator==4.4.2
Deprecated==1.2.10
future==0.18.2
graphtools==1.5.2
joblib==0.16.0
kiwisolver==1.2.0
matplotlib==3.3.1
numpy==1.19.4
packaging==20.4
pandas==1.1.2
phate==1.0.4
Pillow==7.2.0
PyGSP==0.5.1
pyparsing==2.4.7
python-dateutil==2.8.1
pytz==2020.1
s-gd2==1.7
scikit-learn==0.23.2
scipy==1.5.2
six==1.15.0
tasklogger==1.0.0
threadpoolctl==2.1.0
wrapt==1.12.1

On WSL:

Python 3.6.9 (default, Nov  7 2019, 10:44:02)
[GCC 8.3.0]

cycler==0.10.0
decorator==4.4.2
Deprecated==1.2.10
future==0.18.2
graphtools==1.5.2
joblib==0.16.0
kiwisolver==1.2.0
matplotlib==3.3.0
numpy==1.19.4
packaging==20.4
pandas==1.0.5
phate==1.0.4
Pillow==7.2.0
PyGSP==0.5.1
pyparsing==2.4.7
python-dateutil==2.8.1
pytz==2020.1
s-gd2==1.8
scikit-learn==0.23.1
scipy==1.5.2
scprep==1.0.10
six==1.15.0
tasklogger==1.0.0
threadpoolctl==2.1.0
wrapt==1.12.1

On Windows:

Python 3.7.6 (default, Jan  8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)]

cycler==0.10.0
decorator==4.4.2
Deprecated==1.2.10
future==0.18.2
graphtools==1.5.1
joblib==0.14.1
kiwisolver==1.1.0
matplotlib==3.2.1
numpy==1.18.1
packaging==20.3
pandas==1.0.3
phate==1.0.4
Pillow==7.0.0
PyGSP==0.5.1
pyparsing==2.4.6
python-dateutil==2.8.1
pytz==2019.3
s-gd2==1.7
scikit-learn==0.22.2.post1
scipy==1.4.1
scprep==1.0.4
six==1.14.0
tasklogger==1.0.0
wrapt==1.12.1
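
To see at a glance where two of these environments differ, the listings can be diffed programmatically. A small sketch, with hypothetical helper names parse_freeze and diff_freeze, that only handles plain name==version lines (the sample strings are excerpts from the "Arch virtualenv" and "Windows" listings above):

```python
def parse_freeze(text):
    """Parse `name==version` lines into a dict keyed by lowercase name."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if "==" not in line or line.startswith("#"):
            continue  # skip blanks, comments, and non-pinned entries
        name, version = line.split("==", 1)
        versions[name.strip().lower()] = version.strip()
    return versions


def diff_freeze(a, b):
    """Return {name: (version_in_a, version_in_b)} for every mismatch.

    A package missing from one side is reported with None on that side.
    """
    left, right = parse_freeze(a), parse_freeze(b)
    return {
        name: (left.get(name), right.get(name))
        for name in sorted(set(left) | set(right))
        if left.get(name) != right.get(name)
    }


# Excerpts from the listings in this comment.
arch_venv = """
numpy==1.20.0
phate==1.0.6
s-gd2==1.8
tasklogger==1.0.0
threadpoolctl==2.1.0
"""
windows = """
numpy==1.18.1
phate==1.0.4
s-gd2==1.7
tasklogger==1.0.0
"""

for name, (a, b) in diff_freeze(arch_venv, windows).items():
    print(f"{name}: {a} vs {b}")
```

Pasting the reporter's own pip freeze output as one of the two strings would show which packages differ from a working setup.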
