Be able to use minkowski metric with p < 1 #107

Open
ivan-marroquin opened this issue Jun 10, 2021 · 2 comments

Comments

@ivan-marroquin

Describe the bug
It seems that PHATE supports the minkowski metric for both the MDS and kNN computations, so I would like to run experiments with this metric and p= 0.3. However, phate.PHATE does not accept 'p= 0.3' as an argument.

Thanks for your help,

Ivan

To Reproduce
embedding= phate.PHATE(n_components= intrinsic_dim, knn= 5, decay= None, n_landmark= 2000, t= 'auto',
                       gamma= 1.0, n_pca= input_data.shape[1], mds_solver= 'smacof',
                       knn_dist= 'minkowski', mds_dist= 'minkowski', mds= 'classic', random_state= 1969,
                       n_jobs= cpu_count, verbose= False, p= 0.3)

Expected behavior
Initializing the phate object should accept 'p= 0.3' as one of its parameters.

Actual behavior
Traceback (most recent call last):
  File "test_phenograph_clustering.py", line 94, in <module>
    projected_data= embedding.fit_transform(X= input_data)
  File "C:\Temp\Python\Python3.6.5\lib\site-packages\phate\phate.py", line 961, in fit_transform
    self.fit(X)
  File "C:\Temp\Python\Python3.6.5\lib\site-packages\phate\phate.py", line 853, in fit
    **(self.kwargs)
  File "C:\Temp\Python\Python3.6.5\lib\site-packages\graphtools\api.py", line 288, in Graph
    return Graph(**params)
  File "C:\Temp\Python\Python3.6.5\lib\site-packages\graphtools\graphs.py", line 132, in __init__
    super().__init__(data, n_pca=n_pca, **kwargs)
  File "C:\Temp\Python\Python3.6.5\lib\site-packages\graphtools\graphs.py", line 524, in __init__
    super().__init__(data, **kwargs)
  File "C:\Temp\Python\Python3.6.5\lib\site-packages\graphtools\base.py", line 1019, in __init__
    super().__init__(data, **kwargs)
  File "C:\Temp\Python\Python3.6.5\lib\site-packages\graphtools\base.py", line 135, in __init__
    super().__init__(**kwargs)
  File "C:\Temp\Python\Python3.6.5\lib\site-packages\graphtools\base.py", line 505, in __init__
    super().__init__(**kwargs)
TypeError: __init__() got an unexpected keyword argument 'p'
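
The traceback suggests the cause: each constructor forwards the keyword arguments it does not recognise up the chain via `super().__init__(**kwargs)`, so `p` travels upward until it reaches a constructor that has no parameter for it. A minimal sketch of that pattern follows; the class names are hypothetical and are not the real graphtools classes.

```python
class BaseGraph:
    # Hypothetical base class: accepts only the arguments it knows about.
    def __init__(self, knn=5, distance="euclidean"):
        self.knn = knn
        self.distance = distance


class DataGraph(BaseGraph):
    # Hypothetical subclass: consumes n_pca and forwards everything else.
    def __init__(self, n_pca=None, **kwargs):
        self.n_pca = n_pca
        super().__init__(**kwargs)


def try_build(**params):
    """Return the error message if construction fails, else None."""
    try:
        DataGraph(**params)
    except TypeError as err:
        return str(err)
    return None


ok = try_build(knn=5, distance="minkowski")
failure = try_build(knn=5, distance="minkowski", p=0.3)
print(failure)
```

In this sketch the extra `p` only fails at the last constructor in the chain, which is why the error surfaces deep inside the library rather than at the call site.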

System information:

Output of phate.__version__:

Please run phate.__version__ and paste the results here.

You can do this with `python -c 'import phate; print(phate.__version__)'`
phate-1.0.7

Output of pd.show_versions():

Please run pd.show_versions() and paste the results here.

You can do this with `python -c 'import pandas as pd; pd.show_versions()'`
INSTALLED VERSIONS
------------------
commit           : None
python           : 3.6.5.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
machine          : AMD64
processor        : Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : None.None

pandas           : 0.25.0
numpy            : 1.19.5
pytz             : 2018.5
dateutil         : 2.7.3
pip              : 9.0.3
setuptools       : 41.0.1
Cython           : 0.29.14
pytest           : 6.0.1
hypothesis       : None
sphinx           : 2.3.1
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : 0.9999999
pymysql          : None
psycopg2         : None
jinja2           : 2.11.0
IPython          : 7.11.1
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : 3.2.2
numexpr          : 2.7.3
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
s3fs             : None
scipy            : 1.5.4
sqlalchemy       : None
tables           : 3.6.1
xarray           : None
xlrd             : 1.2.0
xlwt             : None
xlsxwriter       : None

Additional context
Python 3.6.5 with Deprecated-1.2.12 graphtools-1.5.2 phate-1.0.7 pygsp-0.5.1 s-gd2-1.8 scprep-1.1.0 tasklogger-1.1.0

@scottgigante
Collaborator

Hi @ivan-marroquin ,

PHATE doesn't currently offer this functionality, though the new maintainers may choose to add it. In the meantime, you should be able to achieve your desired outcome with a custom metric function as follows:

import phate
from functools import partial
from scipy.spatial.distance import minkowski

dist_fn = partial(minkowski, p=0.3)
phate_op = phate.PHATE(knn_dist=dist_fn, mds_dist=dist_fn)

I'm leaving this issue open as a feature request but please let me know if the proposed alternative doesn't work and we can open up a bug report separately.
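
One caveat worth keeping in mind when experimenting with this workaround: for p < 1 the Minkowski formula no longer satisfies the triangle inequality, so it is not a metric in the strict sense. Whether that matters for a particular nearest-neighbour backend is something to verify rather than assume. The sketch below only demonstrates the property; `minkowski_p` is a hypothetical pure-Python helper, not the SciPy function.

```python
def minkowski_p(u, v, p):
    """Minkowski-style distance (sum |u_i - v_i|^p)^(1/p), for any p > 0."""
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


a, b, c = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)

# p = 2: the direct route a -> c is never longer than the detour via b.
direct_l2 = minkowski_p(a, c, 2)
detour_l2 = minkowski_p(a, b, 2) + minkowski_p(b, c, 2)

# p = 0.5: direct is (1 + 1) ** 2 = 4, while the detour is 1 + 1 = 2.
direct_frac = minkowski_p(a, c, 0.5)
detour_frac = minkowski_p(a, b, 0.5) + minkowski_p(b, c, 0.5)

print(direct_l2 <= detour_l2)
print(direct_frac > detour_frac)
```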

@ivan-marroquin
Author

Hi @scottgigante ,

Many thanks for the tip! Following your advice, I decided to use numba to define a minkowski metric. Here is the code:

import numba

@numba.njit(fastmath= True)
def fractional_dist(p_vec, q_vec, fraction= 0.1):
    # Minkowski-style distance with exponent `fraction`
    result= 0.0

    for isamp in range(0, p_vec.shape[0]):
        # Branching keeps the base non-negative, i.e. |p - q| ** fraction
        if (p_vec[isamp] > q_vec[isamp]):
            result += (p_vec[isamp] - q_vec[isamp]) ** fraction

        else:
            result += (q_vec[isamp] - p_vec[isamp]) ** fraction

    dist= result ** (1 / fraction)

    return dist

Next, I will compare its results with those from your proposed approach.

Ivan
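
Note that `fraction` defaults to 0.1, whereas the original request was for p= 0.3, and a custom metric is typically called with just the two vectors. A minimal sketch of binding the exponent with `functools.partial` and checking the result against the closed-form expression follows. `fractional_dist_py` is a plain-Python stand-in for the numba function (an assumption for illustration, so the snippet runs without numba).

```python
import math
from functools import partial


def fractional_dist_py(p_vec, q_vec, fraction=0.1):
    # Same arithmetic as the numba version, written with abs().
    total = 0.0
    for p_i, q_i in zip(p_vec, q_vec):
        total += abs(p_i - q_i) ** fraction
    return total ** (1.0 / fraction)


# Bind the exponent so the callable takes only the two vectors.
dist_03 = partial(fractional_dist_py, fraction=0.3)

x = [0.0, 2.0, 5.0]
y = [1.0, 2.0, 3.0]

# Closed form for these inputs: (1**0.3 + 0**0.3 + 2**0.3) ** (1 / 0.3)
expected = (1.0 + 0.0 + 2.0 ** 0.3) ** (1.0 / 0.3)
result = dist_03(x, y)
print(math.isclose(result, expected))
```

The same `partial` binding should apply to the numba-compiled function, assuming the caller passes NumPy arrays.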
