
Support R factor data type #18

Open
BoPeng opened this issue Jul 6, 2019 · 7 comments

Comments

@BoPeng
Contributor

BoPeng commented Jul 6, 2019

http://pandas-docs.github.io/pandas-docs-travis/user_guide/categorical.html

Pandas now has a categorical data type.
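For reference, a pandas categorical holds the same three pieces of information as an R factor. It has a set of levels (`categories`), an integer code per element (`codes`), and an `ordered` flag. One difference to keep in mind is that pandas codes are 0-based with `-1` for missing values, while R's codes are 1-based with `NA`. A small illustration:

```python
import pandas as pd

# Roughly the pandas analogue of the R factor
#   factor(c("low", "high", "low", NA), levels = c("low", "high"), ordered = TRUE)
s = pd.Series(
    pd.Categorical(["low", "high", "low", None],
                   categories=["low", "high"], ordered=True)
)

print(s.cat.categories.tolist())  # ['low', 'high']   (the levels)
print(s.cat.codes.tolist())       # [0, 1, 0, -1]     (0-based codes; missing is -1)
print(s.cat.ordered)              # True              (an ordered factor)
```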

@rezaeir
Contributor

rezaeir commented Jun 14, 2020

I used R's built-in InsectSprays dataset, which has a factor column, and imported it into SoS with %get. The data frame was imported without problems, and the factor column was interpreted as a category in pandas. I then deleted the dataset from my R variables and imported it back from SoS, and R recognized the column as a factor. It doesn't seem to have any problem. What should be fixed here?

@BoPeng
Contributor Author

BoPeng commented Jun 14, 2020

Your tests work because data frames are passed through feather/arrow. You should notice the problem if you create a standalone categorical variable (not a data frame column) and pass it directly.

@rezaeir
Contributor

rezaeir commented Jun 14, 2020

I just checked that! It doesn't import the variable; the variable is simply assigned the string 'Untransferrable variable'. It seems a specific function handles this. Since I'm not familiar with the SoS backend code, could you please give me a link to that function, or should I find it myself?

@BoPeng
Contributor Author

BoPeng commented Jun 14, 2020

You need to first read https://vatlab.github.io/sos-docs/doc/user_guide/language_module.html and then dive into the source code of sos-r. I think support for factors is missing on both ends.

@rezaeir
Contributor

rezaeir commented Jun 14, 2020

Usually, an R vector is translated to Python's built-in list. However, as far as I know, Python lists cannot represent categorical data (correct me if I'm wrong, and I'll look for a way to transfer the data to a list). My current idea is to translate an R factor vector to a pandas categorical Series and, in the other direction, to translate a pandas categorical Series sent to R into an R factor vector. What do you think?

@BoPeng
Contributor Author

BoPeng commented Jun 14, 2020

I agree. R's factor type maps directly to pandas' categorical Series (http://pandas-docs.github.io/pandas-docs-travis/user_guide/categorical.html), and vice versa.
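The R-to-Python direction can be sketched with `pandas.Categorical.from_codes`. This assumes, hypothetically, that the R side sends the three parts of a factor `x`: `levels(x)`, `as.integer(x)` and `is.ordered(x)`. The only real work is to shift R's 1-based codes to pandas' 0-based codes and to map `NA` (arriving here as `None`) to pandas' `-1` sentinel. `factor_to_series` is an illustrative name, not part of sos-r.

```python
from typing import Optional, Sequence

import pandas as pd


def factor_to_series(levels: Sequence[str],
                     codes: Sequence[Optional[int]],
                     ordered: bool) -> pd.Series:
    """Hypothetical helper: rebuild an R factor as a pandas categorical Series.

    Assumes `levels`, `codes` and `ordered` come from R's levels(x),
    as.integer(x) and is.ordered(x), with NA codes arriving as None.
    """
    # R codes are 1-based; pandas codes are 0-based and use -1 for missing.
    pandas_codes = [-1 if c is None else c - 1 for c in codes]
    return pd.Series(pd.Categorical.from_codes(pandas_codes,
                                               categories=list(levels),
                                               ordered=ordered))


# R: x <- factor(c("lo", "hi", NA, "lo"), levels = c("lo", "hi"), ordered = TRUE)
s = factor_to_series(["lo", "hi"], [1, 2, None, 1], ordered=True)
print(s)
```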

rezaeir added a commit to rezaeir/sos-r that referenced this issue Jun 15, 2020
issue: Support R factor data type vatlab#18 
- A pandas categorical Series is coerced into an R factor vector, taking into account whether it is named or unnamed and ordered or unordered.
- An R factor vector is coerced into a pandas categorical Series, taking into account whether it is named or unnamed and ordered or unordered.
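Which of these four cases applies can be decided on the pandas side from the Series itself. The sketch below rests on an assumption that may not match what the commit does: a Series counts as named (an R factor with `names()`) whenever its index is not the default `0..n-1` range. `categorical_kind` is a hypothetical helper.

```python
import pandas as pd


def categorical_kind(series: pd.Series) -> tuple:
    """Return (named, ordered) for a categorical Series.

    Assumption: the Series is 'named' unless its index is the default 0..n-1 range.
    """
    if not isinstance(series.dtype, pd.CategoricalDtype):
        raise TypeError("expected a categorical Series")
    named = not series.index.equals(pd.RangeIndex(len(series)))
    return named, bool(series.cat.ordered)


plain = pd.Series(pd.Categorical(["a", "b"]))
labelled = pd.Series(pd.Categorical(["a", "b"], ordered=True), index=["x", "y"])
print(categorical_kind(plain))     # (False, False)
print(categorical_kind(labelled))  # (True, True)
```

In the named case, the index labels would become the factor's `names()` on the R side.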
@rezaeir
Contributor

rezaeir commented Jun 15, 2020

I tried to test all the kinds of factors and pandas categoricals I could think of in my JupyterLab, but I think there should be some predefined tests for this.
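Such predefined tests could enumerate the four (named, ordered) combinations and require each value to survive a trip to R and back unchanged. In the sketch below, `round_trip` is a hypothetical callable standing in for sending a variable to the R kernel and fetching it back. The kernel interaction itself is not shown, and the identity lambda is only a placeholder. `pandas.testing.assert_series_equal` compares the values, the index, and the categorical dtype, including the categories and the `ordered` flag.

```python
import itertools

import pandas as pd


def make_cases() -> dict:
    """One categorical Series per (named, ordered) combination, each with a missing value."""
    cases = {}
    for named, ordered in itertools.product([False, True], repeat=2):
        index = ["a", "b", "c"] if named else None
        cat = pd.Categorical(["lo", None, "hi"],
                             categories=["lo", "hi"], ordered=ordered)
        cases[(named, ordered)] = pd.Series(cat, index=index)
    return cases


def check_round_trip(round_trip) -> None:
    """Fail if `round_trip` (hypothetical: Python -> R -> Python) changes any case."""
    for key, original in make_cases().items():
        returned = round_trip(original)
        pd.testing.assert_series_equal(original, returned, obj=f"case {key}")


# Placeholder only; a real test would go through the R kernel instead.
check_round_trip(lambda s: s.copy())
```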
