Encode unicode characters in UTF8 #1
I've tried to replicate the issue above by forking the code and running Analysis_06_Extract-Journals.R. On a system with WSL/Ubuntu, it produced an alljournals-.csv file with correct encodings, i.e.:
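To check a replication like this one, a rough Python sketch could decode the CSV strictly as UTF-8 and list any leftover `<U+XXXX>` escapes of the kind shown in the issue description. It assumes the artifacts always take that exact `<U+hex>` form; `find_escape_artifacts` is a hypothetical helper, not part of the repository.

```python
import re

# R-style escape such as "<U+0096>" (assumed form of the artifacts).
ESCAPE_RE = re.compile(r"<U\+([0-9A-Fa-f]{4,6})>")


def find_escape_artifacts(data: bytes):
    """Decode bytes strictly as UTF-8 and report lines that still contain escapes.

    Raises UnicodeDecodeError if the data is not valid UTF-8 at all.
    Returns a list of (line_number, escape_text) tuples, 1-indexed.
    """
    text = data.decode("utf-8")  # strict mode: invalid byte sequences raise
    hits = []
    # Split on "\n" only, so line numbers match the physical lines of the file.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in ESCAPE_RE.finditer(line):
            hits.append((line_no, match.group(0)))
    return hits
```

Pass it the file's raw bytes (read with `open(path, "rb")`); an empty result plus no decode error suggests the file is clean UTF-8 without escaped characters.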
Thanks, @jeroenbaas, for taking such a deep look. Now that I have repeated the same procedure of scraping SAGE journals via …, I believe that I originally used a different computer for scraping the journals; possibly the encoding settings differed there. Anyway, in 6937419 I fixed some of the gravest encoding issues (merely ex post). However, Chinese characters as well as Korean and Russian letters still need to be fixed. Again, thank you, @jeroenbaas, for pointing out this issue. I really should take a closer look at all the encoding-related aspects.
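If the difference between the two computers came down to locale settings, one way to rule that out is to pin the encoding whenever files are written, instead of relying on the system default. The scripts here are in R, where `write.csv()` accepts `fileEncoding = "UTF-8"` for this purpose; the Python sketch below only illustrates the idea, with hypothetical function names and data assumed to be a list of rows.

```python
import csv
import locale


def write_journals_csv(path, rows):
    """Write rows as UTF-8 regardless of the machine's locale."""
    with open(path, "w", encoding="utf-8", newline="") as fh:
        csv.writer(fh).writerows(rows)


def read_journals_csv(path):
    """Read the CSV back with the same explicit encoding."""
    with open(path, encoding="utf-8", newline="") as fh:
        return list(csv.reader(fh))


def default_text_encoding():
    """The machine-dependent default that open() uses when encoding= is omitted
    (unless Python's UTF-8 mode is enabled), e.g. 'cp1252' on many Windows setups."""
    return locale.getpreferredencoding(False)
```

Because both the writer and the reader name the encoding, the same file round-trips identically on Windows, WSL, or any other system.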
No worries. When I get a chance, I will also take a closer look at how the Scopus title list is used. From what I could quickly tell, it only serves as a count of journals per publisher. The source list potentially holds much more value, and linking it to your final dataset (e.g. by carrying the Scopus Source ID) may unlock a lot of analytical power further downstream.
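To illustrate the difference between the two uses, here is a small Python sketch contrasting a per-publisher count with keeping each Scopus source record attached to a journal through its Source ID. The field names `source_id` and `publisher` are hypothetical placeholders, not the actual column names of the Scopus source list.

```python
from collections import Counter


def count_journals_per_publisher(scopus_rows):
    """Aggregate use: only the number of journals per publisher survives."""
    return Counter(row["publisher"] for row in scopus_rows)


def attach_scopus_source(journals, scopus_rows):
    """Left-join Scopus source records onto journals by Source ID.

    Journals with a missing or unknown source_id get scopus=None, so
    unmatched records stay visible instead of being silently dropped.
    """
    by_id = {row["source_id"]: row for row in scopus_rows}
    return [
        {**journal, "scopus": by_id.get(journal.get("source_id"))}
        for journal in journals
    ]
```

A left join is used so that journals without a Scopus match remain in the dataset, which makes coverage gaps easy to count.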
Some titles contain special characters that were converted to ASCII escape codes, such as:
Otolaryngology<U+0096>Head and Neck Surgery
It would be helpful if these could be encoded in UTF-8.
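U+0096 is a C1 control character. A common cause is a Windows-1252 byte (0x96, an en dash) being decoded as Latin-1, which maps bytes one-to-one onto U+0000 to U+00FF. The Python sketch below is one possible ex-post repair, assuming the titles contain literal `<U+XXXX>` escapes: it turns the escapes back into characters and then reinterprets any C1 controls as Windows-1252. It is not necessarily what was done in 6937419, and the helper names are hypothetical. If the Chinese, Korean, and Russian titles were escaped the same way, the first step alone restores them.

```python
import re

# R-style escape such as "<U+0096>" (assumed form of the artifacts).
ESCAPE_RE = re.compile(r"<U\+([0-9A-Fa-f]{4,6})>")


def _to_char(match):
    cp = int(match.group(1), 16)
    # Leave out-of-range escapes untouched rather than raising.
    return chr(cp) if cp <= 0x10FFFF else match.group(0)


def unescape_codepoints(text: str) -> str:
    """Turn "<U+XXXX>" escapes back into the characters they name."""
    return ESCAPE_RE.sub(_to_char, text)


def fix_c1_controls(text: str) -> str:
    """Reinterpret C1 control characters (U+0080..U+009F) as Windows-1252."""
    fixed = []
    for ch in text:
        if 0x80 <= ord(ch) <= 0x9F:
            try:
                ch = bytes([ord(ch)]).decode("cp1252")
            except UnicodeDecodeError:
                pass  # 0x81, 0x8D, 0x8F, 0x90, 0x9D are undefined in cp1252
        fixed.append(ch)
    return "".join(fixed)


def repair_title(title: str) -> str:
    return fix_c1_controls(unescape_codepoints(title))
```

For example, `repair_title("Otolaryngology<U+0096>Head and Neck Surgery")` yields the title with a proper en dash (U+2013).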