Merge pull request #219 from Roche/dev
version 1.2.1
ofajardo authored Feb 22, 2023
2 parents e666ef8 + 4b8f09d commit 06dbeec
Showing 38 changed files with 6,976 additions and 5,582 deletions.
2 changes: 1 addition & 1 deletion CITATION.cff
Original file line number Diff line number Diff line change
@@ -5,7 +5,7 @@ authors:
given-names: "Otto"
orcid: "https://orcid.org/0000-0002-3363-9287"
title: "Pyreadstat"
version: 1.2.0
version: 1.2.1
doi: 10.5281/zenodo.6612282
date-released: 2018-09-24
url: "https://github.com/Roche/pyreadstat"
31 changes: 31 additions & 0 deletions README.md
@@ -45,6 +45,7 @@ the original applications in this regard.**
- [Missing Values](#missing-values)
+ [SPSS](#spss)
+ [SAS and STATA](#sas-and-stata)
- [Reading datetime and date columns](#reading-datetime-and-date-columns)
- [Other options](#other-options)
+ [More writing options](#more-writing-options)
- [File specific options](#file-specific-options)
@@ -637,6 +638,36 @@ This is a list listing all user defined missing values.
User defined missing values are currently not supported for file types other than sas7bdat,
sas7bcat and dta.

#### Reading datetime and date columns

SAS, SPSS and STATA store datetime, date and similar concepts as a numeric column and then apply a
display format on top. Roughly speaking, there are two internal representations: one for concepts with a granularity of a day
or coarser (date, week, quarter, year, etc.) and one for concepts with a granularity finer than a day (datetime, time, hour, etc.).
The first group can be converted to a python date object and the second to a python datetime object.
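
To make the numeric representation concrete, here is a minimal sketch using only the Python standard library. It assumes the SAS convention, in which dates are stored as days and datetimes as seconds counted from 1960-01-01. The helper names are hypothetical and this is not pyreadstat's internal code.

```python
from datetime import date, datetime, timedelta

# Assumption: SAS epoch. SAS counts dates in days and datetimes in seconds
# from 1960-01-01; other applications may use a different origin or unit.
SAS_EPOCH = datetime(1960, 1, 1)


def sas_days_to_date(days):
    """Hypothetical helper: convert a SAS date value (days) to a python date."""
    return (SAS_EPOCH + timedelta(days=days)).date()


def sas_seconds_to_datetime(seconds):
    """Hypothetical helper: convert a SAS datetime value (seconds) to a python datetime."""
    return SAS_EPOCH + timedelta(seconds=seconds)


# 1960 is a leap year, so 366 days after the epoch is 1961-01-01.
first_day_of_1961 = sas_days_to_date(366)
# 86400 seconds is exactly one day after the epoch.
second_day_of_1960 = sas_seconds_to_datetime(86400)
```

The same stored number therefore means something different depending on whether the display format belongs to the date group or the datetime group.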

Pyreadstat automatically converts columns with datetime, date and time formats that map directly
to python datetime, date and time objects. However, other formats are not fully convertible to
any of these objects, for example SAS "YEAR" (displaying only the year), "MMYY" (displaying only month and year), etc.
Because there are too many of these formats and they keep changing, it is not possible to implement a rule for each of
them. These columns are therefore not transformed, and the user obtains a numeric column.

To cope with this, each reader function accepts two options, extra\_datetime\_formats and
extra\_date\_formats, that allow the user to
pass these datetime or date formats so that the numeric values are transformed into python datetime or date objects. The user
can then format those columns appropriately, for example extracting the year to an integer column in the case of 'YEAR' or
formatting it as a string 'YYYY-MM' in the case of 'MMYY'. The choice between a datetime or a date format depends on the granularity
of the data, as explained above.

These arguments are also useful when you have a valid datetime, date or time format that pyreadstat does not currently recognize.
In that case, feel free to file an issue asking for it to be added to the list; in the meantime, you can use these arguments to do
the conversion.

```python
import pyreadstat

df, meta = pyreadstat.read_sas7bdat('/path/to/a/file.sas7bdat', extra_date_formats=["YEAR", "MMYY"])
```
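
Once such columns have been converted to python date objects, the follow-up formatting mentioned above might look like the sketch below. It works on plain lists of `date` objects standing in for the converted columns; the sample values and helper names are made up for illustration, and `None` is used here as a stand-in for a missing value.

```python
from datetime import date


def year_of(value):
    """Hypothetical helper: year as an integer, leaving missing values untouched."""
    return None if value is None else value.year


def year_month_of(value):
    """Hypothetical helper: 'YYYY-MM' string, leaving missing values untouched."""
    return None if value is None else value.strftime("%Y-%m")


# Made-up stand-ins for columns read with extra_date_formats=["YEAR", "MMYY"]
year_column = [date(2020, 1, 1), date(2021, 1, 1), None]
mmyy_column = [date(2020, 3, 1), date(2021, 11, 1), None]

years = [year_of(v) for v in year_column]              # [2020, 2021, None]
year_months = [year_month_of(v) for v in mmyy_column]  # ['2020-03', '2021-11', None]
```

On a pandas dataframe the same helpers could be applied per column, for example with `df["col"].apply(year_of)`, where `"col"` is a placeholder column name.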

#### Other options

You can set the encoding of the original file manually. The encoding must be an [iconv-compatible encoding](https://gist.github.com/hakre/4188459).
8 changes: 8 additions & 0 deletions change_log.md
@@ -1,3 +1,11 @@
# 1.2.1 (github, pypi and conda 2023.02.22)
* Readstat source updated to version 1.1.9
* introduced recognition for pandas datatype datetime64[ns, UTC] and other datetime64 types when writing,
so that this column type gets correctly written as datetime
* introduced extra_datetime_formats and extra_date_formats arguments for read functions, cleaned the list of
sas date, datetime and time formats to exclude those not directly convertible to python objects
* improved performance of writer when there are datetime64 columns

# 1.2.0 (github, pypi and conda 2022.10.25)
* Fixed #206, #207
* added pyproject.toml
Binary file modified docs/_build/doctrees/environment.pickle
Binary file not shown.
Binary file modified docs/_build/doctrees/index.doctree
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/_build/html/.buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: d1957ba96adbb9536e51c852822e9ccb
config: dc63e4405a0437fb9efe8c4f5ffb3848
tags: 645f666f9bcd5a90fca523b33c5a78b7
2 changes: 1 addition & 1 deletion docs/_build/html/_static/documentation_options.js
@@ -1,6 +1,6 @@
var DOCUMENTATION_OPTIONS = {
URL_ROOT: document.getElementById("documentation_options").getAttribute('data-url_root'),
VERSION: '1.2.0',
VERSION: '1.2.1',
LANGUAGE: 'None',
COLLAPSE_INDEX: false,
BUILDER: 'html',
2 changes: 1 addition & 1 deletion docs/_build/html/genindex.html
@@ -3,7 +3,7 @@
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Index &mdash; pyreadstat 1.2.0 documentation</title>
<title>Index &mdash; pyreadstat 1.2.1 documentation</title>
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]>
12 changes: 11 additions & 1 deletion docs/_build/html/index.html
@@ -4,7 +4,7 @@
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />

<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Welcome to pyreadstat’s documentation! &mdash; pyreadstat 1.2.0 documentation</title>
<title>Welcome to pyreadstat’s documentation! &mdash; pyreadstat 1.2.1 documentation</title>
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]>
@@ -154,6 +154,8 @@ <h1>Metadata Object Description<a class="headerlink" href="#metadata-object-desc
<li><p><strong>output_format</strong> (<em>str</em><em>, </em><em>optional</em>) – one of ‘pandas’ (default) or ‘dict’. If ‘dict’ a dictionary with numpy arrays as values will be returned, the
user can then convert it to her preferred data format. Using dict is faster than the default because the conversion to a pandas
dataframe is avoided.</p></li>
<li><p><strong>extra_datetime_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python datetime objects</p></li>
<li><p><strong>extra_date_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python date objects</p></li>
</ul>
</dd>
<dt class="field-even">Returns</dt>
@@ -252,6 +254,8 @@ <h1>Metadata Object Description<a class="headerlink" href="#metadata-object-desc
<li><p><strong>output_format</strong> (<em>str</em><em>, </em><em>optional</em>) – one of ‘pandas’ (default) or ‘dict’. If ‘dict’ a dictionary with numpy arrays as values will be returned, the
user can then convert it to her preferred data format. Using dict is faster than the default because the conversion to a pandas
dataframe is avoided.</p></li>
<li><p><strong>extra_datetime_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python datetime objects</p></li>
<li><p><strong>extra_date_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python date objects</p></li>
</ul>
</dd>
<dt class="field-even">Returns</dt>
@@ -335,6 +339,8 @@ <h1>Metadata Object Description<a class="headerlink" href="#metadata-object-desc
<li><p><strong>output_format</strong> (<em>str</em><em>, </em><em>optional</em>) – one of ‘pandas’ (default) or ‘dict’. If ‘dict’ a dictionary with numpy arrays as values will be returned, the
user can then convert it to her preferred data format. Using dict is faster than the default because the conversion to a pandas
dataframe is avoided.</p></li>
<li><p><strong>extra_datetime_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python datetime objects</p></li>
<li><p><strong>extra_date_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python date objects</p></li>
</ul>
</dd>
<dt class="field-even">Returns</dt>
@@ -384,6 +390,8 @@ <h1>Metadata Object Description<a class="headerlink" href="#metadata-object-desc
<li><p><strong>output_format</strong> (<em>str</em><em>, </em><em>optional</em>) – one of ‘pandas’ (default) or ‘dict’. If ‘dict’ a dictionary with numpy arrays as values will be returned, the
user can then convert it to her preferred data format. Using dict is faster than the default because the conversion to a pandas
dataframe is avoided.</p></li>
<li><p><strong>extra_datetime_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python datetime objects</p></li>
<li><p><strong>extra_date_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python date objects</p></li>
</ul>
</dd>
<dt class="field-even">Returns</dt>
@@ -422,6 +430,8 @@ <h1>Metadata Object Description<a class="headerlink" href="#metadata-object-desc
<li><p><strong>output_format</strong> (<em>str</em><em>, </em><em>optional</em>) – one of ‘pandas’ (default) or ‘dict’. If ‘dict’ a dictionary with numpy arrays as values will be returned, the
user can then convert it to her preferred data format. Using dict is faster than the default because the conversion to a pandas
dataframe is avoided.</p></li>
<li><p><strong>extra_datetime_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python datetime objects</p></li>
<li><p><strong>extra_date_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python date objects</p></li>
</ul>
</dd>
<dt class="field-even">Returns</dt>
2 changes: 1 addition & 1 deletion docs/_build/html/py-modindex.html
@@ -3,7 +3,7 @@
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Python Module Index &mdash; pyreadstat 1.2.0 documentation</title>
<title>Python Module Index &mdash; pyreadstat 1.2.1 documentation</title>
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]>
2 changes: 1 addition & 1 deletion docs/_build/html/search.html
@@ -3,7 +3,7 @@
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Search &mdash; pyreadstat 1.2.0 documentation</title>
<title>Search &mdash; pyreadstat 1.2.1 documentation</title>
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />

2 changes: 1 addition & 1 deletion docs/_build/html/searchindex.js

2 changes: 1 addition & 1 deletion docs/conf.py
@@ -26,7 +26,7 @@
# The short X.Y version
version = ''
# The full version, including alpha/beta/rc tags
release = '1.2.0'
release = '1.2.1'


# -- General configuration ---------------------------------------------------