Use real json #69

Merged: 17 commits, May 8, 2024
1 change: 1 addition & 0 deletions .covrignore
@@ -1 +1,2 @@
R/zzz.R
src/register.cpp
3 changes: 2 additions & 1 deletion DESCRIPTION
@@ -17,7 +17,8 @@ Depends: R (>= 3.4.0)
Suggests:
knitr (>= 1.19),
rmarkdown (>= 1.8),
testthat (>= 3.0.0)
testthat (>= 3.0.0),
utils
URL: https://github.com/MEO265/loggit2
BugReports: https://github.com/MEO265/loggit2/issues
RoxygenNote: 7.3.0
1 change: 1 addition & 0 deletions NAMESPACE
@@ -13,3 +13,4 @@ export(stop)
export(stopifnot)
export(warning)
export(with_loggit)
useDynLib(loggit2, .registration=TRUE)
8 changes: 7 additions & 1 deletion NEWS.md
@@ -1,13 +1,17 @@
# loggit2 DEV

## Breaking changes
* Custom `sanitizer`s and `unsanitizer`s are no longer supported. This decision was made because no active user is known and this functionality severely limits further development.
* Custom `sanitizer`s and `unsanitizer`s are no longer supported. This decision was made because no active
user is known and this functionality severely limits further development.
If custom `sanitizer`s were previously used, they can simply be applied before `loggit()` or after
`read_logs()` instead. If a custom sanitizer was used to work around bugs, please report those bugs so that they can be fixed.
* Special characters are no longer escaped by placeholder replacement (e.g. `__LF__`), but with a backslash (`\`).

## New features
* Add `convert_to_csv()` to convert log files to CSV format.
* Add `with_loggit()` to log third-party code or to easily use different `loggit()`-parameters for a chunk of code.
* `NA`s are now stored as `null` in the JSON log, and `read_logs()` restores them as `NA`.
This round trip was previously (unintentionally) achieved by replacing `NA` with `"__NA__"`.

## Bugfixes
* `read_logs()` now correctly reads empty character values `""`, as in `{"key": ""}`, as such.
@@ -17,12 +21,14 @@
## Minor changes
* `read_logs()` now returns a `data.frame` with the empty character columns "timestamp", "log_lvl" and "log_msg"
instead of an empty (0x0) `data.frame` if the log file has no entries.
* The JSON reading functions are more tolerant of manual changes to the log.

## Internals
* `write_ndjson` no longer warns if the log contains unsanitized line breaks.
This warning could only be triggered by package-internal errors (and is therefore pointless in the CRAN package)
or by a custom `sanitizer`. In the latter case only this one character was checked, which gave a
false sense of security.
* The package now requires compilation. This is necessary because the JSON parser was written in C++ for faster reading.

# loggit2 2.2.2

75 changes: 19 additions & 56 deletions R/json.R
@@ -1,11 +1,8 @@
sanitizer_map <- list(
"{" = "__LEFTBRACE__",
"}" = "__RIGHTBRACE__",
'"' = "__DBLQUOTE__",
"," = "__COMMA__",
"\r" = "__CR__",
"\n" = "__LF__"
)
"\\" = "\\\\",
'"' = '\\\"',
"\r" = "\\r",
"\n" = "\\n")


#' Sanitization for ndJSON.
@@ -20,13 +17,10 @@ sanitizer_map <- list(
#' The default sanitizer and unsanitizer are based on the following mapping:
#'
#' | Character | Replacement |
#' |:--------- | :---------------------- |
#' | `{` | `__LEFTBRACE__` |
#' | `}` | `__RIGHTBRACE__` |
#' | `"` | `__DBLQUOTE__` |
#' | `,` | `__COMMA__` |
#' | `\r` | `__CR__` |
#' | `\n` | `__LF__` |
#' |:--------- | :-----------|
#' | `"` | `\"` |
#' | `\r` | `\\r` |
#' | `\n` | `\\n` |
#'
#' This type of function is needed because some characters in a JSON cannot appear unescaped and
#' since `loggit2` implements its own very simple string-based JSON parser.
@@ -47,20 +41,15 @@ default_ndjson_sanitizer <- function(string) {
string <- gsub(pattern = k, replacement = sanitizer_map[[k]], string, fixed = TRUE)
}

# Explicit NAs must be marked so that no new ones are inserted when rotating the log
string[is.na(string)] <- "__NA__"

string
}

#' @rdname sanitizers
default_ndjson_unsanitizer <- function(string) {
for (k in names(sanitizer_map)) {
for (k in rev(names(sanitizer_map))) {
string <- gsub(pattern = sanitizer_map[[k]], replacement = k, string, fixed = TRUE)
}

string[string == "__NA__"] <- NA_character_

string
}
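
To illustrate the switch from placeholder tokens to backslash escapes, here is a minimal sketch of the round trip. The names `escape_map`, `escape_string` and `unescape_string` are hypothetical stand-ins, not the package's functions; only the mapping is copied from the new `sanitizer_map` in this diff.

```r
# Hypothetical stand-ins for illustration; the mapping mirrors the new sanitizer_map.
escape_map <- list(
  "\\" = "\\\\",
  '"'  = '\\"',
  "\r" = "\\r",
  "\n" = "\\n"
)

escape_string <- function(string) {
  # Backslashes come first, so the backslashes added by the later
  # replacements are not escaped a second time.
  for (k in names(escape_map)) {
    string <- gsub(pattern = k, replacement = escape_map[[k]], string, fixed = TRUE)
  }
  string
}

unescape_string <- function(string) {
  # Undo the replacements in reverse order.
  for (k in rev(names(escape_map))) {
    string <- gsub(pattern = escape_map[[k]], replacement = k, string, fixed = TRUE)
  }
  string
}

x <- c('say "hi"', "line one\nline two", NA_character_)
escaped <- escape_string(x)      # NA stays NA, since gsub() passes it through
restored <- unescape_string(escaped)
```

Sequential replacement like this is simple, but in this sketch a literal backslash followed by `n` (for example `"C:\\new"`) does not survive the round trip: the unescape step reads the second backslash and the `n` as an escaped line break. A single-pass decoder would avoid that.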

@@ -74,31 +63,25 @@ default_ndjson_unsanitizer <- function(string) {
#' @param echo Echo the `ndjson` entry to the R console? Defaults to `TRUE`.
#' @param overwrite Overwrite previous log file data? Defaults to `FALSE`, and
#' so will append new log entries to the log file.
#' @param sanitizer Should the log data be sanitized before writing to json?
#'
#' @keywords internal
write_ndjson <- function(log_df, logfile = get_logfile(), echo = TRUE, overwrite = FALSE, sanitize = TRUE) {

if (sanitize) {
for (field in colnames(log_df)) {
log_df[, field] <- default_ndjson_sanitizer(log_df[, field])
}
}
write_ndjson <- function(log_df, logfile = get_logfile(), echo = TRUE, overwrite = FALSE) {

# logdata will be built into a character vector where each element is a valid
# JSON object, constructed from each row of the log data frame.
logdata <- character(nrow(log_df))

field_names <- paste0("\"", colnames(log_df), "\"")
log_df <- as.data.frame(lapply(log_df, function (x) default_ndjson_sanitizer(as.character(x))))

row_names <- paste0("\"", colnames(log_df), "\"")

for (row in seq_len(nrow(log_df))) {

row_data <- as.character(log_df[row,])
na_entries <- is.na(row_data)
row_data <- row_data[!na_entries]
row_names <- field_names[!na_entries]

row_data <- paste0("\"", row_data, "\"")
na_entries <- is.na(row_data)
row_data[!na_entries] <- paste0("\"", row_data[!na_entries], "\"")
row_data[na_entries] <- "null"
row_data <- paste(row_names, row_data, sep = ": ", collapse = ", ")
logdata[row] <- paste0("{", row_data, "}")
}
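
The new row-building logic writes `NA` values as a bare JSON `null` instead of dropping the field. A small sketch of that idea for a single row follows; `row_to_json` is a hypothetical helper, not part of the package, and it assumes the values are already escaped.

```r
# Hypothetical helper: build one ndJSON line from parallel key/value vectors.
# Assumes `values` is a character vector that has already been escaped.
row_to_json <- function(keys, values) {
  na_entries <- is.na(values)
  out <- character(length(values))
  # Regular values are wrapped in double quotes ...
  out[!na_entries] <- paste0("\"", values[!na_entries], "\"")
  # ... while NA becomes an unquoted JSON null.
  out[na_entries] <- "null"
  fields <- paste(paste0("\"", keys, "\""), out, sep = ": ", collapse = ", ")
  paste0("{", fields, "}")
}

json_line <- row_to_json(c("log_lvl", "log_msg"), c("INFO", NA_character_))
# json_line is: {"log_lvl": "INFO", "log_msg": null}
```

Keeping the key with a `null` value, rather than omitting it, means a reader can tell an explicit `NA` apart from a field that was never logged.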
@@ -111,7 +94,7 @@ write_ndjson <- function(log_df, logfile = get_logfile(), echo = TRUE, overwrite
#' Read ndJSON-formatted log file
#'
#' @param logfile Log file to read from, and convert to a `data.frame`.
#' @param unsanitize Should the log data be unsanitized after reading from json?
#' @param unsanitize Should the log data be unsanitized?
#'
#' @keywords internal
#'
@@ -121,28 +104,8 @@ read_ndjson <- function(logfile, unsanitize = TRUE) {
# Read in lines of log data
logdata <- readLines(logfile)

# List first; easier to add to dynamically
log_df <- data.frame()

# Split out the log data into individual pieces, which will include JSON keys AND values
logdata <- substring(logdata, first = 3L, last = nchar(logdata) - 2L)
logdata <- strsplit(logdata, '", "', fixed = TRUE)
log_kvs <- lapply(logdata, FUN = function(x) strsplit(x, '": "', fixed = FALSE))
for (kvs in seq_along(log_kvs)) {
missing_key <- which(lengths(log_kvs[[kvs]]) == 1L)
for (mk in missing_key) {
log_kvs[[kvs]][[mk]] <- c(log_kvs[[kvs]][[mk]], "")
}
}

key_value_split <- function(x) {
x <- unlist(x, use.names = FALSE)
keys <- x[c(TRUE, FALSE)]
values <- x[c(FALSE, TRUE)]
list(keys = keys, values = values)
}
log_kvs <- split_ndjson(logdata)

log_kvs <- lapply(log_kvs, key_value_split)
rowcount <- length(log_kvs)

all_keys <- unique(unlist(lapply(log_kvs, FUN = function(x) x[["keys"]])))
@@ -158,7 +121,7 @@ read_ndjson <- function(logfile, unsanitize = TRUE) {
}
}

if (unsanitize) log_df <- lapply(log_df, FUN = default_ndjson_unsanitizer)
if(unsanitize) log_df <- lapply(log_df, default_ndjson_unsanitizer)

log_df <- as.data.frame(log_df)

3 changes: 3 additions & 0 deletions R/loggit.R
@@ -1,3 +1,6 @@
#' @useDynLib loggit2, .registration=TRUE
NULL

#' Log entries to file
#'
#' Log entries to a [ndjson](https://github.com/ndjson) log file, defined by [set_logfile()].
3 changes: 3 additions & 0 deletions R/split_json.R
@@ -0,0 +1,3 @@
split_ndjson <- function(x) {
.Call("split_ndjson", x)
}
29 changes: 14 additions & 15 deletions R/utils.R
@@ -3,6 +3,7 @@
#' This function returns a `data.frame` containing all the logs in the provided `ndjson` log file.
#'
#' @param logfile Path to log file.
#' @param unsanitize Should the log messages be unsanitized?
#'
#' @return A `data.frame`.
#'
@@ -15,11 +16,11 @@
#' read_logs()
#'
#' @export
read_logs <- function(logfile = get_logfile()) {
read_logs <- function(logfile = get_logfile(), unsanitize = TRUE) {

base::stopifnot("Log file does not exist" = file.exists(logfile))

log <- read_ndjson(logfile)
log <- read_ndjson(logfile, unsanitize = unsanitize)

if (nrow(log) == 0L) log <- data.frame(timestamp = character(), log_lvl = character(), log_msg = character())

@@ -56,12 +57,12 @@ rotate_logs <- function(rotate_lines = 100000L, logfile = get_logfile()) {
cat(NULL, file = logfile)
return(invisible(NULL))
}
log_df <- read_ndjson(logfile, unsanitize = FALSE)
if (nrow(log_df) <= rotate_lines) {
log_df <- readLines(logfile)
if (length(log_df) <= rotate_lines) {
return(invisible(NULL))
}
log_df <- log_df[seq.int(from = nrow(log_df) - rotate_lines + 1L, length.out = rotate_lines),]
write_ndjson(log_df, logfile, echo = FALSE, overwrite = TRUE, sanitize = FALSE)
log_df <- log_df[seq.int(from = length(log_df) - rotate_lines + 1L, length.out = rotate_lines)]
write(log_df, logfile, append = FALSE)
}
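
Because every ndJSON entry occupies exactly one line, rotation no longer needs to parse the log at all. The sketch below shows the same line-based idea in isolation; `keep_last_lines` is a hypothetical helper and the log content is made up for the demonstration.

```r
# Hypothetical helper: keep only the last `n` lines of a file, without parsing them.
keep_last_lines <- function(path, n) {
  lines <- readLines(path)
  # Nothing to do if the file is already short enough.
  if (length(lines) <= n) return(invisible(NULL))
  writeLines(utils::tail(lines, n), path)
  invisible(NULL)
}

# Demonstration on a temporary file with five made-up entries.
tmp <- tempfile(fileext = ".log")
writeLines(sprintf('{"log_msg": "entry %d"}', 1:5), tmp)
keep_last_lines(tmp, 2L)
remaining <- readLines(tmp)
```

Working on raw lines also sidesteps any escape and unescape round trip, so the retained entries are written back byte for byte.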

#' Find the Call of a Parent Function in the Call Hierarchy
@@ -88,22 +89,20 @@ find_call <- function() {
#'
#' @param file Path to write csv file to
#' @param logfile Path to log file to read from
#' @param remove_message_lf Should the line breaks at the end of messages be removed?
#' @param unsanitize Should escaped characters, such as the line breaks at the end of messages, be unescaped?
#' @param ... Additional arguments to pass to `utils::write.table()`
#'
#' @return Invisible `NULL`.
#'
#' @export
convert_to_csv <- function(file, logfile = get_logfile(), remove_message_lf = TRUE, ...) {
log <- read_logs(logfile = logfile)

if (remove_message_lf) {
msg_flag <- log$log_lvl == "INFO"
msg <- log$log_msg[msg_flag]
log$log_msg[msg_flag] <- gsub("\n$", "", msg)
convert_to_csv <- function(file, logfile = get_logfile(), unsanitize = FALSE, ...) {
if (!requireNamespace(package = "utils", quietly = TRUE)) {
stop("Package 'utils' is not available. Please install it, if you want to use this function.") # nocov
}

write.table(log, file = file, ...)
log <- read_logs(logfile = logfile, unsanitize = unsanitize)

utils::write.table(log, file = file, ...)

return(invisible(NULL))
}
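
A plausible reason for defaulting to `unsanitize = FALSE` here is that escaped line breaks keep each record on a single CSV line. The sketch below demonstrates that effect with `utils::write.table()` directly; the data frame is made-up sample data, not output of the package.

```r
# Made-up sample log: the line break in the message is still escaped
# (a literal backslash followed by "n"), as it would be with unsanitize = FALSE.
log <- data.frame(
  log_lvl = "INFO",
  log_msg = "first\\nsecond",
  stringsAsFactors = FALSE
)

csv_file <- tempfile(fileext = ".csv")
# Arguments such as `sep` and `row.names` are the kind of options that
# convert_to_csv() forwards through `...`.
utils::write.table(log, file = csv_file, sep = ",", row.names = FALSE)

n_lines <- length(readLines(csv_file))  # one header line plus one record
```

With a real line break in `log_msg` the record would span two physical lines, which many line-oriented CSV tools handle poorly.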
4 changes: 2 additions & 2 deletions man/convert_to_csv.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion man/message.Rd

4 changes: 3 additions & 1 deletion man/read_logs.Rd

2 changes: 1 addition & 1 deletion man/read_ndjson.Rd

9 changes: 3 additions & 6 deletions man/sanitizers.Rd

4 changes: 2 additions & 2 deletions man/warning.Rd

10 changes: 1 addition & 9 deletions man/write_ndjson.Rd

3 changes: 3 additions & 0 deletions src/.gitignore
@@ -0,0 +1,3 @@
*.o
*.so
*.dll
7 changes: 7 additions & 0 deletions src/loggit.h
@@ -0,0 +1,7 @@
// meineFunktionen.h
#ifndef LOGGIT
#define LOGGIT

extern "C" SEXP split_ndjson(SEXP strVecSEXP);

#endif