Releases will be numbered with the following semantic versioning format:
<major>.<minor>.<patch>
And constructed with the following guidelines:
- Breaking backward compatibility bumps the major (and resets the minor and patch)
- New additions without breaking backward compatibility bump the minor (and reset the patch)
- Bug fixes and misc. changes bump the patch
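As an illustration of these rules, here is a minimal base R sketch. The helper `bump_version` is hypothetical and is not part of qdap; it only shows how each kind of change would move a version string.

```r
# Hypothetical helper (not part of qdap): bump one component of a
# <major>.<minor>.<patch> version string following the guidelines above.
bump_version <- function(version, part = c("major", "minor", "patch")) {
  part <- match.arg(part)
  v <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  stopifnot(length(v) == 3, !anyNA(v))
  names(v) <- c("major", "minor", "patch")
  v[part] <- v[part] + 1L
  # Reset every component to the right of the one that was bumped
  if (part == "major") v[c("minor", "patch")] <- 0L
  if (part == "minor") v["patch"] <- 0L
  paste(v, collapse = ".")
}

bump_version("2.2.4", "patch")  # "2.2.5"
bump_version("2.2.4", "minor")  # "2.3.0"
bump_version("2.2.4", "major")  # "3.0.0"
```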
CHANGES IN qdap VERSION 2.2.5-
BUG FIXES
- check_spelling and other spell checkers threw an error with a custom dictionary that did not have at least one word beginning with each of the 26 letters of the alphabet. The dictionary automatically uses assume.first.correct=FALSE if this occurs. Reported by @CallumH of StackOverflow: http://stackoverflow.com/q/33516466/1000343 See issue #217 for details.
- check_spelling_interactive replaced substrings rather than bounded words. This was caught by @chrisjacques. See issue #221.
- replace_abbreviation threw an error because data.frame converts character to factor by default and nchar no longer works on factor. This was caught by @karilint. See issue #225.
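The replace_abbreviation fix comes down to how base R treats factors. The sketch below is an illustration only (it is not qdap's code, and the column name `abv` is made up). It sets stringsAsFactors explicitly so the result does not depend on the default of the running R version, then shows that nchar rejects a factor but works on character.

```r
# Illustration only; `abv` is a made-up column name, not qdap's internals.
as_factor <- data.frame(abv = c("Mr.", "Jr."), stringsAsFactors = TRUE)
as_char   <- data.frame(abv = c("Mr.", "Jr."), stringsAsFactors = FALSE)

class(as_factor$abv)  # "factor"
class(as_char$abv)    # "character"

# nchar() signals an error when handed a factor; capture it instead of stopping
factor_result <- tryCatch(nchar(as_factor$abv), error = function(e) "error")

# Works on character, or on a factor converted with as.character()
nchar(as_char$abv)
nchar(as.character(as_factor$abv))
```

Passing stringsAsFactors = FALSE (or converting with as.character) is the usual way to avoid this class of bug.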
NEW FEATURES
MINOR FEATURES
IMPROVEMENTS
CHANGES
CHANGES IN qdap VERSION 2.2.4
NEW FEATURES
- add_s added to add -s, -es, or -ies to word endings.
MINOR FEATURES
IMPROVEMENTS
- common now returns NULL invisibly with a message rather than an error if no groups meet the parameters. Suggested by @bitanshu via issue #213.
- word_cor's default group.var is no longer NULL but set to use 1:nrow via qdapTools::id(text.var). Thanks to Drew Schmidt for bringing this issue to attention. Documentation and an error for group.var = NULL have been updated to add clarity.
CHANGES
CHANGES IN qdap VERSION 2.2.2
BUG FIXES
- type_token_ratio was misnamed as type_text_ratio; this has been corrected. The plot for this class also contained a misspelling "type-toke ratio" which has been corrected as well.
NEW FEATURES
- inspect_text added to allow for pretty printed viewing of text strings and tm Corpuses.
CHANGES
- The following functions had been previously deprecated and have now been removed: df2tm_corpus, tm2qdap, tm_corpus2wfm, tm_corpus2df, tdm, dtm, and polarity_frame.
CHANGES IN qdap VERSION 2.2.1
BUG FIXES
- The internal vignette "An Introduction to qdap" produced errors when compiled by build_qdap_vignette. This behavior has been fixed by using static reporting. The root cause is that the cm_ functions grab data from the global environment, which may not be available in a knitr/rmarkdown generated environment.
- polarity no longer handled phrases (words + spaces) for polarity.frame. This behavior was caught by @Benasso http://stackoverflow.com/q/27156834/1000343. This bug was a result of the changes made to bag_o_words earlier this year. The bug has been fixed and a unit test put in place to ensure the bug is not reintroduced.
- Network.formality did not include edge width handling. This has been corrected.
- word_stats gave an incorrect warning message for missing endmarks: "Some sentences not have standard qdap punctuation endmarks." The "do" has been added: "Some sentences do not have standard qdap punctuation endmarks."
- pres_debates2012 data set contained missplits in lines: 544, 1054. These have been corrected (GitHub issue #205).
- pos threw an error if only one word was passed to text.var. Fix: drop = FALSE has been added to data frame indexing. Caught by StackOverflow user G_1991 http://stackoverflow.com/q/29896488/1000343.
- as.tdm.wfm would error if no grouping variable was supplied. This behavior has been corrected.
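The drop = FALSE fix mentioned for pos relies on a general base R behavior, sketched below under the assumption of a small made-up data frame (this is not the code inside pos). Indexing a single column with `[` collapses the result to a vector unless drop = FALSE is given, so downstream code that expects a data frame fails.

```r
# Made-up one-row data frame standing in for a single tagged word
tagged <- data.frame(word = "cool", tag = "JJ", stringsAsFactors = FALSE)

collapsed <- tagged[, "word"]                # collapses to a character vector
kept      <- tagged[, "word", drop = FALSE]  # stays a one-column data frame

is.data.frame(collapsed)  # FALSE
is.data.frame(kept)       # TRUE
dim(kept)                 # 1 1
```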
NEW FEATURES
- word_length function added to give counts of word length usage by grouping variable. See ?word_length for details.
- word_position function added to give counts of the position of words within a sentence.
- sent_detect_nlp added in the sentSplit family to wrap NLP package functionality into a convenient function.
- lexical_classification provides a means of assessing content vs. functional word usage at the grouping variable and sentence level. The class comes with generic methods for preprocessed, scores (and plots of these methods), Animated, Network, cumulative and Animate.cumulative.
- Animate.character added as a generic method that allows for the animation of text. This is useful in conjunction with other Animate objects to create complex animations with accompanying text.
- add_incomplete added to replace sentences with missing endmarks with a | to indicate an incomplete sentence.
- type_toke_ratio added to determine type-token ratio per grouping variable.
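For readers unfamiliar with the measure, a type-token ratio is the number of distinct words (types) divided by the total number of words (tokens). The base R sketch below computes it per group; the helper `ttr`, its simple tokenizer, and the sample data are assumptions for illustration and do not reproduce qdap's implementation or output format.

```r
# Hypothetical helper: distinct words divided by total words
ttr <- function(text) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))
  tokens <- tokens[nzchar(tokens)]  # drop empty strings left by the split
  length(unique(tokens)) / length(tokens)
}

# Made-up example data
text  <- c("the cat saw the dog", "a dog is a dog")
group <- c("A", "B")

ratios <- tapply(text, group, ttr)
ratios  # A = 4 types / 5 tokens, B = 3 types / 5 tokens
```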
IMPROVEMENTS
- polarity takes polarity.frame with phrases (words with spaces).
- The Animate method for the classes polarity & formality gains the ability to print corresponding animated text for combined use with other Animated methods.
- multigsub/mgsub get a speed boost through better programming choices. See issue #201 for details. Thank you to @Alexey Ferapontov for his critical post http://stackoverflow.com/q/27367914/1000343 that inspired the changes.
- formality and pos now have minimal unit tests.
- trans_context used message to print to the console. This resulted in truncated output. message has been replaced with cat.
- strip gets a speed boost (~10x) by using better regex algorithms, consolidating code/function calls, and by creating a generic strip method for different classes. Additionally, multiple white spaces are now condensed to a single white space.
- scrubber would automatically take a space and a single last character and remove the space. This was to remove spaces before ending punctuation. scrubber used substring rather than a more controlled regular expression. This has been corrected. Report thanks to @Fabrizio Maccallini. See issue #207 for more information.
- pres_debates2012 picks up a role column to make filtering out the candidates easier. The variable order has also changed to put the dialogue last.
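The white space condensing described for strip can be pictured with a single regular expression. This is a sketch of that one step in base R, not the code strip actually uses, and the input string is made up.

```r
# Runs of spaces, tabs, and newlines collapse to one space
messy <- "too   many \t  spaces\n here"
clean <- gsub("\\s+", " ", messy)
clean  # "too many spaces here"
```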
CHANGES
- The ggplot2 package is no longer in Depends. This means the user will have to manually load the package to use additional ggplot2 features. See GitHub issue #199 for more.
- pos now treats contractions as 2 words. For example the word count on what's is 2 for what + is. The previous behavior was to strip out the apostrophes. This was undesirable as the sentence "She's cool" would have no verb in the pos output. This change affects pos_by and formality as well.
CHANGES IN qdap VERSION 2.2.0
BUG FIXES
- bag_o_words did not make use of the bag_o_words2 helper function that has finer grained control of the output. The ... arguments were ignored but are now respected.
- fry threw an error if a group contained < 300 words but had enough text to generate 2 text chunks of 100 words each, caught by S. Enrico P. Indiogine. The bug has been fixed as these groups are dropped and a warning given.
- phrase_net threw an error caused by dplyr's (0.3) approach to subsetting columns. Previously a vector was returned, now a tbl_df object is returned: tidyverse/dplyr#587. This was addressed by using explicit df[[index]] rather than df[, index].
NEW FEATURES
- chunker added to break text, optionally by grouping variables, into equal chunks. The chunk size can be specified by giving the number of words to be in each chunk or the number of chunks.
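To make the idea concrete, the sketch below splits a string into chunks of a fixed number of words using base R. The function `chunk_words` and its `n.words` argument are hypothetical names chosen for this example; the sketch ignores grouping variables and is not chunker's implementation.

```r
# Hypothetical helper: split text into chunks of `n.words` words each
# (the final chunk may be shorter).
chunk_words <- function(text, n.words) {
  words  <- unlist(strsplit(trimws(text), "\\s+"))
  groups <- ceiling(seq_along(words) / n.words)  # 1,1,1,2,2,2,...
  unname(vapply(split(words, groups), paste, character(1), collapse = " "))
}

chunks <- chunk_words("a b c d e f g", n.words = 3)
chunks  # "a b c" "d e f" "g"
```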
IMPROVEMENTS
- all_words gains char.keep and char2space arguments to enable retention of characters and multi word phrases. These features are passed to freq_terms as well. Suggested by stackoverflow's lawyeR (http://stackoverflow.com/a/26162401/1000343).
CHANGES
- rm_url has been moved into its own canned regex pattern extraction/replacer package named qdapRegex.
- name2sex now uses the gender package to predict sex. This makes the function slightly slower but much more accurate than previous versions. Because of this increased accuracy and dependence on gender, the arguments pred.sex, fuzzy.match, and database are no longer necessary and have been removed.
CHANGES IN qdap VERSION 2.1.1
BUG FIXES
- syllable_count returned the sentence (recycled) in the words column of the output. This behavior has been fixed. See GitHub issue #188 for details.
- syn returned antonyms for some words. This was caused by the dictionary: qdapDictionaries::key.syn contained antonyms and elements that were error messages (character). This has been fixed. Reference issue #190. (Jingjing Zou)
- The pres_debates2012 data set contained three errors in speech attribution. This has been corrected and the turn of talk (tot) as well.
- word_stats would throw an error if no poly-syllable words existed. This has been corrected (reported by Nicolas Turenne).
NEW FEATURES
- qdap_df and %&% added to mimic some of the functionality of dplyr's tbl_df and chaining pipe in a more specific, less flexible, qdap oriented way.
- Text added to view and change the text.var attribute of a data.frame of the class qdap_df.
- cumulative generic method added to view cumulative scores over time.
- formality picks up a cumulative method.
- polarity picks up a cumulative method.
- end_mark picks up a class (end_mark), plot method, and a cumulative method.
- syllable_sum, polysyllable_sum, and combo_syllable_sum pick up a class, plot method, and a cumulative method.
- wfm becomes a generic method currently applied to a text.var that is: character, factor (coerced to character), or wfdf.
- unbag added as a complement to bag_o_words and friends for undoing string splitting. A convenience wrapper for paste(collapse = " ").
- as.Corpus.TermDocumentMatrix, as.Corpus.DocumentTermMatrix, and as.Corpus.wfm added to convert a matrix format to a tm::Corpus.
- exclude becomes a generic method for various classes. Functionality is the same but with improved code readability.
- check_spelling_interactive, check_spelling, which_misspelled, and correct allow the user to identify potentially misspelled words and optionally suggest replacements.
- random_data & random_sent added to generate random sentence data sets and vectors.
- comma_spacer added to ensure strings with commas contain a space after them.
- check_text added to identify potential problems in text.
- replace_ordinal added to convert ordinal representations of 1 through 100 to strictly ordinal text (e.g., "1st" becomes "first").
- A vignette: Cleaning Text & Debugging was added to assist users with cleaning and debugging problems in qdap.
- pronoun_type, subject_pronoun_type, and object_pronoun_type added to examine usage of subject/object pronouns by grouping variable.
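Since unbag is described as a convenience wrapper for paste(collapse = " "), the round trip it enables can be shown with base R alone. The sketch uses strsplit as a stand-in for the word-splitting step; it does not call bag_o_words or unbag themselves.

```r
# Split a sentence into words, then undo the split
sentence <- "the quick brown fox"
words    <- strsplit(sentence, " ", fixed = TRUE)[[1]]
rejoined <- paste(words, collapse = " ")

length(words)                   # 4
identical(rejoined, sentence)   # TRUE
```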
MINOR FEATURES
- dplyr's chaining pipe imported for convenience. See http://www.rdocumentation.org/packages/magrittr/functions/magrittr for details.
IMPROVEMENTS
- wfm gains a speed-up through generic classes and tm package integration (strip is no longer used in wfm).
- as.tdm.character and as.dtm.character gain a speed boost with a tm package integration.
- Added message to as.data.frame.Corpus for missing end-marks suggesting the use of: sent.split = FALSE.
- as.Corpus family of functions didn't necessarily respect document names and sometimes used numeric sequence instead. The introduction of a reader via tm::readTabular has fixed this.
- sentSplit now gives warnings for text that may contain anomalies such as: non-ASCII characters, factors, missing punctuation, empty cells, and no alphabetic characters found.
- read.transcript now gives a warning when reading from a .docx file and the separator (sep) used is still found in the text as this may indicate the data did not split correctly.
- dispersion_plot now takes a named list of vectors of terms as the argument to match.terms. The vectors are combined as a unified theme named with the names of the list supplied to match.terms.
CHANGES
- as.data.frame.Corpus's default value for sent.split is now FALSE.
- The state column in the qdap::DATA2 data-set is now character (previously factor).
CHANGES IN qdap VERSION 2.1.0
BUG FIXES
- new_project did not copy the .Rprofile over into the new project. This has been fixed. Reference issue #184.
- sentiment_frame coerced words to factor. stringsAsFactors = FALSE has been added to prevent this.
- polarity did not work on > 1 grams due to a bug in sentiment_frame converting character to factor (thanks for the find @chewth). See GitHub issue #185 for details.
NEW FEATURES
- unique_by added to allow the user to find terms unique to individual elements of a grouping variable.
- build_qdap_vignette replaces the temporary place holder version of the Introduction to qdap vignette. This function will replace the (1) HTML, (2) source, & (3) R code found in browseVignettes(package = 'qdap').
MINOR FEATURES
- sub_holder picks up an alpha.type argument that allows the user to specify whether alpha or numeric keys should be used.
- replace_number picks up a remove argument that removes numbers from text.
IMPROVEMENTS
- qheat becomes a generic method. This means some of the internal function class checking has been moved to individual methods for those classes. Additionally, qheat now works with logical matrices/data.frames.
- The tm package compatibility functions have been renamed in a more R-ish way and take the form of generic methods for specific classes. For example, df2tm_corpus becomes as.Corpus. Here is a complete list of changes:
  - df2tm_corpus is now as.Corpus
  - tm_corpus2df is now as.data.frame
  - as.wfm is now a generic method
  - tm_corpus2wfm is now as.wfm
  - tm2qdap is now as.wfm
  - tdm is now as.tdm or as.TermDocumentMatrix
  - dtm is now as.dtm or as.DocumentTermMatrix
CHANGES
- colsplit2df and colpaste2df no longer convert character columns to factor.
- df2tm_corpus is deprecated. It will be removed in a subsequent version of qdap. Use as.Corpus instead.
- tm_corpus2df is deprecated. It will be removed in a subsequent version of qdap. Use as.data.frame instead.
- tm2qdap is deprecated. It will be removed in a subsequent version of qdap. Use as.wfm instead.
- tm_corpus2wfm is deprecated. It will be removed in a subsequent version of qdap. Use as.wfm instead.
- tdm is deprecated. It will be removed in a subsequent version of qdap. Use as.tdm or as.TermDocumentMatrix instead.
- dtm is deprecated. It will be removed in a subsequent version of qdap. Use as.dtm or as.DocumentTermMatrix instead.
- The Introduction to qdap .Rmd vignette has been moved to an internal directory. The HTML version is not built by default. This saves CRAN space and time checking the package source. The file has been replaced with a temporary place holder that contains instructions for building the actual vignette. The user may also use build_qdap_vignette directly.
- qdap incorporates the changes from the tm package version 0.6: http://cran.r-project.org/web/packages/tm/news.html Reference issue #187.
CHANGES IN qdap VERSION 2.0.0
The qdapTools package now houses several former qdap functions. While qdapTools is a Dependency and all of these functions will be accessible to the qdap user, there is a break in backward compatibility if these functions are included in code. For this reason this release is a major bump of qdap.
BUG FIXES
- replace_number did not replace single digit numbers. Spotted by Ben Bolker. This behavior has been fixed and unit testing added for this function. See issue #178.
NEW FEATURES
- sub_holder added; this function holds the place for particular character values, allowing the user to manipulate the vector and then revert the place holders back to the original values.
- Network method added to make network plots of select qdap objects.
- qtheme, theme_nightheat, theme_duskheat, theme_norah, theme_cafe, theme_grayscale, theme_badkitchen, and theme_hipster added to style Network plots.
- polarity picks up a Network method.
- formality picks up a Network method.
- qdap officially begins utilizing the testthat package for unit testing, though only a few functions have begun the process; more will be added over time.
MINOR FEATURES
IMPROVEMENTS
CHANGES
- The qdapTools package now houses the following former qdap functions: hash, %ha%, hash_look, hms2sec, id, lookup, %l%, %l+%, %l*%, repo2github, sec2hms, text2color, url_dl, v_outer, list2df, matrix2df, vect2df, list_df2df, list_vect2df, counts2list, vect2list, & mtabulate. These functions will continue to be available to qdap users in interactive mode (qdapTools is a Dependency and thus these functions are loaded into the workspace by default). This will allow this bundle of functions to be used outside of qdap without calling the larger qdap package per the request of Kirill Muller (see issue #165).
- As scheduled the dissimilarity function has been removed from the qdap package to avoid conflict with the tm package. Use the Dissimilarity function instead.
CHANGES IN qdap VERSION 1.3.6
MINOR FEATURES
- polarity picks up a constrain argument that constrains the polarity values to be between -1 and 1.
IMPROVEMENTS
- polarity's equation now uses primes on the de-amplifiers before they're confined to be >= -1. This avoids confusion in the indicator function that took the de-amplifiers variable and returned the same variable.
- dist_tab's frequency columns used a capital F in Freq. This was not consistent across all column names and has been changed to lower case.
CHANGES
- polarity_frame is deprecated and will be removed in a subsequent release. Please use sentiment_frame instead.
CHANGES IN qdap VERSION 1.3.5
BUG FIXES
- The An Introduction to qdap vignette contained a broken link in the tm Package Compatibility section. This has been fixed. Also the reliance on Rgraphviz from the vignette has been removed. This will eliminate CRAN WARN in CRAN checks (for some OS) but not the note for tm's reliance on Rgraphviz.
- polarity reported the incorrect number of words for sentences containing commas. This has been fixed (Max Ghenis).
NEW FEATURES
- formality picks up an Animate method.
- end_mark_by function added as an aggregated grouping version of end_mark.
MINOR FEATURES
- raj.act.1POS added. raj.act.1POS is a data set for Romeo and Juliet: Act 1 broken into parts of speech.
IMPROVEMENTS
- discourse_map picks up a pause argument that enables the user to pause between plots in interactive mode.
CHANGES
CHANGES IN qdap VERSION 1.3.4
BUG FIXES
NEW FEATURES
- gantt and gantt_wrap (single facet) pick up an Animate method.
- polarity picks up an Animate method.
- vertex_apply and edge_apply added to make uniform changes to lists of igraph objects.
MINOR FEATURES
IMPROVEMENTS
- discourse_map picks up a condense argument that allows the user to condense sequential rows for like grouping variable sub groups.
- list_df2df names now use a zero padded numeric portion for default names. For example c("L1", "L2", "L3", ... "L10"), becomes c("L01", "L02", "L03", ... "L10").
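Zero padding matters mostly for sorting: unpadded names sort as text, so "L10" lands before "L2". The base R sketch below shows one way to build padded names with formatC; it is an illustration of the naming scheme, not the code used by list_df2df.

```r
n <- 10
unpadded <- paste0("L", seq_len(n))
# Pad the numeric part to the width of the largest index (2 digits here)
padded <- paste0("L", formatC(seq_len(n), width = nchar(n), flag = "0"))

# Radix sorting is locale independent, which keeps the comparison reproducible
sort(unpadded, method = "radix")[1:3]  # "L1" "L10" "L2"
sort(padded, method = "radix")[1:3]    # "L01" "L02" "L03"
```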
CHANGES
CHANGES IN qdap VERSION 1.3.3
BUG FIXES
- colpaste2df dropped the column name for a single retained column when keep.orig = FALSE. See GitHub issue #157 for more.
- multigsub (mgsub) would return NA for replacement of length 1 after the addition of the order.pattern (used to prevent substrings from replacing meta-strings) in version 1.3.2.
NEW FEATURES
- phrase_net function provides functionality similar to the Many Eyes Phrase Net plot.
- discourse_map function provides a network mapping of the flow of discourse between social actors. Function output is Animate ready as well. See ?discourse_map and http://trinker.github.io/qdap_examples/animation_dialogue for more.
- Animate function added to convert select qdap outputs to an animated sequence. See ?Animate.discourse_map for more.
MINOR FEATURES
- synonyms_frame (syn_frame) added to allow the user to create a synonym hash for the revamped synonyms function.
- repo2github function added to send a directory to GitHub upon first commit.
IMPROVEMENTS
- new_project has an improved directory structure and works with any version of the reports package.
- synonyms function used the env.syl hash data from qdapDictionaries internally. This approach could cause problems if used within other functions in a package. It also limits the usability of synonyms. The synonyms function picks up a synonym.frame argument that allows the user to specify a synonym hash table. This can be created via the synonyms_frame function (per a request from J. Aravind).
CHANGES
CHANGES IN qdap VERSION 1.3.2
This is a patch release to address the archiving of the lsa package.
BUG FIXES
- The qdap-tm Package Compatibility Vignette contained an error in the Feinerer I, Hornik K, Meyer D (2008) reference (the pages, listed as 51-54, have been corrected to 1-54, and the incorrect journal has been fixed). Caught by Kurt Hornik.
MINOR FEATURES
- DocumentTermMatrix and TermDocumentMatrix from the tm package pick up a Filter method.
IMPROVEMENTS
- multigsub picks up an argument, order.pattern, to prevent substrings from replacing meta-strings.
- The following data sets were added to the qdapDictionaries package: Fry_1000, Leveled_Dolch, Dolch
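The problem order.pattern addresses can be reproduced with sequential gsub calls: if a short pattern is applied first, it rewrites part of a longer string before the longer pattern gets a chance to match. The sketch below is a guess at the general idea (apply longer patterns first) using a hypothetical helper `replace_all`; it is not multigsub's implementation.

```r
# Hypothetical helper: sequential fixed-string replacement, optionally
# applying the longest patterns first.
replace_all <- function(x, pattern, replacement, order.pattern = TRUE) {
  if (order.pattern) {
    ord <- order(nchar(pattern), decreasing = TRUE)
    pattern <- pattern[ord]
    replacement <- replacement[ord]
  }
  for (i in seq_along(pattern)) {
    x <- gsub(pattern[i], replacement[i], x, fixed = TRUE)
  }
  x
}

pats <- c("cat", "category")
reps <- c("X", "Y")

unordered <- replace_all("category cat", pats, reps, order.pattern = FALSE)
ordered   <- replace_all("category cat", pats, reps, order.pattern = TRUE)
unordered  # "Xegory X": "cat" clobbered the start of "category"
ordered    # "Y X"
```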
CHANGES
- The package lsa has been removed from the Suggests field in the DESCRIPTION file, examples, and vignettes.
CHANGES IN qdap VERSION 1.3.1
A version bump necessary for Re-Submission to CRAN.
CHANGES
- new_project was reconfigured with the old code that does not require the newest version of the reports package.
CHANGES IN qdap VERSION 1.3.0
BUG FIXES
- read.transcript could leave a QDAP_PLACE_HOLDER behind if a colon was found in the person column. This behavior has been fixed.
- word_cor's plotting method threw an error if a word did not have any words above the r threshold. This behavior has been corrected.
- Filter overwrote a base R function; this has been fixed per Joshua Ulrich.
- scores.polarity's print method would return an error if columns were not indexed yet were rounded. For instance, the following threw an error: scores(with(sentSplit(DATA, 4), polarity(state, person)))[, 1:4] This behavior has been fixed.
NEW FEATURES
- qdap adds an HTML vignette to better explain the intended work flow and function use for the package. Use browseVignettes(package = "qdap") to open.
- qdap adds a PDF vignette to describe the compatibility and navigation between the qdap and tm packages. Use browseVignettes(package = "qdap") to open.
MINOR FEATURES
IMPROVEMENTS
- apply_as_df picks up stopwords and filter arguments that allow the user to remove stopwords and min/max length words.
- plot.word_cor picks up the argument ncol that allows the user to specify the number of columns used. This uses ggplot2's facet_wrap rather than facet_grid (which is the default if ncol = NULL).
- name2sex relied upon having qdapDictionaries loaded. This could be an issue if the function were used internally. The user now supplies a dictionary of names and probabilities.
- df2tm_corpus gains a demographics.vars argument that allows the user to add demographic information to the resulting corpus DMetaDat.
- tm_corpus2df gains the ability to convert DMetaDat into demographic data.frame columns.
CHANGES
CHANGES IN qdap VERSION 1.2.0
BUG FIXES
NEW FEATURES
- Filter added to give the ability to provide a range of character lengths to filter from a wfm object.
- scores generic method added to view scores from select qdap objects.
- counts generic method added to view counts from select qdap objects.
- proportions generic method added to view proportions from select qdap objects.
- preprocessed generic method added to view preprocessed data from select qdap objects.
- apply_as_df added to allow the user to apply qdap functions to a Corpus directly.
MINOR FEATURES
- tm_corpus2wfm added to quickly convert from a tm package Corpus to a qdap wfm object.
- as.wfm added as a means to attempt to coerce a matrix to a wfm object.
- %l+% added as a counterpart to %l% that assumes missing = NULL.
- %bs% added as a quick counterpart to boolean_search for indexing.
IMPROVEMENTS
- df2tm_corpus now sets metaData information for ID and creator (based on Sys.info()["user"]).
- matrix2df now accepts a simple_triplet_matrix object as well.
- word_cor output that was a list (not a correlation matrix) did not have a plot method. The plot method for word_cor now handles both matrices and the list of correlations.
- rm_row picks up the contains argument that allows the user to search for a term anywhere within the string, not just at the beginning, and remove the matching rows.
- read.transcript now handles multiple character spaces as an argument to sep when the text argument is used.
CHANGES
- dissimilarity has been renamed to Dissimilarity to prevent tm package conflicts. The old version has been deprecated and will be removed in the next version (minor or major) pushed to CRAN.
CHANGES IN qdap VERSION 1.1.0
A version bump necessary for Re-Submission to CRAN.
CHANGES
- Downgraded the version requirement for the reports package to reports (>= 0.1.2) in order to upload to CRAN. reports (>= 0.2.0) is not yet available on CRAN.
CHANGES IN qdap VERSION 1.0.0
The word lists and dictionaries in qdap have been moved to qdapDictionaries. Additionally, many functions have been renamed with underscores instead of the former period separators. These changes break backward compatibility. Thus this is a major release (ver. 1.0.0).
It is the general practice to deprecate functions within a package before removal; however, the number of necessary changes, in light of qdap being relatively new to CRAN, made these changes sensible at this point.
BUG FIXES
- qheat's argument by.column = FALSE resulted in an error. This behavior has been fixed.
- question_type did not work because of changes to lookup that did not accept a two column matrix for key.match. See GitHub issue #127 for more.
- combo_syllable.sum threw an error if the text.var contained a cell with an all non-character ([a-z]) string. This behavior has been fixed.
- todo function created by new_project would not report completed tasks if report.completed = TRUE.
- termco and termco.d threw an error if more than one consecutive regex special character was passed to match.list or match.string. See GitHub issue #128 for more.
- trans.cloud threw an error if a single list with a named vector was passed to target.words. This behavior has been fixed.
- sentSplit now returns the "tot" column when text.place = "original".
- all_words output dataframe FREQ column class has been changed from factor to numeric. Additionally, the WORDS column prints using left.just but retains traditional character properties (print class added). all_words also picks up apostrophe.remove and ldots (for strip) arguments.
- gantt_plot did not handle fill.vars, particularly if the fill was nested within the grouping.vars. This behavior has been fixed with corresponding examples added.
- url_dl downloaded an empty file when not using a Dropbox key. This behavior has been fixed.
- The cm_code. family of functions had a bug in the output due to cm_long2dummy and cm_dummy2long's handling of stretching spans. This has been corrected.
- cm_code.exclude did not output the correct excluded spans. This behavior has been corrected.
- The use of comment to convey object characteristics has been replaced with the use of class.
- question_type did not include question words ending in 'd as part of the category. For instance "How'd you like it?" was not classified as a how question.
- beg2char would not include the char if include = TRUE and noc = 1.
- cm_range2long returned NAs for vectors containing multiple single values. See GitHub issue #144 for more.
- termco family of functions did not handle NA values. This has been fixed. (Matt Williamson) See GitHub issue #147 for details.
- pos threw an error for vectors of length 1. This has been fixed (Kurt Hornik). See GitHub issue #150 for details.
- formality threw an error for vectors of length 1. This has been fixed. (Kurt Hornik) See GitHub issue #151 for details.
NEW FEATURES
- The cm_xxx2long family of functions (cm_df2long, cm_range2long and cm_time2long) now have a generic wrapper, cm_2long, to generate the long formats.
- hash_look (and %ha%), a counterpart to hash, added to allow quick access to a hash table. Intended for use within functions or multiple uses of the same hash table, whereas lookup is intended for a single external (non function) use which is more convenient though could be slower.
- boolean_search, a Boolean term search function, added to allow for indexed searches of Boolean terms.
- trans_context is a printing function designed to grab the context (n rows before and after) of an event (an index from a vector of indices). The function prints the indices around the episode from a transcript to the console or a .csv, .xlsx, .txt, or .doc file.
- colpaste2df is a wrapper for paste2 that pastes dataframe columns together and outputs a dataframe.
- colcomb2class quickly combines columns for a number of qdap classes including output from: termco, question_type, pos_by, and character_table.
- lview, a function to unclass a list output that has a special print method that returns only a portion of the output. lview re-classes to "list".
- word_cor added to find words within grouping variables that are associated based on correlation.
- tm2qdap, a function to convert "TermDocumentMatrix" and "DocumentTermMatrix" to a wfm, added to allow easier integration with the tm package.
- apply_as_tm, a function to allow functions intended to be used on the tm package's TermDocumentMatrix to be applied to a wfm object.
- tm_corpus2df and df2tm_corpus added to convert a tm package corpus to a dataframe for use in qdap or vice versa.
- tdm and dtm are now truly compatible with the tm package. tdm and dtm produce outputs of the class "TermDocumentMatrix" and "DocumentTermMatrix" respectively. This change (coupled with the renaming of stopwords to rm_stopwords) should make the two packages logical companions and further extend the qdap package to integrate with the many packages that already handle "TermDocumentMatrix" and "DocumentTermMatrix".
- cm_distance now uses resampling of data from the null model to generate p-values for the mean code distances. Useful for determining if an association (small distance) between codes is likely to happen if the null is true.
- dispersion_plot added to enable viewing of word dispersion through discourse.
- word_proximity added to complement dispersion_plot and word_cor functions. word_proximity gives the average distance between words in the unit of sentences.
MINOR FEATURES
- url_dl now takes quoted string URLs supplied to ... (no url argument is supplied).
- condense is a function that condenses dataframe columns that are a list of vectors to a single vector of strings. This outputs a dataframe with condensed columns that can be written to csv/xlsx.
- mcsv_w now uses condense to attempt to condense columns that are lists of vectors to a single vector of strings. This adds flexibility to mcsv_w with more data sets. mcsv_w now writes lists of dataframes to multiple csvs (e.g., the output from termco or polarity). mcsv_w picks up a dataframes argument, an optional character vector supplied in lieu of ... that grabs the dataframes from an environment (default is the Global environment).
- ngrams now has an ellipsis argument that passes further arguments to strip.
- dtm added to complement tdm, allowing for easier integration with other R packages that utilize tdm/dtm.
- dir_map picks up a use.path argument that allows the user to specify a more flexible path to the created pre-formed read.transcript scripts based on something like file.path(getwd(), ). This makes the code portable across machines.
- polarity_frame a function to make a hash environment lookup for use with the polarity function.
- DATA.SPLIT a sentSplit version of the DATA data-set has been added to qdap.
- gantt_plot accepts NULL for grouping.var and figures for "all" rows as a single grouping var.
- replace_number now handles 10^47 digits compared to 10^14 previously.
- The new_project function gains a github argument that optionally sends the repo to a public GitHub account upon creation.
- qheat, polarity.plot and formality.plot pick up the argument plot which optionally suppresses the plotting. This is useful if the user is operating in knitr, sweave, etc. and wishes to alter/add onto the plot.
- lookup now takes missing = NULL. This results in the original values in terms corresponding to the missing elements being retained.
- cm_time.temp picks up a grouping.var argument that works similarly to cm_range.temp's grouping.var. cm_time.temp also takes hour values for start and end as in end = "01:22:03".
- gantt_rep picks up a generic plot method.
- Functions in the cm_code.xxx and cm_xxx2long families pick up a generic plot method that utilizes gantt_wrap to plot a Gantt plot of the span data.
- Functions in the cm_code.xxx and cm_xxx2long families pick up a generic summary method. This summary method has its own plot method that utilizes qheat to plot a heatmap of the summary statistics. The generic print method (print.sum_cmspans) is useful for output intended for publication.
- qheat picks up a facet.vars argument that allows a character vector of length 1 or 2 to facet by.
- question_type gives the indices of questions via $inds.
- colsplit2df now splits multiple columns to match the capabilities of colpaste2df.
- sentSplit now handles repeated measures and picks up a turn of talk plot method.
- tot_plot now handles repeated measures and allows grouping.var to be nested within the turn of talk.
- wfm now uses mtabulate and is ~10x faster.
- plot.polarity gains arguments for optional error bars using the standard error of the mean polarity.
- exclude now works with wfm and the tm package's DocumentTermMatrix and TermDocumentMatrix classes.
- rm_url removes/replaces URLs in a string(s).
- matrix2df added (under list2df) to convert rownames of matrix to a dataframe column.
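The condense entry above describes collapsing list columns into plain strings so a dataframe can be written to csv. The following is a minimal base R sketch of that idea, not qdap's implementation: the helper name condense_sketch, its sep argument, and the toy data are assumptions made for illustration.

```r
# Toy data frame with a list column (each cell holds a character vector).
dat <- data.frame(id = 1:2)
dat$words <- list(c("a", "b"), "c")

# Hypothetical helper (not part of qdap): collapse every list column
# into a character vector, one delimited string per row.
condense_sketch <- function(df, sep = ", ") {
  is_list_col <- vapply(df, is.list, logical(1))
  df[is_list_col] <- lapply(df[is_list_col], function(col) {
    vapply(col, paste, character(1), collapse = sep)
  })
  df
}

flat <- condense_sketch(dat)
flat$words
```

Once the list columns are flattened this way, the result can be passed to a writer such as write.csv, which cannot handle list columns directly.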
CHANGES
- The dictionaries and word lists for qdap have been moved to their own package, qdapDictionaries. This will allow easier access to these resources beyond the qdap package as well as reducing the overall size of the qdap package. Because this is a major change that may break the code of some users, the major release number has been upped to 1. The following name changes have occurred:
  - increase.amplification.words -> became -> amplification.words
  - The deamplification.words wordlist and env.pol dictionary were added as well.
- qdap gains an HTML package vignette to better explain the intended work flow and function use for the package. This is not currently a part of the build but can be accessed via: http://htmlpreview.github.io/?https://github.com/trinker/qdap/blob/master/vignettes/qdap_vignette.html Note that the vignette may include development version functions not yet available in the current CRAN version.
- polarity utilizes a new, unbounded algorithm based on weighting to determine polarity.
- gantt_wrap no longer accepts unquoted strings to the plot.var argument.
- cm_df.temp loses the logical csv argument. file.name has been replaced with file to fit conventional R naming schemes.
- The plotting feature of gantt has been removed and a plot method has been added. The user can plot the output from gantt in base or ggplot2 graphics.
- cm_time2long loses the argument start.end to ensure that the cmspans class produced would operate as expected.
- Most exported functions utilizing a period separator have been replaced with underscore named versions.
- wf_combine renamed wfm_combine to be consistent.
- question_type algorithm improvements including implied do/does/did handling.
- list2df and mtabulate now exported.
- stopwords has been renamed to rm_stopwords (rm_stop shorthand) to better fit the action the function performs and to avoid conflicts with the tm package.
- replace_number's num.paste becomes logical rather than character input. This makes use easier as the user doesn't need to remember arguments.
CHANGES IN qdap VERSION 0.2.5
Patch release. This version deals with the changes in the openNLP package that affect qdap. Next major release scheduled after the slidify package is pushed to CRAN.
BUG FIXES
- new_project placed a report in the CORRESPONDENCE directory rather than CONTACT_INFO.
- strip would not allow the characters "/" and "-" to be passed to char.keep. This has been fixed. (Jens Engelmann)
- beg2char would only grab the first character of a string after n - 1 occurrences of the character. For example: beg2char(c("abc-edw-www", "nmn-ggg", "rer-qqq-fdf"), "-", 2) resulted in "abc-e" "nmn-g" "rer-q" rather than "abc-edw" "nmn-ggg" "rer-qqq".
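To make the corrected behavior in the last fix concrete, here is a base R sketch that reproduces the expected result quoted above. It does not call qdap; the helper name beg_to_nth_char and its argument names are assumptions for illustration only.

```r
# Hypothetical helper (not qdap's beg2char): keep everything before the
# n-th occurrence of `char`; strings with fewer occurrences are kept whole.
beg_to_nth_char <- function(x, char = "-", n = 1) {
  pieces <- strsplit(x, char, fixed = TRUE)
  vapply(pieces, function(p) paste(head(p, n), collapse = char), character(1))
}

res <- beg_to_nth_char(c("abc-edw-www", "nmn-ggg", "rer-qqq-fdf"), "-", 2)
res
# "abc-edw" "nmn-ggg" "rer-qqq"
```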
NEW FEATURES
- names2sex a function for predicting gender from name.
- Added NAMES and NAMES_SEX data-sets, based on 1990 U.S. census data.
- tdm added as an equivalent to TermDocumentMatrix from the tm package. This allows for portability across text analysis packages.
MINOR FEATURES
- mgsub now gets a trim argument that optionally removes trailing and leading white spaces.
- lookup now takes a list of named vectors for the key.match argument.
CHANGES
- new_project directory can now be transferred without breaking paths (i.e., file.path(getwd(), "DIR/file.ext") is used rather than the full file path).
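The idea behind this change can be sketched in base R: a path is assembled from the current working directory at run time rather than being stored as a machine-specific absolute path. The helper name project_file is an assumption for illustration; "DIR" and "file.ext" are the placeholders used in the entry above.

```r
# Hypothetical helper: resolve a project-relative path against the
# working directory each time it is needed.
project_file <- function(...) file.path(getwd(), ...)

p <- project_file("DIR", "file.ext")
p
```

Because the prefix comes from getwd(), moving the project directory to another location or machine only requires setting the working directory to the project root.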
CHANGES IN qdap VERSION 0.2.2
BUG FIXES
- genXtract labels returned the word "right" rather than the right edge string. See http://stackoverflow.com/a/15423439/1000343 for an example of the old behavior. This behavior has been fixed.
- gradient_cloud's min.freq was locked at 1. This has been fixed. (Manuel Fdez-Moya)
- termco would produce an error if single length named vectors were passed to match.list and no multi-length vectors were supplied. Also an error was thrown if an unnamed multi-length vector was passed to match.list. This behavior has been fixed.
NEW FEATURES
- tot_plot a visualizing function that uses a bar graph to visualize patterns in sentence length and grouping variables by turn of talk.
- beg2char and char2end functions to grab text from beginning of string to a character or from a character to the end of a string.
- ngrams function to calculate ngrams by grouping variable.
MINOR FEATURES
- genX and bracketX gain an extra argument space.fix to remove extra spaces left over from bracket removal.
- Updated out of date Dropbox url download in url_dl. url_dl now also takes the Dropbox key.
CHANGES
- qdap is now compiled for Mac users (as openNLP now passes CRAN checks with no Errors on Mac).
CHANGES IN qdap VERSION 0.2.1
BUG FIXES
- word_associate colors the word cloud appropriately and deals with the error caused by a grouping variable not containing any words from 1 or more of the vectors of a list supplied to match string.
- trans.cloud produced an error when expand.target was TRUE. This error has been eliminated.
- termco would eliminate > 1 columns matching an identical search.term found in a second vector of match.list. termco now counts repeated terms multiple times.
- cm_df.transcript did not give the correct speaker labels (fixed).
NEW FEATURES
- gradient_cloud: Binary gradient Word Cloud - A new plotting function that plots and colors words for a binary variable based on which group of the binary variable uses the term more frequently.
- new_project: A project template generating function designed to increase efficiency and standardize work flow. The project comes with a .Rproj file for easy use with RStudio as well as a .Rprofile that handles loading and sourcing of packages, data and project functions. This function uses the reports package to generate an extensive reports folder.
MINOR FEATURES
- stemmer, stem2df and stem.words now explicitly have the argument char.keep set to "~~" to enable retaining special characters formerly stripped away.
- hms2sec: A function to convert from h:m:s format to seconds.
- mcsv_w now takes a list of data.frames.
- cm_range.temp now takes the arguments text.var and grouping.var that will automatically output these (grouping.var) columns as range coded indices.
- wfm gets a speed boost as the code has been re-written to be faster.
- read.transcript now reads .txt files as well as text similar to read.table.
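For readers unfamiliar with the h:m:s conversion that hms2sec performs, the arithmetic can be sketched in base R as follows. This is not qdap's code: the helper name hms_to_seconds is an assumption, and the sketch assumes every input has exactly three colon-separated fields.

```r
# Hypothetical helper (not qdap's hms2sec): convert "hh:mm:ss" strings
# to seconds. Assumes exactly three colon-separated numeric fields.
hms_to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

secs <- hms_to_seconds(c("00:00:30", "01:22:03"))
secs
# 30 4923
```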
CHANGES
- sec2hms is the new name for convert.
- folder and delete have been moved to the reports package which is imported by qdap. Previously folder would not generate a directory with the time/date stamp if no directory name was given; this has been fixed, though the function now resides in the reports package.
CHANGES IN qdap VERSION 0.2.0
- The first installation of the qdap package.
- Package designed to bridge the gap between qualitative data and quantitative analysis.