Requirements #1

joeflack4 · 2022-06-16T19:30:47Z

Description

(Originally taken from: Requirements google doc)
Zulip Terminology Stream Text Mining Project

The base Zulip bulletin board application is backed by a REST API that can be queried from Python scripts. Bots can also be configured in Python to provide real-time monitoring. The project applies text mining to the terminology stream of the FHIR Zulip community bulletin board to discover trends in the use of terminologies and terminology services within the HL7 FHIR community.
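A minimal sketch of pulling the full stream history, assuming the `zulip` Python package's `Client.get_messages`, which wraps the `GET /messages` endpoint, and assuming the stream is named `terminology` on chat.fhir.org. `iter_stream_messages` is a hypothetical helper, not part of the Zulip library. It accepts any client object with a `get_messages` method so it can be tested without network access.

```python
from typing import Any, Dict, Iterator, List, Optional


def iter_stream_messages(client: Any, stream: str, batch_size: int = 1000) -> Iterator[Dict[str, Any]]:
    """Yield every message in `stream`, oldest first.

    Assumes `client` behaves like `zulip.Client`, i.e. it has a
    `get_messages(request)` method wrapping Zulip's GET /messages endpoint.
    """
    anchor: Any = "oldest"
    last_id: Optional[int] = None
    while True:
        response = client.get_messages({
            "anchor": anchor,
            "num_before": 0,
            "num_after": batch_size,
            "narrow": [{"operator": "stream", "operand": stream}],
            "apply_markdown": False,  # raw Markdown is easier to text-mine than rendered HTML
        })
        if response.get("result") != "success":
            raise RuntimeError(f"Zulip API error: {response.get('msg')}")
        # The anchor message may be returned again, so keep only messages not yet seen.
        batch: List[Dict[str, Any]] = [
            m for m in response["messages"] if last_id is None or m["id"] > last_id
        ]
        if not batch:
            return
        yield from batch
        last_id = batch[-1]["id"]
        if response.get("found_newest"):
            return
        anchor = last_id  # continue paging from the newest message seen so far
```

In practice one would pass `zulip.Client(config_file="~/.zuliprc")`, with the `.zuliprc` pointing at chat.fhir.org and holding a bot's API key, e.g. `messages = list(iter_stream_messages(client, "terminology"))`. Each returned message dict carries fields such as `subject` (the topic name), `timestamp` (Unix seconds) and `content`.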

The objective of this exercise is to review the history of the content and activity of the Terminology stream.

Task list

Task details

(Refer to the Requirements google doc for more info, especially for items 1-5.)

6a. Thread length

6a.i. Average length of threads: Determine the average length (in days / weeks / months) of terminology stream threads.
6a.ii. Identify outlier threads in terms of length: Identify outliers in length, i.e., longer-running threads.
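A minimal sketch of 6a.i, assuming Zulip-style message dicts with `subject` (topic) and `timestamp` (Unix seconds) fields, and defining a thread's length as the time between its first and last message. The function names are hypothetical, and a month is approximated as an average Gregorian month of 365.25 / 12 days.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List

SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 365.25 / 12  # approximation: average Gregorian month


def thread_durations_days(messages: Iterable[dict]) -> Dict[str, float]:
    """Map each topic to the number of days between its first and last message."""
    times: Dict[str, List[int]] = defaultdict(list)
    for m in messages:
        times[m["subject"]].append(m["timestamp"])
    return {topic: (max(ts) - min(ts)) / SECONDS_PER_DAY for topic, ts in times.items()}


def average_duration(durations: Dict[str, float]) -> Dict[str, float]:
    """Average thread duration expressed in days, weeks and months."""
    avg_days = mean(durations.values())
    return {"days": avg_days, "weeks": avg_days / 7, "months": avg_days / DAYS_PER_MONTH}
```

Single-message threads get a duration of 0 days; whether to include them in the average is a judgment call for the analysis.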

Possible solutions:
For this, we can aggregate all thread lengths (i.e., in terms of number of messages) and report 2 different classes of outliers: (i) threads more than 1 standard deviation from the mean, and (ii) threads more than 2 standard deviations from the mean.
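A minimal sketch of the standard-deviation approach, assuming thread length is the number of messages per topic. `classify_long_threads` and its `"1sd"` / `"2sd"` labels are hypothetical. Because the goal is longer-running threads, it only flags threads above the mean; comparing `abs(z)` instead would flag both tails. It uses the population standard deviation (`pstdev`) since all threads in the stream are analysed, not a sample.

```python
from statistics import mean, pstdev
from typing import Dict


def classify_long_threads(lengths: Dict[str, int]) -> Dict[str, str]:
    """Flag threads whose message count is unusually high.

    Returns {topic: "2sd"} for counts more than 2 standard deviations above the
    mean and {topic: "1sd"} for counts more than 1 (but at most 2) above it.
    Threads within 1 standard deviation of the mean are left out.
    """
    counts = list(lengths.values())
    mu, sigma = mean(counts), pstdev(counts)
    if sigma == 0:
        return {}  # all threads have the same length, so nothing stands out
    labels: Dict[str, str] = {}
    for topic, n in lengths.items():
        z = (n - mu) / sigma  # z-score: distance from the mean in standard deviations
        if z > 2:
            labels[topic] = "2sd"
        elif z > 1:
            labels[topic] = "1sd"
    return labels
```

The `lengths` input could be built with `collections.Counter(m["subject"] for m in messages)`, assuming Zulip-style message dicts.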

6b. Threads lacking adequate resolution

Identify topics that (i) have many responses (not necessarily the longest-running threads, although many will likely be among them) and (ii) do not reach some sort of resolution. This will require iterative review with an SME (Davera or others).

Possible solutions:
(i) Many responses: Could be defined as more than 1 standard deviation above the mean number of responses.
(ii) Lacking resolution: This would likely be too time-consuming to automate, so we should follow the suggestion of SME review. However, we could still aid this analysis programmatically by re-reading the analytical output. The output (likely a CSV file) could have one or more codified curator columns in which SMEs manually enter data. That information could then be re-read if further programmatic analysis is needed.
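A minimal sketch of the curator-column round trip using Python's standard `csv` module. The column names `curator_resolved` and `curator_notes`, and the `y` / `n` coding for resolution, are hypothetical placeholders for whatever coding scheme the SMEs agree on.

```python
import csv
from typing import Dict, List, TextIO

# Hypothetical curator columns, left blank for SMEs to fill in by hand.
CURATOR_COLUMNS = ["curator_resolved", "curator_notes"]


def write_review_sheet(rows: List[Dict[str, object]], out: TextIO) -> None:
    """Write analysis rows (each with a "topic" key) plus empty curator columns."""
    if not rows:
        return
    fieldnames = list(rows[0].keys()) + CURATOR_COLUMNS
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, **{col: "" for col in CURATOR_COLUMNS}})


def read_unresolved(inp: TextIO) -> List[str]:
    """Return topics an SME has coded as unresolved ("n" in curator_resolved)."""
    return [
        r["topic"]
        for r in csv.DictReader(inp)
        if r["curator_resolved"].strip().lower() == "n"
    ]
```

When writing to or reading from a real file, open it with `newline=""` as the `csv` module documentation recommends, e.g. `with open("review.csv", "w", newline="") as f: write_review_sheet(rows, f)`.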

6c. Frequency variance

For each of the count categories (1-4) above: when do these topics occur, and when are they more or less frequent?
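A minimal sketch of per-category frequency over time, assuming each message has already been tagged with category names (for example, by a keyword search like the flashtext example in the comments below) and paired with its Unix timestamp. `category_counts_by_month` is a hypothetical helper, and months are bucketed in UTC.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, Iterable, Tuple


def category_counts_by_month(tagged: Iterable[Tuple[int, Iterable[str]]]) -> Dict[str, Counter]:
    """Count, for each category, how many messages mention it in each month (UTC).

    `tagged` yields (unix_timestamp, categories) pairs, one per message.
    """
    by_category: Dict[str, Counter] = {}
    for ts, categories in tagged:
        month = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m")
        for cat in set(categories):  # count each category at most once per message
            by_category.setdefault(cat, Counter())[month] += 1
    return by_category
```

`counts["Operations"].most_common()` then lists months from most to least frequent for that category.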

6d. Activity variance

Date-based counts for all topics, indicating activity levels: when is the stream more or less active?
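A minimal sketch of date-based activity counts, assuming one Unix timestamp per message (e.g. the `timestamp` field of Zulip message dicts) and choosing ISO weeks in UTC as the bucket size. `activity_by_week` is a hypothetical helper; monthly or daily buckets work the same way with a different key.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable


def activity_by_week(timestamps: Iterable[int]) -> Counter:
    """Count messages per ISO week (UTC), keyed like "2022-W24"."""
    counts: Counter = Counter()
    for ts in timestamps:
        year, week, _ = datetime.fromtimestamp(ts, tz=timezone.utc).isocalendar()
        counts[f"{year}-W{week:02d}"] += 1
    return counts
```

Weeks with no messages are absent from the `Counter`, so they need to be filled in with zeros before ranking the quietest periods.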

Additional info

Links

Requirements google doc
Chat URL: http://chat.fhir.org
Zulip API docs: https://zulip.com/api/rest
Category keywords google sheet

joeflack4 · 2022-06-16T19:32:47Z

@DaveraGabriel FYI
@stephanieshong I don't remember who else might be working on this, but feel free to link them to this issue or add them as assignees.

stephanieshong · 2022-06-16T20:10:36Z

We will assign this task to Rohan Hurer.

stephanieshong · 2022-06-16T22:39:36Z

# Example of an NLP keyword search that might be useful:

# Install the dependencies first, e.g. from a shell:
#   pip3 install --user nltk flashtext
import nltk
from flashtext import KeywordProcessor

# Download nltk's tokenizer data (not needed by the flashtext search itself).
nltk.download('punkt')

keyword_processor = KeywordProcessor()
# Each key is a category ("clean name"); its list holds the keywords reported under that category.
keyword_dict = {
     "codesystem": ["DICOM", "SNOMED", "LOINC", "ICD10CM", "ICD10PCS", "NDC", "RxNorm"],
     "HL7Productfamilies": ["CDA", "C-CDA", "V3", "Version3"],
     "TerminologyResources": ["ConceptMap", "CodeSystem", "ValueSet", "Terminology Service", "TerminologyCapabilities", "NamingSystem", "Coding", "Code", "CodeableConcept"],
     "Operations": ["$lookup", "$validate-code", "$subsumes", "$find-matches", "$expand", "$translate", "$closure"]
}

keyword_processor.add_keywords_from_dict(keyword_dict)
# Returns a list of category names, one per keyword match found in the text.
print(keyword_processor.extract_keywords('zulip activities based on code system, HL7Product family, Terminology Resources and Operations'))

joeflack4 · 2022-06-21T17:17:10Z

Some options we discussed:
a. Fetch stream topic message text strings and query them separately, then aggregate the results.
b. Concatenate the text of all topics into one big string, and then query that.

My instinct leans towards (a) for some reason, but I think both approaches are potentially good.
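A dependency-free sketch comparing the two options, using a case-insensitive regular expression as a stand-in for flashtext (a swapped-in technique for illustration only). The function names are hypothetical, and topics are assumed to be given as a mapping from topic name to a list of message texts. Joining messages with newlines keeps a match from spanning two messages, so both options produce the same totals; option (a) additionally keeps the per-topic breakdown, which the later per-topic analyses can reuse.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def build_matcher(keywords: List[str]) -> Tuple["re.Pattern[str]", Dict[str, str]]:
    """Compile one case-insensitive pattern that matches any keyword as a whole token."""
    # Longest keywords first so "CodeSystem" wins over "Code" at the same position.
    alternatives = "|".join(re.escape(k) for k in sorted(keywords, key=len, reverse=True))
    pattern = re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE)
    canonical = {k.lower(): k for k in keywords}  # map any casing back to the keyword
    return pattern, canonical


def count_hits(text: str, pattern: "re.Pattern[str]", canonical: Dict[str, str]) -> Counter:
    return Counter(canonical[m.group(0).lower()] for m in pattern.finditer(text))


def option_a(topics: Dict[str, List[str]], keywords: List[str]) -> Tuple[Dict[str, Counter], Counter]:
    """Query each topic separately, then aggregate: returns (per_topic, total)."""
    pattern, canonical = build_matcher(keywords)
    per_topic = {t: count_hits("\n".join(msgs), pattern, canonical) for t, msgs in topics.items()}
    total = sum(per_topic.values(), Counter())
    return per_topic, total


def option_b(topics: Dict[str, List[str]], keywords: List[str]) -> Counter:
    """Concatenate every topic into one string and query it once."""
    pattern, canonical = build_matcher(keywords)
    everything = "\n".join(msg for msgs in topics.values() for msg in msgs)
    return count_hits(everything, pattern, canonical)
```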

joeflack4 assigned joeflack4 and stephanieshong Jun 16, 2022

joeflack4 added the requirements label Jun 16, 2022

joeflack4 mentioned this issue Jul 28, 2022

Spelling variations #17

Closed
