
Commit

output results from text mining
pnrobinson committed Jun 23, 2024
1 parent 96fcf80 commit 162ef67
Showing 8 changed files with 448 additions and 221 deletions.
7 changes: 7 additions & 0 deletions docs/cases/PMID_25163805.txt
@@ -0,0 +1,7 @@
[source]
pmid = PMID:25163805
title = Further delineation of Loeys-Dietz syndrome type 4 in a family with mild vascular involvement and a TGFB2 splicing mutation
[diagnosis]
disease_id = OMIM:614816
disease_label = Loeys-Dietz syndrome 4
[text]
26 changes: 26 additions & 0 deletions docs/cases/PMID_30249733.txt
@@ -0,0 +1,26 @@
[source]
pmid = PMID:30249733
title = Novel mutation in the CHST14 gene causing musculocontractural type of Ehlers-Danlos syndrome
[diagnosis]
disease_id = OMIM:601776
disease_label = Ehlers-Danlos syndrome, musculocontractural type 1
[text]
A 3-year-old boy, born of a third-degree consanguineous marriage, was referred to us by the paediatric surgeons on
suspicion of an underlying genetic disorder. He was being followed by them for bilateral hydronephrosis with bilateral
pelviureteric junction obstruction (right > left). The child was born by normal vaginal delivery with a birth weight of 2.7 kg.
The length and head circumference at birth were not recorded. At birth, he was noted to have bilateral clubfeet.
The child started sitting at around 7–8 months of age; however, he had difficulty in standing and walking.
At his current age of 3 years, he is still able to stand with support for only a few minutes.
In other domains of development, such as cognition and language, the child showed appropriate gains and currently is able
to tell short stories and enjoys playing with family members. Anthropometry at the age of 3 years showed a weight of 12.6 kg,
a length of 88 cm and a head circumference of 47 cm. For the initial 1–1.5 years of life,
the parents were mainly concerned about the clubfeet and consulted local practitioners for them.
During an episode of acute febrile illness, he was coincidentally diagnosed with hydronephrosis and, in view of cryptorchidism
noted by the examining physician, was referred to our centre for evaluation and management.
On examination, the child had facial dysmorphism in the form of synophrys, hypertelorism, down slanting palpebral fissures,
low set ears, thin upper lip, high arched palate and prominent nasolabial folds (figures 1A and 2A–C).
He had tapering fingers with bilaterally thin and adducted thumbs (figure 1B). The deep palmar creases were absent and
only a few fine creases were seen. Feet showed bilateral talipes equinovarus deformity (figure 1C).
The skin was hyperelastic and hypermobility of fingers, elbow and knee joints was noted. Generalised hypotonia was present.
The child also had bilateral cryptorchidism. No bruises or haematomas were seen and, even on repeated questioning, the parents denied
any bleeding tendency.
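Both case files added in this commit share an INI-like layout: `key = value` pairs under `[source]` and `[diagnosis]`, followed by free text under `[text]` (empty for PMID_25163805). The commit reads these files through `Utility.getAllCaseBundlesFromDirectory` and `FenominalParser`, whose code is not shown here, so the following is only a hypothetical JDK-only sketch of how such a file could be split into fields and text. The class `CaseFileSketch`, the `section.key` naming and the choice to join text lines with single spaces are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: splits a [source]/[diagnosis]/[text] case file into fields and free text. */
public class CaseFileSketch {

    /** Fields are keyed as "section.key", e.g. "diagnosis.disease_id" (assumed naming). */
    public record ParsedCase(Map<String, String> fields, String text) {}

    public static ParsedCase parse(List<String> lines) {
        Map<String, String> fields = new LinkedHashMap<>();
        StringBuilder text = new StringBuilder();
        String section = "";
        for (String raw : lines) {
            String line = raw.strip();
            if (line.startsWith("[") && line.endsWith("]")) {
                // Section header such as [source]: remember it and continue.
                section = line.substring(1, line.length() - 1);
            } else if (section.equals("text")) {
                // Free text: skip blank lines, join the rest with single spaces.
                if (!line.isEmpty()) {
                    if (text.length() > 0) {
                        text.append(' ');
                    }
                    text.append(line);
                }
            } else {
                // key = value pair; split on the first '=' only.
                int eq = line.indexOf('=');
                if (eq > 0) {
                    String key = line.substring(0, eq).strip();
                    String value = line.substring(eq + 1).strip();
                    fields.put(section + "." + key, value);
                }
            }
        }
        return new ParsedCase(fields, text.toString());
    }

    public static void main(String[] args) {
        ParsedCase c = parse(List.of(
                "[source]",
                "pmid = PMID:25163805",
                "[diagnosis]",
                "disease_id = OMIM:614816",
                "disease_label = Loeys-Dietz syndrome 4",
                "[text]"));
        System.out.println(c.fields().get("diagnosis.disease_id")); // OMIM:614816
        System.out.println(c.text().isEmpty());                     // true
    }
}
```

Keeping the sections in a `LinkedHashMap` preserves file order, which makes it easy to compare parsed output against the source file by eye.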
@@ -20,6 +20,7 @@ public static void main(String[] args){
.addSubcommand("download", new DownloadCommand())
.addSubcommand("prompt", new PromptCommand())
.addSubcommand("mine", new TextMineCommand())
.addSubcommand("batchmine", new BatchMineCommand())
.addSubcommand("translate", new GptTranslateCommand())
;
cline.setToggleBooleanFlags(false);
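Each `addSubcommand` call binds a name to a command object, so the object passed for `"batchmine"` determines what actually runs when that subcommand is invoked. The sketch below is a stdlib-only analogy of that name-to-command mapping, not picocli's implementation; `SubcommandDispatchSketch`, its `add`/`run` methods and the exit code 2 for unknown names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

/** Hypothetical stdlib-only analogy of name-based subcommand dispatch (not picocli). */
public class SubcommandDispatchSketch {
    private final Map<String, Callable<Integer>> commands = new LinkedHashMap<>();

    /** Binds name to command; rejects a name that is already registered. */
    public SubcommandDispatchSketch add(String name, Callable<Integer> command) {
        if (commands.putIfAbsent(name, command) != null) {
            throw new IllegalArgumentException("Duplicate subcommand: " + name);
        }
        return this;
    }

    /** Runs the command bound to name; returns 2 for an unknown name, 1 if the command throws. */
    public int run(String name) {
        Callable<Integer> command = commands.get(name);
        if (command == null) {
            System.err.println("Unknown subcommand: " + name);
            return 2;
        }
        try {
            return command.call();
        } catch (Exception e) {
            System.err.println(name + " failed: " + e.getMessage());
            return 1;
        }
    }

    public static void main(String[] args) {
        // Whatever command is bound to a name is what that name runs, so binding
        // "batchmine" to the single-case command would silently run single-case mining.
        SubcommandDispatchSketch cli = new SubcommandDispatchSketch()
                .add("mine", () -> { System.out.println("mining one case"); return 0; })
                .add("batchmine", () -> { System.out.println("mining a directory of cases"); return 0; });
        int exitCode = cli.run(args.length > 0 ? args[0] : "batchmine");
        System.out.println("exit code " + exitCode);
    }
}
```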

This file was deleted.

@@ -0,0 +1,100 @@
package org.monarchinitiative.phenopacket2prompt.cmd;


import org.monarchinitiative.phenol.base.PhenolRuntimeException;
import org.monarchinitiative.phenopacket2prompt.mining.CaseBundle;
import org.monarchinitiative.phenopacket2prompt.mining.FenominalParser;
import org.monarchinitiative.phenopacket2prompt.model.PpktIndividual;
import org.monarchinitiative.phenopacket2prompt.output.CorrectResult;
import org.monarchinitiative.phenopacket2prompt.output.PpktCopy;
import org.monarchinitiative.phenopacket2prompt.output.PromptGenerator;
import picocli.CommandLine;

import java.io.File;
import java.util.List;
import java.util.concurrent.Callable;

@CommandLine.Command(name = "batchmine", aliases = {"B2"},
mixinStandardHelpOptions = true,
description = "Batch Text mine, Translate, and Output phenopacket and prompt")
public class BatchMineCommand implements Callable<Integer> {
@CommandLine.Option(names={"-d","--data"}, description ="directory to download data (default: ${DEFAULT-VALUE})" )
public String datadir="data";

@CommandLine.Option(names={"-i","--inputdir"}, description ="input files (directory)" )
public String input = "docs/cases/"; // provide path for testing

@CommandLine.Option(names = { "-o", "--output"}, description = "Path to output file dir (default: ${DEFAULT-VALUE})")
private String output = "mined_out";

@CommandLine.Option(names = {"-e", "--exact"}, description = "Use exact matching algorithm")
private boolean useExactMatching = false;

@CommandLine.Option(names = {"--translations"},
description = "path to translations file")
private String translationsPath = "data/hp-international.obo";

@CommandLine.Option(names = {"--verbose"}, description = "show results in shell (default is to just write to file)")
private boolean verbose;



@Override
public Integer call() throws Exception {
File inDirectory = new File(input);
if (!inDirectory.isDirectory()) {
throw new PhenolRuntimeException("Could not find directory at " + input);
}
File hpoJsonFile = new File(datadir + File.separator + "hp.json");
if (! hpoJsonFile.isFile()) {
// Without hp.json the text-mining parser cannot be built, so stop here
System.err.printf("[ERROR] Could not find hp.json file at %s%nRun download command first%n", hpoJsonFile.getAbsolutePath());
return 1;
}
File translationsFile = new File(translationsPath);
if (! translationsFile.isFile()) {
System.err.printf("[ERROR] Could not find translations file at %s. Try download command%n", translationsPath);
return 1;
}
Utility utility = new Utility(translationsFile);
List<PpktIndividual> individualList = getIndividualsFromTextMining(inDirectory, hpoJsonFile);
// Spanish
PromptGenerator spanish = utility.spanish();
Utility.outputPromptsInternationalMining(individualList, "es", spanish);
// Dutch
PromptGenerator dutch = utility.dutch();
Utility.outputPromptsInternationalMining(individualList, "nl", dutch);
// German
PromptGenerator german = utility.german();
Utility.outputPromptsInternationalMining(individualList, "de", german);
// Italian
PromptGenerator italian = utility.italian();
Utility.outputPromptsInternationalMining(individualList, "it", italian);

// English prompts, then the file listing the correct diagnosis for each case
List<CorrectResult> correctResultList = Utility.outputPromptsEnglishFromIndividuals(individualList);
Utility.outputCorrectTextmined(correctResultList);
return 0;
}

/**
* Get all of the individual objects by text mining the input files
* @param inDirectory Input directory. Should hold input files formatted for this project (demonstration)
* @param hpoJsonFile File representing hp.json
* @return list of individuals
*/
protected List<PpktIndividual> getIndividualsFromTextMining(File inDirectory, File hpoJsonFile) {
FenominalParser parser = new FenominalParser(hpoJsonFile, useExactMatching);
List<CaseBundle> caseBundleList = Utility.getAllCaseBundlesFromDirectory(inDirectory, parser);
return caseBundleList.stream().map(CaseBundle::individual).toList();
}



private void outputTextmined(FenominalParser parser) {

List<CaseBundle> caseBundleList = Utility.getCaseBundleList(input, parser);
if (caseBundleList.isEmpty()) {
System.err.println("Could not extract cases from " + input);
return; // getFirst() below would throw on an empty list
}
// for now, just output one case
Utility.outputPromptFromCaseBundle(caseBundleList.getFirst().individual(), output);
}
}
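`call()` repeats the same two steps (obtain a generator, write prompts) for Spanish, Dutch, German and Italian. One way to express that repetition is a table keyed by language code; the hypothetical sketch below shows the pattern with plain JDK types. `LanguageLoopSketch`, `PromptWriter` and `outputAll` are invented names, and wiring it into the project would assume map entries such as `utility::spanish` plus a writer that forwards to `Utility.outputPromptsInternationalMining`, with signatures inferred only from the calls in `call()`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch: a table-driven form of the per-language calls in call(). */
public class LanguageLoopSketch {

    /** Mirrors the (individuals, languageCode, generator) shape of the output call (assumption). */
    @FunctionalInterface
    public interface PromptWriter<I, G> {
        void write(List<I> individuals, String languageCode, G generator);
    }

    /** Creates each generator on demand, writes its prompts, and returns the codes in map order. */
    public static <I, G> List<String> outputAll(List<I> individuals,
                                                Map<String, Supplier<G>> generatorsByCode,
                                                PromptWriter<I, G> writer) {
        List<String> written = new ArrayList<>();
        for (Map.Entry<String, Supplier<G>> entry : generatorsByCode.entrySet()) {
            writer.write(individuals, entry.getKey(), entry.getValue().get());
            written.add(entry.getKey());
        }
        return written;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the es, nl, de, it order used in call().
        Map<String, Supplier<String>> generators = new LinkedHashMap<>();
        generators.put("es", () -> "Spanish generator");
        generators.put("nl", () -> "Dutch generator");
        generators.put("de", () -> "German generator");
        generators.put("it", () -> "Italian generator");
        List<String> codes = outputAll(List.of("case-1", "case-2"), generators,
                (individuals, code, generator) ->
                        System.out.println(code + ": " + generator + ", " + individuals.size() + " case(s)"));
        System.out.println(codes); // [es, nl, de, it]
    }
}
```

With this shape, adding a language is one map entry rather than two more lines in `call()`; the trade-off is a little indirection compared with the explicit calls in the commit.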