Add comments to the solr part #4

Open
wants to merge 43 commits into
base: master
43 commits
ab1a30d
Ran mvn clean install
larsmahler Oct 31, 2013
8763dec
fix the index out of bound bug
kuoliu Nov 3, 2013
d4d1c57
Merge pull request #6 from kuoliu/master
larsmahler Nov 4, 2013
6422aea
allow solr to run locally
kuoliu Nov 4, 2013
fec7b2a
Merge pull request #7 from kuoliu/master
larsmahler Nov 4, 2013
cc28fe9
Add a simple survey file for data to data folder.
twlabc123 Nov 6, 2013
f56f19c
Temp commit, prior to fetching from origin repo.
larsmahler Nov 8, 2013
a2052ef
Added SymbolAnnotator.java, updated TypeSystemDescriptor.xml (and
larsmahler Nov 11, 2013
5e37a83
Included SymbolAnnotator.xml
larsmahler Nov 11, 2013
2a8342a
Modified the type system for Coref and more phrases
kartikgo Nov 12, 2013
78c0ec9
Modified the type system for Coref and more phrases
kartikgo Nov 12, 2013
0e4b37e
Generated new types after modifying type system for Coref
kartikgo Nov 12, 2013
8ec1e56
Merge branch 'master' of https://github.com/larsmahler/hw5-team08.git
kartikgo Nov 12, 2013
95caec3
Coreference Done! Type System Finalized.
kartikgo Nov 12, 2013
ea5c166
change the solr index, add the co-ref and synonyms
helloeve Nov 13, 2013
9657844
add new solr start.jar
helloeve Nov 13, 2013
e020740
New AcronymAnnotator.xml and AcronymAnnotator.java
larsmahler Nov 13, 2013
e576e51
Upload M1 presentation.
larsmahler Nov 14, 2013
7085dac
add solr zip file
helloeve Nov 18, 2013
a059c6d
Revision includes AnswerHeuristic annotator
larsmahler Nov 22, 2013
efbbe3c
Merge remote-tracking branch 'origin/master'
larsmahler Nov 22, 2013
6d47992
update the typesystem to add the "candidateSentenceList" for Answer
helloeve Nov 22, 2013
17ea4c3
Added Targets to the question
kartikgo Nov 24, 2013
d7e2cd0
Add a "finalScore" in CandidateAnswer and add a new scorer that can
twlabc123 Nov 24, 2013
a498129
fix bugs caused by AnswerHeuristicAnnotator, change the
helloeve Nov 24, 2013
5c45b72
Modify AnswerSelectionByCand*.java by adding "finalScore" into
twlabc123 Nov 24, 2013
48a44c2
Added M2 report
larsmahler Nov 26, 2013
f719990
Added M2 report
larsmahler Nov 26, 2013
89da422
Merge remote-tracking branch 'origin/master'
larsmahler Nov 26, 2013
6fd5506
Update scorer.
twlabc123 Nov 30, 2013
2e1c9ab
Merge branch 'master' of https://github.com/larsmahler/hw5-team08.git
twlabc123 Nov 30, 2013
93bdcd2
Updated documentation
larsmahler Dec 6, 2013
b3b2532
Haven't fully test the CasConsumer, but this version can be used to test
twlabc123 Dec 6, 2013
3c131bb
Merge branch 'master' of https://github.com/larsmahler/hw5-team08.git
twlabc123 Dec 6, 2013
72744c7
Updated documentation
larsmahler Dec 6, 2013
ee8e41b
Merge remote-tracking branch 'origin/master'
larsmahler Dec 6, 2013
ac496a4
Document my Scorer and Voting selector.
twlabc123 Dec 6, 2013
7bbf973
Added "Working Notes Paper.pdf" and "outputXML.zip"
larsmahler Dec 6, 2013
56b53a7
Updated Working Notes paper
larsmahler Dec 6, 2013
70320d1
Documented Kartik's part!
kartikgo Dec 6, 2013
13f12b4
Merge remote-tracking branch 'origin/master'
larsmahler Dec 6, 2013
75b7584
Merge remote-tracking branch 'origin/master'
larsmahler Dec 6, 2013
86e45aa
Milestone 3 - FINAL
larsmahler Dec 6, 2013
6 changes: 1 addition & 5 deletions qa4mre-alzheimer-task/.classpath
@@ -17,11 +17,7 @@
<attribute name="maven.pomderived" value="true"/>
</attributes>
</classpathentry>
<classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER/org.eclipse.jdt.internal.debug.ui.launcher.StandardVMType/JavaSE-1.6">
<attributes>
<attribute name="maven.pomderived" value="true"/>
</attributes>
</classpathentry>
<classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER"/>
<classpathentry kind="con" path="org.eclipse.m2e.MAVEN2_CLASSPATH_CONTAINER">
<attributes>
<attribute name="maven.pomderived" value="true"/>

This file was deleted.

2 changes: 1 addition & 1 deletion qa4mre-alzheimer-task/XMIs/12-test-alzheimer/QA4MRE-2012_BIOMEDICAL_GS.xml_1.xmi
100755 → 100644

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion qa4mre-alzheimer-task/XMIs/12-test-alzheimer/QA4MRE-2012_BIOMEDICAL_GS.xml_2.xmi
100755 → 100644

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion qa4mre-alzheimer-task/XMIs/12-test-alzheimer/QA4MRE-2012_BIOMEDICAL_GS.xml_3.xmi
100755 → 100644

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion qa4mre-alzheimer-task/XMIs/12-test-alzheimer/QA4MRE-2012_BIOMEDICAL_GS.xml_4.xmi
100755 → 100644

Large diffs are not rendered by default.

@@ -74,8 +74,7 @@ Of mice and men: an Alzheimer’s cure for our murine brethren. Alzheimer's Dise
<answer a_id='1'>Alzheimer's treatment</answer>
<answer a_id='2'>nest making</answer>
<answer a_id='3'>restoring smell</answer>
<answer a_id='4'>neurodegeneration
</answer>
<answer a_id='4'>neurodegeneration</answer>
<answer a_id='5'>None of the above</answer>
</q>
<q q_id="10" >
@@ -247,8 +246,7 @@ Fighting Alzheimer’s disease? Get the immune system on board. James Fuller Jam
<q_str>Name a similarity between AD and TB.</q_str>
<answer a_id='1'>the body is slowly destroying the brain</answer>
<answer a_id='2'>the vaccine that teaches the immune system to fight off the infection</answer>
<answer a_id='3'>drugs and therapy
</answer>
<answer a_id='3'>drugs and therapy</answer>
<answer a_id='4'>side effects of the vaccines</answer>
<answer a_id='5'>None of the above</answer>
</q>
@@ -291,8 +289,7 @@ Alanna Shaikh: How I'm preparing to get Alzheimer's. I'd like to talk about my d
</q>
<q q_id="5" >
<q_str>What is Alanna's aim when building her physical strength?</q_str>
<answer a_id='1'>to become a better person
</answer>
<answer a_id='1'>to become a better person</answer>
<answer a_id='2'>to win a tai chi medal</answer>
<answer a_id='3'>to fill out forms</answer>
<answer a_id='4'>to have the ability to knit a sweater</answer>
@@ -434,8 +431,7 @@ Financial challenges faced by person with dementia. The idea: A person with deme
<q q_id="7" >
<q_str>All but one of the following are reasons why a total of $100,000 for the common funds of a family seeking assistance is not enough. Which one is that?</q_str>
<answer a_id='1'>The needs of the family continue.</answer>
<answer a_id='2'>The cost of living constantly increases.
</answer>
<answer a_id='2'>The cost of living constantly increases.</answer>
<answer a_id='3'>The costs incident to the disease constantly increase.</answer>
<answer a_id='4'>Health care covers assisted living.</answer>
<answer a_id='5'>None of the above</answer>
13 changes: 13 additions & 0 deletions qa4mre-alzheimer-task/data/survey.txt
@@ -0,0 +1,13 @@
In 12 test data, we have 4 documents, each with 10 questions.
In 13 sample data, we have 1 document with 10 questions.
In 13 test data, we have 16 documents, each with 15-20 questions.

Each question in 12 test data and 13 sample data has 5 options and an implicit option of choosing nothing.
But questions in 13 test data have 5 options, which include an explicit "None of above" option.

The questions are mainly factoid.
The first type asks about a specific part of a fact: what, where, how, why, who, aim, purpose.
The second type asks about numeric features, like how many or how old.
The third type asks us to name 2-3 examples, like "Name 2 ways to do ..."
There is a special type of question in 13 test data, which asks about the degree to which a certain fact holds. The options are like "Absolutely yes", "Probably yes", "Probably not" and "Absolutely not".
We definitely need a special pipeline for the last type of questions.
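
One way the question types listed in the survey could be told apart is sketched below. This is only an illustration, not code from this pull request: the class `QuestionTypeSketch`, the enum `QuestionType`, and the keyword rules are all hypothetical, and real questions would likely need more robust rules. The only details taken from the survey are the cue phrases ("how many", "how old", "Name ...") and the four degree-style option strings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; none of these names exist in the project.
class QuestionTypeSketch {

    enum QuestionType { FACTOID, NUMERIC, LIST, DEGREE }

    // Option strings the survey quotes for the "degree" questions.
    private static final List<String> DEGREE_OPTIONS = Arrays.asList(
            "absolutely yes", "probably yes", "probably not", "absolutely not");

    static QuestionType classify(String question, List<String> options) {
        // Degree questions are recognised by their answer options, not their wording.
        for (String option : options) {
            if (DEGREE_OPTIONS.contains(option.trim().toLowerCase(Locale.ROOT))) {
                return QuestionType.DEGREE;
            }
        }
        String q = question.trim().toLowerCase(Locale.ROOT);
        if (q.contains("how many") || q.contains("how old")) {
            return QuestionType.NUMERIC;
        }
        if (q.startsWith("name ")) {
            return QuestionType.LIST;
        }
        // Assumption: everything else is treated as a plain factoid question.
        return QuestionType.FACTOID;
    }

    public static void main(String[] args) {
        List<String> options = Arrays.asList("drugs and therapy", "None of the above");
        System.out.println(classify("Name a similarity between AD and TB.", options));
    }
}
```

A classifier along these lines would let the pipeline route degree questions to a separate component while the other types share the existing answer scorers.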
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added qa4mre-alzheimer-task/documentation/outputXML.zip
Binary file not shown.
Binary file added qa4mre-alzheimer-task/solr/apache-solr-3.6.1.zip
Binary file not shown.
@@ -0,0 +1,240 @@
package edu.cmu.lti.deiis.hw5.annotators;

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.ListIterator;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.commons.collections.iterators.ArrayListIterator;
import org.apache.uima.UimaContext;
import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.cas.FeatureStructure;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.cas.FSList;
import org.apache.uima.jcas.cas.NonEmptyFSList;
import org.apache.uima.resource.ResourceInitializationException;

import edu.cmu.lti.qalab.types.Answer;
import edu.cmu.lti.qalab.types.Sentence;
import edu.cmu.lti.qalab.types.Synonym;
import edu.cmu.lti.qalab.types.Token;
import edu.cmu.lti.qalab.utils.Utils;

/**
* Finds Token instances that appear to be acronyms and determines their likely
* expansion (ex: "AD" expands to "Alzheimer's Disease"). Updates the Token.synonyms
* FSList to include the expansion as a synonym (e.g. after processing the Token "AD",
* its synonyms list will contain "Alzheimer's Disease" as a synonym).
*/
public class AcronymAnnotator extends JCasAnnotator_ImplBase{

// Regex patterns (to find acronyms)
// A) all UPPERCASE* (ex: "IDE")
// B) lowercase with no vowels* (ex: "sst")
// C) MixedCase with uppercase in middle/end (ex: "LoB")
// *with/without numbers
Pattern uppercasePattern=Pattern.compile("^[A-Zß]{2,6}[\\d]?$");
Pattern lowercasePattern=Pattern.compile("^[bcdfghjklmnpqrstvwxz\\d]{2,4}$");
Pattern mixedcasePattern=Pattern.compile("^[A-Zß]{1,6}[\\d]?[a-z]{0,3}[\\d]?[A-Zß]{1,6}[\\d]?[a-z\\d]{0,3}[\\d]?$");
HashMap<String, ArrayList<Synonym>> acronymSynonymMap = new HashMap<String, ArrayList<Synonym>>();

@Override
public void process(JCas jCas) throws AnalysisEngineProcessException {

// Loop through tokens from test doc
ArrayList<Sentence> sentences = Utils.getSentenceListFromTestDocCAS(jCas);
for (Sentence s : sentences) {
ArrayList<Token> tokens = Utils.getTokenListFromSentenceList(s);
annotateAcronyms(tokens, jCas);
}

// Loop through tokens from source doc
ArrayList<Sentence> sentences2 = Utils.getSentenceListFromSourceDocCAS(jCas);
for (Sentence s : sentences2) {
ArrayList<Token> tokens = Utils.getTokenListFromSentenceList(s);
annotateAcronyms(tokens, jCas);
}

// Loop through tokens from answers
ArrayList<ArrayList<Answer>> answers = Utils.getAnswerListFromTestDocCAS(jCas);
for (ArrayList<Answer> aList : answers) {
for (Answer a : aList) {
ArrayList<Token> tokens = Utils.getTokenListFromAnswer(a);
annotateAcronyms(tokens, jCas);
}
}
}

private void annotateAcronyms(ArrayList<Token> tokens, JCas jCas) {

// Loop through tokens, and Find something that looks like an acronym:
// A) all UPPERCASE* (ex: "IDE")
// B) lowercase with no vowels* (ex: "sst")
// C) MixedCase with uppercase in middle/end (ex: "LoB")
// *with/without numbers

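// Nothing to annotate in an empty token list (this also avoids a negative ArrayList capacity below)
if (tokens.isEmpty()) {
return;
}

int histLength = Math.min(5, tokens.size()-1);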

ArrayList<Token> prevTokens = new ArrayList<Token>(histLength);
ArrayList<Token> nextTokens = new ArrayList<Token>(histLength);

for (int i=0; i<tokens.size(); i++) {
// Store values of previous and next tokens
if (i==0) {
for (int j=1; j<=histLength; j++) {
prevTokens.add(tokens.get(i));
nextTokens.add(tokens.get(i+j));
}
}
else if (i < tokens.size()-histLength) {
prevTokens.add(tokens.get(i-1));
prevTokens.remove(0);
nextTokens.add(tokens.get(i+histLength));
nextTokens.remove(0);
}
else {
// Tail of the token list: the look-ahead window only shrinks from here on
prevTokens.add(tokens.get(i-1));
prevTokens.remove(0);
nextTokens.remove(0);
}

Token t = tokens.get(i);
String text = t.getText();

// If the token already exists in the acronym hashmap / DB, retrieve synonyms from hashmap.
ArrayList<Synonym> existingSynonyms = this.acronymSynonymMap.get(t.getText());
if (existingSynonyms != null) {
addUpdateTokenSynonyms(t, existingSynonyms, jCas);
} else {
// Else: determine whether the token is an acronym
// A) all UPPERCASE* (ex: "IDE")
// B) lowercase with no vowels* (ex: "sst")
// C) MixedCase with uppercase in middle/end (ex: "LoB")
// A token such as "AD" matches both A and C, so confirm it only once.
if (uppercasePattern.matcher(text).matches()
|| lowercasePattern.matcher(text).matches()
|| mixedcasePattern.matcher(text).matches()) {
confirmMatch(t, prevTokens, nextTokens, jCas);
}
}
}
}

private void confirmMatch(Token t, ArrayList<Token> prevToks, ArrayList<Token> nextToks, JCas jCas) {

String acronym = t.getText();
boolean leftParensFound = false;
boolean rightParensFound = false;
boolean matchFound = false;
HashMap<String, Integer> synonymMap= new HashMap<String, Integer>();


// Determine if acronym is contained within parens
for (int i=0; i<prevToks.size(); i++) {
if (prevToks.get(i).getText().equalsIgnoreCase("(")) {
leftParensFound = true;
break;
}
}
for (int i=0; i<nextToks.size(); i++) {
if (nextToks.get(i).getText().equalsIgnoreCase(")")) {
rightParensFound = true;
break;
}
}

// If acronym is contained within parens
// 1) look to the left
// 2) if preceding tokens form NP and the first letter of any token
// = a letter within acronym
// 3) then match.
if (leftParensFound == true && rightParensFound == true) {
for (int i=0; i<prevToks.size(); i++) {
for (int j = 0; j < acronym.length(); j++){
char c = acronym.charAt(j);
if (Character.toLowerCase(prevToks.get(i).getText().charAt(0)) == Character.toLowerCase(c)) {
matchFound = true;
synonymMap.put(prevToks.get(i).getText(), 1);
}
}
}
} else {
// If acronym is not contained within parens
// 1) look to the right
// 2) if following tokens consist of parens with NP within, and the
// first letter of any token = a letter within acronym
// 3) then match.
for (int i=0; i<nextToks.size(); i++) {
for (int j = 0; j < acronym.length(); j++){
char c = acronym.charAt(j);
if (Character.toLowerCase(nextToks.get(i).getText().charAt(0)) == Character.toLowerCase(c)) {
matchFound = true;
// Store the matching token from the look-ahead window (not from prevToks)
synonymMap.put(nextToks.get(i).getText(), 1);
}
}
}
}

// If a match is found, store its expansion (noun phrase) as a synonym
if (matchFound == true) {
// Copy hashmap to ArrayList
ArrayList<Synonym> newSynonyms = new ArrayList<Synonym>();
for (Map.Entry<String, Integer> entry : synonymMap.entrySet())
{
Synonym newSynonym = new Synonym(jCas);
newSynonym.addToIndexes(jCas);
newSynonym.setText(entry.getKey());
newSynonyms.add(newSynonym);
}
addUpdateTokenSynonyms(t, newSynonyms, jCas);
this.acronymSynonymMap.put(t.getText(), newSynonyms);

}

}

private void addUpdateTokenSynonyms(Token t, ArrayList<Synonym> synonymList, JCas jCas) {

// If token already has synonym list, append it to synonymList
FSList prevSynonyms = t.getSynonyms();
if (prevSynonyms == null) {
// Do nothing
} else {
try {
ArrayList<Synonym> prevSynonymsArrayList = Utils.fromFSListToCollection(prevSynonyms, Synonym.class);
for (Synonym s : prevSynonymsArrayList) {
// Reset per previous synonym so that one duplicate does not suppress all later ones
boolean dupFlag = false;
for (Synonym s2 : synonymList) {
if (s.getText().equalsIgnoreCase(s2.getText())) {
dupFlag = true;
break;
}
}
if (!dupFlag) {
synonymList.add(s);
}
}
} catch (NullPointerException e) {
// Some tokens seemed to not exist (caused null pointer exceptions). In this case, do not try to update them.
return;
}

}

// Set synonymList as the new FSList<Synonym> for the token
FSList updatedSynonyms = Utils.fromCollectionToFSList(jCas, synonymList);
updatedSynonyms.addToIndexes(jCas);
t.setSynonyms(updatedSynonyms);
t.addToIndexes();

}
}
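
The acronym check and the previous/next token windows used by `annotateAcronyms` can be tried out without a UIMA pipeline. The following is a minimal sketch under stated assumptions, not the annotator itself: it works on plain strings instead of `Token` objects, the class `AcronymWindowSketch` and its methods are hypothetical names, and the windows are computed with `List.subList` rather than the incremental add/remove bookkeeping in the file above. The three regular expressions are copied unchanged from `AcronymAnnotator`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; operates on strings rather than UIMA Tokens.
class AcronymWindowSketch {

    // Patterns copied from AcronymAnnotator.
    static final Pattern UPPER = Pattern.compile("^[A-Zß]{2,6}[\\d]?$");
    static final Pattern LOWER = Pattern.compile("^[bcdfghjklmnpqrstvwxz\\d]{2,4}$");
    static final Pattern MIXED = Pattern.compile(
            "^[A-Zß]{1,6}[\\d]?[a-z]{0,3}[\\d]?[A-Zß]{1,6}[\\d]?[a-z\\d]{0,3}[\\d]?$");

    // True if the text matches any of the three acronym shapes.
    static boolean looksLikeAcronym(String text) {
        return UPPER.matcher(text).matches()
                || LOWER.matcher(text).matches()
                || MIXED.matcher(text).matches();
    }

    // Up to `window` tokens immediately before position i.
    static List<String> previous(List<String> tokens, int i, int window) {
        return new ArrayList<>(tokens.subList(Math.max(0, i - window), i));
    }

    // Up to `window` tokens immediately after position i.
    static List<String> next(List<String> tokens, int i, int window) {
        int end = Math.min(tokens.size(), i + 1 + window);
        return new ArrayList<>(tokens.subList(i + 1, end));
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "Alzheimer", "'s", "disease", "(", "AD", ")", "is", "common");
        for (int i = 0; i < tokens.size(); i++) {
            if (looksLikeAcronym(tokens.get(i))) {
                System.out.println(tokens.get(i)
                        + " prev=" + previous(tokens, i, 5)
                        + " next=" + next(tokens, i, 5));
            }
        }
    }
}
```

Because the lowercase pattern also accepts digits, a purely numeric token such as "2013" passes the check in this sketch. Whether that is intended in the annotator is not stated in the source, so it may be worth confirming.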