Improvement/make brat reader more forgiving take2 #1448

Open
wants to merge 55 commits into
base: 1.12.x
4059834
Merge pull request #1 from dkpro/1.12.x
alaindesilets Dec 18, 2019
9564372
Simplified test runner for BratReader and BratWriter
alaindesilets Dec 19, 2019
3eb373f
[REF] Move stripProtocol to BratReader
alaindesilets Dec 19, 2019
86277ae
BratReader can now receive .txt file path
alaindesilets Dec 19, 2019
85e9519
BratReader can now deal with brat dir without *.ann
alaindesilets Dec 19, 2019
5b09727
BratReader with directory path now reads all files in path
alaindesilets Dec 20, 2019
f1e13d2
BratReader read dir test now compares content of the files
alaindesilets Dec 20, 2019
fb740f5
BratReader automatically creates empty .ann files if not already exist
alaindesilets Dec 20, 2019
5d85261
Generate default mappings for types
alaindesilets Dec 20, 2019
0c74007
Call stub Mapping.merge()
alaindesilets Dec 20, 2019
3dc1a66
Commented out a call to Mapping.merge()
alaindesilets Dec 21, 2019
76fc777
Revert back to "Call stub Mapping.merge()"
alaindesilets Dec 21, 2019
92b40f9
Map.merge() implemented but does not work yet
alaindesilets Dec 21, 2019
277724a
Merge branch '1.12.x' into Improvement/Make_BratReader_more_forgiving…
reckart Dec 24, 2019
2f61fbf
#1443 - Make BratReader more forgiving
reckart Dec 26, 2019
ec0598b
[BUG] getDefaultMapping() used wrong order of arguments when
alaindesilets Dec 23, 2019
dc5372a
Get rid of explicit PARAM_MAPPING for some tests
alaindesilets Dec 23, 2019
e04bd88
Delete duplicate FileCopy class
alaindesilets Dec 24, 2019
588c228
Fixing testing code (tests were passing trivially)
alaindesilets Jan 6, 2020
145a7b5
#1443 - Make BratReader more forgiving
reckart Jan 6, 2020
5a35178
Migrate more tests to use testOneWaySimple()
alaindesilets Jan 7, 2020
e3f5d69
Continue migrating Brat tests to new Reader/WriterAssert approach
alaindesilets Jan 8, 2020
ec31d03
[REF] Minor refactoring to DKProTestContext
alaindesilets Jan 8, 2020
f4da1db
Proper setting of "overwrite" param in testing methods
alaindesilets Jan 13, 2020
92e57ca
In test harness, decouple ref, input and output locations
alaindesilets Jan 13, 2020
423d059
Migrate misc Reader/Writer tests to new testing approach
alaindesilets Jan 13, 2020
d0f3b93
Added missing dependencies in bnc and brat modules
alaindesilets Jan 14, 2020
9f5e153
Added missing assertj dependency in some pom.xml
alaindesilets Jan 14, 2020
049c8fb
Correct a style violation in io-negra
alaindesilets Jan 14, 2020
ad3a51a
Additional tests for io-brat module
alaindesilets Jan 14, 2020
5f46c59
Add License terms to Assert*.java files
alaindesilets Jan 15, 2020
e26aa9e
Added BratReader/Writer test for case where .ann files contains custom
alaindesilets Jan 15, 2020
a77d374
[REF] New constructors for TypeMapping
alaindesilets Jan 15, 2020
d95d379
Mapping.merge can now handle situations where some of the mappings are
alaindesilets Jan 15, 2020
0ffc59d
TypeMappings can now have a defaultUimaMapping
alaindesilets Jan 15, 2020
0b77643
Upon encountering an unknown Brat label, emit a "generic" BratAnnot
alaindesilets Jan 15, 2020
f011657
Partially undoing commit for unknown brat labels
alaindesilets Jan 16, 2020
03d7a7b
Can deal with unknown Brat annots as long as they don't have attributes
alaindesilets Jan 16, 2020
67d024d
BratWriter.PARAM_ENABLE_MAPPINGS now defaults to true
alaindesilets Jan 20, 2020
f9e78bf
BratReader now checks for conflicting mappings
alaindesilets Jan 20, 2020
4d4093c
BratWriter now checks for conflicting mappings
alaindesilets Jan 21, 2020
a682a00
[REF] Centralizing default Brat mappings in separate class
alaindesilets Jan 21, 2020
7b1c956
BratWriter now uses combination of custom and default mappings
alaindesilets Jan 21, 2020
1182c8f
[REF] Deleting un-necessary code
alaindesilets Jan 21, 2020
80b1c44
Add a setUp() to initialize workspace in a io-bincas test
alaindesilets Jan 21, 2020
a9bb51a
[REF] Deleting more un-needed code
alaindesilets Jan 21, 2020
986d627
Fixed bad expectations in two tests
alaindesilets Jan 21, 2020
ace3808
Removed unused dependencies in io-brat
alaindesilets Jan 22, 2020
52aeb58
BratWriter.PARAM_TYPE_MAPPINGS now defaults to {}
alaindesilets Jan 22, 2020
a264132
Improved comment
alaindesilets Jan 22, 2020
4bf1013
Change name of a test to make its intent clearer
alaindesilets Feb 3, 2020
2dbd59b
Checkstyle fixes.
reckart Feb 28, 2020
2ad6edf
Merge branch '1.12.x' into Improvement/Make_BratReader_more_forgiving…
reckart Feb 28, 2020
0c99472
No issue. Upgrade checkstyle. Simplify type generation.
reckart Feb 28, 2020
c0109ab
Merge branch '1.12.x' into Improvement/Make_BratReader_more_forgiving…
reckart Oct 17, 2020
@@ -96,7 +96,7 @@ public abstract class JCasFileWriter_ImplBase
*/
public static final String PARAM_STRIP_EXTENSION = "stripExtension";
@ConfigurationParameter(name = PARAM_STRIP_EXTENSION, mandatory = true, defaultValue = "false")
-private boolean stripExtension;
+protected boolean stripExtension;

/**
* Use the document ID as file name even if a relative path information is present.
@@ -175,11 +175,12 @@ else if (singularTarget) {
return getOutputStream((String) null, aExtension);
}
else {
-return getOutputStream(getRelativePath(aJCas), aExtension);
+String relPath = getRelativePath(aJCas);
+return getOutputStream(relPath, aExtension);
}
}

-protected String getTargetLocation()
+public String getTargetLocation()
{
return targetLocation;
}
@@ -114,7 +114,7 @@ public abstract class ResourceCollectionReaderBase
*/
public static final String PARAM_PATTERNS = ComponentParameters.PARAM_PATTERNS;
@ConfigurationParameter(name = PARAM_PATTERNS, mandatory = false)
-private String[] patterns;
+protected String[] patterns;

/**
* Use the default excludes.
@@ -171,7 +171,7 @@ public void initialize(UimaContext aContext)
throw new IllegalArgumentException(
"Either a source location, pattern, or both must be specified.");
}

// if an ExternalResourceLocator providing a custom ResourcePatternResolver
// has been specified, use it, by default use PathMatchingResourcePatternresolver

@@ -358,7 +358,7 @@ protected String getSourceLocation()
return sourceLocation;
}

-protected boolean isSingleLocation()
+public boolean isSingleLocation()
{
return patterns == null;
}
@@ -0,0 +1,75 @@
/*
* Copyright 2020
* National Research Council of Canada
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.dkpro.core.api.resources;

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

import org.apache.commons.io.FileUtils;

public class FileCopy {

public static void copyFolder(File srcFolder, File destFolder) throws NoSuchFileException {
copyFolder(srcFolder.toPath(), destFolder.toPath());
}


public static void copyFolder(Path srcFolder, Path destFolder) throws NoSuchFileException {
final Path srcFolderAbs = srcFolder.toAbsolutePath();
Path destFolderAbs = destFolder.toAbsolutePath();
if (!srcFolder.toFile().exists()) {
throw new NoSuchFileException(srcFolder.toString());
}

try {
Files.walk(srcFolderAbs).forEach(s -> {
try {
Path d = destFolder.resolve(srcFolderAbs.relativize(s));
if (Files.isDirectory(s)) {
if (!Files.exists(d)) {
Files.createDirectory(d);
}
return;
}
Files.copy(s, d);
}
catch (Exception e) {
e.printStackTrace();
}
});
}
catch (Exception ex) {
ex.printStackTrace();
}
}

public static void copyFileToFolder(Path srcFile, Path destFolder) throws IOException
{
if (Files.isDirectory(srcFile)) {
throw new IllegalArgumentException("Source file path " + srcFile + " is a directory");
}
if (!Files.isDirectory(destFolder)) {
throw new IllegalArgumentException(
"Destination directory path " + destFolder + " is not a directory");
}

FileUtils.copyFileToDirectory(srcFile.toFile(), destFolder.toFile());
}
}
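A note on `copyFolder` above: it declares `NoSuchFileException`, but any failure inside the walk is caught and printed, so a caller cannot tell whether the copy actually succeeded. The following is only a sketch of one possible alternative that propagates `IOException` instead. The class name `FolderCopySketch` is hypothetical and not part of this PR, and like the original it does not overwrite existing files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration, not code from this PR.
public class FolderCopySketch {

    // Copies srcFolder recursively into destFolder and reports failures to the caller.
    public static void copyFolder(Path srcFolder, Path destFolder) throws IOException {
        Path srcAbs = srcFolder.toAbsolutePath();
        Path destAbs = destFolder.toAbsolutePath();
        if (!Files.exists(srcAbs)) {
            throw new NoSuchFileException(srcAbs.toString());
        }

        // Collect the paths first so the directory stream is closed before copying.
        // Files.walk lists a directory before its children.
        List<Path> sources;
        try (Stream<Path> walk = Files.walk(srcAbs)) {
            sources = walk.collect(Collectors.toList());
        }

        for (Path source : sources) {
            Path target = destAbs.resolve(srcAbs.relativize(source));
            if (Files.isDirectory(source)) {
                // No-op if the directory already exists.
                Files.createDirectories(target);
            }
            else {
                // Throws FileAlreadyExistsException if the target is already there.
                Files.copy(source, target);
            }
        }
    }
}
```

With this shape, a test harness that prepares a workspace would fail at the copy step rather than later, when an expected input file turns out to be missing.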
@@ -0,0 +1,197 @@
/*
* Copyright 2019
* National Research Council of Canada
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.dkpro.core.api.resources;

import java.io.File;
import java.io.IOException;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

import org.apache.commons.io.FilenameUtils;

public class FileGlob {

public static class FileDeleter implements Consumer<File> {

@Override
public void accept(File file) {
file.delete();
}

}

public static class FileGlobVisitor extends SimpleFileVisitor<Path> {
private PathMatcher matcher = null;
private List<File> visitedFiles = null;
private Consumer<File> action = null;

public FileGlobVisitor(String pattern) {
initFileGlobVisitor(pattern, null);
}

public FileGlobVisitor(String pattern, Consumer<File> _action) {
initFileGlobVisitor(pattern, _action);
}

private void initFileGlobVisitor(String pattern, Consumer<File> _action) {
visitedFiles = new ArrayList<File>();
FileSystem fs = FileSystems.getDefault();
//Have to escape windows file separators since \\ is a glob escape character
matcher = fs.getPathMatcher("glob:" + pattern.replace("\\", "\\\\"));
action = _action;
}

@Override
public FileVisitResult visitFile(Path file, BasicFileAttributes attribs) {
Path fPath = file.toAbsolutePath();
if (matcher.matches(fPath)) {
visitedFiles.add(new File(file.toString()));
if (action != null) {
action.accept(file.toFile());
}
}
return FileVisitResult.CONTINUE;
}

@Override
public FileVisitResult visitFileFailed(Path file, IOException io)
{
return FileVisitResult.SKIP_SUBTREE;
}

public File[] getFiles() {
File[] files = (File[]) visitedFiles.toArray(new File[visitedFiles.size()]);
return files;
}

}

public static File[] listFiles(String pattern) {
pattern = new File(pattern).getAbsolutePath();
Path startDir = Paths.get(getStartingDir(pattern));

File[] files = new File[0];
FileGlobVisitor matcherVisitor = new FileGlobVisitor(pattern);
try {
Files.walkFileTree(startDir, matcherVisitor);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}

files = matcherVisitor.getFiles();

return files;
}


public static File[] listFiles(File rootDir, String[] patterns) {
if (!rootDir.isDirectory()) {
throw new IllegalArgumentException("Root path was not a directory (was " + rootDir.toString() + ")");
}

for (int ii = 0; ii < patterns.length; ii++) {
patterns[ii] = FilenameUtils.concat(rootDir.toString(), patterns[ii]);
}
File[] matchingFiles = listFiles(patterns);

return matchingFiles;
}


public static File[] listFiles(String[] patterns) {
Set<File> matchingFilesLst = new HashSet<File>();
for (String aPattern: patterns) {
File[] filesThisPattern = listFiles(aPattern);
for (File aFile: filesThisPattern) {
matchingFilesLst.add(aFile);
}
}

File[] matchingFilesArr = matchingFilesLst.toArray(new File[matchingFilesLst.size()]);
return matchingFilesArr;
}

public static void deleteFiles(String pattern) {
pattern = new File(pattern).getAbsolutePath();

Path startDir = Paths.get(getStartingDir(pattern));

File[] files = new File[0];
FileGlobVisitor matcherVisitor = new FileGlobVisitor(pattern, new FileDeleter());
try {
Files.walkFileTree(startDir, matcherVisitor);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}

files = matcherVisitor.getFiles();
}

public static void main(String[] args) {
String pattern = args[0];
System.out.println("Files matching: " + pattern);
File[] files = FileGlob.listFiles(pattern);
if (files.length == 0) {
System.out.println("No match found");
}
for (File aFile: files) {
System.out.println(aFile.getAbsolutePath());
}
}

protected static String getStartingDir(String pattern) {
String startingDir = truncatePatternToFirstWildcard(pattern);
if (!endsWithFileSeparator(startingDir)) {
File parentDir = Paths.get(startingDir).toFile().getParentFile();
if (parentDir != null) {
startingDir = parentDir.toString();
}
}

return startingDir;
}

private static boolean endsWithFileSeparator(String path) {
//Non-Windows OS
if (!System.getProperty("os.name").toLowerCase().startsWith("win")) {
return path.endsWith(File.separator);
} else {
return path.endsWith(File.separator) | path.endsWith("/");
}
}

private static String truncatePatternToFirstWildcard(String pattern) {
pattern = pattern.replaceFirst("[\\*\\?].*$", "");

return pattern;
}
}
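For readers following `FileGlob`: the approach is to cut the pattern at its first wildcard, walk the tree from the directory that remains, and test each absolute path against a `glob:` `PathMatcher`. Below is a compact sketch of that same idea, assuming a stream-based walk instead of the `SimpleFileVisitor` used above. The class name `GlobSketch` and its method names are hypothetical, and unlike `FileGlob.listFiles` it returns a sorted `List<Path>` and lets `IOException` reach the caller.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration, not code from this PR.
public class GlobSketch {

    // Returns the directory part of the pattern that precedes the first '*' or '?'.
    static Path startingDir(String pattern) {
        String prefix = pattern.replaceFirst("[*?].*$", "");
        Path prefixPath = Paths.get(prefix);
        if (prefix.endsWith("/") || prefix.endsWith(File.separator)) {
            return prefixPath;
        }
        // The prefix ends in a partial file name, so start from its parent.
        Path parent = prefixPath.getParent();
        return parent != null ? parent : prefixPath;
    }

    // Lists regular files whose absolute path matches the glob pattern.
    public static List<Path> listFiles(String pattern) throws IOException {
        String absPattern = new File(pattern).getAbsolutePath();
        // Backslash is the glob escape character, so Windows separators are doubled.
        PathMatcher matcher = FileSystems.getDefault()
                .getPathMatcher("glob:" + absPattern.replace("\\", "\\\\"));
        try (Stream<Path> walk = Files.walk(startingDir(absPattern))) {
            return walk.filter(Files::isRegularFile)
                    .filter(matcher::matches)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

In glob syntax `*` stops at directory boundaries while `**` crosses them, so `dir/*.txt` only sees files directly inside `dir`, whereas `dir/**.txt` also reaches files in subdirectories.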
@@ -93,8 +93,9 @@ public class BinaryCasWriterReaderTest
private File testFolder;

@Before
-public void setup()
+public void setup() throws IOException
{
+DkproTestContext.get().initializeTestWorkspace();
testFolder = testContext.getTestOutputFolder();
}

@@ -363,11 +364,11 @@ BinaryCasWriter.PARAM_TARGET_LOCATION, new File(testFolder, "out.bin"),
assertEquals("out.bin", getFeature(dmd, "documentTitle", String.class));
assertEquals("out.bin", getFeature(dmd, "documentId", String.class));
assertTrue(separatorsToUnix(getFeature(dmd, "documentUri", String.class))
-.endsWith("/target/test-output/BinaryCasWriterReaderTest-testReadingFileWithoutDocumentMetaData/out.bin"));
+.endsWith("test-workspaces/BinaryCasWriterReaderTest-testReadingFileWithoutDocumentMetaData/output/out.bin"));
assertTrue(separatorsToUnix(getFeature(dmd, "collectionId", String.class))
-.endsWith("/target/test-output/BinaryCasWriterReaderTest-testReadingFileWithoutDocumentMetaData/"));
+.endsWith("test-workspaces/BinaryCasWriterReaderTest-testReadingFileWithoutDocumentMetaData/output/"));
assertTrue(separatorsToUnix(getFeature(dmd, "documentBaseUri", String.class))
-.endsWith("/target/test-output/BinaryCasWriterReaderTest-testReadingFileWithoutDocumentMetaData/"));
+.endsWith("test-workspaces/BinaryCasWriterReaderTest-testReadingFileWithoutDocumentMetaData/output/"));
assertEquals(false, getFeature(dmd, "isLastSegment", Boolean.class));
}

@@ -401,11 +402,11 @@ BinaryCasWriter.PARAM_TARGET_LOCATION, new File(testFolder, "out.bin"),
assertEquals("out.bin", getFeature(dmd, "documentTitle", String.class));
assertEquals("out.bin", getFeature(dmd, "documentId", String.class));
assertTrue(separatorsToUnix(getFeature(dmd, "documentUri", String.class))
-.endsWith("/target/test-output/BinaryCasWriterReaderTest-testReadingFileOverridingDocumentMetaData/out.bin"));
+.endsWith("test-workspaces/BinaryCasWriterReaderTest-testReadingFileOverridingDocumentMetaData/output/out.bin"));
assertTrue(separatorsToUnix(getFeature(dmd, "collectionId", String.class))
-.endsWith("/target/test-output/BinaryCasWriterReaderTest-testReadingFileOverridingDocumentMetaData/"));
+.endsWith("test-workspaces/BinaryCasWriterReaderTest-testReadingFileOverridingDocumentMetaData/output/"));
assertTrue(separatorsToUnix(getFeature(dmd, "documentBaseUri", String.class))
-.endsWith("/target/test-output/BinaryCasWriterReaderTest-testReadingFileOverridingDocumentMetaData/"));
+.endsWith("test-workspaces/BinaryCasWriterReaderTest-testReadingFileOverridingDocumentMetaData/output/"));
assertEquals(false, getFeature(dmd, "isLastSegment", Boolean.class));
}

5 changes: 5 additions & 0 deletions dkpro-core-io-bnc-asl/pom.xml
@@ -40,6 +40,11 @@
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
</dependency>
+<dependency>
+<groupId>org.assertj</groupId>
+<artifactId>assertj-core</artifactId>
+<scope>test</scope>
+</dependency>
<dependency>
<groupId>org.dkpro.core</groupId>
<artifactId>dkpro-core-api-io-asl</artifactId>