Pipeline should handle manifest files with OSX/Windows line endings #119

Open
mhidas opened this issue Sep 6, 2018 · 4 comments

mhidas commented Sep 6, 2018

I've just tried uploading a .map_manifest file into the new AODN_moorings_nocheck pipeline (see https://github.com/aodn/chef-private/pull/2984) on 4-nec-hob. The manifest file looked like this:

/mnt/ebs/tmp/test_data/QLD/IMOS_ANMN-QLD_CTPSOKUE_20091120T045000Z_GBRMYR_FV01_GBRMYR-0911-WQM-189_END-20100601T040000Z_C-20120201T063245Z.nc,IMOS/ANMN/QLD/GBRMYR/Biogeochem_timeseries/IMOS_ANMN-QLD_CTPSOKUE_20091120T045000Z_GBRMYR_FV01_GBRMYR-0911-WQM-189_END-20100601T040000Z_C-20120201T063245Z.nc
/mnt/ebs/tmp/test_data/QLD/IMOS_ANMN-QLD_CTPSOKUE_20091120T045000Z_GBRMYR_FV01_GBRMYR-0911-WQM-19_END-20100601T040000Z_C-20120201T063238Z.nc,IMOS/ANMN/QLD/GBRMYR/Biogeochem_timeseries/IMOS_ANMN-QLD_CTPSOKUE_20091120T045000Z_GBRMYR_FV01_GBRMYR-0911-WQM-19_END-20100601T040000Z_C-20120201T063238Z.nc
/mnt/ebs/tmp/test_data/QLD/IMOS_ANMN-QLD_CTPSOKUE_20101103T090500Z_GBRMYR_FV01_GBRMYR-1010-WQM-188_END-20110414T212900Z_C-20120129T141746Z.nc,IMOS/ANMN/QLD/GBRMYR/Biogeochem_timeseries/IMOS_ANMN-QLD_CTPSOKUE_20101103T090500Z_GBRMYR_FV01_GBRMYR-1010-WQM-188_END-20110414T212900Z_C-20120129T141746Z.nc

The pipeline appeared to process the file, ran a harvester, and eventually reported SUCCESS in the log (see 4-nec-hob:/mnt/ebs/log/pipeline/process/tasks.AODN_moorings_nocheck.log, task id aa617f77-4c7a-46a9-828f-964672c52a8e), but in fact it failed completely:

  • Only one harvester (moorings_metadata) was selected to run, though several others have regexes matching the files in the collection;
  • The harvester ran without errors, but actually harvested nothing (it did report a warning for every file like "FILE_INDEX_UPDATER - WARNING: IMOS/ANMN/QLD/GBRMYR/Biogeochem_timeseries/IMOS_ANMN-QLD_CTPSOKUE_20091120T045000Z_GBRMYR_FV01_GBRMYR-0911-WQM-189_END-20100601T040000Z_C-20120201T063245Z.nc not found on index");
  • It did something on S3, but the files are not correctly uploaded. I can see them e.g. in here, but clicking on any of the files results in the error message "The specified key does not exist."

Trying to list the uploaded files on S3 gives weird results:

4-nec-hob:/mnt/imos-test-data$ ll IMOS/ANMN/QLD/GBRMYR/Biogeochem_timeseries/
ls: cannot access 'IMOS/ANMN/QLD/GBRMYR/Biogeochem_timeseries/IMOS_ANMN-QLD_CTPSOKUE_20091120T045000Z_GBRMYR_FV01_GBRMYR-0911-WQM-189_END-20100601T040000Z_C-20120201T063245Z.nc'$'\n': No such file or directory
ls: cannot access 'IMOS/ANMN/QLD/GBRMYR/Biogeochem_timeseries/IMOS_ANMN-QLD_CTPSOKUE_20091120T045000Z_GBRMYR_FV01_GBRMYR-0911-WQM-19_END-20100601T040000Z_C-20120201T063238Z.nc'$'\n': No such file or directory
ls: cannot access 'IMOS/ANMN/QLD/GBRMYR/Biogeochem_timeseries/IMOS_ANMN-QLD_CTPSOKUE_20101103T090500Z_GBRMYR_FV01_GBRMYR-1010-WQM-188_END-20110414T212900Z_C-20120129T141746Z.nc'$'\n': No such file or directory
ls: cannot access 'IMOS/ANMN/QLD/GBRMYR/Biogeochem_timeseries/IMOS_ANMN-QLD_CTPSOKUE_20111017T062000Z_GBRMYR_FV01_GBRMYR-1110-WQM-187_END-20120412T221800Z_C-20121112T033903Z.nc'$'\n': No such file or directory
ls: cannot access 'IMOS/ANMN/QLD/GBRMYR/Biogeochem_timeseries/IMOS_ANMN-QLD_CTPSOKUE_20111017T062000Z_GBRMYR_FV01_GBRMYR-1110-WQM-19_END-20120412T221800Z_C-20121112T033855Z.nc'$'\n': No such file or directory
total 2
drwxrwxrwx 1 root root 0 Jan  1  1970 ./
drwxrwxrwx 1 root root 0 Jan  1  1970 ../
?????????? ? ?    ?    ?            ? IMOS_ANMN-QLD_CTPSOKUE_20091120T045000Z_GBRMYR_FV01_GBRMYR-0911-WQM-189_END-20100601T040000Z_C-20120201T063245Z.nc?
?????????? ? ?    ?    ?            ? IMOS_ANMN-QLD_CTPSOKUE_20091120T045000Z_GBRMYR_FV01_GBRMYR-0911-WQM-19_END-20100601T040000Z_C-20120201T063238Z.nc?
?????????? ? ?    ?    ?            ? IMOS_ANMN-QLD_CTPSOKUE_20101103T090500Z_GBRMYR_FV01_GBRMYR-1010-WQM-188_END-20110414T212900Z_C-20120129T141746Z.nc?
?????????? ? ?    ?    ?            ? IMOS_ANMN-QLD_CTPSOKUE_20111017T062000Z_GBRMYR_FV01_GBRMYR-1110-WQM-187_END-20120412T221800Z_C-20121112T033903Z.nc?
?????????? ? ?    ?    ?            ? IMOS_ANMN-QLD_CTPSOKUE_20111017T062000Z_GBRMYR_FV01_GBRMYR-1110-WQM-19_END-20120412T221800Z_C-20121112T033855Z.nc?
drwxrwxrwx 1 root root 0 Jan  1  1970 non-QC/

Looks like maybe some characters got added on to the end of each file name somewhere along the way?

@mhidas mhidas added the bug label Sep 6, 2018

mhidas commented Sep 6, 2018

Hmmm... I've just tested a similar manifest file in the ASYNC_UPLOAD pipeline, and got the same result (minus the harvesting, which that pipeline explicitly does not do).

ghost commented Sep 6, 2018

The keys actually do have a newline character on the end, which is valid in S3 but confuses the S3 client applications like s3fuse and our bucket browser script:

01:34:15 ~$ aws s3api list-objects-v2 --no-sign --bucket imos-test-data --prefix IMOS/ANMN/QLD/GBRMYR/Biogeochem_timeseries/
{
    "Contents": [
        {
            "LastModified": "2018-09-06T02:01:49.000Z", 
            "ETag": "\"c18c09d3f737a88bfbcfbdea9d52fa4d-12\"", 
            "StorageClass": "STANDARD", 
            "Key": "IMOS/ANMN/QLD/GBRMYR/Biogeochem_timeseries/IMOS_ANMN-QLD_CTPSOKUE_20091120T045000Z_GBRMYR_FV01_GBRMYR-0911-WQM-189_END-20100601T040000Z_C-20120201T063245Z.nc\n", 
            "Size": 100566012
        }, 
        {
            "LastModified": "2018-09-06T02:01:52.000Z", 
            "ETag": "\"27fb05f929e127e0754b27b0d2bcbfb1-12\"", 
            "StorageClass": "STANDARD", 
            "Key": "IMOS/ANMN/QLD/GBRMYR/Biogeochem_timeseries/IMOS_ANMN-QLD_CTPSOKUE_20091120T045000Z_GBRMYR_FV01_GBRMYR-0911-WQM-19_END-20100601T040000Z_C-20120201T063238Z.nc\n", 
            "Size": 93945740
        }, 
        {
            "LastModified": "2018-09-06T02:01:54.000Z", 
            "ETag": "\"825f276178715c4ec8e61c9714197560-11\"", 
            "StorageClass": "STANDARD", 
            "Key": "IMOS/ANMN/QLD/GBRMYR/Biogeochem_timeseries/IMOS_ANMN-QLD_CTPSOKUE_20101103T090500Z_GBRMYR_FV01_GBRMYR-1010-WQM-188_END-20110414T212900Z_C-20120129T141746Z.nc\n", 
            "Size": 84451780
        }, 
        {
            "LastModified": "2018-09-06T02:01:56.000Z", 
            "ETag": "\"6d0a812979510f78b64ba1383a460281-13\"", 
            "StorageClass": "STANDARD", 
            "Key": "IMOS/ANMN/QLD/GBRMYR/Biogeochem_timeseries/IMOS_ANMN-QLD_CTPSOKUE_20111017T062000Z_GBRMYR_FV01_GBRMYR-1110-WQM-187_END-20120412T221800Z_C-20121112T033903Z.nc\n", 
            "Size": 108330448
        }, 
        {
            "LastModified": "2018-09-06T02:01:58.000Z", 
            "ETag": "\"9d6fe256ca93bf2a3c634b6f253e726e-9\"", 
            "StorageClass": "STANDARD", 
            "Key": "IMOS/ANMN/QLD/GBRMYR/Biogeochem_timeseries/IMOS_ANMN-QLD_CTPSOKUE_20111017T062000Z_GBRMYR_FV01_GBRMYR-1110-WQM-19_END-20120412T221800Z_C-20121112T033855Z.nc\n", 
            "Size": 75105704
        }, 
        {
            "LastModified": "2018-09-06T02:02:01.000Z", 
            "ETag": "\"56e74a75ab329970f388200a721e75a6-13\"", 
            "StorageClass": "STANDARD", 
            "Key": "IMOS/ANMN/QLD/GBRMYR/Biogeochem_timeseries/non-QC/IMOS_ANMN-QLD_RCTPSOKUE_20111017T062000Z_GBRMYR_FV00_GBRMYR-1110-WQM-187_END-20120412T221800Z_C-20121112T033903Z.nc\n", 
            "Size": 108329076
        }, 
        {
            "LastModified": "2018-09-06T02:02:03.000Z", 
            "ETag": "\"398c2d0d047c16dd315c1e70f6448717-9\"", 
            "StorageClass": "STANDARD", 
            "Key": "IMOS/ANMN/QLD/GBRMYR/Biogeochem_timeseries/non-QC/IMOS_ANMN-QLD_RCTPSOKUE_20111017T062000Z_GBRMYR_FV00_GBRMYR-1110-WQM-19_END-20120412T221800Z_C-20121112T033855Z.nc\n", 
            "Size": 75104268
        }
    ]
}
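
For anyone wanting to check a bucket for similarly affected keys, here is a minimal sketch that scans the JSON printed by `aws s3api list-objects-v2` for keys ending in whitespace. The function name and the sample listing are made up for illustration; only the `Contents` and `Key` fields come from the output above.

```python
import json


def keys_with_trailing_whitespace(listing_json):
    """Return keys from list-objects-v2 JSON output that end in whitespace."""
    listing = json.loads(listing_json)
    return [
        obj["Key"]
        for obj in listing.get("Contents", [])
        # rstrip() removes trailing spaces, tabs, \r and \n; any difference
        # means the key carries a stray trailing character.
        if obj["Key"] != obj["Key"].rstrip()
    ]


# Made-up listing for illustration; real output has more fields per object.
sample = json.dumps({
    "Contents": [
        {"Key": "IMOS/example/good_file.nc", "Size": 1},
        {"Key": "IMOS/example/bad_file.nc\n", "Size": 2},
    ]
})

for key in keys_with_trailing_whitespace(sample):
    # repr() makes the otherwise invisible trailing character visible
    print(repr(key))
```

Printing with `repr()` is deliberate, since a plain `print` would hide the trailing newline in the same way `ls` does.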

Do you have the exact input file still? I'm sure this could be picked up in unit tests...

mhidas commented Sep 6, 2018

Yes, here it is (just added the ".txt" so GitHub accepts it)
GBRMYR.2012.map_manifest.txt

Could the problem be with the input file? It seems to have just standard CRLF line terminators.
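
One quick way to confirm which line terminators a manifest contains is to count them in the raw bytes. This is only a sketch: the helper name is hypothetical and the sample bytes stand in for the real file, which would be read with `open(path, "rb")`.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare CR and bare LF terminators in raw file content."""
    crlf = data.count(b"\r\n")
    # Every CRLF pair also contains one \r and one \n, so subtract the
    # pairs to get the bare occurrences of each.
    cr = data.count(b"\r") - crlf
    lf = data.count(b"\n") - crlf
    return {"CRLF": crlf, "CR": cr, "LF": lf}


# Stand-in for the manifest bytes (paths shortened for illustration)
sample = b"/tmp/a.nc,IMOS/a.nc\r\n/tmp/b.nc,IMOS/b.nc\r\n"
print(count_line_endings(sample))
```

A file written on Windows should show only CRLF counts, while a Unix-style file should show only LF.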

@mhidas mhidas changed the title Harvest doesn't work when using .map_manifest input file Pipeline fails but pretends to succeed with .map_manifest input file Sep 6, 2018

ghost commented Sep 6, 2018

Yes that's the problem... CRLF is DOS line endings, which I'd say is what tripped it up. Might need to look at handling that scenario when parsing the text input files, because they could come from OSX or Windows via pasting etc., so it should handle line endings from all 3 platforms "just in case".
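
As a rough sketch of what tolerant parsing could look like (this is not the pipeline's actual parser, and `parse_map_manifest` is a hypothetical name), the text can be split with `str.splitlines()` and each line stripped before use. The `local_path,destination_path` layout is taken from the manifest shown earlier in this issue.

```python
def parse_map_manifest(text):
    """Parse 'local_path,destination_path' lines, tolerating LF, CRLF and CR."""
    entries = []
    # splitlines() treats \n, \r\n and \r (among other boundaries) as line
    # breaks and does not include them in the returned lines.
    for line in text.splitlines():
        line = line.strip()  # drop any leftover surrounding whitespace
        if not line:
            continue  # skip blank lines, e.g. a trailing empty line
        # Split on the first comma; dest_path is "" if no comma is present
        local_path, _, dest_path = line.partition(",")
        entries.append((local_path, dest_path))
    return entries


# The same two made-up entries written with each platform's line ending
lines = ["/tmp/a.nc,IMOS/a.nc", "/tmp/b.nc,IMOS/b.nc"]
for ending in ("\n", "\r\n", "\r"):
    text = ending.join(lines) + ending
    print(repr(ending), parse_map_manifest(text))
```

If the manifest is read from disk in Python 3 text mode, the default `newline=None` setting of `open()` already translates `\r\n` and `\r` to `\n`, so a stray terminator is more likely when the content is read as bytes or split manually on `"\n"`.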

@mhidas mhidas changed the title Pipeline fails but pretends to succeed with .map_manifest input file Pipeline should handle manifest files with OSX/Windows line endings Oct 30, 2018
@mhidas mhidas added enhancement and removed bug labels Jan 13, 2021