refactor(etl): update CORDIS producer to cover FP1 to FP7 on top of H2020 (#164)

* refactor(etl): update CORDIS producer to cover FP1 to FP7 on top of H2020

* update docs

* Correction

* Ensure value is returned
kalinchernev authored and yhuard committed Oct 5, 2018
1 parent d7acd04 commit 5567376
Showing 6 changed files with 226 additions and 32 deletions.
71 changes: 42 additions & 29 deletions docs/types/etls/cordis-csv.md
@@ -10,7 +10,7 @@ Transform function: [implementation details][2]

**Parameters**

- `record` **[Object][3]** Piece of data to transform before going to harmonized storage.
- `record` **[Object][3]** Piece of data to transform before going to harmonized storage.

Returns **Project** JSON matching the type fields.

@@ -19,52 +19,66 @@ Returns **Project** JSON matching the type fields.
Preprocess `funding_area`
Input fields taken from the `record` are:

- `fundingScheme`
- `fundingScheme`

**Parameters**

- `record` **[Object][3]** The row received from parsed file
- `record` **[Object][3]** The row received from parsed file

Returns **[Array][4]**
Returns **[Array][4]**

### getBudget

Preprocess budget

**Parameters**

- `record` **[Object][3]** The row received from parsed file
- `record` **[Object][3]** The row received from parsed file

Returns **Budget**
Returns **Budget**

### getDescription

Preprocess description
Concatenation of several fields as requested in [https://webgate.ec.europa.eu/CITnet/jira/browse/EUBFR-200?focusedCommentId=2808845&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-2808845][5]
Input fields taken from the `record` are:

- `acronym`
- `objective`
- `rcn`
- `topic`
- `acronym`
- `objective`
- `rcn`
- `topic`

**Parameters**

- `record` **[Object][3]** The row received from parsed file
- `record` **[Object][3]** The row received from parsed file

Returns **[String][6]**
Returns **[String][6]**

### getProjectId

Preprocess `project_id`
Looks for a value in the following order of precedence:

- `id`
- `reference`

**Parameters**

- `record` **[Object][3]** The row received from parsed file

Returns **[String][6]**
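
**Examples**

The precedence rule can be illustrated with a minimal sketch. The one-line function restates the rule as it appears in `transform.js`; the `newerRecord` literal and its values are hypothetical, while `EN3M0034` is the `reference` value from the FP1 stub record.

```javascript
// Restatement of the precedence rule: `id` first, then `reference`, else ''.
const getProjectId = record => record.id || record.reference || '';

// Hypothetical newer-style record that happens to carry both fields.
const newerRecord = { id: '123456', reference: 'IGNORED' };
// Older-style record: only `reference` is present.
const olderRecord = { reference: 'EN3M0034' };
// Record with neither field.
const emptyRecord = {};

console.log(getProjectId(newerRecord)); // '123456'
console.log(getProjectId(olderRecord)); // 'EN3M0034'
console.log(getProjectId(emptyRecord)); // ''
```

Because the rule relies on `||`, an empty-string `id` also falls through to `reference`.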

### getLocations

Preprocess project_locations
Input fields taken from the `record` are:

- `participants`
- `participantCountries`
- `participants`
- `participantCountries`

**Parameters**

- `record` **[Object][3]** The row received from parsed file
- `record` **[Object][3]** The row received from parsed file

Returns **[Array][4]** List of {Location} objects for `project_locations` field

@@ -73,14 +87,14 @@ Returns **[Array][4]** List of {Location} objects for `project_locations` field
Preprocess third parties
Input fields taken from the `record` are:

- `coordinator`
- `coordinatorCountry`
- `participants`
- `participantCountries`
- `coordinator`
- `coordinatorCountry`
- `participants`
- `participantCountries`

**Parameters**

- `record` **[Object][3]** The row received from parsed file
- `record` **[Object][3]** The row received from parsed file

Returns **[Array][4]** List of {ThirdParty} objects

@@ -90,27 +104,26 @@ Format date

**Parameters**

- `date` **[Date][7]** Date in `YYYY-MM-DD` (ISO) format
- `date` **[Date][7]** Date in `YYYY-MM-DD` or `DD/MM/YYYY` formats.

**Examples**

```javascript
input => "2018-12-31"
output => "2018-12-31T00:00:00.000Z"
input => '2018-12-31';
output => '2018-12-31T00:00:00.000Z';
```

```javascript
input => '01/01/1986';
output => '1986-01-01T00:00:00.000Z';
```

Returns **[Date][7]** The date formatted into an ISO 8601 date format
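
The handling of the two input formats, including malformed input, can be sketched as follows. This is an illustrative re-implementation rather than the project's exact code: it converts the date parts with `Number` and checks validity with `getTime()` instead of `try`/`catch`, and it assumes that every unparsable input should yield `null`. Both the input and the returned value are strings.

```javascript
// Sketch only: accepts `YYYY-MM-DD` or `DD/MM/YYYY` strings.
const formatDate = date => {
  if (!date || typeof date !== 'string') return null;

  let parsed;
  if (date.includes('/')) {
    // `DD/MM/YYYY`: split the parts and rebuild the date in UTC.
    const parts = date.split('/');
    if (parts.length !== 3) return null;
    const [day, month, year] = parts.map(Number);
    if (!day || !month || !year) return null;
    // Months are zero-based in `Date.UTC`.
    parsed = new Date(Date.UTC(year, month - 1, day));
  } else {
    // `YYYY-MM-DD`: date-only ISO strings are interpreted as UTC.
    parsed = new Date(date);
  }

  // Assumption: an invalid date yields null.
  return Number.isNaN(parsed.getTime()) ? null : parsed.toISOString();
};

console.log(formatDate('2018-12-31')); // '2018-12-31T00:00:00.000Z'
console.log(formatDate('01/01/1986')); // '1986-01-01T00:00:00.000Z'
console.log(formatDate('31/03/1988')); // '1988-03-31T00:00:00.000Z'
console.log(formatDate('not a date')); // null
console.log(formatDate(undefined)); // null
```

Building the `DD/MM/YYYY` case with `Date.UTC` keeps the result at midnight UTC regardless of the machine's local time zone, which matches the `YYYY-MM-DD` case.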

[1]: https://github.com/ec-europa/eubfr-data-lake/blob/master/services/ingestion/etl/cordis/csv/test/stubs/record.json

[2]: https://github.com/ec-europa/eubfr-data-lake/blob/master/services/ingestion/etl/cordis/csv/src/lib/transform.js

[3]: https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Object

[4]: https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Array

[5]: https://webgate.ec.europa.eu/CITnet/jira/browse/EUBFR-200?focusedCommentId=2808845&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-2808845

[6]: https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String

[7]: https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Date
2 changes: 1 addition & 1 deletion services/ingestion/etl/cordis/csv/README.md
@@ -5,7 +5,7 @@ Model to compare with is available at: https://ec-europa.github.io/eubfr-data-la
| Field | Target |
| -------------------- | --------------------- |
| rcn | description |
| id | project_id |
| id, reference | project_id |
| acronym | description |
| status | status |
| programme | sub_programme_name |
33 changes: 31 additions & 2 deletions services/ingestion/etl/cordis/csv/src/lib/transform.js
@@ -87,6 +87,18 @@ const getDescription = record => {
.join('\n');
};

/**
* Preprocess `project_id`
* Looks for a value in the following order of precedence:
* - `id`
* - `reference`
*
* @memberof CordisCsvTransform
* @param {Object} record The row received from parsed file
* @returns {String}
*/
const getProjectId = record => record.id || record.reference || '';

const getCode = code => (code ? getCountryCode(code.trim().toUpperCase()) : '');

/**
@@ -198,16 +210,33 @@ const getThirdParties = record => {
* Format date
*
* @memberof CordisCsvTransform
* @param {Date} date Date in `YYYY-MM-DD` (ISO) format
* @param {Date} date Date in `YYYY-MM-DD` or `DD/MM/YYYY` formats.
* @returns {Date} The date formatted into an ISO 8601 date format
*
* @example
* input => "2018-12-31"
* output => "2018-12-31T00:00:00.000Z"
*
* @example
* input => "01/01/1986"
* output => "1986-01-01T00:00:00.000Z"
*/
const formatDate = date => {
if (!date || typeof date !== 'string') return null;

// Case `DD/MM/YYYY`:
if (date.includes('/')) {
const d = date.split(/\//);
if (d.length !== 3) return null;
const [day, month, year] = d;
if (!day || !month || !year) return null;
try {
return new Date(Date.UTC(year, month - 1, day)).toISOString();
} catch (e) {
return null;
}
}
// Case `YYYY-MM-DD`:
try {
return new Date(date).toISOString();
} catch (e) {
@@ -240,7 +269,7 @@ export default (record: Object): Project | null => {
ec_priorities: [],
media: [],
programme_name: record.frameworkProgramme || '',
project_id: record.id || '',
project_id: getProjectId(record),
project_locations: getLocations(record),
project_website: record.projectUrl || '',
complete: true,
25 changes: 25 additions & 0 deletions services/ingestion/etl/cordis/csv/test/stubs/recordReference.json
@@ -0,0 +1,25 @@
{
"rcn": "14088",
"reference": "EN3M0034",
"acronym": "",
"status": "",
"programme": "FP1-ENNONUC 3C",
"topics": "",
"frameworkProgramme": "FP1",
"title":
"Energy and environment - Optimal control strategies for reducing emissions from energy production and energy use",
"startDate": "01/01/1986",
"endDate": "31/03/1988",
"projectUrl": "",
"objective":
"As a result of the rapid increase in forest damages in Mid-Europe, the need for the reduction of air pollutions from energy conversion and energy-end-use technologies became an important political objective.",
"totalCost": "",
"ecMaxContribution": "",
"call": "",
"fundingScheme": "CSC",
"coordinator": "UNIVERSITAET KARLSRUHE (TECHNISCHE HOCHSCHULE)",
"coordinatorCountry": "DE",
"participants": "FORSCHUNGSZENTRUM JUELICH GMBH;UNIVERSITAET STUTTGART",
"participantCountries": "DE",
"subjects": ""
}
@@ -416,6 +416,116 @@ objective: The overarching objective of UNISECO is to strengthen the sustainabil
}
`;

exports[`DG CORDIS CSV transformer Can handle records which contain project_id in field called reference instead of an id 1`] = `
Object {
"action": "",
"budget": Object {
"eu_contrib": Object {
"currency": "",
"raw": "",
"value": 0,
},
"funding_area": Array [
"CSC",
],
"mmf_heading": "",
"other_contrib": Object {
"currency": "",
"raw": "",
"value": 0,
},
"private_fund": Object {
"currency": "",
"raw": "",
"value": 0,
},
"public_fund": Object {
"currency": "",
"raw": "",
"value": 0,
},
"total_cost": Object {
"currency": "",
"raw": "",
"value": 0,
},
},
"call_year": "",
"complete": true,
"description": "rcn: 14088
objective: As a result of the rapid increase in forest damages in Mid-Europe, the need for the reduction of air pollutions from energy conversion and energy-end-use technologies became an important political objective.",
"ec_priorities": Array [],
"media": Array [],
"programme_name": "FP1",
"project_id": "EN3M0034",
"project_locations": Array [
Object {
"address": "",
"centroid": null,
"country_code": "DE",
"location": null,
"nuts": Array [],
"postal_code": "",
"region": "",
"town": "",
},
],
"project_website": "",
"related_links": Array [],
"reporting_organisation": "RTD",
"results": Object {
"available": "",
"result": "",
},
"status": "",
"sub_programme_name": "FP1-ENNONUC 3C",
"success_story": "",
"themes": Array [],
"third_parties": Array [
Object {
"address": "",
"country": "DE",
"email": "",
"name": "UNIVERSITAET KARLSRUHE (TECHNISCHE HOCHSCHULE)",
"phone": "",
"region": "",
"role": "coordinator",
"type": "",
"website": "",
},
Object {
"address": "",
"country": "DE",
"email": "",
"name": "FORSCHUNGSZENTRUM JUELICH GMBH",
"phone": "",
"region": "",
"role": "participant",
"type": "",
"website": "",
},
Object {
"address": "",
"email": "",
"name": "UNIVERSITAET STUTTGART",
"phone": "",
"region": "",
"role": "participant",
"type": "",
"website": "",
},
],
"timeframe": Object {
"from": "1986-01-01T00:00:00.000Z",
"from_precision": "day",
"to": "1988-03-31T00:00:00.000Z",
"to_precision": "day",
},
"title": "Energy and environment - Optimal control strategies for reducing emissions from energy production and energy use",
"type": Array [],
}
`;

exports[`DG CORDIS CSV transformer Produces correct JSON output structure 1`] = `
Object {
"action": "",
17 changes: 17 additions & 0 deletions services/ingestion/etl/cordis/csv/test/unit/lib/transform.spec.js
@@ -5,14 +5,17 @@
import mapper from '../../../src/lib/transform';
import testRecord from '../../stubs/record.json';
import testRecord2 from '../../stubs/record2.json';
import testRecordReferenceId from '../../stubs/recordReference.json';

describe('DG CORDIS CSV transformer', () => {
let result = {};
let resultMultiple = {};
let resultReferenceId = {};

beforeAll(() => {
result = mapper(testRecord);
resultMultiple = mapper(testRecord2);
resultReferenceId = mapper(testRecordReferenceId);
});

test('Returns null when record is not provided', () => {
@@ -26,4 +29,18 @@ describe('DG CORDIS CSV transformer', () => {
test('Can handle multi-value inputs for participants and coordinators', () => {
expect(resultMultiple).toMatchSnapshot();
});

test('Can handle records which contain project_id in field called reference instead of an id', () => {
// Framework programmes before FP5 use `reference`, whereas newer ones use `id`.
expect(resultReferenceId).toMatchSnapshot();
});

test('Can handle 2 types of dates: `YYYY-MM-DD` and `DD/MM/YYYY`', () => {
// Newer FPs use `YYYY-MM-DD`:
expect(result.timeframe.from).toEqual('2018-11-01T00:00:00.000Z');
// Older FPs use `DD/MM/YYYY`:
expect(resultReferenceId.timeframe.from).toEqual(
'1986-01-01T00:00:00.000Z'
);
});
});
