- Common Data Formats and the Entity-Relational Model
The NIST Special Publication (SP) 1500 series common data formats were developed from a high level UML model (more here). One advantage of this approach is that many implementation formats can be derived from it. NIST provides JSON and XML schemas, but other formats are possible.
Although NoSQL has picked up some traction, the vast majority of systems of record (e.g. Election Management Systems) continue to use relational (SQL) databases. To interoperate between such systems, we must map not only the syntax of JSON or XML, but also the semantics, or meaning of the data. The difficulty of mapping can be eased somewhat if both the system of record and the CDFs use the same conceptual definitions.
This prototype explores the use of the developed UML models to derive relational schemas, such that the same conceptual definitions are shared.
It is recommended to start with the pregenerated SQL Data Definition Language (DDL) scripts. If these scripts do not serve your needs, you can edit the Entity-Relational (logical) or relational model directly using the freely available Oracle Data Modeler and generate new DDL.
The Relational Database Management System (RDBMS) format specific files are located in the CVR/relational and ENR/v2/relational directories of this repository, respectively.
DDL scripts have been provided for Oracle and SQL Server. You can also generate DB2 scripts from Oracle Data Modeler, but this has not been tested.
- Generates names using under_score notation
- Generates names using CamelCase notation
This section describes additional, optional tasks that can be performed to customize the DDL to meet your individual needs. You will need to use the ODM designs to generate new DDL.
While Entity-Relational (ER) models do not require the length of the data type to be specified, a practical reality of relational databases is that data type length needs to be set. All String types are mapped to the maximum character length allowed by the database. However, you will likely want to adjust the lengths to suit your needs.
The relational model has no concept of hierarchies (i.e. parent/child relationships). However, the concept can be simulated in a variety of ways, described below.
Examples of parent/child relationships in CDFs include
Contest
(parent) andCandidateContest
(child), orContestSelection
(parent) andBallotMeasureSelection
(child).
- Single Table, subtype hierarchies flattened into a single table, a type discriminant used to determine which attributes should be used.
- Table per child, table includes all attributes of child and parent in single table
- Table per entity, maps 1-1 with UML model
The DDL provided as part of this repository uses the table per entity approach. However, if the ER model is adjusted to use a different method, and forward engineered to the relational model, the DDL can be regenerated.
Repeating primitives can be mapped one of two ways:
- Each enumeration is given its own entity, with mapping tables to resolve many-to-many relationship.
- Each enumeration/attribute combination is given its own entity, with a many to one relationship.
The ER and relational models use the second approach. It is strongly recommended to accept this approach, as the first leads to a much more complex relational model. This setting cannot be changed in Oracle Data Modeler (ODM). Please contact Hilton Roscoe for customization options.
The relational model was mechanically derived from the UML class model, using previously developed Model Driven Architecture (MDA) tools. A script was constructed to translate the representations in UML to equivalent structures in ODM's Entity-Relational model. The ER model was forward-engineered to the relational model (either Oracle or SQL Server), and finally DDL generated for each database.
Warning: This section assumes a fundamental knowledge of The Entity-Relational and UML Class Models. Understanding of this material is NOT required to use the prototype.
This section uses Barker Entity-Relational notation.
The Unified Modeling Language (UML) and the Entity-Relational (ER) model are quite different. The following sections describe how the UML Classes and associations were mapped to equivalent structures in the ER model.
UML and ER uses language somewhat differently. Multiplicity in UML refers to how many instances of something may or must appear. In ER, two concepts are used, cardinality and optionality. Cardinality can be seen as representing the upper bound, i.e. *
in the UML multiplicity 1..*
. Cardinality allows for descriptions of one (1
) or many (*
). A UML upper bound of 5
, would become a cardinality of *
. To represent the lower bound, optionality can be set (optional
or mandatory
).
UML directed associations are mapped to ER relations. The association source end (e.g. the end without an arrow) plays no part in the mapping. The target end is mapped to a cardinality of 1
if its multiplicity is 0..1
or 1
, otherwise it is mapped to *
. The optionality for the source end is always optional
, as the entity may be instantiated without the relationship (unlike directed compositions). The optionality for the target is based on the lower cardinality being greater than zero.
End | Cardinality | Optional |
---|---|---|
Source | * | Yes |
Target | * if upper multiplicity > 0 else 1 |
Yes if lower multiplicity = 0 else No |
In this example, the Election
class has a directed association with ReportingUnit
. Additionally, a Role Name is set on the target end. In the translated ER model, ElectionScope
becomes a role on election (which will influence the generated foreign key name in the relational model). Note also that each ReportingUnit
may be scoping one or more Election
(the crow's foot).
Directed compositions behave like directed associations, except that their source end is never optional. This is because the composition is stating a part/whole relationship.
End | Cardinality | Optional |
---|---|---|
Source | 1 | No |
Target | * if upper multiplicity > 1 else 1 |
Yes if lower multiplicity = 0 else No |
In this example, Election
composes zero or more Contest
. In the translated ER model, the relationship is required on both ends (notated by the solid line on both sides), and the cardinality is indicated with the crow's foot.
UML attributes are mapped to ER attributes, except when the upper multiplicity is greater than 1 or the attribute represents a non-primitive type (i.e. a class). In that case they are mapped to their own entity. This is done to preserve First Normal Form (1NF).
UML primitives and enumerations are mapped to relational attributes, except when the UML attribute's upper multiplicity is greater than 1. In this case it is mapped to its own entity. This entity takes the name of the parent class. So if CastVoteRecordReport
contains an attribute of type ReportType
, the generated entity will be named CastVoteRecordReportType
.
[should be identifying? They aren't now]
End | Cardinality | Optional |
---|---|---|
Source | 1 | No |
Target | * if upper multiplicity > 1 else 1 |
Yes if lower multiplicity = 0 else No |
The AddressLine
has an UML multiplicity of 0..*
. In the translated ER model, a new entity is created with a name in the form of {ClassName}{AttributeName}
.
UML attributes whose type is a class are given their own entity. This entity takes the name of the parent class. So if GpUnit
contains an attribute of type Code
, the generated class will be named GpUnitCode
End | Cardinality | Optional |
---|---|---|
Source | * | Yes |
Target | * if upper multiplicity > 0 else 1 |
Yes if lower multiplicity = 0 else No |
The Entity-Relational model has no conception of roles. However, ODM supports passing role names through in order to influence the names of foreign keys columns in the relational model.
Role mapping fkRole
is handled as follows:
Upper Multiplicity | FK owner | FK Name |
---|---|---|
* | source | Class name |
1 | target | Property name |
Without the fkRole being set, the foreign key column will take the name of the reference table, e.g. GpUnit_ID (for Election_ReportingUnit_FK). In this case, the role name is ElectionScope
and the FK column becomes ElectionScope_ID
.
Data types in the UML model are mapped to equivalent types in the ER model. ER data types can be either Logical
data types or Domain
data types. Logical data types map directly to the physical type used by the database. Domain data types can be seen a subtypes of logical types, allowing them to be further constrained. For example, enumerations in the UML model are represented as domain types of the same name.
UML Type | Relational Type | Oracle Type | SQL Server Type |
---|---|---|---|
Integer | Integer | INTEGER | INTEGER |
Float | Numeric | NUMBER | NUMERIC |
Double | Numeric | NUMBER | NUMERIC |
Boolean | Boolean | NUMBER | BIT |
String | String | VARCHAR2 | VARCHAR |
DateTime | Timestamp | TIMESTAMP | DATETIME |
Byte | BLOB | BLOB | IMAGE |
Enumerations become identically named domains types of logical type String
. The enumeration values form the value list, which will become a CHECK
constraint in the generated DDL
.