
metrics.md


We classify data quality issues into seven quality metrics, defined as follows:

| Quality Metric | Description |
| --- | --- |
| COMPLETENESS | Whether the data is incomplete or missing entirely. For example, whether text is truncated or the content is empty. |
| EFFECTIVENESS | Whether the data is meaningful, suitable for a specific task, and conforms to the expected format or standard. For example, whether the text contains garbled characters. |
| FLUENCY | Whether the data is fluent, grammatically correct, and reads naturally. For example, whether sentences follow grammatical rules. |
| RELEVANCE | Whether the data contains content that is irrelevant to the task. For example, a text that describes medical knowledge but has unrelated advertising inserted into it. |
| SECURITY | Whether the data contains sensitive or private information, and whether it conforms to the culture and values of the countries involved (both the other party's values and our own). |
| SIMILARITY | Whether the data content is repeated, or very similar content appears more than once. |
| UNDERSTANDABILITY | Whether the data is easy to understand and interpret. For example, whether the data is clear, unambiguous, and meaningful in context. |
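
To make the definitions concrete, here is a minimal Python sketch of how the seven metrics could be represented and how three of them might be approximated with toy rules. The names `QualityMetric`, `check_text`, and `find_duplicates` are hypothetical and chosen for illustration only; they are not taken from this project's code. The rules themselves (empty content, the U+FFFD replacement character as a sign of garbling, exact repeats after normalization) are simplifying assumptions, not the project's actual checks.

```python
from __future__ import annotations

from enum import Enum


class QualityMetric(Enum):
    """The seven quality metrics from the table above (hypothetical enum)."""
    COMPLETENESS = "completeness"
    EFFECTIVENESS = "effectiveness"
    FLUENCY = "fluency"
    RELEVANCE = "relevance"
    SECURITY = "security"
    SIMILARITY = "similarity"
    UNDERSTANDABILITY = "understandability"


def check_text(text: str) -> set[QualityMetric]:
    """Return the metrics a single text fails under two toy rules."""
    issues: set[QualityMetric] = set()
    # COMPLETENESS: the content is empty or whitespace only.
    if not text.strip():
        issues.add(QualityMetric.COMPLETENESS)
    # EFFECTIVENESS: garbled characters, approximated here (an assumption)
    # by the Unicode replacement character U+FFFD left by bad decoding.
    if "\ufffd" in text:
        issues.add(QualityMetric.EFFECTIVENESS)
    return issues


def find_duplicates(texts: list[str]) -> list[int]:
    """SIMILARITY: indices of texts that repeat an earlier text.

    Texts are compared after lowercasing and collapsing whitespace, so this
    only catches exact repeats, not near-duplicates.
    """
    seen: set[str] = set()
    duplicates: list[int] = []
    for index, text in enumerate(texts):
        key = " ".join(text.lower().split())
        if key in seen:
            duplicates.append(index)
        else:
            seen.add(key)
    return duplicates
```

For instance, `check_text("")` flags only `COMPLETENESS`, and `find_duplicates(["Hello  world", "hello world", "x"])` reports index `1`. Metrics such as FLUENCY, RELEVANCE, and UNDERSTANDABILITY generally need model-based or task-specific judgments and are deliberately left out of this sketch.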