Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: facet missing data bug fix #9434

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

wirhabenzeit
Copy link

@wirhabenzeit wirhabenzeit commented Sep 16, 2024

PR Description

This is a suggested fix for #5937 and potentially also #8675 and vega/altair#3588 and vega/altair#3481

I included two examples demonstrating the bug. For instance the spec

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"url": "data/cars.json"},
  "facet": {
    "column": {"field": "Origin", "sort": ["USA", "Europe", "Japan"]},
    "row": {"field": "Cylinders", "sort": {"field": "Horsepower", "op": "mean"}}
  },
  "spec": {
      "mark": "point",
      "encoding": {
        "x": {"field": "Horsepower", "type": "quantitative"},
        "y": {"field": "Acceleration", "type": "quantitative"},
        "color": {"field": "Origin", "type": "nominal"},
        "shape": {"field": "Cylinders", "type": "nominal"}
      }
  }
}

results in

visualization-2

while it should look like this:

facet_bug_cars

I think the bug is due to the following. The key part of the compiled Vega spec looks like

{
      "name": "cell",
      "type": "group",
      "style": "cell",
      "from": {
        "facet": {
          "name": "facet",
          "data": "source_0",
          "groupby": ["Cylinders", "Origin"],
          "aggregate": {
            "cross": true,
            "fields": [
              "mean_Horsepower_by_Cylinders",
              "column_Origin_sort_index"
            ],
            "ops": ["max", "max"],
            "as": ["mean_Horsepower_by_Cylinders", "column_Origin_sort_index"]
          }
        }
      },
}

As far as I understand this performs a groupby according to cylinders and origin, and then for each group determines mean_Horsepower_by_Cylinders and column_Origin_sort_index used for sorting by aggregating over the data points in the facet and taking the max (it does not matter as the column sort index is the same for all points in the facet anyways, so one could also take e.g. the min). The problem arises when a facet is empty as then the aggregation cannot determine the valid sort index. As a result, some facets are not assigned a sort-index, see this screenshot from the Vega Editor -> Data Viewer -> cell:

Screenshot 2024-09-17 at 00 32 24

For some reason the cells with missing sort-index result in some data points being placed in the wrong facet (not exactly sure what goes wrong here exactly).

My bug fix suggestion simply replaces the above spec by

{
      "name": "cell",
      "type": "group",
      "style": "cell",
      "from": {
        "facet": {
          "name": "facet",
          "data": "source_0",
          "groupby": [
            "mean_Horsepower_by_Cylinders",
            "column_Origin_sort_index"
          ],
          "aggregate": {"cross": true}
        }
      },
}

i.e. it directly groups by the sort index, avoiding cells without assigned indices. This is simpler than the original code. For all examples I looked at, it worked. However, maybe I overlook some unintended side effects.

Out of all the tests a single test in test/compile/facet.test.ts fails, namely line 507. As far as I can see this test directly checks the section of the compiled Vega spec outlined above, so in some sense I would expect it to fail.

Checklist

  • This PR is atomic (i.e., it fixes one issue at a time).
  • The title is a concise semantic commit message (e.g. "fix: correctly handle undefined properties").
  • yarn test runs successfully

@wirhabenzeit wirhabenzeit requested a review from a team as a code owner September 16, 2024 22:41
@wirhabenzeit wirhabenzeit changed the title facet missing data bug fix fix: facet missing data bug fix Sep 16, 2024
@domoritz
Copy link
Member

Thanks for the pull request. Please make sure that the test pass and if you can that the linting also passes.

@wirhabenzeit
Copy link
Author

wirhabenzeit commented Sep 17, 2024

@domoritz As I mentioned in the pull request there is one failing test, namely

it('should add cross and sort if we facet by multiple dimensions with sort fields', () => {
const model: FacetModel = parseFacetModelWithScale({
facet: {
row: {field: 'a', type: 'ordinal', sort: {field: 'd', op: 'median'}},
column: {field: 'b', type: 'ordinal', sort: {field: 'e', op: 'median'}}
},
spec: {
mark: 'point',
encoding: {
x: {field: 'c', type: 'quantitative'}
}
}
});
model.parse();
const marks = model.assembleMarks();
expect(marks[0].from.facet.aggregate).toEqual({
cross: true,
fields: ['median_d_by_a', 'median_e_by_b'],
ops: ['max', 'max'],
as: ['median_d_by_a', 'median_e_by_b']
});
expect(marks[0].sort).toEqual({
field: ['datum["median_d_by_a"]', 'datum["median_e_by_b"]'],
order: ['ascending', 'ascending']
});
});

Since my suggested workaround changes the facet grouping as described above, it is clear that this test should fail in

expect(marks[0].from.facet.aggregate).toEqual({
cross: true,
fields: ['median_d_by_a', 'median_e_by_b'],
ops: ['max', 'max'],
as: ['median_d_by_a', 'median_e_by_b']
});

I have now updated the testing file in my fork by simply changing the above to

expect(marks[0].from.facet.aggregate).toEqual({
cross: true
});

in order to match the updated grouping logic.

Regarding the linting, it already seems to pass?

@joelostblom
Copy link
Contributor

@domoritz Is this PR looking good to you? Fixing this bug in VL would help us understand if there is anything additional going wrong in Altair for vega/altair#3588.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants