Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NewTarIndex race condition when run in a goroutine #192

Open
Noxsios opened this issue Jul 7, 2023 · 6 comments
Open

NewTarIndex race condition when run in a goroutine #192

Noxsios opened this issue Jul 7, 2023 · 6 comments
Labels
bug Something isn't working

Comments

@Noxsios
Copy link

Noxsios commented Jul 7, 2023

What happened:

In our build system we pull+cache images using crane in an OCI layout format. We then attempt to SBOM these images which may share layer tarballs.

Concurrently running Image.Read results in the following error being returned:

l.indexedContent, err = file.NewTarIndex(
	tarFilePath,
	layerTarIndexer(tree, l.fileCatalog, &l.Metadata.Size, l, monitor),
)
if err != nil {
	return fmt.Errorf("failed to read layer=%q tar : %w", l.Metadata.Digest, err)
}
failed to read
 layer="sha256:994393dc58e7931862558d06e46aa2bb17487044f670f310dffe1d24e4d1eec7" tar : unexpected EOF

I believe this is caused by stereoscope re-using file handles in os.Open resulting in a race condition of reads.

What you expected to happen:

Image SBOMing is go-routine safe w/ images that share layers.

How to reproduce it (as minimally and precisely as possible):

In a system w/ low IO speed (Github's default runner works):

  1. Pull multiple images that share layers across them and store as OCI layout (images w/ alpine base work)
  2. Attempt to go-routine image SBOMing

Anything else we need to know?:

Full stereoscope + syft logs in our CI run: https://github.com/defenseunicorns/zarf/actions/runs/5485435230/jobs/9994240328?pr=1887

Environment:

  • Github default runner ubuntu-latest
@Noxsios Noxsios added the bug Something isn't working label Jul 7, 2023
@tgerla
Copy link

tgerla commented Jul 13, 2023

Hi @Noxsios, thanks for filing this issue. Could you tell us a bit more about how you are making use of Syft and Stereoscope in this environment? Are you calling them as libraries? We are trying to sort through the logs but we could use more guidance from you.

I'll save the relevant logs here for posterity.

logs_92731.zip

@tgerla tgerla moved this to Awaiting Response in OSS Jul 13, 2023
@Noxsios
Copy link
Author

Noxsios commented Jul 14, 2023

Thanks for reaching out @tgerla!

Our tool zarf is a CLI designed to package up cloud native (docker/k8s) artifacts and be able to deploy them into air gapped (no connection) environments. During this process we SBOM all the files+images on a per-component level. An example component:

  - name: flux
    description: Installs the flux CRDs / controllers to use flux-based deployments in the cluster
    required: true
    manifests:
      - name: flux-crds
        namespace: flux
        files:
          - flux-install.yaml
    images:
      - ghcr.io/fluxcd/kustomize-controller:v0.27.1
      - ghcr.io/fluxcd/source-controller:v0.28.0

This component has 2 images, and we use Crane to pull the images and Syft/Stereoscope to perform the SBOMing, then include that resulting artifact in the package.

We are calling both Syft and Stereoscope as libraries main function we are calling. This functionality works 100% of the time when run in series, but fails with the previous error message when run in goroutines. The only difference between these two is this concurrency of operations. The layer sha256:994393dc... is a common base layer in many images, and I believe Stereoscope is attempting to read it multiple times, and getting to EOF sooner than expected on one or more threads. This behavior does not happen on our dev machines as their disk IO is able to read the layers concurrently fast enough that the threads do not clobber, but on low IO speed systems (such as Github Actions) the error occurs.

Other relevant info:

The layer cache directory given to image.New looks like the following:

❯ tree ~/.zarf-cache/images/
~/.zarf-cache/images/
├── sha256:094713c74b5761ab52fa3a95cba028e262ead0e2f4ed41857d8ae2bcd3d95f08
├── sha256:0a24540c1e5a47e5a8116c43151cb45c985f7a65afe5d4d65127d9c6fea2ffcf.tar
├── sha256:1a52c5e5391055e32568bc09f13c5a95989b93370c204d7c6d62bd0bd6e37375.tar
├── sha256:1e69fba811c05a87a92708bba2ae4c2d149fd236d853fbe83478066f77cb07a2
├── sha256:21149bf69d91e8a4c3ee46fdbd7c5ba54ee67da12abc51d4c1a29026f1ed0b7a.tar
├── sha256:37b289feee660235184d9711a14607ba91cecb5dd6b0c5402f0edd6cc87264f4.tar
├── sha256:3bdcce4584f90d548aa0d42b8358675c4ebcd6b548251799735c1a2fa454884f
...

Hope this helps!

@tgerla tgerla removed the status in OSS Aug 10, 2023
@kzantow
Copy link
Contributor

kzantow commented Aug 24, 2023

Hi @Noxsios -- we'd love to get this fixed; would you be able to provide a very simple sample Go application that uses stereoscope in a way that exhibits this behavior (even with something like hardcoded image tags/etc.)? It would definitely help us out to get it addressed more quickly... Thanks!

@kzantow kzantow moved this to Awaiting Response in OSS Aug 24, 2023
@Noxsios
Copy link
Author

Noxsios commented Aug 26, 2023

@kzantow Working on it! Ran into a few roadblocks getting the minimum POC working as our project is on syft v0.84.1 but replicating this code resulting in compilation errors in github.com/anchore/syft/syft/pkg/cataloger/java. I upgraded to v0.88.0, but this has breaking changes in the way that images are treated. source.NewFromImage no longer exists and I am currently working on:

src, location, err := image.DetectSource(imagesDir)
if err != nil {
	log.Fatal(err)
}
log.Printf("Detected source %q for %q", src, location)

cfg := source.StereoscopeImageConfig{
	Reference: imagesDir,
	From:      src,
}

syftSource, err := source.NewFromStereoscopeImage(cfg)
if err != nil {
	log.Fatal(err)
}

catalog, relationships, distro, err := syft.CatalogPackages(syftSource, cataloger.DefaultConfig())
if err != nil {
	log.Fatal(err)
}

However, the OciDirectory does not currently support an OCI layout with more than 1 image stored. Also as far as I can tell, Reference is getting used as the directory source, not the image.Source returned from image.DetectSource, which confuses me.

# pkg/image/oci/directory_provider.go:44

// for now, lets only support one image indexManifest (it is not clear how to handle multiple manifests)
if len(indexManifest.Manifests) != 1 {
    return nil, fmt.Errorf("unexpected number of OCI directory manifests (found %d)", len(indexManifest.Manifests))
}

The WIP POC lives here: https://github.com/Noxsios/goroutine-syft/blob/main/main.go . I will update if I can successfully replicate.

Current output:

 Layer "sha256:9398808236ffac29e60c04ec906d8d409af7fa19dc57d8c65ad167e9c4967006" is repeated 3 times
 Detected source "OciDirectory" for "/home/thicc/dev/goroutine-syft/images"
 unable to load image: unable to use OciDirectory source: unexpected number of OCI directory manifests (found 4)

@Noxsios
Copy link
Author

Noxsios commented Aug 26, 2023

This gets me much closer, but image.Image's v1.Image field is private.

layoutPath := layout.Path(imagesDir)

v1Img, err := layoutPath.Image(digest)
if err != nil {
	log.Fatal(err)
}

img := image.Image{}

imgSrc, err := source.NewFromStereoscopeImageObject(&img, digest.String(), nil)
if err != nil {
	log.Fatal(err)
}

catalog, relationships, distro, err := syft.CatalogPackages(imgSrc, cataloger.DefaultConfig())
if err != nil {
	log.Fatal(err)
}

@tgerla tgerla removed the status in OSS Aug 31, 2023
@tgerla
Copy link

tgerla commented Aug 31, 2023

Thanks @Noxsios, we're going to need to spend a little bit more time investigating this so we'll put it in our backlog.

@tgerla tgerla moved this to Backlog in OSS Aug 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: Backlog
Development

No branches or pull requests

3 participants