-
Notifications
You must be signed in to change notification settings - Fork 556
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce CPU & network consumption of Facia JSON download #26338
Conversation
7bcd556
to
9431a92
Compare
4ada4c3
to
eb1fdae
Compare
I've talked this through with @abeddow91, @arelra, @ioannakok, @georgeblahblah and @ParisaTork at the 2pm meeting today, and response was positive, no blocking objections! Some evidence shared:
|
Thanks for the detailed write-up @rtyley! Not sure if this has come up already, but (iirc) the only reason @ioannakok and I were able to find for having a Might be worth looking into if it hasn't been discussed yet? |
Ah interesting - that would have been #18364 & #18365 ...
That is a super interesting point @bryophyta ! Looking at the uncompressed files sizes for
...the lite versions aren't even dramatically smaller - they are still 67% of the same size as the original versions of those files. So I think you're right - they're adding 67% to storage requirements (assuming that the Scala case-class memory representation is roughly proportional to the file size) and are actively harmful to the cache hit rate, now that we're using a cache. Subsequent to this PR, in a new PR, I definitely agree: the 'lite' versions should be completely removed. The only thing to watch out for would be to see if the output of the endpoints that use the 'lite' versions ( |
Thanks for this!
Agreed that we'd need to check implications for consumers 👍 Hopefully it should be easy enough to repurpose the existing code that creates the From what I remember though it might take a little bit of detective work to find out who the consumers are though (I happen to know that one of the consumers of But ultimately up to WebX to decide whether it's something to prioritise of course 🙂 |
} | ||
|
||
class FrontJsonFapiLive(val blockingOperations: BlockingOperations) extends FrontJsonFapi { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're deleting BlockingOperations
this config can also be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that bit of config is still used by the BlockingOperations
class, and that hasn't been deleted (unfortunately?!) - we're no longer using it for the Fronts-JSON download code, but it's still used elsewhere, eg dfp.OrderAgent
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic work @rtyley 👏 Thank you for doing this and for the great PR write-up!
Excellent work @rtyley 💎 |
eb1fdae
to
565d89c
Compare
Context: #26335 In this change: * Use AWS SDK v2 and non-blocking async code to avoid blocking a thread while downloading the Facia JSON data from S3 * Directly transform bytes into JsValue, without constructing a `String` first Note that the work done between the two metrics `FrontDownloadLatency` & `FrontDecodingLatency` has changed slightly - the conversion to the basic JSON model (JsValue) now occurs in `FrontDownloadLatency`, rather than the 'decoding' step. We could get rid of the futureSemaphore and stop using the dedicated blocking-operations pool here, but we'll leave that to another PR. Co-authored-by: Ravi <[email protected]> Co-authored-by: Ioanna Kokkini <[email protected]>
This change adds these improvements: * Facia data is only re-downloaded & re-parsed if the S3 content has _changed_, thanks to ETag-caching - see https://github.com/guardian/etag-caching . This library has already been used in DotCom PROD with guardian/frontend#26338 * AWS SDK v2: the FAPI client itself now has a `fapi-s3-sdk-v2` artifact. An example PR consuming this updated version of the FAPI client is at: guardian/ophan#5506 Updated FAPI artifact layout ---------------------------- To use FAPI with the new AWS SDK v2 support, users must now have a dependency on *two* FAPI artifacts: * `fapi-s3-sdk-v2` * `fapi-client-playXX` Due to needing to support the matrix of: * AWS SDK v1 & v2 * Play-JSON 2.7, 2.8, and eventually 2.9 ...it's best not to try to produce an artifact that corresponds to every single combination of those! Consequently, we provide an artifacts that are specific to the different versions of AWS SDK (or at least, could do - if AWS SDK v1 was moved out of common code), and artifacts that are specific to the different versions of Play-JSON, and allow the user to combine them as needed. A similar approach was used with `guardian/play-secret-rotation`: guardian/play-secret-rotation#8 In order for the different artifacts to have interfaces they can use to join together and become a single useful Facia client, we have a `fapi-client-core` artifact. Any code that doesn't depend on the JSON classes, or the actual AWS SDK version (which isn't much!), can live in there. In particular, we have: * `com.gu.facia.client.ApiClient`, an existing type that is now a trait, with 2 implementations - one that uses the existing `com.gu.facia.client.S3Client` abstraction on S3 behaviour * `com.gu.facia.client.etagcaching.fetching.S3FetchBehaviour`, a new trait that exposes just enough interface to allow the conditional fetching used for ETag-based caching, but doesn't tie you to any specific version of the AWS SDK.
ETag Caching was introduced for Facia `PressedPage` JSON downloading with #26338 in order to improve scalability and address #26335, but a limiting factor was the number of `PressedPage` objects that could be stored in the cache. With a max `PressedPage` size of 22MB and a memory budget of 4GB, a cautious max cache size limit of only 180 `PressedPage` objects was set. As a result, the cache hit rate was relatively low, and we saw elevated GC, probably because of object continually being evicted out of the small cache: #26338 (comment) The change in this new commit dramatically reduces the combined size of the `PressedPage` objects held in memory, taking the average retained size per `PressedPage` from 4MB to 0.5MB (based on a sample of 125 `PressedPage` objects held in memory at the same time). It does this by deduplicating the `Tag` objects held by the `PressedPage`s. Previously, as the `Tag`s for different `PressedPage`s were deserialised from JSON, many identical tags would created over and over again, and held in memory. After dedeuplication, those different `PressedPage`s will all reference the _same_ `Tag` object for a given tag. The deduplication is done as the `Tag`s are deserialised - a new cache (gotta love caches!) holds `Tag`s keyed by their hashcode and tag id, and if a new `Tag` is created with a matching key, it's thrown away, and the old one is used instead. Thus we end up with just one instance of that `Tag`, instead of many duplicated ones. See also: * https://en.wikipedia.org/wiki/String_interning - a similar technique used by Java for Strings: https://www.geeksforgeeks.org/interning-of-string/
ETag Caching was introduced for Facia `PressedPage` JSON downloading with #26338 in order to improve scalability and address #26335, but a limiting factor was the number of `PressedPage` objects that could be stored in the cache. With a max `PressedPage` size of 22MB and a memory budget of 4GB, a cautious max cache size limit of only 180 `PressedPage` objects was set. As a result, the cache hit rate was relatively low, and we saw elevated GC, probably because of object continually being evicted out of the small cache: #26338 (comment) The change in this new commit dramatically reduces the combined size of the `PressedPage` objects held in memory, taking the average retained size per `PressedPage` from 4MB to 0.5MB (based on a sample of 125 `PressedPage` objects held in memory at the same time). It does this by deduplicating the `Tag` objects held by the `PressedPage`s. Previously, as the `Tag`s for different `PressedPage`s were deserialised from JSON, many identical tags would created over and over again, and held in memory. After dedeuplication, those different `PressedPage`s will all reference the _same_ `Tag` object for a given tag. The deduplication is done as the `Tag`s are deserialised - a new cache (gotta love caches!) holds `Tag`s keyed by their hashcode and tag id, and if a new `Tag` is created with a matching key, it's thrown away, and the old one is used instead. Thus we end up with just one instance of that `Tag`, instead of many duplicated ones. See also: * https://en.wikipedia.org/wiki/String_interning - a similar technique used by Java for Strings: https://www.geeksforgeeks.org/interning-of-string/
ETag Caching was introduced for Facia `PressedPage` JSON downloading with #26338 in order to improve scalability and address #26335, but a limiting factor was the number of `PressedPage` objects that could be stored in the cache. With a max `PressedPage` size of 22MB and a memory budget of 4GB, a cautious max cache size limit of only 180 `PressedPage` objects was set. As a result, the cache hit rate was relatively low, and we saw elevated GC, probably because of object continually being evicted out of the small cache: #26338 (comment) The change in this new commit dramatically reduces the combined size of the `PressedPage` objects held in memory, taking the average retained size per `PressedPage` from 4MB to 0.5MB (based on a sample of 125 `PressedPage` objects held in memory at the same time). It does this by deduplicating the `Tag` objects held by the `PressedPage`s. Previously, as the `Tag`s for different `PressedPage`s were deserialised from JSON, many identical tags would created over and over again, and held in memory. After dedeuplication, those different `PressedPage`s will all reference the _same_ `Tag` object for a given tag. The deduplication is done as the `Tag`s are deserialised - a new cache (gotta love caches!) holds `Tag`s keyed by their hashcode and tag id, and if a new `Tag` is created with a matching key, it's thrown away, and the old one is used instead. Thus we end up with just one instance of that `Tag`, instead of many duplicated ones. See also: * https://en.wikipedia.org/wiki/String_interning - a similar technique used by Java for Strings: https://www.geeksforgeeks.org/interning-of-string/
This change adds these improvements: * Facia data is only re-downloaded & re-parsed if the S3 content has _changed_, thanks to ETag-caching - see https://github.com/guardian/etag-caching . This library has already been used in DotCom PROD with guardian/frontend#26338 * AWS SDK v2: the FAPI client itself now has a `fapi-s3-sdk-v2` artifact. An example PR consuming this updated version of the FAPI client is at: guardian/ophan#5506 Updated FAPI artifact layout ---------------------------- To use FAPI with the new AWS SDK v2 support, users must now have a dependency on *two* FAPI artifacts: * `fapi-s3-sdk-v2` * `fapi-client-playXX` Due to needing to support the matrix of: * AWS SDK v1 & v2 * Play-JSON 2.7, 2.8, and eventually 2.9 ...it's best not to try to produce an artifact that corresponds to every single combination of those! Consequently, we provide an artifacts that are specific to the different versions of AWS SDK (or at least, could do - if AWS SDK v1 was moved out of common code), and artifacts that are specific to the different versions of Play-JSON, and allow the user to combine them as needed. A similar approach was used with `guardian/play-secret-rotation`: guardian/play-secret-rotation#8 In order for the different artifacts to have interfaces they can use to join together and become a single useful Facia client, we have a `fapi-client-core` artifact. Any code that doesn't depend on the JSON classes, or the actual AWS SDK version (which isn't much!), can live in there. In particular, we have: * `com.gu.facia.client.ApiClient`, an existing type that is now a trait, with 2 implementations - one that uses the existing `com.gu.facia.client.S3Client` abstraction on S3 behaviour * `com.gu.facia.client.etagcaching.fetching.S3FetchBehaviour`, a new trait that exposes just enough interface to allow the conditional fetching used for ETag-based caching, but doesn't tie you to any specific version of the AWS SDK.
This change adds these improvements: * Facia data is only re-downloaded & re-parsed if the S3 content has _changed_, thanks to ETag-caching - see https://github.com/guardian/etag-caching . This library has already been used in DotCom PROD with guardian/frontend#26338 * AWS SDK v2: the FAPI client itself now has a `fapi-s3-sdk-v2` artifact. An example PR consuming this updated version of the FAPI client is at: guardian/ophan#5506 Updated FAPI artifact layout ---------------------------- To use FAPI with the new AWS SDK v2 support, users must now have a dependency on *two* FAPI artifacts: * `fapi-s3-sdk-v2` * `fapi-client-playXX` Due to needing to support the matrix of: * AWS SDK v1 & v2 * Play-JSON 2.8, 2.9, 3.0 ...it's best not to try to produce an artifact that corresponds to every single combination of those! Consequently, we provide an artifacts that are specific to the different versions of AWS SDK (or at least, could do - if AWS SDK v1 was moved out of common code), and artifacts that are specific to the different versions of Play-JSON, and allow the user to combine them as needed. A similar approach was used with `guardian/play-secret-rotation`: guardian/play-secret-rotation#8 In order for the different artifacts to have interfaces they can use to join together and become a single useful Facia client, we have a `fapi-client-core` artifact. Any code that doesn't depend on the JSON classes, or the actual AWS SDK version (which isn't much!), can live in there. In particular, we have: * `com.gu.facia.client.ApiClient`, an existing type that is now a trait, with 2 implementations - one that uses the existing `com.gu.facia.client.S3Client` abstraction on S3 behaviour * `com.gu.facia.client.etagcaching.fetching.S3FetchBehaviour`, a new trait that exposes just enough interface to allow the conditional fetching used for ETag-based caching, but doesn't tie you to any specific version of the AWS SDK.
The `etag-caching` library has had a small change to its API with guardian/etag-caching#56, so we need make a small code change to adapt to it. # Background ETag Caching was originally introduced to Frontend with #26338.
This change adds these improvements: * Facia data is only re-downloaded & re-parsed if the S3 content has _changed_, thanks to ETag-caching - see https://github.com/guardian/etag-caching . This library has already been used in DotCom PROD with guardian/frontend#26338 * AWS SDK v2: the FAPI client itself now has a `fapi-s3-sdk-v2` artifact. An example PR consuming this updated version of the FAPI client is at: guardian/ophan#5506 Updated FAPI artifact layout ---------------------------- To use FAPI with the new AWS SDK v2 support, users must now have a dependency on *two* FAPI artifacts: * `fapi-s3-sdk-v2` * `fapi-client-playXX` Due to needing to support the matrix of: * AWS SDK v1 & v2 * Play-JSON 2.8, 2.9, 3.0 ...it's best not to try to produce an artifact that corresponds to every single combination of those! Consequently, we provide an artifacts that are specific to the different versions of AWS SDK (or at least, could do - if AWS SDK v1 was moved out of common code), and artifacts that are specific to the different versions of Play-JSON, and allow the user to combine them as needed. A similar approach was used with `guardian/play-secret-rotation`: guardian/play-secret-rotation#8 In order for the different artifacts to have interfaces they can use to join together and become a single useful Facia client, we have a `fapi-client-core` artifact. Any code that doesn't depend on the JSON classes, or the actual AWS SDK version (which isn't much!), can live in there. In particular, we have: * `com.gu.facia.client.ApiClient`, an existing type that is now a trait, with 2 implementations - one that uses the existing `com.gu.facia.client.S3Client` abstraction on S3 behaviour * `com.gu.facia.client.etagcaching.fetching.S3FetchBehaviour`, a new trait that exposes just enough interface to allow the conditional fetching used for ETag-based caching, but doesn't tie you to any specific version of the AWS SDK.
This change adds these improvements: * Facia data is only re-downloaded & re-parsed if the S3 content has _changed_, thanks to ETag-caching - see https://github.com/guardian/etag-caching . This library has already been used in DotCom PROD with guardian/frontend#26338 * AWS SDK v2: the FAPI client itself now has a `fapi-s3-sdk-v2` artifact. An example PR consuming this updated version of the FAPI client is at: guardian/ophan#5506 Updated FAPI artifact layout ---------------------------- To use FAPI with the new AWS SDK v2 support, users must now have a dependency on *two* FAPI artifacts: * `fapi-s3-sdk-v2` * `fapi-client-playXX` Due to needing to support the matrix of: * AWS SDK v1 & v2 * Play-JSON 2.8, 2.9, 3.0 ...it's best not to try to produce an artifact that corresponds to every single combination of those! Consequently, we provide an artifacts that are specific to the different versions of AWS SDK (or at least, could do - if AWS SDK v1 was moved out of common code), and artifacts that are specific to the different versions of Play-JSON, and allow the user to combine them as needed. A similar approach was used with `guardian/play-secret-rotation`: guardian/play-secret-rotation#8 In order for the different artifacts to have interfaces they can use to join together and become a single useful Facia client, we have a `fapi-client-core` artifact. Any code that doesn't depend on the JSON classes, or the actual AWS SDK version (which isn't much!), can live in there. In particular, we have: * `com.gu.facia.client.ApiClient`, an existing type that is now a trait, with 2 implementations - one that uses the existing `com.gu.facia.client.S3Client` abstraction on S3 behaviour * `com.gu.facia.client.etagcaching.fetching.S3FetchBehaviour`, a new trait that exposes just enough interface to allow the conditional fetching used for ETag-based caching, but doesn't tie you to any specific version of the AWS SDK.
This change adds these improvements: * Facia data is only re-downloaded & re-parsed if the S3 content has _changed_, thanks to ETag-caching - see https://github.com/guardian/etag-caching . This library has already been used in DotCom PROD with guardian/frontend#26338 * AWS SDK v2: the FAPI client itself now has a `fapi-s3-sdk-v2` artifact. An example PR consuming this updated version of the FAPI client is at: guardian/ophan#5506 Updated FAPI artifact layout ---------------------------- To use FAPI with the new AWS SDK v2 support, users must now have a dependency on *two* FAPI artifacts: * `fapi-s3-sdk-v2` * `fapi-client-playXX` Due to needing to support the matrix of: * AWS SDK v1 & v2 * Play-JSON 2.8, 2.9, 3.0 ...it's best not to try to produce an artifact that corresponds to every single combination of those! Consequently, we provide an artifacts that are specific to the different versions of AWS SDK (or at least, could do - if AWS SDK v1 was moved out of common code), and artifacts that are specific to the different versions of Play-JSON, and allow the user to combine them as needed. A similar approach was used with `guardian/play-secret-rotation`: guardian/play-secret-rotation#8 In order for the different artifacts to have interfaces they can use to join together and become a single useful Facia client, we have a `fapi-client-core` artifact. Any code that doesn't depend on the JSON classes, or the actual AWS SDK version (which isn't much!), can live in there. In particular, we have: * `com.gu.facia.client.ApiClient`, an existing type that is now a trait, with 2 implementations - one that uses the existing `com.gu.facia.client.S3Client` abstraction on S3 behaviour * `com.gu.facia.client.etagcaching.fetching.S3FetchBehaviour`, a new trait that exposes just enough interface to allow the conditional fetching used for ETag-based caching, but doesn't tie you to any specific version of the AWS SDK.
This change adds these improvements: * Facia data is only re-downloaded & re-parsed if the S3 content has _changed_, thanks to ETag-caching - see https://github.com/guardian/etag-caching . This library has already been used in DotCom PROD with guardian/frontend#26338 * AWS SDK v2: the FAPI client itself now has a `fapi-s3-sdk-v2` artifact. An example PR consuming this updated version of the FAPI client is at: guardian/ophan#5506 Updated FAPI artifact layout ---------------------------- To use FAPI with the new AWS SDK v2 support, users must now have a dependency on *two* FAPI artifacts: * `fapi-s3-sdk-v2` * `fapi-client-playXX` Due to needing to support the matrix of: * AWS SDK v1 & v2 * Play-JSON 2.8, 2.9, 3.0 ...it's best not to try to produce an artifact that corresponds to every single combination of those! Consequently, we provide an artifacts that are specific to the different versions of AWS SDK (or at least, could do - if AWS SDK v1 was moved out of common code), and artifacts that are specific to the different versions of Play-JSON, and allow the user to combine them as needed. A similar approach was used with `guardian/play-secret-rotation`: guardian/play-secret-rotation#8 In order for the different artifacts to have interfaces they can use to join together and become a single useful Facia client, we have a `fapi-client-core` artifact. Any code that doesn't depend on the JSON classes, or the actual AWS SDK version (which isn't much!), can live in there. In particular, we have: * `com.gu.facia.client.ApiClient`, an existing type that is now a trait, with 2 implementations - one that uses the existing `com.gu.facia.client.S3Client` abstraction on S3 behaviour * `com.gu.facia.client.etagcaching.fetching.S3FetchBehaviour`, a new trait that exposes just enough interface to allow the conditional fetching used for ETag-based caching, but doesn't tie you to any specific version of the AWS SDK.
This change adds these improvements: * Facia data is only re-downloaded & re-parsed if the S3 content has _changed_, thanks to ETag-caching - see https://github.com/guardian/etag-caching . This library has already been used in DotCom PROD with guardian/frontend#26338 * AWS SDK v2: the FAPI client itself now has a `fapi-s3-sdk-v2` artifact. An example PR consuming this updated version of the FAPI client is at: guardian/ophan#5506 Updated FAPI artifact layout ---------------------------- To use FAPI with the new AWS SDK v2 support, users must now have a dependency on *two* FAPI artifacts: * `fapi-s3-sdk-v2` * `fapi-client-playXX` Due to needing to support the matrix of: * AWS SDK v1 & v2 * Play-JSON 2.8, 2.9, 3.0 ...it's best not to try to produce an artifact that corresponds to every single combination of those! Consequently, we provide an artifacts that are specific to the different versions of AWS SDK (or at least, could do - if AWS SDK v1 was moved out of common code), and artifacts that are specific to the different versions of Play-JSON, and allow the user to combine them as needed. A similar approach was used with `guardian/play-secret-rotation`: guardian/play-secret-rotation#8 In order for the different artifacts to have interfaces they can use to join together and become a single useful Facia client, we have a `fapi-client-core` artifact. Any code that doesn't depend on the JSON classes, or the actual AWS SDK version (which isn't much!), can live in there. In particular, we have: * `com.gu.facia.client.ApiClient`, an existing type that is now a trait, with 2 implementations - one that uses the existing `com.gu.facia.client.S3Client` abstraction on S3 behaviour * `com.gu.facia.client.etagcaching.fetching.S3FetchBehaviour`, a new trait that exposes just enough interface to allow the conditional fetching used for ETag-based caching, but doesn't tie you to any specific version of the AWS SDK.
This change adds these improvements: * Facia data is only re-downloaded & re-parsed if the S3 content has _changed_, thanks to ETag-caching - see https://github.com/guardian/etag-caching . This library has already been used in DotCom PROD with guardian/frontend#26338 * AWS SDK v2: the FAPI client itself now has a `fapi-s3-sdk-v2` artifact. An example PR consuming this updated version of the FAPI client is at: guardian/ophan#5506 Updated FAPI artifact layout ---------------------------- To use FAPI with the new AWS SDK v2 support, users must now have a dependency on *two* FAPI artifacts: * `fapi-s3-sdk-v2` * `fapi-client-playXX` Due to needing to support the matrix of: * AWS SDK v1 & v2 * Play-JSON 2.8, 2.9, 3.0 ...it's best not to try to produce an artifact that corresponds to every single combination of those! Consequently, we provide an artifacts that are specific to the different versions of AWS SDK (or at least, could do - if AWS SDK v1 was moved out of common code), and artifacts that are specific to the different versions of Play-JSON, and allow the user to combine them as needed. A similar approach was used with `guardian/play-secret-rotation`: guardian/play-secret-rotation#8 In order for the different artifacts to have interfaces they can use to join together and become a single useful Facia client, we have a `fapi-client-core` artifact. Any code that doesn't depend on the JSON classes, or the actual AWS SDK version (which isn't much!), can live in there. In particular, we have: * `com.gu.facia.client.ApiClient`, an existing type that is now a trait, with 2 implementations - one that uses the existing `com.gu.facia.client.S3Client` abstraction on S3 behaviour * `com.gu.facia.client.etagcaching.fetching.S3FetchBehaviour`, a new trait that exposes just enough interface to allow the conditional fetching used for ETag-based caching, but doesn't tie you to any specific version of the AWS SDK.
This change adds these improvements: * Facia data is only re-downloaded & re-parsed if the S3 content has _changed_, thanks to ETag-caching - see https://github.com/guardian/etag-caching . This library has already been used in DotCom PROD with guardian/frontend#26338 * AWS SDK v2: the FAPI client itself now has a `fapi-s3-sdk-v2` artifact. An example PR consuming this updated version of the FAPI client is at: guardian/ophan#5506 Updated FAPI artifact layout ---------------------------- To use FAPI with the new AWS SDK v2 support, users must now have a dependency on *two* FAPI artifacts: * `fapi-s3-sdk-v2` * `fapi-client-playXX` Due to needing to support the matrix of: * AWS SDK v1 & v2 * Play-JSON 2.8, 2.9, 3.0 ...it's best not to try to produce an artifact that corresponds to every single combination of those! Consequently, we provide an artifacts that are specific to the different versions of AWS SDK (or at least, could do - if AWS SDK v1 was moved out of common code), and artifacts that are specific to the different versions of Play-JSON, and allow the user to combine them as needed. A similar approach was used with `guardian/play-secret-rotation`: guardian/play-secret-rotation#8 In order for the different artifacts to have interfaces they can use to join together and become a single useful Facia client, we have a `fapi-client-core` artifact. Any code that doesn't depend on the JSON classes, or the actual AWS SDK version (which isn't much!), can live in there. In particular, we have: * `com.gu.facia.client.ApiClient`, an existing type that is now a trait, with 2 implementations - one that uses the existing `com.gu.facia.client.S3Client` abstraction on S3 behaviour * `com.gu.facia.client.etagcaching.fetching.S3FetchBehaviour`, a new trait that exposes just enough interface to allow the conditional fetching used for ETag-based caching, but doesn't tie you to any specific version of the AWS SDK.
This change adds these improvements: * Facia data is only re-downloaded & re-parsed if the S3 content has _changed_, thanks to ETag-caching - see https://github.com/guardian/etag-caching . This library has already been used in DotCom PROD with guardian/frontend#26338 * AWS SDK v2: the FAPI client itself now has a `fapi-s3-sdk-v2` artifact. An example PR consuming this updated version of the FAPI client is at: guardian/ophan#5506 Updated FAPI artifact layout ---------------------------- To use FAPI with the new AWS SDK v2 support, users must now have a dependency on *two* FAPI artifacts: * `fapi-s3-sdk-v2` * `fapi-client-playXX` Due to needing to support the matrix of: * AWS SDK v1 & v2 * Play-JSON 2.8, 2.9, 3.0 ...it's best not to try to produce an artifact that corresponds to every single combination of those! Consequently, we provide an artifacts that are specific to the different versions of AWS SDK (or at least, could do - if AWS SDK v1 was moved out of common code), and artifacts that are specific to the different versions of Play-JSON, and allow the user to combine them as needed. A similar approach was used with `guardian/play-secret-rotation`: guardian/play-secret-rotation#8 In order for the different artifacts to have interfaces they can use to join together and become a single useful Facia client, we have a `fapi-client-core` artifact. Any code that doesn't depend on the JSON classes, or the actual AWS SDK version (which isn't much!), can live in there. In particular, we have: * `com.gu.facia.client.ApiClient`, an existing type that is now a trait, with 2 implementations - one that uses the existing `com.gu.facia.client.S3Client` abstraction on S3 behaviour * `com.gu.facia.client.etagcaching.fetching.S3FetchBehaviour`, a new trait that exposes just enough interface to allow the conditional fetching used for ETag-based caching, but doesn't tie you to any specific version of the AWS SDK.
This change adds these improvements: * Facia data is only re-downloaded & re-parsed if the S3 content has _changed_, thanks to ETag-caching - see https://github.com/guardian/etag-caching . This library has already been used in DotCom PROD with guardian/frontend#26338 * AWS SDK v2: the FAPI client itself now has a `fapi-s3-sdk-v2` artifact. An example PR consuming this updated version of the FAPI client is at: guardian/ophan#5506 Updated FAPI artifact layout ---------------------------- To use FAPI with the new AWS SDK v2 support, users must now have a dependency on *two* FAPI artifacts: * `fapi-s3-sdk-v2` * `fapi-client-playXX` Due to needing to support the matrix of: * AWS SDK v1 & v2 * Play-JSON 2.8, 2.9, 3.0 ...it's best not to try to produce an artifact that corresponds to every single combination of those! Consequently, we provide an artifacts that are specific to the different versions of AWS SDK (or at least, could do - if AWS SDK v1 was moved out of common code), and artifacts that are specific to the different versions of Play-JSON, and allow the user to combine them as needed. A similar approach was used with `guardian/play-secret-rotation`: guardian/play-secret-rotation#8 In order for the different artifacts to have interfaces they can use to join together and become a single useful Facia client, we have a `fapi-client-core` artifact. Any code that doesn't depend on the JSON classes, or the actual AWS SDK version (which isn't much!), can live in there. In particular, we have: * `com.gu.facia.client.ApiClient`, an existing type that is now a trait, with 2 implementations - one that uses the existing `com.gu.facia.client.S3Client` abstraction on S3 behaviour * `com.gu.facia.client.etagcaching.fetching.S3FetchBehaviour`, a new trait that exposes just enough interface to allow the conditional fetching used for ETag-based caching, but doesn't tie you to any specific version of the AWS SDK.
This change adds these improvements: * Facia data is only re-downloaded & re-parsed if the S3 content has _changed_, thanks to ETag-caching - see https://github.com/guardian/etag-caching . This library has already been used in DotCom PROD with guardian/frontend#26338 * AWS SDK v2: the FAPI client itself now has a `fapi-s3-sdk-v2` artifact. An example PR consuming this updated version of the FAPI client is at: guardian/ophan#5506 Updated FAPI artifact layout ---------------------------- To use FAPI with the new AWS SDK v2 support, users must now have a dependency on *two* FAPI artifacts: * `fapi-s3-sdk-v2` * `fapi-client-playXX` Due to needing to support the matrix of: * AWS SDK v1 & v2 * Play-JSON 2.8, 2.9, 3.0 ...it's best not to try to produce an artifact that corresponds to every single combination of those! Consequently, we provide an artifacts that are specific to the different versions of AWS SDK (or at least, could do - if AWS SDK v1 was moved out of common code), and artifacts that are specific to the different versions of Play-JSON, and allow the user to combine them as needed. A similar approach was used with `guardian/play-secret-rotation`: guardian/play-secret-rotation#8 In order for the different artifacts to have interfaces they can use to join together and become a single useful Facia client, we have a `fapi-client-core` artifact. Any code that doesn't depend on the JSON classes, or the actual AWS SDK version (which isn't much!), can live in there. In particular, we have: * `com.gu.facia.client.ApiClient`, an existing type that is now a trait, with 2 implementations - one that uses the existing `com.gu.facia.client.S3Client` abstraction on S3 behaviour * `com.gu.facia.client.etagcaching.fetching.S3FetchBehaviour`, a new trait that exposes just enough interface to allow the conditional fetching used for ETag-based caching, but doesn't tie you to any specific version of the AWS SDK.
This change adds these improvements: * Facia data is only re-downloaded & re-parsed if the S3 content has _changed_, thanks to ETag-caching - see https://github.com/guardian/etag-caching . This library has already been used in DotCom PROD with guardian/frontend#26338 * AWS SDK v2: the FAPI client itself now has a `fapi-s3-sdk-v2` artifact. An example PR consuming this updated version of the FAPI client is at: guardian/ophan#5506 Updated FAPI artifact layout ---------------------------- To use FAPI with the new AWS SDK v2 support, users must now have a dependency on *two* FAPI artifacts: * `fapi-s3-sdk-v2` * `fapi-client-playXX` Due to needing to support the matrix of: * AWS SDK v1 & v2 * Play-JSON 2.8, 2.9, 3.0 ...it's best not to try to produce an artifact that corresponds to every single combination of those! Consequently, we provide an artifacts that are specific to the different versions of AWS SDK (or at least, could do - if AWS SDK v1 was moved out of common code), and artifacts that are specific to the different versions of Play-JSON, and allow the user to combine them as needed. A similar approach was used with `guardian/play-secret-rotation`: guardian/play-secret-rotation#8 In order for the different artifacts to have interfaces they can use to join together and become a single useful Facia client, we have a `fapi-client-core` artifact. Any code that doesn't depend on the JSON classes, or the actual AWS SDK version (which isn't much!), can live in there. In particular, we have: * `com.gu.facia.client.ApiClient`, an existing type that is now a trait, with 2 implementations - one that uses the existing `com.gu.facia.client.S3Client` abstraction on S3 behaviour * `com.gu.facia.client.etagcaching.fetching.S3FetchBehaviour`, a new trait that exposes just enough interface to allow the conditional fetching used for ETag-based caching, but doesn't tie you to any specific version of the AWS SDK.
This change adds these improvements: * Facia data is only re-downloaded & re-parsed if the S3 content has _changed_, thanks to ETag-caching - see https://github.com/guardian/etag-caching . This library has already been used in DotCom PROD with guardian/frontend#26338 * AWS SDK v2: the FAPI client itself now has a `fapi-s3-sdk-v2` artifact. An example PR consuming this updated version of the FAPI client is at: guardian/ophan#5506 Updated FAPI artifact layout ---------------------------- To use FAPI with the new AWS SDK v2 support, users must now have a dependency on *two* FAPI artifacts: * `fapi-s3-sdk-v2` * `fapi-client-playXX` Due to needing to support the matrix of: * AWS SDK v1 & v2 * Play-JSON 2.8, 2.9, 3.0 ...it's best not to try to produce an artifact that corresponds to every single combination of those! Consequently, we provide an artifacts that are specific to the different versions of AWS SDK (or at least, could do - if AWS SDK v1 was moved out of common code), and artifacts that are specific to the different versions of Play-JSON, and allow the user to combine them as needed. A similar approach was used with `guardian/play-secret-rotation`: guardian/play-secret-rotation#8 In order for the different artifacts to have interfaces they can use to join together and become a single useful Facia client, we have a `fapi-client-core` artifact. Any code that doesn't depend on the JSON classes, or the actual AWS SDK version (which isn't much!), can live in there. In particular, we have: * `com.gu.facia.client.ApiClient`, an existing type that is now a trait, with 2 implementations - one that uses the existing `com.gu.facia.client.S3Client` abstraction on S3 behaviour * `com.gu.facia.client.etagcaching.fetching.S3FetchBehaviour`, a new trait that exposes just enough interface to allow the conditional fetching used for ETag-based caching, but doesn't tie you to any specific version of the AWS SDK.
This change adds these improvements: * Facia data is only re-downloaded & re-parsed if the S3 content has _changed_, thanks to ETag-caching - see https://github.com/guardian/etag-caching . This library has already been used in DotCom PROD with guardian/frontend#26338 * AWS SDK v2: the FAPI client itself now has a `fapi-s3-sdk-v2` artifact. An example PR consuming this updated version of the FAPI client is at: guardian/ophan#5506 Updated FAPI artifact layout ---------------------------- To use FAPI with the new AWS SDK v2 support, users must now have a dependency on *two* FAPI artifacts: * `fapi-s3-sdk-v2` * `fapi-client-playXX` Due to needing to support the matrix of: * AWS SDK v1 & v2 * Play-JSON 2.8, 2.9, 3.0 ...it's best not to try to produce an artifact that corresponds to every single combination of those! Consequently, we provide an artifacts that are specific to the different versions of AWS SDK (or at least, could do - if AWS SDK v1 was moved out of common code), and artifacts that are specific to the different versions of Play-JSON, and allow the user to combine them as needed. A similar approach was used with `guardian/play-secret-rotation`: guardian/play-secret-rotation#8 In order for the different artifacts to have interfaces they can use to join together and become a single useful Facia client, we have a `fapi-client-core` artifact. Any code that doesn't depend on the JSON classes, or the actual AWS SDK version (which isn't much!), can live in there. In particular, we have: * `com.gu.facia.client.ApiClient`, an existing type that is now a trait, with 2 implementations - one that uses the existing `com.gu.facia.client.S3Client` abstraction on S3 behaviour * `com.gu.facia.client.etagcaching.fetching.S3FetchBehaviour`, a new trait that exposes just enough interface to allow the conditional fetching used for ETag-based caching, but doesn't tie you to any specific version of the AWS SDK.
This change adds these improvements: * Facia data is only re-downloaded & re-parsed if the S3 content has _changed_, thanks to ETag-caching - see https://github.com/guardian/etag-caching . This library has already been used in DotCom PROD with guardian/frontend#26338 * AWS SDK v2: the FAPI client itself now has a `fapi-s3-sdk-v2` artifact. An example PR consuming this updated version of the FAPI client is at: guardian/ophan#5506 Updated FAPI artifact layout ---------------------------- To use FAPI with the new AWS SDK v2 support, users must now have a dependency on *two* FAPI artifacts: * `fapi-s3-sdk-v2` * `fapi-client-playXX` Due to needing to support the matrix of: * AWS SDK v1 & v2 * Play-JSON 2.8, 2.9, 3.0 ...it's best not to try to produce an artifact that corresponds to every single combination of those! Consequently, we provide an artifacts that are specific to the different versions of AWS SDK (or at least, could do - if AWS SDK v1 was moved out of common code), and artifacts that are specific to the different versions of Play-JSON, and allow the user to combine them as needed. A similar approach was used with `guardian/play-secret-rotation`: guardian/play-secret-rotation#8 In order for the different artifacts to have interfaces they can use to join together and become a single useful Facia client, we have a `fapi-client-core` artifact. Any code that doesn't depend on the JSON classes, or the actual AWS SDK version (which isn't much!), can live in there. In particular, we have: * `com.gu.facia.client.ApiClient`, an existing type that is now a trait, with 2 implementations - one that uses the existing `com.gu.facia.client.S3Client` abstraction on S3 behaviour * `com.gu.facia.client.etagcaching.fetching.S3FetchBehaviour`, a new trait that exposes just enough interface to allow the conditional fetching used for ETag-based caching, but doesn't tie you to any specific version of the AWS SDK.
The `etag-caching` library has had a small change to its API with guardian/etag-caching#56, so we need make a small code change to adapt to it. ETag Caching was originally introduced to Frontend with #26338.
This change adds these improvements: * Facia data is only re-downloaded & re-parsed if the S3 content has _changed_, thanks to ETag-caching - see https://github.com/guardian/etag-caching . This library has already been used in DotCom PROD with guardian/frontend#26338 * AWS SDK v2: the FAPI client itself now has a `fapi-s3-sdk-v2` artifact. An example PR consuming this updated version of the FAPI client is at: guardian/ophan#5506 Updated FAPI artifact layout ---------------------------- To use FAPI with the new AWS SDK v2 support, users must now have a dependency on *two* FAPI artifacts: * `fapi-s3-sdk-v2` * `fapi-client-playXX` Due to needing to support the matrix of: * AWS SDK v1 & v2 * Play-JSON 2.8, 2.9, 3.0 ...it's best not to try to produce an artifact that corresponds to every single combination of those! Consequently, we provide an artifacts that are specific to the different versions of AWS SDK (or at least, could do - if AWS SDK v1 was moved out of common code), and artifacts that are specific to the different versions of Play-JSON, and allow the user to combine them as needed. A similar approach was used with `guardian/play-secret-rotation`: guardian/play-secret-rotation#8 In order for the different artifacts to have interfaces they can use to join together and become a single useful Facia client, we have a `fapi-client-core` artifact. Any code that doesn't depend on the JSON classes, or the actual AWS SDK version (which isn't much!), can live in there. In particular, we have: * `com.gu.facia.client.ApiClient`, an existing type that is now a trait, with 2 implementations - one that uses the existing `com.gu.facia.client.S3Client` abstraction on S3 behaviour * `com.gu.facia.client.etagcaching.fetching.S3FetchBehaviour`, a new trait that exposes just enough interface to allow the conditional fetching used for ETag-based caching, but doesn't tie you to any specific version of the AWS SDK.
This change adds these improvements: * Facia data is only re-downloaded & re-parsed if the S3 content has _changed_, thanks to ETag-caching - see https://github.com/guardian/etag-caching . This library has already been used in DotCom PROD with guardian/frontend#26338 * AWS SDK v2: the FAPI client itself now has a `fapi-s3-sdk-v2` artifact. An example PR consuming this updated version of the FAPI client is at: guardian/ophan#5506 Updated FAPI artifact layout ---------------------------- To use FAPI with the new AWS SDK v2 support, users must now have a dependency on *two* FAPI artifacts: * `fapi-s3-sdk-v2` * `fapi-client-playXX` Due to needing to support the matrix of: * AWS SDK v1 & v2 * Play-JSON 2.8, 2.9, 3.0 ...it's best not to try to produce an artifact that corresponds to every single combination of those! Consequently, we provide an artifacts that are specific to the different versions of AWS SDK (or at least, could do - if AWS SDK v1 was moved out of common code), and artifacts that are specific to the different versions of Play-JSON, and allow the user to combine them as needed. A similar approach was used with `guardian/play-secret-rotation`: guardian/play-secret-rotation#8 In order for the different artifacts to have interfaces they can use to join together and become a single useful Facia client, we have a `fapi-client-core` artifact. Any code that doesn't depend on the JSON classes, or the actual AWS SDK version (which isn't much!), can live in there. In particular, we have: * `com.gu.facia.client.ApiClient`, an existing type that is now a trait, with 2 implementations - one that uses the existing `com.gu.facia.client.S3Client` abstraction on S3 behaviour * `com.gu.facia.client.etagcaching.fetching.S3FetchBehaviour`, a new trait that exposes just enough interface to allow the conditional fetching used for ETag-based caching, but doesn't tie you to any specific version of the AWS SDK.
This change adds these improvements: * Facia data is only re-downloaded & re-parsed if the S3 content has _changed_, thanks to ETag-caching - see https://github.com/guardian/etag-caching . This library has already been used in DotCom PROD with guardian/frontend#26338 * AWS SDK v2: the FAPI client itself now has a `fapi-s3-sdk-v2` artifact. An example PR consuming this updated version of the FAPI client is at: guardian/ophan#5506 Updated FAPI artifact layout ---------------------------- To use FAPI with the new AWS SDK v2 support, users must now have a dependency on *two* FAPI artifacts: * `fapi-s3-sdk-v2` * `fapi-client-playXX` Due to needing to support the matrix of: * AWS SDK v1 & v2 * Play-JSON 2.8, 2.9, 3.0 ...it's best not to try to produce an artifact that corresponds to every single combination of those! Consequently, we provide an artifacts that are specific to the different versions of AWS SDK (or at least, could do - if AWS SDK v1 was moved out of common code), and artifacts that are specific to the different versions of Play-JSON, and allow the user to combine them as needed. A similar approach was used with `guardian/play-secret-rotation`: guardian/play-secret-rotation#8 In order for the different artifacts to have interfaces they can use to join together and become a single useful Facia client, we have a `fapi-client-core` artifact. Any code that doesn't depend on the JSON classes, or the actual AWS SDK version (which isn't much!), can live in there. In particular, we have: * `com.gu.facia.client.ApiClient`, an existing type that is now a trait, with 2 implementations - one that uses the existing `com.gu.facia.client.S3Client` abstraction on S3 behaviour * `com.gu.facia.client.etagcaching.fetching.S3FetchBehaviour`, a new trait that exposes just enough interface to allow the conditional fetching used for ETag-based caching, but doesn't tie you to any specific version of the AWS SDK.
This change adds these improvements: * Facia data is only re-downloaded & re-parsed if the S3 content has _changed_, thanks to ETag-caching - see https://github.com/guardian/etag-caching . This library has already been used in DotCom PROD with guardian/frontend#26338 * AWS SDK v2: the FAPI client itself now has a `fapi-s3-sdk-v2` artifact. An example PR consuming this updated version of the FAPI client is at: guardian/ophan#5506 Updated FAPI artifact layout ---------------------------- To use FAPI with the new AWS SDK v2 support, users must now have a dependency on *two* FAPI artifacts: * `fapi-s3-sdk-v2` * `fapi-client-playXX` Due to needing to support the matrix of: * AWS SDK v1 & v2 * Play-JSON 2.8, 2.9, 3.0 ...it's best not to try to produce an artifact that corresponds to every single combination of those! Consequently, we provide an artifacts that are specific to the different versions of AWS SDK (or at least, could do - if AWS SDK v1 was moved out of common code), and artifacts that are specific to the different versions of Play-JSON, and allow the user to combine them as needed. A similar approach was used with `guardian/play-secret-rotation`: guardian/play-secret-rotation#8 In order for the different artifacts to have interfaces they can use to join together and become a single useful Facia client, we have a `fapi-client-core` artifact. Any code that doesn't depend on the JSON classes, or the actual AWS SDK version (which isn't much!), can live in there. In particular, we have: * `com.gu.facia.client.ApiClient`, an existing type that is now a trait, with 2 implementations - one that uses the existing `com.gu.facia.client.S3Client` abstraction on S3 behaviour * `com.gu.facia.client.etagcaching.fetching.S3FetchBehaviour`, a new trait that exposes just enough interface to allow the conditional fetching used for ETag-based caching, but doesn't tie you to any specific version of the AWS SDK.
This PR looks to fix #26335 by improving the scalability of Facia's pressed-page-json loading code in
FrontJsonFapi
. Originally it focussed only on adopting AWS SDK v2 to allow non-blocking async code for the downloading, but it now additionally includes a more significant change: ETag-based caching.ETag caching is a pretty wonderful standard part of the HTTP protocol, relying on these 3 components:
ETag
HTTP response header - a hash of the content, returned by the serverIf-None-Match
HTTP request header - sent by the client, to indicate what content it already has, by sending its ETag304 Not Modified
HTTP response code - returned by a server when content is unchanged from the supplied ETag, with an otherwise blank response (no need to return a response!)I've extracted the code needed for an
ETagCache
into a new library at https://github.com/guardian/etag-caching .ETag Caching Benefits
PressedPage
s are only re-downloaded if the S3 content has changedPressedPage
s are only re-parsed if the S3 content has changedThe fetching and parsing code is also improved:
String
firstMemory requirements of caching
Currently
facia
runs onc6g.2xlarge
instances (16GB RAM), with a JVM heap of 12GB. I'm going to suggest that it's reasonable for theETagCache
to consume up to 4GB of the 12GB RAM.There are ~300 fronts, each of which can have 4 different variants, so currently there could be up to 1200 different
PressedPage
objects. From heapdump analysis, the largestPressedPage
retains ~22MB of memory (most instances are smaller, averaging at 4MB). With the 4GB budget, and assuming a worse case of 22MB perPressedPage
, we can afford to set a max size on the cache of ~180 entries. Although it's disappointing we can't hold all of thePressedPage
s, the priorities of the eviction policy used by the Caffeine caching library should ensure that we get a good hit rate on the most in-demand fronts.Incidentally, heapdump analysis also shows that some structures within
PressedPage
are memory inefficient when considered from the perspective of holding manyPressedPage
objects in memory at once - using object pooling onmodel.Tag
for instance, would probably lead to a 80% reduction of the total retained memory.Removal of
FutureSemaphore
FutureSemaphore
, introduced in November 2017 with #18331, is removed in this change - as described in #26335 (comment) & #26336 (comment), the late-stage 32-concurrent-decoding-processes limit was harmful: it led to work (JSON download &String
generation) being thrown away, without the result of that work being cached by the CDN.Testing
Testing total request-response time on CODE: