diff --git a/.nojekyll b/.nojekyll index a133ebd..fa97ae6 100644 --- a/.nojekyll +++ b/.nojekyll @@ -1 +1 @@ -94a013ba \ No newline at end of file +8dcdbd5b \ No newline at end of file diff --git a/blogs/index.html b/blogs/index.html index 2b5e23a..5dc87b1 100644 --- a/blogs/index.html +++ b/blogs/index.html @@ -238,7 +238,7 @@

Data Science Blog


diff --git a/blogs/posts/2024-05-22-storing-data-safely/azure_python.out.ipynb b/blogs/posts/2024-05-22-storing-data-safely/azure_python.out.ipynb index 0f03c78..77066f6 100644 --- a/blogs/posts/2024-05-22-storing-data-safely/azure_python.out.ipynb +++ b/blogs/posts/2024-05-22-storing-data-safely/azure_python.out.ipynb @@ -6,7 +6,7 @@ "source": [ "#" ], - "id": "644e0dee-92a1-4caf-9570-36a656b91960" + "id": "8b3824d3-23d4-4fe9-aed2-de900fb02022" }, { "cell_type": "code", @@ -59,7 +59,7 @@ "Install then run `az login` in your terminal. Once you have logged in\n", "with your browser try the `DefaultAzureCredential()` again!" ], - "id": "14d2fbf9-f40f-4fdd-9df9-b097faa2dae1" + "id": "bac34ad5-031d-4157-b84f-9d05c28d4473" }, { "cell_type": "code", diff --git a/presentations/2024-08-22_agile-and-scrum/agile.png b/presentations/2024-08-22_agile-and-scrum/agile.png new file mode 100644 index 0000000..9d8adbf Binary files /dev/null and b/presentations/2024-08-22_agile-and-scrum/agile.png differ diff --git a/presentations/2024-08-22_agile-and-scrum/index.html b/presentations/2024-08-22_agile-and-scrum/index.html new file mode 100644 index 0000000..c21c231 --- /dev/null +++ b/presentations/2024-08-22_agile-and-scrum/index.html @@ -0,0 +1,994 @@ + + + + + + + + + + + + + + Data Science @ The Strategy Unit – Agile and scrum working + + + + + + + + + + + + + + + + + +
+
+ +
+

Agile and scrum working

+ +
+ +

Aug 22, 2024

+
+
+

How did we get here?

+
    +
  • Waterfall approaches were used in the early days of software development +
      +
    • Requirements; Design; Development; Integration; Testing; Deployment
    • +
  • +
  • You only move to the next stage when the previous one is complete
  • +
  • (although actually it turns out you kind of don’t…)
  • +
+
+
+

The road to agile

+
    +
  • Some of the ideas behind agile had been floating around for much of the 20th century
  • +
  • Shewhart’s Plan-Do-Study-Act cycle
  • +
  • The New New Product Development Game in 1986
  • +
  • Scrum (which we’ll return to) was proposed in 1993
  • +
  • In 2001 the Manifesto for Agile Software Development was published
  • +
+
+
+

The agile manifesto

+ +

Copyright © 2001 Kent Beck, Mike Beedle, Arie van Bennekum, Alistair Cockburn, Ward Cunningham, Martin Fowler, James Grenning, Jim Highsmith, Andrew Hunt, Ron Jeffries, Jon Kern, Brian Marick

+

Robert C. Martin, Steve Mellor, Ken Schwaber, Jeff Sutherland, Dave Thomas

+

this declaration may be freely copied in any form, but only in its entirety through this notice.

+
+
+

Agile principles- software and the MVP

+
    +
  • Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.
  • +
  • Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.
  • +
  • Working software is the primary measure of progress.
  • +
+

(these principles and those on the following slides are copyright Ibid.)

+
+
+

Agile principles- working with customers

+
    +
  • Welcome changing requirements, even late in development. Agile processes harness change for the customer’s competitive advantage.
  • +
  • Business people and developers must work together daily throughout the project.
  • +
+
+
+

Agile principles- teamwork

+
    +
  • Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done.
  • +
  • The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.
  • +
  • The best architectures, requirements, and designs emerge from self-organizing teams.
  • +
  • At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly.
  • +
+
+
+

Agile principles- project management

+
    +
  • Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely.
  • +
  • Continuous attention to technical excellence and good design enhances agility.
  • +
  • Simplicity–the art of maximizing the amount of work not done–is essential.
  • +
+
+
+

The agile advantage

+
    +
  • Better use of fixed resources to deliver an unknown outcome, rather than unknown resources to deliver a fixed outcome
  • +
  • Continuous delivery
  • +
+ +
+
+

Feature creep

+ +

“every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can”

+
+

Zawinski’s Law- Source

+
+
+
+

Regular stakeholder feedback

+
    +
  • Agile teams are very responsive to product feedback
  • +
  • The project we’re currently working on is very agile whether we like it or not
  • +
  • Our customers never know what they want until we show them something they don’t want
  • +
+
+
+

More agile advantages

+
    +
  • Early and cheap failure
  • +
  • Continuous testing and QA
  • +
  • Reduction in unproductive work
  • +
  • Team can improve regularly, not just the product
  • +
+
+
+

Agile methods

+
    +
  • There are lots of agile methodologies
  • +
  • I’m not going to embarrass myself by pretending to understand them
  • +
  • Examples include Lean, Crystal, and Extreme Programming
  • +
+
+
+

Scrum

+
    +
  • Scrum is the agile methodology we have adopted
  • +
  • Despite dire warnings to the contrary we have not adopted it wholesale, but we have adopted most of its principles
  • +
  • The fundamental organising principle of work in scrum is a sprint lasting 1-4 weeks
  • +
  • Each sprint finishes with a defined and useful piece of software that can be shown to/used by customers
  • +
+
+
+

Product owner

+
    +
  • This person is responsible for the backlog- what goes into the sprint
  • +
  • The backlog should include everything that customers want or might want
  • +
  • The backlog should be prioritised
  • +
  • The product owner does this through deep and frequent conversations with customers
  • +
+
+
+

Scrum master helps the scrum team

+
    +
  • “By coaching the team members in self-management and cross-functionality
  • +
  • Focus on creating high-value Increments that meet the Definition of Done
  • +
  • Influence the removal of impediments to the Scrum Team’s progress
  • +
  • Ensure that all Scrum events take place and are positive, productive, and kept within the timebox.”
  • +
+

Source

+
+
+

The backlog

+
    +
  • Having an accurate and well-prioritised backlog is key
  • +
  • Don’t estimate the backlog in hours- use “T-shirt sizes” or “points”
  • +
  • People are terrible at estimating how long things take- particularly in software
  • +
  • Everything in the backlog needs a defined “Done” state
  • +
+
+
+

Sprint planning

+
    +
  • The team, the product owner, and the scrum master plan the sprint
  • +
  • Sprints should be a fixed length of time less than one month
  • +
  • The sprint cannot be changed or added to (we break this rule)
  • +
  • The team works autonomously in the sprint- nobody decides who does what except the team
  • +
  • Planning can take three hours, and it should if it needs to
  • +
+
+
+

Standup

+
    +
  • Every day, for no more than 15 minutes (teams often stand up to reinforce this rule), the team and scrum master meet
  • +
  • Each person answers three questions +
      +
    • What did you do yesterday to help the team finish the sprint?
    • +
    • What will you do today to help the team finish the sprint?
    • +
    • Is there an obstacle blocking you or the team from achieving the sprint goal?
    • +
  • +
+
+
+

Sprint retro

+
    +
  • What went well, what could have gone better, and what to improve next time
  • +
  • Looking at process, not blaming individuals
  • +
  • Requires maturity and trust to bring up issues, and to respond to them in a constructive way
  • +
  • Should agree at the end on one process improvement which goes into the next sprint
  • +
  • We’ve had some really, really good retros and I think it’s a really important process for a team
  • +
+
+
+

Team perspective

+
    +
  • Product owner- that’s me +
      +
    • Focus, clarity and transparency, team delivery, clear and appropriate responsibilities
    • +
  • +
  • Scrum master- YiWen
  • +
  • Team member- Matt
  • +
  • Team member- Rhian
  • +
+
+
+

Scrum values

+
    +
  • Courage
  • +
  • Focus
  • +
  • Commitment
  • +
  • Respect
  • +
  • Openness
  • +
+
+
+

Using agile outside of software

+ + + +
+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/presentations/2024-08-22_agile-and-scrum/skateboard.png b/presentations/2024-08-22_agile-and-scrum/skateboard.png new file mode 100644 index 0000000..4fa62d4 Binary files /dev/null and b/presentations/2024-08-22_agile-and-scrum/skateboard.png differ diff --git a/presentations/index.html b/presentations/index.html index 1f7a38c..8226268 100644 --- a/presentations/index.html +++ b/presentations/index.html @@ -174,23 +174,23 @@

Presentations

-
- @@ -634,96 +634,101 @@

Presentations

+Agile and scrum working +Chris Beeley +2024-08-22 + + Open source licensing: Or: how I learned to stop worrying and love openness Chris Beeley 2024-05-30 - + GitHub as a team sport: DfT QA Month Matt Dray 2024-05-23 - + Store Data Safely: Coffee & Coding YiWen Hon, Matt Dray 2024-05-16 - + Coffee and Coding: Making my analytical workflow more reproducible with {targets} Jacqueline Grout 2024-01-25 - + Conference Check-in App: NHS-R/NHS.pycom 2023 Tom Jemmett 2023-10-17 - + System Dynamics in health and care: fitting square data into round models Sally Thompson 2023-10-09 - + Repeating Yourself with Functions: Coffee and Coding Sally Thompson 2023-09-07 - + Coffee and Coding: Working with Geospatial Data in R Tom Jemmett 2023-08-24 - + Unit testing in R: NHS-R Community Webinar Tom Jemmett 2023-08-23 - + Everything you ever wanted to know about data science: but were too afraid to ask Chris Beeley 2023-08-02 - + Travels with R and Python: the power of data science in healthcare Chris Beeley 2023-08-02 - + An Introduction to the New Hospital Programme Demand Model: HACA 2023 Tom Jemmett 2023-07-11 - + What good data science looks like Chris Beeley 2023-05-23 - + Text mining of patient experience data Chris Beeley 2023-05-15 - + Coffee and Coding: {targets} Tom Jemmett 2023-03-23 - + Collaborative working Chris Beeley 2023-03-23 - + Coffee and Coding: Good Coding Practices Tom Jemmett 2023-03-09 - + RAP: what is it and how can my team start using it effectively? Chris Beeley 2023-03-09 - + Coffee and coding: Intro session Chris Beeley 2023-02-23 diff --git a/search.json b/search.json index 0764ddd..eba5fd4 100644 --- a/search.json +++ b/search.json @@ -187,7 +187,7 @@ "href": "presentations/index.html", "title": "Presentations", "section": "", - "text": "Title\nAuthor\nDate\n\n\n\n\nOpen source licensing: Or: how I learned to stop worrying and love openness\nChris Beeley\n2024-05-30\n\n\nGitHub as a team sport: DfT QA Month\nMatt Dray\n2024-05-23\n\n\nStore Data Safely: Coffee & Coding\nYiWen Hon, Matt Dray\n2024-05-16\n\n\nCoffee and Coding: Making my analytical workflow more reproducible with {targets}\nJacqueline Grout\n2024-01-25\n\n\nConference Check-in App: NHS-R/NHS.pycom 2023\nTom Jemmett\n2023-10-17\n\n\nSystem Dynamics in health and care: fitting square data into round models\nSally Thompson\n2023-10-09\n\n\nRepeating Yourself with Functions: Coffee and Coding\nSally Thompson\n2023-09-07\n\n\nCoffee and Coding: Working with Geospatial Data in R\nTom Jemmett\n2023-08-24\n\n\nUnit testing in R: NHS-R Community Webinar\nTom Jemmett\n2023-08-23\n\n\nEverything you ever wanted to know about data science: but were too afraid to ask\nChris Beeley\n2023-08-02\n\n\nTravels with R and Python: the power of data science in healthcare\nChris Beeley\n2023-08-02\n\n\nAn Introduction to the New Hospital Programme Demand Model: HACA 2023\nTom Jemmett\n2023-07-11\n\n\nWhat good data science looks like\nChris Beeley\n2023-05-23\n\n\nText mining of patient experience data\nChris Beeley\n2023-05-15\n\n\nCoffee and Coding: {targets}\nTom Jemmett\n2023-03-23\n\n\nCollaborative working\nChris Beeley\n2023-03-23\n\n\nCoffee and Coding: Good Coding Practices\nTom Jemmett\n2023-03-09\n\n\nRAP: what is it and how can my team start using it effectively?\nChris Beeley\n2023-03-09\n\n\nCoffee and coding: Intro session\nChris Beeley\n2023-02-23" + "text": "Title\nAuthor\nDate\n\n\n\n\nAgile and scrum working\nChris Beeley\n2024-08-22\n\n\nOpen source licensing: Or: how I learned to stop worrying and love 
openness\nChris Beeley\n2024-05-30\n\n\nGitHub as a team sport: DfT QA Month\nMatt Dray\n2024-05-23\n\n\nStore Data Safely: Coffee & Coding\nYiWen Hon, Matt Dray\n2024-05-16\n\n\nCoffee and Coding: Making my analytical workflow more reproducible with {targets}\nJacqueline Grout\n2024-01-25\n\n\nConference Check-in App: NHS-R/NHS.pycom 2023\nTom Jemmett\n2023-10-17\n\n\nSystem Dynamics in health and care: fitting square data into round models\nSally Thompson\n2023-10-09\n\n\nRepeating Yourself with Functions: Coffee and Coding\nSally Thompson\n2023-09-07\n\n\nCoffee and Coding: Working with Geospatial Data in R\nTom Jemmett\n2023-08-24\n\n\nUnit testing in R: NHS-R Community Webinar\nTom Jemmett\n2023-08-23\n\n\nEverything you ever wanted to know about data science: but were too afraid to ask\nChris Beeley\n2023-08-02\n\n\nTravels with R and Python: the power of data science in healthcare\nChris Beeley\n2023-08-02\n\n\nAn Introduction to the New Hospital Programme Demand Model: HACA 2023\nTom Jemmett\n2023-07-11\n\n\nWhat good data science looks like\nChris Beeley\n2023-05-23\n\n\nText mining of patient experience data\nChris Beeley\n2023-05-15\n\n\nCoffee and Coding: {targets}\nTom Jemmett\n2023-03-23\n\n\nCollaborative working\nChris Beeley\n2023-03-23\n\n\nCoffee and Coding: Good Coding Practices\nTom Jemmett\n2023-03-09\n\n\nRAP: what is it and how can my team start using it effectively?\nChris Beeley\n2023-03-09\n\n\nCoffee and coding: Intro session\nChris Beeley\n2023-02-23" }, { "objectID": "presentations/2023-03-23_collaborative-working/index.html#introduction", @@ -421,480 +421,620 @@ "text": "Contact\n\n\n\n\n strategy.unit@nhs.net\n The-Strategy-Unit\n\n\n\n\n\n chris.beeley1@nhs.net\n chrisbeeley\n\n\n\n\n\n\nview slides at the-strategy-unit.github.io/data_science/presentations" }, { - "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#packages-we-are-using-today", - "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#packages-we-are-using-today", - "title": "Coffee and Coding", - "section": "Packages we are using today", - "text": "Packages we are using today\n\nlibrary(tidyverse)\n\nlibrary(sf)\n\nlibrary(tidygeocoder)\nlibrary(PostcodesioR)\n\nlibrary(osrm)\n\nlibrary(leaflet)" + "objectID": "presentations/2024-05-30_open-source-licensing/index.html#a-note-on-richard-stallman", + "href": "presentations/2024-05-30_open-source-licensing/index.html#a-note-on-richard-stallman", + "title": "Open source licensing", + "section": "A note on Richard Stallman", + "text": "A note on Richard Stallman\n\nRichard Stallman has been heavily criticised for some of this views\nHe is hard to ignore when talking about open source so I am going to talk about him\nNothing in this talk should be read as endorsing any of his comments outside (or inside) the world of open source" }, { - "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#getting-boundary-data", - "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#getting-boundary-data", - "title": "Coffee and Coding", - "section": "Getting boundary data", - "text": "Getting boundary data\nWe can use the ONS’s Geoportal we can grab boundary data to generate maps\n\n\n\nicb_url <- paste0(\n \"https://services1.arcgis.com\",\n \"/ESMARspQHYMw9BZ9/arcgis\",\n \"/rest/services\",\n \"/Integrated_Care_Boards_April_2023_EN_BGC\",\n \"/FeatureServer/0/query\",\n \"?outFields=*&where=1%3D1&f=geojson\"\n)\nicb_boundaries <- read_sf(icb_url)\n\nicb_boundaries |>\n 
ggplot() +\n geom_sf() +\n theme_void()" + "objectID": "presentations/2024-05-30_open-source-licensing/index.html#the-origin-of-open-source", + "href": "presentations/2024-05-30_open-source-licensing/index.html#the-origin-of-open-source", + "title": "Open source licensing", + "section": "The origin of open source", + "text": "The origin of open source\n\nIn the 50s and 60s source code was routinely shared with hardware and users were often expected to modify to run on their hardware\nBy the late 1960s the production cost of software was rising relative to hardware and proprietary licences became more prevalent\nIn 1980 Richard Stallman’s department at MIT took delivery of a printer they were not able to modify the source code for\nRichard Stallman launched the GNU project in 1983 to fight for software freedoms\nMIT licence was launched in the late 1980s\nCathedral and the bazaar was released in 1997 (more on which later)" }, { - "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-is-the-icb_boundaries-data", - "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-is-the-icb_boundaries-data", - "title": "Coffee and Coding", - "section": "What is the icb_boundaries data?", - "text": "What is the icb_boundaries data?\n\nicb_boundaries |>\n select(ICB23CD, ICB23NM)\n\nSimple feature collection with 42 features and 2 fields\nGeometry type: MULTIPOLYGON\nDimension: XY\nBounding box: xmin: -6.418667 ymin: 49.86479 xmax: 1.763706 ymax: 55.81112\nGeodetic CRS: WGS 84\n# A tibble: 42 × 3\n ICB23CD ICB23NM geometry\n <chr> <chr> <MULTIPOLYGON [°]>\n 1 E54000008 NHS Cheshire and Merseyside Integrated C… (((-3.083264 53.2559, -3…\n 2 E54000010 NHS Staffordshire and Stoke-on-Trent Int… (((-1.950489 53.21188, -…\n 3 E54000011 NHS Shropshire, Telford and Wrekin Integ… (((-2.380794 52.99841, -…\n 4 E54000013 NHS Lincolnshire Integrated Care Board (((0.2687853 52.81584, 0…\n 5 E54000015 NHS Leicester, Leicestershire and Rutlan… (((-0.7875237 52.97762, …\n 6 E54000018 NHS Coventry and Warwickshire Integrated… (((-1.577608 52.67858, -…\n 7 E54000019 NHS Herefordshire and Worcestershire Int… (((-2.272042 52.43972, -…\n 8 E54000022 NHS Norfolk and Waveney Integrated Care … (((1.666741 52.31366, 1.…\n 9 E54000023 NHS Suffolk and North East Essex Integra… (((0.8997023 51.7732, 0.…\n10 E54000024 NHS Bedfordshire, Luton and Milton Keyne… (((-0.4577115 52.32009, …\n# ℹ 32 more rows" + "objectID": "presentations/2024-05-30_open-source-licensing/index.html#what-is-open-source", + "href": "presentations/2024-05-30_open-source-licensing/index.html#what-is-open-source", + "title": "Open source licensing", + "section": "What is open source?", + "text": "What is open source?\n\nThink free as in free speech, not free beer (Stallman)\n\n\nOpen source does not mean free of charge! Software freedom implies the ability to sell code\nFree of charge does not mean open source! Many free to download pieces of software are not open source (Zoom, for example)\n\n\nBy Chao-Kuei et al. 
- https://www.gnu.org/philosophy/categories.en.html, GPL, Link" }, { - "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#working-with-geospatial-dataframes", - "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#working-with-geospatial-dataframes", - "title": "Coffee and Coding", - "section": "Working with geospatial dataframes", - "text": "Working with geospatial dataframes\nWe can simply join sf data frames and “regular” data frames together\n\n\n\nicb_metrics <- icb_boundaries |>\n st_drop_geometry() |>\n select(ICB23CD) |>\n mutate(admissions = rpois(n(), 1000000))\n\nicb_boundaries |>\n inner_join(icb_metrics, by = \"ICB23CD\") |>\n ggplot() +\n geom_sf(aes(fill = admissions)) +\n scale_fill_viridis_c() +\n theme_void()" + "objectID": "presentations/2024-05-30_open-source-licensing/index.html#the-four-freedoms", + "href": "presentations/2024-05-30_open-source-licensing/index.html#the-four-freedoms", + "title": "Open source licensing", + "section": "The four freedoms", + "text": "The four freedoms\n\nFreedom 0: The freedom to use the program for any purpose.\nFreedom 1: The freedom to study how the program works, and change it to make it do what you wish.\nFreedom 2: The freedom to redistribute and make copies so you can help your neighbor.\nFreedom 3: The freedom to improve the program, and release your improvements (and modified versions in general) to the public, so that the whole community benefits." }, { - "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#working-with-geospatial-data-frames", - "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#working-with-geospatial-data-frames", - "title": "Coffee and Coding", - "section": "Working with geospatial data frames", - "text": "Working with geospatial data frames\nWe can manipulate sf objects like other data frames\n\n\n\nlondon_icbs <- icb_boundaries |>\n filter(ICB23NM |> stringr::str_detect(\"London\"))\n\nggplot() +\n geom_sf(data = london_icbs) +\n geom_sf(data = st_centroid(london_icbs)) +\n theme_void()" + "objectID": "presentations/2024-05-30_open-source-licensing/index.html#cathedral-and-the-bazaar", + "href": "presentations/2024-05-30_open-source-licensing/index.html#cathedral-and-the-bazaar", + "title": "Open source licensing", + "section": "Cathedral and the bazaar", + "text": "Cathedral and the bazaar\n\nEvery good work of software starts by scratching a developer’s personal itch.\nGood programmers know what to write. Great ones know what to rewrite (and reuse).\nPlan to throw one [version] away; you will, anyhow (copied from Frederick Brooks’s The Mythical Man-Month).\nIf you have the right attitude, interesting problems will find you.\nWhen you lose interest in a program, your last duty to it is to hand it off to a competent successor.\nTreating your users as co-developers is your least-hassle route to rapid code improvement and effective debugging.\nRelease early. Release often. And listen to your customers.\nGiven a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix obvious to someone.\nSmart data structures and dumb code works a lot better than the other way around.\nIf you treat your beta-testers as if they’re your most valuable resource, they will respond by becoming your most valuable resource." 
}, { - "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#working-with-geospatial-data-frames-1", - "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#working-with-geospatial-data-frames-1", - "title": "Coffee and Coding", - "section": "Working with geospatial data frames", - "text": "Working with geospatial data frames\nSummarising the data will combine the geometries.\n\nlondon_icbs |>\n summarise(area = sum(Shape__Area)) |>\n # and use geospatial functions to create calculations using the geometry\n mutate(new_area = st_area(geometry), .before = \"geometry\")\n\nSimple feature collection with 1 feature and 2 fields\nGeometry type: MULTIPOLYGON\nDimension: XY\nBounding box: xmin: -0.5102803 ymin: 51.28676 xmax: 0.3340241 ymax: 51.69188\nGeodetic CRS: WGS 84\n# A tibble: 1 × 3\n area new_area geometry\n* <dbl> [m^2] <MULTIPOLYGON [°]>\n1 1573336388. 1567995610. (((-0.3314819 51.43935, -0.3306676 51.43889, -0.33118…\n\n\n Why the difference in area?\n\n We are using a simplified geometry, so calculating the area will be slightly inaccurate. The original area was calculated on the non-simplified geometries." + "objectID": "presentations/2024-05-30_open-source-licensing/index.html#cathedral-and-the-bazaar-cont.", + "href": "presentations/2024-05-30_open-source-licensing/index.html#cathedral-and-the-bazaar-cont.", + "title": "Open source licensing", + "section": "Cathedral and the bazaar (cont.)", + "text": "Cathedral and the bazaar (cont.)\n\nThe next best thing to having good ideas is recognizing good ideas from your users. Sometimes the latter is better.\nOften, the most striking and innovative solutions come from realizing that your concept of the problem was wrong.\nPerfection (in design) is achieved not when there is nothing more to add, but rather when there is nothing more to take away. (Attributed to Antoine de Saint-Exupéry)\nAny tool should be useful in the expected way, but a truly great tool lends itself to uses you never expected.\nWhen writing gateway software of any kind, take pains to disturb the data stream as little as possible—and never throw away information unless the recipient forces you to!\nWhen your language is nowhere near Turing-complete, syntactic sugar can be your friend.\nA security system is only as secure as its secret. Beware of pseudo-secrets.\nTo solve an interesting problem, start by finding a problem that is interesting to you.\nProvided the development coordinator has a communications medium at least as good as the Internet, and knows how to lead without coercion, many heads are inevitably better than one." 
}, { - "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#creating-our-own-geospatial-data", - "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#creating-our-own-geospatial-data", - "title": "Coffee and Coding", - "section": "Creating our own geospatial data", - "text": "Creating our own geospatial data\n\nlocation_raw <- postcode_lookup(\"B2 4BJ\")\nglimpse(location_raw)\n\nRows: 1\nColumns: 40\n$ postcode <chr> \"B2 4BJ\"\n$ quality <int> 1\n$ eastings <int> 406866\n$ northings <int> 286775\n$ country <chr> \"England\"\n$ nhs_ha <chr> \"West Midlands\"\n$ longitude <dbl> -1.90033\n$ latitude <dbl> 52.47887\n$ european_electoral_region <chr> \"West Midlands\"\n$ primary_care_trust <chr> \"Heart of Birmingham Teaching\"\n$ region <chr> \"West Midlands\"\n$ lsoa <chr> \"Birmingham 138A\"\n$ msoa <chr> \"Birmingham 138\"\n$ incode <chr> \"4BJ\"\n$ outcode <chr> \"B2\"\n$ parliamentary_constituency <chr> \"Birmingham, Ladywood\"\n$ parliamentary_constituency_2024 <chr> \"Birmingham Ladywood\"\n$ admin_district <chr> \"Birmingham\"\n$ parish <chr> \"Birmingham, unparished area\"\n$ admin_county <lgl> NA\n$ date_of_introduction <chr> \"198001\"\n$ admin_ward <chr> \"Ladywood\"\n$ ced <lgl> NA\n$ ccg <chr> \"NHS Birmingham and Solihull\"\n$ nuts <chr> \"Birmingham\"\n$ pfa <chr> \"West Midlands\"\n$ admin_district_code <chr> \"E08000025\"\n$ admin_county_code <chr> \"E99999999\"\n$ admin_ward_code <chr> \"E05011151\"\n$ parish_code <chr> \"E43000250\"\n$ parliamentary_constituency_code <chr> \"E14000564\"\n$ parliamentary_constituency_2024_code <chr> \"E14001096\"\n$ ccg_code <chr> \"E38000258\"\n$ ccg_id_code <chr> \"15E\"\n$ ced_code <chr> \"E99999999\"\n$ nuts_code <chr> \"TLG31\"\n$ lsoa_code <chr> \"E01033620\"\n$ msoa_code <chr> \"E02006899\"\n$ lau2_code <chr> \"E08000025\"\n$ pfa_code <chr> \"E23000014\"\n\n\n\n\n\nlocation <- location_raw |>\n st_as_sf(coords = c(\"eastings\", \"northings\"), crs = 27700) |>\n select(postcode, ccg) |>\n st_transform(crs = 4326)\n\nlocation\n\nSimple feature collection with 1 feature and 2 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: -1.900335 ymin: 52.47886 xmax: -1.900335 ymax: 52.47886\nGeodetic CRS: WGS 84\n postcode ccg geometry\n1 B2 4BJ NHS Birmingham and Solihull POINT (-1.900335 52.47886)" + "objectID": "presentations/2024-05-30_open-source-licensing/index.html#the-disciplines-of-open-source-are-the-disciplines-of-good-data-science", + "href": "presentations/2024-05-30_open-source-licensing/index.html#the-disciplines-of-open-source-are-the-disciplines-of-good-data-science", + "title": "Open source licensing", + "section": "The disciplines of open source are the disciplines of good data science", + "text": "The disciplines of open source are the disciplines of good data science\n\nMeaningful README\nMeaningful commit messages\nModularity\nSeparating data code from analytic code from interactive code\nAssigning issues and pull requests for action/ review\nDon’t forget one of the most lazy and incompetent developers you will ever work with is yourself, six months later" }, { - "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#creating-a-geospatial-data-frame-for-all-nhs-trusts", - "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#creating-a-geospatial-data-frame-for-all-nhs-trusts", - "title": "Coffee and Coding", - "section": "Creating a geospatial data frame for all NHS Trusts", - "text": "Creating a geospatial data frame for 
all NHS Trusts\n\n\n\n# using the NHSRtools package\n# remotes::install_github(\"NHS-R-Community/NHSRtools\")\ntrusts <- ods_get_trusts() |>\n filter(status == \"Active\") |>\n select(name, org_id, post_code) |>\n geocode(postalcode = \"post_code\") |>\n st_as_sf(coords = c(\"long\", \"lat\"), crs = 4326)\n\n\ntrusts |>\n leaflet() |>\n addProviderTiles(\"Stamen.TonerLite\") |>\n addMarkers(popup = ~name)" + "objectID": "presentations/2024-05-30_open-source-licensing/index.html#what-licences-exist", + "href": "presentations/2024-05-30_open-source-licensing/index.html#what-licences-exist", + "title": "Open source licensing", + "section": "What licences exist?", + "text": "What licences exist?\n\nPermissive\n\nSuch as MIT but there are others. Recommended by NHSX draft guidelines on open source\nApache is a notable permissive licence- includes a patent licence\nIn our work the OGL is also relevant- civil servant publish stuff under OGL (and MIT- it isn’t particularly recommended for code)\n\nCopyleft\n\nGPL2, GPL3, AGPL (“the GPL of the web”)\nNote that the provisions of the GPL only apply when you distribute the code\nAt a certain point it all gets too complicated and you need a lawyer\nMPL is a notable copyleft licence- can combine with proprietary code as long as kept separate\n\nArguments for permissive/ copyleft- getting your code used versus preserving software freedoms for other people\nNote that most of the licences are impossible to read! There is a website to explain tl;dr" }, { - "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-are-the-nearest-trusts-to-our-location", - "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-are-the-nearest-trusts-to-our-location", - "title": "Coffee and Coding", - "section": "What are the nearest trusts to our location?", - "text": "What are the nearest trusts to our location?\n\nnearest_trusts <- trusts |>\n mutate(\n distance = st_distance(geometry, location)[, 1]\n ) |>\n arrange(distance) |>\n head(5)\n\nnearest_trusts\n\nSimple feature collection with 5 features and 4 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: -1.9384 ymin: 52.4533 xmax: -1.886282 ymax: 52.48764\nGeodetic CRS: WGS 84\n# A tibble: 5 × 5\n name org_id post_code geometry distance\n <chr> <chr> <chr> <POINT [°]> [m]\n1 BIRMINGHAM WOMEN'S AND CH… RQ3 B4 6NH (-1.894241 52.4849) 789.\n2 BIRMINGHAM AND SOLIHULL M… RXT B1 3RB (-1.917663 52.48416) 1313.\n3 BIRMINGHAM COMMUNITY HEAL… RYW B7 4BN (-1.886282 52.48754) 1356.\n4 SANDWELL AND WEST BIRMING… RXK B18 7QH (-1.930203 52.48764) 2246.\n5 UNIVERSITY HOSPITALS BIRM… RRK B15 2GW (-1.9384 52.4533) 3838." + "objectID": "presentations/2024-05-30_open-source-licensing/index.html#what-is-copyright-and-why-does-it-matter", + "href": "presentations/2024-05-30_open-source-licensing/index.html#what-is-copyright-and-why-does-it-matter", + "title": "Open source licensing", + "section": "What is copyright and why does it matter", + "text": "What is copyright and why does it matter\n\nCopyright is assigned at the moment of creation\nIf you made it in your own time, it’s yours (usually!)\nIf you made it at work, it belongs to your employer\nIf someone paid you to make it (“work for hire”) it belongs to them\nCrucially, the copyright holder can relicence software\n\nIf it’s jointly authored it depends if it’s a “collective” or “joint” work\nHonestly it’s pretty complicated. 
Just vest copyright in an organisation or group of individuals you trust\nGoldacre review suggests using Crown copyright for copyright in the NHS because it’s a “shoal, not a big fish” (with apologies to Ben whom I am misquoting)" }, { - "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#lets-find-driving-routes-to-these-trusts", - "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#lets-find-driving-routes-to-these-trusts", - "title": "Coffee and Coding", - "section": "Let’s find driving routes to these trusts", - "text": "Let’s find driving routes to these trusts\n\nroutes <- nearest_trusts |>\n mutate(\n route = map(geometry, ~ osrmRoute(location, st_coordinates(.x)))\n ) |>\n st_drop_geometry() |>\n rename(straight_line_distance = distance) |>\n unnest(route) |>\n st_as_sf()\n\nroutes\n\nSimple feature collection with 5 features and 8 fields\nGeometry type: LINESTRING\nDimension: XY\nBounding box: xmin: -1.93846 ymin: 52.45316 xmax: -1.88527 ymax: 52.49279\nGeodetic CRS: WGS 84\n# A tibble: 5 × 9\n name org_id post_code straight_line_distance src dst duration distance\n <chr> <chr> <chr> [m] <chr> <chr> <dbl> <dbl>\n1 BIRMING… RQ3 B4 6NH 789. 1 dst 5.77 3.09\n2 BIRMING… RXT B1 3RB 1313. 1 dst 6.84 4.14\n3 BIRMING… RYW B7 4BN 1356. 1 dst 7.59 4.29\n4 SANDWEL… RXK B18 7QH 2246. 1 dst 8.78 4.95\n5 UNIVERS… RRK B15 2GW 3838. 1 dst 10.6 4.67\n# ℹ 1 more variable: geometry <LINESTRING [°]>" + "objectID": "presentations/2024-05-30_open-source-licensing/index.html#iceweasel", + "href": "presentations/2024-05-30_open-source-licensing/index.html#iceweasel", + "title": "Open source licensing", + "section": "Iceweasel", + "text": "Iceweasel\n\nIceweasel is a story of trademark rather than copyright\nDebian (a Linux flavour) had the permission to use the source code of Firefox, but not the logo\nSo they took the source code and made their own version\nThis sounds very obscure and unimportant but it could become important in future projects of ours, like…" }, { - "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#lets-show-the-routes", - "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#lets-show-the-routes", - "title": "Coffee and Coding", - "section": "Let’s show the routes", - "text": "Let’s show the routes\n\nleaflet(routes) |>\n addTiles() |>\n addMarkers(data = location) |>\n addPolylines(color = \"black\", weight = 3, opacity = 1) |>\n addCircleMarkers(data = nearest_trusts, radius = 4, opacity = 1, fillOpacity = 1)" + "objectID": "presentations/2024-05-30_open-source-licensing/index.html#what-we-have-learned-in-recent-projects", + "href": "presentations/2024-05-30_open-source-licensing/index.html#what-we-have-learned-in-recent-projects", + "title": "Open source licensing", + "section": "What we have learned in recent projects", + "text": "What we have learned in recent projects\n\nThe huge benefits of being open\n\nTransparency\nWorking with customers\nGoodwill\n\nNonfree mitigators\nDifferent licences for different repos" }, { - "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#we-can-use-osrm-to-calculate-isochrones", - "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#we-can-use-osrm-to-calculate-isochrones", - "title": "Coffee and Coding", - "section": "We can use {osrm} to calculate isochrones", - "text": "We can use {osrm} to calculate isochrones\n\n\n\niso <- osrmIsochrone(location, breaks = seq(0, 60, 15), res = 10)\n\nisochrone_ids 
<- unique(iso$id)\n\npal <- colorFactor(\n viridis::viridis(length(isochrone_ids)),\n isochrone_ids\n)\n\nleaflet(location) |>\n addProviderTiles(\"Stamen.TonerLite\") |>\n addMarkers() |>\n addPolygons(\n data = iso,\n fillColor = ~ pal(id),\n color = \"#000000\",\n weight = 1\n )" + "objectID": "presentations/2024-05-30_open-source-licensing/index.html#software-freedom-means-allowing-people-to-do-stuff-you-dont-like", + "href": "presentations/2024-05-30_open-source-licensing/index.html#software-freedom-means-allowing-people-to-do-stuff-you-dont-like", + "title": "Open source licensing", + "section": "Software freedom means allowing people to do stuff you don’t like", + "text": "Software freedom means allowing people to do stuff you don’t like\n\nFreedom 0: The freedom to use the program for any purpose.\nFreedom 3: The freedom to improve the program, and release your improvements (and modified versions in general) to the public, so that the whole community benefits.\nThe code isn’t the only thing with worth in the project\nThis is why there are whole businesses founded on “here’s the Linux source code”\nSo when we’re sharing code we are letting people do stupid things with it but we’re not recommending that they do stupid things with it\nPeople do stupid things with Excel and Microsoft don’t accept liability for that, and neither should we\nThis issue of sharing analytic code and merchantability for a particular purpose is poorly understood and I think everyone needs to be clearer on it (us, and our customers)\nIn my view a world where consultants are selling our code is better than a world where they’re selling their spreadsheets" }, { - "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-trusts-are-in-the-isochrones", - "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-trusts-are-in-the-isochrones", - "title": "Coffee and Coding", - "section": "What trusts are in the isochrones?", - "text": "What trusts are in the isochrones?\nThe summarise() function will “union” the geometry\n\nsummarise(iso)\n\nSimple feature collection with 1 feature and 0 fields\nGeometry type: POLYGON\nDimension: XY\nBounding box: xmin: -2.913575 ymin: 51.98062 xmax: -0.8502164 ymax: 53.1084\nGeodetic CRS: WGS 84\n geometry\n1 POLYGON ((-1.541014 52.9693..." 
+ "objectID": "presentations/2024-05-30_open-source-licensing/index.html#open-source-as-in-piano", + "href": "presentations/2024-05-30_open-source-licensing/index.html#open-source-as-in-piano", + "title": "Open source licensing", + "section": "“Open source as in piano”", + "text": "“Open source as in piano”\n\nThe patient experience QDC project\nOur current project\nOpen source code is not necessarily to be run, but understood and learned from\nBuilding a group of people who can use and contribute to your code is arguably as important as writing it\n\n\n\n\nview slides at the-strategy-unit.github.io/data_science/presentations" }, { - "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-trusts-are-in-the-isochrones-1", - "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-trusts-are-in-the-isochrones-1", - "title": "Coffee and Coding", - "section": "What trusts are in the isochrones?", - "text": "What trusts are in the isochrones?\nWe can use this with a geo-filter to find the trusts in the isochrone\n\n# also works\ntrusts_in_iso <- trusts |>\n st_filter(\n summarise(iso),\n .predicate = st_within\n )\n\ntrusts_in_iso\n\nSimple feature collection with 31 features and 3 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: -2.793386 ymin: 52.19205 xmax: -1.10302 ymax: 53.01015\nGeodetic CRS: WGS 84\n# A tibble: 31 × 4\n name org_id post_code geometry\n * <chr> <chr> <chr> <POINT [°]>\n 1 BIRMINGHAM AND SOLIHULL MENTAL HE… RXT B1 3RB (-1.917663 52.48416)\n 2 BIRMINGHAM COMMUNITY HEALTHCARE N… RYW B7 4BN (-1.886282 52.48754)\n 3 BIRMINGHAM WOMEN'S AND CHILDREN'S… RQ3 B4 6NH (-1.894241 52.4849)\n 4 BIRMINGHAM WOMEN'S NHS FOUNDATION… RLU B15 2TG (-1.942861 52.45325)\n 5 BURTON HOSPITALS NHS FOUNDATION T… RJF DE13 0RB (-1.656667 52.81774)\n 6 COVENTRY AND WARWICKSHIRE PARTNER… RYG CV6 6NY (-1.48692 52.45659)\n 7 DERBYSHIRE HEALTHCARE NHS FOUNDAT… RXM DE22 3LZ (-1.512896 52.91831)\n 8 DUDLEY INTEGRATED HEALTH AND CARE… RYK DY5 1RU (-2.11786 52.48176)\n 9 GEORGE ELIOT HOSPITAL NHS TRUST RLT CV10 7DJ (-1.47844 52.51258)\n10 HEART OF ENGLAND NHS FOUNDATION T… RR1 B9 5ST (-1.828759 52.4781)\n# ℹ 21 more rows" + "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#what-is-rap", + "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#what-is-rap", + "title": "RAP", + "section": "What is RAP", + "text": "What is RAP\n\na process in which code is used to minimise manual, undocumented steps, and a clear, properly documented process is produced in code which can reliably give the same result from the same dataset\nRAP should be:\n\n\nthe core working practice that must be supported by all platforms and teams; make this a core focus of NHS analyst training\n\nGoldacre review" }, { - "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-trusts-are-in-the-isochrones-2", - "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-trusts-are-in-the-isochrones-2", - "title": "Coffee and Coding", - "section": "What trusts are in the isochrones?", - "text": "What trusts are in the isochrones?\n\n\n\nleaflet(trusts_in_iso) |>\n addProviderTiles(\"Stamen.TonerLite\") |>\n addMarkers() |>\n addPolygons(\n data = iso,\n fillColor = ~pal(id),\n color = \"#000000\",\n weight = 1\n )" + "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#what-are-we-trying-to-achieve", + "href": 
"presentations/2023-03-09_midlands-analyst-rap/index.html#what-are-we-trying-to-achieve", + "title": "RAP", + "section": "What are we trying to achieve?", + "text": "What are we trying to achieve?\n\nLegibility\nReproducibility\nAccuracy\nLaziness" }, { - "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#doing-the-same-but-within-a-radius", - "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#doing-the-same-but-within-a-radius", - "title": "Coffee and Coding", - "section": "Doing the same but within a radius", - "text": "Doing the same but within a radius\n\n\n\nr <- 25000\n\ntrusts_in_radius <- trusts |>\n st_filter(\n location,\n .predicate = st_is_within_distance,\n dist = r\n )\n\n# transforming gives us a pretty smooth circle\nradius <- location |>\n st_transform(crs = 27700) |>\n st_buffer(dist = r) |>\n st_transform(crs = 4326)\n\nleaflet(trusts_in_radius) |>\n addProviderTiles(\"Stamen.TonerLite\") |>\n addMarkers() |>\n addPolygons(\n data = radius,\n color = \"#000000\",\n weight = 1\n )" + "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#what-are-some-of-the-fundamental-principles", + "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#what-are-some-of-the-fundamental-principles", + "title": "RAP", + "section": "What are some of the fundamental principles?", + "text": "What are some of the fundamental principles?\n\nPredictability, reducing mental load, and reducing truck factor\nMaking it easy to collaborate with yourself and others on different computers, in the cloud, in six months’ time…\nDRY" }, { - "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#further-reading", - "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#further-reading", - "title": "Coffee and Coding", - "section": "Further reading", - "text": "Further reading\n\nGeocomputation with R\nr-spatial\n{sf} documentation\nLeaflet documentation\nTidy Geospatial Networks in R\n\n\n\n\nview slides at the-strategy-unit.github.io/data_science/presentations" + "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#the-road-to-rap", + "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#the-road-to-rap", + "title": "RAP", + "section": "The road to RAP", + "text": "The road to RAP\n\nWe’re roughly using NHS Digital’s RAP stages\nThere is an incredibly large amount to learn!\nConfession time! (everything I do not know…)\nYou don’t need to do it all at once\nYou don’t need to do it all at all ever\nEach thing you learn will incrementally help you\nRemember- that’s why we learnt this stuff. Because it helped us. And it can help you too" }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#what-is-testing", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#what-is-testing", - "title": "Unit testing in R", - "section": "What is testing?", - "text": "What is testing?\n\nSoftware testing is the act of examining the artifacts and the behavior of the software under test by validation and verification. 
Software testing can also provide an objective, independent view of the software to allow the business to appreciate and understand the risks of software implementation\nwikipedia" + "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#levels-of-rap--baseline", + "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#levels-of-rap--baseline", + "title": "RAP", + "section": "Levels of RAP- Baseline", + "text": "Levels of RAP- Baseline\n\nData produced by code in an open-source language (e.g., Python, R, SQL).\nCode is version controlled (see Git basics and using Git collaboratively guides).\nRepository includes a README.md file (or equivalent) that clearly details steps a user must follow to reproduce the code\nCode has been peer reviewed.\nCode is published in the open and linked to & from accompanying publication (if relevant).\n\nSource: NHS Digital RAP community of practice" }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#how-can-we-test-our-code", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#how-can-we-test-our-code", - "title": "Unit testing in R", - "section": "How can we test our code?", - "text": "How can we test our code?\n\n\nStatically\n\n\n(without executing the code)\nhappens constantly, as we are writing code\nvia code reviews\ncompilers/interpreters/linters statically analyse the code for syntax errors\n\n\n\n\n\nDynamically" + "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#levels-of-rap--silver", + "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#levels-of-rap--silver", + "title": "RAP", + "section": "Levels of RAP- Silver", + "text": "Levels of RAP- Silver\n\nCode is well-documented…\nCode is well-organised following standard directory format\nReusable functions and/or classes are used where appropriate\nPipeline includes a testing framework\nRepository includes dependency information (e.g. requirements.txt, PipFile, environment.yml\nData is handled and output in a Tidy data format\n\nSource: NHS Digital RAP community of practice" }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#how-can-we-test-our-code-1", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#how-can-we-test-our-code-1", - "title": "Unit testing in R", - "section": "How can we test our code?", - "text": "How can we test our code?\n\n\nStatically\n\n(without executing the code)\nhappens constantly, as we are writing code\nvia code reviews\ncompilers/interpreters/linters statically analyse the code for syntax errors\n\n\n\n\nDynamically\n\n\n(by executing the code)\nsplit into functional and non-functional testing\ntesting can be manual, or automated\n\n\n\n\n\nnon-functional testing covers things like performance, security, and usability testing" + "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#levels-of-rap--gold", + "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#levels-of-rap--gold", + "title": "RAP", + "section": "Levels of RAP- Gold", + "text": "Levels of RAP- Gold\n\nCode is fully packaged\nRepository automatically runs tests etc. via CI/CD or a different integration/deployment tool e.g. GitHub Actions\nProcess runs based on event-based triggers (e.g., new data in database) or on a schedule\nChanges to the RAP are clearly signposted. E.g. a changelog in the package, releases etc. 
(See gov.uk info on Semantic Versioning)\n\nSource: NHS Digital RAP community of practice" + }, + { + "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#a-learning-journey-to-get-you-there", + "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#a-learning-journey-to-get-you-there", + "title": "RAP", + "section": "A learning journey to get you there", + "text": "A learning journey to get you there\n\nCode style, organising your files\nFunctions and iteration\nGit and GitHub\nPackaging your code\nTesting\nPackage management and versioning" + }, + { + "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#how-we-can-help-each-other-get-there", + "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#how-we-can-help-each-other-get-there", + "title": "RAP", + "section": "How we can help each other get there", + "text": "How we can help each other get there\n\nWork as a team!\nCoffee and coding!\nAsk for help!\nDo pair coding!\nGet your code reviewed!\nJoin the NHS-R/ NHSPycom communities" + }, + { + "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#haca", + "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#haca", + "title": "RAP", + "section": "HACA", + "text": "HACA\n\nThe first national analytics conference for health and care\nInsight to action!\nJuly 11th and 12th, University of Birmingham\nAccepting abstracts for short and long talks and posters\nAbstract deadline 27th March\nHelp is available (with abstract, poster, preparing presentation…)!\n\n\n\n\nview slides at the-strategy-unit.github.io/data_science/presentations" + }, + { + "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#health-data-in-the-headlines", + "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#health-data-in-the-headlines", + "title": "System Dynamics in health and care", + "section": "Health Data in the Headlines", + "text": "Health Data in the Headlines\n\n\n\n\nUsed to seeing headlines that give a snapshot figure but doesn’t say much about the system.\nNow starting to see headlines that recognise flow through the system rather than snapshot in time of just one part.\nCan get better understanding of the issues in a system if we can map it as stocks and flows, but our datasets not designed to give up this information very readily. This talk is how I have tried to meet that challenge." + }, + { + "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#through-the-system-dynamics-lens", + "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#through-the-system-dynamics-lens", + "title": "System Dynamics in health and care", + "section": "Through the System Dynamics lens", + "text": "Through the System Dynamics lens\n\nStock-flow model\nDynamic behaviour, feedback loops\n\nIn a few seconds, what is SD?\nAn approach to understanding the behaviour of complex systems over time. A method of mapping a system as stocks, whose levels can only change due to flows in and flows out. Stocks could be people on a waiting list, on a ward, money, …\nFlows are the rate at which things change in a given time period e.g. admissions per day, referrals per month.\nBehaviour of the system is determined by how the components interact with each other, not what each component does. 
Mapping the structure of a system like this leads us to identify feedback loops, and consequences of an action - both intended and unintended.\nIn this capacity-constrained model we only need 3 parameters to run the model (exogenous). All the behaviour within the grey box is determined by the interactions of those components (indogenous).\nHow do we get a value/values for referrals per day?\n(currently use specialist software to build and run our models, aim is to get to a point where we can run in open source.)" + }, + { + "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#determining-flows", + "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#determining-flows", + "title": "System Dynamics in health and care", + "section": "Determining flows", + "text": "Determining flows\n\n\n\n\n‘admissions per day’ is needed to populate the model.\n‘discharged’ could be used to verify the model against known data\n\nHow many admissions per day (or week, month…)\n\n\n\n\n\n\n\n \n\n\nGoing to use very simple model shown to explain how to extract flow data for admissions. Will start with visual explainer before going into the code.\n1. generate list of key dates (in this case daily, could be weekly, monthly)\n2. take our patient-level ID with admission and discharge dates\n3. count of admissions on that day/week" + }, + { + "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#determining-occupancy", + "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#determining-occupancy", + "title": "System Dynamics in health and care", + "section": "Determining occupancy", + "text": "Determining occupancy\n\n\n\n\n‘on ward’ is used to verify the model against known data\n\nLogic statement testing if the key date is wholly between admission and discharge dates\nflag for a match \n\n\n\n\n\n\n \n\n\nMight also want to generate occupancy, to compare the model output with actual data to verify/validate.\n1. generate list of key dates\n2. take our patient-level ID with admission and discharge dates\n3. going to take each date in our list of keydates, and see if there is an admission before that date and discharge after 4. this creates a wide data frame, the same length as patient data.\n5. once run through all the dates in the list, sum each column\nPatient A admitted on 2nd, so only starts being classed as resident on 3rd." 
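A minimal sketch of the stock-flow identity that these admission and discharge counts feed into (the names here are hypothetical: a daily data frame `flows` with columns date, admissions and discharges, plus an assumed known starting occupancy): the stock can only change by inflow minus outflow.

library(dplyr)

# hypothetical known level of the stock just before the first date
occ_start <- 25

occupancy <- flows |>
  arrange(date) |>  # the stock accumulates in date order
  mutate(resident = occ_start + cumsum(admissions) - cumsum(discharges))

The resulting `resident` column can then be compared with occupancy counted directly from spell data as one way of verifying the model against known data.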
+ }, + { + "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#in-r---flows", + "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#in-r---flows", + "title": "System Dynamics in health and care", + "section": "in R - flows", + "text": "in R - flows\nEasy to do with count, or group_by and summarise\n\n\n admit_d <- spell_dates |> \n group_by(date_admit) |>\n count(date_admit)\n\nhead(admit_d)\n\n\n# A tibble: 6 × 2\n# Groups: date_admit [6]\n date_admit n\n <date> <int>\n1 2022-01-01 28\n2 2022-01-02 24\n3 2022-01-03 21\n4 2022-01-04 27\n5 2022-01-05 32\n6 2022-01-06 27" + }, + { + "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#in-r---occupancy", + "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#in-r---occupancy", + "title": "System Dynamics in health and care", + "section": "in R - occupancy", + "text": "in R - occupancy\nGenerate list of key dates\n\n\n\ndate_start <- dmy(01012022) \ndate_end <- dmy(31012022)\nrun_len <- length(seq(from = date_start, to = date_end, by = \"day\"))\n\nkeydates <- data.frame(\n date = c(seq(date_start, by = \"day\", length.out=run_len))) \n\n\n\n\n date\n1 2022-01-01\n2 2022-01-02\n3 2022-01-03\n4 2022-01-04\n5 2022-01-05\n6 2022-01-06\n\n\n\n\nStart by generating the list of keydates. In this example we’re running the model in days, and checking each day in 2022.\nNeed the run length for the next step, to know how many times to iterate over" + }, + { + "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#in-r---occupancy-1", + "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#in-r---occupancy-1", + "title": "System Dynamics in health and care", + "section": "in R - occupancy", + "text": "in R - occupancy\nIterate over each date - need to have been admitted before, and discharged after\n\noccupancy_flag <- function(df) {\n\n # pre-allocate tibble size to speed up iteration in loop\n activity_all <- tibble(nrow = nrow(df)) |> \n select()\n \n for (i in 1:run_len) {\n \n activity_period <- case_when(\n \n # creates 1 flag if resident for complete day\n df$date_admit < keydates$keydate[i] & \n df$date_discharge > keydates$keydate[i] ~ 1,\n TRUE ~ 0)\n \n # column bind this day's flags to previous\n activity_all <- bind_cols(activity_all, activity_period)\n \n }\n \n # rename column to match the day being counted\n activity_all <- activity_all |> \n setNames(paste0(\"d_\", keydates$date))\n \n # bind flags columns to patient data\n daily_adm <- bind_cols(df, activity_all) |> \n pivot_longer(\n cols = starts_with(\"d_\"),\n names_to = \"date\",\n values_to = \"count\"\n ) |> \n \n group_by(date) |> \n summarise(resident = sum(count)) |> \n ungroup() |> \n mutate(date = str_remove(date, \"d_\"))\n \n } \n\n\nIs there a better way than using a for loop?\n\nPre-allocate tibbles\nactivity_all will end up as very wide tibble, with a column for each date in list of keydates.\nFor each date in the list of key dates, compares with admission date & discharge date; need to be admitted before the key date and discharged after the key date. 
If match, flag = 1.\nCreates a column for each day, then binds this to activity all.\nRename each column with the date it was checking (add a character to start of column name so column doesn’t start with numeric)\nPivot long, then group by date and sum the flags (other variables could be added here, such as TFC or provider code)" + }, + { + "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#longer-time-periods---flows", + "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#longer-time-periods---flows", + "title": "System Dynamics in health and care", + "section": "Longer Time Periods - flows", + "text": "Longer Time Periods - flows\nUse lubridate::floor_date to generate the date at start of week/month\n\nadmit_wk <- spell_dates |> \n mutate(week_start = floor_date(\n date_admit, unit = \"week\", week_start = 1 # start week on Monday\n )) |> \n count(week_start) # could add other parameters such as provider code, TFC etc\n\nhead(admit_wk)\n\n\n\n# A tibble: 6 × 2\n week_start n\n <date> <int>\n1 2021-12-27 52\n2 2022-01-03 196\n3 2022-01-10 192\n4 2022-01-17 223\n5 2022-01-24 157\n6 2022-01-31 187\n\n\n\nMight run SD model in weeks or months - e.g. months for care homes. Use lubridate to create new variable with start date of week/month/year etc" + }, + { + "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#longer-time-periods---occupancy", + "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#longer-time-periods---occupancy", + "title": "System Dynamics in health and care", + "section": "Longer Time Periods - occupancy", + "text": "Longer Time Periods - occupancy\nKey dates to include the dates at the start and end of each time period\n\n\n\ndate_start <- dmy(03012022) # first Monday of the year\ndate_end <- dmy(01012023)\nrun_len <- length(seq(from = date_start, to = date_end, by = \"week\"))\n\nkeydates <- data.frame(wk_start = c(seq(date_start, \n by = \"week\", \n length.out=run_len))) |> \n mutate(\n wk_end = wk_start + 6) # last date in time period\n\n\n\n\n wk_start wk_end\n1 2022-01-03 2022-01-09\n2 2022-01-10 2022-01-16\n3 2022-01-17 2022-01-23\n4 2022-01-24 2022-01-30\n5 2022-01-31 2022-02-06\n6 2022-02-07 2022-02-13\n\n\n\n\nModel might make more sense to run in weeks or months (e.g. care home), so the list of keydates needs a start date and end date for each time period."
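The notes above ask whether there is a better way than a for loop. One loop-free option is a non-equi join - a sketch only, assuming dplyr 1.1 or later (for inequality conditions in join_by()) and the column names used in these slides:

library(dplyr)

# count occupancy per key date without growing a wide tibble column by column:
# join every patient spell to every key date that falls wholly inside it
occupancy <- keydates |>
  left_join(
    spell_dates,
    by = join_by(date > date_admit, date < date_discharge)
  ) |>
  summarise(resident = sum(!is.na(date_admit)), .by = date)

This reproduces the strict "admitted before, discharged after" logic of occupancy_flag(), and the left join keeps key dates with zero residents.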
+ }, + { + "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#longer-time-periods", + "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#longer-time-periods", + "title": "System Dynamics in health and care", + "section": "Longer Time Periods", + "text": "Longer Time Periods\nMore logic required if working in weeks or months - can only be in one place at any given time\n\n# flag for occupancy\nactivity_period <- case_when(\n \n # creates 1 flag if resident for complete week\n df$date_admit < keydates$wk_start[i] & df$date_discharge > keydates$wk_end[i] ~ 1,\n TRUE ~ 0)\n\n\nAnd a little bit more logic\nOccupancy requires the patient to have been admitted before the start of the week/month, and discharged after the end of the week/month" + }, + { + "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#applying-the-data", + "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#applying-the-data", + "title": "System Dynamics in health and care", + "section": "Applying the data", + "text": "Applying the data\n\n\nHow to apply this wrangling of data to the system dynamic model?\nAdmissions data used as an input to the flow - could be reduced to a single figure (average), or there may be variation by season/day of week etc.\nOccupancy (and discharges) used to verify the model output against known data." + }, + { + "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#next-steps", + "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#next-steps", + "title": "System Dynamics in health and care", + "section": "Next Steps", + "text": "Next Steps\n\nGeneralise function to a state where it can be used by others - onto GitHub\nTurn this into a package\nOpen-source SD models and interfaces - R Shiny or Python" + }, + { + "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#questions-comments-suggestions", + "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#questions-comments-suggestions", + "title": "System Dynamics in health and care", + "section": "Questions, comments, suggestions?", + "text": "Questions, comments, suggestions?\n\n\n\nPlease get in touch!\n\nSally.Thompson37@nhs.net\n\n\n\nNHS-R conference 2023" + }, + { + "objectID": "presentations/2024-05-16_store-data-safely/index.html#why", + "href": "presentations/2024-05-16_store-data-safely/index.html#why", + "title": "Store Data Safely", + "section": "Why?", + "text": "Why?\nBecause:\n\ndata may be sensitive\nGitHub was designed for source control of code\nGitHub has repository file-size limits\nit makes data independent from code\nit prevents repetition" + }, + { + "objectID": "presentations/2024-05-16_store-data-safely/index.html#other-approaches", + "href": "presentations/2024-05-16_store-data-safely/index.html#other-approaches", + "title": "Store Data Safely", + "section": "Other approaches", + "text": "Other approaches\nTo prevent data commits:\n\nuse a .gitignore file (*.csv, etc; see the sketch below)\nuse Git hooks\navoid ‘add all’ (git add .) 
when staging\nensure thorough reviews of (small) pull-requests" }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#different-types-of-functional-tests", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#different-types-of-functional-tests", - "title": "Unit testing in R", - "section": "Different types of functional tests", - "text": "Different types of functional tests\nUnit Testing checks each component (or unit) for accuracy independently of one another.\n\nIntegration Testing integrates units to ensure that the code works together.\n\n\nEnd-to-End Testing (e2e) makes sure that the entire system functions correctly.\n\n\nUser Acceptance Testing (UAT) ensures that the product meets the real user’s requirements." + "objectID": "presentations/2024-05-16_store-data-safely/index.html#what-if-i-committed-data", + "href": "presentations/2024-05-16_store-data-safely/index.html#what-if-i-committed-data", + "title": "Store Data Safely", + "section": "What if I committed data?", + "text": "What if I committed data?\n‘It depends’, but if it’s sensitive:\n\n‘undo’ the commit with git reset\nuse a tool like BFG to expunge the file from Git history\ndelete the repo and restart 🔥\n\nA data security breach may have to be reported." }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#different-types-of-functional-tests-1", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#different-types-of-functional-tests-1", - "title": "Unit testing in R", - "section": "Different types of functional tests", - "text": "Different types of functional tests\nUnit Testing checks each component (or unit) for accuracy independently of one another.\nIntegration Testing integrates units to ensure that the code works together.\nEnd-to-End Testing (e2e) makes sure that the entire system functions correctly.\n\nUser Acceptance Testing (UAT) ensures that the product meets the real user’s requirements.\n\n\nUnit, Integration, and E2E testing are all things we can automate in code, whereas UAT testing is going to be manual" + "objectID": "presentations/2024-05-16_store-data-safely/index.html#data-hosting-solutions", + "href": "presentations/2024-05-16_store-data-safely/index.html#data-hosting-solutions", + "title": "Store Data Safely", + "section": "Data-hosting solutions", + "text": "Data-hosting solutions\nWe’ll talk about two main options for The Strategy Unit:\n\nPosit Connect and the {pins} package\nAzure Data Storage\n\nWhich to use? It depends." }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#different-types-of-functional-tests-2", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#different-types-of-functional-tests-2", - "title": "Unit testing in R", - "section": "Different types of functional tests", - "text": "Different types of functional tests\nUnit Testing checks each component (or unit) for accuracy independently of one another.\n\nIntegration Testing integrates units to ensure that the code works together.\nEnd-to-End Testing (e2e) makes sure that the entire system functions correctly.\nUser Acceptance Testing (UAT) ensures that the product meets the real user’s requirements.\n\n\nOnly focussing on unit testing in this talk, but the techniques/packages could be extended to integration testing. Often other tools (potentially specific tools) are needed for E2E testing." 
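On the .gitignore bullet in "Other approaches" above: a minimal sketch of managing the ignore file from R, assuming the {usethis} package is available (it is not mentioned in the slides):

# append data-file patterns to the project's .gitignore
usethis::use_git_ignore(c("*.csv", "*.xlsx", "*.parquet", "data/"))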
+ "objectID": "presentations/2024-05-16_store-data-safely/index.html#a-platform-by-posit", + "href": "presentations/2024-05-16_store-data-safely/index.html#a-platform-by-posit", + "title": "Store Data Safely", + "section": "A platform by Posit", + "text": "A platform by Posit\n\n\nhttps://connect.strategyunitwm.nhs.uk/" }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#example", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#example", - "title": "Unit testing in R", - "section": "Example", - "text": "Example\nWe have a {shiny} app which grabs some data from a database, manipulates the data, and generates a plot.\n\n\nwe would write unit tests to check the data manipulation and plot functions work correctly (with pre-created sample/simple datasets)\nwe would write integration tests to check that the data manipulation function works with the plot function (with similar data to what we used for the unit tests)\nwe would write e2e tests to ensure that from start to finish the app grabs the data and produces a plot as required\n\n\n\nsimple (unit tests) to complex (e2e tests)" + "objectID": "presentations/2024-05-16_store-data-safely/index.html#a-package-by-posit", + "href": "presentations/2024-05-16_store-data-safely/index.html#a-package-by-posit", + "title": "Store Data Safely", + "section": "A package by Posit", + "text": "A package by Posit\n\n\nhttps://pins.rstudio.com/" }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#testing-pyramid", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#testing-pyramid", - "title": "Unit testing in R", - "section": "Testing Pyramid", - "text": "Testing Pyramid\n\n\nImage source: The Testing Pyramid: Simplified for One and All headspin.io" + "objectID": "presentations/2024-05-16_store-data-safely/index.html#basic-approach", + "href": "presentations/2024-05-16_store-data-safely/index.html#basic-approach", + "title": "Store Data Safely", + "section": "Basic approach", + "text": "Basic approach\ninstall.packages(\"pins\")\nlibrary(pins)\n\nboard_connect()\npin_write(board, data, \"pin_name\")\npin_read(board, \"user_name/pin_name\")" }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-create-a-simple-function", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-create-a-simple-function", - "title": "Unit testing in R", - "section": "Let’s create a simple function…", - "text": "Let’s create a simple function…\n\nmy_function <- function(x, y) {\n \n stopifnot(\n \"x must be numeric\" = is.numeric(x),\n \"y must be numeric\" = is.numeric(y),\n \"x must be same length as y\" = length(x) == length(y),\n \"cannot divide by zero!\" = y != 0\n )\n\n x / y\n}" + "objectID": "presentations/2024-05-16_store-data-safely/index.html#live-demo", + "href": "presentations/2024-05-16_store-data-safely/index.html#live-demo", + "title": "Store Data Safely", + "section": "Live demo", + "text": "Live demo\n\nLink RStudio to Posit Connect (authenticate)\nConnect to the board\nWrite a new pin\nCheck pin status and details\nPin versions\nUse pinned data\nUnpin your pin" }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-create-a-simple-function-1", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-create-a-simple-function-1", - "title": "Unit testing in R", - "section": "Let’s create a simple function…", - "text": "Let’s create a simple function…\n\nmy_function <- function(x, y) {\n \n stopifnot(\n \"x 
must be numeric\" = is.numeric(x),\n \"y must be numeric\" = is.numeric(y),\n \"x must be same length as y\" = length(x) == length(y),\n \"cannot divide by zero!\" = y != 0\n )\n\n x / y\n}" + "objectID": "presentations/2024-05-16_store-data-safely/index.html#should-i-use-it", + "href": "presentations/2024-05-16_store-data-safely/index.html#should-i-use-it", + "title": "Store Data Safely", + "section": "Should I use it?", + "text": "Should I use it?\n\n\n⚠️ {pins} is not great because:\n\nyou should not upload sensitive data!\nthere’s a file-size upload limit\npin organisation is a bit awkward (no subfolders)\n\n\n{pins} is helpful because:\n\nauthentication is straightforward\ndata can be versioned\nyou can control permissions\nthere are R and Python versions of the package" }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-create-a-simple-function-2", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-create-a-simple-function-2", - "title": "Unit testing in R", - "section": "Let’s create a simple function…", - "text": "Let’s create a simple function…\n\nmy_function <- function(x, y) {\n \n stopifnot(\n \"x must be numeric\" = is.numeric(x),\n \"y must be numeric\" = is.numeric(y),\n \"x must be same length as y\" = length(x) == length(y),\n \"cannot divide by zero!\" = y != 0\n )\n\n x / y\n}\n\n\nThe Ten Rules of Defensive Programming in R" + "objectID": "presentations/2024-05-16_store-data-safely/index.html#what-is-azure-data-storage", + "href": "presentations/2024-05-16_store-data-safely/index.html#what-is-azure-data-storage", + "title": "Store Data Safely", + "section": "What is Azure Data Storage?", + "text": "What is Azure Data Storage?\nMicrosoft cloud storage for unstructured data or ‘blobs’ (Binary Large Objects): data objects in binary form that do not necessarily conform to any file format.\nHow is it different?\n\nNo hierarchy – although you can make pseudo-‘folders’ with the blobnames.\nAuthenticates with your Microsoft account." }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test", - "title": "Unit testing in R", - "section": "… and create our first test", - "text": "… and create our first test\n\ntest_that(\"my_function correctly divides values\", {\n expect_equal(\n my_function(4, 2),\n 2\n )\n expect_equal(\n my_function(1, 4),\n 0.25\n )\n expect_equal(\n my_function(c(4, 1), c(2, 4)),\n c(2, 0.25)\n )\n})" + "objectID": "presentations/2024-05-16_store-data-safely/index.html#authenticating-to-azure-data-storage", + "href": "presentations/2024-05-16_store-data-safely/index.html#authenticating-to-azure-data-storage", + "title": "Store Data Safely", + "section": "Authenticating to Azure Data Storage", + "text": "Authenticating to Azure Data Storage\n\nYou are all part of the “strategy-unit-analysts” group; this gives you read/write access to specific Azure storage containers.\nYou can store sensitive information like the container ID in a local .Renviron or .env file that should be ignored by git.\nUsing {AzureAuth}, {AzureStor} and your credentials, you can connect to the Azure storage container, upload files and download them, or read the files directly from storage!" 
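Returning briefly to {pins}: versioning is listed above as one of the ways the package is helpful, so a short sketch of how that can look - the board and pin names are placeholders following the "Basic approach" slide:

library(pins)

board <- board_connect()
pin_write(board, data, "pin_name", versioned = TRUE)  # keep previous versions
pin_versions(board, "user_name/pin_name")             # list what is stored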
}, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-1", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-1", - "title": "Unit testing in R", - "section": "… and create our first test", - "text": "… and create our first test\n\ntest_that(\"my_function correctly divides values\", {\n expect_equal(\n my_function(4, 2),\n 2\n )\n expect_equal(\n my_function(1, 4),\n 0.25\n )\n expect_equal(\n my_function(c(4, 1), c(2, 4)),\n c(2, 0.25)\n )\n})" + "objectID": "presentations/2024-05-16_store-data-safely/index.html#step-1-load-your-environment-variables", + "href": "presentations/2024-05-16_store-data-safely/index.html#step-1-load-your-environment-variables", + "title": "Store Data Safely", + "section": "Step 1: load your environment variables", + "text": "Step 1: load your environment variables\nStore sensitive info in an .Renviron file that’s kept out of your Git history! The info can then be loaded in your script.\n.Renviron:\nAZ_STORAGE_EP=https://STORAGEACCOUNT.blob.core.windows.net/\nScript:\nep_uri <- Sys.getenv(\"AZ_STORAGE_EP\")\nTip: reload .Renviron with readRenviron(\".Renviron\")" }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-2", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-2", - "title": "Unit testing in R", - "section": "… and create our first test", - "text": "… and create our first test\n\ntest_that(\"my_function correctly divides values\", {\n expect_equal(\n my_function(4, 2),\n 2\n )\n expect_equal(\n my_function(1, 4),\n 0.25\n )\n expect_equal(\n my_function(c(4, 1), c(2, 4)),\n c(2, 0.25)\n )\n})" + "objectID": "presentations/2024-05-16_store-data-safely/index.html#step-1-load-your-environment-variables-1", + "href": "presentations/2024-05-16_store-data-safely/index.html#step-1-load-your-environment-variables-1", + "title": "Store Data Safely", + "section": "Step 1: load your environment variables", + "text": "Step 1: load your environment variables\nIn the demo script we are providing, you will need these environment variables:\nep_uri <- Sys.getenv(\"AZ_STORAGE_EP\")\napp_id <- Sys.getenv(\"AZ_APP_ID\")\ncontainer_name <- Sys.getenv(\"AZ_STORAGE_CONTAINER\")\ntenant <- Sys.getenv(\"AZ_TENANT_ID\")" }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-3", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-3", - "title": "Unit testing in R", - "section": "… and create our first test", - "text": "… and create our first test\n\ntest_that(\"my_function correctly divides values\", {\n expect_equal(\n my_function(4, 2),\n 2\n )\n expect_equal(\n my_function(1, 4),\n 0.25\n )\n expect_equal(\n my_function(c(4, 1), c(2, 4)),\n c(2, 0.25)\n )\n})" + "objectID": "presentations/2024-05-16_store-data-safely/index.html#step-2-authenticate-with-azure", + "href": "presentations/2024-05-16_store-data-safely/index.html#step-2-authenticate-with-azure", + "title": "Store Data Safely", + "section": "Step 2: Authenticate with Azure", + "text": "Step 2: Authenticate with Azure\n\n\ntoken <- AzureAuth::get_azure_token(\n \"https://storage.azure.com\",\n tenant = tenant,\n app = app_id,\n auth_type = \"device_code\"\n)\nThe first time you do this, you will have a link to authenticate in your browser and a code in your terminal to enter. Use the browser that works best with your @mlcsu.nhs.uk account!"
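Putting Step 1 together, the .Renviron file would hold all four variables named above - a sketch with placeholder values only, not real identifiers:

AZ_STORAGE_EP=https://STORAGEACCOUNT.blob.core.windows.net/
AZ_APP_ID=YOUR-APP-REGISTRATION-ID
AZ_STORAGE_CONTAINER=YOUR-CONTAINER-NAME
AZ_TENANT_ID=YOUR-TENANT-ID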
}, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-4", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-4", - "title": "Unit testing in R", - "section": "… and create our first test", - "text": "… and create our first test\n\ntest_that(\"my_function correctly divides values\", {\n expect_equal(\n my_function(4, 2),\n 2\n )\n expect_equal(\n my_function(1, 4),\n 0.25\n )\n expect_equal(\n my_function(c(4, 1), c(2, 4)),\n c(2, 0.25)\n )\n})" + "objectID": "presentations/2024-05-16_store-data-safely/index.html#step-3-connect-to-container", + "href": "presentations/2024-05-16_store-data-safely/index.html#step-3-connect-to-container", + "title": "Store Data Safely", + "section": "Step 3: Connect to container", + "text": "Step 3: Connect to container\nendpoint <- AzureStor::blob_endpoint(ep_uri, token = token)\ncontainer <- AzureStor::storage_container(endpoint, container_name)\n\n# List files in container\nblob_list <- AzureStor::list_blobs(container)\nIf you get a 403 error, delete your token and re-authenticate, try a different browser/incognito, etc.\nTo clear Azure tokens: AzureAuth::clean_token_directory()" }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-5", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-5", - "title": "Unit testing in R", - "section": "… and create our first test", - "text": "… and create our first test\n\ntest_that(\"my_function correctly divides values\", {\n expect_equal(\n my_function(4, 2),\n 2\n )\n expect_equal(\n my_function(1, 4),\n 0.25\n )\n expect_equal(\n my_function(c(4, 1), c(2, 4)),\n c(2, 0.25)\n )\n})\n\nTest passed 😸" + "objectID": "presentations/2024-05-16_store-data-safely/index.html#interact-with-the-container", + "href": "presentations/2024-05-16_store-data-safely/index.html#interact-with-the-container", + "title": "Store Data Safely", + "section": "Interact with the container", + "text": "Interact with the container\nIt’s possible to interact with the container via your browser!\nYou can upload and download files using the Graphical User Interface (GUI), login with your @mlcsu.nhs.uk account: https://portal.azure.com/#home\nAlthough it’s also cooler to interact via code… 😎" }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#other-expect_-functions", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#other-expect_-functions", - "title": "Unit testing in R", - "section": "other expect_*() functions…", - "text": "other expect_*() functions…\n\ntest_that(\"my_function correctly divides values\", {\n expect_lt(\n my_function(4, 2),\n 10\n )\n expect_gt(\n my_function(1, 4),\n 0.2\n )\n expect_length(\n my_function(c(4, 1), c(2, 4)),\n 2\n )\n})\n\nTest passed 🎉\n\n\n\n{testthat} documentation" + "objectID": "presentations/2024-05-16_store-data-safely/index.html#interact-with-the-container-1", + "href": "presentations/2024-05-16_store-data-safely/index.html#interact-with-the-container-1", + "title": "Store Data Safely", + "section": "Interact with the container", + "text": "Interact with the container\n# Upload contents of a local directory to container\nAzureStor::storage_multiupload(\n container,\n \"LOCAL_FOLDERNAME/*\",\n \"FOLDERNAME_ON_AZURE\"\n)\n\n# Upload specific file to container\nAzureStor::storage_upload(\n container,\n \"data/ronald.jpeg\",\n \"newdir/ronald.jpeg\"\n)" }, { - "objectID": 
"presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert", - "title": "Unit testing in R", - "section": "Arrange, Act, Assert", - "text": "Arrange, Act, Assert\n\n\n\n\n\ntest_that(\"my_function works\", {\n # arrange\n # \n #\n #\n\n # act\n #\n\n # assert\n #\n})" + "objectID": "presentations/2024-05-16_store-data-safely/index.html#load-csv-files-directly-from-azure-container", + "href": "presentations/2024-05-16_store-data-safely/index.html#load-csv-files-directly-from-azure-container", + "title": "Store Data Safely", + "section": "Load csv files directly from Azure container", + "text": "Load csv files directly from Azure container\ndf_from_azure <- AzureStor::storage_read_csv(\n container,\n \"newdir/cats.csv\",\n show_col_types = FALSE\n)\n\n# Load file directly from Azure container (by storing it in memory)\n\nparquet_in_memory <- AzureStor::storage_download(\n container, src = \"newdir/cats.parquet\", dest = NULL\n)\n\nparq_df <- arrow::read_parquet(parquet_in_memory)" }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert-1", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert-1", - "title": "Unit testing in R", - "section": "Arrange, Act, Assert", - "text": "Arrange, Act, Assert\n\n\nwe arrange the environment, before running the function\n\n\nto create sample values\ncreate fake/temporary files\nset random seed\nset R options/environment variables\n\n\n\n\ntest_that(\"my_function works\", {\n # arrange\n x <- 5\n y <- 7\n expected <- 0.714285\n\n # act\n #\n\n # assert\n #\n})" + "objectID": "presentations/2024-05-16_store-data-safely/index.html#interact-with-the-container-2", + "href": "presentations/2024-05-16_store-data-safely/index.html#interact-with-the-container-2", + "title": "Store Data Safely", + "section": "Interact with the container", + "text": "Interact with the container\n# Delete from Azure container (!!!)\nAzureStor::delete_storage_file(container, BLOB_NAME)" }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert-2", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert-2", - "title": "Unit testing in R", - "section": "Arrange, Act, Assert", - "text": "Arrange, Act, Assert\n\n\nwe arrange the environment, before running the function\nwe act by calling the function\n\n\ntest_that(\"my_function works\", {\n # arrange\n x <- 5\n y <- 7\n expected <- 0.714285\n\n # act\n actual <- my_function(x, y)\n\n # assert\n #\n})" + "objectID": "presentations/2024-05-16_store-data-safely/index.html#what-does-this-achieve", + "href": "presentations/2024-05-16_store-data-safely/index.html#what-does-this-achieve", + "title": "Store Data Safely", + "section": "What does this achieve?", + "text": "What does this achieve?\n\nData is not in the repository, it is instead stored in a secure location\nCode can be open – sensitive information like Azure container name stored as environment variables\nLarge filesizes possible, other people can also access the same container.\nNaming conventions can help to keep blobs organised (these create pseudo-folders)\n\n\n\n\nLearn more about Data Science at The Strategy Unit" }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert-3", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert-3", - "title": "Unit testing in R", - "section": 
"Arrange, Act, Assert", - "text": "Arrange, Act, Assert\n\n\nwe arrange the environment, before running the function\nwe act by calling the function\nwe assert that the actual results match our expected results\n\n\ntest_that(\"my_function works\", {\n # arrange\n x <- 5\n y <- 7\n expected <- 0.714285\n\n # act\n actual <- my_function(x, y)\n\n # assert\n expect_equal(actual, expected)\n})" + "objectID": "blogs/posts/2023-04-26_alternative_remotes.html", + "href": "blogs/posts/2023-04-26_alternative_remotes.html", + "title": "Alternative remote repositories", + "section": "", + "text": "It’s great when someone send’s you a pull request on GitHub to fix bugs or add new features to your project, but you probably always want to check the other persons work in someway before merging that pull request.\nAll of the steps below are intended to be entered via a terminal.\nLet’s imagine that we have a GitHub account called example and a repository called test, and we use https rather than ssh.\n$ git remote get-url origin\n# https://github.com/example/test.git\nNow, let’s say we have someone who has submitted a Pull Request (PR), and their username is friend. We can add a new remote for their fork with\n$ git remote add friend https://github.com/friend/test.git\nHere, I name the remote exactly as per the persons GitHub username for no other reason than making it easier to track things later on. You could name this remote whatever you like, but you will need to make sure that the remote url matches their repository correctly.\nWe are now able to checkout their remote branch. First, we will want to fetch their work:\n# make sure to replace the remote name to what you set it to before\n$ git fetch friend\nNow, hopefully they have commited to a branch with a name that you haven’t used. Let’s say they created a branch called my_work. You can then simply run\n$ git switch friend/my_work\nThis should checkout the my_work branch locally for you.\nNow, if they have happened to use a branch name that you are already using, or more likely, directly commited to their own main branch, you will need to do checkout to a new branch:\n# replace friend as above to be the name of the remote, and main to be the branch\n# that they have used\n# replace their_work with whatever you want to call this branch locally\n$ git checkout friend/main -b their_work\nYou are now ready to run their code and check everything is good to merge!\nFinally, If you want to clean up your local repository you can remove the new branch that you checked out and the new remote with the following steps:\n# switch back to one of your branches, e.g. main\n$ git checkout main\n\n# then remove the branch that you created above\n$ git branch -D their_work\n\n# you can remove the remote\n$ git remote remove friend" }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#our-test-failed", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#our-test-failed", - "title": "Unit testing in R", - "section": "Our test failed!?! 😢", - "text": "Our test failed!?! 😢\n\ntest_that(\"my_function works\", {\n # arrange\n x <- 5\n y <- 7\n expected <- 0.714285\n\n # act\n actual <- my_function(x, y)\n\n # assert\n expect_equal(actual, expected)\n})\n\n── Failure: my_function works ──────────────────────────────────────────────────\n`actual` not equal to `expected`.\n1/1 mismatches\n[1] 0.714 - 0.714 == 7.14e-07\n\n\nError:\n! 
Test failed" + "objectID": "blogs/posts/2024-01-10-advent-of-code-and-test-driven-development.html", + "href": "blogs/posts/2024-01-10-advent-of-code-and-test-driven-development.html", + "title": "Advent of Code and Test Driven Development", + "section": "", + "text": "Advent of Code is an annual event, where daily coding puzzles are released from 1st – 24th December. We ran one of our fortnightly Coffee & Coding sessions introducing Advent of Code to people who code in the Strategy Unit, as well as the concept of test-driven development as a potential way of approaching the puzzles.\nTest-driven development (TDD) is an approach to coding which involves writing the test for a function BEFORE we write the function. This might seem quite counterintuitive, but it makes it easier to identify bugs 🐛 when they are introduced to our code, and ensures that our functions meet all necessary criteria. From my experience, this takes quite a long time to implement and can be quite tedious, but it is definitely worth it overall, especially as your project develops. Testing is also recommended in the NHS Reproducible Analytical Pipeline (RAP) guidelines.\nAn interesting thing to note about TDD is that we’re always expecting our first test to fail, and indeed failing tests are useful and important! If we wrote tests that just passed all the time, this would not be useful at all for our code.\nThe way that Advent of Code is structured, with test data for each puzzle and an expected test result, makes it very amenable to a test-driven approach. In order to support this, Matt and I created template repositories for a test-driven approach to Advent of Code, in Python and in R.\nOur goal when setting this up was to introduce others in the Strategy Unit to both TDD and Advent of Code. Advent of code can be challenging and I personally struggle to get past the first week, but it encourages creative (and maybe even fun?!) approaches to coding problems. I’m glad that we had the chance to explore some of the puzzles together in Coffee & Coding – it was interesting to see so many different approaches to the same problem, and hopefully it also gave us all the chance to practice writing tests." 
}, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#tolerance-to-the-rescue", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#tolerance-to-the-rescue", - "title": "Unit testing in R", - "section": "Tolerance to the rescue 🙂", - "text": "Tolerance to the rescue 🙂\n\ntest_that(\"my_function works\", {\n # arrange\n x <- 5\n y <- 7\n expected <- 0.714285\n\n # act\n actual <- my_function(x, y)\n\n # assert\n expect_equal(actual, expected, tolerance = 1e-6)\n})\n\nTest passed 🎊\n\n\n\n(this is a slightly artificial example, usually the default tolerance is good enough)" + "objectID": "blogs/posts/2024-05-22-storing-data-safely/azure_python.html", + "href": "blogs/posts/2024-05-22-storing-data-safely/azure_python.html", + "title": "Data Science @ The Strategy Unit", + "section": "", + "text": "import os\nimport io\nimport pandas as pd\nfrom dotenv import load_dotenv\nfrom azure.identity import DefaultAzureCredential\nfrom azure.storage.blob import ContainerClient\n\n\n# Load all environment variables\nload_dotenv()\naccount_url = os.getenv('AZ_STORAGE_EP')\ncontainer_name = os.getenv('AZ_STORAGE_CONTAINER')\n\n\n# Authenticate\ndefault_credential = DefaultAzureCredential()\n\nThe first time you run this, you might need to authenticate via the Azure CLI\nDownload it from https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-windows?tabs=azure-cli\nInstall then run az login in your terminal. Once you have logged in with your browser try the DefaultAzureCredential() again!\n\n# Connect to container\ncontainer_client = ContainerClient(account_url, container_name, default_credential)\n\n\n# List files in container - should be empty\nblob_list = container_client.list_blob_names()\nfor blob in blob_list:\n if blob.startswith('newdir'):\n print(blob)\n\nnewdir/cats.parquet\nnewdir/ronald.jpeg\n\n\n\n# Upload file to container\nwith open(file='data/cats.csv', mode=\"rb\") as data:\n blob_client = container_client.upload_blob(name='newdir/cats.csv', \n data=data, \n overwrite=True)\n\n\n# Check files have uploaded - list files in container again\nblob_list = container_client.list_blobs()\nfor blob in blob_list:\n if blob['name'].startswith('newdir'):\n print(blob['name'])\n\nnewdir/cats.csv\nnewdir/cats.parquet\nnewdir/ronald.jpeg\n\n\n\n# Download file from Azure container to temporary filepath\n\n# Connect to blob\nblob_client = container_client.get_blob_client('newdir/cats.csv')\n\n# Write to local file from blob\ntemp_filepath = os.path.join('temp_data', 'cats.csv')\nwith open(file=temp_filepath, mode=\"wb\") as sample_blob:\n download_stream = blob_client.download_blob()\n sample_blob.write(download_stream.readall())\ncat_data = pd.read_csv(temp_filepath)\ncat_data.head()\n\n\n\n\n\n\n\n\nName\nPhysical_characteristics\nBehaviour\n\n\n\n\n0\nRonald\nWhite and ginger\nLazy and greedy but undoubtedly cutest and best\n\n\n1\nKaspie\nSmall calico\nSweet and very shy but adventurous\n\n\n2\nHennimore\nPale orange\nUnhinged and always in a state of panic\n\n\n3\nThug cat\nBlack and white - very large\nLocal bully\n\n\n4\nSon of Stripey\nGrey tabby\nVery vocal\n\n\n\n\n\n\n\n\n# Load directly from Azure - no local copy\n\ndownload_stream = blob_client.download_blob()\nstream_object = io.BytesIO(download_stream.readall())\ncat_data = pd.read_csv(stream_object)\ncat_data\n\n\n\n\n\n\n\n\nName\nPhysical_characteristics\nBehaviour\n\n\n\n\n0\nRonald\nWhite and ginger\nLazy and greedy but undoubtedly cutest and best\n\n\n1\nKaspie\nSmall calico\nSweet and very shy but 
adventurous\n\n\n2\nHennimore\nPale orange\nUnhinged and always in a state of panic\n\n\n3\nThug cat\nBlack and white - very large\nLocal bully\n\n\n4\nSon of Stripey\nGrey tabby\nVery vocal\n\n\n\n\n\n\n\n\n# !!!!!!!!! Delete from Azure container !!!!!!!!!\nblob_client = container_client.get_blob_client('newdir/cats.csv')\nblob_client.delete_blob()\n\n\nblob_list = container_client.list_blobs()\nfor blob in blob_list:\n if blob['name'].startswith('newdir'):\n print(blob['name'])\n\nnewdir/cats.parquet\nnewdir/ronald.jpeg" }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#testing-edge-cases", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#testing-edge-cases", - "title": "Unit testing in R", - "section": "Testing edge cases", - "text": "Testing edge cases\n\n\nRemember the validation steps we built into our function to handle edge cases?\n\nLet’s write tests for these edge cases:\nwe expect errors\n\n\ntest_that(\"my_function works\", {\n expect_error(my_function(5, 0))\n expect_error(my_function(\"a\", 3))\n expect_error(my_function(3, \"a\"))\n expect_error(my_function(1:2, 4))\n})\n\nTest passed 🎊" + "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html", + "href": "blogs/posts/2023-03-21-rstudio-tips/index.html", + "title": "RStudio Tips and Tricks", + "section": "", + "text": "In a recent Coffee & Coding session we chatted about tips and tricks for RStudio, the popular and free Integrated Development Environment (IDE) that many Strategy Unit analysts use to write R code.\nRStudio has lots of neat features but many are tucked away in submenus. This session was a chance for the community to uncover and discuss some hidden gems to make our work easier and faster." }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#another-simple-example", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#another-simple-example", - "title": "Unit testing in R", - "section": "Another (simple) example", - "text": "Another (simple) example\n\n\n\nmy_new_function <- function(x, y) {\n if (x > y) {\n \"x\"\n } else {\n \"y\"\n }\n}\n\n\nConsider this function - there is branched logic, so we need to carefully design tests to validate the logic works as intended." + "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#coffee-coding", + "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#coffee-coding", + "title": "RStudio Tips and Tricks", + "section": "", + "text": "In a recent Coffee & Coding session we chatted about tips and tricks for RStudio, the popular and free Integrated Development Environment (IDE) that many Strategy Unit analysts use to write R code.\nRStudio has lots of neat features but many are tucked away in submenus. This session was a chance for the community to uncover and discuss some hidden gems to make our work easier and faster." 
}, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#another-simple-example-1", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#another-simple-example-1", - "title": "Unit testing in R", - "section": "Another (simple) example", - "text": "Another (simple) example\n\nmy_new_function <- function(x, y) {\n if (x > y) {\n \"x\"\n } else {\n \"y\"\n }\n}\n\n\n\ntest_that(\"it returns 'x' if x is bigger than y\", {\n expect_equal(my_new_function(4, 3), \"x\")\n})\n\nTest passed 🎉\n\ntest_that(\"it returns 'y' if y is bigger than x\", {\n expect_equal(my_new_function(3, 4), \"y\")\n expect_equal(my_new_function(3, 3), \"y\")\n})\n\nTest passed 🥳" + "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#official-guidance", + "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#official-guidance", + "title": "RStudio Tips and Tricks", + "section": "Official guidance", + "text": "Official guidance\nPosit is the company who build and maintain RStudio. They host a number of cheatsheets on their website, including one for RStudio. They also have a more in-depth user guide." }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#how-to-design-good-tests", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#how-to-design-good-tests", - "title": "Unit testing in R", - "section": "How to design good tests", - "text": "How to design good tests\na non-exhaustive list\n\nconsider all the functions arguments,\nwhat are the expected values for these arguments?\nwhat are unexpected values, and are they handled?\nare there edge cases that need to be handled?\nhave you covered all of the different paths in your code?\nhave you managed to create tests that check the range of results you expect?" + "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#command-palette", + "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#command-palette", + "title": "RStudio Tips and Tricks", + "section": "Command palette", + "text": "Command palette\nRStudio has a powerful built-in Command Palette, which is a special search box that gives instant access to features and settings without needing to find them in the menus. Many of the tips and tricks we discussed can be found by searching in the Palette. Open it with the keyboard shortcut Ctrl + Shift + P.\n\n\n\nOpening the Command Palette.\n\n\nFor example, let’s say you forgot how to restart R. If you open the Command Palette and start typing ‘restart’, you’ll see the option ‘Restart R Session’. Clicking it will do exactly that. Handily, the Palette also displays the keyboard shortcut (Control + Shift + F10 on Windows) as a reminder.\nAs for settings, a search for ‘rainbow’ in the Command Palette will find ‘Use rainbow parentheses’, an option to help prevent bracket-mismatch errors by colouring pairs of parentheses. What’s nice is that the checkbox to toggle the feature appears right there in the palette so you can change it immediately.\nI refer to menu paths and keyboard shortcuts in the rest of this post, but bear in mind that you can use the Command Palette instead." 
+ }, + { + "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#options", + "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#options", + "title": "RStudio Tips and Tricks", + "section": "Options", + "text": "Options\nIn general, most settings can be found under Tools > Global Options… and many of these are discussed in the rest of this post.\n\n\n\nAdjusting workspace and history settings.\n\n\nBut there are a few settings in particular that we recommend you change to help maximise reproducibility and reduce the chance of confusion. Under General > Basic, uncheck ‘Restore .RData into workspace at startup’ and select ‘Never’ from the dropdown options next to ‘Save workspace to .RData on exit’. These options mean you start with the ‘blank slate’ of an empty environment when you open a project, allowing you to rebuild objects from scratch1." }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#but-why-create-tests", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#but-why-create-tests", - "title": "Unit testing in R", - "section": "But, why create tests?", - "text": "But, why create tests?\nanother non-exhaustive list\n\ngood tests will help you uncover existing issues in your code\nwill defend you from future changes that break existing functionality\nwill alert you to changes in dependencies that may have changed the functionality of your code\ncan act as documentation for other developers" + "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#keyboard-shortcuts", + "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#keyboard-shortcuts", + "title": "RStudio Tips and Tricks", + "section": "Keyboard shortcuts", + "text": "Keyboard shortcuts\nYou can speed up day-to-day coding with keyboard shortcuts instead of clicking buttons in the interface.\nYou can see some available shortcuts in RStudio if you navigate to Help > Keyboard Shortcuts Help, or use the shortcut Alt + Shift + K (how meta). You can go to Help > Modify Keyboard Shortcuts… to search all shortcuts and change them to what you prefer2.\nWe discussed a number of handy shortcuts that we use frequently3. 
You can:\n\nre-indent lines to the appropriate depth with Control + I\nreformat code with Control + Shift + A\nturn one or more lines into a comment with Control + Shift + C\ninsert the pipe operator (%>% or |>4) with Control + Shift + M5\ninsert the assignment arrow (<-) with Alt + - (hyphen)\nhighlight a function in the script or console and press F1 to open the function documentation in the ‘Help’ pane\nuse ‘Find in Files’ to search for a particular variable, function or string across all the files in your project, with Control + Shift + F" }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#testing-complex-functions", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#testing-complex-functions", - "title": "Unit testing in R", - "section": "Testing complex functions", - "text": "Testing complex functions\n\n\n\nmy_big_function <- function(type) {\n con <- dbConnect(RSQLite::SQLite(), \"data.db\")\n df <- tbl(con, \"data_table\") |>\n collect() |>\n mutate(across(date, lubridate::ymd))\n\n conditions <- read_csv(\n \"conditions.csv\", col_types = \"cc\"\n ) |>\n filter(condition_type == type)\n\n df |>\n semi_join(conditions, by = \"condition\") |>\n count(date) |>\n ggplot(aes(date, n)) +\n geom_line() +\n geom_point()\n}\n\n\nWhere do you even begin to start writing tests for something so complex?\n\n\nNote: to get the code on the left to fit on one page, I skipped including a few library calls\n\nlibrary(tidyverse)\nlibrary(DBI)" + "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#themes", + "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#themes", + "title": "RStudio Tips and Tricks", + "section": "Themes", + "text": "Themes\nYou can change a number of settings to alter RStudio’s theme, colours and fonts to whatever you desire.\nYou can change the default theme in Tools > Global Options… > Appearance > Editor theme and select one from the pre-installed list. You can upload new themes by clicking the ‘Add’ button and selecting a theme from your computer. They typically have the file extension .rsthemes and can be downloaded from the web, or you can create or tweak one yourself. The {rsthemes} package has a number of options and also allows you to switch between themes and automatically switch between light and dark themes depending on the time of day.\n\n\n\nCustomising the appearance and font.\n\n\nIn the same ‘Appearance’ submenu as the theme settings, you can find an option to change fonts. Monospace fonts, ones where each character takes up the same width, will appear here automatically if you’ve installed them on your computer. One popular font for coding is Fira Code, which has the special property of converting certain sets of characters into ‘ligatures’, which some people find easier to read. For example, the base pipe will appear as a rightward-pointing arrow rather than its constituent vertical-pipe and greater-than symbol (|>)." 
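A sketch of installing and previewing themes with {rsthemes}, assuming you install it from GitHub as its documentation suggests:

# {rsthemes} is on GitHub rather than CRAN
remotes::install_github("gadenbuie/rsthemes")
rsthemes::install_rsthemes()     # register the bundled themes with RStudio
rsthemes::try_rsthemes("light")  # interactively preview the light themes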
}, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions", - "title": "Unit testing in R", - "section": "Split the logic into smaller functions", - "text": "Split the logic into smaller functions\nFunction to get the data from the database\n\nget_data_from_sql <- function() {\n con <- dbConnect(RSQLite::SQLite(), \"data.db\")\n tbl(con, \"data_table\") |>\n collect() |>\n mutate(across(date, lubridate::ymd))\n}" + "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#panes", + "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#panes", + "title": "RStudio Tips and Tricks", + "section": "Panes", + "text": "Panes\n\nLayout\nThe structural layout of RStudio’s panes can be adjusted. One simple thing you can do is minimise and maximise each pane by clicking the window icons in their upper-right corners. This is useful when you want more screen real-estate for a particular pane.\nYou can move pane locations too. Click the ‘Workspace Panes’ button (a square with four more inside it) at the top of the IDE to see a number of settings. For example, you can select ‘Console on the right’ to move the R console to the upper-right pane, which you may prefer for maximising the vertical space in which code is shown. You could also click Pane Layout… in this menu to be taken to Tools > Global Options… > Pane layout, where you can click ‘Add Column’ to insert new script panes that allow you to inspect and write multiple files side-by-side.\n\n\nScript navigation\nThe script pane in particular has a nice feature for navigating through sections of your script or Quarto/R Markdown files. Click the ‘Show Document Outline’ button or use the keyboard shortcut Control + Shift + O to slide open a tray that provides a nice indented list of all the sections and function definitions in your file.\nSection headers are auto-detected in a Quarto or R Markdown document wherever the Markdown header markup has been used: one hashmark (#) for a level 1 header, two for level 2, and so on. To add section headers to an R Script, add at least four hyphens after a commented line that starts with #. Use two or more hashes at the start of the comment to increase the nestedness of that section.\n\n# Header ------------------------------------------------------------------\n\n## Section ----\n\n### Subsection ----\n\nNote that Ctrl + Shift + R will open a dialog box for you to input the name of a section header, which will be inserted and automatically padded to 75 characters to provide a strong visual cue between sections.\nAs well as the document outline, there’s also a reminder in the lower-left of the script pane that gives the name of the section that your cursor is currently in. A symbol is also shown: a hashmark means it’s a headed section and an ‘f’ means it’s a function definition. You can click this to jump to other sections.\n\n\n\nNavigating with headers in the R script pane.\n\n\n\n\nBackground jobs\nPerhaps an under-used pane is ‘Background jobs’. This is where you can run a separate R process that keeps your R console free. Go to Tools > Background Jobs > Start Background Job… to expose this tab if it isn’t already listed alongside the R console.\nWhy might you want to do this? As I write this post, there’s a background process to detect changes to the Quarto document that I’m writing and then update a preview I have running in the browser. 
You can do something similar for Shiny apps. You can continue to develop your app and test things in the console and the app preview will update on save. You won’t need to keep hitting the ‘Render’ or ‘Run app’ button every time you make a change." }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-1", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-1", - "title": "Unit testing in R", - "section": "Split the logic into smaller functions", - "text": "Split the logic into smaller functions\nFunction to get the relevant conditions\n\nget_conditions <- function(type) {\n read_csv(\n \"conditions.csv\", col_types = \"cc\"\n ) |>\n filter(condition_type == type)\n}" + "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#magic-wand", + "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#magic-wand", + "title": "RStudio Tips and Tricks", + "section": "Magic wand", + "text": "Magic wand\nThere’s a miscellany of useful tools available when you click the ‘magic wand’ button in the script pane.\n\n\n\nAbracadabra! Casting open the ‘magic wand’ menu.\n\n\nThis includes:\n\n‘Rename in Scope’, which is like find-and-replace but you only change instances with the same ‘scope’, so you could select the variable x, go to Rename in Scope and then you can edit all instances of the variable in the document and change them at the same time (e.g. to rename them)\n‘Reflow Comment’, which you can click after highlighting a comments block to have the comments automatically line-break at the maximum width\n‘Insert Roxygen Skeleton’, which you can click when your cursor is inside the body of a function you’ve written and a {roxygen2} documentation template will be added above your function with the @param argument names pre-filled\n\nAlong with ‘Comment/Uncomment Lines’, ‘Reindent Lines’ and ‘Reformat Lines’, mentioned above in the keyboard shortcuts section." }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-2", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-2", - "title": "Unit testing in R", - "section": "Split the logic into smaller functions", - "text": "Split the logic into smaller functions\nFunction to combine the data and create a count by date\n\nsummarise_data <- function(df, conditions) {\n df |>\n semi_join(conditions, by = \"condition\") |>\n count(date)\n}" + "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#wrapping-up", + "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#wrapping-up", + "title": "RStudio Tips and Tricks", + "section": "Wrapping up", + "text": "Wrapping up\nTime was limited in our discussion. There are so many more tips and tricks that we didn’t get to. Let us know what we missed, or what your favourite shortcuts and settings are."
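For anyone who hasn't used 'Insert Roxygen Skeleton', the template it inserts looks roughly like this - sketched here for a hypothetical function, so the exact fields may differ by RStudio version:

#' Title
#'
#' @param x 
#' @param y 
#'
#' @return
#' @export
#'
#' @examples
my_sum <- function(x, y) {
  x + y
}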
}, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-3", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-3", - "title": "Unit testing in R", - "section": "Split the logic into smaller functions", - "text": "Split the logic into smaller functions\nFunction to generate a plot from the summarised data\n\ncreate_plot <- function(df) {\n df |>\n ggplot(aes(date, n)) +\n geom_line() +\n geom_point()\n}" + "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#footnotes", + "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#footnotes", + "title": "RStudio Tips and Tricks", + "section": "Footnotes", + "text": "Footnotes\n\n\nFor the same reason it’s a good idea to restart R on a frequent basis. You may assume that an object x in your environment was made in a certain way and contains certain information, but does it? What if you overwrote it at some point and forgot? Best to wipe the slate clean and rebuild it from scratch. Jenny Bryan has written an explainer.↩︎\nYou can ‘snap focus’ to the script and console panes with the pre-existing shortcuts Control + 1 and Control + 2. My next most-used pane is the terminal, so I’ve re-mapped the shortcut to Control + 3.↩︎\nThe classic shortcuts of select-all (Control + A), cut (Control + X), copy Control + C, paste (Control + V), undo (Control + Z) and redo (Control + Shift + Z) are all available when editing.↩︎\nNote that you can set the default pipe to the base-R version (|>) by checking the box at Tools > Global Options… > Code > Use native pipe operator↩︎\nProbably ‘M’ for {magrittr}, the name of the package that contains the %>% incarnation of the operator.↩︎" }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-4", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-4", - "title": "Unit testing in R", - "section": "Split the logic into smaller functions", - "text": "Split the logic into smaller functions\nThe original function refactored to use the new functions\n\nmy_big_function <- function(type) {\n conditions <- get_conditions(type)\n\n get_data_from_sql() |>\n summarise_data(conditions) |>\n create_plot()\n}\n\n\nThis is going to be significantly easier to test, because we now can verify that the individual components work correctly, rather than having to consider all of the possibilities at once." + "objectID": "blogs/posts/2023-04-26-reinstalling-r-packages.html", + "href": "blogs/posts/2023-04-26-reinstalling-r-packages.html", + "title": "Reinstalling R Packages", + "section": "", + "text": "R 4.3.0 was released last week. Anytime you update R you will probably find yourself in the position where no packages are installed. This is by design - the packages that you have installed may need to be updated and recompiled to work under new versions of R.\nYou may find yourself wanting to have all of the packages that you previously used, so one approach that some people take is to copy the previous library folder to the new versions folder. This isn’t a good idea and could potentially break your R install.\nAnother approach would be to export the list of packages in R before updating and then using that list after you have updated R. This can cause issues though if you install from places other than CRAN, e.g. bioconductor, or from GitHub.\nSome of these approaches are discussed on the RStudio Community Forum. 
But I prefer an approach of having a “spring clean”, instead only installing the packages that I know that I need.\nI maintain a list of the packages that I used as a gist. Using this, I can then simply run this script on any new R install. In fact, if you click the “raw” button on the gist and copy that url, you can simply run\nsource(\"https://gist.githubusercontent.com/tomjemmett/c105d3e0fbea7558088f68c65e68e1ed/raw/a1db4b5fa0d24562d16d3f57fe8c25fb0d8aa53e/setup.R\")\nGenerally, sourcing a url is a bad idea - the reason for this is if it’s not a link that you control, then someone could update the contents and run arbitrary code on your machine. In this case, I’m happy to run this as it’s my own gist, but you should be mindful if running it yourself!\nIf you look at the script I first install a number of packages from CRAN, then I install packages that only exist on GitHub." }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data", - "title": "Unit testing in R", - "section": "Let’s test summarise_data", - "text": "Let’s test summarise_data\nsummarise_data <- function(df, conditions) {\n df |>\n semi_join(conditions, by = \"condition\") |>\n count(date)\n}" }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-1", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-1", - "title": "Unit testing in R", - "section": "Let’s test summarise_data", - "text": "Let’s test summarise_data\ntest_that(\"it summarises the data\", {\n # arrange\n \n\n\n\n\n\n\n \n\n \n # act\n \n # assert\n \n})" + "objectID": "blogs/index.html", + "href": "blogs/index.html", + "title": "Data Science Blog", + "section": "", + "text": "Storing data safely\n\n\n\n\n\n\nlearning\n\n\nR\n\n\nPython\n\n\n\n\n\n\n\n\n\nMay 22, 2024\n\n\nYiWen Hon, Matt Dray\n\n\n\n\n\n\n\n\n\n\n\n\nOne year of coffee & coding\n\n\n\n\n\n\nlearning\n\n\n\n\n\n\n\n\n\nMay 13, 2024\n\n\nRhian Davies\n\n\n\n\n\n\n\n\n\n\n\n\nRStudio Tips and Tricks\n\n\n\n\n\n\nlearning\n\n\nR\n\n\n\n\n\n\n\n\n\nMar 21, 2024\n\n\nMatt Dray\n\n\n\n\n\n\n\n\n\n\n\n\nVisualising participant recruitment in R using Sankey plots\n\n\n\n\n\n\nlearning\n\n\ntutorial\n\n\nvisualisation\n\n\nR\n\n\n\n\n\n\n\n\n\nFeb 28, 2024\n\n\nCraig Parylo\n\n\n\n\n\n\n\n\n\n\n\n\nNearest neighbour imputation\n\n\n\n\n\n\nlearning\n\n\n\n\n\n\n\n\n\nJan 17, 2024\n\n\nJacqueline Grout\n\n\n\n\n\n\n\n\n\n\n\n\nAdvent of Code and Test Driven Development\n\n\n\n\n\n\nlearning\n\n\n\n\n\n\n\n\n\nJan 10, 2024\n\n\nYiWen Hon\n\n\n\n\n\n\n\n\n\n\n\n\nReinstalling R Packages\n\n\n\n\n\n\ngit\n\n\ntutorial\n\n\n\n\n\n\n\n\n\nApr 26, 2023\n\n\nTom Jemmett\n\n\n\n\n\n\n\n\n\n\n\n\nAlternative remote repositories\n\n\n\n\n\n\ngit\n\n\ntutorial\n\n\n\n\n\n\n\n\n\nApr 26, 2023\n\n\nTom Jemmett\n\n\n\n\n\n\n\n\n\n\n\n\nCreating a hotfix with git\n\n\n\n\n\n\ngit\n\n\ntutorial\n\n\n\n\n\n\n\n\n\nMar 24, 2023\n\n\nTom Jemmett\n\n\n\n\n\n\nNo matching items" }, { + "objectID": "about.html", + "href": "about.html", + "title": "About", + "section": "", + "text": "The Data Science team at the Strategy Unit comprises the following team members:\n\nChris Beeley\nMatt Dray\nOzayr Mohammed\nRhian Davies\nTom Jemmett\nYiWen Hon\n\nCurrent and previous projects of note include:\n\nWork supporting the New Hospitals Programme, including building a model for predicting the demand and capacity 
requirements of hospitals in the future, and a tool for mapping the evidence on this topic.\nThe Patient Experience Qualitative Data Categorisation project\nWork supporting the wider analytical community, through events/communities such as NHS-R and HACA." }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-2", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-2", - "title": "Unit testing in R", - "section": "Let’s test summarise_data", - "text": "Let’s test summarise_data\n\n\ntest_that(\"it summarises the data\", {\n # arrange\n \n df <- tibble(\n date = sample(1:10, 300, TRUE),\n condition = sample(c(\"a\", \"b\", \"c\"), 300, TRUE)\n )\n \n\n\n\n\n # act\n \n # assert\n \n})\n\nGenerate some random data to build a reasonably sized data frame.\nYou could also create a table manually, but part of the trick of writing good tests for this function is to make it so the dates don’t all have the same count.\nThe reason for this is it’s harder to know for sure that the count worked if every row returns the same value.\nWe don’t need the values to be exactly like they are in the real data, just close enough. Instead of dates, we can use numbers, and instead of actual conditions, we can use letters." + "objectID": "blogs/posts/2023-03-24_hotfix-with-git.html", + "href": "blogs/posts/2023-03-24_hotfix-with-git.html", + "title": "Creating a hotfix with git", + "section": "", + "text": "I recently discovered a bug in a code-base which needed to be fixed and deployed back to production A.S.A.P., but since the last release the code has moved on significantly. The history looks something like:\nThat is, we have a tag which is the code that is currently in production (which we need to patch), a number of commits after that tag to main (which were separate branches merged via pull requests), and a current development branch.\nI need to somehow: 1. go back to the tagged release, 2. check that code out, 3. patch that code, 4. commit this change, but insert the commit before all of the new commits after the tag.\nThere are at least two ways that I know to do this: one would be with an interactive rebase, but I used a slightly longer method, one that I feel is a little less likely to go wrong.\nBelow are the steps that I took. One thing I should note is this worked well for my particular issue because the change didn’t cause any merge conflicts later on." }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-3", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-3", - "title": "Unit testing in R", - "section": "Let’s test summarise_data", - "text": "Let’s test summarise_data\n\n\ntest_that(\"it summarises the data\", {\n # arrange\n set.seed(123)\n df <- tibble(\n date = sample(1:10, 300, TRUE),\n condition = sample(c(\"a\", \"b\", \"c\"), 300, TRUE)\n )\n \n\n\n\n\n # act\n \n # assert\n \n})\n\nTests need to be reproducible, and generating our table at random will give us unpredictable results.\nSo, we need to set the random seed; now every time this test runs we will generate the same data." 
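(To make the seeded-test pattern described above concrete — a minimal, self-contained sketch; the assertion here is illustrative rather than taken from the slides:)

```r
library(testthat)
library(dplyr)

test_that("seeded random data is stable", {
  set.seed(123) # same seed, same "random" table, every run
  df <- tibble(
    date = sample(1:10, 300, TRUE),
    condition = sample(c("a", "b", "c"), 300, TRUE)
  )
  # every one of the 300 rows is counted exactly once
  expect_equal(sum(count(df, date)$n), 300)
})
```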
+ "objectID": "blogs/posts/2023-03-24_hotfix-with-git.html#fixing-my-codebase", + "href": "blogs/posts/2023-03-24_hotfix-with-git.html#fixing-my-codebase", + "title": "Creating a hotfix with git", + "section": "Fixing my codebase", + "text": "Fixing my codebase\nFirst, we need to checkout the tag\ngit checkout -b hotfix v0.2.0\nThis creates a new branch called hotfix off of the tag v0.2.0.\nNow that I have the code base checked out at the point I need to fix, I can make the change that is needed, and commit the change\ngit add [FILENAME]\ngit commit -m \"fixes the code\"\n(Obviously, I used the actual file name and gave a better commit message. I Promise 😝)\nNow my code is fixed, I create a new tag for this “release”, as well as push the code to production (this step is omitted here)\ngit tag v0.2.1 -m \"version 0.2.0\"\nAt this point, our history looks something like\n\n\n\n\n\n\n\n\n\nWhat we want to do is break the link between main and v0.2.0, instead attaching tov0.2.1. First though, I want to make sure that if I make a mistake, I’m not making it on the main branch.\ngit checkout main\ngit checkout -b apply-hotfix\nThen we can fix our history using the rebase command\ngit rebase hotfix\nWhat this does is it rolls back to the point where the branch that we are rebasing (apply-hotfix) and the hotfix branch both share a common commit (v0.2.0 tag). It then applies the commits in the hotfix branch, before reapplying the commits from apply-hotfix (a.k.a. the main branch).\nOne thing to note, if you have any merge conflicts created by your fix, then the rebase will stop and ask you to fix the merge conflicts. There is some information in the GitHub doc’s for [resolving merge conflicts after a Git rebase][2].\n[2]: https://docs.github.com/en/get-started/using-git/resolving-merge-conflicts-after-a-git-rebase\nAt this point, we can check that the commit history looks correct\ngit log v0.2.0..HEAD\nIf we are happy, then we can apply this to the main branch. I do this by renaming the apply-hotfix branch as main. First, you have to delete the main branch to allow us to rename the branch.\ngit branch -D main\ngit branch -m main\nWe also need to update the other branches to use the new main branch\ngit checkout branch\ngit rebase main\nNow, we should have a history like" }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-4", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-4", - "title": "Unit testing in R", - "section": "Let’s test summarise_data", - "text": "Let’s test summarise_data\n\n\ntest_that(\"it summarises the data\", {\n # arrange\n set.seed(123)\n df <- tibble(\n date = sample(1:10, 300, TRUE),\n condition = sample(c(\"a\", \"b\", \"c\"), 300, TRUE)\n )\n conditions <- tibble(condition = c(\"a\", \"b\")) \n \n\n\n\n # act\n \n # assert\n \n})\n\nCreate the conditions table. We don’t need all of the columns that are present in the real csv, just the ones that will make our code work.\nWe also need to test that the filtering join (semi_join) is working, so we want to use a subset of the conditions that were used in df." + "objectID": "blogs/posts/2024-05-13_one-year-coffee-code.html", + "href": "blogs/posts/2024-05-13_one-year-coffee-code.html", + "title": "One year of coffee & coding", + "section": "", + "text": "The data science team have been running coffee & coding sessions for just over a year now. 
When I joined the Strategy Unit, I was really pleased to see these sessions running as I think making time to discuss and share technical knowledge is highly valuable, especially as an organisation grows.\nCoffee and coding sessions run every two weeks and usually take the form of a short presentation, followed by a discussion, although we have had a variety of different sessions, including live coding demos and show and tells for projects.\nWe figured it would be a good idea to do a quick survey of attendees to make sure that the sessions were beneficial and see if there were any suggestions for future sessions. We had 11 responses, all of which were really positive, with 90% agreeing that the sessions are interesting, and over 80% saying that they learn new things. Respondents said that the sessions were well varied across the technical spectrum and that they “almost always learn something useful”.\nThe two main themes of the results were that sessions were inclusive and sparked collaboration. ✨\n\nI like that everyone can contribute\n\n\nIt’s great seeing what else people are doing\n\n\nI get more ideas for future projects\n\nSome of the main suggestions included more content for newer programmers and encouraging the wider analytical team to share real project examples.\nSo with that, why not consider presenting? The sessions are informal and everyone is welcome to contribute. If you’ve got something to share, please let a member of the data science team know.\nAs a reminder, materials for our previous sessions are available under Presentations." }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-5", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-5", - "title": "Unit testing in R", - "section": "Let’s test summarise_data", - "text": "Let’s test summarise_data\n\n\ntest_that(\"it summarises the data\", {\n # arrange\n set.seed(123)\n df <- tibble(\n date = sample(1:10, 300, TRUE),\n condition = sample(c(\"a\", \"b\", \"c\"), 300, TRUE)\n )\n conditions <- tibble(condition = c(\"a\", \"b\")) \n \n \n\n \n # act\n actual <- summarise_data(df, conditions)\n # assert\n \n})\n\nBecause we are generating df randomly, to figure out what our “expected” results are, I simply ran the code inside of the test to generate the “actual” results.\nGenerally, this isn’t a good idea. You are creating the results of your test from the code; ideally, you want to be thinking about what the results of your function should be.\nImagine your function doesn’t work as intended, there is some subtle bug that you are not yet aware of. By writing tests “backwards” you may write test cases that confirm the results, but not expose the bug. This is why it’s good to think about edge cases." + "objectID": "blogs/posts/2024-05-22-storing-data-safely/index.html", + "href": "blogs/posts/2024-05-22-storing-data-safely/index.html", + "title": "Storing data safely", + "section": "", + "text": "In a recent Coffee & Coding session we chatted about storing data safely for use in Reproducible Analytical Pipelines (RAP), and the slides from the presentation are now available. We discussed the use of Posit Connect Pins and Azure Storage.\nIn order to avoid duplication, this blog post will not cover the pros and cons of each approach, and will instead focus on documenting the code that was used in our live demonstrations. 
I would recommend that you look through the slides before using the code in this blogpost and have them alongside, as they provide lots of useful context!" }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-6", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-6", - "title": "Unit testing in R", - "section": "Let’s test summarise_data", - "text": "Let’s test summarise_data\n\n\ntest_that(\"it summarises the data\", {\n # arrange\n set.seed(123)\n df <- tibble(\n date = sample(1:10, 300, TRUE),\n condition = sample(c(\"a\", \"b\", \"c\"), 300, TRUE)\n )\n conditions <- tibble(condition = c(\"a\", \"b\")) \n expected <- tibble(\n date = 1:10,\n n = c(19, 18, 12, 14, 17, 18, 24, 18, 31, 21)\n ) \n # act\n actual <- summarise_data(df, conditions)\n # assert\n \n})\n\nThat said, in cases where we can be confident (say by static analysis of our code) that it is correct, building tests in this way will give us the confidence going forwards that future changes do not break existing functionality.\nIn this case, I have created the expected data frame using the results from running the function." + "objectID": "blogs/posts/2024-05-22-storing-data-safely/index.html#coffee-coding", + "href": "blogs/posts/2024-05-22-storing-data-safely/index.html#coffee-coding", + "title": "Storing data safely", + "section": "", + "text": "In a recent Coffee & Coding session we chatted about storing data safely for use in Reproducible Analytical Pipelines (RAP), and the slides from the presentation are now available. We discussed the use of Posit Connect Pins and Azure Storage.\nIn order to avoid duplication, this blog post will not cover the pros and cons of each approach, and will instead focus on documenting the code that was used in our live demonstrations. I would recommend that you look through the slides before using the code in this blogpost and have them alongside, as they provide lots of useful context!" }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-7", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-7", - "title": "Unit testing in R", - "section": "Let’s test summarise_data", - "text": "Let’s test summarise_data\n\n\n\ntest_that(\"it summarises the data\", {\n # arrange\n set.seed(123)\n df <- tibble(\n date = sample(1:10, 300, TRUE),\n condition = sample(c(\"a\", \"b\", \"c\"), 300, TRUE)\n )\n conditions <- tibble(condition = c(\"a\", \"b\"))\n expected <- tibble(\n date = 1:10,\n n = c(19, 18, 12, 14, 17, 18, 24, 18, 31, 21)\n )\n # act\n actual <- summarise_data(df, conditions)\n # assert\n expect_equal(actual, expected)\n})\n\nTest passed 😸\n\n\n\nThe test works!" + "objectID": "blogs/posts/2024-05-22-storing-data-safely/index.html#posit-connect-pins", + "href": "blogs/posts/2024-05-22-storing-data-safely/index.html#posit-connect-pins", + "title": "Storing data safely", + "section": "Posit Connect Pins", + "text": "Posit Connect Pins\n\n# A brief intro to using {pins} to store, version, share and protect a dataset\n# on Posit Connect. 
Documentation: https://pins.rstudio.com/\n\n\n# Setup -------------------------------------------------------------------\n\n\ninstall.packages(c(\"pins\", \"dplyr\")) # if not yet installed\n\nsuppressPackageStartupMessages({\n library(pins)\n library(dplyr) # for wrangling and the 'starwars' demo dataset\n})\n\nboard <- board_connect() # will error if you haven't authenticated before\n# Error in `check_auth()`: ! auth = `auto` has failed to find a way to authenticate:\n# • `server` and `key` not provided for `auth = 'manual'`\n# • Can't find CONNECT_SERVER and CONNECT_API_KEY envvars for `auth = 'envvar'`\n# • rsconnect package not installed for `auth = 'rsconnect'`\n# Run `rlang::last_trace()` to see where the error occurred.\n\n# To authenticate\n# In RStudio: Tools > Global Options > Publishing > Connect... > Posit Connect\n# public URL of the Strategy Unit Posit Connect Server: connect.strategyunitwm.nhs.uk\n# Your browser will open to the Posit Connect web page and you're prompted\n# for your password. Enter it and you'll be authenticated.\n\n# Once authenticated\nboard <- board_connect()\n# Connecting to Posit Connect 2024.03.0 at\n# <https://connect.strategyunitwm.nhs.uk>\n\nboard |> pin_list() # see all the pins on that board\n\n\n# Create a pin ------------------------------------------------------------\n\n\n# Write a dataset to the board as a pin\nboard |> pin_write(\n x = starwars,\n name = \"starwars_demo\"\n)\n# Guessing `type = 'rds'`\n# Writing to pin 'matt.dray/starwars_demo'\n\nboard |> pin_exists(\"starwars_demo\")\n# ! Use a fully specified name including user name: \"matt.dray/starwars_demo\",\n# not \"starwars_demo\".\n# [1] TRUE\n\npin_name <- \"matt.dray/starwars_demo\"\n\nboard |> pin_exists(pin_name) # logical, TRUE/FALSE\nboard |> pin_meta(pin_name) # metadata, see also 'metadata' arg in pin_write()\nboard |> pin_browse(pin_name) # view the pin in the browser\n\n\n# Permissions -------------------------------------------------------------\n\n\n# You can let people see and edit a pin. Log into Posit Connect and select the\n# pin under 'Content'. In the 'Settings' panel on the right-hand side, adjust\n# the 'sharing' options in the 'Access' tab.\n\n\n# Overwrite and version ---------------------------------------------------\n\n\nstarwars_droids <- starwars |>\n filter(species == \"Droid\") # beep boop\n\nboard |> pin_write(\n starwars_droids,\n pin_name,\n type = \"rds\"\n)\n# Writing to pin 'matt.dray/starwars_demo'\n\nboard |> pin_versions(pin_name) # see version history\nboard |> pin_versions_prune(pin_name, n = 1) # remove history\nboard |> pin_versions(pin_name)\n\n# What if you try to overwrite the data but it hasn't changed?\nboard |> pin_write(\n starwars_droids,\n pin_name,\n type = \"rds\"\n)\n# ! The hash of pin \"matt.dray/starwars_demo\" has not changed.\n# • Your pin will not be stored.\n\n\n# Use the pin -------------------------------------------------------------\n\n\n# You can read a pin to your local machine, or access it from a Quarto file\n# or Shiny app hosted on Connect, for example. If the output and the pin are\n# both on Connect, no authentication is required; the board is defaulted to\n# the Posit Connect instance where they're both hosted.\n\nboard |>\n pin_read(pin_name) |> # like you would use e.g. 
read_csv\n with(data = _, plot(mass, height)) # wow!\n\n\n# Delete pin --------------------------------------------------------------\n\n\nboard |> pin_exists(pin_name) # logical, good function for error handling\nboard |> pin_delete(pin_name)\nboard |> pin_exists(pin_name)" }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#next-steps", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#next-steps", - "title": "Unit testing in R", - "section": "Next steps", - "text": "Next steps\n\nYou can add tests to any R project (to test functions),\nBut {testthat} works best with Packages\nThe R Packages book has 3 chapters on testing\nThere are two useful helper functions in {usethis}\n\nuse_testthat() will set up the folders for test scripts\nuse_test() will create a test file for the currently open script" + "objectID": "blogs/posts/2024-05-22-storing-data-safely/index.html#azure-storage-in-r", + "href": "blogs/posts/2024-05-22-storing-data-safely/index.html#azure-storage-in-r", + "title": "Storing data safely", + "section": "Azure Storage in R", + "text": "Azure Storage in R\nYou will need an .Renviron file with the four environment variables listed below for the code to work. This .Renviron file should be ignored by git. You can share the contents of .Renviron files with other team members via Teams, email, or SharePoint.\nBelow is a sample .Renviron file\nAZ_STORAGE_EP=https://STORAGEACCOUNT.blob.core.windows.net/\nAZ_STORAGE_CONTAINER=container-name\nAZ_TENANT_ID=long-sequence-of-numbers-and-letters\nAZ_APP_ID=another-long-sequence-of-numbers-and-letters\n\ninstall.packages(c(\"AzureAuth\", \"AzureStor\", \"arrow\")) # if not yet installed\n\n# Load all environment variables\nep_uri <- Sys.getenv(\"AZ_STORAGE_EP\")\napp_id <- Sys.getenv(\"AZ_APP_ID\")\ncontainer_name <- Sys.getenv(\"AZ_STORAGE_CONTAINER\")\ntenant <- Sys.getenv(\"AZ_TENANT_ID\")\n\n# Authenticate\ntoken <- AzureAuth::get_azure_token(\n \"https://storage.azure.com\",\n tenant = tenant,\n app = app_id,\n auth_type = \"device_code\"\n)\n\n# If you have not authenticated before, you will be taken to an external page to\n# authenticate! Use your mlcsu.nhs.uk account.\n\n# Connect to container\nendpoint <- AzureStor::blob_endpoint(ep_uri, token = token)\ncontainer <- AzureStor::storage_container(endpoint, container_name)\n\n# List files in container\nblob_list <- AzureStor::list_blobs(container)\n\n# If you get a 403 error when trying to interact with the container, you may \n# have to clear your Azure token and re-authenticate using a different browser.\n# Use AzureAuth::clean_token_directory() to clear your token, then repeat the\n# AzureAuth::get_azure_token() step above.\n\n# Upload specific file to container\nAzureStor::storage_upload(container, \"data/ronald.jpeg\", \"newdir/ronald.jpeg\")\n\n# Upload contents of a local directory to container\nAzureStor::storage_multiupload(container, \"data/*\", \"newdir\")\n\n# Check files have uploaded\nblob_list <- AzureStor::list_blobs(container)\n\n# Load file directly from Azure container\ndf_from_azure <- AzureStor::storage_read_csv(\n container,\n \"newdir/cats.csv\",\n show_col_types = FALSE\n)\n\n# Load file directly from Azure container (by temporarily downloading file \n# and storing it in memory)\nparquet_in_memory <- AzureStor::storage_download(\n container, src = \"newdir/cats.parquet\", dest = NULL\n)\nparq_df <- arrow::read_parquet(parquet_in_memory)\n\n# Delete from Azure container (!!!)\nfor (blobfile in blob_list$name) {\n 
AzureStor::delete_storage_file(container, blobfile)\n}" }, { - "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#next-steps-1", - "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#next-steps-1", - "title": "Unit testing in R", - "section": "Next steps", - "text": "Next steps\n\nIf your test needs to temporarily create a file, or change some R-options, the {withr} package has a lot of useful functions that will automatically clean things up when the test finishes\nIf you are writing tests that involve calling out to a database, or you want to test my_big_function (from before) without calling the intermediate functions, then you should look at the {mockery} package" + "objectID": "blogs/posts/2024-05-22-storing-data-safely/index.html#azure-storage-in-python", + "href": "blogs/posts/2024-05-22-storing-data-safely/index.html#azure-storage-in-python", + "title": "Storing data safely", + "section": "Azure Storage in Python", + "text": "Azure Storage in Python\nThis will use the same environment variables as the R version, just stored in a .env file instead.\nWe didn’t cover this in the presentation, so it’s not in the slides, but the code should be self-explanatory.\n\n\nimport os\nimport io\nimport pandas as pd\nfrom dotenv import load_dotenv\nfrom azure.identity import DefaultAzureCredential\nfrom azure.storage.blob import ContainerClient\n\n\n# Load all environment variables\nload_dotenv()\naccount_url = os.getenv('AZ_STORAGE_EP')\ncontainer_name = os.getenv('AZ_STORAGE_CONTAINER')\n\n\n# Authenticate\ndefault_credential = DefaultAzureCredential()\n\nFor the first time, you might need to authenticate via the Azure CLI\nDownload it from https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-windows?tabs=azure-cli\nInstall then run az login in your terminal. 
Once you have logged in with your browser try the DefaultAzureCredential() again!\n\n# Connect to container\ncontainer_client = ContainerClient(account_url, container_name, default_credential)\n\n\n# List files in container - should be empty\nblob_list = container_client.list_blob_names()\nfor blob in blob_list:\n if blob.startswith('newdir'):\n print(blob)\n\nnewdir/cats.parquet\nnewdir/ronald.jpeg\n\n\n\n# Upload file to container\nwith open(file='data/cats.csv', mode=\"rb\") as data:\n blob_client = container_client.upload_blob(name='newdir/cats.csv', \n data=data, \n overwrite=True)\n\n\n# # Check files have uploaded - List files in container again\nblob_list = container_client.list_blobs()\nfor blob in blob_list:\n if blob['name'].startswith('newdir'):\n print(blob['name'])\n\nnewdir/cats.csv\nnewdir/cats.parquet\nnewdir/ronald.jpeg\n\n\n\n# Download file from Azure container to temporary filepath\n\n# Connect to blob\nblob_client = container_client.get_blob_client('newdir/cats.csv')\n\n# Write to local file from blob\ntemp_filepath = os.path.join('temp_data', 'cats.csv')\nwith open(file=temp_filepath, mode=\"wb\") as sample_blob:\n download_stream = blob_client.download_blob()\n sample_blob.write(download_stream.readall())\ncat_data = pd.read_csv(temp_filepath)\ncat_data.head()\n\n\n\n\n\n\n\n\nName\nPhysical_characteristics\nBehaviour\n\n\n\n\n0\nRonald\nWhite and ginger\nLazy and greedy but undoubtedly cutest and best\n\n\n1\nKaspie\nSmall calico\nSweet and very shy but adventurous\n\n\n2\nHennimore\nPale orange\nUnhinged and always in a state of panic\n\n\n3\nThug cat\nBlack and white - very large\nLocal bully\n\n\n4\nSon of Stripey\nGrey tabby\nVery vocal\n\n\n\n\n\n\n\n\n# Load directly from Azure - no local copy\n\ndownload_stream = blob_client.download_blob()\nstream_object = io.BytesIO(download_stream.readall())\ncat_data = pd.read_csv(stream_object)\ncat_data\n\n\n\n\n\n\n\n\nName\nPhysical_characteristics\nBehaviour\n\n\n\n\n0\nRonald\nWhite and ginger\nLazy and greedy but undoubtedly cutest and best\n\n\n1\nKaspie\nSmall calico\nSweet and very shy but adventurous\n\n\n2\nHennimore\nPale orange\nUnhinged and always in a state of panic\n\n\n3\nThug cat\nBlack and white - very large\nLocal bully\n\n\n4\nSon of Stripey\nGrey tabby\nVery vocal\n\n\n\n\n\n\n\n\n# !!!!!!!!! 
Delete from Azure container !!!!!!!!!\nblob_client = container_client.get_blob_client('newdir/cats.csv')\nblob_client.delete_blob()\n\n\nblob_list = container_client.list_blobs()\nfor blob in blob_list:\n if blob['name'].startswith('newdir'):\n print(blob['name'])\n\nnewdir/cats.parquet\nnewdir/ronald.jpeg" }, { "objectID": "blogs/posts/2024-01-17_nearest_neighbour.html", "href": "blogs/posts/2024-01-17_nearest_neighbour.html", "title": "Nearest neighbour imputation", "section": "", "text": "Recently I have been gathering data by GP practice, from a variety of different sources. The ultimate purpose of my project is to be able to report at an ICB/sub-ICB level1. The various datasets cover different timescales and consequently changes in GP practices over time have left me with mismatching datasets.\n1 An ICB (Integrated Care Board) is a statutory NHS organisation responsible for planning health services for their local populations.\nMy approach has been to take as the basis of my project a recent GP List. Later in my project I want to perform calculations at a GP practice level based on an underlying health need and the data for this need is a CHD prevalence value from a dataset that is around 8 years old, and for which there is no update or alternative. From my recent list of 6454 practices, when I match to the need dataset, I am left with 151 practices without a value for need. 
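(For illustration, unmatched practices like these can be identified with a filtering join — a minimal sketch, where gp_list and need_data are hypothetical stand-ins for the two datasets being matched:)

```r
library(dplyr)

# practices in the current GP list with no row in the old need dataset
unmatched <- anti_join(gp_list, need_data, by = "practice_code")
nrow(unmatched) # 151 in my case
```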
If I remove these practices from the analysis then this could impact the analysis by sub-ICB since often a group of practices in the same area could be subject to changes, mergers and reorganisation.\nHere’s the packages and some demo objects to work with to create an example for two practices:\n\n\nCode\n# Packages\nlibrary(tidyverse)\nlibrary(sf)\nlibrary(tidygeocoder)\nlibrary(leaflet)\nlibrary(viridisLite)\nlibrary(gt)\n\n# Create some data with two practices with no need data \n# and a selection of practices locally with need data\npractices <- tribble(\n ~practice_code, ~postcode, ~has_orig_need, ~value,\n \"P1\",\"CV1 4FS\", 0, NA,\n \"P2\",\"CV1 3GB\", 1, 7.3,\n \"P3\",\"CV11 5TW\", 1, 6.9,\n \"P4\",\"CV6 3HZ\", 1, 7.1,\n \"P5\",\"CV6 1HS\", 1, 7.7,\n \"P6\",\"CV6 5DF\", 1, 8.2,\n \"P7\",\"CV6 3FA\", 1, 7.9,\n \"P8\",\"CV1 2DL\", 1, 7.5,\n \"P9\",\"CV1 4JH\", 1, 7.7,\n \"P10\",\"CV10 0GQ\", 1, 7.5,\n \"P11\",\"CV10 0JH\", 1, 7.8,\n \"P12\",\"CV11 5QT\", 0, NA,\n \"P13\",\"CV11 6AB\", 1, 7.6,\n \"P14\",\"CV6 4DD\", 1,7.9\n) \n\n# get domain of numeric data\n(domain <- range(practices$has_orig_need))\n\n# make a colour palette\npal <- colorNumeric(palette = viridis(2), domain = domain)\n\n\nTo provide a suitable estimate of need for the newer practices without values, all the practices in the dataset were geocoded2 using the geocode function from the {tidygeocoder} package.\n2 Geocoding is the process of converting addresses (often the postcode) into geographic coordinates (such as latitude and longitude) that can be plotted on a map.\npractices <- practices |>\n mutate(id = row_number()) |>\n geocode(postalcode = postcode) |>\n st_as_sf(coords = c(\"long\", \"lat\"), crs = 4326)\n\n\n\nCode\npractices |>\n gt()\n\n\n\n\n\n\n\n\npractice_code\npostcode\nhas_orig_need\nvalue\nid\ngeometry\n\n\n\n\nP1\nCV1 4FS\n0\nNA\n1\nc(-1.50686326666667, 52.4141089666667)\n\n\nP2\nCV1 3GB\n1\n7.3\n2\nc(-1.51888, 52.4034199)\n\n\nP3\nCV11 5TW\n1\n6.9\n3\nc(-1.46746, 52.519)\n\n\nP4\nCV6 3HZ\n1\n7.1\n4\nc(-1.52231, 52.42367)\n\n\nP5\nCV6 1HS\n1\n7.7\n5\nc(-1.52542, 52.41989)\n\n\nP6\nCV6 5DF\n1\n8.2\n6\nc(-1.498344825, 52.4250186)\n\n\nP7\nCV6 3FA\n1\n7.9\n7\nc(-1.51787, 52.43135)\n\n\nP8\nCV1 2DL\n1\n7.5\n8\nc(-1.49105, 52.40582)\n\n\nP9\nCV1 4JH\n1\n7.7\n9\nc(-1.50653, 52.41953)\n\n\nP10\nCV10 0GQ\n1\n7.5\n10\nc(-1.52197, 52.54074)\n\n\nP11\nCV10 0JH\n1\n7.8\n11\nc(-1.5163199, 52.53723)\n\n\nP12\nCV11 5QT\n0\nNA\n12\nc(-1.46927, 52.51899)\n\n\nP13\nCV11 6AB\n1\n7.6\n13\nc(-1.45822, 52.52682)\n\n\nP14\nCV6 4DD\n1\n7.9\n14\nc(-1.50832, 52.44104)\n\n\n\n\n\n\n\nThis map shows the practices, purple are the practices with no need data and yellow are practices with need data available.\n\n\nCode\n# make map to display practices\nleaflet(practices) |> \n addTiles() |>\n addCircleMarkers(color = ~pal(has_orig_need)) \n\n\n\n\n\n\nThe data was split into those with, and without, a value for need. 
st_join from the {sf} package was then used to join those without a value for need to those with one, using the geometry to find all those within 1500m (1.5km).\n\nno_need <- practices |>\n filter(has_orig_need == 0)\n\nwith_need <- practices |>\n filter(has_orig_need == 1)\n\n\nneighbours <- no_need |>\n select(no_need_postcode = postcode,no_need_prac_code=practice_code) |>\n st_join(with_need, st_is_within_distance, 1500) |>\n st_drop_geometry() |>\n select(id, no_need_postcode,no_need_prac_code) |>\n inner_join(x = with_need, by = join_by(\"id\")) \n\n\n\nCode\nleaflet(neighbours) |> \n addTiles() |>\n addCircleMarkers(color = \"purple\") |>\n addMarkers( -1.50686326666667, 52.4141089666667, popup = \"Practice with no data\"\n) |>\n addCircles(-1.50686326666667, 52.4141089666667,radius=1500) |>\n addMarkers(-1.46927, 52.51899, popup = \"Practice with no data\"\n) |>\naddCircles(-1.46927, 52.51899,radius=1500)\n\n\n\n\n\n\nThe data for the “neighbours” was grouped by the practice code of those without need data and a mean value was calculated for each practice to generate an estimated value.\n\nneighbours_estimate <- neighbours |>\n group_by(no_need_prac_code) |>\n summarise(need_est=mean(value)) |>\n st_drop_geometry() \n\nThe original data was joined back to the “neighbours”.\n\n practices_with_neighbours_estimate <- practices |>\n left_join(neighbours_estimate, join_by(practice_code==no_need_prac_code)) |>\n st_drop_geometry()\n\n\n\nCode\n practices_with_neighbours_estimate |>\n select(-has_orig_need,-id) |>\n gt()\n\n\n\n\n\n\n\n\npractice_code\npostcode\nvalue\nneed_est\n\n\n\n\nP1\nCV1 4FS\nNA\n7.583333\n\n\nP2\nCV1 3GB\n7.3\nNA\n\n\nP3\nCV11 5TW\n6.9\nNA\n\n\nP4\nCV6 3HZ\n7.1\nNA\n\n\nP5\nCV6 1HS\n7.7\nNA\n\n\nP6\nCV6 5DF\n8.2\nNA\n\n\nP7\nCV6 3FA\n7.9\nNA\n\n\nP8\nCV1 2DL\n7.5\nNA\n\n\nP9\nCV1 4JH\n7.7\nNA\n\n\nP10\nCV10 0GQ\n7.5\nNA\n\n\nP11\nCV10 0JH\n7.8\nNA\n\n\nP12\nCV11 5QT\nNA\n7.250000\n\n\nP13\nCV11 6AB\n7.6\nNA\n\n\nP14\nCV6 4DD\n7.9\nNA\n\n\n\n\n\n\n\nFinally, an updated data frame was created of the need data using the actual need for the practice where available, otherwise using estimated need.\n\npractices_with_neighbours_estimate <- practices_with_neighbours_estimate |>\n mutate(need_to_use = case_when(value>=0 ~ value,\n .default = need_est)) |>\n select(practice_code,need_to_use) \n\n\n\n\n\n\n\n\n\npractice_code\nneed_to_use\n\n\n\n\nP1\n7.583333\n\n\nP2\n7.300000\n\n\nP3\n6.900000\n\n\nP4\n7.100000\n\n\nP5\n7.700000\n\n\nP6\n8.200000\n\n\nP7\n7.900000\n\n\nP8\n7.500000\n\n\nP9\n7.700000\n\n\nP10\n7.500000\n\n\nP11\n7.800000\n\n\nP12\n7.250000\n\n\nP13\n7.600000\n\n\nP14\n7.900000\n\n\n\n\n\n\n\nFor my project, this method has successfully generated a prevalence for 125 of the 151 practices without a need value, leaving just 26 practices without a need. This is using a 1.5 km radius. In each use case there will be a decision to make regarding a more accurate estimate (smaller radius) and therefore fewer matches versus a less accurate estimate (using a larger radius) and therefore more matches; a sketch of how the radius might be parameterised follows below.\nThis approach could be replicated for other similar uses/purposes. A topical example from an SU project is the need to assign population prevalence for hypertension and compare it to current QOF3 data. 
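(One way to explore that radius trade-off is to wrap the neighbour-averaging step in a function that takes the distance as an argument — a rough sketch using the same column names as above, not the exact project code:)

```r
library(dplyr)

# estimate need for practices without a value, averaging neighbours
# within `radius_m` metres; a larger radius gives more matches but a
# less local (so less accurate) estimate
estimate_need <- function(practices, radius_m) {
  no_need <- filter(practices, has_orig_need == 0)
  with_need <- filter(practices, has_orig_need == 1)

  no_need |>
    select(no_need_prac_code = practice_code) |>
    sf::st_join(with_need, sf::st_is_within_distance, radius_m) |>
    sf::st_drop_geometry() |>
    group_by(no_need_prac_code) |>
    summarise(
      need_est = mean(value, na.rm = TRUE),
      matches = sum(!is.na(value))
    )
}

# compare coverage at different radii
estimate_need(practices, 1000)
estimate_need(practices, 3000)
```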
Again, the prevalence data is a few years old so we have to move the historical data to fit with current practices and this leaves missing data that can be estimated using this method.\n\n\n3 QOF (Quality and Outcomes Framework) is a voluntary annual reward and incentive programme for all GP practices in England, detailing practice achievement results." }, { - "objectID": "presentations/2023-03-09_coffee-and-coding/index.html#tidyverse-style-guide", - "href": "presentations/2023-03-09_coffee-and-coding/index.html#tidyverse-style-guide", - "title": "Coffee and Coding", - "section": "Tidyverse Style Guide", - "text": "Tidyverse Style Guide\n\nGood coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread\n\n\nAll style guides are fundamentally opinionated. Some decisions genuinely do make code easier to use (especially matching indenting to programming structure), but many decisions are arbitrary. The most important thing about a style guide is that it provides consistency, making code easier to write because you need to make fewer decisions.\n\ntidyverse style guide" + "objectID": "blogs/posts/2024-02-28_sankey_plot.html", + "href": "blogs/posts/2024-02-28_sankey_plot.html", + "title": "Visualising participant recruitment in R using Sankey plots", + "section": "", + "text": "Sankey diagrams are great tools to visualise flows through a system. They show connections between the steps of a process where the width of the arrows is proportional to the flow.\nI’m working on an evaluation of a risk screening process for people aged between 55-74 years and a history of smoking. In this Targeted Lung Health Check (TLHC) programme1 eligible people are invited to attend a free lung check where those assessed at high risk of lung cancer are then offered low-dose CT screening scans.\n1 Please visit the NHS England site for for more background.We used Sankey diagrams to visualise how people have engaged with the programme, from recruitment, attendance at appointments, their outcome from risk assessment, attendance at CT scans and will eventually be extended to cover the impact of the screening on early detection of those diagnosed with lung cancer.\nThis blog post is about the technical process of preparing record-level data for visualisation in a Sankey plot using R and customising it to enhance look and feel. Here is how the finished product will look:" }, { - "objectID": "presentations/2023-03-09_coffee-and-coding/index.html#lintr-styler-are-your-new-best-friends", - "href": "presentations/2023-03-09_coffee-and-coding/index.html#lintr-styler-are-your-new-best-friends", - "title": "Coffee and Coding", - "section": "{lintr} + {styler} are your new best friends", - "text": "{lintr} + {styler} are your new best friends\n\n\n{lintr}\n\n{lintr} is a static code analysis tool that inspects your code (without running it)\nit checks for certain classes of errors (e.g. mismatched { and (’s)\nit warns about potential issues (e.g. 
using variables that aren’t defined)\nit warns about places where you are not adhering to the code style\n\n\n{styler}\n\n{styler} is an RStudio add in that automatically reformats your code, tidying it up to match the style guide\n99.9% of the time it will give you equivalent code, but there is the potential that it may change the behaviour of your code\nit will overwrite the files that you ask it to run on however, so it is vital to be using version control\na good workflow here is to save your file, “stage” the changes to your file, then run {styler}. You can then revert back to the staged changed if needed." + "objectID": "blogs/posts/2024-02-28_sankey_plot.html#get-the-data", + "href": "blogs/posts/2024-02-28_sankey_plot.html#get-the-data", + "title": "Visualising participant recruitment in R using Sankey plots", + "section": "Get the data", + "text": "Get the data\nIn this example we will work with a simplified set of data focused on invitations.\nThe invites table holds details of when people were sent a letter or message inviting them to take part, how many times they were invited and how the person responded.\nThe people eligible for the programme are identified up-front and are represented by a unique ID with one row per person. Let’s assume each person receives at least one invitation to take part, they can have one of three outcomes:\n\nThey accept the invitation and agree to take part,\nThey decline the invitation,\nThey do not respond to the invitation.\n\nIf the person doesn’t respond to the first invitation they may be sent a second invitation and could be offered a third invitation if they didn’t respond to the second.\nHere is the specification for our simplified invites table:\n\nInvites specification\n\n\n\n\n\n\n\nField\nType\nDescription\n\n\n\n\nParticipant ID\nInteger\nA unique identifier for each person.\n\n\nInvite date 1\nDate\nThe date the person was first invited to participate.\nEvery person will have a date in this field.\n\n\nInvite date 2\nDate\nThe date a second invitation was sent.\n\n\nInvite date 3\nDate\nThe date a third invitation was sent.\n\n\nInvite outcome\nText\nThe outcome from the invite, one of either ‘Accepted’, ‘Declined’ or ‘No response’.\n\n\n\nEveryone receives at least one invite. Assuming a third of these respond (to either accept or decline) then two-thirds receive a follow-up invite. 
Of these, we assume half respond, meaning the remaining participants receive a third invite.\nHere we generate 100 rows of example data to populate our table.\n\n\nCode\n# set a randomisation seed for reproducibility\nset.seed(seed = 1234)\n\n# define some parameters\nstart_date = as.Date('2019-01-01')\nend_date = as.Date('2021-01-01')\nrows = 100\n\ndf_invites_1 <- tibble(\n # create a unique id for each participant\n participant_id = 1:rows,\n \n # create a random initial invite date between our start and end dates\n invite_1_date = sample(\n seq(start_date, end_date, by = 'day'), \n size = rows, replace = T\n ),\n \n # create a random outcome for this participant\n invite_outcome = sample(\n x = c('Accepted', 'Declined', 'No response'),\n size = rows, replace = T\n )\n)\n\n# take a sample of participants and allocate them a second invite date\ndf_invites_2 <- df_invites_1 |>\n # sample two thirds of participants to get a second invite\n slice_sample(prop = 2/3) |> \n # allocate a date between 10 and 30 days following the first\n mutate(\n invite_2_date = invite_1_date + sample(10:30, size = n(), replace = T)\n ) |> \n # keep just id and second date\n select(participant_id, invite_2_date)\n\n\n# take a sample of those with a second invite and allocate them a third invite date\ndf_invites_3 <- df_invites_2 |> \n # sample half of these to get a third invite\n slice_sample(prop = 1/2) |> \n # allocate a date between 10 to 30 days following the second\n mutate(\n invite_3_date = invite_2_date + sample(10:30, size = n(), replace = T)\n ) |> \n # keep just id and second date\n select(participant_id, invite_3_date)\n\n# combine the 2nd and 3rd invites with the first table\ndf_invites <- df_invites_1 |> \n left_join(\n y = df_invites_2, \n by = 'participant_id'\n ) |> \n left_join(\n y = df_invites_3,\n by = 'participant_id'\n ) |> \n # move the outcome field after the third invite\n relocate(invite_outcome, .after = invite_3_date)\n\n# housekeeping\nrm(df_invites_1, df_invites_2, df_invites_3, start_date, end_date, rows)\n\n# view our data\ndf_invites |> \n reactable(defaultPageSize = 5)\n\n\n\n\nGenerated invite table" }, { - "objectID": "presentations/2023-03-09_coffee-and-coding/index.html#what-does-lintr-look-like", - "href": "presentations/2023-03-09_coffee-and-coding/index.html#what-does-lintr-look-like", - "title": "Coffee and Coding", - "section": "What does {lintr} look like?", - "text": "What does {lintr} look like?\n\n\n\nsource: Good practice for writing R code and R packages\n\nrunning lintr can be done in the console, e.g.\n\nlintr::lintr_dir(\".\")\n\nor via the Addins menu" + "objectID": "blogs/posts/2024-02-28_sankey_plot.html#determine-milestone-outcomes", + "href": "blogs/posts/2024-02-28_sankey_plot.html#determine-milestone-outcomes", + "title": "Visualising participant recruitment in R using Sankey plots", + "section": "Determine milestone outcomes", + "text": "Determine milestone outcomes\nThe next step is to take our source table and convert the data into a series of milestones (and associated outcomes) that represents how our invited participants moved through the pathway.\nIn our example we have five milestones to represent in our Sankey plot:\n\nOur eligible population (everyone in our invites table),\nThe result from the first invitation,\nThe result from the second invitation,\nThe result from the third invitation,\nThe overall invite outcome.\n\nAside from the eligible population, where everyone starts with the same value, participants will have one of several 
outcomes at each milestone. This step is about naming these milestones and the outcomes.\nIt is important that each milestone-outcome has unique values. An outcome of ‘No response’ can be recorded against the first, second and third invite, and we wish to see these outcomes separately represented on the Sankey (rather than just one ‘No response’), so each outcome must be made unique. In this example we prefix the outcome from each invite with the number of the invite, e.g. ‘Invite 1 No response’.\nThe reason for this will become clearer when we come to plot the Sankey, but for now we produce these milestone-outcomes from our invites table.\n\n\nCode\ndf_milestones <- df_invites |> \n mutate(\n # everyone starts in the eligible population\n start_population = 'Eligible population',\n \n # work out what happened following the first invite\n invite_1_outcome = case_when(\n # if a second invite was sent we assume there was no outcome from the first\n !is.na(invite_2_date) ~ 'Invitation 1 No response',\n # otherwise the overall outcome resulted from the first invite\n .default = glue('Invitation 1 {invite_outcome}')\n ),\n \n # work out what happened following the second invite\n invite_2_outcome = case_when(\n # if a third invite was sent we assume there was no outcome from the second\n !is.na(invite_3_date) ~ 'Invitation 2 No response',\n # if a second invite was sent but no third, the overall outcome resulted from the second\n !is.na(invite_2_date) ~ glue('Invitation 2 {invite_outcome}'),\n # default to NA if neither of the above are true\n .default = NA\n ),\n \n # work out what happened following the third invite\n invite_3_outcome = case_when(\n # if a third invite was sent then the outcome is the overall outcome\n !is.na(invite_3_date) ~ glue('Invitation 3 {invite_outcome}'),\n # otherwise mark as NA\n .default = NA\n )\n ) |> \n # exclude the dates as they are no longer needed\n select(-contains('_date')) |> \n # move the overall invite outcome to the end\n relocate(invite_outcome, .after = invite_3_outcome)\n\n# view our data\ndf_milestones |> \n reactable(defaultPageSize = 5)\n\n\n\n\nMilestone-outcomes for participants" }, { "objectID": "blogs/posts/2024-02-28_sankey_plot.html#calculate-flows", "href": "blogs/posts/2024-02-28_sankey_plot.html#calculate-flows", "title": "Visualising participant recruitment in R using Sankey plots", "section": "Calculate flows", "text": "Calculate flows\nNext we take pairs of milestone-outcomes and calculate the number of participants that moved between them.\nHere we utilise the power of dplyr::summarise with an argument .by to group our data before counting the number of unique participants who move between our start and end groups.\nFor invites 2 and 3 we perform two sets of summaries:\n\nThe first where the values in the to and from fields contain details.\nThe second to capture cases where the to destination is NULL. This is because the participant responded at the previous invite so there was no subsequent invite. In these cases we flow the participant to the overall invite outcome.2\n\n2 If you are thinking there is a lot of repetition here, you’re right. 
In practice I abstracted both steps to a function and passed in the parameters for the from and to variables and simplified my workflow a little; however, I’m showing it in plain form here for simplicity.\n\nCode\ndf_flows <- bind_rows(\n \n # flow from population to invite 1\n df_milestones |> \n filter(!is.na(start_population) & !is.na(invite_1_outcome)) |> \n rename(from = start_population, to = invite_1_outcome) |> \n summarise(\n flow = n_distinct(participant_id, na.rm = T),\n .by = c(from, to)\n ),\n \n # flow from invite 1 to invite 2 (where not NA)\n df_milestones |> \n filter(!is.na(invite_1_outcome) & !is.na(invite_2_outcome)) |> \n rename(from = invite_1_outcome, to = invite_2_outcome) |> \n summarise(\n flow = n_distinct(participant_id, na.rm = T),\n .by = c(from, to)\n ),\n \n # flow from invite 1 to overall invite outcome (where invite 2 is NA)\n df_milestones |> \n filter(!is.na(invite_1_outcome) & is.na(invite_2_outcome)) |> \n rename(from = invite_1_outcome, to = invite_outcome) |> \n summarise(\n flow = n_distinct(participant_id, na.rm = T),\n .by = c(from, to)\n ),\n \n # flow from invite 2 to invite 3 (where not NA)\n df_milestones |> \n filter(!is.na(invite_2_outcome) & !is.na(invite_3_outcome)) |> \n rename(from = invite_2_outcome, to = invite_3_outcome) |> \n summarise(\n flow = n_distinct(participant_id, na.rm = T),\n .by = c(from, to)\n ),\n \n # flow from invite 2 to overall invite outcome (where invite 3 is NA)\n df_milestones |> \n filter(!is.na(invite_2_outcome) & is.na(invite_3_outcome)) |> \n rename(from = invite_2_outcome, to = invite_outcome) |> \n summarise(\n flow = n_distinct(participant_id, na.rm = T),\n .by = c(from, to)\n ),\n \n # final flow - invite 3 to overall outcome (where both are not NA)\n df_milestones |> \n filter(!is.na(invite_3_outcome) & !is.na(invite_outcome)) |> \n rename(from = invite_3_outcome, to = invite_outcome) |> \n summarise(\n flow = n_distinct(participant_id, na.rm = T),\n .by = c(from, to)\n )\n)\n\n# view our data\ndf_flows |> \n reactable(defaultPageSize = 5)\n\n\n\n\nFlows of participants between milestones" }, { "objectID": "blogs/posts/2024-02-28_sankey_plot.html#preparing-for-plotly", "href": "blogs/posts/2024-02-28_sankey_plot.html#preparing-for-plotly", "title": "Visualising participant recruitment in R using Sankey plots", "section": "Preparing for plotly", "text": "Preparing for plotly\nPlotly expects to be fed two sets of data:\n\nNodes - these are the milestones we have in our from and to fields,\nEdges - these are the flows that occur between nodes, the flow in our table.\n\nIt is possible to extract this data by 
hand but I found using the tidygraph package was much easier and more convenient.\n\ndf_sankey <- df_flows |> \n # convert our flows data to a tidy graph object\n as_tbl_graph()\n\nThe tidygraph package splits our data into nodes and edges. We can selectively work on each by ‘activating’ them - here is the nodes list:\n\ndf_sankey |> \n activate(what = 'nodes') |> \n as_tibble() |> \n reactable(defaultPageSize = 5)\n\n\n\n\n\nYou can see each unique node name listed. The row numbers for these nodes are used as reference IDs in the edges object:\n\ndf_sankey |> \n activate(what = 'edges') |> \n as_tibble() |> \n reactable(defaultPageSize = 5)\n\n\n\n\n\nWe now have enough information to generate our Sankey.\nFirst we extract our nodes and edges to separate data frames, then convert the ID values to be zero-based (starting at 0), as this is what plotly is expecting. Doing this is as simple as subtracting 1 from the IDs.\nFinally we pass these two dataframes to plotly’s node and link function inputs to generate the plot.\n\n\nCode\n# extract the nodes to a dataframe\nnodes <- df_sankey |> \n activate(nodes) |> \n data.frame() |> \n mutate(\n id = row_number() - 1\n )\n\n# extract the edges to a dataframe\nedges <- df_sankey |> \n activate(edges) |> \n data.frame() |> \n mutate(\n from = from - 1,\n to = to - 1\n )\n\n# plot our sankey\nplot_ly(\n # setup\n type = 'sankey',\n orientation = 'h',\n arrangement = 'snap',\n \n # use our node data\n node = list(\n label = nodes$name\n ),\n \n # use our link data\n link = list(\n source = edges$from,\n target = edges$to,\n value = edges$flow\n )\n)\n\n\n\n\nOur first sankey\n\n\nNot bad!\nWe can see the structure of our Sankey now. Can you see the relative proportions of participants who did or didn’t respond to our first invite? Marvel at how those who responded to the first invite flow into our final outcome. And see how those who didn’t respond to the first invitation go on to receive a second invite.\nPlotly’s charts are interactive. Try hovering your cursor over the nodes and edges to highlight them and a pop-up box will appear giving you additional details. You can reorder the vertical position of the nodes by dragging them above or below an adjacent node.\nThis looks functional." }, { "objectID": "blogs/posts/2024-02-28_sankey_plot.html#styling-our-sankey", "href": "blogs/posts/2024-02-28_sankey_plot.html#styling-our-sankey", "title": "Visualising participant recruitment in R using Sankey plots", "section": "Styling our Sankey", "text": "Styling our Sankey\nNow that we have the foundations of our Sankey, I’d like to move on to its presentation. 
Specifically I’d like to:\n\nuse colour coding to clearly group those who accept or decline the invite,\nimprove the readability of the node titles,\nadd additional information to the pop-up boxes when you hover over nodes and edges, and\ncontrol the positioning of the nodes in the plot.\n\nAs our nodes and edges objects are dataframes, it is straightforward to add this styling information directly to them.\nFor the nodes object we define colours based on the name of each node and manually position them in the plot\n\n\nCode\n# get the eligible population as a single value\n# NB, will be used to work out % amounts in each node and edge\ntemp_eligible_pop <- df_flows |> \n filter(from == 'Eligible population') |> \n summarise(total = sum(flow, na.rm = T)) |> \n pull(total)\n\n# style our nodes object\nnodes <- nodes |> \n mutate(\n # colour ----\n # add colour definitions, green for accepted, red for declined\n colour = case_when(\n str_detect(name, 'Accepted') ~ '#44bd32',\n str_detect(name, 'Declined') ~ '#c23616',\n str_detect(name, 'No response') ~ '#7f8fa6',\n str_detect(name, 'Eligible population') ~ '#7f8fa6'\n ),\n \n # add a semi-transparent colour for the edges based on node colours\n colour_fade = col2hcl(colour = colour, alpha = 0.3),\n \n # positioning ----\n # NB, I found that to position nodes you need to supply both\n # horizontal and vertical positions\n # NNB, it was a bit of trial and error to get these positions just\n # right\n \n # horizontal positions (0 = left, 1 = right)\n x = case_when(\n str_detect(name, 'Eligible population') ~ 1,\n str_detect(name, 'Invitation 1') ~ 2,\n str_detect(name, 'Invitation 2') ~ 3,\n str_detect(name, 'Invitation 3') ~ 4,\n .default = 5\n ) |> rescale(to = c(0.001, 0.9)),\n \n # vertical position (1 = bottom, 0 = top)\n y = case_when(\n str_detect(name, 'Eligible population') ~ 5,\n # invite 1\n str_detect(name, 'Invitation 1 Accepted') ~ 1,\n str_detect(name, 'Invitation 1 No response') ~ 5,\n str_detect(name, 'Invitation 1 Declined') ~ 8.5,\n # invite 2\n str_detect(name, 'Invitation 2 Accepted') ~ 2,\n str_detect(name, 'Invitation 2 No response') ~ 5,\n str_detect(name, 'Invitation 2 Declined') ~ 7.8,\n # invite 3\n str_detect(name, 'Invitation 3 Accepted') ~ 2.7,\n str_detect(name, 'Invitation 3 No response') ~ 5.8,\n str_detect(name, 'Invitation 3 Declined') ~ 7.2,\n # final outcomes\n str_detect(name, 'Accepted') ~ 1,\n str_detect(name, 'No response') ~ 5,\n str_detect(name, 'Declined') ~ 8,\n .default = 5\n ) |> rescale(to = c(0.001, 0.999))\n ) |> \n # add in a custom field to show the percentage flow\n left_join(\n y = df_flows |> \n group_by(to) |> \n summarise(\n flow = sum(flow, na.rm = T),\n flow_perc = percent(flow / temp_eligible_pop, accuracy = 0.1),\n ) |> \n select(name = to, flow_perc),\n by = 'name'\n )\n\n# view our nodes data\nnodes |> \n reactable(defaultPageSize = 5)\n\n\n\n\nStyling the nodes dataframe\n\n\nNext we move to styling the edges, which is a much simpler prospect:\n\n\nCode\nedges <- edges |> \n mutate(\n # add a label for each flow to tell us how many people are in each\n label = number(flow, big.mark = ','),\n # add a percentage flow figure\n flow_perc = percent(flow / temp_eligible_pop, accuracy = 0.1)\n ) |> \n # add the faded colour from our nodes object to match the destinations\n left_join(\n y = nodes |> select(to = id, colour_fade),\n by = 'to'\n )\n\n# view our edges data\nedges |> \n reactable(defaultPageSize = 5)\n\n\n\n\nStyling the edges dataframe\n\n\nWe now have stylised node and 
edge tables ready and can bring it all together. Note the use of customdata and hovertemplate help to bring in additional information and styling to the pop-up boxes that appear when you hover over each flow and node.\n\n\nCode\n# plot our stylised sankey\nplot_ly(\n # setup\n type = 'sankey',\n orientation = 'h',\n arrangement = 'snap',\n \n # use our node data\n node = list(\n label = nodes$name,\n color = nodes$colour,\n x = nodes$x,\n y = nodes$y,\n customdata = nodes$flow_perc,\n hovertemplate = '%{label}<br /><b>%{value}</b> participants<br /><b>%{customdata}</b> of eligible population'\n ),\n \n # use our edge data\n link = list(\n source = edges$from,\n target = edges$to,\n value = edges$flow,\n label = edges$label,\n color = edges$colour_fade,\n customdata = edges$flow_perc,\n hovertemplate = '%{source.label} → %{target.label}<br /><b>%{value}</b> participants<br /><b>%{customdata}</b> of eligible population'\n )\n) |> \n layout(\n font = list(\n family = 'Arial, Helvetica, sans-serif',\n size = 12\n ),\n # make the background transparent (also removes the text shadow)\n paper_bgcolor = 'rgba(0,0,0,0)'\n ) |> \n config(responsive = T)\n\n\n\n\nA stylish Sankey" }, { "objectID": "presentations/2023-07-11_haca-nhp-demand-model/index.html#the-team", @@ -1107,620 +1247,634 @@ "text": "Questions?\n\nContact The Strategy Unit\n\n\n strategy.unit@nhs.net\n The-Strategy-Unit\n\n\nContact Me\n\n\n thomas.jemmett@nhs.net\n tomjemmett\n\n\n\n\n\nview slides at https://tinyurl.com/haca23nhp" }, { - "objectID": "blogs/posts/2024-02-28_sankey_plot.html", - "href": "blogs/posts/2024-02-28_sankey_plot.html", - "title": "Visualising participant recruitment in R using Sankey plots", - "section": "", - "text": "Sankey diagrams are great tools to visualise flows through a system. They show connections between the steps of a process where the width of the arrows is proportional to the flow.\nI’m working on an evaluation of a risk screening process for people aged between 55-74 years and a history of smoking. In this Targeted Lung Health Check (TLHC) programme1 eligible people are invited to attend a free lung check where those assessed at high risk of lung cancer are then offered low-dose CT screening scans.\n1 Please visit the NHS England site for for more background.We used Sankey diagrams to visualise how people have engaged with the programme, from recruitment, attendance at appointments, their outcome from risk assessment, attendance at CT scans and will eventually be extended to cover the impact of the screening on early detection of those diagnosed with lung cancer.\nThis blog post is about the technical process of preparing record-level data for visualisation in a Sankey plot using R and customising it to enhance look and feel. 
Here is how the finished product will look:"
  },
  {
    "objectID": "presentations/2023-03-09_coffee-and-coding/index.html#which-is-easier-to-read",
    "href": "presentations/2023-03-09_coffee-and-coding/index.html#which-is-easier-to-read",
    "title": "Coffee and Coding",
    "section": "Which is easier to read?",
    "text": "Which is easier to read?\n\nae_attendances |>\n filter(org_code %in% c(\"RNA\", \"RL4\")) |>\n mutate(performance = 1 + breaches / attendances) |>\n filter(type == 1) |>\n mutate(met_target = performance >= 0.95)\n\nor\n\nae_attendances |>\n filter(\n org_code %in% c(\"RNA\", \"RL4\"),\n type == 1\n ) |>\n mutate(\n performance = 1 + breaches / attendances,\n met_target = performance >= 0.95\n )\n\n\n spending a few seconds to neatly format your code can greatly improve its legibility for future readers, making the intent of the code far clearer and bugs easier to spot.\n\n\n (have you spotted the mistake in the snippets above?)"
  },
  {
    "objectID": "presentations/2023-03-09_coffee-and-coding/index.html#tidyverse-style-guide",
    "href": "presentations/2023-03-09_coffee-and-coding/index.html#tidyverse-style-guide",
    "title": "Coffee and Coding",
    "section": "Tidyverse Style Guide",
    "text": "Tidyverse Style Guide\n\nGood coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread\n\n\nAll style guides are fundamentally opinionated. Some decisions genuinely do make code easier to use (especially matching indenting to programming structure), but many decisions are arbitrary. The most important thing about a style guide is that it provides consistency, making code easier to write because you need to make fewer decisions.\n\ntidyverse style guide"
  },
  {
    "objectID": "presentations/2023-03-09_coffee-and-coding/index.html#lintr-styler-are-your-new-best-friends",
    "href": "presentations/2023-03-09_coffee-and-coding/index.html#lintr-styler-are-your-new-best-friends",
    "title": "Coffee and Coding",
    "section": "{lintr} + {styler} are your new best friends",
    "text": "{lintr} + {styler} are your new best friends\n\n\n{lintr}\n\n{lintr} is a static code analysis tool that inspects your code (without running it)\nit checks for certain classes of errors (e.g. mismatched { and (’s)\nit warns about potential issues (e.g. using variables that aren’t defined)\nit warns about places where you are not adhering to the code style\n\n\n{styler}\n\n{styler} is an RStudio add-in that automatically reformats your code, tidying it up to match the style guide\n99.9% of the time it will give you equivalent code, but there is the potential that it may change the behaviour of your code\nit will overwrite the files that you ask it to run on, however, so it is vital to be using version control\na good workflow here is to save your file, “stage” the changes to your file, then run {styler}. You can then revert back to the staged changes if needed."
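A hedged sketch of that save–stage–style workflow from the console (a minimal example; the file path is ours, and both packages must be installed):

```r
# static analysis: flag syntax errors, suspicious code and style issues
# across the whole project, without running anything
lintr::lint_dir(".")

# restyle one file in place to match the tidyverse style guide --
# stage your changes in git first so you can revert if behaviour changes
styler::style_file("R/my_analysis.R")
```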
}, { - "objectID": "blogs/posts/2024-02-28_sankey_plot.html#get-the-data", - "href": "blogs/posts/2024-02-28_sankey_plot.html#get-the-data", - "title": "Visualising participant recruitment in R using Sankey plots", - "section": "Get the data", - "text": "Get the data\nIn this example we will work with a simplified set of data focused on invitations.\nThe invites table holds details of when people were sent a letter or message inviting them to take part, how many times they were invited and how the person responded.\nThe people eligible for the programme are identified up-front and are represented by a unique ID with one row per person. Let’s assume each person receives at least one invitation to take part, they can have one of three outcomes:\n\nThey accept the invitation and agree to take part,\nThey decline the invitation,\nThey do not respond to the invitation.\n\nIf the person doesn’t respond to the first invitation they may be sent a second invitation and could be offered a third invitation if they didn’t respond to the second.\nHere is the specification for our simplified invites table:\n\nInvites specification\n\n\n\n\n\n\n\nField\nType\nDescription\n\n\n\n\nParticipant ID\nInteger\nA unique identifier for each person.\n\n\nInvite date 1\nDate\nThe date the person was first invited to participate.\nEvery person will have a date in this field.\n\n\nInvite date 2\nDate\nThe date a second invitation was sent.\n\n\nInvite date 3\nDate\nThe date a third invitation was sent.\n\n\nInvite outcome\nText\nThe outcome from the invite, one of either ‘Accepted’, ‘Declined’ or ‘No response’.\n\n\n\nEveryone receives at least one invite. Assuming a third of these respond (to either accept or decline) then two-thirds receive a follow-up invite. Of these, we assume half respond, meaning the remaining participants receive a third invite.\nHere we generate 100 rows of example data to populate our table.\n\n\nCode\n# set a randomisation seed for reproducibility\nset.seed(seed = 1234)\n\n# define some parameters\nstart_date = as.Date('2019-01-01')\nend_date = as.Date('2021-01-01')\nrows = 100\n\ndf_invites_1 <- tibble(\n # create a unique id for each participant\n participant_id = 1:rows,\n \n # create a random initial invite date between our start and end dates\n invite_1_date = sample(\n seq(start_date, end_date, by = 'day'), \n size = rows, replace = T\n ),\n \n # create a random outcome for this participant\n invite_outcome = sample(\n x = c('Accepted', 'Declined', 'No response'),\n size = rows, replace = T\n )\n)\n\n# take a sample of participants and allocate them a second invite date\ndf_invites_2 <- df_invites_1 |>\n # sample two thirds of participants to get a second invite\n slice_sample(prop = 2/3) |> \n # allocate a date between 10 and 30 days following the first\n mutate(\n invite_2_date = invite_1_date + sample(10:30, size = n(), replace = T)\n ) |> \n # keep just id and second date\n select(participant_id, invite_2_date)\n\n\n# take a sample of those with a second invite and allocate them a third invite date\ndf_invites_3 <- df_invites_2 |> \n # sample half of these to get a third invite\n slice_sample(prop = 1/2) |> \n # allocate a date between 10 to 30 days following the second\n mutate(\n invite_3_date = invite_2_date + sample(10:30, size = n(), replace = T)\n ) |> \n # keep just id and second date\n select(participant_id, invite_3_date)\n\n# combine the 2nd and 3rd invites with the first table\ndf_invites <- df_invites_1 |> \n left_join(\n y = df_invites_2, \n by = 'participant_id'\n 
) |> \n left_join(\n y = df_invites_3,\n by = 'participant_id'\n ) |> \n # move the outcome field after the third invite\n relocate(invite_outcome, .after = invite_3_date)\n\n# housekeeping\nrm(df_invites_1, df_invites_2, df_invites_3, start_date, end_date, rows)\n\n# view our data\ndf_invites |> \n reactable(defaultPageSize = 5)\n\n\n\n\nGenerated invite table" + "objectID": "presentations/2023-03-09_coffee-and-coding/index.html#what-does-lintr-look-like", + "href": "presentations/2023-03-09_coffee-and-coding/index.html#what-does-lintr-look-like", + "title": "Coffee and Coding", + "section": "What does {lintr} look like?", + "text": "What does {lintr} look like?\n\n\n\nsource: Good practice for writing R code and R packages\n\nrunning lintr can be done in the console, e.g.\n\nlintr::lintr_dir(\".\")\n\nor via the Addins menu" }, { - "objectID": "blogs/posts/2024-02-28_sankey_plot.html#determine-milestone-outcomes", - "href": "blogs/posts/2024-02-28_sankey_plot.html#determine-milestone-outcomes", - "title": "Visualising participant recruitment in R using Sankey plots", - "section": "Determine milestone outcomes", - "text": "Determine milestone outcomes\nThe next step is to take our source table and convert the data into a series of milestones (and associated outcomes) that represents how our invited participants moved through the pathway.\nIn our example we have five milestones to represent in our Sankey plot:\n\nOur eligible population (everyone in our invites table),\nThe result from the first invitation,\nThe result from the second invitation,\nThe result from the third invitation,\nThe overall invite outcome.\n\nAside from the eligible population, where everyone starts with the same value, participants will have one of several outcomes at each milestone. This step is about naming these milestones and the outcomes.\nIt is important that each milestone-outcome has unique values. An outcome of ‘No response’ can be recorded against the first, second and third invite, and we wish to see these outcomes separately represented on the Sankey (rather than just one ‘No response’), so each outcome must be made unique. In this example we prefix the outcome from each invite with the number of the invite, e.g. 
‘Invite 1 No response’.\nThe reason for this will become clearer when we come to plot the Sankey, but for now we produce these milestone-outcomes from our invites table.\n\n\nCode\ndf_milestones <- df_invites |> \n mutate(\n # everyone starts in the eligible population\n start_population = 'Eligible population',\n \n # work out what happened following the first invite\n invite_1_outcome = case_when(\n # if a second invite was sent we assume there was no outcome from the first\n !is.na(invite_2_date) ~ 'Invitation 1 No response',\n # otherwise the overall outcome resulted from the first invite\n .default = glue('Invitation 1 {invite_outcome}')\n ),\n \n # work out what happened following the second invite\n invite_2_outcome = case_when(\n # if a third invite was sent we assume there was no outcome from the second\n !is.na(invite_3_date) ~ 'Invitation 2 No response',\n # if a second invite was sent but no third then\n !is.na(invite_2_date) ~ glue('Invitation 2 {invite_outcome}'),\n # default to NA if neither of the above are true\n .default = NA\n ),\n \n # work out what happened following the third invite\n invite_3_outcome = case_when(\n # if a third invite was sent then the outcome is the overall outcome\n !is.na(invite_3_date) ~ glue('Invitation 3 {invite_outcome}'),\n # otherwise mark as NA\n .default = NA\n )\n ) |> \n # exclude the dates as they are no longer needed\n select(-contains('_date')) |> \n # move the overall invite outcome to the end\n relocate(invite_outcome, .after = invite_3_outcome)\n\n# view our data\ndf_milestones |> \n reactable(defaultPageSize = 5)\n\n\n\n\nMilestone-outcomes for participants" + "objectID": "presentations/2023-03-09_coffee-and-coding/index.html#using-styler", + "href": "presentations/2023-03-09_coffee-and-coding/index.html#using-styler", + "title": "Coffee and Coding", + "section": "Using {styler}", + "text": "Using {styler}\n\nsource: Good practice for writing R code and R packages" }, { - "objectID": "blogs/posts/2024-02-28_sankey_plot.html#calculate-flows", - "href": "blogs/posts/2024-02-28_sankey_plot.html#calculate-flows", - "title": "Visualising participant recruitment in R using Sankey plots", - "section": "Calculate flows", - "text": "Calculate flows\nNext we take pairs of milestone-outcomes and calculate the number of participants that moved between them.\nHere we utilise the power of dplyr::summarise with an argument .by to group by our data before counting the number of unique participants who move between our start and end groups.\nFor invites 2 and 3 we perform two sets of summaries:\n\nThe first where the values in the to and from fields contain details.\nThe second to capture cases where the to destination is NULL. This is because the participant responded at the previous invite so there was no subsequent invite. In these cases we flow the participant to the overall invite outcome.2\n\n2 If you are thinking there is a lot of repetition here, you’re right. 
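A sketch of the kind of helper that this repetition invites (the function name and signature here are ours, for illustration only):

```r
library(dplyr)

# count distinct participants flowing between one pair of
# milestone-outcome columns, dropping rows where either side is NA
count_flows <- function(df, from_col, to_col) {
  df |>
    filter(!is.na({{ from_col }}), !is.na({{ to_col }})) |>
    rename(from = {{ from_col }}, to = {{ to_col }}) |>
    summarise(
      flow = n_distinct(participant_id, na.rm = TRUE),
      .by = c(from, to)
    )
}

# the first two blocks of the bind_rows() call below would then become, e.g.
# count_flows(df_milestones, start_population, invite_1_outcome)
# count_flows(df_milestones, invite_1_outcome, invite_2_outcome)
```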
In practice I abstracted both steps to a function and passed in the parameters for the from and to variables and simplified my workflow a little, however, I’m showing it in plain form here for simplification.\n\nCode\ndf_flows <- bind_rows(\n \n # flow from population to invite 1\n df_milestones |> \n filter(!is.na(start_population) & !is.na(invite_1_outcome)) |> \n rename(from = start_population, to = invite_1_outcome) |> \n summarise(\n flow = n_distinct(participant_id, na.rm = T),\n .by = c(from, to)\n ),\n \n # flow from invite 1 to invite 2 (where not NA)\n df_milestones |> \n filter(!is.na(invite_1_outcome) & !is.na(invite_2_outcome)) |> \n rename(from = invite_1_outcome, to = invite_2_outcome) |> \n summarise(\n flow = n_distinct(participant_id, na.rm = T),\n .by = c(from, to)\n ),\n \n # flow from invite 1 to overall invite outcome (where invite 2 is NA)\n df_milestones |> \n filter(!is.na(invite_1_outcome) & is.na(invite_2_outcome)) |> \n rename(from = invite_1_outcome, to = invite_outcome) |> \n summarise(\n flow = n_distinct(participant_id, na.rm = T),\n .by = c(from, to)\n ),\n \n # flow from invite 2 to invite 3 (where not NA)\n df_milestones |> \n filter(!is.na(invite_2_outcome) & !is.na(invite_3_outcome)) |> \n rename(from = invite_2_outcome, to = invite_3_outcome) |> \n summarise(\n flow = n_distinct(participant_id, na.rm = T),\n .by = c(from, to)\n ),\n \n # flow from invite 2 to overall invite outcome (where invite 3 is NA)\n df_milestones |> \n filter(!is.na(invite_2_outcome) & is.na(invite_3_outcome)) |> \n rename(from = invite_2_outcome, to = invite_outcome) |> \n summarise(\n flow = n_distinct(participant_id, na.rm = T),\n .by = c(from, to)\n ),\n \n # final flow - invite 3 to overall outcome (where both are not NA)\n df_milestones |> \n filter(!is.na(invite_3_outcome) & !is.na(invite_outcome)) |> \n rename(from = invite_3_outcome, to = invite_outcome) |> \n summarise(\n flow = n_distinct(participant_id, na.rm = T),\n .by = c(from, to)\n )\n)\n\n# view our data\ndf_flows |> \n reactable(defaultPageSize = 5)\n\n\n\n\nFlows of participants between milestones" + "objectID": "presentations/2023-03-09_coffee-and-coding/index.html#further-thoughts-on-improving-code-legibility", + "href": "presentations/2023-03-09_coffee-and-coding/index.html#further-thoughts-on-improving-code-legibility", + "title": "Coffee and Coding", + "section": "Further thoughts on improving code legibility", + "text": "Further thoughts on improving code legibility\n\ndo not let files grow too big\nbreak up logic into separate files, then you can use source(\"filename.R) to run the code in that file\nidealy, break up your logic into separate functions, each function having it’s own file, and then call those functions within your analysis\ndo not repeat yourself - if you are copying and pasting your code then you should be thinking about how to write a single function to handle this repeated logic\n\n\n\n\nview slides at the-strategy-unit.github.io/data_science/presentations" }, { - "objectID": "blogs/posts/2024-02-28_sankey_plot.html#preparing-for-plotly", - "href": "blogs/posts/2024-02-28_sankey_plot.html#preparing-for-plotly", - "title": "Visualising participant recruitment in R using Sankey plots", - "section": "Preparing for plotly", - "text": "Preparing for plotly\nPlotly expects to be fed two sets of data:\n\nNodes - these are the milestones we have in our from and to fields,\nEdges - these are the flows that occur between nodes, the flow in our table.\n\nIt is possible to extract this data by 
hand but I found using the tidygraph package was much easier and more convenient.\n\ndf_sankey <- df_flows |> \n # convert our flows data to a tidy graph object\n as_tbl_graph()\n\nThe tidygraph package splits our data into nodes and edges. We can selectively work on each by ‘activating’ them - here is the nodes list:\n\ndf_sankey |> \n activate(what = 'nodes') |> \n as_tibble() |> \n reactable(defaultPageSize = 5)\n\n\n\n\n\nYou can see each unique node name listed. The row numbers for these nodes are used as reference IDs in the edges object:\n\ndf_sankey |> \n activate(what = 'edges') |> \n as_tibble() |> \n reactable(defaultPageSize = 5)\n\n\n\n\n\nWe now have enough information to generate our Sankey.\nFirst we extract our nodes and edges to separate data frames then convert the ID values to be zero-based (starts at 0) as this is what plotly is expecting. To do this is as simple as subtracting 1 from the value of the IDs.\nFinally we pass these two dataframes to plotly’s node and link function inputs to generate the plot.\n\n\nCode\n# extract the nodes to a dataframe\nnodes <- df_sankey |> \n activate(nodes) |> \n data.frame() |> \n mutate(\n id = row_number() -1\n )\n\n# extract the edges to a dataframe\nedges <- df_sankey |> \n activate(edges) |> \n data.frame() |> \n mutate(\n from = from - 1,\n to = to - 1\n )\n\n# plot our sankey\nplot_ly(\n # setup\n type = 'sankey',\n orientation = 'h',\n arrangement = 'snap',\n \n # use our node data\n node = list(\n label = nodes$name\n ),\n \n # use our link data\n link = list(\n source = edges$from,\n target = edges$to,\n value = edges$flow\n )\n)\n\n\n\n\nOur first sankey\n\n\nNot bad!\nWe can see the structure of our Sankey now. Can you see the relative proportions of participants who did or didn’t respond to our first invite? Marvel at how those who responded to the first invite flow into our final outcome. How about those who didn’t respond to the first invitation go on to receive a second invite?\nPlotly’s charts are interactive. Try hovering your cursor over the nodes and edges to highlight them and a pop-up box will appear giving you additional details. You can reorder the vertical position of the nodes by dragging them above or below an adjacent node.\nThis looks functional." + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#what-is-testing", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#what-is-testing", + "title": "Unit testing in R", + "section": "What is testing?", + "text": "What is testing?\n\nSoftware testing is the act of examining the artifacts and the behavior of the software under test by validation and verification. Software testing can also provide an objective, independent view of the software to allow the business to appreciate and understand the risks of software implementation\nwikipedia" }, { - "objectID": "blogs/posts/2024-02-28_sankey_plot.html#styling-our-sankey", - "href": "blogs/posts/2024-02-28_sankey_plot.html#styling-our-sankey", - "title": "Visualising participant recruitment in R using Sankey plots", - "section": "Styling our Sankey", - "text": "Styling our Sankey\nNow we have the foundations of our Sankey I’d like to move on to its presentation. 
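(One small building block worth knowing before the styling code that follows: scales::rescale() maps simple ordinal positions onto the 0–1 range that plotly expects for node coordinates. A quick, self-contained illustration:)

```r
library(scales)

# think in easy ordinal terms (e.g. column 1 to 5, left to right),
# then squash onto plotly's (0, 1) coordinate range afterwards
rescale(1:5, to = c(0.001, 0.9))
#> [1] 0.00100 0.22575 0.45050 0.67525 0.90000
```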
Specifically I’d like to:\n\nuse colour coding to clearly group those who accept or decline the invite,\nimprove the readability of the node titles,\nadd additional information to the pop-up boxes when you hover over nodes and edges, and\ncontrol the positioning of the nodes in the plot.\n\nAs our nodes and edges objects are dataframes it is straightforward to add this styling information directly to them.\nFor the nodes object we define colours based on the name of each node and manually position them in the plot\n\n\nCode\n# get the eligible population as a single value\n# NB, will be used to work out % amounts in each node and edge\ntemp_eligible_pop <- df_flows |> \n filter(from == 'Eligible population') |> \n summarise(total = sum(flow, na.rm = T)) |> \n pull(total)\n\n# style our nodes object\nnodes <- nodes |> \n mutate(\n # colour ----\n # add colour definitions, green for accepted, red for declined\n colour = case_when(\n str_detect(name, 'Accepted') ~ '#44bd32',\n str_detect(name, 'Declined') ~ '#c23616',\n str_detect(name, 'No response') ~ '#7f8fa6',\n str_detect(name, 'Eligible population') ~ '#7f8fa6'\n ),\n \n # add a semi-transparent colour for the edges based on node colours\n colour_fade = col2hcl(colour = colour, alpha = 0.3),\n \n # positioning ----\n # NB, I found that to position nodes you need to supply both\n # horizontal and vertical positions\n # NNB, it was a bit of trial and error to get the these positions just\n # right\n \n # horizontal positions (0 = left, 1 = right)\n x = case_when(\n str_detect(name, 'Eligible population') ~ 1,\n str_detect(name, 'Invitation 1') ~ 2,\n str_detect(name, 'Invitation 2') ~ 3,\n str_detect(name, 'Invitation 3') ~ 4,\n .default = 5\n ) |> rescale(to = c(0.001, 0.9)),\n \n # vertical position (1 = bottom, 0 = top)\n y = case_when(\n str_detect(name, 'Eligible population') ~ 5,\n # invite 1\n str_detect(name, 'Invitation 1 Accepted') ~ 1,\n str_detect(name, 'Invitation 1 No response') ~ 5,\n str_detect(name, 'Invitation 1 Declined') ~ 8.5,\n # invite 2\n str_detect(name, 'Invitation 2 Accepted') ~ 2,\n str_detect(name, 'Invitation 2 No response') ~ 5,\n str_detect(name, 'Invitation 2 Declined') ~ 7.8,\n # invite 3\n str_detect(name, 'Invitation 3 Accepted') ~ 2.7,\n str_detect(name, 'Invitation 3 No response') ~ 5.8,\n str_detect(name, 'Invitation 3 Declined') ~ 7.2,\n # final outcomes\n str_detect(name, 'Accepted') ~ 1,\n str_detect(name, 'No response') ~ 5,\n str_detect(name, 'Declined') ~ 8,\n .default = 5\n ) |> rescale(to = c(0.001, 0.999))\n ) |> \n # add in a custom field to show the percentage flow\n left_join(\n y = df_flows |> \n group_by(to) |> \n summarise(\n flow = sum(flow, na.rm = T),\n flow_perc = percent(flow / temp_eligible_pop, accuracy = 0.1),\n ) |> \n select(name = to, flow_perc),\n by = 'name'\n )\n\n# view our nodes data\nnodes |> \n reactable(defaultPageSize = 5)\n\n\n\n\nStyling the nodes dataframe\n\n\nNext we move to styling the edges, which is a much simpler prospect:\n\n\nCode\nedges <- edges |> \n mutate(\n # add a label for each flow to tell us how many people are in each\n label = number(flow, big.mark = ','),\n # add a percentage flow figure\n flow_perc = percent(flow / temp_eligible_pop, accuracy = 0.1)\n ) |> \n # add the faded colour from our nodes object to match the destinations\n left_join(\n y = nodes |> select(to = id, colour_fade),\n by = 'to'\n )\n\n# view our edges data\nedges |> \n reactable(defaultPageSize = 5)\n\n\n\n\nStyling the edges dataframe\n\n\nWe now have stylised node and 
edge tables ready and can bring it all together. Note the use of customdata and hovertemplate help to bring in additional information and styling to the pop-up boxes that appear when you hover over each flow and node.\n\n\nCode\n# plot our stylised sankey\nplot_ly(\n # setup\n type = 'sankey',\n orientation = 'h',\n arrangement = 'snap',\n \n # use our node data\n node = list(\n label = nodes$name,\n color = nodes$colour,\n x = nodes$x,\n y = nodes$y,\n customdata = nodes$flow_perc,\n hovertemplate = '%{label}<br /><b>%{value}</b> participants<br /><b>%{customdata}</b> of eligible population'\n ),\n \n # use our edge data\n link = list(\n source = edges$from,\n target = edges$to,\n value = edges$flow,\n label = edges$label,\n color = edges$colour_fade,\n customdata = edges$flow_perc,\n hovertemplate = '%{source.label} → %{target.label}<br /><b>%{value}</b> participants<br /><b>%{customdata}</b> of eligible population'\n )\n) |> \n layout(\n font = list(\n family = 'Arial, Helvetica, sans-serif',\n size = 12\n ),\n # make the background transparent (also removes the text shadow)\n paper_bgcolor = 'rgba(0,0,0,0)'\n ) |> \n config(responsive = T)\n\n\n\n\nA stylish Sankey" + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#how-can-we-test-our-code", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#how-can-we-test-our-code", + "title": "Unit testing in R", + "section": "How can we test our code?", + "text": "How can we test our code?\n\n\nStatically\n\n\n(without executing the code)\nhappens constantly, as we are writing code\nvia code reviews\ncompilers/interpreters/linters statically analyse the code for syntax errors\n\n\n\n\n\nDynamically" }, { - "objectID": "blogs/posts/2024-01-17_nearest_neighbour.html", - "href": "blogs/posts/2024-01-17_nearest_neighbour.html", - "title": "Nearest neighbour imputation", - "section": "", - "text": "Recently I have been gathering data by GP practice, from a variety of different sources. The ultimate purpose of my project is to be able to report at an ICB/sub-ICB level1. The various datasets cover different timescales and consequently changes in GP practices over time have left me with mismatching datasets.\n1 An ICB (Integrated Care Board) is a statutory NHS organisation responsible for planning health services for their local populationsMy approach has been to take as the basis of my project a recent GP List. Later in my project I want to perform calculations at a GP practice level based on an underlying health need and the data for this need is a CHD prevalence value from a dataset that is around 8 years old, and for which there is no update or alternative. From my recent list of 6454 practices, when I match to the need dataset, I am left with 151 practices without a value for need. 
If I remove these practices from the analysis then this could impact the analysis by sub-ICB since often a group of practices in the same area could be subject to changes, mergers and reorganisation.\nHere’s the packages and some demo objects to work with to create an example for two practices:\n\n\nCode\n# Packages\nlibrary(tidyverse)\nlibrary(sf)\nlibrary(tidygeocoder)\nlibrary(leaflet)\nlibrary(viridisLite)\nlibrary(gt)\n\n# Create some data with two practices with no need data \n# and a selection of practices locally with need data\npractices <- tribble(\n ~practice_code, ~postcode, ~has_orig_need, ~value,\n \"P1\",\"CV1 4FS\", 0, NA,\n \"P2\",\"CV1 3GB\", 1, 7.3,\n \"P3\",\"CV11 5TW\", 1, 6.9,\n \"P4\",\"CV6 3HZ\", 1, 7.1,\n \"P5\",\"CV6 1HS\", 1, 7.7,\n \"P6\",\"CV6 5DF\", 1, 8.2,\n \"P7\",\"CV6 3FA\", 1, 7.9,\n \"P8\",\"CV1 2DL\", 1, 7.5,\n \"P9\",\"CV1 4JH\", 1, 7.7,\n \"P10\",\"CV10 0GQ\", 1, 7.5,\n \"P11\",\"CV10 0JH\", 1, 7.8,\n \"P12\",\"CV11 5QT\", 0, NA,\n \"P13\",\"CV11 6AB\", 1, 7.6,\n \"P14\",\"CV6 4DD\", 1,7.9\n) \n\n# get domain of numeric data\n(domain <- range(practices$has_orig_need))\n\n# make a colour palette\npal <- colorNumeric(palette = viridis(2), domain = domain)\n\n\nTo provide a suitable estimate of need for the newer practices without values, all the practices in the dataset were geocoded2 using the geocode function from the {tidygeocoder} package.\n2 Geocoding is the process of converting addresses (often the postcode) into geographic coordinates (such as latitude and longitude) that can be plotted on a map.\npractices <- practices |>\n mutate(id = row_number()) |>\n geocode(postalcode = postcode) |>\n st_as_sf(coords = c(\"long\", \"lat\"), crs = 4326)\n\n\n\nCode\npractices |>\n gt()\n\n\n\n\n\n\n\n\npractice_code\npostcode\nhas_orig_need\nvalue\nid\ngeometry\n\n\n\n\nP1\nCV1 4FS\n0\nNA\n1\nc(-1.50686326666667, 52.4141089666667)\n\n\nP2\nCV1 3GB\n1\n7.3\n2\nc(-1.51888, 52.4034199)\n\n\nP3\nCV11 5TW\n1\n6.9\n3\nc(-1.46746, 52.519)\n\n\nP4\nCV6 3HZ\n1\n7.1\n4\nc(-1.52231, 52.42367)\n\n\nP5\nCV6 1HS\n1\n7.7\n5\nc(-1.52542, 52.41989)\n\n\nP6\nCV6 5DF\n1\n8.2\n6\nc(-1.498344825, 52.4250186)\n\n\nP7\nCV6 3FA\n1\n7.9\n7\nc(-1.51787, 52.43135)\n\n\nP8\nCV1 2DL\n1\n7.5\n8\nc(-1.49105, 52.40582)\n\n\nP9\nCV1 4JH\n1\n7.7\n9\nc(-1.50653, 52.41953)\n\n\nP10\nCV10 0GQ\n1\n7.5\n10\nc(-1.52197, 52.54074)\n\n\nP11\nCV10 0JH\n1\n7.8\n11\nc(-1.5163199, 52.53723)\n\n\nP12\nCV11 5QT\n0\nNA\n12\nc(-1.46927, 52.51899)\n\n\nP13\nCV11 6AB\n1\n7.6\n13\nc(-1.45822, 52.52682)\n\n\nP14\nCV6 4DD\n1\n7.9\n14\nc(-1.50832, 52.44104)\n\n\n\n\n\n\n\nThis map shows the practices, purple are the practices with no need data and yellow are practices with need data available.\n\n\nCode\n# make map to display practices\nleaflet(practices) |> \n addTiles() |>\n addCircleMarkers(color = ~pal(has_orig_need)) \n\n\n\n\n\n\nThe data was split into those with, and without, a value for need. 
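(The predicate doing the work in the join below is sf::st_is_within_distance(); a minimal, self-contained illustration using two of the practice locations above, which sit roughly 1.4 km apart:)

```r
library(sf)

# two WGS84 (lon/lat) points; sf measures the distance in metres here
a <- st_sfc(st_point(c(-1.50686, 52.41411)), crs = 4326) # P1, CV1 4FS
b <- st_sfc(st_point(c(-1.51888, 52.40342)), crs = 4326) # P2, CV1 3GB

# sparse predicate list: element 1 contains 1, i.e. b is within 1.5 km of a
st_is_within_distance(a, b, dist = 1500)
```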
Using st_join from the {sf} package to join those without, and those with, a value for need, using the geometry to find all those within 1500m (1.5km).\n\nno_need <- practices |>\n filter(has_orig_need == 0)\n\nwith_need <- practices |>\n filter(has_orig_need == 1)\n\n\nneighbours <- no_need |>\n select(no_need_postcode = postcode,no_need_prac_code=practice_code) |>\n st_join(with_need, st_is_within_distance, 1500) |>\n st_drop_geometry() |>\n select(id, no_need_postcode,no_need_prac_code) |>\n inner_join(x = with_need, by = join_by(\"id\")) \n\n\n\nCode\nleaflet(neighbours) |> \n addTiles() |>\n addCircleMarkers(color = \"purple\") |>\n addMarkers( -1.50686326666667, 52.4141089666667, popup = \"Practice with no data\"\n) |>\n addCircles(-1.50686326666667, 52.4141089666667,radius=1500) |>\n addMarkers(-1.46927, 52.51899, popup = \"Practice with no data\"\n) |>\naddCircles(-1.46927, 52.51899,radius=1500)\n\n\n\n\n\n\nThe data for the “neighbours” was grouped by the practice code of those without need data and a mean value was calculated for each practice to generate an estimated value.\n\nneighbours_estimate <- neighbours |>\n group_by(no_need_prac_code) |>\n summarise(need_est=mean(value)) |>\n st_drop_geometry(select(no_need_prac_code,need_est)) \n\nThe original data was joined back to the “neighbours”.\n\n practices_with_neighbours_estimate <- practices |>\n left_join(neighbours_estimate, join_by(practice_code==no_need_prac_code)) |>\n st_drop_geometry(select(practice_code,need_est))\n\n\n\nCode\n practices_with_neighbours_estimate |>\n select(-has_orig_need,-id) |>\n gt()\n\n\n\n\n\n\n\n\npractice_code\npostcode\nvalue\nneed_est\n\n\n\n\nP1\nCV1 4FS\nNA\n7.583333\n\n\nP2\nCV1 3GB\n7.3\nNA\n\n\nP3\nCV11 5TW\n6.9\nNA\n\n\nP4\nCV6 3HZ\n7.1\nNA\n\n\nP5\nCV6 1HS\n7.7\nNA\n\n\nP6\nCV6 5DF\n8.2\nNA\n\n\nP7\nCV6 3FA\n7.9\nNA\n\n\nP8\nCV1 2DL\n7.5\nNA\n\n\nP9\nCV1 4JH\n7.7\nNA\n\n\nP10\nCV10 0GQ\n7.5\nNA\n\n\nP11\nCV10 0JH\n7.8\nNA\n\n\nP12\nCV11 5QT\nNA\n7.250000\n\n\nP13\nCV11 6AB\n7.6\nNA\n\n\nP14\nCV6 4DD\n7.9\nNA\n\n\n\n\n\n\n\nFinally, an updated data frame was created of the need data using the actual need for the practice where available, otherwise using estimated need.\n\npractices_with_neighbours_estimate <- practices_with_neighbours_estimate |>\n mutate(need_to_use = case_when(value>=0 ~ value,\n .default = need_est)) |>\n select(practice_code,need_to_use) \n\n\n\n\n\n\n\n\n\npractice_code\nneed_to_use\n\n\n\n\nP1\n7.583333\n\n\nP2\n7.300000\n\n\nP3\n6.900000\n\n\nP4\n7.100000\n\n\nP5\n7.700000\n\n\nP6\n8.200000\n\n\nP7\n7.900000\n\n\nP8\n7.500000\n\n\nP9\n7.700000\n\n\nP10\n7.500000\n\n\nP11\n7.800000\n\n\nP12\n7.250000\n\n\nP13\n7.600000\n\n\nP14\n7.900000\n\n\n\n\n\n\n\nFor my project, this method has successfully generated a prevalence for 125 of the 151 practices without a need value, leaving just 26 practices without a need. This is using a 1.5 km radius. In each use case there will be a decision to make regarding a more accurate estimate (smaller radius) and therefore fewer matches versus a less accurate estimate (using a larger radius) and therefore more matches.\nThis approach could be replicated for other similar uses/purposes. A topical example from an SU project is the need to assign population prevalence for hypertension and compare it to current QOF3 data. 
Again, the prevalence data is a few years old so we have to move the historical data to fit with current practices and this leaves missing data that can be estimated using this method.\n\n\n3 QOF (Quality and Outcomes Framework) is a voluntary annual reward and incentive programme for all GP practices in England, detailing practice achievement results." + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#how-can-we-test-our-code-1", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#how-can-we-test-our-code-1", + "title": "Unit testing in R", + "section": "How can we test our code?", + "text": "How can we test our code?\n\n\nStatically\n\n(without executing the code)\nhappens constantly, as we are writing code\nvia code reviews\ncompilers/interpreters/linters statically analyse the code for syntax errors\n\n\n\n\nDynamically\n\n\n(by executing the code)\nsplit into functional and non-functional testing\ntesting can be manual, or automated\n\n\n\n\n\nnon-functional testing covers things like performance, security, and usability testing" }, { - "objectID": "blogs/posts/2024-05-22-storing-data-safely/index.html", - "href": "blogs/posts/2024-05-22-storing-data-safely/index.html", - "title": "Storing data safely", - "section": "", - "text": "In a recent Coffee & Coding session we chatted about storing data safely for use in Reproducible Analytical Pipelines (RAP), and the slides from the presentation are now available. We discussed the use of Posit Connect Pins and Azure Storage.\nIn order to avoid duplication, this blog post will not cover the pros and cons of each approach, and will instead focus on documenting the code that was used in our live demonstrations. I would recommend that you look through the slides before using the code in this blogpost and have them alongside, as they provide lots of useful context!" + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#different-types-of-functional-tests", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#different-types-of-functional-tests", + "title": "Unit testing in R", + "section": "Different types of functional tests", + "text": "Different types of functional tests\nUnit Testing checks each component (or unit) for accuracy independently of one another.\n\nIntegration Testing integrates units to ensure that the code works together.\n\n\nEnd-to-End Testing (e2e) makes sure that the entire system functions correctly.\n\n\nUser Acceptance Testing (UAT) ensures that the product meets the real user’s requirements." }, { - "objectID": "blogs/posts/2024-05-22-storing-data-safely/index.html#coffee-coding", - "href": "blogs/posts/2024-05-22-storing-data-safely/index.html#coffee-coding", - "title": "Storing data safely", - "section": "", - "text": "In a recent Coffee & Coding session we chatted about storing data safely for use in Reproducible Analytical Pipelines (RAP), and the slides from the presentation are now available. We discussed the use of Posit Connect Pins and Azure Storage.\nIn order to avoid duplication, this blog post will not cover the pros and cons of each approach, and will instead focus on documenting the code that was used in our live demonstrations. I would recommend that you look through the slides before using the code in this blogpost and have them alongside, as they provide lots of useful context!" 
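Both demos that follow share the same setup pattern: configuration comes from environment variables, never from values hard-coded in scripts. A minimal sketch of a guard we find useful (the helper name is ours, not from the session):

```r
# fetch a required environment variable, failing early with a clear
# message if it has not been set in .Renviron
require_env <- function(name) {
  value <- Sys.getenv(name)
  if (!nzchar(value)) {
    stop("Environment variable '", name, "' is not set", call. = FALSE)
  }
  value
}

ep_uri <- require_env("AZ_STORAGE_EP") # errors loudly if missing
```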
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#different-types-of-functional-tests-1", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#different-types-of-functional-tests-1", + "title": "Unit testing in R", + "section": "Different types of functional tests", + "text": "Different types of functional tests\nUnit Testing checks each component (or unit) for accuracy independently of one another.\nIntegration Testing integrates units to ensure that the code works together.\nEnd-to-End Testing (e2e) makes sure that the entire system functions correctly.\n\nUser Acceptance Testing (UAT) ensures that the product meets the real user’s requirements.\n\n\nUnit, Integration, and E2E testing are all things we can automate in code, whereas UAT testing is going to be manual" }, { - "objectID": "blogs/posts/2024-05-22-storing-data-safely/index.html#posit-connect-pins", - "href": "blogs/posts/2024-05-22-storing-data-safely/index.html#posit-connect-pins", - "title": "Storing data safely", - "section": "Posit Connect Pins", - "text": "Posit Connect Pins\n\n# A brief intro to using {pins} to store, version, share and protect a dataset\n# on Posit Connect. Documentation: https://pins.rstudio.com/\n\n\n# Setup -------------------------------------------------------------------\n\n\ninstall.packages(c(\"pins\",\"dplyr\")) # if not yet installed\n\nsuppressPackageStartupMessages({\n library(pins)\n library(dplyr) # for wrangling and the 'starwars' demo dataset\n})\n\nboard <- board_connect() # will error if you haven't authenticated before\n# Error in `check_auth()`: ! auth = `auto` has failed to find a way to authenticate:\n# • `server` and `key` not provided for `auth = 'manual'`\n# • Can't find CONNECT_SERVER and CONNECT_API_KEY envvars for `auth = 'envvar'`\n# • rsconnect package not installed for `auth = 'rsconnect'`\n# Run `rlang::last_trace()` to see where the error occurred.\n\n# To authenticate\n# In RStudio: Tools > Global Options > Publishing > Connect... > Posit Connect\n# public URL of the Strategy Unit Posit Connect Server: connect.strategyunitwm.nhs.uk\n# Your browser will open to the Posit Connect web page and you're prompted to\n# for your password. Enter it and you'll be authenticated.\n\n# Once authenticated\nboard <- board_connect()\n# Connecting to Posit Connect 2024.03.0 at\n# <https://connect.strategyunitwm.nhs.uk>\n\nboard |> pin_list() # see all the pins on that board\n\n\n# Create a pin ------------------------------------------------------------\n\n\n# Write a dataset to the board as a pin\nboard |> pin_write(\n x = starwars,\n name = \"starwars_demo\"\n)\n# Guessing `type = 'rds'`\n# Writing to pin 'matt.dray/starwars_demo'\n\nboard |> pin_exists(\"starwars_demo\")\n# ! Use a fully specified name including user name: \"matt.dray/starwars_demo\",\n# not \"starwars_demo\".\n# [1] TRUE\n\npin_name <- \"matt.dray/starwars_demo\"\n\nboard |> pin_exists(pin_name) # logical, TRUE/FALSE\nboard |> pin_meta(pin_name) # metadata, see also 'metadata' arg in pin_write()\nboard |> pin_browse(pin_name) # view the pin in the browser\n\n\n# Permissions -------------------------------------------------------------\n\n\n# You can let people see and edit a pin. Log into Posit Connect and select the\n# pin under 'Content'. 
In the 'Settings' panel on the right-hand side, adjust\n# the 'sharing' options in the 'Access' tab.\n\n\n# Overwrite and version ---------------------------------------------------\n\n\nstarwars_droids <- starwars |>\n filter(species == \"Droid\") # beep boop\n\nboard |> pin_write(\n starwars_droids,\n pin_name,\n type = \"rds\"\n)\n# Writing to pin 'matt.dray/starwars_demo'\n\nboard |> pin_versions(pin_name) # see version history\nboard |> pin_versions_prune(pin_name, n = 1) # remove history\nboard |> pin_versions(pin_name)\n\n# What if you try to overwrite the data but it hasn't changed?\nboard |> pin_write(\n starwars_droids,\n pin_name,\n type = \"rds\"\n)\n# ! The hash of pin \"matt.dray/starwars_demo\" has not changed.\n# • Your pin will not be stored.\n\n\n# Use the pin -------------------------------------------------------------\n\n\n# You can read a pin to your local machine, or access it from a Quarto file\n# or Shiny app hosted on Connect, for example. If the output and the pin are\n# both on Connect, no authentication is required; the board is defaulted to\n# the Posit Connect instance where they're both hosted.\n\nboard |>\n pin_read(pin_name) |> # like you would use e.g. read_csv\n with(data = _, plot(mass, height)) # wow!\n\n\n# Delete pin --------------------------------------------------------------\n\n\nboard |> pin_exists(pin_name) # logical, good function for error handling\nboard |> pin_delete(pin_name)\nboard |> pin_exists(pin_name)" + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#different-types-of-functional-tests-2", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#different-types-of-functional-tests-2", + "title": "Unit testing in R", + "section": "Different types of functional tests", + "text": "Different types of functional tests\nUnit Testing checks each component (or unit) for accuracy independently of one another.\n\nIntegration Testing integrates units to ensure that the code works together.\nEnd-to-End Testing (e2e) makes sure that the entire system functions correctly.\nUser Acceptance Testing (UAT) ensures that the product meets the real user’s requirements.\n\n\nOnly focussing on unit testing in this talk, but the techniques/packages could be extended to integration testing. Often other tools (potentially specific tools) are needed for E2E testing." }, { - "objectID": "blogs/posts/2024-05-22-storing-data-safely/index.html#azure-storage-in-r", - "href": "blogs/posts/2024-05-22-storing-data-safely/index.html#azure-storage-in-r", - "title": "Storing data safely", - "section": "Azure Storage in R", - "text": "Azure Storage in R\nYou will need an .Renviron file with the four environment variables listed below for the code to work. This .Renviron file should be ignored by git. 
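A quick way to set this up (a sketch assuming the {usethis} package is installed; the project-level scope is our choice):

```r
# make sure git never tracks the secrets file
usethis::use_git_ignore(".Renviron")

# open the project-level .Renviron to paste the variables into
usethis::edit_r_environ(scope = "project")
```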
You can share the contents of .Renviron files with other team members via Teams, email, or Sharepoint.\nBelow is a sample .Renviron file\nAZ_STORAGE_EP=https://STORAGEACCOUNT.blob.core.windows.net/\nAZ_STORAGE_CONTAINER=container-name\nAZ_TENANT_ID=long-sequence-of-numbers-and-letters\nAZ_APP_ID=another-long-sequence-of-numbers-and-letters\n\ninstall.packages(c(\"AzureAuth\",\"AzureStor\", \"arrow\")) # if not yet installed\n\n# Load all environment variables\nep_uri <- Sys.getenv(\"AZ_STORAGE_EP\")\napp_id <- Sys.getenv(\"AZ_APP_ID\")\ncontainer_name <- Sys.getenv(\"AZ_STORAGE_CONTAINER\")\ntenant <- Sys.getenv(\"AZ_TENANT_ID\")\n\n# Authenticate\ntoken <- AzureAuth::get_azure_token(\n \"https://storage.azure.com\",\n tenant = tenant,\n app = app_id,\n auth_type = \"device_code\",\n)\n\n# If you have not authenticated before, you will be taken to an external page to\n# authenticate!Use your mlcsu.nhs.uk account.\n\n# Connect to container\nendpoint <- AzureStor::blob_endpoint(ep_uri, token = token)\ncontainer <- AzureStor::storage_container(endpoint, container_name)\n\n# List files in container\nblob_list <- AzureStor::list_blobs(container)\n\n# If you get a 403 error when trying to interact with the container, you may \n# have to clear your Azure token and re-authenticate using a different browser.\n# Use AzureAuth::clean_token_directory() to clear your token, then repeat the\n# AzureAuth::get_azure_token() step above.\n\n# Upload specific file to container\nAzureStor::storage_upload(container, \"data/ronald.jpeg\", \"newdir/ronald.jpeg\")\n\n# Upload contents of a local directory to container\nAzureStor::storage_multiupload(container, \"data/*\", \"newdir\")\n\n# Check files have uploaded\nblob_list <- AzureStor::list_blobs(container)\n\n# Load file directly from Azure container\ndf_from_azure <- AzureStor::storage_read_csv(\n container,\n \"newdir/cats.csv\",\n show_col_types = FALSE\n)\n\n# Load file directly from Azure container (by temporarily downloading file \n# and storing it in memory)\nparquet_in_memory <- AzureStor::storage_download(\n container, src = \"newdir/cats.parquet\", dest = NULL\n)\nparq_df <- arrow::read_parquet(parquet_in_memory)\n\n# Delete from Azure container (!!!)\nfor (blobfile in blob_list$name) {\n AzureStor::delete_storage_file(container, blobfile)\n}" + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#example", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#example", + "title": "Unit testing in R", + "section": "Example", + "text": "Example\nWe have a {shiny} app which grabs some data from a database, manipulates the data, and generates a plot.\n\n\nwe would write unit tests to check the data manipulation and plot functions work correctly (with pre-created sample/simple datasets)\nwe would write integration tests to check that the data manipulation function works with the plot function (with similar data to what we used for the unit tests)\nwe would write e2e tests to ensure that from start to finish the app grabs the data and produces a plot as required\n\n\n\nsimple (unit tests) to complex (e2e tests)" }, { - "objectID": "blogs/posts/2024-05-22-storing-data-safely/index.html#azure-storage-in-python", - "href": "blogs/posts/2024-05-22-storing-data-safely/index.html#azure-storage-in-python", - "title": "Storing data safely", - "section": "Azure Storage in Python", - "text": "Azure Storage in Python\nThis will use the same environment variables as the R version, just stored in a .env file instead.\nWe didn’t cover this 
in the presentation, so it’s not in the slides, but the code should be self-explanatory.\n\n\nimport os\nimport io\nimport pandas as pd\nfrom dotenv import load_dotenv\nfrom azure.identity import DefaultAzureCredential\nfrom azure.storage.blob import ContainerClient\n\n\n# Load all environment variables\nload_dotenv()\naccount_url = os.getenv('AZ_STORAGE_EP')\ncontainer_name = os.getenv('AZ_STORAGE_CONTAINER')\n\n\n# Authenticate\ndefault_credential = DefaultAzureCredential()\n\nFor the first time, you might need to authenticate via the Azure CLI\nDownload it from https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-windows?tabs=azure-cli\nInstall then run az login in your terminal. Once you have logged in with your browser try the DefaultAzureCredential() again!\n\n# Connect to container\ncontainer_client = ContainerClient(account_url, container_name, default_credential)\n\n\n# List files in container - should be empty\nblob_list = container_client.list_blob_names()\nfor blob in blob_list:\n if blob.startswith('newdir'):\n print(blob)\n\nnewdir/cats.parquet\nnewdir/ronald.jpeg\n\n\n\n# Upload file to container\nwith open(file='data/cats.csv', mode=\"rb\") as data:\n blob_client = container_client.upload_blob(name='newdir/cats.csv', \n data=data, \n overwrite=True)\n\n\n# # Check files have uploaded - List files in container again\nblob_list = container_client.list_blobs()\nfor blob in blob_list:\n if blob['name'].startswith('newdir'):\n print(blob['name'])\n\nnewdir/cats.csv\nnewdir/cats.parquet\nnewdir/ronald.jpeg\n\n\n\n# Download file from Azure container to temporary filepath\n\n# Connect to blob\nblob_client = container_client.get_blob_client('newdir/cats.csv')\n\n# Write to local file from blob\ntemp_filepath = os.path.join('temp_data', 'cats.csv')\nwith open(file=temp_filepath, mode=\"wb\") as sample_blob:\n download_stream = blob_client.download_blob()\n sample_blob.write(download_stream.readall())\ncat_data = pd.read_csv(temp_filepath)\ncat_data.head()\n\n\n\n\n\n\n\n\nName\nPhysical_characteristics\nBehaviour\n\n\n\n\n0\nRonald\nWhite and ginger\nLazy and greedy but undoubtedly cutest and best\n\n\n1\nKaspie\nSmall calico\nSweet and very shy but adventurous\n\n\n2\nHennimore\nPale orange\nUnhinged and always in a state of panic\n\n\n3\nThug cat\nBlack and white - very large\nLocal bully\n\n\n4\nSon of Stripey\nGrey tabby\nVery vocal\n\n\n\n\n\n\n\n\n# Load directly from Azure - no local copy\n\ndownload_stream = blob_client.download_blob()\nstream_object = io.BytesIO(download_stream.readall())\ncat_data = pd.read_csv(stream_object)\ncat_data\n\n\n\n\n\n\n\n\nName\nPhysical_characteristics\nBehaviour\n\n\n\n\n0\nRonald\nWhite and ginger\nLazy and greedy but undoubtedly cutest and best\n\n\n1\nKaspie\nSmall calico\nSweet and very shy but adventurous\n\n\n2\nHennimore\nPale orange\nUnhinged and always in a state of panic\n\n\n3\nThug cat\nBlack and white - very large\nLocal bully\n\n\n4\nSon of Stripey\nGrey tabby\nVery vocal\n\n\n\n\n\n\n\n\n# !!!!!!!!! 
Delete from Azure container !!!!!!!!!\nblob_client = container_client.get_blob_client('newdir/cats.csv')\nblob_client.delete_blob()\n\n\nblob_list = container_client.list_blobs()\nfor blob in blob_list:\n if blob['name'].startswith('newdir'):\n print(blob['name'])\n\nnewdir/cats.parquet\nnewdir/ronald.jpeg" + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#testing-pyramid", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#testing-pyramid", + "title": "Unit testing in R", + "section": "Testing Pyramid", + "text": "Testing Pyramid\n\n\nImage source: The Testing Pyramid: Simplified for One and All headspin.io" }, { - "objectID": "blogs/posts/2024-05-13_one-year-coffee-code.html", - "href": "blogs/posts/2024-05-13_one-year-coffee-code.html", - "title": "One year of coffee & coding", - "section": "", - "text": "The data science team have been running coffee & coding sessions for just over a year now. When I joined that Strategy Unit, I was really pleased to see these sessions running as I think making time to discuss and share technical knowledge is highly valuable, especially as an organisation grows.\nCoffee and coding sessions run every two weeks and usually take the form of a short presentation, followed by a discussion. Although we have had a variety of different sessions including live coding demos and show and tell for projects.\nWe figured it would be a good idea to do a quick survey of attendees to make sure that the sessions were beneficial and see if there were any suggestions for future sessions. We had 11 responses, all of which were really positive, with 90% agreeing that the sessions are interesting, and over 80% saying that they learn new things. Respondents said that the sessions were well varied across the technical spectrum and that they “almost always learn something useful”.\nThe two main themes of the results were that sessions were inclusive and sparked collaboration. ✨\n\nI like that everyone can contribute\n\n\nIt’s great seeing what else people are doing\n\n\nI get more ideas for future projects\n\nSome of the main suggestions included more content for newer programmers and encouraging the wider analytical team to share real project examples.\nSo with that, why not consider presenting? The sessions are informal and everyone is welcome to contribute. If you’ve got something to share, please let a member of the data science team know.\nAs a reminder, materials for our previous sessions are available under Presentations." + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-create-a-simple-function", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-create-a-simple-function", + "title": "Unit testing in R", + "section": "Let’s create a simple function…", + "text": "Let’s create a simple function…\n\nmy_function <- function(x, y) {\n \n stopifnot(\n \"x must be numeric\" = is.numeric(x),\n \"y must be numeric\" = is.numeric(y),\n \"x must be same length as y\" = length(x) == length(y),\n \"cannot divide by zero!\" = y != 0\n )\n\n x / y\n}" }, { - "objectID": "blogs/posts/2023-03-24_hotfix-with-git.html", - "href": "blogs/posts/2023-03-24_hotfix-with-git.html", - "title": "Creating a hotfix with git", - "section": "", - "text": "I recently discovered a bug in a code-base which needed to be fixed and deployed back to production A.S.A.P., but since the last release the code has moved on significantly. 
The history looks something like this:\nThat is, we have a tag which is the code that is currently in production (which we need to patch), a number of commits after that tag to main (which were separate branches merged via pull requests), and a current development branch.\nI need to somehow: 1. go back to the tagged release, 2. check that code out, 3. patch that code, 4. commit this change, but insert the commit before all of the new commits after the tag.\nThere are at least two ways that I know to do this. One would be an interactive rebase, but I used a slightly longer method, one I feel is a little less likely to go wrong.\nBelow are the steps that I took. One thing I should note is that this worked well for my particular issue because the change didn't cause any merge conflicts later on."
  },
  {
    "objectID": "blogs/posts/2023-03-24_hotfix-with-git.html#fixing-my-codebase",
    "href": "blogs/posts/2023-03-24_hotfix-with-git.html#fixing-my-codebase",
    "title": "Creating a hotfix with git",
    "section": "Fixing my codebase",
    "text": "Fixing my codebase\nFirst, we need to check out the tag\ngit checkout -b hotfix v0.2.0\nThis creates a new branch called hotfix off of the tag v0.2.0.\nNow that I have the code base checked out at the point I need to fix, I can make the change that is needed and commit it\ngit add [FILENAME]\ngit commit -m \"fixes the code\"\n(Obviously, I used the actual file name and gave a better commit message. I promise 😝)\nNow that my code is fixed, I create a new tag for this “release”, as well as push the code to production (this step is omitted here)\ngit tag v0.2.1 -m \"version 0.2.1\"\nAt this point, our history looks something like\nWhat we want to do is break the link between main and v0.2.0, instead attaching to v0.2.1. First though, I want to make sure that if I make a mistake, I'm not making it on the main branch.\ngit checkout main\ngit checkout -b apply-hotfix\nThen we can fix our history using the rebase command\ngit rebase hotfix\nWhat this does is roll back to the point where the branch that we are rebasing (apply-hotfix) and the hotfix branch share a common commit (the v0.2.0 tag). It then applies the commits in the hotfix branch, before reapplying the commits from apply-hotfix (a.k.a. the main branch).\nOne thing to note: if your fix creates any merge conflicts, then the rebase will stop and ask you to resolve them. There is some information in the GitHub docs on resolving merge conflicts after a Git rebase: https://docs.github.com/en/get-started/using-git/resolving-merge-conflicts-after-a-git-rebase\nAt this point, we can check that the commit history looks correct\ngit log v0.2.0..HEAD\nIf we are happy, then we can apply this to the main branch. I do this by renaming the apply-hotfix branch as main. 
First, you have to delete the main branch to allow us to rename the branch.\ngit branch -D main\ngit branch -m main\nWe also need to update the other branches to use the new main branch\ngit checkout branch\ngit rebase main\nNow, we should have a history like" + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-create-a-simple-function-2", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-create-a-simple-function-2", + "title": "Unit testing in R", + "section": "Let’s create a simple function…", + "text": "Let’s create a simple function…\n\nmy_function <- function(x, y) {\n \n stopifnot(\n \"x must be numeric\" = is.numeric(x),\n \"y must be numeric\" = is.numeric(y),\n \"x must be same length as y\" = length(x) == length(y),\n \"cannot divide by zero!\" = y != 0\n )\n\n x / y\n}\n\n\nThe Ten Rules of Defensive Programming in R" }, { - "objectID": "about.html", - "href": "about.html", - "title": "About", - "section": "", - "text": "The Data Science team at the Strategy Unit comprises the following team members:\n\nChris Beeley\nMatt Dray\nOzayr Mohammed\nRhian Davies\nTom Jemmett\nYiWen Hon\n\nCurrent and previous projects of note include:\n\nWork supporting the New Hospitals Programme, including building a model for predicting the demand and capacity requirements of hospitals in the future, and a tool for mapping the evidence on this topic.\nThe Patient Experience Qualitative Data Categorisation project\nWork supporting the wider analytical community, through events/communities such as NHS-R and HACA." + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test", + "title": "Unit testing in R", + "section": "… and create our first test", + "text": "… and create our first test\n\ntest_that(\"my_function correctly divides values\", {\n expect_equal(\n my_function(4, 2),\n 2\n )\n expect_equal(\n my_function(1, 4),\n 0.25\n )\n expect_equal(\n my_function(c(4, 1), c(2, 4)),\n c(2, 0.25)\n )\n})" }, { - "objectID": "blogs/index.html", - "href": "blogs/index.html", - "title": "Data Science Blog", - "section": "", - "text": "Storing data safely\n\n\n\n\n\n\nlearning\n\n\nR\n\n\nPython\n\n\n\n\n\n\n\n\n\nMay 22, 2024\n\n\nYiWen Hon, Matt Dray\n\n\n\n\n\n\n\n\n\n\n\n\nOne year of coffee & coding\n\n\n\n\n\n\nlearning\n\n\n\n\n\n\n\n\n\nMay 13, 2024\n\n\nRhian Davies\n\n\n\n\n\n\n\n\n\n\n\n\nRStudio Tips and Tricks\n\n\n\n\n\n\nlearning\n\n\nR\n\n\n\n\n\n\n\n\n\nMar 21, 2024\n\n\nMatt Dray\n\n\n\n\n\n\n\n\n\n\n\n\nVisualising participant recruitment in R using Sankey plots\n\n\n\n\n\n\nlearning\n\n\ntutorial\n\n\nvisualisation\n\n\nR\n\n\n\n\n\n\n\n\n\nFeb 28, 2024\n\n\nCraig Parylo\n\n\n\n\n\n\n\n\n\n\n\n\nNearest neighbour imputation\n\n\n\n\n\n\nlearning\n\n\n\n\n\n\n\n\n\nJan 17, 2024\n\n\nJacqueline Grout\n\n\n\n\n\n\n\n\n\n\n\n\nAdvent of Code and Test Driven Development\n\n\n\n\n\n\nlearning\n\n\n\n\n\n\n\n\n\nJan 10, 2024\n\n\nYiWen Hon\n\n\n\n\n\n\n\n\n\n\n\n\nReinstalling R Packages\n\n\n\n\n\n\ngit\n\n\ntutorial\n\n\n\n\n\n\n\n\n\nApr 26, 2023\n\n\nTom Jemmett\n\n\n\n\n\n\n\n\n\n\n\n\nAlternative remote repositories\n\n\n\n\n\n\ngit\n\n\ntutorial\n\n\n\n\n\n\n\n\n\nApr 26, 2023\n\n\nTom Jemmett\n\n\n\n\n\n\n\n\n\n\n\n\nCreating a hotfix with git\n\n\n\n\n\n\ngit\n\n\ntutorial\n\n\n\n\n\n\n\n\n\nMar 24, 2023\n\n\nTom Jemmett\n\n\n\n\n\n\nNo matching items" + "objectID": 
"presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-1", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-1", + "title": "Unit testing in R", + "section": "… and create our first test", + "text": "… and create our first test\n\ntest_that(\"my_function correctly divides values\", {\n expect_equal(\n my_function(4, 2),\n 2\n )\n expect_equal(\n my_function(1, 4),\n 0.25\n )\n expect_equal(\n my_function(c(4, 1), c(2, 4)),\n c(2, 0.25)\n )\n})" }, { - "objectID": "blogs/posts/2023-04-26-reinstalling-r-packages.html", - "href": "blogs/posts/2023-04-26-reinstalling-r-packages.html", - "title": "Reinstalling R Packages", - "section": "", - "text": "R 4.3.0 was released last week. Anytime you update R you will probably find yourself in the position where no packages are installed. This is by design - the packages that you have installed may need to be updated and recompiled to work under new versions of R.\nYou may find yourself wanting to have all of the packages that you previously used, so one approach that some people take is to copy the previous library folder to the new versions folder. This isn’t a good idea and could potentially break your R install.\nAnother approach would be to export the list of packages in R before updating and then using that list after you have updated R. This can cause issues though if you install from places other than CRAN, e.g. bioconductor, or from GitHub.\nSome of these approaches are discussed on the RStudio Community Forum. But I prefer an approach of having a “spring clean”, instead only installing the packages that I know that I need.\nI maintain a list of the packages that I used as a gist. Using this, I can then simply run this script on any new R install. In fact, if you click the “raw” button on the gist, and copy that url, you can simply run\nsource(\"https://gist.githubusercontent.com/tomjemmett/c105d3e0fbea7558088f68c65e68e1ed/raw/a1db4b5fa0d24562d16d3f57fe8c25fb0d8aa53e/setup.R\")\nGenerally, sourcing a url is a bad idea - the reason for this is if it’s not a link that you control, then someone could update the contents and run arbritary code on your machine. In this case, I’m happy to run this as it’s my own gist, but you should be mindful if running it yourself!\nIf you look at the script I first install a number of packages from CRAN, then I install packages that only exist on GitHub." + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-2", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-2", + "title": "Unit testing in R", + "section": "… and create our first test", + "text": "… and create our first test\n\ntest_that(\"my_function correctly divides values\", {\n expect_equal(\n my_function(4, 2),\n 2\n )\n expect_equal(\n my_function(1, 4),\n 0.25\n )\n expect_equal(\n my_function(c(4, 1), c(2, 4)),\n c(2, 0.25)\n )\n})" }, { - "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html", - "href": "blogs/posts/2023-03-21-rstudio-tips/index.html", - "title": "RStudio Tips and Tricks", - "section": "", - "text": "In a recent Coffee & Coding session we chatted about tips and tricks for RStudio, the popular and free Integrated Development Environment (IDE) that many Strategy Unit analysts use to write R code.\nRStudio has lots of neat features but many are tucked away in submenus. 
This session was a chance for the community to uncover and discuss some hidden gems to make our work easier and faster." + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-3", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-3", + "title": "Unit testing in R", + "section": "… and create our first test", + "text": "… and create our first test\n\ntest_that(\"my_function correctly divides values\", {\n expect_equal(\n my_function(4, 2),\n 2\n )\n expect_equal(\n my_function(1, 4),\n 0.25\n )\n expect_equal(\n my_function(c(4, 1), c(2, 4)),\n c(2, 0.25)\n )\n})" }, { - "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#coffee-coding", - "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#coffee-coding", - "title": "RStudio Tips and Tricks", - "section": "", - "text": "In a recent Coffee & Coding session we chatted about tips and tricks for RStudio, the popular and free Integrated Development Environment (IDE) that many Strategy Unit analysts use to write R code.\nRStudio has lots of neat features but many are tucked away in submenus. This session was a chance for the community to uncover and discuss some hidden gems to make our work easier and faster." + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-4", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-4", + "title": "Unit testing in R", + "section": "… and create our first test", + "text": "… and create our first test\n\ntest_that(\"my_function correctly divides values\", {\n expect_equal(\n my_function(4, 2),\n 2\n )\n expect_equal(\n my_function(1, 4),\n 0.25\n )\n expect_equal(\n my_function(c(4, 1), c(2, 4)),\n c(2, 0.25)\n )\n})" }, { - "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#official-guidance", - "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#official-guidance", - "title": "RStudio Tips and Tricks", - "section": "Official guidance", - "text": "Official guidance\nPosit is the company who build and maintain RStudio. They host a number of cheatsheets on their website, including one for RStudio. They also have a more in-depth user guide." + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-5", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-5", + "title": "Unit testing in R", + "section": "… and create our first test", + "text": "… and create our first test\n\ntest_that(\"my_function correctly divides values\", {\n expect_equal(\n my_function(4, 2),\n 2\n )\n expect_equal(\n my_function(1, 4),\n 0.25\n )\n expect_equal(\n my_function(c(4, 1), c(2, 4)),\n c(2, 0.25)\n )\n})\n\nTest passed 😸" }, { - "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#command-palette", - "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#command-palette", - "title": "RStudio Tips and Tricks", - "section": "Command palette", - "text": "Command palette\nRStudio has a powerful built-in Command Palette, which is a special search box that gives instant access to features and settings without needing to find them in the menus. Many of the tips and tricks we discussed can be found by searching in the Palette. Open it with the keyboard shortcut Ctrl + Shift + P.\n\n\n\nOpening the Command Palette.\n\n\nFor example, let’s say you forgot how to restart R. 
If you open the Command Palette and start typing ‘restart’, you’ll see the option ‘Restart R Session’. Clicking it will do exactly that. Handily, the Palette also displays the keyboard shortcut (Control + Shift + F10 on Windows) as a reminder.\nAs for settings, a search for ‘rainbow’ in the Command Palette will find ‘Use rainbow parentheses’, an option to help prevent bracket-mismatch errors by colouring pairs of parentheses. What’s nice is that the checkbox to toggle the feature appears right there in the palette so you can change it immediately.\nI refer to menu paths and keyboard shortcuts in the rest of this post, but bear in mind that you can use the Command Palette instead." + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#other-expect_-functions", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#other-expect_-functions", + "title": "Unit testing in R", + "section": "other expect_*() functions…", + "text": "other expect_*() functions…\n\ntest_that(\"my_function correctly divides values\", {\n expect_lt(\n my_function(4, 2),\n 10\n )\n expect_gt(\n my_function(1, 4),\n 0.2\n )\n expect_length(\n my_function(c(4, 1), c(2, 4)),\n 2\n )\n})\n\nTest passed 🎉\n\n\n\n{testthat} documentation" }, { - "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#options", - "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#options", - "title": "RStudio Tips and Tricks", - "section": "Options", - "text": "Options\nIn general, most settings can be found under Tools > Global Options… and many of these are discussed in the rest of this post.\n\n\n\nAdjusting workspace and history settings.\n\n\nBut there’s a few settings in particular that we recommend you change to help maximise reproducibility and reduce the chance of confusion. Under General > Basic, uncheck ‘Restore .Rdata into workspace at startup’ and select ‘Never’ from the dropdown options next to ‘Save workspace to .Rdata on exit’. These options mean you start with the ‘blank slate’ of an empty environment when you open a project, allowing you to rebuild objects from scratch." }, { + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert", + "title": "Unit testing in R", + "section": "Arrange, Act, Assert", + "text": "Arrange, Act, Assert\n\n\n\n\n\ntest_that(\"my_function works\", {\n # arrange\n # \n #\n #\n\n # act\n #\n\n # assert\n #\n})" }, { - "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#keyboard-shortcuts", - "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#keyboard-shortcuts", - "title": "RStudio Tips and Tricks", - "section": "Keyboard shortcuts", - "text": "Keyboard shortcuts\nYou can speed up day-to-day coding with keyboard shortcuts instead of clicking buttons in the interface.\nYou can see some available shortcuts in RStudio if you navigate to Help > Keyboard Shortcuts Help, or use the shortcut Alt + Shift + K (how meta). You can go to Help > Modify Keyboard Shortcuts… to search all shortcuts and change them to what you prefer.\nWe discussed a number of handy shortcuts that we use frequently. 
You can:\n\nre-indent lines to the appropriate depth with Control + I\nreformat code with Control + Shift + A\nturn one or more lines into a comment with Control + Shift + C\ninsert the pipe operator (%>% or |>) with Control + Shift + M\ninsert the assignment arrow (<-) with Alt + - (hyphen)\nhighlight a function in the script or console and press F1 to open the function documentation in the ‘Help’ pane\nuse ‘Find in Files’ to search for a particular variable, function or string across all the files in your project, with Control + Shift + F" + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert-1", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert-1", + "title": "Unit testing in R", + "section": "Arrange, Act, Assert", + "text": "Arrange, Act, Assert\n\n\nwe arrange the environment, before running the function\n\n\nto create sample values\ncreate fake/temporary files\nset random seed\nset R options/environment variables\n\n\n\n\ntest_that(\"my_function works\", {\n # arrange\n x <- 5\n y <- 7\n expected <- 0.714285\n\n # act\n #\n\n # assert\n #\n})" }, { - "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#themes", - "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#themes", - "title": "RStudio Tips and Tricks", - "section": "Themes", - "text": "Themes\nYou can change a number of settings to alter RStudio’s theme, colours and fonts to whatever you desire.\nYou can change the default theme in Tools > Global Options… > Appearance > Editor theme and select one from the pre-installed list. You can upload new themes by clicking the ‘Add’ button and selecting a theme from your computer. They typically have the file extension .rsthemes and can be downloaded from the web, or you can create or tweak one yourself. The {rsthemes} package has a number of options and also allows you to switch between themes and automatically switch between light and dark themes depending on the time of day.\n\n\n\nCustomising the appearance and font.\n\n\nIn the same ‘Appearance’ submenu as the theme settings, you can find an option to change fonts. Monospace fonts, ones where each character takes up the same width, will appear here automatically if you’ve installed them on your computer. One popular font for coding is Fira Code, which has the special property of converting certain sets of characters into ‘ligatures’, which some people find easier to read. For example, the base pipe will appear as a rightward-pointing arrow rather than its constituent vertical-pipe and greater-than symbol (|>)." }, { + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert-2", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert-2", + "title": "Unit testing in R", + "section": "Arrange, Act, Assert", + "text": "Arrange, Act, Assert\n\n\nwe arrange the environment, before running the function\nwe act by calling the function\n\n\ntest_that(\"my_function works\", {\n # arrange\n x <- 5\n y <- 7\n expected <- 0.714285\n\n # act\n actual <- my_function(x, y)\n\n # assert\n #\n})" }, { - "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#panes", - "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#panes", - "title": "RStudio Tips and Tricks", - "section": "Panes", - "text": "Panes\n\nLayout\nThe structural layout of RStudio’s panes can be adjusted. One simple thing you can do is minimise and maximise each pane by clicking the window icons in their upper-right corners. 
This is useful when you want more screen real-estate for a particular pane.\nYou can move pane locations too. Click the ‘Workspace Panes’ button (a square with four more inside it) at the top of the IDE to see a number of settings. For example, you can select ‘Console on the right’ to move the R console to the upper-right pane, which you may prefer for maximising the vertical space in which code is shown. You could also click Pane Layout… in this menu to be taken to Tools > Global Options… > Pane layout, where you can click ‘Add Column’ to insert new script panes that allow you to inspect and write multiple files side-by-side.\n\n\nScript navigation\nThe script pane in particular has a nice feature for navigating through sections of your script or Quarto/R Markdown files. Click the ‘Show Document Outline’ button or use the keyboard shortcut Control + Shift + O to slide open a tray that provides a nice indented list of all the sections and function definitions in your file.\nSection headers are auto-detected in a Quarto or R Markdown document wherever the Markdown header markup has been used: one hashmark (#) for a level 1 header, two for level 2, and so on. To add section headers to an R Script, add at least four hyphens after a commented line that starts with #. Use two or more hashes at the start of the comment to increase the nestedness of that section.\n\n# Header ------------------------------------------------------------------\n\n## Section ----\n\n### Subsection ----\n\nNote that Control + Shift + R will open a dialog box for you to input the name of a section header, which will be inserted and automatically padded to 75 characters to provide a strong visual cue between sections.\nAs well as the document outline, there’s also a reminder in the lower-left of the script pane that gives the name of the section that your cursor is currently in. A symbol is also shown: a hashmark means it’s a headed section and an ‘f’ means it’s a function definition. You can click this to jump to other sections.\n\n\n\nNavigating with headers in the R script pane.\n\n\n\n\nBackground jobs\nPerhaps an under-used pane is ‘Background jobs’. This is where you can run a separate R process that keeps your R console free. Go to Tools > Background Jobs > Start Background Job… to expose this tab if it isn’t already listed alongside the R console.\nWhy might you want to do this? As I write this post, there’s a background process to detect changes to the Quarto document that I’m writing and then update a preview I have running in the browser. You can do something similar for Shiny apps. You can continue to develop your app and test things in the console and the app preview will update on save. You won’t need to keep hitting the ‘Render’ or ‘Run app’ button every time you make a change." 
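The background-jobs workflow described above can also be driven from code rather than the menu. A minimal sketch, assuming RStudio with the {rstudioapi} package installed; the script name "long_model_run.R" is a hypothetical placeholder:

```r
# A minimal sketch of starting a background job from code, assuming
# RStudio with {rstudioapi} installed; "long_model_run.R" is a
# hypothetical placeholder script.
library(rstudioapi)

jobRunScript(
  path = "long_model_run.R", # script to run in a separate R process
  name = "Model run",        # label shown in the Background Jobs pane
  importEnv = FALSE,         # start the job with a clean environment
  exportEnv = "R_GlobalEnv"  # copy the job's objects back on completion
)
```

The console stays free while the job runs, which suits the Quarto-preview and Shiny-development uses mentioned above.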
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert-3", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert-3", + "title": "Unit testing in R", + "section": "Arrange, Act, Assert", + "text": "Arrange, Act, Assert\n\n\nwe arrange the environment, before running the function\nwe act by calling the function\nwe assert that the actual results match our expected results\n\n\ntest_that(\"my_function works\", {\n # arrange\n x <- 5\n y <- 7\n expected <- 0.714285\n\n # act\n actual <- my_function(x, y)\n\n # assert\n expect_equal(actual, expected)\n})" }, { - "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#magic-wand", - "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#magic-wand", - "title": "RStudio Tips and Tricks", - "section": "Magic wand", - "text": "Magic wand\nThere’s a miscellany of useful tools available when you click the ‘magic wand’ button in the script pane.\n\n\n\nAbracadabra! Casting open the ‘magic wand’ menu.\n\n\nThis includes:\n\n‘Rename in Scope’, which is like find-and-replace but you only change instances with the same ‘scope’, so you could select the variable x, go to Rename in Scope and then you can edit all instances of the variable in the document and change them at the same time (e.g. to rename them)\n‘Reflow Comment’, which you can click after higlighting a comments block to have the comments automatically line-break at the maximum width\n‘Insert Roxygen Skeleton’, which you can click when your cursor is inside the body of a function you’ve written and a {roxygen2} documentation template will be added above your function with the @params argument names pre-filled\n\nAlong with ‘Comment/Uncomment Lines’, ‘Reindent Lines’ and ‘Reformat Lines’, mentioned above in the keyboard shortcuts section." + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#our-test-failed", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#our-test-failed", + "title": "Unit testing in R", + "section": "Our test failed!?! 😢", + "text": "Our test failed!?! 😢\n\ntest_that(\"my_function works\", {\n # arrange\n x <- 5\n y <- 7\n expected <- 0.714285\n\n # act\n actual <- my_function(x, y)\n\n # assert\n expect_equal(actual, expected)\n})\n\n── Failure: my_function works ──────────────────────────────────────────────────\n`actual` not equal to `expected`.\n1/1 mismatches\n[1] 0.714 - 0.714 == 7.14e-07\n\n\nError:\n! Test failed" }, { - "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#wrapping-up", - "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#wrapping-up", - "title": "RStudio Tips and Tricks", - "section": "Wrapping up", - "text": "Wrapping up\nTime was limited in our discussion. There are so many more tips and tricks that we didn’t get to. Let us know what we missed, or what your favourite shortcuts and settings are." 
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#tolerance-to-the-rescue", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#tolerance-to-the-rescue", + "title": "Unit testing in R", + "section": "Tolerance to the rescue 🙂", + "text": "Tolerance to the rescue 🙂\n\ntest_that(\"my_function works\", {\n # arrange\n x <- 5\n y <- 7\n expected <- 0.714285\n\n # act\n actual <- my_function(x, y)\n\n # assert\n expect_equal(actual, expected, tolerance = 1e-6)\n})\n\nTest passed 🎊\n\n\n\n(this is a slightly artificial example, usually the default tolerance is good enough)" }, { - "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#footnotes", - "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#footnotes", - "title": "RStudio Tips and Tricks", - "section": "Footnotes", - "text": "Footnotes\n\n\nFor the same reason it’s a good idea to restart R on a frequent basis. You may assume that an object x in your environment was made in a certain way and contains certain information, but does it? What if you overwrote it at some point and forgot? Best to wipe the slate clean and rebuild it from scratch. Jenny Bryan has written an explainer.↩︎\nYou can ‘snap focus’ to the script and console panes with the pre-existing shortcuts Control + 1 and Control + 2. My next most-used pane is the terminal, so I’ve re-mapped the shortcut to Control + 3.↩︎\nThe classic shortcuts of select-all (Control + A), cut (Control + X), copy Control + C, paste (Control + V), undo (Control + Z) and redo (Control + Shift + Z) are all available when editing.↩︎\nNote that you can set the default pipe to the base-R version (|>) by checking the box at Tools > Global Options… > Code > Use native pipe operator↩︎\nProbably ‘M’ for {magrittr}, the name of the package that contains the %>% incarnation of the operator.↩︎" + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#testing-edge-cases", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#testing-edge-cases", + "title": "Unit testing in R", + "section": "Testing edge cases", + "text": "Testing edge cases\n\n\nRemember the validation steps we built into our function to handle edge cases?\n\nLet’s write tests for these edge cases:\nwe expect errors\n\n\ntest_that(\"my_function works\", {\n expect_error(my_function(5, 0))\n expect_error(my_function(\"a\", 3))\n expect_error(my_function(3, \"a\"))\n expect_error(my_function(1:2, 4))\n})\n\nTest passed 🎊" }, { - "objectID": "blogs/posts/2024-05-22-storing-data-safely/azure_python.html", - "href": "blogs/posts/2024-05-22-storing-data-safely/azure_python.html", - "title": "Data Science @ The Strategy Unit", - "section": "", - "text": "import os\nimport io\nimport pandas as pd\nfrom dotenv import load_dotenv\nfrom azure.identity import DefaultAzureCredential\nfrom azure.storage.blob import ContainerClient\n\n\n# Load all environment variables\nload_dotenv()\naccount_url = os.getenv('AZ_STORAGE_EP')\ncontainer_name = os.getenv('AZ_STORAGE_CONTAINER')\n\n\n# Authenticate\ndefault_credential = DefaultAzureCredential()\n\nFor the first time, you might need to authenticate via the Azure CLI\nDownload it from https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-windows?tabs=azure-cli\nInstall then run az login in your terminal. 
Once you have logged in with your browser try the DefaultAzureCredential() again!\n\n# Connect to container\ncontainer_client = ContainerClient(account_url, container_name, default_credential)\n\n\n# List files in container - should be empty\nblob_list = container_client.list_blob_names()\nfor blob in blob_list:\n if blob.startswith('newdir'):\n print(blob)\n\nnewdir/cats.parquet\nnewdir/ronald.jpeg\n\n\n\n# Upload file to container\nwith open(file='data/cats.csv', mode=\"rb\") as data:\n blob_client = container_client.upload_blob(name='newdir/cats.csv', \n data=data, \n overwrite=True)\n\n\n# # Check files have uploaded - List files in container again\nblob_list = container_client.list_blobs()\nfor blob in blob_list:\n if blob['name'].startswith('newdir'):\n print(blob['name'])\n\nnewdir/cats.csv\nnewdir/cats.parquet\nnewdir/ronald.jpeg\n\n\n\n# Download file from Azure container to temporary filepath\n\n# Connect to blob\nblob_client = container_client.get_blob_client('newdir/cats.csv')\n\n# Write to local file from blob\ntemp_filepath = os.path.join('temp_data', 'cats.csv')\nwith open(file=temp_filepath, mode=\"wb\") as sample_blob:\n download_stream = blob_client.download_blob()\n sample_blob.write(download_stream.readall())\ncat_data = pd.read_csv(temp_filepath)\ncat_data.head()\n\n\n\n\n\n\n\n\nName\nPhysical_characteristics\nBehaviour\n\n\n\n\n0\nRonald\nWhite and ginger\nLazy and greedy but undoubtedly cutest and best\n\n\n1\nKaspie\nSmall calico\nSweet and very shy but adventurous\n\n\n2\nHennimore\nPale orange\nUnhinged and always in a state of panic\n\n\n3\nThug cat\nBlack and white - very large\nLocal bully\n\n\n4\nSon of Stripey\nGrey tabby\nVery vocal\n\n\n\n\n\n\n\n\n# Load directly from Azure - no local copy\n\ndownload_stream = blob_client.download_blob()\nstream_object = io.BytesIO(download_stream.readall())\ncat_data = pd.read_csv(stream_object)\ncat_data\n\n\n\n\n\n\n\n\nName\nPhysical_characteristics\nBehaviour\n\n\n\n\n0\nRonald\nWhite and ginger\nLazy and greedy but undoubtedly cutest and best\n\n\n1\nKaspie\nSmall calico\nSweet and very shy but adventurous\n\n\n2\nHennimore\nPale orange\nUnhinged and always in a state of panic\n\n\n3\nThug cat\nBlack and white - very large\nLocal bully\n\n\n4\nSon of Stripey\nGrey tabby\nVery vocal\n\n\n\n\n\n\n\n\n# !!!!!!!!! Delete from Azure container !!!!!!!!!\nblob_client = container_client.get_blob_client('newdir/cats.csv')\nblob_client.delete_blob()\n\n\nblob_list = container_client.list_blobs()\nfor blob in blob_list:\n if blob['name'].startswith('newdir'):\n print(blob['name'])\n\nnewdir/cats.parquet\nnewdir/ronald.jpeg" + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#another-simple-example", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#another-simple-example", + "title": "Unit testing in R", + "section": "Another (simple) example", + "text": "Another (simple) example\n\n\n\nmy_new_function <- function(x, y) {\n if (x > y) {\n \"x\"\n } else {\n \"y\"\n }\n}\n\n\nConsider this function - there is branched logic, so we need to carefully design tests to validate the logic works as intended." }, { - "objectID": "blogs/posts/2024-01-10-advent-of-code-and-test-driven-development.html", - "href": "blogs/posts/2024-01-10-advent-of-code-and-test-driven-development.html", - "title": "Advent of Code and Test Driven Development", - "section": "", - "text": "Advent of Code is an annual event, where daily coding puzzles are released from 1st – 24th December. 
We ran one of our fortnightly Coffee & Coding sessions introducing Advent of Code to people who code in the Strategy Unit, as well as the concept of test-driven development as a potential way of approaching the puzzles.\nTest-driven development (TDD) is an approach to coding which involves writing the test for a function BEFORE we write the function. This might seem quite counterintuitive, but it makes it easier to identify bugs 🐛 when they are introduced to our code, and ensures that our functions meet all necessary criteria. From my experience, this takes quite a long time to implement and can be quite tedious, but it is definitely worth it overall, especially as your project develops. Testing is also recommended in the NHS Reproducible Analytical Pipeline (RAP) guidelines.\nAn interesting thing to note about TDD is that we’re always expecting our first test to fail, and indeed failing tests are useful and important! If we wrote tests that just passed all the time, this would not be useful at all for our code.\nThe way that Advent of Code is structured, with test data for each puzzle and an expected test result, makes it very amenable to a test-driven approach. In order to support this, Matt and I created template repositories for a test-driven approach to Advent of Code, in Python and in R.\nOur goal when setting this up was to introduce others in the Strategy Unit to both TDD and Advent of Code. Advent of Code can be challenging and I personally struggle to get past the first week, but it encourages creative (and maybe even fun?!) approaches to coding problems. I’m glad that we had the chance to explore some of the puzzles together in Coffee & Coding – it was interesting to see so many different approaches to the same problem, and hopefully it also gave us all the chance to practice writing tests." + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#another-simple-example-1", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#another-simple-example-1", + "title": "Unit testing in R", + "section": "Another (simple) example", + "text": "Another (simple) example\n\nmy_new_function <- function(x, y) {\n if (x > y) {\n \"x\"\n } else {\n \"y\"\n }\n}\n\n\n\ntest_that(\"it returns 'x' if x is bigger than y\", {\n expect_equal(my_new_function(4, 3), \"x\")\n})\n\nTest passed 🎉\n\ntest_that(\"it returns 'y' if y is bigger than x\", {\n expect_equal(my_new_function(3, 4), \"y\")\n expect_equal(my_new_function(3, 3), \"y\")\n})\n\nTest passed 🥳" }, { - "objectID": "blogs/posts/2023-04-26_alternative_remotes.html", - "href": "blogs/posts/2023-04-26_alternative_remotes.html", - "title": "Alternative remote repositories", - "section": "", - "text": "It’s great when someone sends you a pull request on GitHub to fix bugs or add new features to your project, but you probably always want to check the other person’s work in some way before merging that pull request.\nAll of the steps below are intended to be entered via a terminal.\nLet’s imagine that we have a GitHub account called example and a repository called test, and we use https rather than ssh.\n$ git remote get-url origin\n# https://github.com/example/test.git\nNow, let’s say we have someone who has submitted a Pull Request (PR), and their username is friend. We can add a new remote for their fork with\n$ git remote add friend https://github.com/friend/test.git\nHere, I name the remote exactly as per the person’s GitHub username for no other reason than making it easier to track things later on. 
You could name this remote whatever you like, but you will need to make sure that the remote url matches their repository correctly.\nWe are now able to check out their remote branch. First, we will want to fetch their work:\n# make sure to replace the remote name to what you set it to before\n$ git fetch friend\nNow, hopefully they have committed to a branch with a name that you haven’t used. Let’s say they created a branch called my_work. You can then simply run\n$ git switch friend/my_work\nThis should check out the my_work branch locally for you.\nNow, if they have happened to use a branch name that you are already using, or more likely, directly committed to their own main branch, you will need to check out their work to a new branch:\n# replace friend as above to be the name of the remote, and main to be the branch\n# that they have used\n# replace their_work with whatever you want to call this branch locally\n$ git checkout friend/main -b their_work\nYou are now ready to run their code and check everything is good to merge!\nFinally, if you want to clean up your local repository you can remove the new branch that you checked out and the new remote with the following steps:\n# switch back to one of your branches, e.g. main\n$ git checkout main\n\n# then remove the branch that you created above\n$ git branch -D their_work\n\n# you can remove the remote\n$ git remote remove friend" + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#how-to-design-good-tests", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#how-to-design-good-tests", + "title": "Unit testing in R", + "section": "How to design good tests", + "text": "How to design good tests\na non-exhaustive list\n\nconsider all the function’s arguments,\nwhat are the expected values for these arguments?\nwhat are unexpected values, and are they handled?\nare there edge cases that need to be handled?\nhave you covered all of the different paths in your code?\nhave you managed to create tests that check the range of results you expect?" }, { - "objectID": "presentations/2024-05-16_store-data-safely/index.html#why", - "href": "presentations/2024-05-16_store-data-safely/index.html#why", - "title": "Store Data Safely", - "section": "Why?", - "text": "Why?\nBecause:\n\ndata may be sensitive\nGitHub was designed for source control of code\nGitHub has repository file-size limits\nit makes data independent from code\nit prevents repetition" }, { + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#but-why-create-tests", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#but-why-create-tests", + "title": "Unit testing in R", + "section": "But, why create tests?", + "text": "But, why create tests?\nanother non-exhaustive list\n\ngood tests will help you uncover existing issues in your code\nwill defend you from future changes that break existing functionality\nwill alert you to changes in dependencies that may have changed the functionality of your code\ncan act as documentation for other developers" }, { - "objectID": "presentations/2024-05-16_store-data-safely/index.html#other-approaches", - "href": "presentations/2024-05-16_store-data-safely/index.html#other-approaches", - "title": "Store Data Safely", - "section": "Other approaches", - "text": "Other approaches\nTo prevent data commits:\n\nuse a .gitignore file (*.csv, etc)\nuse Git hooks\navoid ‘add all’ (git add .) 
when staging\nensure thorough reviews of (small) pull-requests" }, { + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#testing-complex-functions", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#testing-complex-functions", + "title": "Unit testing in R", + "section": "Testing complex functions", + "text": "Testing complex functions\n\n\n\nmy_big_function <- function(type) {\n con <- dbConnect(RSQLite::SQLite(), \"data.db\")\n df <- tbl(con, \"data_table\") |>\n collect() |>\n mutate(across(date, lubridate::ymd))\n\n conditions <- read_csv(\n \"conditions.csv\", col_types = \"cc\"\n ) |>\n filter(condition_type == type)\n\n df |>\n semi_join(conditions, by = \"condition\") |>\n count(date) |>\n ggplot(aes(date, n)) +\n geom_line() +\n geom_point()\n}\n\n\nWhere do you even begin writing tests for something so complex?\n\n\nNote: to get the code on the left to fit on one page, I skipped including a few library calls\n\nlibrary(tidyverse)\nlibrary(DBI)" }, { - "objectID": "presentations/2024-05-16_store-data-safely/index.html#what-if-i-committed-data", - "href": "presentations/2024-05-16_store-data-safely/index.html#what-if-i-committed-data", - "title": "Store Data Safely", - "section": "What if I committed data?", - "text": "What if I committed data?\n‘It depends’, but if it’s sensitive:\n\n‘undo’ the commit with git reset\nuse a tool like BFG to expunge the file from Git history\ndelete the repo and restart 🔥\n\nA data security breach may have to be reported." }, { + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions", + "title": "Unit testing in R", + "section": "Split the logic into smaller functions", + "text": "Split the logic into smaller functions\nFunction to get the data from the database\n\nget_data_from_sql <- function() {\n con <- dbConnect(RSQLite::SQLite(), \"data.db\")\n tbl(con, \"data_table\") |>\n collect() |>\n mutate(across(date, lubridate::ymd))\n}" }, { - "objectID": "presentations/2024-05-16_store-data-safely/index.html#data-hosting-solutions", - "href": "presentations/2024-05-16_store-data-safely/index.html#data-hosting-solutions", - "title": "Store Data Safely", - "section": "Data-hosting solutions", - "text": "Data-hosting solutions\nWe’ll talk about two main options for The Strategy Unit:\n\nPosit Connect and the {pins} package\nAzure Data Storage\n\nWhich to use? It depends."
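As a hedged sketch of the .gitignore approach listed under ‘Other approaches’, assuming the project uses {usethis}; the file patterns are illustrative, not from the slides:

```r
# A minimal sketch: add data-file patterns to .gitignore from R so they
# cannot be staged by accident; the patterns shown are illustrative.
library(usethis)

use_git_ignore(c("*.csv", "*.parquet", "data/"))
```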
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-1", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-1", + "title": "Unit testing in R", + "section": "Split the logic into smaller functions", + "text": "Split the logic into smaller functions\nFunction to get the relevant conditions\n\nget_conditions <- function(type) {\n read_csv(\n \"conditions.csv\", col_types = \"cc\"\n ) |>\n filter(condition_type == type)\n}" }, { - "objectID": "presentations/2024-05-16_store-data-safely/index.html#a-platform-by-posit", - "href": "presentations/2024-05-16_store-data-safely/index.html#a-platform-by-posit", - "title": "Store Data Safely", - "section": "A platform by Posit", - "text": "A platform by Posit\n\n\nhttps://connect.strategyunitwm.nhs.uk/" + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-2", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-2", + "title": "Unit testing in R", + "section": "Split the logic into smaller functions", + "text": "Split the logic into smaller functions\nFunction to combine the data and create a count by date\n\nsummarise_data <- function(df, conditions) {\n df |>\n semi_join(conditions, by = \"condition\") |>\n count(date)\n}" }, { - "objectID": "presentations/2024-05-16_store-data-safely/index.html#a-package-by-posit", - "href": "presentations/2024-05-16_store-data-safely/index.html#a-package-by-posit", - "title": "Store Data Safely", - "section": "A package by Posit", - "text": "A package by Posit\n\n\nhttps://pins.rstudio.com/" + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-3", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-3", + "title": "Unit testing in R", + "section": "Split the logic into smaller functions", + "text": "Split the logic into smaller functions\nFunction to generate a plot from the summarised data\n\ncreate_plot <- function(df) {\n df |>\n ggplot(aes(date, n)) +\n geom_line() +\n geom_point()\n}" }, { - "objectID": "presentations/2024-05-16_store-data-safely/index.html#basic-approach", - "href": "presentations/2024-05-16_store-data-safely/index.html#basic-approach", - "title": "Store Data Safely", - "section": "Basic approach", - "text": "Basic approach\ninstall.packages(\"pins\")\nlibrary(pins)\n\nboard_connect()\npin_write(board, data, \"pin_name\")\npin_read(board, \"user_name/pin_name\")" + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-4", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-4", + "title": "Unit testing in R", + "section": "Split the logic into smaller functions", + "text": "Split the logic into smaller functions\nThe original function refactored to use the new functions\n\nmy_big_function <- function(type) {\n conditions <- get_conditions(type)\n\n get_data_from_sql() |>\n summarise_data(conditions) |>\n create_plot()\n}\n\n\nThis is going to be significantly easier to test, because we now can verify that the individual components work correctly, rather than having to consider all of the possibilities at once." 
}, { - "objectID": "presentations/2024-05-16_store-data-safely/index.html#live-demo", - "href": "presentations/2024-05-16_store-data-safely/index.html#live-demo", - "title": "Store Data Safely", - "section": "Live demo", - "text": "Live demo\n\nLink RStudio to Posit Connect (authenticate)\nConnect to the board\nWrite a new pin\nCheck pin status and details\nPin versions\nUse pinned data\nUnpin your pin" + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data", + "title": "Unit testing in R", + "section": "Let’s test summarise_data", + "text": "Let’s test summarise_data\nsummarise_data <- function(df, conditions) {\n df |>\n semi_join(conditions, by = \"condition\") |>\n count(date)\n}" }, { - "objectID": "presentations/2024-05-16_store-data-safely/index.html#should-i-use-it", - "href": "presentations/2024-05-16_store-data-safely/index.html#should-i-use-it", - "title": "Store Data Safely", - "section": "Should I use it?", - "text": "Should I use it?\n\n\n⚠️ {pins} is not great because:\n\nyou should not upload sensitive data!\nthere’s a file-size upload limit\npin organisation is a bit awkward (no subfolders)\n\n\n{pins} is helpful because:\n\nauthentication is straightforward\ndata can be versioned\nyou can control permissions\nthere are R and Python versions of the package" + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-1", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-1", + "title": "Unit testing in R", + "section": "Let’s test summarise_data", + "text": "Let’s test summarise_data\ntest_that(\"it summarises the data\", {\n # arrange\n \n\n\n\n\n\n\n \n\n \n # act\n \n # assert\n \n})" }, { - "objectID": "presentations/2024-05-16_store-data-safely/index.html#what-is-azure-data-storage", - "href": "presentations/2024-05-16_store-data-safely/index.html#what-is-azure-data-storage", - "title": "Store Data Safely", - "section": "What is Azure Data Storage?", - "text": "What is Azure Data Storage?\nMicrosoft cloud storage for unstructured data or ‘blobs’ (Binary Large Objects): data objects in binary form that do not necessarily conform to any file format.\nHow is it different?\n\nNo hierarchy – although you can make pseudo-‘folders’ with the blobnames.\nAuthenticates with your Microsoft account." + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-2", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-2", + "title": "Unit testing in R", + "section": "Let’s test summarise_data", + "text": "Let’s test summarise_data\n\n\ntest_that(\"it summarises the data\", {\n # arrange\n \n df <- tibble(\n date = sample(1:10, 300, TRUE),\n condition = sample(c(\"a\", \"b\", \"c\"), 300, TRUE)\n )\n \n\n\n\n\n # act\n \n # assert\n \n})\n\nGenerate some random data to build a reasonably sized data frame.\nYou could also create a table manually, but part of the trick of writing good tests for this function is to make it so the dates don’t all have the same count.\nThe reason for this is it’s harder to know for sure that the count worked if every row returns the same value.\nWe don’t need the values to be exactly like they are in the real data, just close enough. Instead of dates, we can use numbers, and instead of actual conditions, we can use letters." 
}, { - "objectID": "presentations/2024-05-16_store-data-safely/index.html#what-is-azure-data-storage", - "href": "presentations/2024-05-16_store-data-safely/index.html#what-is-azure-data-storage", - "title": "Store Data Safely", - "section": "What is Azure Data Storage?", - "text": "What is Azure Data Storage?\nMicrosoft cloud storage for unstructured data or ‘blobs’ (Binary Large Objects): data objects in binary form that do not necessarily conform to any file format.\nHow is it different?\n\nNo hierarchy – although you can make pseudo-‘folders’ with the blob names.\nAuthenticates with your Microsoft account." }, { + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-3", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-3", + "title": "Unit testing in R", + "section": "Let’s test summarise_data", + "text": "Let’s test summarise_data\n\n\ntest_that(\"it summarises the data\", {\n # arrange\n set.seed(123)\n df <- tibble(\n date = sample(1:10, 300, TRUE),\n condition = sample(c(\"a\", \"b\", \"c\"), 300, TRUE)\n )\n \n\n\n\n\n # act\n \n # assert\n \n})\n\nTests need to be reproducible, and generating our table at random will give us unpredictable results.\nSo, we need to set the random seed; now every time this test runs we will generate the same data." }, { - "objectID": "presentations/2024-05-16_store-data-safely/index.html#authenticating-to-azure-data-storage", - "href": "presentations/2024-05-16_store-data-safely/index.html#authenticating-to-azure-data-storage", - "title": "Store Data Safely", - "section": "Authenticating to Azure Data Storage", - "text": "Authenticating to Azure Data Storage\n\nYou are all part of the “strategy-unit-analysts” group; this gives you read/write access to specific Azure storage containers.\nYou can store sensitive information like the container ID in a local .Renviron or .env file that should be ignored by git.\nUsing {AzureAuth}, {AzureStor} and your credentials, you can connect to the Azure storage container, upload files and download them, or read the files directly from storage!" }, { + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-4", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-4", + "title": "Unit testing in R", + "section": "Let’s test summarise_data", + "text": "Let’s test summarise_data\n\n\ntest_that(\"it summarises the data\", {\n # arrange\n set.seed(123)\n df <- tibble(\n date = sample(1:10, 300, TRUE),\n condition = sample(c(\"a\", \"b\", \"c\"), 300, TRUE)\n )\n conditions <- tibble(condition = c(\"a\", \"b\")) \n \n\n\n\n # act\n \n # assert\n \n})\n\nCreate the conditions table. We don’t need all of the columns that are present in the real csv, just the ones that will make our code work.\nWe also need to test that the filtering join (semi_join) is working, so we want to use a subset of the conditions that were used in df."
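To spell out the semi_join() behaviour these tests rely on, a tiny self-contained sketch with invented values:

```r
# semi_join() is a filtering join: it keeps rows of df whose condition
# appears in conditions, without adding any columns. Values are invented.
library(dplyr)

df <- tibble(
  date      = c(1, 1, 2, 3),
  condition = c("a", "b", "c", "a")
)
conditions <- tibble(condition = c("a", "b"))

semi_join(df, conditions, by = "condition")
# returns the three rows with condition "a" or "b"; the "c" row is dropped
```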
}, { - "objectID": "presentations/2024-05-16_store-data-safely/index.html#step-1-load-your-environment-variables-1", - "href": "presentations/2024-05-16_store-data-safely/index.html#step-1-load-your-environment-variables-1", - "title": "Store Data Safely", - "section": "Step 1: load your environment variables", - "text": "Step 1: load your environment variables\nIn the demo script we are providing, you will need these environment variables:\nep_uri <- Sys.getenv(\"AZ_STORAGE_EP\")\napp_id <- Sys.getenv(\"AZ_APP_ID\")\ncontainer_name <- Sys.getenv(\"AZ_STORAGE_CONTAINER\")\ntenant <- Sys.getenv(\"AZ_TENANT_ID\")" + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-5", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-5", + "title": "Unit testing in R", + "section": "Let’s test summarise_data", + "text": "Let’s test summarise_data\n\n\ntest_that(\"it summarises the data\", {\n # arrange\n set.seed(123)\n df <- tibble(\n date = sample(1:10, 300, TRUE),\n condition = sample(c(\"a\", \"b\", \"c\"), 300, TRUE)\n )\n conditions <- tibble(condition = c(\"a\", \"b\")) \n \n \n\n \n # act\n actual <- summarise_data(df, conditions)\n # assert\n \n})\n\nBecause we are generating df randomly, to figure out what our “expected” results are, I simply ran the code inside of the test to generate the “actual” results.\nGenerally, this isn’t a good idea. You are creating the results of your test from the code; ideally, you want to be thinking about what the results of your function should be.\nImagine your function doesn’t work as intended, there is some subtle bug that you are not yet aware of. By writing tests “backwards” you may write test cases that confirm the results, but not expose the bug. This is why it’s good to think about edge cases." }, { - "objectID": "presentations/2024-05-16_store-data-safely/index.html#step-2-authenticate-with-azure", - "href": "presentations/2024-05-16_store-data-safely/index.html#step-2-authenticate-with-azure", - "title": "Store Data Safely", - "section": "Step 2: Authenticate with Azure", - "text": "Step 2: Authenticate with Azure\n\n\ntoken <- AzureAuth::get_azure_token(\n \"https://storage.azure.com\",\n tenant = tenant,\n app = app_id,\n auth_type = \"device_code\",\n)\nThe first time you do this, you will have link to authenticate in your browser and a code in your terminal to enter. Use the browser that works best with your @mlcsu.nhs.uk account!" + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-6", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-6", + "title": "Unit testing in R", + "section": "Let’s test summarise_data", + "text": "Let’s test summarise_data\n\n\ntest_that(\"it summarises the data\", {\n # arrange\n set.seed(123)\n df <- tibble(\n date = sample(1:10, 300, TRUE),\n condition = sample(c(\"a\", \"b\", \"c\"), 300, TRUE)\n )\n conditions <- tibble(condition = c(\"a\", \"b\")) \n expected <- tibble(\n date = 1:10,\n n = c(19, 18, 12, 14, 17, 18, 24, 18, 31, 21)\n ) \n # act\n actual <- summarise_data(df, conditions)\n # assert\n \n})\n\nThat said, in cases where we can be confident (say by static analysis of our code) that it is correct, building tests in this way will give us the confidence going forwards that future changes do not break existing functionality.\nIn this case, I have created the expected data frame using the results from running the function." 
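As an aside on the set.seed() call used in these tests (and anticipating the {withr} suggestion in the deck's next steps), a hedged alternative that scopes the seed to a block:

```r
# A sketch using {withr}: the seed is set for the duration of the block
# and restored afterwards, so it doesn't leak into the wider session.
library(withr)

with_seed(123, {
  sample(1:10, 5) # same draw every run: reproducible test data
})
```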
}, { - "objectID": "presentations/2024-05-16_store-data-safely/index.html#step-2-authenticate-with-azure", - "href": "presentations/2024-05-16_store-data-safely/index.html#step-2-authenticate-with-azure", - "title": "Store Data Safely", - "section": "Step 2: Authenticate with Azure", - "text": "Step 2: Authenticate with Azure\n\n\ntoken <- AzureAuth::get_azure_token(\n \"https://storage.azure.com\",\n tenant = tenant,\n app = app_id,\n auth_type = \"device_code\"\n)\nThe first time you do this, you will have a link to authenticate in your browser and a code in your terminal to enter. Use the browser that works best with your @mlcsu.nhs.uk account!" }, { + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-7", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-7", + "title": "Unit testing in R", + "section": "Let’s test summarise_data", + "text": "Let’s test summarise_data\n\n\n\ntest_that(\"it summarises the data\", {\n # arrange\n set.seed(123)\n df <- tibble(\n date = sample(1:10, 300, TRUE),\n condition = sample(c(\"a\", \"b\", \"c\"), 300, TRUE)\n )\n conditions <- tibble(condition = c(\"a\", \"b\"))\n expected <- tibble(\n date = 1:10,\n n = c(19, 18, 12, 14, 17, 18, 24, 18, 31, 21)\n )\n # act\n actual <- summarise_data(df, conditions)\n # assert\n expect_equal(actual, expected)\n})\n\nTest passed 😸\n\n\n\nThe test works!" }, { - "objectID": "presentations/2024-05-16_store-data-safely/index.html#step-3-connect-to-container", - "href": "presentations/2024-05-16_store-data-safely/index.html#step-3-connect-to-container", - "title": "Store Data Safely", - "section": "Step 3: Connect to container", - "text": "Step 3: Connect to container\nendpoint <- AzureStor::blob_endpoint(ep_uri, token = token)\ncontainer <- AzureStor::storage_container(endpoint, container_name)\n\n# List files in container\nblob_list <- AzureStor::list_blobs(container)\nIf you get a 403 error, delete your token and re-authenticate, try a different browser/incognito, etc.\nTo clear Azure tokens: AzureAuth::clean_token_directory()" }, { + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#next-steps", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#next-steps", + "title": "Unit testing in R", + "section": "Next steps", + "text": "Next steps\n\nYou can add tests to any R project (to test functions),\nBut {testthat} works best with Packages\nThe R Packages book has 3 chapters on testing\nThere are two useful helper functions in {usethis}\n\nuse_testthat() will set up the folders for test scripts\nuse_test() will create a test file for the currently open script" }, { - "objectID": "presentations/2024-05-16_store-data-safely/index.html#interact-with-the-container", - "href": "presentations/2024-05-16_store-data-safely/index.html#interact-with-the-container", - "title": "Store Data Safely", - "section": "Interact with the container", - "text": "Interact with the container\nIt’s possible to interact with the container via your browser!\nYou can upload and download files using the Graphical User Interface (GUI), log in with your @mlcsu.nhs.uk account: https://portal.azure.com/#home\nAlthough it’s also cooler to interact via code… 😎" }, { + "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#next-steps-1", + "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#next-steps-1", + "title": "Unit testing in R", + "section": "Next steps", + "text": "Next steps\n\nIf your test needs to temporarily create a file, or change some R-options, the {withr} package has 
a lot of useful functions that will automatically clean things up when the test finishes\nIf you are writing tests that involve calling out to a database, or you want to test my_big_function (from before) without calling the intermediate functions, then you should look at the {mockery} package" }, { - "objectID": "presentations/2024-05-16_store-data-safely/index.html#interact-with-the-container-1", - "href": "presentations/2024-05-16_store-data-safely/index.html#interact-with-the-container-1", - "title": "Store Data Safely", - "section": "Interact with the container", - "text": "Interact with the container\n# Upload contents of a local directory to container\nAzureStor::storage_multiupload(\n container,\n \"LOCAL_FOLDERNAME/*\",\n \"FOLDERNAME_ON_AZURE\"\n)\n\n# Upload specific file to container\nAzureStor::storage_upload(\n container,\n \"data/ronald.jpeg\",\n \"newdir/ronald.jpeg\"\n)" }, { + "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#packages-we-are-using-today", + "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#packages-we-are-using-today", + "title": "Coffee and Coding", + "section": "Packages we are using today", + "text": "Packages we are using today\n\nlibrary(tidyverse)\n\nlibrary(sf)\n\nlibrary(tidygeocoder)\nlibrary(PostcodesioR)\n\nlibrary(osrm)\n\nlibrary(leaflet)" }, { - "objectID": "presentations/2024-05-16_store-data-safely/index.html#load-csv-files-directly-from-azure-container", - "href": "presentations/2024-05-16_store-data-safely/index.html#load-csv-files-directly-from-azure-container", - "title": "Store Data Safely", - "section": "Load csv files directly from Azure container", - "text": "Load csv files directly from Azure container\ndf_from_azure <- AzureStor::storage_read_csv(\n container,\n \"newdir/cats.csv\",\n show_col_types = FALSE\n)\n\n# Load file directly from Azure container (by storing it in memory)\n\nparquet_in_memory <- AzureStor::storage_download(\n container, src = \"newdir/cats.parquet\", dest = NULL\n)\n\nparq_df <- arrow::read_parquet(parquet_in_memory)" }, { + "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#getting-boundary-data", + "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#getting-boundary-data", + "title": "Coffee and Coding", + "section": "Getting boundary data", + "text": "Getting boundary data\nWe can use the ONS’s Geoportal to grab boundary data to generate maps\n\n\n\nicb_url <- paste0(\n \"https://services1.arcgis.com\",\n \"/ESMARspQHYMw9BZ9/arcgis\",\n \"/rest/services\",\n \"/Integrated_Care_Boards_April_2023_EN_BGC\",\n \"/FeatureServer/0/query\",\n \"?outFields=*&where=1%3D1&f=geojson\"\n)\nicb_boundaries <- read_sf(icb_url)\n\nicb_boundaries |>\n ggplot() +\n geom_sf() +\n theme_void()" }, { - "objectID": "presentations/2024-05-16_store-data-safely/index.html#interact-with-the-container-2", - "href": "presentations/2024-05-16_store-data-safely/index.html#interact-with-the-container-2", - "title": "Store Data Safely", - "section": "Interact with the container", - "text": "Interact with the container\n# Delete from Azure container (!!!)\nAzureStor::delete_storage_file(container, BLOB_NAME)" }, { + "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-is-the-icb_boundaries-data", + "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-is-the-icb_boundaries-data", + "title": "Coffee 
and Coding", + "section": "What is the icb_boundaries data?", + "text": "What is the icb_boundaries data?\n\nicb_boundaries |>\n select(ICB23CD, ICB23NM)\n\nSimple feature collection with 42 features and 2 fields\nGeometry type: MULTIPOLYGON\nDimension: XY\nBounding box: xmin: -6.418667 ymin: 49.86479 xmax: 1.763706 ymax: 55.81112\nGeodetic CRS: WGS 84\n# A tibble: 42 × 3\n ICB23CD ICB23NM geometry\n <chr> <chr> <MULTIPOLYGON [°]>\n 1 E54000008 NHS Cheshire and Merseyside Integrated C… (((-3.083264 53.2559, -3…\n 2 E54000010 NHS Staffordshire and Stoke-on-Trent Int… (((-1.950489 53.21188, -…\n 3 E54000011 NHS Shropshire, Telford and Wrekin Integ… (((-2.380794 52.99841, -…\n 4 E54000013 NHS Lincolnshire Integrated Care Board (((0.2687853 52.81584, 0…\n 5 E54000015 NHS Leicester, Leicestershire and Rutlan… (((-0.7875237 52.97762, …\n 6 E54000018 NHS Coventry and Warwickshire Integrated… (((-1.577608 52.67858, -…\n 7 E54000019 NHS Herefordshire and Worcestershire Int… (((-2.272042 52.43972, -…\n 8 E54000022 NHS Norfolk and Waveney Integrated Care … (((1.666741 52.31366, 1.…\n 9 E54000023 NHS Suffolk and North East Essex Integra… (((0.8997023 51.7732, 0.…\n10 E54000024 NHS Bedfordshire, Luton and Milton Keyne… (((-0.4577115 52.32009, …\n# ℹ 32 more rows" }, { - "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#health-data-in-the-headlines", - "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#health-data-in-the-headlines", - "title": "System Dynamics in health and care", - "section": "Health Data in the Headlines", - "text": "Health Data in the Headlines\n\n\n\n\nUsed to seeing headlines that give a snapshot figure but doesn’t say much about the system.\nNow starting to see headlines that recognise flow through the system rather than snapshot in time of just one part.\nCan get better understanding of the issues in a system if we can map it as stocks and flows, but our datasets not designed to give up this information very readily. This talk is how I have tried to meet that challenge." + "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#working-with-geospatial-dataframes", + "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#working-with-geospatial-dataframes", + "title": "Coffee and Coding", + "section": "Working with geospatial dataframes", + "text": "Working with geospatial dataframes\nWe can simply join sf data frames and “regular” data frames together\n\n\n\nicb_metrics <- icb_boundaries |>\n st_drop_geometry() |>\n select(ICB23CD) |>\n mutate(admissions = rpois(n(), 1000000))\n\nicb_boundaries |>\n inner_join(icb_metrics, by = \"ICB23CD\") |>\n ggplot() +\n geom_sf(aes(fill = admissions)) +\n scale_fill_viridis_c() +\n theme_void()" }, { - "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#through-the-system-dynamics-lens", - "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#through-the-system-dynamics-lens", - "title": "System Dynamics in health and care", - "section": "Through the System Dynamics lens", - "text": "Through the System Dynamics lens\n\nStock-flow model\nDynamic behaviour, feedback loops\n\nIn a few seconds, what is SD?\nAn approach to understanding the behaviour of complex systems over time. A method of mapping a system as stocks, whose levels can only change due to flows in and flows out. 
Stocks could be people on a waiting list, on a ward, money, …\nFlows are the rate at which things change in a given time period e.g. admissions per day, referrals per month.\nBehaviour of the system is determined by how the components interact with each other, not what each component does. Mapping the structure of a system like this leads us to identify feedback loops, and consequences of an action - both intended and unintended.\nIn this capacity-constrained model we only need 3 parameters to run the model (exogenous). All the behaviour within the grey box is determined by the interactions of those components (indogenous).\nHow do we get a value/values for referrals per day?\n(currently use specialist software to build and run our models, aim is to get to a point where we can run in open source.)" + "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#working-with-geospatial-data-frames", + "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#working-with-geospatial-data-frames", + "title": "Coffee and Coding", + "section": "Working with geospatial data frames", + "text": "Working with geospatial data frames\nWe can manipulate sf objects like other data frames\n\n\n\nlondon_icbs <- icb_boundaries |>\n filter(ICB23NM |> stringr::str_detect(\"London\"))\n\nggplot() +\n geom_sf(data = london_icbs) +\n geom_sf(data = st_centroid(london_icbs)) +\n theme_void()" }, { - "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#determining-flows", - "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#determining-flows", - "title": "System Dynamics in health and care", - "section": "Determining flows", - "text": "Determining flows\n\n\n\n\n‘admissions per day’ is needed to populate the model.\n‘discharged’ could be used to verify the model against known data\n\nHow many admissions per day (or week, month…)\n\n\n\n\n\n\n\n \n\n\nGoing to use very simple model shown to explain how to extract flow data for admissions. Will start with visual explainer before going into the code.\n1. generate list of key dates (in this case daily, could be weekly, monthly)\n2. take our patient-level ID with admission and discharge dates\n3. count of admissions on that day/week" + "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#working-with-geospatial-data-frames-1", + "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#working-with-geospatial-data-frames-1", + "title": "Coffee and Coding", + "section": "Working with geospatial data frames", + "text": "Working with geospatial data frames\nSummarising the data will combine the geometries.\n\nlondon_icbs |>\n summarise(area = sum(Shape__Area)) |>\n # and use geospatial functions to create calculations using the geometry\n mutate(new_area = st_area(geometry), .before = \"geometry\")\n\nSimple feature collection with 1 feature and 2 fields\nGeometry type: MULTIPOLYGON\nDimension: XY\nBounding box: xmin: -0.5102803 ymin: 51.28676 xmax: 0.3340241 ymax: 51.69188\nGeodetic CRS: WGS 84\n# A tibble: 1 × 3\n area new_area geometry\n* <dbl> [m^2] <MULTIPOLYGON [°]>\n1 1573336388. 1567995610. (((-0.3314819 51.43935, -0.3306676 51.43889, -0.33118…\n\n\n Why the difference in area?\n\n We are using a simplified geometry, so calculating the area will be slightly inaccurate. The original area was calculated on the non-simplified geometries." 
}, { - "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#determining-occupancy", - "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#determining-occupancy", - "title": "System Dynamics in health and care", - "section": "Determining occupancy", - "text": "Determining occupancy\n\n\n\n\n‘on ward’ is used to verify the model against known data\n\nLogic statement testing if the key date is wholly between admission and discharge dates\nflag for a match \n\n\n\n\n\n\n \n\n\nMight also want to generate occupancy, to compare the model output with actual data to verify/validate.\n1. generate list of key dates\n2. take our patient-level ID with admission and discharge dates\n3. going to take each date in our list of keydates, and see if there is an admission before that date and discharge after 4. this creates a wide data frame, the same length as patient data.\n5. once run through all the dates in the list, sum each column\nPatient A admitted on 2nd, so only starts being classed as resident on 3rd." + "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#creating-our-own-geospatial-data", + "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#creating-our-own-geospatial-data", + "title": "Coffee and Coding", + "section": "Creating our own geospatial data", + "text": "Creating our own geospatial data\n\nlocation_raw <- postcode_lookup(\"B2 4BJ\")\nglimpse(location_raw)\n\nRows: 1\nColumns: 40\n$ postcode <chr> \"B2 4BJ\"\n$ quality <int> 1\n$ eastings <int> 406866\n$ northings <int> 286775\n$ country <chr> \"England\"\n$ nhs_ha <chr> \"West Midlands\"\n$ longitude <dbl> -1.90033\n$ latitude <dbl> 52.47887\n$ european_electoral_region <chr> \"West Midlands\"\n$ primary_care_trust <chr> \"Heart of Birmingham Teaching\"\n$ region <chr> \"West Midlands\"\n$ lsoa <chr> \"Birmingham 138A\"\n$ msoa <chr> \"Birmingham 138\"\n$ incode <chr> \"4BJ\"\n$ outcode <chr> \"B2\"\n$ parliamentary_constituency <chr> \"Birmingham, Ladywood\"\n$ parliamentary_constituency_2024 <chr> \"Birmingham Ladywood\"\n$ admin_district <chr> \"Birmingham\"\n$ parish <chr> \"Birmingham, unparished area\"\n$ admin_county <lgl> NA\n$ date_of_introduction <chr> \"198001\"\n$ admin_ward <chr> \"Ladywood\"\n$ ced <lgl> NA\n$ ccg <chr> \"NHS Birmingham and Solihull\"\n$ nuts <chr> \"Birmingham\"\n$ pfa <chr> \"West Midlands\"\n$ admin_district_code <chr> \"E08000025\"\n$ admin_county_code <chr> \"E99999999\"\n$ admin_ward_code <chr> \"E05011151\"\n$ parish_code <chr> \"E43000250\"\n$ parliamentary_constituency_code <chr> \"E14000564\"\n$ parliamentary_constituency_2024_code <chr> \"E14001096\"\n$ ccg_code <chr> \"E38000258\"\n$ ccg_id_code <chr> \"15E\"\n$ ced_code <chr> \"E99999999\"\n$ nuts_code <chr> \"TLG31\"\n$ lsoa_code <chr> \"E01033620\"\n$ msoa_code <chr> \"E02006899\"\n$ lau2_code <chr> \"E08000025\"\n$ pfa_code <chr> \"E23000014\"\n\n\n\n\n\nlocation <- location_raw |>\n st_as_sf(coords = c(\"eastings\", \"northings\"), crs = 27700) |>\n select(postcode, ccg) |>\n st_transform(crs = 4326)\n\nlocation\n\nSimple feature collection with 1 feature and 2 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: -1.900335 ymin: 52.47886 xmax: -1.900335 ymax: 52.47886\nGeodetic CRS: WGS 84\n postcode ccg geometry\n1 B2 4BJ NHS Birmingham and Solihull POINT (-1.900335 52.47886)" }, { - "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#in-r---flows", - "href": 
"presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#in-r---flows", - "title": "System Dynamics in health and care", - "section": "in R - flows", - "text": "in R - flows\nEasy to do with count, or group_by and summarise\n\n\n admit_d <- spell_dates |> \n group_by(date_admit) |>\n count(date_admit)\n\nhead(admit_d)\n\n\n# A tibble: 6 × 2\n# Groups: date_admit [6]\n date_admit n\n <date> <int>\n1 2022-01-01 28\n2 2022-01-02 24\n3 2022-01-03 21\n4 2022-01-04 27\n5 2022-01-05 32\n6 2022-01-06 27" + "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#creating-a-geospatial-data-frame-for-all-nhs-trusts", + "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#creating-a-geospatial-data-frame-for-all-nhs-trusts", + "title": "Coffee and Coding", + "section": "Creating a geospatial data frame for all NHS Trusts", + "text": "Creating a geospatial data frame for all NHS Trusts\n\n\n\n# using the NHSRtools package\n# remotes::install_github(\"NHS-R-Community/NHSRtools\")\ntrusts <- ods_get_trusts() |>\n filter(status == \"Active\") |>\n select(name, org_id, post_code) |>\n geocode(postalcode = \"post_code\") |>\n st_as_sf(coords = c(\"long\", \"lat\"), crs = 4326)\n\n\ntrusts |>\n leaflet() |>\n addProviderTiles(\"Stamen.TonerLite\") |>\n addMarkers(popup = ~name)" }, { - "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#in-r---occupancy", - "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#in-r---occupancy", - "title": "System Dynamics in health and care", - "section": "in R - occupancy", - "text": "in R - occupancy\nGenerate list of key dates\n\n\n\ndate_start <- dmy(01012022) \ndate_end <- dmy(31012022)\nrun_len <- length(seq(from = date_start, to = date_end, by = \"day\"))\n\nkeydates <- data.frame(\n date = c(seq(date_start, by = \"day\", length.out=run_len))) \n\n\n\n\n date\n1 2022-01-01\n2 2022-01-02\n3 2022-01-03\n4 2022-01-04\n5 2022-01-05\n6 2022-01-06\n\n\n\n\nStart by generating the list of keydates. In this example we’re running the model in days, and checking each day in 2022.\nNeed the run length for the next step, to know how many times to iterate over" + "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-are-the-nearest-trusts-to-our-location", + "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-are-the-nearest-trusts-to-our-location", + "title": "Coffee and Coding", + "section": "What are the nearest trusts to our location?", + "text": "What are the nearest trusts to our location?\n\nnearest_trusts <- trusts |>\n mutate(\n distance = st_distance(geometry, location)[, 1]\n ) |>\n arrange(distance) |>\n head(5)\n\nnearest_trusts\n\nSimple feature collection with 5 features and 4 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: -1.9384 ymin: 52.4533 xmax: -1.886282 ymax: 52.48764\nGeodetic CRS: WGS 84\n# A tibble: 5 × 5\n name org_id post_code geometry distance\n <chr> <chr> <chr> <POINT [°]> [m]\n1 BIRMINGHAM WOMEN'S AND CH… RQ3 B4 6NH (-1.894241 52.4849) 789.\n2 BIRMINGHAM AND SOLIHULL M… RXT B1 3RB (-1.917663 52.48416) 1313.\n3 BIRMINGHAM COMMUNITY HEAL… RYW B7 4BN (-1.886282 52.48754) 1356.\n4 SANDWELL AND WEST BIRMING… RXK B18 7QH (-1.930203 52.48764) 2246.\n5 UNIVERSITY HOSPITALS BIRM… RRK B15 2GW (-1.9384 52.4533) 3838." 
}, { - "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#in-r---occupancy-1", - "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#in-r---occupancy-1", - "title": "System Dynamics in health and care", - "section": "in R - occupancy", - "text": "in R - occupancy\nIterate over each date - need to have been admitted before, and discharged after\n\noccupancy_flag <- function(df) {\n\n # pre-allocate tibble size to speed up iteration in loop\n activity_all <- tibble(nrow = nrow(df)) |> \n select()\n \n for (i in 1:run_len) {\n \n activity_period <- case_when(\n \n # creates 1 flag if resident for complete day\n df$date_admit < keydates$keydate[i] & \n df$date_discharge > keydates$keydate[i] ~ 1,\n TRUE ~ 0)\n \n # column bind this day's flags to previous\n activity_all <- bind_cols(activity_all, activity_period)\n \n }\n \n # rename column to match the day being counted\n activity_all <- activity_all |> \n setNames(paste0(\"d_\", keydates$date))\n \n # bind flags columns to patient data\n daily_adm <- bind_cols(df, activity_all) |> \n pivot_longer(\n cols = starts_with(\"d_\"),\n names_to = \"date\",\n values_to = \"count\"\n ) |> \n \n group_by(date) |> \n summarise(resident = sum(count)) |> \n ungroup() |> \n mutate(date = str_remove(date, \"d_\"))\n \n } \n\n\nIs there a better way than using a for loop?\n\nPre-allocate tibbles\nactivity_all will end up as very wide tibble, with a column for each date in list of keydates.\nFor each date in the list of key dates, compares with admission date & discharge date; need to be admitted before the key date and discharged after the key date. If match, flag = 1.\nCreates a column for each day, then binds this to activity all.\nRename each column with the date it was checking (add a character to start of column name so column doesn’t start with numeric)\nPivot long, then group by date and sum the flags (other variables could be added here, such as TFC or provider code)" + "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#lets-find-driving-routes-to-these-trusts", + "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#lets-find-driving-routes-to-these-trusts", + "title": "Coffee and Coding", + "section": "Let’s find driving routes to these trusts", + "text": "Let’s find driving routes to these trusts\n\nroutes <- nearest_trusts |>\n mutate(\n route = map(geometry, ~ osrmRoute(location, st_coordinates(.x)))\n ) |>\n st_drop_geometry() |>\n rename(straight_line_distance = distance) |>\n unnest(route) |>\n st_as_sf()\n\nroutes\n\nSimple feature collection with 5 features and 8 fields\nGeometry type: LINESTRING\nDimension: XY\nBounding box: xmin: -1.93846 ymin: 52.45316 xmax: -1.88527 ymax: 52.49279\nGeodetic CRS: WGS 84\n# A tibble: 5 × 9\n name org_id post_code straight_line_distance src dst duration distance\n <chr> <chr> <chr> [m] <chr> <chr> <dbl> <dbl>\n1 BIRMING… RQ3 B4 6NH 789. 1 dst 5.77 3.09\n2 BIRMING… RXT B1 3RB 1313. 1 dst 6.84 4.14\n3 BIRMING… RYW B7 4BN 1356. 1 dst 7.59 4.29\n4 SANDWEL… RXK B18 7QH 2246. 1 dst 8.78 4.95\n5 UNIVERS… RRK B15 2GW 3838. 
1 dst 10.6 4.67\n# ℹ 1 more variable: geometry <LINESTRING [°]>" }, { - "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#longer-time-periods---flows", - "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#longer-time-periods---flows", - "title": "System Dynamics in health and care", - "section": "Longer Time Periods - flows", - "text": "Longer Time Periods - flows\nUse lubridate::floor_date to generate the date at start of week/month\n\nadmit_wk <- spell_dates |> \n mutate(week_start = floor_date(\n date_admit, unit = \"week\", week_start = 1 # start week on Monday\n )) |> \n count(week_start) # could add other parameters such as provider code, TFC etc\n\nhead(admit_wk)\n\n\n\n# A tibble: 6 × 2\n week_start n\n <date> <int>\n1 2021-12-27 52\n2 2022-01-03 196\n3 2022-01-10 192\n4 2022-01-17 223\n5 2022-01-24 157\n6 2022-01-31 187\n\n\n\nMight run SD model in weeks or months - e.g. months for care homes Use lubridate to create new variable with start date of week/month/year etc" + "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#lets-show-the-routes", + "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#lets-show-the-routes", + "title": "Coffee and Coding", + "section": "Let’s show the routes", + "text": "Let’s show the routes\n\nleaflet(routes) |>\n addTiles() |>\n addMarkers(data = location) |>\n addPolylines(color = \"black\", weight = 3, opacity = 1) |>\n addCircleMarkers(data = nearest_trusts, radius = 4, opacity = 1, fillOpacity = 1)" }, { - "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#longer-time-periods---occupancy", - "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#longer-time-periods---occupancy", - "title": "System Dynamics in health and care", - "section": "Longer Time Periods - occupancy", - "text": "Longer Time Periods - occupancy\nKey dates to include the dates at the start and end of each time period\n\n\n\ndate_start <- dmy(03012022) # first Monday of the year\ndate_end <- dmy(01012023)\nrun_len <- length(seq(from = date_start, to = date_end, by = \"week\"))\n\nkeydates <- data.frame(wk_start = c(seq(date_start, \n by = \"week\", \n length.out=run_len))) |> \n mutate(\n wk_end = wk_start + 6) # last date in time period\n\n\n\n\n wk_start wk_end\n1 2022-01-03 2022-01-09\n2 2022-01-10 2022-01-16\n3 2022-01-17 2022-01-23\n4 2022-01-24 2022-01-30\n5 2022-01-31 2022-02-06\n6 2022-02-07 2022-02-13\n\n\n\n\nModel might make more sense to run in weeks or months (e.g. care home), so list of keydates need a start date and end date for each time period." 
+ "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#we-can-use-osrm-to-calculate-isochrones", + "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#we-can-use-osrm-to-calculate-isochrones", + "title": "Coffee and Coding", + "section": "We can use {osrm} to calculate isochrones", + "text": "We can use {osrm} to calculate isochrones\n\n\n\niso <- osrmIsochrone(location, breaks = seq(0, 60, 15), res = 10)\n\nisochrone_ids <- unique(iso$id)\n\npal <- colorFactor(\n viridis::viridis(length(isochrone_ids)),\n isochrone_ids\n)\n\nleaflet(location) |>\n addProviderTiles(\"Stamen.TonerLite\") |>\n addMarkers() |>\n addPolygons(\n data = iso,\n fillColor = ~ pal(id),\n color = \"#000000\",\n weight = 1\n )" }, { - "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#longer-time-periods", - "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#longer-time-periods", - "title": "System Dynamics in health and care", - "section": "Longer Time Periods", - "text": "Longer Time Periods\nMore logic required if working in weeks or months - can only be in one place at any given time\n\n# flag for occupancy\nactivity_period <- case_when(\n \n # creates 1 flag if resident for complete week\n df$date_admit < keydates$wk_start[i] & df$date_discharge > keydates$wk_end[i] ~ 1,\n TRUE ~ 0)\n\n\nAnd a little bit more logic\nOccupancy requires the patient to have been admitted before the start of the week/month, and discharged after the end of the week/month" + "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-trusts-are-in-the-isochrones", + "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-trusts-are-in-the-isochrones", + "title": "Coffee and Coding", + "section": "What trusts are in the isochrones?", + "text": "What trusts are in the isochrones?\nThe summarise() function will “union” the geometry\n\nsummarise(iso)\n\nSimple feature collection with 1 feature and 0 fields\nGeometry type: POLYGON\nDimension: XY\nBounding box: xmin: -2.913575 ymin: 51.98062 xmax: -0.8502164 ymax: 53.1084\nGeodetic CRS: WGS 84\n geometry\n1 POLYGON ((-1.541014 52.9693..." }, { - "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#applying-the-data", - "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#applying-the-data", - "title": "System Dynamics in health and care", - "section": "Applying the data", - "text": "Applying the data\n\n\nHow to apply this wrangling of data to the system dynamic model?\nAdmissions data used as an input to the flow - could be reduced to a single figure (average), or there may be variation by season/day of week etc.\nOccupancy (and discharges) used to verify the model output against known data." 
+ "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-trusts-are-in-the-isochrones-1", + "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-trusts-are-in-the-isochrones-1", + "title": "Coffee and Coding", + "section": "What trusts are in the isochrones?", + "text": "What trusts are in the isochrones?\nWe can use this with a geo-filter to find the trusts in the isochrone\n\n# also works\ntrusts_in_iso <- trusts |>\n st_filter(\n summarise(iso),\n .predicate = st_within\n )\n\ntrusts_in_iso\n\nSimple feature collection with 31 features and 3 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: -2.793386 ymin: 52.19205 xmax: -1.10302 ymax: 53.01015\nGeodetic CRS: WGS 84\n# A tibble: 31 × 4\n name org_id post_code geometry\n * <chr> <chr> <chr> <POINT [°]>\n 1 BIRMINGHAM AND SOLIHULL MENTAL HE… RXT B1 3RB (-1.917663 52.48416)\n 2 BIRMINGHAM COMMUNITY HEALTHCARE N… RYW B7 4BN (-1.886282 52.48754)\n 3 BIRMINGHAM WOMEN'S AND CHILDREN'S… RQ3 B4 6NH (-1.894241 52.4849)\n 4 BIRMINGHAM WOMEN'S NHS FOUNDATION… RLU B15 2TG (-1.942861 52.45325)\n 5 BURTON HOSPITALS NHS FOUNDATION T… RJF DE13 0RB (-1.656667 52.81774)\n 6 COVENTRY AND WARWICKSHIRE PARTNER… RYG CV6 6NY (-1.48692 52.45659)\n 7 DERBYSHIRE HEALTHCARE NHS FOUNDAT… RXM DE22 3LZ (-1.512896 52.91831)\n 8 DUDLEY INTEGRATED HEALTH AND CARE… RYK DY5 1RU (-2.11786 52.48176)\n 9 GEORGE ELIOT HOSPITAL NHS TRUST RLT CV10 7DJ (-1.47844 52.51258)\n10 HEART OF ENGLAND NHS FOUNDATION T… RR1 B9 5ST (-1.828759 52.4781)\n# ℹ 21 more rows" }, { - "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#next-steps", - "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#next-steps", - "title": "System Dynamics in health and care", - "section": "Next Steps", - "text": "Next Steps\n\nGeneralise function to a state where it can be used by others - onto Github\nTurn this into a package\nOpen-source SD models and interfaces - R Shiny or Python" + "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-trusts-are-in-the-isochrones-2", + "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-trusts-are-in-the-isochrones-2", + "title": "Coffee and Coding", + "section": "What trusts are in the isochrones?", + "text": "What trusts are in the isochrones?\n\n\n\nleaflet(trusts_in_iso) |>\n addProviderTiles(\"Stamen.TonerLite\") |>\n addMarkers() |>\n addPolygons(\n data = iso,\n fillColor = ~pal(id),\n color = \"#000000\",\n weight = 1\n )" }, { - "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#questions-comments-suggestions", - "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#questions-comments-suggestions", - "title": "System Dynamics in health and care", - "section": "Questions, comments, suggestions?", - "text": "Questions, comments, suggestions?\n\n\n\nPlease get in touch!\n\nSally.Thompson37@nhs.net\n\n\n\nNHS-R conference 2023" + "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#doing-the-same-but-within-a-radius", + "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#doing-the-same-but-within-a-radius", + "title": "Coffee and Coding", + "section": "Doing the same but within a radius", + "text": "Doing the same but within a radius\n\n\n\nr <- 25000\n\ntrusts_in_radius <- trusts |>\n st_filter(\n location,\n .predicate = st_is_within_distance,\n dist = r\n )\n\n# transforming 
gives us a pretty smooth circle\nradius <- location |>\n st_transform(crs = 27700) |>\n st_buffer(dist = r) |>\n st_transform(crs = 4326)\n\nleaflet(trusts_in_radius) |>\n addProviderTiles(\"Stamen.TonerLite\") |>\n addMarkers() |>\n addPolygons(\n data = radius,\n color = \"#000000\",\n weight = 1\n )" }, { - "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#what-is-rap", - "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#what-is-rap", - "title": "RAP", - "section": "What is RAP", - "text": "What is RAP\n\na process in which code is used to minimise manual, undocumented steps, and a clear, properly documented process is produced in code which can reliably give the same result from the same dataset\nRAP should be:\n\n\nthe core working practice that must be supported by all platforms and teams; make this a core focus of NHS analyst training\n\nGoldacre review" + "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#further-reading", + "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#further-reading", + "title": "Coffee and Coding", + "section": "Further reading", + "text": "Further reading\n\nGeocomputation with R\nr-spatial\n{sf} documentation\nLeaflet documentation\nTidy Geospatial Networks in R\n\n\n\nview slides at the-strategy-unit.github.io/data_science/presentations" }, { - "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#what-are-we-trying-to-achieve", - "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#what-are-we-trying-to-achieve", - "title": "RAP", - "section": "What are we trying to achieve?", - "text": "What are we trying to achieve?\n\nLegibility\nReproducibility\nAccuracy\nLaziness" + "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#how-did-we-get-here", + "href": "presentations/2024-08-22_agile-and-scrum/index.html#how-did-we-get-here", + "title": "Agile and scrum working", + "section": "How did we get here?", + "text": "How did we get here?\n\nWaterfall approaches were used in the early days of software development\n\nRequirements; Design; Development; Integration; Testing; Deployment\n\nYou only move to the next stage when the first one is complete\n(although actually it turns out you kind of don’t…)" }, { - "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#what-are-some-of-the-fundamental-principles", - "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#what-are-some-of-the-fundamental-principles", - "title": "RAP", - "section": "What are some of the fundamental principles?", - "text": "What are some of the fundamental principles?\n\nPredictability, reducing mental load, and reducing truck factor\nMaking it easy to collaborate with yourself and others on different computers, in the cloud, in six months’ time…\nDRY" + "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#the-road-to-agile", + "href": "presentations/2024-08-22_agile-and-scrum/index.html#the-road-to-agile", + "title": "Agile and scrum working", + "section": "The road to agile", + "text": "The road to agile\n\nSome of the ideas for agile floated around in the 20th century\nShewhart’s Plan-Do-Study-Act cycle\nThe New New Product Development Game in 1986\nScrum (which we’ll return to) was proposed in 1993\nIn 2001 the Manifesto for Agile Software Development was published" }, { - "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#the-road-to-rap", - "href": 
"presentations/2023-03-09_midlands-analyst-rap/index.html#the-road-to-rap", - "title": "RAP", - "section": "The road to RAP", - "text": "The road to RAP\n\nWe’re roughly using NHS Digital’s RAP stages\nThere is an incredibly large amount to learn!\nConfession time! (everything I do not know…)\nYou don’t need to do it all at once\nYou don’t need to do it all at all ever\nEach thing you learn will incrementally help you\nRemember- that’s why we learnt this stuff. Because it helped us. And it can help you too" + "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#the-agile-manifesto", + "href": "presentations/2024-08-22_agile-and-scrum/index.html#the-agile-manifesto", + "title": "Agile and scrum working", + "section": "The agile manifesto", + "text": "The agile manifesto\n\nCopyright © 2001 Kent Beck, Mike Beedle, Arie van Bennekum, Alistair Cockburn, Ward Cunningham, Martin Fowler, James Grenning, Jim Highsmith, Andrew Hunt, Ron Jeffries, Jon Kern, Brian Marick\nRobert C. Martin, Steve Mellor, Ken Schwaber, Jeff Sutherland, Dave Thomas\nthis declaration may be freely copied in any form, but only in its entirety through this notice." }, { - "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#levels-of-rap--baseline", - "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#levels-of-rap--baseline", - "title": "RAP", - "section": "Levels of RAP- Baseline", - "text": "Levels of RAP- Baseline\n\nData produced by code in an open-source language (e.g., Python, R, SQL).\nCode is version controlled (see Git basics and using Git collaboratively guides).\nRepository includes a README.md file (or equivalent) that clearly details steps a user must follow to reproduce the code\nCode has been peer reviewed.\nCode is published in the open and linked to & from accompanying publication (if relevant).\n\nSource: NHS Digital RAP community of practice" + "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#agile-principles--software-and-the-mvp", + "href": "presentations/2024-08-22_agile-and-scrum/index.html#agile-principles--software-and-the-mvp", + "title": "Agile and scrum working", + "section": "Agile principles- software and the MVP", + "text": "Agile principles- software and the MVP\n\nOur highest priority is to satisfy the customer through early and continuous delivery of valuable software.\nDeliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.\nWorking software is the primary measure of progress.\n\n(these principles and those on following slides copyright Ibid.)" }, { - "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#levels-of-rap--silver", - "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#levels-of-rap--silver", - "title": "RAP", - "section": "Levels of RAP- Silver", - "text": "Levels of RAP- Silver\n\nCode is well-documented…\nCode is well-organised following standard directory format\nReusable functions and/or classes are used where appropriate\nPipeline includes a testing framework\nRepository includes dependency information (e.g. 
requirements.txt, PipFile, environment.yml\nData is handled and output in a Tidy data format\n\nSource: NHS Digital RAP community of practice" + "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#agile-principles--working-with-customers", + "href": "presentations/2024-08-22_agile-and-scrum/index.html#agile-principles--working-with-customers", + "title": "Agile and scrum working", + "section": "Agile principles- working with customers", + "text": "Agile principles- working with customers\n\nWelcome changing requirements, even late in development. Agile processes harness change for the customer’s competitive advantage.\nBusiness people and developers must work together daily throughout the project." }, { - "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#levels-of-rap--gold", - "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#levels-of-rap--gold", - "title": "RAP", - "section": "Levels of RAP- Gold", - "text": "Levels of RAP- Gold\n\nCode is fully packaged\nRepository automatically runs tests etc. via CI/CD or a different integration/deployment tool e.g. GitHub Actions\nProcess runs based on event-based triggers (e.g., new data in database) or on a schedule\nChanges to the RAP are clearly signposted. E.g. a changelog in the package, releases etc. (See gov.uk info on Semantic Versioning)\n\nSource: NHS Digital RAP community of practice" + "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#agile-principles--teamwork", + "href": "presentations/2024-08-22_agile-and-scrum/index.html#agile-principles--teamwork", + "title": "Agile and scrum working", + "section": "Agile principles- teamwork", + "text": "Agile principles- teamwork\n\nBuild projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done.\nThe most efficient and effective method of conveying information to and within a development team is face-to-face conversation.\nThe best architectures, requirements, and designs emerge from self-organizing teams.\nAt regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly." }, { - "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#a-learning-journey-to-get-you-there", - "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#a-learning-journey-to-get-you-there", - "title": "RAP", - "section": "A learning journey to get you there", - "text": "A learning journey to get you there\n\nCode style, organising your files\nFunctions and iteration\nGit and GitHub\nPackaging your code\nTesting\nPackage management and versioning" + "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#agile-principles--project-management", + "href": "presentations/2024-08-22_agile-and-scrum/index.html#agile-principles--project-management", + "title": "Agile and scrum working", + "section": "Agile principles- project management", + "text": "Agile principles- project management\n\nAgile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely.\nContinuous attention to technical excellence and good design enhances agility.\nSimplicity–the art of maximizing the amount of work not done–is essential." 
}, { - "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#how-we-can-help-each-other-get-there", - "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#how-we-can-help-each-other-get-there", - "title": "RAP", - "section": "How we can help each other get there", - "text": "How we can help each other get there\n\nWork as a team!\nCoffee and coding!\nAsk for help!\nDo pair coding!\nGet your code reviewed!\nJoin the NHS-R/ NHSPycom communities" + "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#the-agile-advantage", + "href": "presentations/2024-08-22_agile-and-scrum/index.html#the-agile-advantage", + "title": "Agile and scrum working", + "section": "The agile advantage", + "text": "The agile advantage\n\nBetter use of fixed resources to deliver an unknown outcome, rather than unknown resources to deliver a fixed outcome\nContinuous delivery" }, { - "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#haca", - "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#haca", - "title": "RAP", - "section": "HACA", - "text": "HACA\n\nThe first national analytics conference for health and care\nInsight to action!\nJuly 11th and 12th, University of Birmingham\nAccepting abstracts for short and long talks and posters\nAbstract deadline 27th March\nHelp is available (with abstract, poster, preparing presentation…)!\n\n\n\n\nview slides at the-strategy-unit.github.io/data_science/presentations" + "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#feature-creep", + "href": "presentations/2024-08-22_agile-and-scrum/index.html#feature-creep", + "title": "Agile and scrum working", + "section": "Feature creep", + "text": "Feature creep\n\nUsers ask for: everything they need, everything they think they may need, everything they want, everything they think they may want\n\n“every program attempts to expand until it can read mail. 
Those programs which cannot so expand are replaced by ones which can”\n\nZawinski’s Law- Source" }, { - "objectID": "presentations/2024-05-30_open-source-licensing/index.html#a-note-on-richard-stallman", - "href": "presentations/2024-05-30_open-source-licensing/index.html#a-note-on-richard-stallman", - "title": "Open source licensing", - "section": "A note on Richard Stallman", - "text": "A note on Richard Stallman\n\nRichard Stallman has been heavily criticised for some of this views\nHe is hard to ignore when talking about open source so I am going to talk about him\nNothing in this talk should be read as endorsing any of his comments outside (or inside) the world of open source" + "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#regular-stakeholder-feedback", + "href": "presentations/2024-08-22_agile-and-scrum/index.html#regular-stakeholder-feedback", + "title": "Agile and scrum working", + "section": "Regular stakeholder feedback", + "text": "Regular stakeholder feedback\n\nAgile teams are very responsive to product feedback\nThe project we’re currently working on is very agile whether we like it or not\nOur customers never know what they want until we show them something they don’t want" }, { - "objectID": "presentations/2024-05-30_open-source-licensing/index.html#the-origin-of-open-source", - "href": "presentations/2024-05-30_open-source-licensing/index.html#the-origin-of-open-source", - "title": "Open source licensing", - "section": "The origin of open source", - "text": "The origin of open source\n\nIn the 50s and 60s source code was routinely shared with hardware and users were often expected to modify to run on their hardware\nBy the late 1960s the production cost of software was rising relative to hardware and proprietary licences became more prevalent\nIn 1980 Richard Stallman’s department at MIT took delivery of a printer they were not able to modify the source code for\nRichard Stallman launched the GNU project in 1983 to fight for software freedoms\nMIT licence was launched in the late 1980s\nCathedral and the bazaar was released in 1997 (more on which later)" + "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#more-agile-advantages", + "href": "presentations/2024-08-22_agile-and-scrum/index.html#more-agile-advantages", + "title": "Agile and scrum working", + "section": "More agile advantages", + "text": "More agile advantages\n\nEarly and cheap failure\nContinuous testing and QA\nReduction in unproductive work\nTeam can improve regularly, not just the product" }, { - "objectID": "presentations/2024-05-30_open-source-licensing/index.html#what-is-open-source", - "href": "presentations/2024-05-30_open-source-licensing/index.html#what-is-open-source", - "title": "Open source licensing", - "section": "What is open source?", - "text": "What is open source?\n\nThink free as in free speech, not free beer (Stallman)\n\n\nOpen source does not mean free of charge! Software freedom implies the ability to sell code\nFree of charge does not mean open source! Many free to download pieces of software are not open source (Zoom, for example)\n\n\nBy Chao-Kuei et al. 
- https://www.gnu.org/philosophy/categories.en.html, GPL, Link" + "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#agile-methods", + "href": "presentations/2024-08-22_agile-and-scrum/index.html#agile-methods", + "title": "Agile and scrum working", + "section": "Agile methods", + "text": "Agile methods\n\nThere are lots of agile methodologies\nI’m not going to embarrass myself by pretending to understand them\nExamples include Lean, Crystal, and Extreme Programming" }, { - "objectID": "presentations/2024-05-30_open-source-licensing/index.html#the-four-freedoms", - "href": "presentations/2024-05-30_open-source-licensing/index.html#the-four-freedoms", - "title": "Open source licensing", - "section": "The four freedoms", - "text": "The four freedoms\n\nFreedom 0: The freedom to use the program for any purpose.\nFreedom 1: The freedom to study how the program works, and change it to make it do what you wish.\nFreedom 2: The freedom to redistribute and make copies so you can help your neighbor.\nFreedom 3: The freedom to improve the program, and release your improvements (and modified versions in general) to the public, so that the whole community benefits." + "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#scrum", + "href": "presentations/2024-08-22_agile-and-scrum/index.html#scrum", + "title": "Agile and scrum working", + "section": "Scrum", + "text": "Scrum\n\nScrum is the agile methodology we have adopted\nDespite dire warnings to the contrary we have not adopted it wholesale, but we have adopted most of its principles\nThe fundamental organising principle of work in scrum is a sprint lasting 1-4 weeks\nEach sprint finishes with a defined and useful piece of software that can be shown to/ used by customers" }, { - "objectID": "presentations/2024-05-30_open-source-licensing/index.html#cathedral-and-the-bazaar", - "href": "presentations/2024-05-30_open-source-licensing/index.html#cathedral-and-the-bazaar", - "title": "Open source licensing", - "section": "Cathedral and the bazaar", - "text": "Cathedral and the bazaar\n\nEvery good work of software starts by scratching a developer’s personal itch.\nGood programmers know what to write. Great ones know what to rewrite (and reuse).\nPlan to throw one [version] away; you will, anyhow (copied from Frederick Brooks’s The Mythical Man-Month).\nIf you have the right attitude, interesting problems will find you.\nWhen you lose interest in a program, your last duty to it is to hand it off to a competent successor.\nTreating your users as co-developers is your least-hassle route to rapid code improvement and effective debugging.\nRelease early. Release often. And listen to your customers.\nGiven a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix obvious to someone.\nSmart data structures and dumb code works a lot better than the other way around.\nIf you treat your beta-testers as if they’re your most valuable resource, they will respond by becoming your most valuable resource." 
+ "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#product-owner", + "href": "presentations/2024-08-22_agile-and-scrum/index.html#product-owner", + "title": "Agile and scrum working", + "section": "Product owner", + "text": "Product owner\n\nThis person is responsible for the backlog- what goes into the sprint\nThe backlog should be inclusive of all of the things that customers want or might want\nThe backlog should be prioritised\nThe product owner does this through deep and frequent conversations with customers" }, { - "objectID": "presentations/2024-05-30_open-source-licensing/index.html#cathedral-and-the-bazaar-cont.", - "href": "presentations/2024-05-30_open-source-licensing/index.html#cathedral-and-the-bazaar-cont.", - "title": "Open source licensing", - "section": "Cathedral and the bazaar (cont.)", - "text": "Cathedral and the bazaar (cont.)\n\nThe next best thing to having good ideas is recognizing good ideas from your users. Sometimes the latter is better.\nOften, the most striking and innovative solutions come from realizing that your concept of the problem was wrong.\nPerfection (in design) is achieved not when there is nothing more to add, but rather when there is nothing more to take away. (Attributed to Antoine de Saint-Exupéry)\nAny tool should be useful in the expected way, but a truly great tool lends itself to uses you never expected.\nWhen writing gateway software of any kind, take pains to disturb the data stream as little as possible—and never throw away information unless the recipient forces you to!\nWhen your language is nowhere near Turing-complete, syntactic sugar can be your friend.\nA security system is only as secure as its secret. Beware of pseudo-secrets.\nTo solve an interesting problem, start by finding a problem that is interesting to you.\nProvided the development coordinator has a communications medium at least as good as the Internet, and knows how to lead without coercion, many heads are inevitably better than one." 
+ "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#scrum-master-helps-the-scrum-team", + "href": "presentations/2024-08-22_agile-and-scrum/index.html#scrum-master-helps-the-scrum-team", + "title": "Agile and scrum working", + "section": "Scrum master helps the scrum team", + "text": "Scrum master helps the scrum team\n\n“By coaching the team members in self-management and cross-functionality\nFocus on creating high-value Increments that meet the Definition of Done\nInfluence the removal of impediments to the Scrum Team’s progress\nEnsure that all Scrum events take place and are positive, productive, and kept within the timebox.”\n\nSource" }, { - "objectID": "presentations/2024-05-30_open-source-licensing/index.html#the-disciplines-of-open-source-are-the-disciplines-of-good-data-science", - "href": "presentations/2024-05-30_open-source-licensing/index.html#the-disciplines-of-open-source-are-the-disciplines-of-good-data-science", - "title": "Open source licensing", - "section": "The disciplines of open source are the disciplines of good data science", - "text": "The disciplines of open source are the disciplines of good data science\n\nMeaningful README\nMeaningful commit messages\nModularity\nSeparating data code from analytic code from interactive code\nAssigning issues and pull requests for action/ review\nDon’t forget one of the most lazy and incompetent developers you will ever work with is yourself, six months later" + "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#the-backlog", + "href": "presentations/2024-08-22_agile-and-scrum/index.html#the-backlog", + "title": "Agile and scrum working", + "section": "The backlog", + "text": "The backlog\n\nHaving an accurate and well prioritised backlog is key\nDon’t estimate the backlog in hours- use “T shirt sizes” or “points”\nPeople are terrible at estimating how long things take- particularly in software\nEverything in the backlog needs a defined “Done” state" }, { - "objectID": "presentations/2024-05-30_open-source-licensing/index.html#what-licences-exist", - "href": "presentations/2024-05-30_open-source-licensing/index.html#what-licences-exist", - "title": "Open source licensing", - "section": "What licences exist?", - "text": "What licences exist?\n\nPermissive\n\nSuch as MIT but there are others. Recommended by NHSX draft guidelines on open source\nApache is a notable permissive licence- includes a patent licence\nIn our work the OGL is also relevant- civil servant publish stuff under OGL (and MIT- it isn’t particularly recommended for code)\n\nCopyleft\n\nGPL2, GPL3, AGPL (“the GPL of the web”)\nNote that the provisions of the GPL only apply when you distribute the code\nAt a certain point it all gets too complicated and you need a lawyer\nMPL is a notable copyleft licence- can combine with proprietary code as long as kept separate\n\nArguments for permissive/ copyleft- getting your code used versus preserving software freedoms for other people\nNote that most of the licences are impossible to read! 
There is a website to explain tl;dr" + "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#sprint-planning", + "href": "presentations/2024-08-22_agile-and-scrum/index.html#sprint-planning", + "title": "Agile and scrum working", + "section": "Sprint planning", + "text": "Sprint planning\n\nThe team, the product owner, and the scrum master plan the sprint\nSprints should be a fixed length of time less than one month\nThe sprint cannot be changed or added to (we break this rule)\nThe team works autonomously in the sprint- nobody decides who does what except the team\nCan take three hours and should if it needs to" }, { - "objectID": "presentations/2024-05-30_open-source-licensing/index.html#what-is-copyright-and-why-does-it-matter", - "href": "presentations/2024-05-30_open-source-licensing/index.html#what-is-copyright-and-why-does-it-matter", - "title": "Open source licensing", - "section": "What is copyright and why does it matter", - "text": "What is copyright and why does it matter\n\nCopyright is assigned at the moment of creation\nIf you made it in your own time, it’s yours (usually!)\nIf you made it at work, it belongs to your employer\nIf someone paid you to make it (“work for hire”) it belongs to them\nCrucially, the copyright holder can relicence software\n\nIf it’s jointly authored it depends if it’s a “collective” or “joint” work\nHonestly it’s pretty complicated. Just vest copyright in an organisation or group of individuals you trust\nGoldacre review suggests using Crown copyright for copyright in the NHS because it’s a “shoal, not a big fish” (with apologies to Ben whom I am misquoting)" + "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#standup", + "href": "presentations/2024-08-22_agile-and-scrum/index.html#standup", + "title": "Agile and scrum working", + "section": "Standup", + "text": "Standup\n\nEvery day, for no more than 15 minutes (teams often stand up to reinforce this rule), the team and scrum master meet\nEach person answers three questions\n\nWhat did you do yesterday to help the team finish the sprint?\nWhat will you do today to help the team finish the sprint?\nIs there an obstacle blocking you or the team from achieving the sprint goal?" }, { - "objectID": "presentations/2024-05-30_open-source-licensing/index.html#iceweasel", - "href": "presentations/2024-05-30_open-source-licensing/index.html#iceweasel", - "title": "Open source licensing", - "section": "Iceweasel", - "text": "Iceweasel\n\nIceweasel is a story of trademark rather than copyright\nDebian (a Linux flavour) had the permission to use the source code of Firefox, but not the logo\nSo they took the source code and made their own version\nThis sounds very obscure and unimportant but it could become important in future projects of ours, like…" + "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#sprint-retro", + "href": "presentations/2024-08-22_agile-and-scrum/index.html#sprint-retro", + "title": "Agile and scrum working", + "section": "Sprint retro", + "text": "Sprint retro\n\nWhat went well, what could have gone better, and what to improve next time\nLooking at process, not blaming individuals\nRequires maturity and trust to bring up issues, and to respond to them in a constructive way\nShould agree at the end on one process improvement which goes in the next sprint\nWe’ve had some really, really good retros and I think it’s a really important process for a team" }, { - "objectID": 
"presentations/2024-05-30_open-source-licensing/index.html#what-we-have-learned-in-recent-projects", - "href": "presentations/2024-05-30_open-source-licensing/index.html#what-we-have-learned-in-recent-projects", - "title": "Open source licensing", - "section": "What we have learned in recent projects", - "text": "What we have learned in recent projects\n\nThe huge benefits of being open\n\nTransparency\nWorking with customers\nGoodwill\n\nNonfree mitigators\nDifferent licences for different repos" + "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#team-perspective", + "href": "presentations/2024-08-22_agile-and-scrum/index.html#team-perspective", + "title": "Agile and scrum working", + "section": "Team perspective", + "text": "Team perspective\n\nProduct owner- that’s me\n\nFocus, clarity and transparency, team delivery, clear and appropriate responsibilities\n\nScrum master- YiWen\nTeam member- Matt\nTeam member- Rhian" }, { - "objectID": "presentations/2024-05-30_open-source-licensing/index.html#software-freedom-means-allowing-people-to-do-stuff-you-dont-like", - "href": "presentations/2024-05-30_open-source-licensing/index.html#software-freedom-means-allowing-people-to-do-stuff-you-dont-like", - "title": "Open source licensing", - "section": "Software freedom means allowing people to do stuff you don’t like", - "text": "Software freedom means allowing people to do stuff you don’t like\n\nFreedom 0: The freedom to use the program for any purpose.\nFreedom 3: The freedom to improve the program, and release your improvements (and modified versions in general) to the public, so that the whole community benefits.\nThe code isn’t the only thing with worth in the project\nThis is why there are whole businesses founded on “here’s the Linux source code”\nSo when we’re sharing code we are letting people do stupid things with it but we’re not recommending that they do stupid things with it\nPeople do stupid things with Excel and Microsoft don’t accept liability for that, and neither should we\nThis issue of sharing analytic code and merchantability for a particular purpose is poorly understood and I think everyone needs to be clearer on it (us, and our customers)\nIn my view a world where consultants are selling our code is better than a world where they’re selling their spreadsheets" + "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#scrum-values", + "href": "presentations/2024-08-22_agile-and-scrum/index.html#scrum-values", + "title": "Agile and scrum working", + "section": "Scrum values", + "text": "Scrum values\n\nCourage\nFocus\nCommitment\nRespect\nOpenness" }, { - "objectID": "presentations/2024-05-30_open-source-licensing/index.html#open-source-as-in-piano", - "href": "presentations/2024-05-30_open-source-licensing/index.html#open-source-as-in-piano", - "title": "Open source licensing", - "section": "“Open source as in piano”", - "text": "“Open source as in piano”\n\nThe patient experience QDC project\nOur current project\nOpen source code is not necessarily to be run, but understood and learned from\nBuilding a group of people who can use and contribute to your code is arguably as important as writing it\n\n\n\n\nview slides at the-strategy-unit.github.io/data_science/presentations" + "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#using-agile-outside-of-software", + "href": "presentations/2024-08-22_agile-and-scrum/index.html#using-agile-outside-of-software", + "title": "Agile and scrum working", + "section": "Using agile outside of 
software", + "text": "Using agile outside of software\n\nData science is outside of software (IMHO)\n\nWe don’t have daily standups and some of our processes run longer than in software development\n\nYou can build cars with Agile\nMarketing and UX design\n\n\n\n\nview slides at the-strategy-unit.github.io/data_science/presentations" }, { "objectID": "presentations/2024-01-25_coffee-and-coding/index.html#targets-for-analysts", diff --git a/sitemap.xml b/sitemap.xml index 984b291..070cb34 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -2,150 +2,154 @@ https://the-strategy-unit.github.io/data_science/index.html - 2024-08-21T07:56:41.558Z + 2024-08-22T14:08:32.579Z https://the-strategy-unit.github.io/data_science/style/project_structure.html - 2024-08-21T07:56:41.606Z + 2024-08-22T14:08:32.631Z https://the-strategy-unit.github.io/data_science/style/git_and_github.html - 2024-08-21T07:56:41.606Z + 2024-08-22T14:08:32.631Z https://the-strategy-unit.github.io/data_science/presentations/2023-03-23_coffee-and-coding/index.html - 2024-08-21T07:56:41.570Z + 2024-08-22T14:08:32.591Z https://the-strategy-unit.github.io/data_science/presentations/2023-02-23_coffee-and-coding/index.html - 2024-08-21T07:56:41.558Z + 2024-08-22T14:08:32.583Z https://the-strategy-unit.github.io/data_science/presentations/index.html - 2024-08-21T07:56:41.606Z + 2024-08-22T14:08:32.631Z https://the-strategy-unit.github.io/data_science/presentations/2023-03-23_collaborative-working/index.html - 2024-08-21T07:56:41.570Z + 2024-08-22T14:08:32.591Z https://the-strategy-unit.github.io/data_science/presentations/2023-10-17_conference-check-in-app/index.html - 2024-08-21T07:56:41.594Z + 2024-08-22T14:08:32.615Z https://the-strategy-unit.github.io/data_science/presentations/2023-08-02_mlcsu-ksn-meeting/index.html - 2024-08-21T07:56:41.574Z + 2024-08-22T14:08:32.595Z - https://the-strategy-unit.github.io/data_science/presentations/2023-08-24_coffee-and-coding_geospatial/index.html - 2024-08-21T07:56:41.574Z + https://the-strategy-unit.github.io/data_science/presentations/2024-05-30_open-source-licensing/index.html + 2024-08-22T14:08:32.627Z - https://the-strategy-unit.github.io/data_science/presentations/2023-08-23_nhs-r_unit-testing/index.html - 2024-08-21T07:56:41.574Z + https://the-strategy-unit.github.io/data_science/presentations/2023-03-09_midlands-analyst-rap/index.html + 2024-08-22T14:08:32.591Z - https://the-strategy-unit.github.io/data_science/presentations/2023-03-09_coffee-and-coding/index.html - 2024-08-21T07:56:41.558Z + https://the-strategy-unit.github.io/data_science/presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html + 2024-08-22T14:08:32.607Z - https://the-strategy-unit.github.io/data_science/presentations/2023-07-11_haca-nhp-demand-model/index.html - 2024-08-21T07:56:41.570Z + https://the-strategy-unit.github.io/data_science/presentations/2024-05-16_store-data-safely/index.html + 2024-08-22T14:08:32.615Z - https://the-strategy-unit.github.io/data_science/blogs/posts/2024-02-28_sankey_plot.html - 2024-08-21T07:56:41.558Z + https://the-strategy-unit.github.io/data_science/blogs/posts/2023-04-26_alternative_remotes.html + 2024-08-22T14:08:32.579Z - https://the-strategy-unit.github.io/data_science/blogs/posts/2024-01-17_nearest_neighbour.html - 2024-08-21T07:56:41.558Z + https://the-strategy-unit.github.io/data_science/blogs/posts/2024-01-10-advent-of-code-and-test-driven-development.html + 2024-08-22T14:08:32.579Z - 
https://the-strategy-unit.github.io/data_science/blogs/posts/2024-05-22-storing-data-safely/index.html - 2024-08-21T07:56:41.558Z + https://the-strategy-unit.github.io/data_science/blogs/posts/2024-05-22-storing-data-safely/azure_python.html + 2024-08-22T14:08:32.579Z - https://the-strategy-unit.github.io/data_science/blogs/posts/2024-05-13_one-year-coffee-code.html - 2024-08-21T07:56:41.558Z + https://the-strategy-unit.github.io/data_science/blogs/posts/2023-03-21-rstudio-tips/index.html + 2024-08-22T14:08:32.579Z - https://the-strategy-unit.github.io/data_science/blogs/posts/2023-03-24_hotfix-with-git.html - 2024-08-21T07:56:41.558Z + https://the-strategy-unit.github.io/data_science/blogs/posts/2023-04-26-reinstalling-r-packages.html + 2024-08-22T14:08:32.579Z + + + https://the-strategy-unit.github.io/data_science/blogs/index.html + 2024-08-22T14:08:32.575Z https://the-strategy-unit.github.io/data_science/about.html - 2024-08-21T07:56:41.554Z + 2024-08-22T14:08:32.575Z - https://the-strategy-unit.github.io/data_science/blogs/index.html - 2024-08-21T07:56:41.554Z + https://the-strategy-unit.github.io/data_science/blogs/posts/2023-03-24_hotfix-with-git.html + 2024-08-22T14:08:32.579Z - https://the-strategy-unit.github.io/data_science/blogs/posts/2023-04-26-reinstalling-r-packages.html - 2024-08-21T07:56:41.558Z + https://the-strategy-unit.github.io/data_science/blogs/posts/2024-05-13_one-year-coffee-code.html + 2024-08-22T14:08:32.579Z - https://the-strategy-unit.github.io/data_science/blogs/posts/2023-03-21-rstudio-tips/index.html - 2024-08-21T07:56:41.558Z + https://the-strategy-unit.github.io/data_science/blogs/posts/2024-05-22-storing-data-safely/index.html + 2024-08-22T14:08:32.579Z - https://the-strategy-unit.github.io/data_science/blogs/posts/2024-05-22-storing-data-safely/azure_python.html - 2024-08-21T07:56:41.558Z + https://the-strategy-unit.github.io/data_science/blogs/posts/2024-01-17_nearest_neighbour.html + 2024-08-22T14:08:32.579Z - https://the-strategy-unit.github.io/data_science/blogs/posts/2024-01-10-advent-of-code-and-test-driven-development.html - 2024-08-21T07:56:41.558Z + https://the-strategy-unit.github.io/data_science/blogs/posts/2024-02-28_sankey_plot.html + 2024-08-22T14:08:32.579Z - https://the-strategy-unit.github.io/data_science/blogs/posts/2023-04-26_alternative_remotes.html - 2024-08-21T07:56:41.558Z + https://the-strategy-unit.github.io/data_science/presentations/2023-07-11_haca-nhp-demand-model/index.html + 2024-08-22T14:08:32.591Z - https://the-strategy-unit.github.io/data_science/presentations/2024-05-16_store-data-safely/index.html - 2024-08-21T07:56:41.598Z + https://the-strategy-unit.github.io/data_science/presentations/2023-03-09_coffee-and-coding/index.html + 2024-08-22T14:08:32.583Z - https://the-strategy-unit.github.io/data_science/presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html - 2024-08-21T07:56:41.586Z + https://the-strategy-unit.github.io/data_science/presentations/2023-08-23_nhs-r_unit-testing/index.html + 2024-08-22T14:08:32.595Z - https://the-strategy-unit.github.io/data_science/presentations/2023-03-09_midlands-analyst-rap/index.html - 2024-08-21T07:56:41.570Z + https://the-strategy-unit.github.io/data_science/presentations/2023-08-24_coffee-and-coding_geospatial/index.html + 2024-08-22T14:08:32.595Z - https://the-strategy-unit.github.io/data_science/presentations/2024-05-30_open-source-licensing/index.html - 2024-08-21T07:56:41.606Z + 
https://the-strategy-unit.github.io/data_science/presentations/2024-08-22_agile-and-scrum/index.html + 2024-08-22T14:08:32.627Z https://the-strategy-unit.github.io/data_science/presentations/2024-01-25_coffee-and-coding/index.html - 2024-08-21T07:56:41.594Z + 2024-08-22T14:08:32.615Z https://the-strategy-unit.github.io/data_science/presentations/2023-05-15_text-mining/index.html - 2024-08-21T07:56:41.570Z + 2024-08-22T14:08:32.591Z https://the-strategy-unit.github.io/data_science/presentations/2023-09-07_coffee_and_coding_functions/index.html - 2024-08-21T07:56:41.578Z + 2024-08-22T14:08:32.599Z https://the-strategy-unit.github.io/data_science/presentations/2023-05-23_data-science-for-good/index.html - 2024-08-21T07:56:41.570Z + 2024-08-22T14:08:32.591Z https://the-strategy-unit.github.io/data_science/presentations/2024-05-23_github-team-sport/index.html - 2024-08-21T07:56:41.602Z + 2024-08-22T14:08:32.623Z https://the-strategy-unit.github.io/data_science/presentations/2023-02-01_what-is-data-science/index.html - 2024-08-21T07:56:41.558Z + 2024-08-22T14:08:32.579Z https://the-strategy-unit.github.io/data_science/style/data_storage.html - 2024-08-21T07:56:41.606Z + 2024-08-22T14:08:32.631Z https://the-strategy-unit.github.io/data_science/style/style_guide.html - 2024-08-21T07:56:41.610Z + 2024-08-22T14:08:32.631Z