Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Micellaneous fixes to the release #897

Merged
merged 5 commits into from
Feb 20, 2024
Merged

Conversation

euanmillar
Copy link
Collaborator

@euanmillar euanmillar commented Feb 15, 2024

@rikukissa I had a number of issues when I was testing setting up 4 new servers. We also experienced space filling up on servers with old kernel files perhaps related to ubuntu periodic updates. I will be re-tagging the 1.4.0 release with these fixes before the IPP training program commences.

  • ALERT_EMAIL & SMTP details not passed to Elastalert correctly.
  • Elastalert having a connection issue.
  • Minor bugs in the provision playbook.
  • Kibana rules not added to new servers in deploy.sh as Kibana isnt ready to receive comms. Put in a 2 minute wait.
  • The Ansible version on Github contains this bug which causes the created backup user to have bad permissions and the backup script gets a permission denied (public key) error when SSH'ing in during the cronjob. Instead we will instruct the implementers to manually create a backup user in documentation.
  • The yarn environment:init script doesnt create RESTORE_BACKUP_ENCRYPTION_PASSPHRASE for any server. Therefore the staging server crontab doesnt have an entry to restore a backup. In my view there is no need for restore_backup_encryption_passphrase as it will be the same as the backup_encryption_passphrase.

Copy link
Member

@n1koo n1koo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@euanmillar euanmillar merged commit 85c6175 into release-v1.4.0 Feb 20, 2024
1 check passed
@euanmillar euanmillar deleted the fixes-to-release-v1.4.0 branch February 20, 2024 18:16
Comment on lines +33 to +40
for file in /opt/opencrvs/infrastructure/monitoring/elastalert/rules/*.yaml; do
sed -i -e "s%{{HOST}}%$1%" $file
sed -i -e "s%{{SMTP_HOST}}%$SMTP_HOST%" $file
sed -i -e "s%{{SMTP_PORT}}%$SMTP_PORT%" $file
sed -i -e "s%{{ALERT_EMAIL}}%$ALERT_EMAIL%" $file
sed -i -e "s%{{SENDER_EMAIL_ADDRESS}}%$SENDER_EMAIL_ADDRESS%" $file
sed -i -e "s%{{DOMAIN}}%$DOMAIN%" $file
done
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are replaced by the country config server here and there shouldn't be a need to pass these in anymore.

HOST, SMTP_HOST and SMTP_PORT are not referred to anymore in the Elastalert *.yaml files so adding these won't change the behaviour. The only thing Elastalert needs to know is to which URL the email HTTP request needs to be sent and the country config server takes care of using the right SMTP details and other configuration.

My guess is a deployment was made using an 1.3.x version of the server and 1.4.0 version of the infrastructure? @euanmillar

@@ -26,6 +26,19 @@ done
KIBANA_ENCRYPTION_KEY=`uuidgen`
sed -i "s/{{KIBANA_ENCRYPTION_KEY}}/$KIBANA_ENCRYPTION_KEY/g" /opt/opencrvs/infrastructure/monitoring/kibana/kibana.yml

# Move metabase file
mv /opt/opencrvs/infrastructure/metabase.init.db.sql /data/metabase/metabase.init.db.sql
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The metabase init SQL file is not read from the /data directory. This is because it doesn't ever change at runtime. There's also a security implication if we copy the file under /data: done this way, the deployment user needs write and read access to the /data directory which I'd really want to avoid.

https://github.com/opencrvs/opencrvs-farajaland/blob/develop/infrastructure/docker-compose.deploy.yml#L1019-L1020

@@ -366,6 +372,7 @@ echo
echo "Waiting 2 mins for mongo to deploy before working with data. Please note it can take up to 10 minutes for the entire stack to deploy in some scenarios."
echo

sleep 120 # Required as Kibana cannot be immediately contacted
echo "Setting up Kibana config & alerts"

while true; do
Copy link
Member

@rikukissa rikukissa Feb 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This while loop should run until the curl commands succeeds. What exactly was the issue here? I originally removed the 120s sleep as in some cases it wasn't enough to guarantee ES + Kibana was up and didn't account for random failures.

I added logic to the script now to first verify Kibana is up and ready to receive requests

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue was that on a brand new server alert rules were not added into Kibana because Kibana wasnt ready to receive them.

Comment on lines +355 to +360
HOST=$HOST
SMTP_HOST=$SMTP_HOST
SMTP_PORT=$SMTP_PORT
ALERT_EMAIL=$ALERT_EMAIL
SENDER_EMAIL_ADDRESS=$SENDER_EMAIL_ADDRESS
DOMAIN=$DOMAIN
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are passed down automatically by configured_ssh as part of it passing down all Github secrets

@@ -405,6 +405,8 @@ services:
- '/opt/opencrvs/infrastructure/monitoring/elastalert/rules:/opt/elastalert/rules'
networks:
- overlay_net
depends_on:
Copy link
Member

@rikukissa rikukissa Feb 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The depends_on option is ignored when deploying a stack in swarm mode with a version 3 Compose file.

https://docs.docker.com/compose/compose-file/compose-file-v3/#depends_on

I cleaned these up from the compose files completely now

Comment on lines -19 to -20
es_username: '{{ES_USERNAME}}'
es_password: '{{ES_PASSWORD}}'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are probably needed for Elastalert to be able to connect to ES?

Copy link
Member

@rikukissa rikukissa Feb 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, these can indeed be removed as environment variables passed down to Elastalert overwrite and have been overwriting these for a long time. Assuming env variables are defined, these shouldn't affect anything but good to clean up

@@ -87,16 +87,6 @@
vars:
manager_hostname: "{{ groups['docker-manager-first'][0] }}"
tasks:
- name: Ensure backup user is present
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Ansible version on Github contains ansible/ansible#75352 bug which causes the created backup user to have bad permissions and the backup script gets a permission denied (public key) error when SSH'ing in during the cronjob. Instead we will instruct the implementers to manually create a backup user in documentation.

It's better to keep the setup as automated as possible so we have maximum control and predictability over the environment. Let's instead add another step that sets the permissions correctly to mitigate the issue.

@rikukissa
Copy link
Member

The yarn environment:init script doesnt create RESTORE_BACKUP_ENCRYPTION_PASSPHRASE for any server. Therefore the staging server crontab doesnt have an entry to restore a backup. In my view there is no need for restore_backup_encryption_passphrase as it will be the same as the backup_encryption_passphrase

Yea, this is fine 👍

rikukissa added a commit that referenced this pull request Feb 27, 2024
comments about why certain changes are potentially needed can be found here #897
rikukissa added a commit that referenced this pull request Feb 27, 2024
comments about why certain changes are potentially needed can be found here #897
rikukissa added a commit that referenced this pull request Feb 27, 2024
comments about why certain changes are potentially needed can be found here #897
rikukissa added a commit that referenced this pull request Feb 27, 2024
comments about why certain changes are potentially needed can be found here #897
@rikukissa
Copy link
Member

rikukissa commented Feb 27, 2024

Went through these issues with a completely new 1.4.1 + 1.4.1 environment:

Alert emails

  • ALERT_EMAIL & SMTP details not passed to Elastalert correctly. ✅
    • Elastalert now only needs the HTTP endpoint for countryconfig. This seems to work ok now.
  • Elastalert having a connection issue. ✅

I saw the following errors in country config logs

Unable to send email to {{ALERT_EMAIL}} for error : Error: Connection timeout

In the end, there was two different issues here:

  1. In 1.4.x, the credential formats have changed a bit for Sendgrid as we're now using nodemailer. The values need to be
SMTP_HOST="smtp.sendgrid.net"
SMTP_USERNAME="apikey" # Hard-coded string
SMTP_PASSWORD="SG.TTtfXtp......bPI" # Raw API key
SMTP_PORT="587"
SMTP_SECURE="false"
  1. The handlebar replacement was happening after that log was being sent. This didn't affect how the code was working but caused the confusing log output. This is now fixed.

Other

  • Minor bugs in the provision playbook ✅

  • Kibana rules not added to new servers in deploy.sh as Kibana isnt ready to receive comms. Put in a 2 minute wait. ✅

    • I restructured the script a bit to get this fixed. The script now waits until Kibana responds with 200 status before doing anything.
  • The Ansible version on Github contains User module creates user's home with root:root permissions ansible/ansible#75352 bug which causes the created backup user to have bad permissions and the backup script gets a permission denied (public key) error when SSH'ing in during the cronjob. Instead we will instruct the implementers to manually create a backup user in documentation.

This one I couldn't reproduce 🤔

riku@riku-staging:~$ sudo ls -la /home/
total 32
drwxr-xr-x  8 root      root      4096 Feb 28 07:14 .
drwxr-xr-x 20 root      root      4096 Feb 27 13:54 ..
drwxr-xr-x  4 backup    backup    4096 Feb 28 07:15 backup
drwxr-x---  5 provision provision 4096 Feb 27 17:41 provision
drwxr-x---  4 riku      riku      4096 Feb 28 06:23 riku

riku@riku-staging:~$ sudo ls -la  /home/backup
total 28
drwxr-xr-x 4 backup backup 4096 Feb 28 07:15 .
drwxr-xr-x 8 root   root   4096 Feb 28 07:14 ..
-rw-r--r-- 1 backup backup  220 Jan  7  2023 .bash_logout
-rw-r--r-- 1 backup backup 3771 Jan  7  2023 .bashrc
-rw-r--r-- 1 backup backup    0 Feb 27 12:54 .cloud-locale-test.skip
-rw-r--r-- 1 backup backup  807 Jan  7  2023 .profile
drwx------ 2 backup root   4096 Feb 28 07:14 .ssh
drwxr-xr-x 2 backup root   4096 Feb 28 07:15 backups

rikukissa added a commit that referenced this pull request Feb 27, 2024
comments about why certain changes are potentially needed can be found here #897
rikukissa added a commit that referenced this pull request Feb 27, 2024
comments about why certain changes are potentially needed can be found here #897
rikukissa added a commit that referenced this pull request Feb 28, 2024
comments about why certain changes are potentially needed can be found here #897
rikukissa added a commit that referenced this pull request Feb 28, 2024
comments about why certain changes are potentially needed can be found here #897
rikukissa added a commit that referenced this pull request Feb 28, 2024
- Improves email logging
- Removes `depends_on` configuration from docker compose files
- Removes some deprecated deployment code around Elastalert config file formatting
- Creates backup user on backup servers automatically
- Verifys Kibana is ready before setting up alert configuration

comments about why certain changes are potentially needed can be found here #897
naftis added a commit to opencrvs/opencrvs-countryconfig-mosip that referenced this pull request Apr 17, 2024
* Change the verbiage from sms to email in Send for approval section and in introduction page (#868)

* update the translations to use email instead of SMS

* update changelog

* update password copy to suggest a 12 character long passphrase (#867)

* chore: update translations for password change

* make the wording same as in the number validation

* Allow pip to update externally-managed packages

* add SSH_USER to deployment pipeline

* add provision user key to legacy authorized_keys file

* start release v1.4.1

* fix release link reference

* Farajaland certificates updated (#891)

* Farajaland certificates updated

* Remove outdated README

* Make occupation field usable for deceased (#888)

* Make occupation field usable for deceased

* Update CHANGELOG

* Farajaland form updates (#805)

* Add custom field in death form deceased section: Number of dependants

* Replace national id input with type of id select and id number input

* Uncomment spouse section in death form

* Feedback amends

* Make id field custom

* Informant section amends

* Fix typo

* Remove unused import

* Update death informant id fields conditionals to support nid integration

* Reason for delayed registration custom form field (#894)

* Custom field reason for late registration

* Add field for death

* Update metabase init db

* Update metabase auth db

* Revert "Update metabase auth db"

This reverts commit 9fd5898.

* Fix preceding field id for the marriage informant address (#906)

* add age of deceased handlebar & marriage international address fix (#895)

* add handlebar of age of deceased in death certificate

* update place of marriage info in certificate for international location

* update for ageOfDeceased handlebars in death form

* ocrvs-6517 Hide fields when informantType not selected (#899)

* ocrvs-6204 fix date of birth unknown checkbox conditional & update CHANGELOG (#898)

* Consider detailsExist conditional when age unknown

* Update CHANGELOG

* Add search id placeholder copy

* Fix death certificate birthdate field

* Update handlebars (#905)

* Update ID number handlebars

* Remove duplicate location handlebars

* Keep the placeOfBirth handlebar

* bump release number

* Fix typo in certificate handlebar names

* remove authorized keys file

* Clean up deprecated deployment code

- Improves email logging
- Removes `depends_on` configuration from docker compose files
- Removes some deprecated deployment code around Elastalert config file formatting
- Creates backup user on backup servers automatically
- Verifys Kibana is ready before setting up alert configuration

comments about why certain changes are potentially needed can be found here opencrvs/opencrvs-farajaland#897

* Fix regex of dob fields

* update metabase init file so it's compatible with current metabase version

* fix elastalert matching rule for ssh logins

* Add a default fallback for variable enable_backups in development provision

* Set default value false for enable_backups in development.yml

* Apply default value in enable_backups implementation

* Update tag name in action input

* fix Elastalert2 matching rule for SSH logins

* Sort staged translation files (#795)

* Add a precommit hook for sorting translation files

* Sort the translations files

* Skip the description files

Co-authored-by: Tameem Bin Haider <[email protected]>

* Sort client, login, notification message files

* Update messages with according to recent requirements

* Update descriptions

* send an email to verify successful deployment

* polish the deployment email a bit

* move copy files to csv files

* remove validate translations file now that from the CSV it's easier to see what is missing and what is not

* add back disappeared content handler

* add unit test CI pipeline back

* update sort-translations script

* use unquoted format for CSVs so it matches the google sheets export format

* import and export csv files in google sheets once so doing it later doesnt cause extra diff

* add new copy entries to csv files, remove AVAILABLE_LANGUAGES_SELECT

* update changelog

* fix failing test

* do not send displayName field anymore

* add missing translations

* revert package.json changes

* remove jest types

---------

Co-authored-by: Tameem Bin Haider <[email protected]>
Co-authored-by: Riku Rouvila <[email protected]>
Co-authored-by: Euan Millar <[email protected]>
Co-authored-by: Tahmid Rahman <[email protected]>
Co-authored-by: tahmidrahman-dsi <[email protected]>
Co-authored-by: Md. Ashikul Alam <[email protected]>
Co-authored-by: Md. Ashikul Alam <[email protected]>
Co-authored-by: euanmillar <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants