-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Micellaneous fixes to the release #897
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
for file in /opt/opencrvs/infrastructure/monitoring/elastalert/rules/*.yaml; do | ||
sed -i -e "s%{{HOST}}%$1%" $file | ||
sed -i -e "s%{{SMTP_HOST}}%$SMTP_HOST%" $file | ||
sed -i -e "s%{{SMTP_PORT}}%$SMTP_PORT%" $file | ||
sed -i -e "s%{{ALERT_EMAIL}}%$ALERT_EMAIL%" $file | ||
sed -i -e "s%{{SENDER_EMAIL_ADDRESS}}%$SENDER_EMAIL_ADDRESS%" $file | ||
sed -i -e "s%{{DOMAIN}}%$DOMAIN%" $file | ||
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are replaced by the country config server here and there shouldn't be a need to pass these in anymore.
HOST
, SMTP_HOST
and SMTP_PORT
are not referred to anymore in the Elastalert *.yaml files so adding these won't change the behaviour. The only thing Elastalert needs to know is to which URL the email HTTP request needs to be sent and the country config server takes care of using the right SMTP details and other configuration.
My guess is a deployment was made using an 1.3.x version of the server and 1.4.0 version of the infrastructure? @euanmillar
@@ -26,6 +26,19 @@ done | |||
KIBANA_ENCRYPTION_KEY=`uuidgen` | |||
sed -i "s/{{KIBANA_ENCRYPTION_KEY}}/$KIBANA_ENCRYPTION_KEY/g" /opt/opencrvs/infrastructure/monitoring/kibana/kibana.yml | |||
|
|||
# Move metabase file | |||
mv /opt/opencrvs/infrastructure/metabase.init.db.sql /data/metabase/metabase.init.db.sql |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The metabase init SQL file is not read from the /data directory. This is because it doesn't ever change at runtime. There's also a security implication if we copy the file under /data
: done this way, the deployment user needs write and read access to the /data directory which I'd really want to avoid.
@@ -366,6 +372,7 @@ echo | |||
echo "Waiting 2 mins for mongo to deploy before working with data. Please note it can take up to 10 minutes for the entire stack to deploy in some scenarios." | |||
echo | |||
|
|||
sleep 120 # Required as Kibana cannot be immediately contacted | |||
echo "Setting up Kibana config & alerts" | |||
|
|||
while true; do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This while loop should run until the curl commands succeeds. What exactly was the issue here? I originally removed the 120s sleep as in some cases it wasn't enough to guarantee ES + Kibana was up and didn't account for random failures.
I added logic to the script now to first verify Kibana is up and ready to receive requests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue was that on a brand new server alert rules were not added into Kibana because Kibana wasnt ready to receive them.
HOST=$HOST | ||
SMTP_HOST=$SMTP_HOST | ||
SMTP_PORT=$SMTP_PORT | ||
ALERT_EMAIL=$ALERT_EMAIL | ||
SENDER_EMAIL_ADDRESS=$SENDER_EMAIL_ADDRESS | ||
DOMAIN=$DOMAIN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are passed down automatically by configured_ssh
as part of it passing down all Github secrets
@@ -405,6 +405,8 @@ services: | |||
- '/opt/opencrvs/infrastructure/monitoring/elastalert/rules:/opt/elastalert/rules' | |||
networks: | |||
- overlay_net | |||
depends_on: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The depends_on option is ignored when deploying a stack in swarm mode with a version 3 Compose file.
https://docs.docker.com/compose/compose-file/compose-file-v3/#depends_on
I cleaned these up from the compose files completely now
es_username: '{{ES_USERNAME}}' | ||
es_password: '{{ES_PASSWORD}}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are probably needed for Elastalert to be able to connect to ES?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, these can indeed be removed as environment variables passed down to Elastalert overwrite and have been overwriting these for a long time. Assuming env variables are defined, these shouldn't affect anything but good to clean up
@@ -87,16 +87,6 @@ | |||
vars: | |||
manager_hostname: "{{ groups['docker-manager-first'][0] }}" | |||
tasks: | |||
- name: Ensure backup user is present |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Ansible version on Github contains ansible/ansible#75352 bug which causes the created backup user to have bad permissions and the backup script gets a permission denied (public key) error when SSH'ing in during the cronjob. Instead we will instruct the implementers to manually create a backup user in documentation.
It's better to keep the setup as automated as possible so we have maximum control and predictability over the environment. Let's instead add another step that sets the permissions correctly to mitigate the issue.
Yea, this is fine 👍 |
comments about why certain changes are potentially needed can be found here #897
comments about why certain changes are potentially needed can be found here #897
comments about why certain changes are potentially needed can be found here #897
comments about why certain changes are potentially needed can be found here #897
Went through these issues with a completely new 1.4.1 + 1.4.1 environment: Alert emails
I saw the following errors in country config logs
In the end, there was two different issues here:
Other
This one I couldn't reproduce 🤔
|
comments about why certain changes are potentially needed can be found here #897
comments about why certain changes are potentially needed can be found here #897
comments about why certain changes are potentially needed can be found here #897
comments about why certain changes are potentially needed can be found here #897
- Improves email logging - Removes `depends_on` configuration from docker compose files - Removes some deprecated deployment code around Elastalert config file formatting - Creates backup user on backup servers automatically - Verifys Kibana is ready before setting up alert configuration comments about why certain changes are potentially needed can be found here #897
* Change the verbiage from sms to email in Send for approval section and in introduction page (#868) * update the translations to use email instead of SMS * update changelog * update password copy to suggest a 12 character long passphrase (#867) * chore: update translations for password change * make the wording same as in the number validation * Allow pip to update externally-managed packages * add SSH_USER to deployment pipeline * add provision user key to legacy authorized_keys file * start release v1.4.1 * fix release link reference * Farajaland certificates updated (#891) * Farajaland certificates updated * Remove outdated README * Make occupation field usable for deceased (#888) * Make occupation field usable for deceased * Update CHANGELOG * Farajaland form updates (#805) * Add custom field in death form deceased section: Number of dependants * Replace national id input with type of id select and id number input * Uncomment spouse section in death form * Feedback amends * Make id field custom * Informant section amends * Fix typo * Remove unused import * Update death informant id fields conditionals to support nid integration * Reason for delayed registration custom form field (#894) * Custom field reason for late registration * Add field for death * Update metabase init db * Update metabase auth db * Revert "Update metabase auth db" This reverts commit 9fd5898. * Fix preceding field id for the marriage informant address (#906) * add age of deceased handlebar & marriage international address fix (#895) * add handlebar of age of deceased in death certificate * update place of marriage info in certificate for international location * update for ageOfDeceased handlebars in death form * ocrvs-6517 Hide fields when informantType not selected (#899) * ocrvs-6204 fix date of birth unknown checkbox conditional & update CHANGELOG (#898) * Consider detailsExist conditional when age unknown * Update CHANGELOG * Add search id placeholder copy * Fix death certificate birthdate field * Update handlebars (#905) * Update ID number handlebars * Remove duplicate location handlebars * Keep the placeOfBirth handlebar * bump release number * Fix typo in certificate handlebar names * remove authorized keys file * Clean up deprecated deployment code - Improves email logging - Removes `depends_on` configuration from docker compose files - Removes some deprecated deployment code around Elastalert config file formatting - Creates backup user on backup servers automatically - Verifys Kibana is ready before setting up alert configuration comments about why certain changes are potentially needed can be found here opencrvs/opencrvs-farajaland#897 * Fix regex of dob fields * update metabase init file so it's compatible with current metabase version * fix elastalert matching rule for ssh logins * Add a default fallback for variable enable_backups in development provision * Set default value false for enable_backups in development.yml * Apply default value in enable_backups implementation * Update tag name in action input * fix Elastalert2 matching rule for SSH logins * Sort staged translation files (#795) * Add a precommit hook for sorting translation files * Sort the translations files * Skip the description files Co-authored-by: Tameem Bin Haider <[email protected]> * Sort client, login, notification message files * Update messages with according to recent requirements * Update descriptions * send an email to verify successful deployment * polish the deployment email a bit * move copy files to csv files * remove validate translations file now that from the CSV it's easier to see what is missing and what is not * add back disappeared content handler * add unit test CI pipeline back * update sort-translations script * use unquoted format for CSVs so it matches the google sheets export format * import and export csv files in google sheets once so doing it later doesnt cause extra diff * add new copy entries to csv files, remove AVAILABLE_LANGUAGES_SELECT * update changelog * fix failing test * do not send displayName field anymore * add missing translations * revert package.json changes * remove jest types --------- Co-authored-by: Tameem Bin Haider <[email protected]> Co-authored-by: Riku Rouvila <[email protected]> Co-authored-by: Euan Millar <[email protected]> Co-authored-by: Tahmid Rahman <[email protected]> Co-authored-by: tahmidrahman-dsi <[email protected]> Co-authored-by: Md. Ashikul Alam <[email protected]> Co-authored-by: Md. Ashikul Alam <[email protected]> Co-authored-by: euanmillar <[email protected]>
@rikukissa I had a number of issues when I was testing setting up 4 new servers. We also experienced space filling up on servers with old kernel files perhaps related to ubuntu periodic updates. I will be re-tagging the 1.4.0 release with these fixes before the IPP training program commences.