[foreman] Size limit production.log and ssl logs #3821

Open

TurboTurtle
Member

Adds a (high) default sizelimit to production.log and foreman *ssl.log collections to avoid potentially causing out of space issues on systems with large active logs.


Please place an 'X' inside each '[]' to confirm you adhere to our Contributor Guidelines

  • Is the commit message split over multiple lines and hard-wrapped at 72 characters?
  • Is the subject and message clear and concise?
  • Does the subject start with [plugin_name] if submitting a plugin patch or a [section_name] if part of the core sosreport code?
  • Does the commit contain a Signed-off-by: First Lastname [email protected]?
  • Are any related Issues or existing PRs properly referenced via a Closes (Issue) or Resolved (PR) line?
  • Are all passwords or private data gathered by this PR obfuscated?

Adds a (high) default sizelimit to `production.log` and foreman
`*ssl.log` collections to avoid potentially causing out of space issues
on systems with large active logs.

Signed-off-by: Jake Hunsaker <[email protected]>
@TurboTurtle
Member Author

@pmoravec let's chat about this before merging anywhere.

I understand this log is very verbose and big, hence the current "collect it all" posture. However, I just ran into a situation where collecting an sos report on our (thankfully, lab) Satellite caused an out-of-space issue, because our production.log was over 1.5GB and our /var/tmp partition is relatively small per policy.

So, I'd like to add at least some kind of bounding here, but I get that the default limit needs to be unusually high (for sos) to be of any use to support engineers. 500 (MB) was a bit of a from-the-hip number, so I'm happy to discuss raising or lowering it.
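
To make the effect of such a bound concrete, here is a minimal sketch of size-limited collection that keeps only the newest part of a log. It is an illustration only, not the sos implementation: `read_tail` is a hypothetical helper, and it assumes the limit is expressed in MiB and that the tail of the file is the part worth keeping.

```python
import os
import tempfile

MIB = 1024 * 1024  # assumption: the size limit is expressed in MiB


def read_tail(path, limit_mb):
    """Return at most the last `limit_mb` MiB of the file at `path`.

    Hypothetical helper for illustration; not part of sos.
    """
    limit = int(limit_mb * MIB)
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        if size > limit:
            # Skip the oldest data so only the newest bytes are read.
            f.seek(size - limit)
        return f.read()


# Small demonstration: a throwaway 3 MiB file collected with a 1 MiB limit.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"a" * (2 * MIB) + b"b" * MIB)
try:
    tail = read_tail(tmp.name, 1)
    print(len(tail) == MIB, tail[:1], tail[-1:])
finally:
    os.remove(tmp.name)
```

With a bound like this, the space needed in the temporary directory for one log is capped at the limit regardless of how large the live file has grown.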


Congratulations! One of the builds has completed. 🍾

You can install the built RPMs by following these steps:

  • sudo yum install -y dnf-plugins-core on RHEL 8
  • sudo dnf install -y dnf-plugins-core on Fedora
  • dnf copr enable packit/sosreport-sos-3821
  • And now you can install the packages.

Please note that the RPMs should be used only in a testing environment.

@pmoravec
Contributor

pmoravec commented Oct 25, 2024

Cc @pafernanr who knows more about max sizes of the logfile(s).

I have come across some customers who disabled log rotation and ended up with either of the two files ridiculously huge, so some limit makes sense. Let me do some checks internally; we should be able to reply in a week or two.

Preliminary check on some sosreports: the biggest production.log I have seen recently is 10GB (and that covers just 10 hours of activity!), and the biggest access log is 3.7GB. A 500MB limit would truncate 6 out of 67 production.log files (this truncation hurts more, since production.log is almost always rotated daily, so truncation means less than a day of logs), while the limit would truncate 9 out of 68 access logs (a smaller problem).

My stats are based on only a few tens of sosreports. But on the scale from "get all data required for investigation, whatever its size" via "get reasonable data of reasonable size" to "get some data, with a sosreport size that is suitable every time", the compromise of 500MB seems a good value to me.

Raw data of filesizes in bytes:

$ ll */sosreport*/var/log/foreman/production.log | awk '{ print $5 }' | sort -n
15545
150860
158997
404913
472546
538747
558705
584792
647789
766507
771690
854775
1026267
1042440
1163377
1463763
1950660
2527874
2880178
3210521
4429335
4511534
5395693
6356248
6415146
8228387
8684596
9203619
10422473
11170127
12415144
15182713
15889866
20294540
20696610
26447370
27708533
35004228
36072334
43214184
46666040
61781077
107605531
116861014
131967414
147576554
186647187
189280747
189313440
198413862
204541033
207606029
214804387
224681091
225788814
226618006
242879171
281252881
287829664
314523313
337337272
479318164
530864677
545563029
724712833
824840643
838362665
10150455078
$

and:

$ ll */sosreport*/var/log/httpd/foreman-ssl_access_ssl.log | awk '{ print $5 }' | sort -n 
46158
156392
275447
354468
363823
519761
679415
977624
1442534
1560073
1968304
2601579
3861028
5855124
6104682
6269225
6350928
6449916
6854737
8718445
10040019
11541886
12219313
16557429
21210852
28173341
32324777
33207305
33821491
39270789
39530597
41192814
42342074
48142030
49109106
53109460
54733469
57690891
72205005
77677806
80752170
85829658
90214183
108626730
118057318
131128659
133014592
152056328
158538389
160007922
214987101
217446120
248349933
255144006
289960635
358935641
374190153
412529895
604978281
676826270
743380659
765114177
898115047
989233134
1773910333
2414277213
3737595478
$
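
The truncation counts quoted above can be rechecked against the raw listings with a short script. This is a sketch under stated assumptions: the limit is counted in MiB, `count_truncated` is a hypothetical helper, and only the largest entries of each listing are copied in, since every omitted file is smaller than the lowest limit tried here.

```python
MIB = 1024 * 1024  # assumption: the limit is counted in MiB

# Largest entries (bytes) copied from the two listings above. All omitted
# files are smaller than 500 MiB, the lowest limit evaluated below.
PRODUCTION_LOG_TOP = [
    479318164, 530864677, 545563029, 724712833,
    824840643, 838362665, 10150455078,
]
ACCESS_LOG_TOP = [
    412529895, 604978281, 676826270, 743380659, 765114177,
    898115047, 989233134, 1773910333, 2414277213, 3737595478,
]


def count_truncated(sizes, limit_mb):
    """Count files strictly larger than `limit_mb` MiB."""
    limit = limit_mb * MIB
    return sum(1 for size in sizes if size > limit)


# Do not try limits below 500 here: the omitted smaller files would
# then matter and the counts would be too low.
for limit_mb in (500, 750, 1000):
    print(
        limit_mb,
        count_truncated(PRODUCTION_LOG_TOP, limit_mb),
        count_truncated(ACCESS_LOG_TOP, limit_mb),
    )
```

At a limit of 500 this reproduces the 6 production.log files and 9 access logs mentioned in the comment; the higher limits show how quickly the number of truncated files drops.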

@pafernanr
Contributor

Hello,

I think I will be able to provide valuable stats for production.log, but I only have access to a "small" number of foreman-ssl_access_ssl.log files. Give me some time to gather them and I will share the output with you.

Regards,

@pafernanr
Contributor

pafernanr commented Oct 25, 2024

Find below initial stats for over 200 sos reports (including those from @pmoravec and other colleagues).

access.log (100MB increments)

MB (>) count
0 170
100 18
200 10
300 8
400 5
500 3
600 2
700 4
800 1
900 2
1100 1
1600 1
1700 1
2100 1
2400 1
2600 1
3700 1
9400 1

production.log (100MB increments)

MB (>) count
0 179
100 23
200 14
300 3
400 2
500 4
700 1
800 2
1000 1
1200 1
2100 1
2600 1
9900 1

I'm still trying to get more accurate stats for production.log. Let's see what can be done.

@TurboTurtle
Member Author

@pmoravec @pafernanr - any preferences based on that data? It looks like 500 would get the majority, but I am also open to raising it a bit further.

@pafernanr
Contributor

Hello @TurboTurtle,

Setting it to 500 is a good choice IMO. The extra time/resource consumption should be acceptable for users with big Foreman infrastructures, and a file of that size should cover a more than sufficient time range to look into any "recent" issue.

@pmoravec, what do you think?

@pmoravec
Contributor

pmoravec commented Nov 4, 2024

+1 for 500, it seems like the best compromise.
