@@ -32,7 +32,7 @@ Wednesday Dec 11:
Thursday Dec 12:
* Moose and Internal
* Reboots
-* All Customer PoP
+* All Customer PoP/LCP
* Patching (AM)
* Reboots (PM)
@@ -53,6 +53,9 @@ The customer and user impact will be during the reboots so they will be done in
## Detailed Steps (brad's patching)

### Day 1 (Wednesday), step 1 of 1: Moose and Internal infrastructure - Patching
+
+Patch TEST first! This helps find problems in TEST and potential problems in PROD.
+
Post to slack:
```
FYI, patching today.
@@ -73,11 +76,39 @@ salt -C '* not ( afs* or saf* or nga* or ma-* or mo-* or dc-c19* or la-c19* )' c
salt -C '* not ( afs* or saf* or nga* or ma-* or mo-* or dc-c19* or la-c19* )' cmd.run 'df -h | egrep "[890][0-9]\%"'
#Review packages that will be updated. Some packages are versionlocked (Collectd, Splunk, etc.).
salt -C '* not ( afs* or saf* or nga* or ma-* or mo-* or dc-c19* or la-c19* )' cmd.run 'yum check-update'
-salt -C '* not ( afs* or saf* or nga* or ma-* or mo-* or dc-c19* or la-c19* )' pkg.upgrade
+#OpenVPN sometimes goes down with patching and needs a restart of the service. Let's patch the VPN after everything else. I am not sure which package is causing the issue. Kernel? bind-utils?
+# Also, the phantom_repo pkg wants to upgrade, but we are not ready. Let's exclude that package to prevent errors.
+salt -C '* not ( afs* or saf* or nga* or ma-* or mo-* or dc-c19* or la-c19* or openvpn* )' pkg.upgrade exclude='phantom_repo'
+salt -C 'openvpn*' pkg.upgrade
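+#(Sketch) If the VPN does drop after its upgrade, find the service and bounce it. The exact unit name
+#varies by install (e.g. openvpn@<conf>.service or openvpnas.service), so check before restarting:
+salt 'openvpn*' cmd.run 'systemctl list-units --type=service | grep -i openvpn'
+salt 'openvpn*' cmd.run 'systemctl restart <openvpn-unit>'   #replace <openvpn-unit> with the unit found above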
+#Just to be sure, run it again to make sure nothing got missed.
+salt -C '* not ( afs* or saf* or nga* or ma-* or mo-* or dc-c19* or la-c19* )' pkg.upgrade exclude='phantom_repo'
```

> :warning: After upgrades, check the Portal to make sure it is still up.
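+
+A quick way to spot-check it (sketch only - the URL below is a placeholder, use the real Portal address):
+```
+#Hypothetical example: replace portal.example.com with the actual Portal URL. A 200/302 means it is serving pages.
+curl -sk -o /dev/null -w '%{http_code}\n' https://portal.example.com/
+```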

+Phantom error
+```
+phantom.msoc.defpoint.local:
+    ERROR: Problem encountered upgrading packages. Additional info follows:
+
+    changes:
+        ----------
+    result:
+        ----------
+        pid:
+            40718
+        retcode:
+            1
+        stderr:
+            Running scope as unit run-40718.scope.
+            Error in PREIN scriptlet in rpm package phantom_repo-4.9.39220-1.x86_64
+            phantom_repo-4.9.37880-1.x86_64 was supposed to be removed but is not!
+        stdout:
+            Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
+            Logging to /var/log/phantom/phantom_install_log
+            error: %pre(phantom_repo-4.9.39220-1.x86_64) scriptlet failed, exit status 7
+```
+
#### Error: `error: unpacking of archive failed on file /usr/lib/python2.7/site-packages/urllib3/packages/ssl_match_hostname: cpio: rename failed`

`salt ma-* cmd.run 'pip uninstall urllib3 -y'`
@@ -113,32 +144,43 @@ RESOLUTION: Manually reboot the OS, this is most likely due to a kernel upgrade.

### Day 2 (Thursday), step 1 of 4: Reboot Internals

+Don't forget to reboot TEST.
+
Post to slack:
```
FYI, patching today.
* In about 15 minutes: Reboots of moose and internal systems, including the VPN.
-* Following that, patching (but not rebooting) of all customer PoPs.
-* Then this afternoon, reboots of those those PoPs.
+* Following that, patching (but not rebooting) of all customer PoPs/LCPs.
+* Then this afternoon, reboots of those PoPs/LCPs.
```

Be sure to select ALL events in Sensu for silencing, not just the first 25.
Sensu -> Entities -> Sort (name) -> Select Entity and Silence. This will silence both keepalive and other checks.
Some silenced events will not unsilence and will need to be manually unsilenced.
-*IDEA! restart the sensu server and the vault-3 server first. this helps with the clearing of the silenced entities.*
+*IDEA! Restart the sensu server and the vault-3 server first. This helps with clearing the silenced entities.*
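+
+If clicking through the UI for every entity gets tedious, the same silencing can be done from the CLI (sketch only - assumes sensuctl is configured against our backend; verify the flags against our Sensu Go version):
+```
+#Silence every check (including keepalive) for one entity for 2 hours:
+sensuctl silenced create --subscription entity:vault-3.msoc.defpoint.local --expire 7200 --reason "patching reboots"
+#List what is silenced, and delete any entries that did not clear on their own:
+sensuctl silenced list
+sensuctl silenced delete 'entity:vault-3.msoc.defpoint.local:*'
+```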

```
salt -L 'vault-3.msoc.defpoint.local,sensu.msoc.defpoint.local' test.ping
date; salt -L 'vault-3.msoc.defpoint.local,sensu.msoc.defpoint.local' system.reboot
watch "salt -L 'vault-3.msoc.defpoint.local,sensu.msoc.defpoint.local' test.ping"
-salt -C '* not ( moose-splunk-indexer* or afs* or nga* or ma-* or mo-* or la-* or dc-* or vault-3* or sensu* )' test.ping --out=txt
-date; salt -C '* not ( moose-splunk-indexer* or afs* or nga* or ma-* or mo-* or la-* or dc-* or vault-3* or sensu* )' system.reboot
+salt -C '* not ( moose-splunk-indexer* or afs* or nga* or ma-* or mo-* or la-* or dc-* or vault-3* or sensu* or interconnect* or resolver* )' test.ping --out=txt
+date; salt -C '* not ( moose-splunk-indexer* or afs* or nga* or ma-* or mo-* or la-* or dc-* or vault-3* or sensu* or interconnect* or resolver* )' system.reboot
#you will lose connectivity to openvpn and salt master
#log back in and verify they are back up
-watch "salt -C '* not ( moose-splunk-indexer* or afs* or nga* or ma-* or mo-* or la-* or dc-* or vault-3* or sensu* )' cmd.run 'uptime' --out=txt"
+watch "salt -C '* not ( moose-splunk-indexer* or afs* or nga* or ma-* or mo-* or la-* or dc-* or vault-3* or sensu* or interconnect* or resolver* )' cmd.run 'uptime' --out=txt"
+#Take care of the interconnects/resolvers one at a time.
+salt 'interconnect-0.pvt.xdr.accenturefederalcyber.com' test.ping
+salt 'interconnect-0.pvt.xdr.accenturefederalcyber.com' system.reboot
+salt 'interconnect-1.pvt.xdr.accenturefederalcyber.com' test.ping
+salt 'interconnect-1.pvt.xdr.accenturefederalcyber.com' system.reboot
+salt 'resolver-commercial.pvt.xdr.accenturefederalcyber.com' test.ping
+salt 'resolver-commercial.pvt.xdr.accenturefederalcyber.com' system.reboot
+salt 'resolver-govcloud.pvt.xdr.accenturefederalcyber.com' test.ping
+salt 'resolver-govcloud.pvt.xdr.accenturefederalcyber.com' system.reboot
```

I (Duane) did this a little differently. Salt-master first, then openvpn, then everything but
-interconnects and resolvers.
+interconnects and resolvers. Interconnects and resolvers reboot one at a time.

```
salt -C '* not ( afs* or saf* or nga* or ma-* or mo-* or dc-c19* or la-c19* or openvpn* or qcomp* or salt-master* or moose-splunk-indexer-* or interconnect* or resolver* )' cmd.run 'shutdown -r now'
@@ -168,7 +210,7 @@ Repeat the above patching steps for the additional indexers, waiting for 3 green
# Do the second indexer
salt -C 'moose-splunk-indexer-i-0b11e585de680b383.msoc.defpoint.local' test.ping --out=txt
date; salt -C 'moose-splunk-indexer-i-0b11e585de680b383.msoc.defpoint.local' system.reboot
-#indexers take a while date; salt -C 'moose-splunk-indexer-i-00ca1da87a2abcd56.msoc.defpoint.local' system.rebootto restart
+#indexers take a while to restart
watch "salt -C 'moose-splunk-indexer-i-0b11e585de680b383.msoc.defpoint.local' cmd.run 'uptime' --out=txt"

# Do the third indexer
@@ -380,13 +422,13 @@ afs-splunk-syslog-4: {u'location': u'San Antonio'}

salt -C 'afs-splunk-syslog*' grains.item location

-salt -L 'afs-splunk-syslog-6, afs-splunk-syslog-8' cmd.run 'uptime'
-date; salt -L 'afs-splunk-syslog-6, afs-splunk-syslog-8' system.reboot
-watch "salt -L 'afs-splunk-syslog-6, afs-splunk-syslog-8' test.ping"
+salt -L 'afs-splunk-syslog-3, afs-splunk-syslog-5, afs-splunk-syslog-7' cmd.run 'uptime'
+date; salt -L 'afs-splunk-syslog-3, afs-splunk-syslog-5, afs-splunk-syslog-7' system.reboot
+watch "salt -L 'afs-splunk-syslog-3, afs-splunk-syslog-5, afs-splunk-syslog-7' test.ping"

-salt -L 'afs-splunk-syslog-5, afs-splunk-syslog-7' cmd.run 'uptime'
-date; salt -L 'afs-splunk-syslog-5, afs-splunk-syslog-7' system.reboot
-watch "salt -L 'afs-splunk-syslog-5, afs-splunk-syslog-7' test.ping"
+salt -L 'afs-splunk-syslog-4, afs-splunk-syslog-6, afs-splunk-syslog-8' cmd.run 'uptime'
+date; salt -L 'afs-splunk-syslog-4, afs-splunk-syslog-6, afs-splunk-syslog-8' system.reboot
+watch "salt -L 'afs-splunk-syslog-4, afs-splunk-syslog-6, afs-splunk-syslog-8' test.ping"
```

#### Verify logs are flowing
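+
+One quick spot check from the Splunk search head (sketch only - narrow index=* down to the syslog indexes we actually use):
+```
+| tstats max(_time) as last_event where index=* by host
+| eval minutes_ago=round((now()-last_event)/60,1)
+| sort - minutes_ago
+```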