Browse Source

splunk patching plus more

Brad Poulton 5 years ago
parent
commit
400701d390
8 changed files with 191 additions and 112 deletions
  1. 24 0
      Customer Decommision Notes.md
  2. 9 1
      Fluentd Notes.md
  3. 8 0
      POP Node Notes.md
  4. 99 108
      Patching Notes.md
  5. 16 0
      Reposerver Notes.md
  6. 4 1
      Splunk MSCAS Notes.md
  7. 2 0
      Splunk Notes.md
  8. 29 2
      Terraform Splunk ASG Notes.md

+ 24 - 0
Customer Decommision Notes.md

@@ -40,3 +40,27 @@ Don't just terminate the instance, run `terraform destroy` in the appropriate fo
  7. Email Asha (Compliance/ISSO) and inform her that the servers can be removed from the FedRAMP inventory
 
 
+Remove IPs SAF: 12.42.184.208
+
+## Remove the Customer from the Code
+
+Remove references to the customer in these places:
+
+ 1. Atlantis configs ( atlantis.yaml )
+ 2. Splunk Monitoring Console ( salt/fileroots/splunk/monitoring_console/init.sls  - salt/fileroots/splunk/search_head/init.sls )
+ 3. Salt master configs ( default_acl.conf )
+ 4. Salt Splunk files (salt/fileroots/splunk/files/saf_variables.jinja)
+ 5. Salt top.sls and pillar/top.sls ( salt/fileroots/top.sls - salt/pillar/top.sls )
+ 6. Salt global_variables.sls, os_settings.sls (salt/pillar/global_variables.sls - salt/pillar/os_settings.sls )
+ 7. Salt Customer specific Pillars ( salt/pillar/saf_pop_settings.sls - salt/pillar/saf_variables.sls )
+ 8. Salt gitfs pillar ( salt/pillar/salt_master.sls )
+ 9. Terraform salt provision references ( terraform/02-msoc_vpc/cloud-init/provision_salt_master.sh )
+ 10. Terraform C&C IP whitelisting for salt master and reposerver ( terraform/02-msoc_vpc/security-groups.tf )
+ 11. Terraform customer folder ( terraform/102-saf/ )
+ 12. Terraform common variables ( terraform/common/variables.tf )
+ 
+Update salt master
+`salt salt* state.sls salt_master`
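Before closing out, a quick grep over the checkout can confirm nothing was missed. A hedged sketch; the `/tmp/repo` path and the "saf" customer name below are illustrative only:

```shell
# build a tiny fake checkout with a leftover reference
mkdir -p /tmp/repo/salt/pillar
echo "include: saf_variables" > /tmp/repo/salt/pillar/top.sls
# list any files that still mention the customer
grep -rl 'saf' /tmp/repo
```

Any file the grep prints still needs cleaning before the decommission is done.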
+
+## Report the Decommissioned Hosts to the AFCC Team
+

+ 9 - 1
Fluentd Notes.md

@@ -2,12 +2,20 @@
 
 Fluentd is part of Treasure Data. So the service name is td-agent. 
 
-systemctl status td-agent
+`systemctl status td-agent`
 
 Fluentd is installed on afs-splunk-syslog-1. Fluentd will not start unless the directories specified in the config file are created. 
 
+```
 salt -L 'afs-splunk-syslog-1' cmd.run 'ls -larth /opt/syslog-ng/'
 salt -L 'afs-splunk-syslog-1' cmd.run 'mkdir /opt/syslog-ng/zscaler_firewall/'
 salt -L 'afs-splunk-syslog-1' cmd.run 'mkdir /opt/syslog-ng/zscaler_dns/'
 salt -L 'afs-splunk-syslog-1' cmd.run 'chown td-agent:td-agent /opt/syslog-ng/zscaler_firewall/'
 salt -L 'afs-splunk-syslog-1' cmd.run 'chown td-agent:td-agent /opt/syslog-ng/zscaler_dns/'
+```
+
+Folder structure changed!
+```
+salt -L 'afs-splunk-syslog-1' cmd.run 'tail /opt/syslog-ng/zscaler/web/log/2020-05-26/zscaler_web.2020-05-26T2020_0.log'
+
+```

+ 8 - 0
POP Node Notes.md

@@ -12,3 +12,11 @@ SDC drive
 The SDC drive needs to be xfs:
 salt 'afs*syslog-[5678]*' cmd.run 'mkfs -t xfs -f /dev/vg_syslog/lv_syslog'
+
+
+## Splunk on POP Nodes
+5/27/2020
+
+
+The POP nodes have a Splunk UF on them; its outputs.conf should point to the Moose external LB. 
+
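The expected shape of that outputs.conf can be sketched; the output group name and LB hostname below are placeholders, not values taken from these notes:

```shell
# hypothetical POP UF outputs.conf pointing at the Moose external LB
cat > /tmp/outputs.conf <<'EOF'
[tcpout]
defaultGroup = moose_lb

[tcpout:moose_lb]
server = moose-external-lb.example.com:9997
EOF
# sanity check: exactly one server entry
grep -c '^server' /tmp/outputs.conf
```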

+ 99 - 108
Patching Notes.md

@@ -3,7 +3,7 @@
 ## Timeline
 * Asha likes to submit her FedRAMP packet before about the 20th, so try to get it done before that.
 * Send email ~ 1 week before.
-* Give 15 minute warning in #mdr-patching, #mdr-content, etc before patching
+* Give a 15 minute warning in these Slack channels before patching: #mdr-patching, #mdr-content, etc.
 
 # Patching Process
 
@@ -19,108 +19,46 @@ SUBJECT: December Patching
 
 It is time for monthly patching again. Patching is going to occur during business hours within the next week or two.  Everything - including Customer POPs - needs patching.  We will be doing the servers in 2 passes: patching first, then reboots.
  
-Wave 1 is hot patching of all systems.
- 
-Wave 2 will be the needed reboots; as this is where we see the customer impact.
- 
 For real-time patching announcements, join the slack channel #mdr-patching. Announcements will be posted in that channel on what is going down and when.
  
 Here is the proposed patching schedule:
 
 Wednesday Dec 11:
 * Moose and Internal infrastructure
-  * Wave 1
+  * Patching
  
 Thursday Dec 12:
 * Moose and Internal
-  * Wave 2
+  * Reboots
 * All Customer PoP
-  * Wave 1 (AM)
-  * Wave 2 (PM)
+  * Patching (AM)
+  * Reboots (PM)
 
 Monday Dec 16:
 * All Customer XDR Cloud
-  * Wave 1
+  * Patching
 * All Search heads
-  * Wave 2 (PM)
+  * Reboots (PM)
 
 Tuesday Dec 17:
 * All Remaining XDR Cloud
-  * Wave 2 (AM)
+  * Reboots (AM)
  
 The customer and user impact comes during the reboots, so they will be done in batches to reduce total downtime.
 
 ***
 
 
-```
-
-
-
-
-#restarting the indexers one at a time (one from each group). Use the CM to see if the indexer comes back up properly. 
-salt -C ' ( *moose* or *saf* ) and *indexer-1*' cmd.run 'shutdown -r now'
-#check to ensure the hot volume is mounted /opt/splunkdata/hot
-salt -C '( *moose* or *saf* ) and *indexer-1*' cmd.run 'df -h'
-
-#WAIT FOR 3 checks in the CM before restarting the next indexer. 
-
-#repeat for indexer 2
-salt -C ' ( *moose* or *saf* ) and *indexer-2*' cmd.run 'shutdown -r now'
-#check to ensure the hot volume is mounted /opt/splunkdata/hot
-salt -C ' ( *moose* or *saf* ) and *indexer-2*' cmd.run 'df -h'
-
-#WAIT FOR 3 checks in the CM before restarting the next indexer.
-
-#repeat for indexer 3
-salt -C ' ( *moose* or *saf* ) and *indexer-3*' cmd.run 'shutdown -r now'
-#check to ensure the hot volume is mounted /opt/splunkdata/hot
-salt -C ' ( *moose* or *saf* ) and *indexer-3*' cmd.run 'df -h'
-
-
-IF/WHEN and indexer doesn't come back up follow these steps:
-in AWS grab the instance id. 
-
-run the MDR/get-console.sh
-look for "Please enter passphrase for disk splunkhot"
-
-in AWS console stop instance (which will remove ephemeral splunk data) then start it. 
-Then ensure the /opt/splunkdata/hot exists.
-if it doesn't then manually run the cloudinit boot hook. 
-sh /var/lib/cloud/instance/boothooks/part-002
-
-ensure the hot folder is owned by splunk:splunk
-it will be waiting for the luks.key
-systemctl deamon-reload
-systemctl restart systemd-cryptsetup@splunkhot
-It is waiting for command prompt, when you restart the service it picks up the key from a file. Systemd sees the crypt setup service as a dependency for the splunk service. 
-
-
-
-
-
-
-#restart indexers (one at a time; wait for 3 green checkmarks in Cluster Master)
-salt -C 'nga*indexer-1*' test.ping
-salt -C 'nga*indexer-1*' cmd.run 'shutdown -r now'
-
-#Repeat for indexer-2 and indexer-3
-
-#Ensure all have been restarted. Then done with NGA
-salt -C '*nga*' cmd.run 'uptime'
-
-```
-
 ---
 
 
 
 
-# Brad's Actual Patching
+# Brad's Patching
 
 > :warning: **See if Github Has any updates!** Coordinate with Duane on Github Patching.
 
-Starting with moose and internal infra Wave 1. Check disk space for potential issues. 
+Starting with moose and internal infra patching. Check disk space for potential issues. 
 ```
 salt -C '* not ( afs* or saf* or nga* )' test.ping --out=txt
 salt -C '* not ( afs* or saf* or nga* )' cmd.run 'df -h /boot'  
@@ -132,7 +70,7 @@ salt -C '* not ( afs* or saf* or nga* )' cmd.run 'yum check-update'
 salt -C '* not ( afs* or saf* or nga* )' pkg.upgrade
 ```
 This error: `error: unpacking of archive failed on file /usr/lib/python2.7/site-packages/urllib3/packages/ssl_match_hostname: cpio: rename failed`
-pip uninstall urllib3
+salt ma-* cmd.run 'pip uninstall urllib3 -y'
 
 This error is caused by a versionlock on the package. Use this to view the list:
 `yum versionlock list`
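A minimal sketch of checking the lock list for the failing package. The sample list contents are fabricated for illustration, and the delete/update commands in the comment are the usual versionlock workflow, not commands from these notes:

```shell
# fabricated sample of `yum versionlock list` output
cat > /tmp/versionlock.list <<'EOF'
0:python-urllib3-1.10.2-7.el7.*
0:kernel-3.10.0-957.21.3.el7.*
EOF
# is the failing package locked?
grep -c urllib3 /tmp/versionlock.list
# if so, roughly: yum versionlock delete '0:python-urllib3*' && yum update python-urllib3
```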
@@ -153,8 +91,12 @@ package-cleanup --oldkernels --count=1
 
 If VPN server stops working, try a stop and start of the vpn server. The private IP will probably change. 
 
+ISSUE: salt-minion doesn't come back and has this error
+`/usr/lib/dracut/modules.d/90kernel-modules/module-setup.sh: line 16: /lib/modules/3.10.0-957.21.3.el7.x86_64///lib/modules/3.10.0-957.21.3.el7.x86_64/kernel/sound/drivers/mpu401/snd-mpu401.ko.xz: No such file or directory`
 
-# Wave 2 Internals 
+RESOLUTION: Manually reboot the OS; this is most likely due to a kernel upgrade.
+
+# Reboots Internals 
 
 Be sure to select ALL events in sensu for silencing not just the first 25. 
 Sensu -> Entities -> Sort (name) -> Select Entity and Silence. This will silence both keepalive and other checks. 
@@ -165,33 +107,43 @@ Some silenced events will still trigger. Not sure why. The keepalive still trigg
 ```
 salt -L 'vault-3.msoc.defpoint.local,sensu.msoc.defpoint.local' test.ping
 salt -L 'vault-3.msoc.defpoint.local,sensu.msoc.defpoint.local' cmd.run 'shutdown -r now'
-salt -C '* not ( moose-splunk-indexer* or afs* or saf* or nga* or vault-3* or sensu* )' test.ping --out=txt
-salt -C '* not ( moose-splunk-indexer* or afs* or saf* or nga* or vault-3* or sensu* )' cmd.run 'shutdown -r now'
+salt -C '* not ( moose-splunk-indexer* or afs* or nga* or ma-* or la-* or vault-3* or sensu* )' test.ping --out=txt
+salt -C '* not ( moose-splunk-indexer* or afs* or nga* or ma-* or la-* or vault-3* or sensu* )' cmd.run 'shutdown -r now'
 #you will lose connectivity to openvpn and salt master
 #log back in and verify they are back up
 salt -C '* not ( moose-splunk-indexer* or afs* or saf* or nga* )' cmd.run 'uptime' --out=txt
 ```
 
-# Wave 2 Moose
+# Reboots Moose
 
 ```
 salt -C 'moose-splunk-indexer*' test.ping --out=txt
 salt -C 'moose-splunk-indexer-1.msoc.defpoint.local' cmd.run 'shutdown -r now'
 #indexers take a while to restart
+watch "salt -C 'moose-splunk-indexer-1.msoc.defpoint.local' test.ping"
 ping moose-splunk-indexer-1.msoc.defpoint.local
 ```
 #WAIT FOR SPLUNK CLUSTER TO HAVE 3 CHECKMARKS
 indexer2 is not coming back up...look at screenshot in aws... see this: Probing EDD (edd=off to disable)... ok
 look at system log in AWS see this:  Please enter passphrase for disk splunkhot!:
 
+IF/WHEN an indexer doesn't come back up, follow these steps:
+in AWS grab the instance id. 
+
+run MDR/get-console.sh (Duane's script for pulling the system log)
+look for "Please enter passphrase for disk splunkhot"
+
+
 In AWS console stop instance (which will remove ephemeral splunk data) then start it. 
 Then ensure the /opt/splunkdata/hot exists.
 salt -C 'moose-splunk-indexer-1.msoc.defpoint.local' cmd.run 'df -h'
 IF the MOUNT for /opt/splunkdata/hot DOESN'T EXIST, STOP SPLUNK! Splunk will write to the wrong volume. 
 Before mounting the new volume, clear out the wrong /opt/splunkdata/:
+rm -rf /opt/splunkdata/hot/*
+
 
 salt -C 'moose-splunk-indexer-1.msoc.defpoint.local' cmd.run 'systemctl stop splunk'
-Ensure the /opt/splunkdata doesn't already exist, before the boothook. (theory that this causes the issue) 
+Ensure /opt/splunkdata doesn't already exist before running the boothook.
 ssh prod-moose-splunk-indexer-1
 if it doesn't then manually run the cloudinit boot hook. 
 sh /var/lib/cloud/instance/boothooks/part-002
@@ -201,7 +153,7 @@ ensure the hot folder is owned by splunk:splunk
 ll /opt/splunkdata/
 salt -C 'moose-splunk-indexer-1.msoc.defpoint.local' cmd.run 'ls -larth /opt/splunkdata'
 chown -R splunk: /opt/splunkdata/
-salt -C 'nga-splunk-indexer-2.msoc.defpoint.local' cmd.run 'chown -R splunk: /opt/splunkdata/'
+salt -C '' cmd.run 'chown -R splunk: /opt/splunkdata/'
 it will be waiting for the luks.key
 systemctl daemon-reload
 salt -C 'moose-splunk-indexer-1.msoc.defpoint.local' cmd.run 'systemctl daemon-reload'
@@ -222,10 +174,10 @@ check the servers again to ensure all of them have rebooted.
 salt -C 'moose-splunk-indexer*' cmd.run 'uptime' --out=txt | sort
 
 Ensure all Moose and Internal have been rebooted
-salt -C '* not ( afs* or saf* or nga* )' cmd.run 'uptime' --out=txt | sort
+salt -C '* not ( afs* or nga* or la-* or ma-* )' cmd.run 'uptime' --out=txt | sort
 
 
-# Wave 1 POPs
+# Patching POPs
 
 ```
 salt -C '* not *.local' test.ping --out=txt
@@ -240,7 +192,19 @@ salt -C '* not *.local' pkg.upgrade disablerepo=msoc-repo
 salt -C '* not *.local' pkg.upgrade
 ```
 
-# Wave 2 POPs
+Error on afs-splunk-ds-3: `error: cannot open Packages database in /var/lib/rpm`
+Solution:
+
+```
+mkdir /root/backups.rpm/
+cp -avr /var/lib/rpm/ /root/backups.rpm/
+rm -f /var/lib/rpm/__db*
+db_verify /var/lib/rpm/Packages
+rpm --rebuilddb
+yum clean all
+```
+
+# Reboots POPs
 
 DO NOT restart all POPs at the same time
 
@@ -252,12 +216,14 @@ salt -C '*syslog-1* not *.local' cmd.run 'ps -ef | grep syslog-ng | grep -v grep
 #look for /usr/sbin/syslog-ng -F -p /var/run/syslogd.pid
 ```
 
-SAF will need the setenforce run
if syslog-ng doesn't start, it might need the `setenforce 0` command run (left here for legacy reasons)
 ```
 salt saf-splunk-syslog-1 cmd.run 'setenforce 0'
 salt saf-splunk-syslog-1 cmd.run 'systemctl stop rsyslog'
 salt saf-splunk-syslog-1 cmd.run 'systemctl start syslog-ng'
 
+watch "salt -C '*syslog-1* not *.local' test.ping"
+
 salt -C '*syslog-2* not *.local' cmd.run 'uptime'
 salt -C '*syslog-2* not *.local' cmd.run 'shutdown -r now'
 salt -C '*syslog-2* not *.local' cmd.run 'ps -ef | grep syslog-ng | grep -v grep'
@@ -274,6 +240,7 @@ repeat for syslog-5, syslog-6, syslog-7, and syslog-8
 (might be able to reboot some of these at the same time if they are in different locations; check the location grain on them.)
 grains.item location
 
+```
 afs-splunk-syslog-8: {u'location': u'az-east-us-2'}
 afs-splunk-syslog-6: {u'location': u'az-central-us'}
 
@@ -285,15 +252,14 @@ salt -C 'afs-splunk-syslog*  grains.item location
 
 salt -L 'afs-splunk-syslog-6, afs-splunk-syslog-8' cmd.run 'uptime'
 salt -L 'afs-splunk-syslog-6, afs-splunk-syslog-8' cmd.run 'shutdown -r now'
+watch "salt -L 'afs-splunk-syslog-6, afs-splunk-syslog-8' test.ping"
 
 salt -L 'afs-splunk-syslog-5, afs-splunk-syslog-7' cmd.run 'uptime'
 salt -L 'afs-splunk-syslog-5, afs-splunk-syslog-7' cmd.run 'shutdown -r now'
+watch "salt -L 'afs-splunk-syslog-5, afs-splunk-syslog-7' test.ping"
+```
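The batching-by-location idea can be sketched locally. The host-to-location pairs below mirror the sample grain output, and the awk one-liner just picks the first host seen per location, i.e. a set that is safe to reboot together:

```shell
# host -> location pairs (shape of the grains.item location output)
cat > /tmp/locations.txt <<'EOF'
afs-splunk-syslog-5 az-east-us-2
afs-splunk-syslog-6 az-central-us
afs-splunk-syslog-7 az-central-us
afs-splunk-syslog-8 az-east-us-2
EOF
# one host per location = first reboot batch
awk '!seen[$2]++ {print $1}' /tmp/locations.txt
```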
 
 #verify logs are flowing
-https://saf-splunk-sh.msoc.defpoint.local:8000/en-US/app/search/search
-ddps03.corp.smartandfinal.com
-index=* source=/opt/syslog-ng/* host=ddps* earliest=-15m | stats  count by host
-
 https://afs-splunk-sh.msoc.defpoint.local:8000/en-US/app/search/search
 afssplhf103.us.accenturefederal.com
 index=* source=/opt/syslog-ng/* host=afs* earliest=-15m | stats  count by host
@@ -304,6 +270,8 @@ index=network sourcetype="citrix:netscaler:syslog" earliest=-15m
 index=* source=/opt/syslog-ng/* host=aws* earliest=-60m | stats count by host
 
 POP ds (could these be restarted at the same time? Or in 2 batches?)
+
+```
 salt -C '*splunk-ds-1* not *.local' cmd.run 'uptime'
 salt -C '*splunk-ds-1* not *.local' cmd.run 'shutdown -r now'
 
@@ -312,55 +280,78 @@ salt -C '*splunk-ds-2* not *.local' cmd.run 'shutdown -r now'
 
 salt afs-splunk-ds-[2,3,4] cmd.run 'uptime'
 salt afs-splunk-ds-[2,3,4] cmd.run 'shutdown -r now'
+```
 
 Don't forget ds-3 and ds-4
-
+```
+#try reboot at the same time
+salt '*splunk*ds*' cmd.run 'uptime'
+salt '*splunk*ds*' cmd.run 'shutdown -r now'
+watch "salt '*splunk*ds*' test.ping"
 salt '*splunk-ds*' cmd.run 'systemctl status splunk'
+```
 
 
-POP dcn
-salt -C '*splunk-dcn-1* not *.local' cmd.run 'uptime'
-salt -C '*splunk-dcn-1* not *.local' cmd.run 'shutdown -r now'
-
 Did you get all of them?
-salt -C ' * not *local ' cmd.run 'uptime'
 
-###
-# Customer Slices Wave 1
+```
+salt -C ' * not *local ' cmd.run 'uptime' --out=txt | sort
+```
+
 
-salt -C 'afs*local or saf*local or nga*local' test.ping --out=txt
-salt -C 'afs*local or saf*local or nga*local' cmd.run 'uptime'
-salt -C 'afs*local or saf*local or nga*local' cmd.run 'df -h'
-salt -C 'afs*local or saf*local or nga*local' pkg.upgrade
+# Customer Slices Wave 1
 
+```
+salt -C 'afs*local or ma-*local or la-*local or nga*local' test.ping --out=txt
+salt -C 'afs*local or ma-*local or la-*local or nga*local' cmd.run 'uptime'
+salt -C 'afs*local or ma-*local or la-*local or nga*local' cmd.run 'df -h'
+salt -C 'afs*local or ma-*local or la-*local or nga*local' pkg.upgrade
+```
 The epel repo is enabled on afs-splunk-hf (I don't know why).
 Had to run this to avoid an issue with the collectd package on msoc-repo:
 
-yum update --disablerepo epel
+`yum update --disablerepo epel`
+
+
+
+# Customer Slices Search Heads Only Wave 2
 
 Silence Sensu first!
-Customer Slices Search Heads Only Wave 2
-salt -C 'afs-splunk-sh*local or saf-splunk-sh*local or nga-splunk-sh*local' test.ping --out=txt
-salt -C 'afs-splunk-sh*local or saf-splunk-sh*local or nga-splunk-sh*local' cmd.run 'df -h'
-salt -C 'afs-splunk-sh*local or saf-splunk-sh*local or nga-splunk-sh*local' cmd.run 'shutdown -r now'
-salt -C 'afs-splunk-sh*local or saf-splunk-sh*local or nga-splunk-sh*local' cmd.run 'uptime'
 
-###
+```
+salt -C 'afs-splunk-sh*local or ma-*-splunk-sh*local or la-*-splunk-sh*local or nga-splunk-sh*local' test.ping --out=txt
+salt -C 'afs-splunk-sh*local or ma-*-splunk-sh*local or la-*-splunk-sh*local or nga-splunk-sh*local' cmd.run 'df -h'
+salt -C 'afs-splunk-sh*local or ma-*-splunk-sh*local or la-*-splunk-sh*local or nga-splunk-sh*local' cmd.run 'shutdown -r now'
+salt -C 'afs-splunk-sh*local or ma-*-splunk-sh*local or la-*-splunk-sh*local or nga-splunk-sh*local' cmd.run 'uptime'
+```
+
+
 # Customer Slices CMs Wave 2
 
 Silence Sensu first!
+
+```
 salt -C '( *splunk-cm* or *splunk-hf* ) not moose*' test.ping --out=txt
 salt -C '( *splunk-cm* or *splunk-hf* ) not moose*' cmd.run 'df -h'
 salt -C '( *splunk-cm* or *splunk-hf* ) not moose*' cmd.run 'shutdown -r now'
+watch "salt -C '( *splunk-cm* or *splunk-hf* ) not moose*' test.ping --out=txt"
 salt -C '( *splunk-cm* or *splunk-hf* ) not moose*' cmd.run 'systemctl status splunk'
 salt -C '( *splunk-cm* or *splunk-hf* ) not moose*' cmd.run 'uptime'
+```
 
+
+May 27 17:08:57 la-c19-splunk-cm.msoc.defpoint.local splunk[3840]: /etc/rc.d/init.d/splunk: line 13: ulimit: open files: cannot modify limit: Invalid argument
 afs-splunk-hf has a hard time restarting. Might need to stop then start the instance. 
 
-reboot indexers 1 at a time (AFS cluster gets backed up when an indexer is rebooted)
+Reboot indexers 1 at a time (the AFS cluster gets backed up when an indexer is rebooted). How do we replicate this with ASGs? 
+
+```
+salt -C '*splunk-indexer-1* and G@ec2:placement:availability_zone:us-east-1a not moose*' test.ping
 salt -C '*splunk-indexer-1* not moose*' test.ping --out=txt
 salt -C '*splunk-indexer-1* not moose*' cmd.run 'df -h'
 salt -C '*splunk-indexer-1* not moose*' cmd.run 'shutdown -r now'
+```
 
 Wait for 3 green check marks
 #repeat for indexers 2 & 3
@@ -373,4 +364,4 @@ salt -L 'afs-splunk-indexer-3.msoc.defpoint.local,saf-splunk-indexer-3.msoc.defp
 NGA had a hard time getting 3 checkmarks. The CM was waiting on stuck buckets. Force-rolled the buckets to get green checkmarks.
 
 salt -C '* not *.local' cmd.run 'uptime | grep days'
-***MAKE SURE the Sensu checks are not silenced. ***
+> :warning: ***MAKE SURE the Sensu checks are not silenced. ***

+ 16 - 0
Reposerver Notes.md

@@ -32,3 +32,19 @@ yum install savinstpkg
 Defined in salt/fileroots/splunk/new_install.sls
 /etc/yum.repos.d/splunk.repo
 http://reposerver.msoc.defpoint.local/splunk
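A hedged guess at the shape of that splunk.repo file; every value below is an assumption, not copied from the salt state:

```
[splunk]
name=Splunk packages
baseurl=http://reposerver.msoc.defpoint.local/splunk
enabled=1
gpgcheck=0
```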
+
+New Splunk Version
+A Splunk 7.2 repo needs to be created for PROD moose:
+
+```
+cd /var/www/html/splunk
+mkdir 7.2
+chown -R apache: .
+cd 7.2
+createrepo `pwd`
+wget -O splunk-7.2.5.1-962d9a8e1586-linux-2.6-x86_64.rpm 'https://www.splunk.com/page/download_track?file=7.2.5.1/linux/splunk-7.2.5.1-962d9a8e1586-linux-2.6-x86_64.rpm&ac=&wget=true&name=wget&platform=Linux&architecture=x86_64&version=7.2.5.1&product=splunk&typed=release'
+chown -R apache: .
+cd /var/www/html/splunk/7.2
+createrepo `pwd`
+restorecon -R /var/www/html/splunk
+```

+ 4 - 1
Splunk MSCAS Notes.md

@@ -1,5 +1,8 @@
-For smart and final customer
+# Splunk MSCAS Notes.md
 
+
+References:
+https://github.mdr.defpoint.com/MDR-Content/mdr-content/wiki/CS0009:Search:MSOC---MS-CAS---Alert
 https://jira.mdr.defpoint.com/browse/MSOCI-890
 https://docs.microsoft.com/en-us/cloud-app-security/siem
 https://splunkbase.splunk.com/app/3110/

+ 2 - 0
Splunk Notes.md

@@ -37,6 +37,8 @@ TEST SPLUNK CM admin password
 admin
 6VB^8V3CFjbaiZ4Q#hLjNW3a1
 
+TEST SPLUNK indexer-1 admin password
+6VB8V3CFjbaiZ4QhLjNW3a1
 
     
     

+ 29 - 2
Terraform Splunk ASG Notes.md

@@ -218,9 +218,24 @@ in outputs.conf for splunk nodes
 1. stand up new templates and ASGs
 2. launch new ASG instances
 3. Ensure three green checkmarks in CM
-4. change salt master to new outputs
-5. 
+4. change salt master to new outputs (make sure it is working)
+4.1 ensure they can connect first 
+salt '*.local' network.connect moose-splunk-cm.msoc.defpoint.local 8089
+5. Update all minions to IDXC discovery
+6. silence sensu 
+7. manually take the non-ASG indexers offline but don't stop instances
+7.1 /opt/splunk/bin/splunk offline --enforce-counts
+7.1.1 no users? create etc/system/local/user-seed.conf then restart ( https://answers.splunk.com/answers/834/how-to-reset-the-admin-password.html ) 
+7.2 disable the service to prevent it starting back up
+8. use tf to destroy the instances, then remove the code from TF.
 
+salt-run survey.diff '*.local' cp.get_file_str file:////opt/splunkforwarder/etc/apps/moose_outputs/default/outputs.conf
+salt sensu* cmd.run 'tail -50 /opt/splunkforwarder/var/log/splunk/splunkd.log'
+
+salt phantom* cmd.run 'tail -200 /opt/splunkforwarder/var/log/splunk/splunkd.log | grep TcpOutputProc'
+
+Check in Splunk for indexerdiscovery errors
+index=_internal sourcetype="splunkd" source="/opt/splunkforwarder/var/log/splunk/splunkd.log" component=IndexerDiscoveryHeartbeatThread
 
 terraform apply -target=module.moose_cluster.module.indexer_cluster.module.indexer2.aws_launch_template.splunk_indexer -target=module.moose_cluster.module.indexer_cluster.module.indexer2.aws_autoscaling_group.splunk_indexer_asg
 
@@ -229,3 +244,15 @@ moose-splunk-indexer-i-048bf97164bd7401c.msoc.defpoint.local
 moose-splunk-indexer-i-063613603d0afc287.msoc.defpoint.local
 moose-splunk-indexer-i-0e1cde750a6408302.msoc.defpoint.local
 
+
+PROD
+
+http://reposerver.msoc.defpoint.local/splunk/7.2/repodata/repomd.xml: [Errno 14] HTTP Error 404 - Not Found
+After salt highstate, splunkuf is running, but splunk is not.
+salt moose-splunk-indexer-i* cmd.run 'systemctl stop splunkuf'
+Error: "/opt/splunk/etc/slave-apps/TA-Frozen-S3/bin/coldToFrozenS3.py": no such script could be found on the filesystem, even after multiple attempts. Exiting.
+^^^ this is a bug in splunk!
+
+Added python3 to ALL servers including indexers and moved coldToFrozenS3.py to /usr/local/bin to fix the issue.
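The relocated script can then be referenced from indexes.conf. A hedged sketch of the stanza, using Splunk's coldToFrozenScript setting; the python3 path and stanza placement are assumptions:

```
[default]
coldToFrozenScript = "/usr/bin/python3" "/usr/local/bin/coldToFrozenS3.py"
```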
+
+