Browse Source

splunk patching plus more

Brad Poulton 5 years ago
parent
commit
400701d390
8 changed files with 191 additions and 112 deletions
  1. 24 0
      Customer Decommision Notes.md
  2. 9 1
      Fluentd Notes.md
  3. 8 0
      POP Node Notes.md
  4. 99 108
      Patching Notes.md
  5. 16 0
      Reposerver Notes.md
  6. 4 1
      Splunk MSCAS Notes.md
  7. 2 0
      Splunk Notes.md
  8. 29 2
      Terraform Splunk ASG Notes.md

+ 24 - 0
Customer Decommision Notes.md

@@ -40,3 +40,27 @@ Don't just terminate the instance, run `terraform destroy` in the appropriate fo
  7. Email Asha (Compliance/ISSO) and inform her that the servers can be removed from the FedRAMP inventory
 
 
+Remove IPs SAF: 12.42.184.208
+
+## Remove the Customer from the Code
+
+Remove references to the customer in these places:
+
+ 1. Atlantis configs ( atlantis.yaml )
+ 2. Splunk Monitoring Console ( salt/fileroots/splunk/monitoring_console/init.sls  - salt/fileroots/splunk/search_head/init.sls )
+ 3. Salt master configs ( default_acl.conf )
+ 4. Salt Splunk files (salt/fileroots/splunk/files/saf_variables.jinja)
+ 5. Salt top.sls and pillar/top.sls ( salt/fileroots/top.sls - salt/pillar/top.sls )
+ 6. Salt global_variables.sls, os_settings.sls (salt/pillar/global_variables.sls - salt/pillar/os_settings.sls )
+ 7. Salt Customer specific Pillars ( salt/pillar/saf_pop_settings.sls - salt/pillar/saf_variables.sls )
+ 8. Salt gitfs pillar ( salt/pillar/salt_master.sls )
+ 9. Terraform salt provision references ( terraform/02-msoc_vpc/cloud-init/provision_salt_master.sh )
+ 10. Terraform C&C IP whitelisting for salt master and reposerver ( terraform/02-msoc_vpc/security-groups.tf )
+ 11. Terraform customer folder ( terraform/102-saf/ )
+ 12. Terraform common variables ( terraform/common/variables.tf )
+ 
+Update salt master
+`salt salt* state.sls salt_master`
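Before closing out, a quick grep over the checkout can confirm nothing was missed. A hedged sketch; the `/tmp/repo` path and the "saf" customer name below are illustrative only:

```shell
# build a tiny fake checkout with a leftover reference
mkdir -p /tmp/repo/salt/pillar
echo "include: saf_variables" > /tmp/repo/salt/pillar/top.sls
# list any files that still mention the customer
grep -rl 'saf' /tmp/repo
```

Any file the grep prints still needs cleaning before the decommission is done.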
+
+## Report the Decommissioned Hosts to the AFCC Team
+

+ 9 - 1
Fluentd Notes.md

@@ -2,12 +2,20 @@
 
 Fluentd is part of Treasure Data. So the service name is td-agent. 
 
-systemctl status td-agent
+`systemctl status td-agent`
 
 Fluentd is installed on afs-splunk-syslog-1. Fluentd will not start unless the directories specified in the config file are created. 
 
+```
 salt -L 'afs-splunk-syslog-1' cmd.run 'ls -larth /opt/syslog-ng/'
 salt -L 'afs-splunk-syslog-1' cmd.run 'mkdir /opt/syslog-ng/zscaler_firewall/'
 salt -L 'afs-splunk-syslog-1' cmd.run 'mkdir /opt/syslog-ng/zscaler_dns/'
 salt -L 'afs-splunk-syslog-1' cmd.run 'chown td-agent:td-agent /opt/syslog-ng/zscaler_firewall/'
 salt -L 'afs-splunk-syslog-1' cmd.run 'chown td-agent:td-agent /opt/syslog-ng/zscaler_dns/'
+```
+
+Folder structure changed!
+```
+salt -L 'afs-splunk-syslog-1' cmd.run 'tail /opt/syslog-ng/zscaler/web/log/2020-05-26/zscaler_web.2020-05-26T2020_0.log'
+
+```

+ 8 - 0
POP Node Notes.md

@@ -12,3 +12,11 @@ SDC drive
 The SDC drive needs to be xfs:
 salt 'afs*syslog-[5678]*' cmd.run 'mkfs -t xfs -f /dev/vg_syslog/lv_syslog'
+
+
+## Splunk on POP Nodes
+5/27/2020
+
+
+The POP nodes have a Splunk UF on them; its outputs.conf should point to the Moose external LB. 
+
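The expected shape of that outputs.conf can be sketched; the output group name and LB hostname below are placeholders, not values taken from these notes:

```shell
# hypothetical POP UF outputs.conf pointing at the Moose external LB
cat > /tmp/outputs.conf <<'EOF'
[tcpout]
defaultGroup = moose_lb

[tcpout:moose_lb]
server = moose-external-lb.example.com:9997
EOF
# sanity check: exactly one server entry
grep -c '^server' /tmp/outputs.conf
```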

+ 99 - 108
Patching Notes.md

@@ -3,7 +3,7 @@
 ## Timeline
 * Asha likes to submit her FedRAMP packet before about the 20th, so try to get it done before that.
 * Send email ~ 1 week before.
-* Give 15 minute warning in #mdr-patching, #mdr-content, etc before patching
+* Give a 15 minute warning in these Slack channels before patching: #mdr-patching, #mdr-content, etc.
 
 # Patching Process
 
@@ -19,108 +19,46 @@ SUBJECT: December Patching
 
 It is time for monthly patching again. Patching is going to occur during business hours within the next week or two.  Everything - including Customer POPs - needs patching.  We will be doing the servers in 2 passes: patching first, then reboots.
  
-Wave 1 is hot patching of all systems.
- 
-Wave 2 will be the needed reboots; as this is where we see the customer impact.
- 
 For real-time patching announcements, join the slack channel #mdr-patching. Announcements will be posted in that channel on what is going down and when.
  
 Here is the proposed patching schedule:
 
 Wednesday Dec 11:
 * Moose and Internal infrastructure
-  * Wave 1
+  * Patching
  
 Thursday Dec 12:
 * Moose and Internal
-  * Wave 2
+  * Reboots
 * All Customer PoP
-  * Wave 1 (AM)
-  * Wave 2 (PM)
+  * Patching (AM)
+  * Reboots (PM)
 
 Monday Dec 16:
 * All Customer XDR Cloud
-  * Wave 1
+  * Patching
 * All Search heads
-  * Wave 2 (PM)
+  * Reboots (PM)
 
 Tuesday Dec 17:
 * All Remaining XDR Cloud
-  * Wave 2 (AM)
+  * Reboots (AM)
  
 The customer and user impact comes during the reboots, so they will be done in batches to reduce total downtime.
 
 ***
 
 
-```
-
-
-
-
-#restarting the indexers one at a time (one from each group). Use the CM to see if the indexer comes back up properly. 
-salt -C ' ( *moose* or *saf* ) and *indexer-1*' cmd.run 'shutdown -r now'
-#check to ensure the hot volume is mounted /opt/splunkdata/hot
-salt -C '( *moose* or *saf* ) and *indexer-1*' cmd.run 'df -h'
-
-#WAIT FOR 3 checks in the CM before restarting the next indexer. 
-
-#repeat for indexer 2
-salt -C ' ( *moose* or *saf* ) and *indexer-2*' cmd.run 'shutdown -r now'
-#check to ensure the hot volume is mounted /opt/splunkdata/hot
-salt -C ' ( *moose* or *saf* ) and *indexer-2*' cmd.run 'df -h'
-
-#WAIT FOR 3 checks in the CM before restarting the next indexer.
-
-#repeat for indexer 3
-salt -C ' ( *moose* or *saf* ) and *indexer-3*' cmd.run 'shutdown -r now'
-#check to ensure the hot volume is mounted /opt/splunkdata/hot
-salt -C ' ( *moose* or *saf* ) and *indexer-3*' cmd.run 'df -h'
-
-
-IF/WHEN and indexer doesn't come back up follow these steps:
-in AWS grab the instance id. 
-
-run the MDR/get-console.sh
-look for "Please enter passphrase for disk splunkhot"
-
-in AWS console stop instance (which will remove ephemeral splunk data) then start it. 
-Then ensure the /opt/splunkdata/hot exists.
-if it doesn't then manually run the cloudinit boot hook. 
-sh /var/lib/cloud/instance/boothooks/part-002
-
-ensure the hot folder is owned by splunk:splunk
-it will be waiting for the luks.key
-systemctl deamon-reload
-systemctl restart systemd-cryptsetup@splunkhot
-It is waiting for command prompt, when you restart the service it picks up the key from a file. Systemd sees the crypt setup service as a dependency for the splunk service. 
-
-
-
-
-
-
-#restart indexers (one at a time; wait for 3 green checkmarks in Cluster Master)
-salt -C 'nga*indexer-1*' test.ping
-salt -C 'nga*indexer-1*' cmd.run 'shutdown -r now'
-
-#Repeat for indexer-2 and indexer-3
-
-#Ensure all have been restarted. Then done with NGA
-salt -C '*nga*' cmd.run 'uptime'
-
-```
-
 ---
 
 
 
 
-# Brad's Actual Patching
+# Brad's Patching
 
 > :warning: **See if Github Has any updates!** Coordinate with Duane on Github Patching.
 
-Starting with moose and internal infra Wave 1. Check disk space for potential issues. 
+Starting with moose and internal infra patching. Check disk space for potential issues. 
 ```
 salt -C '* not ( afs* or saf* or nga* )' test.ping --out=txt
 salt -C '* not ( afs* or saf* or nga* )' cmd.run 'df -h /boot'  
@@ -132,7 +70,7 @@ salt -C '* not ( afs* or saf* or nga* )' cmd.run 'yum check-update'
 salt -C '* not ( afs* or saf* or nga* )' pkg.upgrade
 ```
 This error: `error: unpacking of archive failed on file /usr/lib/python2.7/site-packages/urllib3/packages/ssl_match_hostname: cpio: rename failed`
-pip uninstall urllib3
+salt ma-* cmd.run 'pip uninstall urllib3 -y'
 
 This error is caused by a versionlock on the package. Use this to view the list:
 `yum versionlock list`
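A minimal sketch of checking the lock list for the failing package. The sample list contents are fabricated for illustration, and the delete/update commands in the comment are the usual versionlock workflow, not commands from these notes:

```shell
# fabricated sample of `yum versionlock list` output
cat > /tmp/versionlock.list <<'EOF'
0:python-urllib3-1.10.2-7.el7.*
0:kernel-3.10.0-957.21.3.el7.*
EOF
# is the failing package locked?
grep -c urllib3 /tmp/versionlock.list
# if so, roughly: yum versionlock delete '0:python-urllib3*' && yum update python-urllib3
```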
@@ -153,8 +91,12 @@ package-cleanup --oldkernels --count=1
 
 If VPN server stops working, try a stop and start of the vpn server. The private IP will probably change. 
 
+ISSUE: salt-minion doesn't come back and has this error
+`/usr/lib/dracut/modules.d/90kernel-modules/module-setup.sh: line 16: /lib/modules/3.10.0-957.21.3.el7.x86_64///lib/modules/3.10.0-957.21.3.el7.x86_64/kernel/sound/drivers/mpu401/snd-mpu401.ko.xz: No such file or directory`
 
-# Wave 2 Internals 
+RESOLUTION: Manually reboot the OS; this is most likely due to a kernel upgrade.
+
+# Reboots Internals 
 
 Be sure to select ALL events in sensu for silencing not just the first 25. 
 Sensu -> Entities -> Sort (name) -> Select Entity and Silence. This will silence both keepalive and other checks. 
@@ -165,33 +107,43 @@ Some silenced events will still trigger. Not sure why. The keepalive still trigg
 ```
 salt -L 'vault-3.msoc.defpoint.local,sensu.msoc.defpoint.local' test.ping
 salt -L 'vault-3.msoc.defpoint.local,sensu.msoc.defpoint.local' cmd.run 'shutdown -r now'
-salt -C '* not ( moose-splunk-indexer* or afs* or saf* or nga* or vault-3* or sensu* )' test.ping --out=txt
-salt -C '* not ( moose-splunk-indexer* or afs* or saf* or nga* or vault-3* or sensu* )' cmd.run 'shutdown -r now'
+salt -C '* not ( moose-splunk-indexer* or afs* or nga* or ma-* or la-* or vault-3* or sensu* )' test.ping --out=txt
+salt -C '* not ( moose-splunk-indexer* or afs* or nga* or ma-* or la-* or vault-3* or sensu* )' cmd.run 'shutdown -r now'
 #you will lose connectivity to openvpn and salt master
 #log back in and verify they are back up
 salt -C '* not ( moose-splunk-indexer* or afs* or saf* or nga* )' cmd.run 'uptime' --out=txt
 ```
 
-# Wave 2 Moose
+# Reboots Moose
 
 ```
 salt -C 'moose-splunk-indexer*' test.ping --out=txt
 salt -C 'moose-splunk-indexer-1.msoc.defpoint.local' cmd.run 'shutdown -r now'
 #indexers take a while to restart
+watch "salt -C 'moose-splunk-indexer-1.msoc.defpoint.local' test.ping"
 ping moose-splunk-indexer-1.msoc.defpoint.local
 ```
 #WAIT FOR SPLUNK CLUSTER TO HAVE 3 CHECKMARKS
 indexer2 is not coming back up...look at screenshot in aws... see this: Probing EDD (edd=off to disable)... ok
 look at system log in AWS see this:  Please enter passphrase for disk splunkhot!:
 
+IF/WHEN an indexer doesn't come back up, follow these steps:
+in AWS grab the instance id. 
+
+run MDR/get-console.sh (Duane's script for pulling the system log)
+look for "Please enter passphrase for disk splunkhot"
+
+
 In AWS console stop instance (which will remove ephemeral splunk data) then start it. 
 Then ensure the /opt/splunkdata/hot exists.
 salt -C 'moose-splunk-indexer-1.msoc.defpoint.local' cmd.run 'df -h'
 IF the MOUNT for /opt/splunkdata/hot DOESN'T EXIST, STOP SPLUNK! Splunk will write to the wrong volume. 
 Before mounting the new volume, clear out the wrong /opt/splunkdata/:
+rm -rf /opt/splunkdata/hot/*
+
 
 salt -C 'moose-splunk-indexer-1.msoc.defpoint.local' cmd.run 'systemctl stop splunk'
-Ensure the /opt/splunkdata doesn't already exist, before the boothook. (theory that this causes the issue) 
+Ensure /opt/splunkdata doesn't already exist before running the boothook.
 ssh prod-moose-splunk-indexer-1
 if it doesn't then manually run the cloudinit boot hook. 
 sh /var/lib/cloud/instance/boothooks/part-002
@@ -201,7 +153,7 @@ ensure the hot folder is owned by splunk:splunk
 ll /opt/splunkdata/
 salt -C 'moose-splunk-indexer-1.msoc.defpoint.local' cmd.run 'ls -larth /opt/splunkdata'
 chown -R splunk: /opt/splunkdata/
-salt -C 'nga-splunk-indexer-2.msoc.defpoint.local' cmd.run 'chown -R splunk: /opt/splunkdata/'
+salt -C '' cmd.run 'chown -R splunk: /opt/splunkdata/'
 it will be waiting for the luks.key
 systemctl daemon-reload
 salt -C 'moose-splunk-indexer-1.msoc.defpoint.local' cmd.run 'systemctl daemon-reload'
@@ -222,10 +174,10 @@ check the servers again to ensure all of them have rebooted.
 salt -C 'moose-splunk-indexer*' cmd.run 'uptime' --out=txt | sort
 
 Ensure all Moose and Internal have been rebooted
-salt -C '* not ( afs* or saf* or nga* )' cmd.run 'uptime' --out=txt | sort
+salt -C '* not ( afs* or nga* or la-* or ma-* )' cmd.run 'uptime' --out=txt | sort
 
 
-# Wave 1 POPs
+# Patching POPs
 
 ```
 salt -C '* not *.local' test.ping --out=txt
@@ -240,7 +192,19 @@ salt -C '* not *.local' pkg.upgrade disablerepo=msoc-repo
 salt -C '* not *.local' pkg.upgrade
 ```
 
-# Wave 2 POPs
+Error on afs-splunk-ds-3: `error: cannot open Packages database in /var/lib/rpm`
+Solution:
+
+```
+mkdir /root/backups.rpm/
+cp -avr /var/lib/rpm/ /root/backups.rpm/
+rm -f /var/lib/rpm/__db*
+db_verify /var/lib/rpm/Packages
+rpm --rebuilddb
+yum clean all
+```
+
+# Reboots POPs
 
 DO NOT restart all POPs at the same time
 
@@ -252,12 +216,14 @@ salt -C '*syslog-1* not *.local' cmd.run 'ps -ef | grep syslog-ng | grep -v grep
 #look for /usr/sbin/syslog-ng -F -p /var/run/syslogd.pid
 ```
 
-SAF will need the setenforce run
if syslog-ng doesn't start, it might need the `setenforce 0` command run (left here for legacy reasons)
 ```
 salt saf-splunk-syslog-1 cmd.run 'setenforce 0'
 salt saf-splunk-syslog-1 cmd.run 'systemctl stop rsyslog'
 salt saf-splunk-syslog-1 cmd.run 'systemctl start syslog-ng'
 
+watch "salt -C '*syslog-1* not *.local' test.ping"
+
 salt -C '*syslog-2* not *.local' cmd.run 'uptime'
 salt -C '*syslog-2* not *.local' cmd.run 'shutdown -r now'
 salt -C '*syslog-2* not *.local' cmd.run 'ps -ef | grep syslog-ng | grep -v grep'
@@ -274,6 +240,7 @@ repeat for syslog-5, syslog-6, syslog-7, and syslog-8
 (might be able to reboot some of these at the same time if they are in different locations; check the location grain on them.)
 grains.item location
 
+```
 afs-splunk-syslog-8: {u'location': u'az-east-us-2'}
 afs-splunk-syslog-6: {u'location': u'az-central-us'}
 
@@ -285,15 +252,14 @@ salt -C 'afs-splunk-syslog*  grains.item location
 
 salt -L 'afs-splunk-syslog-6, afs-splunk-syslog-8' cmd.run 'uptime'
 salt -L 'afs-splunk-syslog-6, afs-splunk-syslog-8' cmd.run 'shutdown -r now'
+watch "salt -L 'afs-splunk-syslog-6, afs-splunk-syslog-8' test.ping"
 
 salt -L 'afs-splunk-syslog-5, afs-splunk-syslog-7' cmd.run 'uptime'
 salt -L 'afs-splunk-syslog-5, afs-splunk-syslog-7' cmd.run 'shutdown -r now'
+watch "salt -L 'afs-splunk-syslog-5, afs-splunk-syslog-7' test.ping"
+```
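The batching-by-location idea can be sketched locally. The host-to-location pairs below mirror the sample grain output, and the awk one-liner just picks the first host seen per location, i.e. a set that is safe to reboot together:

```shell
# host -> location pairs (shape of the grains.item location output)
cat > /tmp/locations.txt <<'EOF'
afs-splunk-syslog-5 az-east-us-2
afs-splunk-syslog-6 az-central-us
afs-splunk-syslog-7 az-central-us
afs-splunk-syslog-8 az-east-us-2
EOF
# one host per location = first reboot batch
awk '!seen[$2]++ {print $1}' /tmp/locations.txt
```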
 
 #verify logs are flowing
-https://saf-splunk-sh.msoc.defpoint.local:8000/en-US/app/search/search
-ddps03.corp.smartandfinal.com
-index=* source=/opt/syslog-ng/* host=ddps* earliest=-15m | stats  count by host
-
 https://afs-splunk-sh.msoc.defpoint.local:8000/en-US/app/search/search
 afssplhf103.us.accenturefederal.com
 index=* source=/opt/syslog-ng/* host=afs* earliest=-15m | stats  count by host
@@ -304,6 +270,8 @@ index=network sourcetype="citrix:netscaler:syslog" earliest=-15m
 index=* source=/opt/syslog-ng/* host=aws* earliest=-60m | stats count by host
 
 POP ds (could these be restarted at the same time? Or in 2 batches?)
+
+```
 salt -C '*splunk-ds-1* not *.local' cmd.run 'uptime'
 salt -C '*splunk-ds-1* not *.local' cmd.run 'shutdown -r now'
 
@@ -312,55 +280,78 @@ salt -C '*splunk-ds-2* not *.local' cmd.run 'shutdown -r now'
 
 salt afs-splunk-ds-[2,3,4] cmd.run 'uptime'
 salt afs-splunk-ds-[2,3,4] cmd.run 'shutdown -r now'
+```
 
 Don't forget ds-3 and ds-4
-
+```
+#try reboot at the same time
+salt '*splunk*ds*' cmd.run 'uptime'
+salt '*splunk*ds*' cmd.run 'shutdown -r now'
+watch "salt '*splunk*ds*' test.ping"
 salt '*splunk-ds*' cmd.run 'systemctl status splunk'
+```
 
 
-POP dcn
-salt -C '*splunk-dcn-1* not *.local' cmd.run 'uptime'
-salt -C '*splunk-dcn-1* not *.local' cmd.run 'shutdown -r now'
-
 Did you get all of them?
-salt -C ' * not *local ' cmd.run 'uptime'
 
-###
-# Customer Slices Wave 1
+```
+salt -C ' * not *local ' cmd.run 'uptime' --out=txt | sort
+```
+
 
-salt -C 'afs*local or saf*local or nga*local' test.ping --out=txt
-salt -C 'afs*local or saf*local or nga*local' cmd.run 'uptime'
-salt -C 'afs*local or saf*local or nga*local' cmd.run 'df -h'
-salt -C 'afs*local or saf*local or nga*local' pkg.upgrade
+# Customer Slices Wave 1
 
+```
+salt -C 'afs*local or ma-*local or la-*local or nga*local' test.ping --out=txt
+salt -C 'afs*local or ma-*local or la-*local or nga*local' cmd.run 'uptime'
+salt -C 'afs*local or ma-*local or la-*local or nga*local' cmd.run 'df -h'
+salt -C 'afs*local or ma-*local or la-*local or nga*local' pkg.upgrade
+```
 The epel repo is enabled on afs-splunk-hf (I don't know why).
 Had to run this to avoid an issue with the collectd package on msoc-repo:
 
-yum update --disablerepo epel
+`yum update --disablerepo epel`
+
+
+
+# Customer Slices Search Heads Only Wave 2
 
 Silence Sensu first!
-Customer Slices Search Heads Only Wave 2
-salt -C 'afs-splunk-sh*local or saf-splunk-sh*local or nga-splunk-sh*local' test.ping --out=txt
-salt -C 'afs-splunk-sh*local or saf-splunk-sh*local or nga-splunk-sh*local' cmd.run 'df -h'
-salt -C 'afs-splunk-sh*local or saf-splunk-sh*local or nga-splunk-sh*local' cmd.run 'shutdown -r now'
-salt -C 'afs-splunk-sh*local or saf-splunk-sh*local or nga-splunk-sh*local' cmd.run 'uptime'
 
-###
+```
+salt -C 'afs-splunk-sh*local or ma-*-splunk-sh*local or la-*-splunk-sh*local or nga-splunk-sh*local' test.ping --out=txt
+salt -C 'afs-splunk-sh*local or ma-*-splunk-sh*local or la-*-splunk-sh*local or nga-splunk-sh*local' cmd.run 'df -h'
+salt -C 'afs-splunk-sh*local or ma-*-splunk-sh*local or la-*-splunk-sh*local or nga-splunk-sh*local' cmd.run 'shutdown -r now'
+salt -C 'afs-splunk-sh*local or ma-*-splunk-sh*local or la-*-splunk-sh*local or nga-splunk-sh*local' cmd.run 'uptime'
+```
+
+
 # Customer Slices CMs Wave 2
 
 Silence Sensu first!
+
+```
 salt -C '( *splunk-cm* or *splunk-hf* ) not moose*' test.ping --out=txt
 salt -C '( *splunk-cm* or *splunk-hf* ) not moose*' cmd.run 'df -h'
 salt -C '( *splunk-cm* or *splunk-hf* ) not moose*' cmd.run 'shutdown -r now'
+watch "salt -C '( *splunk-cm* or *splunk-hf* ) not moose*' test.ping --out=txt"
 salt -C '( *splunk-cm* or *splunk-hf* ) not moose*' cmd.run 'systemctl status splunk'
 salt -C '( *splunk-cm* or *splunk-hf* ) not moose*' cmd.run 'uptime'
+```
 
+
+May 27 17:08:57 la-c19-splunk-cm.msoc.defpoint.local splunk[3840]: /etc/rc.d/init.d/splunk: line 13: ulimit: open files: cannot modify limit: Invalid argument
 afs-splunk-hf has a hard time restarting. Might need to stop then start the instance. 
 
-reboot indexers 1 at a time (AFS cluster gets backed up when an indexer is rebooted)
+Reboot indexers 1 at a time (the AFS cluster gets backed up when an indexer is rebooted). How do we replicate this with ASGs? 
+
+```
+salt -C '*splunk-indexer-1* and G@ec2:placement:availability_zone:us-east-1a not moose*' test.ping
 salt -C '*splunk-indexer-1* not moose*' test.ping --out=txt
 salt -C '*splunk-indexer-1* not moose*' cmd.run 'df -h'
 salt -C '*splunk-indexer-1* not moose*' cmd.run 'shutdown -r now'
+```
 
 Wait for 3 green check marks
 #repeat for indexers 2 & 3
@@ -373,4 +364,4 @@ salt -L 'afs-splunk-indexer-3.msoc.defpoint.local,saf-splunk-indexer-3.msoc.defp
 NGA had a hard time getting 3 checkmarks. The CM was waiting on stuck buckets. Force-rolled the buckets to get green checkmarks.
 
 salt -C '* not *.local' cmd.run 'uptime | grep days'
-***MAKE SURE the Sensu checks are not silenced. ***
+> :warning: ***MAKE SURE the Sensu checks are not silenced. ***

+ 16 - 0
Reposerver Notes.md

@@ -32,3 +32,19 @@ yum install savinstpkg
 Defined in salt/fileroots/splunk/new_install.sls
 /etc/yum.repos.d/splunk.repo
 http://reposerver.msoc.defpoint.local/splunk
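A hedged guess at the shape of that splunk.repo file; every value below is an assumption, not copied from the salt state:

```
[splunk]
name=Splunk packages
baseurl=http://reposerver.msoc.defpoint.local/splunk
enabled=1
gpgcheck=0
```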
+
+New Splunk Version
+A Splunk 7.2 repo needs to be created for PROD moose:
+
+```
+cd /var/www/html/splunk
+mkdir 7.2
+chown -R apache: .
+cd 7.2
+createrepo `pwd`
+wget -O splunk-7.2.5.1-962d9a8e1586-linux-2.6-x86_64.rpm 'https://www.splunk.com/page/download_track?file=7.2.5.1/linux/splunk-7.2.5.1-962d9a8e1586-linux-2.6-x86_64.rpm&ac=&wget=true&name=wget&platform=Linux&architecture=x86_64&version=7.2.5.1&product=splunk&typed=release'
+chown -R apache: .
+cd /var/www/html/splunk/7.2
+createrepo `pwd`
+restorecon -R /var/www/html/splunk
+```

+ 4 - 1
Splunk MSCAS Notes.md

@@ -1,5 +1,8 @@
-For smart and final customer
+# Splunk MSCAS Notes.md
 
+
+References:
+https://github.mdr.defpoint.com/MDR-Content/mdr-content/wiki/CS0009:Search:MSOC---MS-CAS---Alert
 https://jira.mdr.defpoint.com/browse/MSOCI-890
 https://docs.microsoft.com/en-us/cloud-app-security/siem
 https://splunkbase.splunk.com/app/3110/

+ 2 - 0
Splunk Notes.md

@@ -37,6 +37,8 @@ TEST SPLUNK CM admin password
 admin
 6VB^8V3CFjbaiZ4Q#hLjNW3a1
 
+TEST SPLUNK indexer-1 admin password
+6VB8V3CFjbaiZ4QhLjNW3a1
 
     
     

+ 29 - 2
Terraform Splunk ASG Notes.md

@@ -218,9 +218,24 @@ in outputs.conf for splunk nodes
 1. stand up new templates and ASGs
 2. launch new ASG instances
 3. Ensure three green checkmarks in CM
-4. change salt master to new outputs
-5. 
+4. change salt master to new outputs (make sure it is working)
+4.1 ensure they can connect first 
+salt '*.local' network.connect moose-splunk-cm.msoc.defpoint.local 8089
+5. Update all minions to IDXC discovery
+6. silence sensu 
+7. manually take the non-ASG indexers offline but don't stop instances
+7.1 /opt/splunk/bin/splunk offline --enforce-counts
+7.1.1 no users? create etc/system/local/user-seed.conf then restart ( https://answers.splunk.com/answers/834/how-to-reset-the-admin-password.html ) 
+7.2 disable the service to prevent it starting back up
+8. use tf to destroy the instances, then remove the code from TF.
 
+salt-run survey.diff '*.local' cp.get_file_str file:////opt/splunkforwarder/etc/apps/moose_outputs/default/outputs.conf
+salt sensu* cmd.run 'tail -50 /opt/splunkforwarder/var/log/splunk/splunkd.log'
+
+salt phantom* cmd.run 'tail -200 /opt/splunkforwarder/var/log/splunk/splunkd.log | grep TcpOutputProc'
+
+Check in Splunk for indexerdiscovery errors
+index=_internal sourcetype="splunkd" source="/opt/splunkforwarder/var/log/splunk/splunkd.log" component=IndexerDiscoveryHeartbeatThread
 
 terraform apply -target=module.moose_cluster.module.indexer_cluster.module.indexer2.aws_launch_template.splunk_indexer -target=module.moose_cluster.module.indexer_cluster.module.indexer2.aws_autoscaling_group.splunk_indexer_asg
 
@@ -229,3 +244,15 @@ moose-splunk-indexer-i-048bf97164bd7401c.msoc.defpoint.local
 moose-splunk-indexer-i-063613603d0afc287.msoc.defpoint.local
 moose-splunk-indexer-i-0e1cde750a6408302.msoc.defpoint.local
 
+
+PROD
+
+http://reposerver.msoc.defpoint.local/splunk/7.2/repodata/repomd.xml: [Errno 14] HTTP Error 404 - Not Found
+After salt highstate, splunkuf is running, but splunk is not.
+salt moose-splunk-indexer-i* cmd.run 'systemctl stop splunkuf'
+Error: "/opt/splunk/etc/slave-apps/TA-Frozen-S3/bin/coldToFrozenS3.py": no such script could be found on the filesystem, even after multiple attempts. Exiting.
+^^^ this is a bug in splunk!
+
+Added python3 to ALL servers including indexers and moved coldToFrozenS3.py to /usr/local/bin to fix the issue.
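The relocated script can then be referenced from indexes.conf. A hedged sketch of the stanza, using Splunk's coldToFrozenScript setting; the python3 path and stanza placement are assumptions:

```
[default]
coldToFrozenScript = "/usr/bin/python3" "/usr/local/bin/coldToFrozenS3.py"
```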
+
+