
Merge branch 'master' of github.xdr.accenturefederalcyber.com:mdr-engineering/infrastructure-notes

Brad Poulton · 3 years ago · commit e2c93e15db
7 changed files with 188 additions and 56 deletions
  1. AWS Notes.md (+16 −0)
  2. CA Notes.md (+21 −0)
  3. GitHub Server Notes.md (+36 −19)
  4. Patching Notes--CaaSP.md (+15 −4)
  5. Patching Notes.md (+33 −33)
  6. Splunk App Distribution.md (+42 −0)
  7. Terragrunt Notes.md (+25 −0)

+ 16 - 0
AWS Notes.md

@@ -242,3 +242,19 @@ ln -s ~/Library/Python/3.9/bin/amicleaner /usr/local/bin/amicleaner
 ```
 
 You now have amicleaner in your path, and can run `~/xdr-terraform-live/bin/clean_old_amis.sh`.
+
+## Rescuing Systems using SSM
+
+For example, when the disk on sensu in test filled up, the cleanup was run remotely via SSM:
+
+```
+aws --profile mdr-test-c2-gov \
+    ssm send-command --document-name "AWS-RunShellScript" \
+    --instance-ids "i-0d5072669fb00c2fb" --comment "SystemRescue" \
+    --parameters commands='rm -rf /var/log/sudo-io/*'
+# Grab Command ID from the output
+aws --profile mdr-test-c2-gov \
+   ssm list-command-invocations \
+   --command-id ae86ec00-e356-4ad6-b162-6b5ec5260ed9
+```
+
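The command ID can also be pulled out of the `send-command` JSON response rather than copied by hand. A minimal sketch; the sample response below is hypothetical, and the follow-up query mirrors the commands above:

```shell
# Extract CommandId from a saved send-command response (sample JSON is hypothetical).
RESPONSE='{"Command":{"CommandId":"ae86ec00-e356-4ad6-b162-6b5ec5260ed9","Status":"Pending"}}'
CMD_ID=$(echo "$RESPONSE" | python3 -c 'import sys, json; print(json.load(sys.stdin)["Command"]["CommandId"])')
echo "$CMD_ID"

# Then check per-instance output (same profile/instance as above):
# aws --profile mdr-test-c2-gov ssm get-command-invocation \
#     --command-id "$CMD_ID" --instance-id "i-0d5072669fb00c2fb"
```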

+ 21 - 0
CA Notes.md

@@ -40,6 +40,8 @@ curl http://xdr-subordinate-crl.s3.us-gov-east-1.amazonaws.com/crl/FILLTHISINFRO
 
 ## Generate an audit report
 
+The Moose SH automatically generates these reports via `/opt/splunk/etc/apps/SA-moose/`.
+
 ```
 # Root CA
 aws --profile mdr-prod-root-ca-gov \
@@ -61,6 +63,25 @@ aws --profile mdr-common-services-gov \
   --audit-report-response-format CSV 
 ```
 
+### Audit Report Not Updating
+
+If you get an email:
+```
+The alert condition for 'Private CA Audit Log - Not Downloaded or Generated' was triggered.
+
+THIS MEANS WE ARENT AUDITING THE PRIVATE CA! Investigate SA-Moose on the Moose SH.
+
+See FedRAMP SC-17
+```
+
+Splunk may have lost its permissions. Run:
+```
+cd ~/xdr-terraform-live/prod/aws-us-gov/mdr-prod-root-ca/011-root-CA
+terragrunt apply
+cd ~/xdr-terraform-live/prod/aws-us-gov/mdr-prod-root-ca/012-subordinate-cas
+terragrunt apply
+```
+
 ## Revoke a Certificate
 
 Grab the serial number from [Moose Private CA Dashboard](https://moose-splunk.pvt.xdr.accenturefederalcyber.com/en-US/app/splunk_app_aws/private_ca_status_dashboard)

+ 36 - 19
GitHub Server Notes.md

@@ -1,29 +1,34 @@
 # GitHub Server Notes
 
-`GitHub Enterprise Server` is an APPLIANCE. No salt minion, No sft. 
-To SSH in you must have your public key manually added. 
-
-Host github     
-  Port 122      
-  User admin      
-  HostName 10.80.101.78     
-  
+`GitHub Enterprise Server` is an APPLIANCE. No Salt minion, no Teleport.
+To SSH in you must have your public key manually added.
+
+```
+Host github
+  Port 122
+  User admin
+  HostName 10.80.101.78
+```
+
 # Adding New Users to GitHub Teams
 
-OKTA does NOT manage the permissions on the GitHub server. To give a user access to a new team, like `mdr-engineering`, log into the Github server and access this URL: [Login](https://github.xdr.accenturefederalcyber.com/orgs/mdr-engineering/teams/onboarding/members) . Find the new user by clicking on the "Add a member" button. 
+OKTA does NOT manage the permissions on the GitHub server. To give a user access to a new team, like `mdr-engineering`, log into the Github server and access this URL: [Login](https://github.xdr.accenturefederalcyber.com/orgs/mdr-engineering/teams/onboarding/members) . Find the new user by clicking on the "Add a member" button.
+
+# Updating
 
-# Updating 
 ```
 ghe-update-check
 ghe-upgrade /var/lib/ghe-updates/github-enterprise-2.17.22.hpkg
 ```
 
 Upgrading major version
+
 ```
 ghe-upgrade
 fdisk -l
 ```
-Two partitions are installed. When you run an `upgrade` the VM will install the upgrade to the other partition. After the upgrade it will switch the primary boot partitions. This leaves the previous version available for roll back. 
+
+NOTE: The output of `ghe-update-check` will provide you with the command to use to upgrade GitHub Enterprise.
+
+Two partitions are installed. When you run an `upgrade` the VM will install the upgrade to the other partition. After the upgrade it will switch the primary boot partitions. This leaves the previous version available for roll back.
 
 
 Hit ghe- (TAB) to view all ghe commands. GitHub [Command-line utilities](https://docs.github.com/en/enterprise/2.17/admin/installation/command-line-utilities)
@@ -31,12 +36,11 @@ Hit ghe- (TAB) to view all ghe commands. GitHub [Command-line utilities](https:/
 
 # Installing new license
 
-Should be able to do just via the [Web UI](https://github.xdr.accenturefederalcyber.com:8443/setup/upgrade)     
-But there's a gotcha with disabling the DSA key (for a FEDRAMP POAM).  Your services
-may not restart after updating the license.
+Should be able to do this just via the [Web UI](https://github.xdr.accenturefederalcyber.com:8443/setup/upgrade)
+But there's a gotcha with disabling the DSA key (for a FEDRAMP POAM). Your services may not restart after updating the license.
 
 ```
-+ cp /data/user/common/ssh_host_rsa_key /data/user/common/ssh_host_rsa_key.pub /data/user/common/ssh_host_dsa_key /data/user/common/ssh_host_dsa_key.pub /data/user/common/ssh_host_ecdsa_key /data/user/common/ssh_host_ecdsa_key.pub /etc/ssh/
+cp /data/user/common/ssh_host_rsa_key /data/user/common/ssh_host_rsa_key.pub /data/user/common/ssh_host_dsa_key /data/user/common/ssh_host_dsa_key.pub /data/user/common/ssh_host_ecdsa_key /data/user/common/ssh_host_ecdsa_key.pub /etc/ssh/
 cp: cannot stat '/data/user/common/ssh_host_dsa_key': No such file or directory
 cp: cannot stat '/data/user/common/ssh_host_dsa_key.pub': No such file or directory
 Jun 30 16:09:54 ERROR: Preparing storage device
@@ -55,7 +59,7 @@ sudo mv /data/user/common/ssh_host_dsa_key* /data/user/user-tmp/
 sudo systemctl restart babeld
 ```
 
-I'll open a case with Github too. 
+I'll open a case with Github too.
 
 # GitHub-Backup
 
@@ -63,7 +67,7 @@ The `ghe-backup` servers are instances running `Docker`.
 
 Docker is installed via the `docker` Salt state.
 
-Most backup configuration is managed by the salt `github.backup` state:
+Most backup configuration is managed by the Salt `github.backup` state:
 * `/usr/local/github-backup-utils` contains a copy of the [github repository](https://github.com/github/backup-utils). Be sure to run `git pull origin master` prior to upgrading/rebuilding the docker image and use the release version in the image tag.
 * Build of the docker image, replace 'vX.y.z' with the backup-utils release version. Manual command is: `docker build --build-arg=http_proxy=$HTTP_PROXY --build-arg=https_proxy=$HTTPS_PROXY -t github/backup-utils:vX.y.z .`. You can run this if you get an error when applying the state.
 * A script is run via a cronjob in `/etc/cron.d/ghe-backup`, which calls the script `/root/github-backup.sh`. This script calls docker to run the backup.
@@ -77,12 +81,25 @@ If there is a new GitHub or a new ghe-backup server, you will need to accept the
 sudo ssh -p 122 -i /etc/github-backup-utils/.ssh/id_rsa -o UserKnownHostsFile=/etc/github-backup-utils/.ssh/known_hosts github-enterprise-0.pvt.xdrtest.accenturefederalcyber.com -l admin
 ```
 
-and accept the key.
+And accept the key.
 
 # Restoring
 
 Restoring should be similar to the command called by /root/github-backup.sh, except with a 'ghe-restore' command. You may need to update the script to use the latest Docker image build/tag.
 
+# Troubleshooting Backup Failures
+
+SSH to the ghe-backup server, `sudo -iu root` to become root, and `cd /efs/github-prod/log` (or `/efs/github-test/log` on XDR Test), then `ls -lrth | tail -3`. Grab the newest (last listed) backup log file and `tail` it to see how far it has gotten.
+
+Log entries to look for:
+`No leaked keys found` -- The job completed successfully
+
+`Error: A backup of github-enterprise-0.pvt.xdr.accenturefederalcyber.com may still be running on PID 1.
+If PID 1 is not a process related to the backup utilities, please remove the /data/in-progress file and try again.` -- Something prevented the job from completing such as a reboot whilst the Docker container was creating the backup. Delete the `/efs/github-prod/data/in-progress` file.
+
+Some failure alerts from Splunk may be due to the backup job taking longer than one hour to complete. This is likely due to some other process (such as clamd) consuming CPU/memory on the ghe-backup host and preventing the Docker process from working efficiently.
+
+
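The stale-lock cleanup can be scripted with a guard; this is a sketch under the assumption that the backup runs as a process named `ghe-backup` (the production lock path is shown commented out):

```shell
# Remove a stale in-progress lock only when no ghe-backup process is running.
clear_stale_lock() {
  local lock="$1"
  if pgrep -x ghe-backup >/dev/null 2>&1; then
    echo "backup still running; leaving $lock in place"
  else
    rm -f "$lock"
  fi
}

# Production use (hypothetical invocation):
# clear_stale_lock /efs/github-prod/data/in-progress
```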
 # Migration Steps to Govcloud:
 
 1) Create Okta App Manually
@@ -106,4 +123,4 @@ Restoring should be similar to the command called by /root/github-backup.sh, exc
    * Fix mailserver
 11) Restore crontab to original
 12) Disable old app in okta
-13) Highstate salt
+13) Highstate Salt

+ 15 - 4
Patching Notes--CaaSP.md

@@ -18,7 +18,7 @@ There isn't typically a need to inform anyone of patching as CaaSP is not consid
 ### Day 1
 #### Step 1: Victim Instances
 
-Connect to the CaaSP Salt Master with `tshd caasp-salt-master` and run the following commands:
+Connect to the CaaSP Salt Master with `tshd caasp-salt-server` and run the following commands:
 
 ```
 ### There is also the grain 'role:caasp-victim' that can be used instead of 'vic-*' or 'vic-* or VIC-*'.
@@ -118,12 +118,18 @@ Post to Slack [#xdr-patching Channel](https://afscyber.slack.com/archives/CJ462R
 ```
 date; salt -L 'caasp-splunk-sh-dev,caasp-splunk-hf,caasp-splunk-cm,caasp-phantom' test.ping -t 5
 
+# Check for disk usage
+salt -L 'caasp-splunk-sh-dev,caasp-splunk-hf,caasp-splunk-cm,caasp-phantom' cmd.run 'df -h | egrep "[890][0-9]\%"'
+
 # Reboot the dev search head, HF, CM, and Phantom
 date; salt -L 'caasp-splunk-sh-dev,caasp-splunk-hf,caasp-splunk-cm,caasp-phantom' system.reboot --async
 
 # Wait for them ...
 watch "salt -L 'caasp-splunk-sh-dev,caasp-splunk-hf,caasp-splunk-cm,caasp-phantom' status.uptime --out=txt"
 
+# Verify Splunk Service is Active
+salt -L 'caasp-splunk-sh-dev,caasp-splunk-hf,caasp-splunk-cm' cmd.run 'systemctl status splunk | grep Active'
+
 # Reboot the search head
 salt caasp-splunk-sh test.ping
 date; salt caasp-splunk-sh system.reboot --async
@@ -152,6 +158,7 @@ Repeat the above patching steps for the additional indexers, waiting for `four`
 ```
 # Do the second indexer
 salt caasp-splunk-idx-i-0babc3 test.ping --out=txt
+salt caasp-splunk-idx-i-0babc3 cmd.run 'df -h | egrep "[890][0-9]\%"'
 date; salt caasp-splunk-idx-i-0babc3 system.reboot --async
 
 # Indexers take a while to restart
@@ -163,6 +170,7 @@ watch "salt caasp-splunk-idx-i-0babc3 status.uptime --out=txt"
 ```
 # Do the third indexer
 salt caasp-splunk-idx-i-04665e test.ping --out=txt
+salt caasp-splunk-idx-i-04665e cmd.run 'df -h | egrep "[890][0-9]\%"'
 date; salt caasp-splunk-idx-i-04665e system.reboot --async
 
 # Indexers take a while to restart
@@ -170,12 +178,14 @@ watch "salt caasp-splunk-idx-i-04665e status.uptime --out=txt"
 
 # Verify all indexers rebooted (check for seconds less than a few thousand):
 salt 'caasp-splunk-idx-i-*' status.uptime --out=txt
+salt 'caasp-splunk-idx-i-*' cmd.run 'systemctl status splunk | grep Active'
 ```
 
 #### Ensure all Splunk instances have been rebooted
 
 ```
 salt 'caasp-splunk-*' status.uptime --out=txt
+salt 'caasp-splunk-*' cmd.run 'systemctl status splunk | grep Active'
 ```
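The `egrep "[890][0-9]\%"` filter used in these disk checks matches usage figures from 80% upward (the `0[0-9]` case also catches the `00` in `100%`). A quick demonstration against sample `df` output, with hypothetical values:

```shell
# Sample df -h output (hypothetical); only the 86% and 100% lines should match.
SAMPLE='/dev/xvda1  8.0G  6.9G  1.1G  79% /
/dev/xvdb1   50G   43G  7.0G  86% /opt/splunk
/dev/xvdc1   10G   10G     0 100% /var/log'
echo "$SAMPLE" | egrep "[890][0-9]\%"
```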
 
 #### Step 2 (Day 2): Reboot Kali, Jenkins, the Bastion, OSCDNS, Phoenix, and Salt Master
@@ -186,11 +196,12 @@ Rebooting CaaSP support infrastructure (Jenkins, Phoenix, etc.) now.
 ```
 
 ```
-salt -L 'caasp-kali,caasp-build-01,caasp-bastion,caasp-oscdns,caasp-phoenix-01,caasp-salt-master' test.ping --out=txt
-date; salt -L 'caasp-kali,caasp-build-01,caasp-bastion,caasp-oscdns,caasp-phoenix-01,caasp-salt-master' system.reboot --async
+salt -L 'caasp-kali,caasp-build-01,caasp-bastion,caasp-oscdns,caasp-phoenix-01,caasp-salt-server' test.ping --out=txt
+salt -L 'caasp-kali,caasp-build-01,caasp-bastion,caasp-oscdns,caasp-phoenix-01,caasp-salt-server' cmd.run 'df -h | egrep "[890][0-9]\%"'
+date; salt -L 'caasp-kali,caasp-build-01,caasp-bastion,caasp-oscdns,caasp-phoenix-01,caasp-salt-server' system.reboot --async
 
 #### Rebooting will disconnect you from the Salt Master. Once you are able to ssh back in ...
-salt -L 'caasp-kali,caasp-build-01,caasp-bastion,caasp-oscdns,caasp-phoenix-01,caasp-salt-master' status.uptime --out=txt
+salt -L 'caasp-kali,caasp-build-01,caasp-bastion,caasp-oscdns,caasp-phoenix-01,caasp-salt-server' status.uptime --out=txt
 ```
 
 ## Patching or Upgrading the Jenkins Container

+ 33 - 33
Patching Notes.md

@@ -19,9 +19,9 @@ Also, reminder that the legacy `Reposerver` was shutdown in late February 2021,
 
 Each month the AWS `GovCloud (GC) TEST/PROD` environments must be patched to comply with FedRAMP requirements. This wiki page outlines the process for patching the environment. 
 
-Email Template that needs to be sent out prior to patching and email addresses of individuals who should get the email. 
+Email template to send out prior to patching (or create a calendar event instead), and the email addresses of the individuals who should get the invite.
 ```
-Leonard, Wesley A. <wesley.a.leonard@accenturefederal.com>; Waddle, Duane E. <duane.e.waddle@accenturefederal.com>; Nair, Asha A. <asha.a.nair@accenturefederal.com>; Crawley, Angelita <angelita.crawley@accenturefederal.com>; Rivas, Gregory A. <gregory.a.rivas@accenturefederal.com>; Damstra, Frederick T. <frederick.t.damstra@accenturefederal.com>; Poulton, Brad <brad.poulton@accenturefederal.com>; Williams, Colby <colby.williams@accenturefederal.com>; Naughton, Brandon <brandon.naughton@accenturefederal.com>; Cooper, Jeremy <jeremy.cooper@accenturefederal.com>; Jennings, Kendall <kendall.jennings@accenturefederal.com>; Lohmeyer, Dean <dean.lohmeyer@accenturefederal.com>; xdr.patching@accenturefederal.com
+Leonard, Wesley A. <wesley.a.leonard@accenturefederal.com>; Waddle, Duane E. <duane.e.waddle@accenturefederal.com>; Nair, Asha A. <asha.a.nair@accenturefederal.com>; Crawley, Angelita <angelita.crawley@accenturefederal.com>; Rivas, Gregory A. <gregory.a.rivas@accenturefederal.com>; Damstra, Frederick T. <frederick.t.damstra@accenturefederal.com>; Poulton, Brad <brad.poulton@accenturefederal.com>; Kuykendall, Charles S. <charles.s.kuykendall@accenturefederal.com>; Williams, Colby <colby.williams@accenturefederal.com>; Naughton, Brandon <brandon.naughton@accenturefederal.com>; Cooper, Jeremy <jeremy.cooper@accenturefederal.com>; Jennings, Kendall <kendall.jennings@accenturefederal.com>; Lohmeyer, Dean <dean.lohmeyer@accenturefederal.com>; XDR-Patching <xdr.patching@accenturefederal.com>
 ```
 
 ```
@@ -90,13 +90,13 @@ FYI, patching today.
 Starting with Moose and Internal infra patching within `GC TEST`. Check disk space for potential issues. Return here to start on PROD after TEST is patched. 
 ```
 # Test connectivity between Salt Master and Minions
-salt -C '* not ( afs* or nga* or dc-c19* or la-c19* or bp-ot-demo* or bas-* or doed* or ca-c19* or frtib* or dgi* or threatq* or vmray* )' test.ping --out=txt
+salt -C '* not ( afs* or nga* or dc-c19* or la-c19* or bp-ot-demo* or bas-* or doed* or ca-c19* or frtib* or dgi* or threatq* or vmray* or resolver-vmray* or qcompliance* )' test.ping --out=txt
 
 # Fred's update for df -h - checks for disk utilization at the 80-90% area
-salt -C '* not ( afs* or nga* or dc-c19* or la-c19* or bp-ot-demo* or bas-* or doed* or ca-c19* or frtib* or dgi* or threatq* or vmray* )' cmd.run 'df -h | egrep "[890][0-9]\%"'
+salt -C '* not ( afs* or nga* or dc-c19* or la-c19* or bp-ot-demo* or bas-* or doed* or ca-c19* or frtib* or dgi* or threatq* or vmray* or resolver-vmray* or qcompliance* )' cmd.run 'df -h | egrep "[890][0-9]\%"'
 
 # Review packages that will be updated. some packages are versionlocked (Collectd, Splunk,etc.).
-salt -C '* not ( afs* or nga* or dc-c19* or la-c19* or bp-ot-demo* or bas-* or doed* or ca-c19* or frtib* or dgi* or threatq* or vmray* )' cmd.run 'yum check-update'
+salt -C '* not ( afs* or nga* or dc-c19* or la-c19* or bp-ot-demo* or bas-* or doed* or ca-c19* or frtib* or dgi* or threatq* or vmray* or resolver-vmray* or qcompliance* )' cmd.run 'yum check-update'
 
 #Older commands that are still viable if Fred's one-liner has issues; feel free to skip and move to pkg.upgrade line
 salt -C '* not ( afs* or nga* or dc-c19* or la-c19* or bp-ot-demo* or bas-* or doed* or ca-c19* or frtib* or dgi* or threatq* )' cmd.run 'df -h /boot'  
@@ -111,7 +111,7 @@ salt -C '* not ( afs* or nga* or dc-c19* or la-c19* or bp-ot-demo* or bas-* or d
 
 ### Also, the phantom_repo pkg wants to upgrade, but we are not ready. Let's exclude that and OpenVPN server to prevent errors.
 ```
-salt -C '* not ( afs* or nga* or dc-c19* or la-c19* or bp-ot-demo* or bas-* or doed* or ca-c19* or frtib* or dgi* or threatq* or openvpn* or vmray* or phantom-0* )' pkg.upgrade
+salt -C '* not ( afs* or nga* or dc-c19* or la-c19* or bp-ot-demo* or bas-* or doed* or ca-c19* or frtib* or dgi* or threatq* or openvpn* or vmray* or resolver-vmray* or phantom-0* or qcompliance* )' pkg.upgrade
 
 #update phantom, but exclude the phantom repo. 
 salt -C 'phantom-0*' pkg.upgrade disablerepo='["phantom-base",]'
@@ -134,7 +134,7 @@ salt -C 'openvpn*' pkg.upgrade
 # What about threatq? Ask Duane! It needs special handling. 
 
 # Just to be sure, run it again to make sure nothing got missed. 
-salt -C '* not ( afs* or nga* or dc-c19* or la-c19* or bp-ot-demo* or bas-* or doed* or ca-c19* or frtib* or dgi* or threatq* or vmray* or phantom-0* )' pkg.upgrade
+salt -C '* not ( afs* or nga* or dc-c19* or la-c19* or bp-ot-demo* or bas-* or doed* or ca-c19* or frtib* or dgi* or threatq* or vmray* or resolver-vmray* or phantom-0* or qcompliance* )' pkg.upgrade
 ```
 > :warning: After upgrades check on Portal to make sure it is still up. 
 
@@ -248,15 +248,15 @@ watch "salt -C 'vault-3* or sensu*' test.ping --out=txt"
 
 Reboot majority of servers in `GC Test`.
 ```
-salt -C '*com not ( modelclient-splunk-idx* or moose-splunk-idx* or resolver* or sensu* or threatq-* or vault-3* )' test.ping --out=txt
-date; salt -C '*com not ( modelclient-splunk-idx* or moose-splunk-idx* or resolver* or sensu* or threatq-* or vault-3* )' system.reboot --async
+salt -C '*com not ( modelclient-splunk-idx* or moose-splunk-idx* or resolver* or sensu* or threatq-* or vmray-* or vault-3* )' test.ping --out=txt
+date; salt -C '*com not ( modelclient-splunk-idx* or moose-splunk-idx* or resolver* or sensu* or threatq-* or vmray-* or vault-3* )' system.reboot --async
 ```
 > :warning: 
 ### You will lose connectivity to Openvpn and Salt Master
 ### Log back in and verify they are back up
 
 ```
-watch "salt -C '*com not ( modelclient-splunk-idx* or moose-splunk-idx* or resolver* or sensu* or threatq-* or vault-3* )' cmd.run 'uptime' --out=txt"
+watch "salt -C '*com not ( modelclient-splunk-idx* or moose-splunk-idx* or resolver* or sensu* or threatq-* or vmray-* or vault-3* )' cmd.run 'uptime' --out=txt"
 ```
 
 Take care of the govcloud Resolvers one at a time. The vmray can be combined with one of the govcloud ones. 
@@ -272,7 +272,7 @@ watch "salt -C 'resolver-govcloud-2.pvt.*com' test.ping --out=txt"
 
 Check uptime on the minions in GC to make sure you didn't miss any. 
 ```
-salt -C  '*com not ( modelclient-splunk-idx* or moose-splunk-idx* or threatq-* )' cmd.run 'uptime | grep days'
+salt -C  '*com not ( modelclient-splunk-idx* or moose-splunk-idx* or threatq-* or resolver-vmray-* or vmray-server* )' cmd.run 'uptime | grep days'
 ```
 ### Duane Section (feel free to bypass)
 --
@@ -319,16 +319,16 @@ watch "salt -C 'vault-1*com or sensu*com' test.ping --out=txt"
 
 Reboot majority of servers in GC. 
 ```
-salt -C  '*com not ( afs* or nga* or dc-c19* or la-c19* or dgi-* or moose-splunk-idx* or modelclient-splunk-idx* or bp-ot-demo* or bas-* or doed* or frtib* or ca-c19* or resolver* or vault-1*com or sensu*com or vmray-worker* )' test.ping --out=txt
+salt -C  '*com not ( afs* or nga* or dc-c19* or la-c19* or dgi-* or moose-splunk-idx* or modelclient-splunk-idx* or bp-ot-demo* or bas-* or doed* or frtib* or ca-c19* or resolver* or vault-1*com or sensu*com or qcompliance* or vmray-worker* )' test.ping --out=txt
 
-date; salt -C  '*com not ( afs* or nga* or dc-c19* or la-c19* or dgi-* or moose-splunk-idx* or modelclient-splunk-idx* or bp-ot-demo* or bas-* or doed* or frtib* or ca-c19* or resolver* or vault-1*com or sensu*com or vmray-worker* )' system.reboot --async
+date; salt -C  '*com not ( afs* or nga* or dc-c19* or la-c19* or dgi-* or moose-splunk-idx* or modelclient-splunk-idx* or bp-ot-demo* or bas-* or doed* or frtib* or ca-c19* or resolver* or vault-1*com or sensu*com or qcompliance* or vmray-worker* )' system.reboot --async
 ```
 > :warning:
 ### You will lose connectivity to openvpn and salt master
 ### Log back in and verify they are back up
 
 ```
-watch "salt -C  '*accenturefederalcyber.com not ( afs* or nga* or dc-c19* or la-c19* or dgi-* or moose-splunk-idx* or modelclient-splunk-idx* or bp-ot-demo* or bas-* or doed* or frtib* or ca-c19* or resolver* or vault-1*com or sensu*com )' cmd.run 'uptime' --out=txt"
+watch "salt -C  '*accenturefederalcyber.com not ( afs* or nga* or dc-c19* or la-c19* or dgi-* or moose-splunk-idx* or modelclient-splunk-idx* or bp-ot-demo* or bas-* or doed* or frtib* or ca-c19* or resolver* or vault-1*com or sensu*com or qcompliance* )' cmd.run 'uptime' --out=txt"
 ```
 
 Take care of the resolvers one at a time and with the `GC Prod Salt Master`. Reboot one of each at the same time.
@@ -351,7 +351,7 @@ watch "salt -C 'vmray-worker*com' test.ping --out=txt"
 
 Check uptime on the minions in `GC Prod` to make sure you didn't miss any. 
 ```
-salt -C  '*accenturefederalcyber.com not ( afs* or nga* or dc-c19* or la-c19* or dgi-* or moose-splunk-idx* or modelclient-splunk-idx* or bp-ot-demo* or bas-* or doed* or frtib* or ca-c19* or resolver* or vault-1*com or sensu*com )' cmd.run 'uptime | grep days'
+salt -C  '*accenturefederalcyber.com not ( afs* or nga* or dc-c19* or la-c19* or dgi-* or moose-splunk-idx* or modelclient-splunk-idx* or bp-ot-demo* or bas-* or doed* or frtib* or ca-c19* or resolver* or vault-1*com or sensu*com or qcompliance* )' cmd.run 'uptime | grep days'
 ```
 
 Verify Portal is up: [Portal](https://portal.xdr.accenturefederalcyber.com/)  
@@ -373,14 +373,14 @@ salt 'moose-splunk-idx*' test.ping --out=txt
 # date; salt moose-splunk-idx-63f.pvt.xdrtest.accenturefederalcyber.com system.reboot --async
 
 # Added during Dec 2021 patching
-salt moose-splunk-idx-f6b.pvt.xdrtest.accenturefederalcyber.com test.ping --out=txt
-date; salt moose-splunk-idx-f6b.pvt.xdrtest.accenturefederalcyber.com system.reboot --async
+salt moose-splunk-idx-ad9.pvt.xdrtest.accenturefederalcyber.com test.ping --out=txt
+date; salt moose-splunk-idx-ad9.pvt.xdrtest.accenturefederalcyber.com system.reboot --async
 
 # Indexers take a while to restart
 # watch "salt moose-splunk-idx-63f.pvt.xdrtest.accenturefederalcyber.com cmd.run 'uptime' --out=txt"
 # salt 'moose-splunk-idx-63f.pvt.xdrtest.accenturefederalcyber.com' test.ping --out=txt
 
-watch "salt moose-splunk-idx-f6b.pvt.xdrtest.accenturefederalcyber.com cmd.run 'uptime' --out=txt"
+watch "salt moose-splunk-idx-ad9.pvt.xdrtest.accenturefederalcyber.com cmd.run 'uptime' --out=txt"
 
 ```
 
@@ -393,14 +393,14 @@ salt 'modelclient-splunk-idx*' test.ping --out=txt
 # salt 'modelclient-splunk-idx-a74.pvt.xdrtest.accenturefederalcyber.com' test.ping --out=txt
 # date; salt modelclient-splunk-idx-a74.pvt.xdrtest.accenturefederalcyber.com system.reboot --async
 
-salt 'modelclient-splunk-idx-498.pvt.xdrtest.accenturefederalcyber.com' test.ping --out=txt
-date; salt modelclient-splunk-idx-498.pvt.xdrtest.accenturefederalcyber.com system.reboot --async
+salt 'modelclient-splunk-idx-822.pvt.xdrtest.accenturefederalcyber.com' test.ping --out=txt
+date; salt modelclient-splunk-idx-822.pvt.xdrtest.accenturefederalcyber.com system.reboot --async
 
 # Indexers take a while to restart
 #watch "salt modelclient-splunk-idx-a74.pvt.xdrtest.accenturefederalcyber.com cmd.run 'uptime' --out=txt"
 #salt 'modelclient-splunk-idx-a74.pvt.xdrtest.accenturefederalcyber.com' test.ping --out=txt
 
-watch "salt modelclient-splunk-idx-498.pvt.xdrtest.accenturefederalcyber.com cmd.run 'uptime' --out=txt"
+watch "salt modelclient-splunk-idx-822.pvt.xdrtest.accenturefederalcyber.com cmd.run 'uptime' --out=txt"
 
 ```
 #### WAIT FOR SPLUNK CLUSTER TO HAVE 3 CHECKMARKS
@@ -413,43 +413,43 @@ Repeat the above patching steps for the additional indexers, waiting for `3 gree
 # date; salt moose-splunk-idx-d4f.pvt.xdrtest.accenturefederalcyber.com system.reboot --async
 
 # Added during Dec 2021 patching
-salt moose-splunk-idx-7d5.pvt.xdrtest.accenturefederalcyber.com test.ping --out=txt
-date; salt moose-splunk-idx-7d5.pvt.xdrtest.accenturefederalcyber.com system.reboot --async
+salt moose-splunk-idx-b22.pvt.xdrtest.accenturefederalcyber.com test.ping --out=txt
+date; salt moose-splunk-idx-b22.pvt.xdrtest.accenturefederalcyber.com system.reboot --async
 
 # Indexers take a while to restart
 # watch "salt moose-splunk-idx-d4f.pvt.xdrtest.accenturefederalcyber.com cmd.run 'uptime' --out=txt"
-watch "salt moose-splunk-idx-7d5.pvt.xdrtest.accenturefederalcyber.com cmd.run 'uptime' --out=txt"
+watch "salt moose-splunk-idx-b22.pvt.xdrtest.accenturefederalcyber.com cmd.run 'uptime' --out=txt"
 ```
 
 ```
 # Do the second Modelclient indexer
 # salt 'modelclient-splunk-idx-c9f.pvt.xdrtest.accenturefederalcyber.com' test.ping --out=txt
 # date; salt modelclient-splunk-idx-c9f.pvt.xdrtest.accenturefederalcyber.com system.reboot --async
-salt 'modelclient-splunk-idx-561.pvt.xdrtest.accenturefederalcyber.com' test.ping --out=txt
-date; salt modelclient-splunk-idx-561.pvt.xdrtest.accenturefederalcyber.com system.reboot --async
+salt 'modelclient-splunk-idx-f28.pvt.xdrtest.accenturefederalcyber.com' test.ping --out=txt
+date; salt modelclient-splunk-idx-f28.pvt.xdrtest.accenturefederalcyber.com system.reboot --async
 
 # Indexers take a while to restart
 #watch "salt modelclient-splunk-idx-c9f.pvt.xdrtest.accenturefederalcyber.com cmd.run 'uptime' --out=txt"
-watch "salt modelclient-splunk-idx-561.pvt.xdrtest.accenturefederalcyber.com cmd.run 'uptime' --out=txt"
+watch "salt modelclient-splunk-idx-f28.pvt.xdrtest.accenturefederalcyber.com cmd.run 'uptime' --out=txt"
 ```
 
 ```
 # Do the third Moose indexer
-salt moose-splunk-idx-273.pvt.xdrtest.accenturefederalcyber.com test.ping --out=txt
-date; salt moose-splunk-idx-273.pvt.xdrtest.accenturefederalcyber.com system.reboot --async
+salt moose-splunk-idx-568.pvt.xdrtest.accenturefederalcyber.com test.ping --out=txt
+date; salt moose-splunk-idx-568.pvt.xdrtest.accenturefederalcyber.com system.reboot --async
 
 # Indexers take a while to restart
-watch "salt moose-splunk-idx-273.pvt.xdrtest.accenturefederalcyber.com cmd.run 'uptime' --out=txt"
+watch "salt moose-splunk-idx-568.pvt.xdrtest.accenturefederalcyber.com cmd.run 'uptime' --out=txt"
 
 # Do the third Modelclient indexer
 # salt 'modelclient-splunk-idx-a2a.pvt.xdrtest.accenturefederalcyber.com' test.ping --out=txt
 # date; salt modelclient-splunk-idx-a2a.pvt.xdrtest.accenturefederalcyber.com system.reboot
-salt 'modelclient-splunk-idx-bb5.pvt.xdrtest.accenturefederalcyber.com' test.ping --out=txt
-date; salt modelclient-splunk-idx-bb5.pvt.xdrtest.accenturefederalcyber.com system.reboot --async
+salt 'modelclient-splunk-idx-820.pvt.xdrtest.accenturefederalcyber.com' test.ping --out=txt
+date; salt modelclient-splunk-idx-820.pvt.xdrtest.accenturefederalcyber.com system.reboot --async
 
 # Indexers take a while to restart
 # watch "salt modelclient-splunk-idx-a2a.pvt.xdrtest.accenturefederalcyber.com cmd.run 'uptime' --out=txt"
-watch "salt modelclient-splunk-idx-bb5.pvt.xdrtest.accenturefederalcyber.com cmd.run 'uptime' --out=txt"
+watch "salt modelclient-splunk-idx-820.pvt.xdrtest.accenturefederalcyber.com cmd.run 'uptime' --out=txt"
 ```
 ```
 # Verify all indexers on Moose and Modelclient have been patched:

+ 42 - 0
Splunk App Distribution.md

@@ -0,0 +1,42 @@
+# Splunk App Distribution
+
+Or, "How do apps get put onto Splunk servers?"
+
+## Summary
+
+There are a few methods:
+* Salt from the `msoc-infrastructure` repository
+* Salt from customer-specific repositories such as `msoc-moose-cm`
+* The `splunk-app-updater` script
+* Manually
+
+## The splunk-app-updater script
+
+Each customer account has an S3 bucket for Splunk apps, named in the format `xdr-modelclient-test-splunk-apps`.
+This bucket has folders for each server function:
+
+* sh-es - The ES SH
+* idx - Indexers
+* etc.
+
+Inside these folders are subfolders per source. For example, content from the content team's `content_source` repository is placed into `/sh-es/content_source/`.
+
+The script `splunk-app-updater` runs on a cron schedule (10am ET Mon-Thur) and downloads all files for the server's purpose from
+that folder.
+
+If any packages have changed since the last install, it installs the app using the Splunk command line (using the `--update` option).
+
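The "install only if changed" step can be sketched as a checksum gate. Everything here (the function name, the stamp-file convention, the commented install command) is illustrative rather than the actual updater script:

```shell
# Install a Splunk app package only when its checksum differs from the last install.
install_if_changed() {
  local pkg="$1"
  local stamp="$pkg.installed" sum
  sum=$(sha256sum "$pkg" | cut -d' ' -f1)
  if [ "$sum" != "$(cat "$stamp" 2>/dev/null)" ]; then
    echo "installing $pkg"
    # /opt/splunk/bin/splunk install app "$pkg" -update 1   # real install step
    echo "$sum" > "$stamp"
  fi
}
```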
+### But how do they get into the bucket?
+
+Any method can be used to place apps into the bucket. If it's in the bucket, it will be downloaded and installed.
+
+The primary way they get into the bucket is through the CodeBuild scripts in each customer account. This CodeBuild script
+downloads the source from https://github.xdr.accenturefederalcyber.com/content-delivery/content_source and builds the apps
+based on the particular tags.
+
+### But how does it build?
+
+The apps are built using a container image that resides in common services. This container is built using the `Dockerfile.codebuild` file in
+https://github.xdr.accenturefederalcyber.com/content-delivery/content_generator
+
+The container is built and placed into ECR by a job that runs in the common services account.

+ 25 - 0
Terragrunt Notes.md

@@ -133,3 +133,28 @@ With tf14, terraform has added the creation of a 'provider state lock file' to p
* If you need an extra provider, you should override the generation of `required_providers.tf` in your `terragrunt.hcl` file for the module. This must include the modules from the root `terragrunt.hcl` that are used within your module. For an example, see `xdr-terraform-live/common/aws-us-gov/afs-mdr-common-services-gov/085-codebuild-ecr-customer-portal/terragrunt.hcl`
* To regenerate or upgrade pinned providers, delete the lock file (or run `init` with `-upgrade`) and re-initialize.
* There are possible compatibility issues with `TF_PLUGIN_CACHE_DIR`. You can try disabling it if you have trouble getting hashes.
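The `required_providers.tf` override mentioned above can be done with a Terragrunt `generate` block. A hypothetical sketch; the provider set and versions are examples only, see the referenced `terragrunt.hcl` for the real one:

```
# Illustrative only: pin an extra provider for this module.
generate "required_providers" {
  path      = "required_providers.tf"
  if_exists = "overwrite"
  contents  = <<EOF
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
  }
}
EOF
}
```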
+
+## Could not load plugin
+
+If you get:
+```
+Substituting 'git@github.xdr.accenturefederalcyber.com:mdr-engineering/xdr-terraform-modules.git//base/sensu-configuration' with '../../../../../xdr-terraform-modules//base/sensu-configuration'
+Acquiring state lock. This may take a few moments...
+Releasing state lock. This may take a few moments...
+╷
+│ Error: Could not load plugin
+│
+│
+│ Plugin reinitialization required. Please run "terraform init".
+│
+│ Plugins are external binaries that Terraform uses to access and manipulate
+│ resources. The configuration provided requires plugins which can't be
+│ located,
+│ don't satisfy the version constraints, or are otherwise incompatible.
+
+```
+It means that you've run the module before with an earlier version of the plugin. To fix, run:
+```
+terragrunt init --upgrade
+```
(Or `terragrunt-local`, if appropriate.)