
Merge branch 'master' of github.xdr.accenturefederalcyber.com:mdr-engineering/infrastructure-notes

Brad Poulton 3 years ago
parent
commit
296b2cd345
2 changed files with 42 additions and 24 deletions
  1. ALSI (Cribl LogStream) Notes.md (+19 -7)
  2. Patching Notes.md (+23 -17)

+ 19 - 7
ALSI (Cribl LogStream) Notes.md

@@ -1,10 +1,14 @@
 
-# WORK IN PROGRESS
-
 # Aggregated Log Source Ingestion a.k.a. ALSI (Cribl LogStream) Notes
 
+# THIS IS A WORK IN PROGRESS
+
+ℹ️ The following guide should get you, dear reader, 90% there. Cribl does not lend itself well to orchestration, so the Salt states may fail. They worked once upon a time, and efforts have been made to keep them working, but it may be faster to stand up Cribl manually; the choice is yours.
+
 ## Create Okta (OIDC) Application
 
+⚠️ This step can be skipped if there is no Enterprise license available.
+
 Follow the [instructions from docs.cribl.io](https://docs.cribl.io/stream/usecase-sso-okta/) to create the Okta application. 
 
 Prerequisites:
@@ -47,19 +51,21 @@ Replace all encrypted values except the `admin_password` with the appropriate GP
 cribl:
   privatekey_path: "/opt/cribl/pki/privatekey.pem"
   certificate_path: "/opt/cribl/pki/cert.pem"
-  license:
-  hec_token:
+  # [] represents the free license distributed with Cribl.
+  # The free license does not permit SSO.
+  license: []
+  hec_token: ~
   admin_password: |
     -----BEGIN PGP MESSAGE-----
 
     Value removed
     -----END PGP MESSAGE-----
-  okta_client_id:
-  okta_client_secret:
+  okta_client_id: ~
+  okta_client_secret: ~
 {% endif %} {# If alsi #}
 ```
 
-> :information_source: See the GnuPG (gpg) Notes document for instructions on how to GPG-encrypt the various values.
+:information_source: See the GnuPG (gpg) Notes document for instructions on how to GPG-encrypt the various values.
 
 ## Creating the Cribl Infrastructure for a Customer
 
@@ -84,3 +90,9 @@ To create one or more worker nodes along with the leader, modify `xdr-terraform-
 ```
 
 If the customer requires public ELBs for HEC, enable those in `config.tf` as well.
+
+## Enable Indexer Discovery for Cribl
+
+Log into the customer's Cluster Manager (CM) and create an authentication token for the admin user _with no expiration date_ (or starting date). Copy the token it creates _*before*_ clicking the Close button.
+
+Log into the customer's Cribl Leader as the admin user and configure a Stream destination to use a [Splunk Load Balanced destination](https://docs.cribl.io/stream/destinations-splunk-lb) with indexer discovery enabled and provide the token. Be sure to enable TLS, otherwise Cribl will complain about reset connections.

+ 23 - 17
Patching Notes.md

@@ -102,13 +102,13 @@ FYI, patching today.
 Starting with Moose and Internal infra patching within `GC TEST`. Check disk space for potential issues. Return here to start on PROD after TEST is patched. 
 ```
 # Test connectivity between Salt Master and Minions
-salt -C '* not ( afs* or nga* or doed* or dc-c19* or la-c19* or bas-* or ca-c19* or frtib* or dgi* or threatq* or vmray* )' test.ping --out=txt
+salt -C '* not ( afs* or nga* or doed* or dc-c19* or la-c19* or bas-* or ca-c19* or frtib* or dgi* or vmray* )' test.ping --out=txt
 
 # Fred's update for df -h - flags filesystems at 80% utilization or higher
-salt -C '* not ( afs* or nga* or doed* or dc-c19* or la-c19* or bas-* or ca-c19* or frtib* or dgi* or threatq* or vmray* )' cmd.run 'df -h | egrep "[890][0-9]\%"'
+salt -C '* not ( afs* or nga* or doed* or dc-c19* or la-c19* or bas-* or ca-c19* or frtib* or dgi* or vmray* )' cmd.run 'df -h | egrep "[890][0-9]\%"'
 
 # Review packages that will be updated. Some packages are versionlocked (Collectd, Splunk, Teleport, etc.).
-salt -C '* not ( afs* or nga* or doed* or dc-c19* or la-c19* or bas-* or ca-c19* or frtib* or dgi* or threatq* or vmray* )' cmd.run 'yum check-update'
+salt -C '* not ( afs* or nga* or doed* or dc-c19* or la-c19* or bas-* or ca-c19* or frtib* or dgi* or vmray* )' cmd.run 'yum check-update'
 ```
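
To see which `df -h` rows the `egrep` filter above actually keeps, here is a minimal sketch run against made-up sample output. The device names and sizes are hypothetical, and `grep -E` stands in for the equivalent `egrep`. The pattern needs two characters before the `%`, so it keeps 80-99% and also 100% (through its trailing `00%`), and drops everything below 80%.

```shell
# Hypothetical df -h rows; only the Use% column matters here.
sample_df() {
  cat <<'EOF'
/dev/xvda1       20G   17G  3.0G  85% /
/dev/xvdb1       50G   50G     0 100% /opt
/dev/xvdc1       50G  5.0G   45G  10% /var
tmpfs           1.0G     0  1.0G   0% /dev/shm
EOF
}

# Same character classes as the egrep in the patching notes.
high_usage() {
  sample_df | grep -E '[890][0-9]%'
}

# Prints the rows for / and /opt only.
high_usage
```
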
 
 <!-- ```
@@ -122,7 +122,7 @@ salt -C '* not ( afs* or nga* or doed* or dc-c19* or la-c19* or bas-* or ca-c19*
 
 ### Also, the `phantom_repo` pkg wants to upgrade, but we are not ready. Let's exclude that.
 ```
-date; salt -C '* not ( afs* or nga* or doed* or dc-c19* or la-c19* or bas-* or ca-c19* or frtib* or dgi* or threatq* or vmray* or phantom-0* )' pkg.upgrade
+date; salt -C '* not ( afs* or nga* or doed* or dc-c19* or la-c19* or bas-* or ca-c19* or frtib* or dgi* or vmray* or phantom-0* )' pkg.upgrade
 
 # update phantom, but exclude the phantom repo. 
 date; salt -C 'phantom-0*' pkg.upgrade disablerepo='["phantom-base",]'
@@ -156,7 +156,7 @@ salt vmray* cmd.run 'systemctl start vmray-server vmray-worker'
 
 ### Run it again to make sure nothing got missed. 
 ```
-salt -C '* not ( afs* or nga* or doed* or dc-c19* or la-c19* or bas-* or ca-c19* or frtib* or dgi* or threatq* or vmray* or phantom-0* )' pkg.upgrade
+salt -C '* not ( afs* or nga* or doed* or dc-c19* or la-c19* or bas-* or ca-c19* or frtib* or dgi* or vmray* or phantom-0* )' pkg.upgrade
 ```
 ---
 
@@ -279,15 +279,15 @@ watch "salt -C 'vault-3* or sensu*' test.ping --out=txt"
 
 Reboot majority of servers in `GC Test`.
 ```
-salt -C '*com not ( modelclient-splunk-idx* or moose-splunk-idx* or resolver* or sensu* or threatq-* or vmray-* or vault-3* or rhsso-0* )' test.ping --out=txt
-date; salt -C '*com not ( modelclient-splunk-idx* or moose-splunk-idx* or resolver* or sensu* or threatq-* or vmray-* or vault-3* or rhsso-0* )' system.reboot --async
+salt -C '*com not ( modelclient-splunk-idx* or moose-splunk-idx* or resolver* or sensu* or vmray-* or vault-3* or rhsso-0* )' test.ping --out=txt
+date; salt -C '*com not ( modelclient-splunk-idx* or moose-splunk-idx* or resolver* or sensu* or vmray-* or vault-3* or rhsso-0* )' system.reboot --async
 ```
 > :warning: 
-### You will lose connectivity to Salt Master
+### You will lose connectivity to Teleport and Salt Master
 ### Log back in and verify they are back up
 
 ```
-watch "salt -C '*com not ( modelclient-splunk-idx* or moose-splunk-idx* or resolver* or sensu* or threatq-* or vmray-* or vault-3* or rhsso-0* )' cmd.run 'uptime' --out=txt"
+watch "salt -C '*com not ( modelclient-splunk-idx* or moose-splunk-idx* or resolver* or sensu* or vmray-* or vault-3* or rhsso-0* )' cmd.run 'uptime' --out=txt"
 ```
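
While the `watch` loop above runs, it can help to list only the minions that have not rebooted yet. The sketch below assumes Salt's `--out=txt` format of one `minion-id: output` line per minion and filters made-up `uptime` output (the host names are hypothetical) for uptimes still counted in days, the same idea as the `uptime | grep days` check used for the indexers.

```shell
# Hypothetical `salt ... cmd.run 'uptime' --out=txt` output.
sample_uptime() {
  cat <<'EOF'
host-a.example.com:  10:02:11 up 3 min,  1 user,  load average: 0.50, 0.30, 0.10
host-b.example.com:  10:02:12 up 12 days,  3:04,  0 users,  load average: 0.01, 0.02, 0.05
host-c.example.com:  10:02:12 up 1 day,  0:10,  0 users,  load average: 0.00, 0.01, 0.05
EOF
}

# Print the minion IDs whose uptime is still counted in days.
not_rebooted() {
  grep -E 'up [0-9]+ days?,' | cut -d: -f1
}

sample_uptime | not_rebooted
```

In practice, the real `salt ... cmd.run 'uptime' --out=txt` output would be piped into `not_rebooted` in place of `sample_uptime`.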
 
 Take care of the govcloud Resolvers one at a time. The vmray can be combined with one of the govcloud ones. 
@@ -327,7 +327,7 @@ FYI, patching today. Rebooting PROD
 * Then this afternoon, reboots of those PoPs/LCPs.
 ```
 
-SSH via TSH into `GC Salt-Master` to reboot servers in GC that are on `gc-prod`.
+SSH via TSH into `GC Salt-Master` to reboot servers in GC that are on `GC Prod`.
 
 ```
 # Login to Teleport
@@ -335,9 +335,10 @@ tsh --proxy=teleport.xdr.accenturefederalcyber.com login
 
 # SSH to GC Salt-Master (PROD)
 tsh ssh node=salt-master
-
 ```
 
+> :warning: Don't forget to silence Sensu! Be sure to post the Jurassic Park "Hold on to your butts" meme into the xdr-soc channel before restarting Prod. `/giphy hold on to your butts`
+
 Start with `Vault` and `Sensu`
 ```
 # Vault-1 and Sensu
@@ -346,20 +347,18 @@ date; salt -C 'vault-1*com or sensu*com' system.reboot --async
 watch "salt -C 'vault-1*com or sensu*com' test.ping --out=txt"
 ```
 
-> :warning: Don't forget to silence Sensu! Be sure to post the Jurrasic Park, "Hold on to your butts" Meme into the xdr-soc channel before restarting Prod. `/giphy hold on to your butts`
-
-Reboot majority of servers in GC. 
+Reboot majority of servers. 
 ```
-salt -C  '*com not ( afs* or nga* or doed* or dc-c19* or la-c19* or dgi-* or moose-splunk-idx* or modelclient-splunk-idx* or bas-* or frtib* or ca-c19* or resolver* or vault-1*com or sensu*com or vmray-worker* )' test.ping --out=txt
+salt -C  '*com not ( afs* or nga* or doed* or dc-c19* or la-c19* or dgi-* or moose-splunk-idx* or modelclient-splunk-idx* or bas-* or frtib* or ca-c19* or resolver* or vault-1*com or sensu*com or vmray-* )' test.ping --out=txt
 
-date; salt -C  '*com not ( afs* or nga* or doed* or dc-c19* or la-c19* or dgi-* or moose-splunk-idx* or modelclient-splunk-idx* or bas-* or frtib* or ca-c19* or resolver* or vault-1*com or sensu*com or vmray-worker* )' system.reboot --async
+date; salt -C  '*com not ( afs* or nga* or doed* or dc-c19* or la-c19* or dgi-* or moose-splunk-idx* or modelclient-splunk-idx* or bas-* or frtib* or ca-c19* or resolver* or vault-1*com or sensu*com or vmray-* )' system.reboot --async
 ```
 > :warning:
 ### You will lose connectivity to Salt master
 ### Log back in and verify they are back up
 
 ```
-watch "salt -C  '*accenturefederalcyber.com not ( afs* or nga* or doed* or dc-c19* or la-c19* or dgi-* or moose-splunk-idx* or modelclient-splunk-idx* or bas-* or frtib* or ca-c19* or resolver* or vault-1*com or sensu*com or vmray-worker* )' cmd.run 'uptime' --out=txt"
+watch "salt -C  '*accenturefederalcyber.com not ( afs* or nga* or doed* or dc-c19* or la-c19* or dgi-* or moose-splunk-idx* or modelclient-splunk-idx* or bas-* or frtib* or ca-c19* or resolver* or vault-1*com or sensu*com or vmray-* )' cmd.run 'uptime' --out=txt"
 ```
 
 ### Vault Service likes to crap out after reboot; verify the service is back up
@@ -536,6 +535,10 @@ watch "salt moose-splunk-idx-7fa.pvt.xdr.accenturefederalcyber.com cmd.run 'upti
 
 # Verify all indexers rebooted:
 salt 'moose-splunk-idx*' cmd.run 'uptime | grep days'
+
+# Verify Splunk is active on all indexers
+salt 'moose-splunk-idx*' cmd.run 'systemctl status splunk | grep Active'
+
 ```
 
 #### Troubleshooting
@@ -906,6 +909,9 @@ salt -C 'afs*local or afs*com or bas-*com or ca-c19*com or dc*com or dgi*com or
 # SKIP this one as long as Fred's kung fu works
 salt -C 'afs*local or afs*com or bas-*com or ca-c19*com or dc*com or dgi*com or doed*com or frtib*com or la-*com or nga*com or nga*local' cmd.run 'df -h'
 
+# Check for upgrades
+salt -C 'afs*local or afs*com or bas-*com or ca-c19*com or dc*com or dgi*com or doed*com or frtib*com or la-*com or nga*com or nga*local' cmd.run 'yum check-update'
+
 # Upgrade the Packages
 salt -C 'afs*local or afs*com or bas-*com or ca-c19*com or dc*com or dgi*com or doed*com or frtib*com or la-*com or nga*com or nga*local' pkg.upgrade