Merge branch 'master' of github.xdr.accenturefederalcyber.com:mdr-engineering/infrastructure-notes

Brad Poulton 3 years ago
parent
commit
1d86ecf832
3 changed files with 302 additions and 15 deletions
  1. 86 0
      ALSI (Cribl LogStream) Notes.md
  2. 49 10
      Jira Notes.md
  3. 167 5
      ThreatQ Notes.md

+ 86 - 0
ALSI (Cribl LogStream) Notes.md

@@ -0,0 +1,86 @@
+
+# WORK IN PROGRESS
+
+# Aggregated Log Source Ingestion a.k.a. ALSI (Cribl LogStream) Notes
+
+## Create Okta (OIDC) Application
+
+Follow the [instructions from docs.cribl.io](https://docs.cribl.io/stream/usecase-sso-okta/) to create the Okta application. 
+
+Prerequisites:
+* The Leader/Master ALB URL for the Web UI
+
+You may be thinking, "What is the URL? I have not created it yet." and you are correct. Fortunately, we have a standard naming convention so it should be simple to enter the correct value (the value doesn't have to be correct right now, for what it is worth).
+
+Example: `https://<customer>-alsi.pvt.accenturefederalcyber.com`
+
+The base URL will be required in two places with URL suffixes found in the documentation. You can also compare against existing Cribl Stream Okta applications.
+
+Assigning roles/users/etc. is not required at this point. We add the app to Okta first in order to get two strings to add to Salt.
+
+Copy the Okta client ID and client secret. You will add them to the customer's pillar variables file as described below.
+
+### Add Cribl Pillars to Customer Variables SLS
+
+#### Add the `mdr_wildcard_cert` pillars to Salt's `pillar/top.sls` for the ALSI instances.
+
+```yaml
+# pillar/top.sls
+---
+### Find the customer's section and add
+'<customer>-alsi-*':
+  - mdr_wildcard_cert
+### etc. etc.
+...
+```
+
+#### Enable GPG
+
+Add `#!jinja|yaml|gpg` to the top of the customer variables SLS file if it is not present.
+
+Copy the `cribl` pillars from an existing customer such as Moose (`moose_variables.sls`) to the customer's variables file.
+
+Replace all encrypted values except the `admin_password` with the appropriate GPG-encrypted blocks.
+
+```saltstack
+{% if grains['id'].startswith('bas-alsi-') %}
+cribl:
+  privatekey_path: "/opt/cribl/pki/privatekey.pem"
+  certificate_path: "/opt/cribl/pki/cert.pem"
+  license:
+  hec_token:
+  admin_password: |
+    -----BEGIN PGP MESSAGE-----
+
+    Value removed
+    -----END PGP MESSAGE-----
+  okta_client_id:
+  okta_client_secret:
+{% endif %} {# If alsi #}
+```
+
+> :information_source: See the GnuPG (gpg) Notes document for instructions on how to GPG-encrypt the various values.
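
A minimal sketch, not part of the committed notes, of the "add `#!jinja|yaml|gpg` if it is not present" step above. The function name `ensure_gpg_renderer` is hypothetical. It assumes the renderer line must be the very first line of the file, and it leaves a file that already starts with that line untouched.

```shell
# Hypothetical helper: prepend the renderer line to a pillar file
# unless it is already the first line of that file.
ensure_gpg_renderer() {
  local file="$1"
  local shebang='#!jinja|yaml|gpg'

  # Nothing to do when the file already starts with the renderer line.
  if [ "$(head -n 1 "$file")" = "$shebang" ]; then
    return 0
  fi

  # Build the new content in a temp file, then write it back in place
  # so the original file keeps its ownership and permissions.
  local tmp
  tmp="$(mktemp)"
  { printf '%s\n' "$shebang"; cat "$file"; } > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Pass the path to the customer's variables SLS file as the only argument. Running it twice is harmless because the second call finds the line already in place.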
+
+## Creating the Cribl Infrastructure for a Customer
+
+Copy the `175-splunk-alsi` directory from an existing customer (or test Moose) to the customer's directory in the `xdr-terraform-live` repository.
+
+```shell
+cp -a ~/xdr-terraform-live/prod/aws-us-gov/mdr-prod-bas/175-splunk-alsi ~/xdr-terraform-live/prod/aws-us-gov/mdr-prod-<customer>/
+```
+
+### Create Worker nodes
+
+To create one or more worker nodes along with the leader, modify `xdr-terraform-modules/base/splunk_servers/alsi/config.tf` and set the number of workers as an exception, using the account name found in `account.hcl`.
+
+```hcl
+  # If cribl is being used for log ingestion, remember to turn on splunk_private_hec
+  # in `splunk_servers/indexer_cluster/config.tf`, too.
+  alsi_workers_default = 0 # how many cribl workers
+  alsi_workers_exceptions = {
+    afs-mdr-test-c2-gov = 2,
+    mdr-prod-bas        = 2,
+  }
+```
+
+If the customer requires public ELBs for HEC, enable those in `config.tf` as well.
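
A sketch, not part of the committed notes, of how a default-plus-exceptions setting like the one above is usually resolved: an account listed under `alsi_workers_exceptions` gets its own count, and every other account falls back to `alsi_workers_default`. That resolution rule is an assumption about the module, and the shell function `alsi_worker_count` is a hypothetical stand-in that only mirrors the values shown above.

```shell
# Hypothetical illustration of "exception if listed, else default".
# The account names and counts are copied from the HCL snippet above.
alsi_worker_count() {
  case "$1" in
    afs-mdr-test-c2-gov|mdr-prod-bas) echo 2 ;;  # alsi_workers_exceptions
    *) echo 0 ;;                                 # alsi_workers_default
  esac
}
```

Under that assumption, a new customer account gets no workers until its account name is added to the exceptions map.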

+ 49 - 10
Jira Notes.md

@@ -93,7 +93,7 @@ SAML magic is stored.
 
 # Load Balancer Stuff
 
-There's stuff in web.xml that tells it that it's in front of a load balancer.  The 
+There's stuff in `web.xml` that tells it that it's in front of a load balancer.  The 
 proxyName and proxyPort settings matter, because they will cause redirects when
 you connect to the wrong name.  Note that in the current config, the load balancer 
 terminates TLS and sends plain HTTP back to JIRA itself.
@@ -465,40 +465,79 @@ sudo /etc/rc.d/init.d/jira start
 
 # Upgrading
 
-A salt state can be used for upgrading. See `msoc-infrastructure/salt/fileroots/jira/`.
+A salt state can be used for upgrading. See `msoc-infrastructure/salt/fileroots/jira/README.md`.
+
+## Prepare
+
+1. Go to jira, click on the Options->Applications option.
+1. Click on "Plan your upgrade"
+1. Review anything that looks dangerous.
 
 ## Staging the Update
 
 The state will install the update into a new directory (/opt/atlassian/jira-{version}), but will not stop/start it. You should be able to safely run this at any time (still, follow best practices: make backups first, run salt with `test=true`).
 
-1. download the latest LTS from https://www.atlassian.com/software/jira/download-journey. Download in .tar.gz
+1. Download the latest LTS from [Jira Software](https://www.atlassian.com/software/jira/download-journey). Download in .tar.gz
+  a. upgrade to latest release
+  b. Server
+  c. Latest Release
+  d. tar.gz archive
 1. Copy to afsxdr-binaries:
 ```
-aws --profile mdr-common-services-gov s3 cp atlassian-jira-software-8.13.10.tar.gz s3://afsxdr-binaries/jira/atlassian-jira-software-8.13.10.tar.gz
+aws --profile mdr-common-services-gov s3 cp atlassian-jira-software-8.13.10.tar.gz s3://afsxdr-binaries/jira/
 ```
-1. update init.sls with the latest version
+1. update `msoc-infrastructure/salt/fileroots/jira/init.sls` with the latest version
 1. Optionally, update the `okta_jar` using the same steps as above (download from okta admin, downloads).
-1. Run the state.
+1. Optionally, update the `xmlsec` jar using the same steps as above (download from url in `init.sls`).
+1. Do a PR to develop.
+1. Run the state in test, follow cutover below.
+  `salt jira\* state.sls jira --output-diff test=true` # not useful; will show lots of errors, since it doesn't download the update
+  `salt jira\* state.sls jira --output-diff test=false`
+1. Test in test ( https://jira.xdrtest.accenturefederalcyber.com/ ; Configuration is incomplete -- can't even log in )
+1. PR to master
+1. Apply, cutover, and test in prod
 
 ## Cutover
 
-This file does not change to the new version.
+The salt state does not change to the new version.
 
 To cut over:
 
-1. Judiciously remove older files in /opt/jira-data/jira/export
+1. Judiciously remove older files in `/opt/jira-data/jira/export`
 1. Create backups of jira and jira-data
+```
+cd /opt/
+sudo tar cvzf jira-backup.20220818.tar.gz atlassian jira-data
+```
 1. Run `sudo systemctl stop jira`
 1. Edit `/etc/init.d/jira`
 1. Change the path to the updated jira version
 1. `sudo systemctl daemon-reload`
 1. Run `sudo systemctl start jira && sudo tail -F /opt/atlassian/jira-VERSION/logs/catalina.out`
+1. If it warns "Upgrade: Custom changes have not been carried over", click 'ignore and continue'. If it returns to the same screen, stop and start jira. The salt state carried over these changes.
+```
+sudo systemctl restart jira && sudo tail -F /opt/atlassian/jira-9.1.0/logs/catalina.out
+```
+1. Test jira may redirect you to prod jira. Watch your URL!
+
+## Cleanup
+
+Once everything is running, there are a few things left:
+1. Clean out the old directory to remove any warnings.
+```
+cd /opt/atlassian
+rm -rf jira-9.0.0 # or whatever previous version was
+```
+1. Go to the web interface, to the gear (settings)->manage apps
+1. click on 'manage applications' from the new menu
+1. click on 'update' on each application that has an update
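
The backup and log-tail commands in the cutover steps hard-code a date (`20220818`) and a version (`9.1.0`). A small sketch, not part of the committed notes, that derives both names instead. The function names `jira_backup_name` and `jira_log_path` are hypothetical; the file name pattern and log path are taken from the commands above.

```shell
# Hypothetical helpers for the cutover steps.

# Backup archive name; uses today's date unless a YYYYMMDD stamp is given.
jira_backup_name() {
  local stamp="${1:-$(date +%Y%m%d)}"
  printf 'jira-backup.%s.tar.gz\n' "$stamp"
}

# Path to catalina.out for a given Jira version directory.
jira_log_path() {
  printf '/opt/atlassian/jira-%s/logs/catalina.out\n' "$1"
}

# Example use during cutover (not executed here):
#   cd /opt/ && sudo tar cvzf "$(jira_backup_name)" atlassian jira-data
#   sudo systemctl restart jira && sudo tail -F "$(jira_log_path 9.1.0)"
```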
+
 
-You may have to restart jira a second time if it warns about files that weren't updated (they really were!).
+...
 
 ## Java Updates
 
-No support in this state yet for updating/installing the JRE. See `infrasstructure-notes/Jira\ Notes.md` for detailed steps, or quick reference:
+No support in this state yet for updating/installing the JRE. See `infrastructure-notes/Jira\ Notes.md` for detailed steps, or quick reference:
 
 ## Error after Upgrading to 8.22.2
 

+ 167 - 5
ThreatQ Notes.md

@@ -1,5 +1,38 @@
 # ThreatQ Notes
 
+## TQ HelpCenter access
+
+https://helpcenter.threatq.com/welcome.htm
+
+## TQ Passwords:
+
+TQ HelpCenter: https://vault.pvt.xdr.accenturefederalcyber.com/ui/vault/secrets/engineering/show/threatq/helpcenter
+Admin Account: https://vault.pvt.xdr.accenturefederalcyber.com/ui/vault/secrets/engineering/show/threatq/admin
+MySQL root account: https://vault.pvt.xdr.accenturefederalcyber.com/ui/vault/secrets/engineering/show/threatq/mysql_admin
+
+
+## Operational things
+
+### Maintenance mode
+
+
+TQ has a "maintenance mode" in which the app won't let anyone log in.
+
+To enable maintenance mode:
+
+```
+php /var/www/api/artisan down
+```
+
+Hardcore maintenance is to also stop apache httpd.  Use systemctl.
+
+To go out of maint:
+
+```
+php /var/www/api/artisan up
+```
+
+
 ## Compilation Notes
 
 ### Packer things
@@ -10,6 +43,12 @@ We have a specific packer build for the TQ AMI.  It's (mostly) unlike any other
 
 ### Firstboot
 
+First thing - make sure all the filesystems are grown.  Somehow we miss this?
+
+```
+mount | awk '$5 == "xfs" { print $3 }' | xargs -n 1 xfs_growfs
+```
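
To preview which mount points that pipeline would pass to `xfs_growfs` without growing anything, the `awk` filter can be run on its own. A sketch, not part of the committed notes; the function name `xfs_mountpoints` is hypothetical, and it assumes the Linux `mount` output format `<device> on <mountpoint> type <fstype> (<options>)`.

```shell
# Hypothetical dry-run helper: read `mount` output on stdin and print
# only the mount points whose filesystem type is xfs.
xfs_mountpoints() {
  awk '$5 == "xfs" { print $3 }'
}

# Example use (not executed here):
#   mount | xfs_mountpoints
```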
+
 When doing firstboot on our TQ AMI, there are some un-hardenings in place.
 TQ expects during firstboot that the root password is a known value, that
 apache has the rights to `su` to root using that password, and that
@@ -17,23 +56,27 @@ selinux is permissive.
 
 After the GUI panel during firstboot is filled out (setting a new root
 password, a mysql root password, and an initial admin account), then we can
-undo the un-hardenings.
+undo the un-hardenings.  The firstboot GUI handles selinux.
 
-If you're doing a firstboot as part of restoring a backup from another TQ
-system, then please skip this part and come back!
+IF YOU'RE DOING A FIRSTBOOT AS PART OF RESTORING A BACKUP FROM ANOTHER TQ
+SYSTEM, THEN PLEASE SKIP THIS PART AND COME BACK!
 
 ```
 gpasswd -d apache sugroup
 echo "root:*" | chpasswd -e
 systemctl mask cockpit.service
 systemctl mask cockpit.socket
-shutdown -r now
+systemctl stop cockpit
 ```
 
-Now hopefully you can enable a salt minion without breaking anything
+You might have to reboot after this, depending on what you're doing.  If you're
+restoring TQ from backup, probably not.
 
 ### Restoring a backup
 
+GO THROUGH THE FIRSTBOOT STEPS VIA THE WEBUI, SET ADMIN AND MYSQL PASSWORDS
+AND INSTALL THE LICENSE
+
 You have to go through firstboot first, but DO NOT DO the unhardening
 above yet!
 
@@ -48,6 +91,15 @@ APP_INSTANCE_ID=18112b11-a13a-414b-93e8-b6b87adf27a5
 Now to do the restore ... docs are at https://helpcenter.threatq.com/index.htm#t=ThreatQ_Platform%2FBackup_and_Restore%2FBackup_and_Restore.htm
 but there are couple of things not there.
 
+Check to see if your mysql root password is right
+
+```
+mysql -u root -p
+```
+
+If you can connect to the DB, good.  If not, reset the password.  There's some tomfoolery here I don't get yet.
+See bottom of these notes for how to reset the mysql PW.
+
 ```
 $ sudo -i
 # umask 0022
@@ -62,6 +114,74 @@ Now go back up and look at the un-hardening above and do that.  You probably don
 have to reboot twice.
 
 
+## Making a backup
+
+We have a script /usr/local/bin/tqbackup that does the needful-ish.  But if it's not
+there or if you need to specify some weird options (like --exclude-solr),
+you can run a backup directly.  Why would you want to exclude solr?  We've had problems
+with the backup working.  If you restore a backup that lacks solr, then you'll need to
+reindex everything.
+
+```
+cd /var/www/api
+php artisan threatq:backup --exclude-solr
+```
+
+The backup of solr goes to /tmp first so you'll need sufficient space there.  If the backup
+seems to be hanging / taking forever, it might be because of insufficient space.
+
+## Reindexing
+
+TQ uses Solr the way other stacks use elasticsearch.  Most of your UI interactions use Solr, but the
+real data is persisted in the DB.  Sometimes you'll need to reindex.  What you'd
+think of as an index in elasticsearch is a "core" in Solr.
+
+From TQ support to do a full reindex of everything:
+
+```
+php /var/www/api/artisan threatq:solr-import --full --all
+```
+
+This may take a few hours.  The biggest cores by size are "indicators" and
+"objectlinks", and they will take the longest to reindex.  Last reindex took
+4ish hours.  Folks can still use TQ while it's going, but a lot of things
+"won't show up"
+
+### Checking current reindex status
+
+```
+for i in indicators events objectlinks signatures attachments adversaries tasks campaign course_of_action exploit_target incident ttp attack_pattern identity intrusion_set malware report tool vulnerability; do
+	echo "============ $i ============"
+	php /var/www/api/artisan threatq:solr-import-status $i
+done
+```
+
+Each "core" (table/index?) will dump something like:
+
+```
+Solr Dataimport
+
+Status: busy
+  Type: delta
+
+Started At: 2022-08-19 22:28:02
+
+Stats:
++-----------------------------------+---------------------+
+| Time Elapsed                      | 0:23:23.485         |
+| Total Requests made to DataSource | 2165017             |
+| Total Rows Fetched                | 4806984             |
+| Total Documents Processed         | 203189              |
+| Total Documents Skipped           | 0                   |
+| Delta Dump started                | 2022-08-19 22:28:02 |
+| Identifying Delta                 | 2022-08-19 22:28:02 |
+| Deltas Obtained                   | 2022-08-19 22:28:08 |
+| Building documents                | 2022-08-19 22:28:08 |
+| Total Changed Documents           | 1574988             |
++-----------------------------------+---------------------+
+```
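
When only the state of each core matters, the `Status:` line can be pulled out of that output. A sketch, not part of the committed notes; the function names `solr_import_status` and `report_core_statuses` are hypothetical, and the parsing assumes the output layout shown in the sample above.

```shell
# Hypothetical helper: read solr-import-status output on stdin and
# print just the value of the first "Status:" line (e.g. "busy").
solr_import_status() {
  awk '$1 == "Status:" { print $2; exit }'
}

# Hypothetical wrapper: print "<core>: <status>" for each core named
# on the command line, using the artisan command from the loop above.
report_core_statuses() {
  local core
  for core in "$@"; do
    printf '%s: %s\n' "$core" \
      "$(php /var/www/api/artisan threatq:solr-import-status "$core" | solr_import_status)"
  done
}

# Example use (not executed here):
#   report_core_statuses indicators events objectlinks
```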
+
+
 ## Patching Notes
 
 TQ patching is a little different (of course).  You have to be very careful about
@@ -80,3 +200,45 @@ package
 
 ALWAYS do a `yum check-update` and make sure everything looks reasonable and that
 Centos packages aren't replacing their RHEL equivalents.
+
+
+## Some Common Problems
+
+### Mysql root password
+
+During firstboot, the password we set doesn't "take", and connecting to the database becomes hard.
+Use this procedure then, or in any other situation where you need to reset the mysql password.  Adapted from a Digital Ocean
+help article https://www.digitalocean.com/community/tutorials/how-to-reset-your-mysql-or-mariadb-root-password
+
+```
+mkdir -p /etc/systemd/system/mariadb.service.d  # the drop-in directory may not exist yet
+cat <<EOF > /etc/systemd/system/mariadb.service.d/damnit.conf
+[Service]
+Environment="MYSQLD_OPTS=--skip-grant-tables --skip-networking"
+EOF
+
+systemctl daemon-reload
+systemctl restart mariadb
+
+ps -ef | grep mysql  # looking for the skip settings to exist
+
+mysql -u root
+# These are mysql commands
+FLUSH PRIVILEGES;
+ALTER USER 'root'@'localhost' IDENTIFIED BY '<SOMEPASSWORD>';
+exit
+
+# Back to shell
+rm -f /etc/systemd/system/mariadb.service.d/damnit.conf
+systemctl daemon-reload
+systemctl restart mariadb
+ps -ef | grep mysql  # looking for the skip settings to NOT exist
+
+```
+
+### Weird things in sensu
+
+I had to comment out the IPv6 "::1" /etc/hosts entry to shut up sensu and its https check.
+
+I also had to `chmod 755 /var/log/audit` to shut up the disk space check for it.
+
+I thought both of these were in salt but maybe not