
Merge branch 'master' of github.mdr.defpoint.com:mdr-engineering/infrastructure-notes

Fred Damstra 4 years ago
parent
commit
85220d0b17
5 changed files with 425 additions and 28 deletions
  1. 13 11
      Patching Notes--CaaSP.md
  2. 59 17
      Patching Notes.md
  3. 2 0
      RedHat Full Drive Notes.md
  4. 1 0
      Salt Upgrade Notes.md
  5. 350 0
      Splunk AFS Thaw Request Notes.md

+ 13 - 11
Patching Notes--CaaSP.md

@@ -24,14 +24,14 @@ Connect to the CaaSP Salt Master and run the following commands:
 ### There is also the grain 'role:caasp-victim' but it isn't present on every victim yet
 
 ### CentOS Victims
-salt -C 'vic-* or caasp-exp* and G@os:CentOS' test.ping --out=txt
-salt -C 'vic-* or caasp-exp* and G@os:CentOS' cmd.run 'df -h | egrep "[890][0-9]\%"'
+salt -C '( vic-* or caasp-exp* ) and G@os:CentOS' test.ping --out=txt
+salt -C '( vic-* or caasp-exp* ) and G@os:CentOS' cmd.run 'df -h | egrep "[890][0-9]\%"'
 
 # Review packages that will be updated.
-salt -C 'vic-* or caasp-exp* and G@os:CentOS' cmd.run 'yum check-update' 
+salt -C '( vic-* or caasp-exp* ) and G@os:CentOS' cmd.run 'yum check-update' 
 
 # Upgrade packages
-salt -C 'vic-* or caasp-exp* and G@os:CentOS' pkg.upgrade
+salt -C '( vic-* or caasp-exp* ) and G@os:CentOS' pkg.upgrade
 
 
 ### Windows Victims
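Why the parentheses matter: Salt's compound matcher binds `and` tighter than `or`, so without them the `G@os:CentOS` grain only applies to the `caasp-exp*` half. A rough awk stand-in over a hypothetical minion list (names and OS values are illustrative, not real minions):

```shell
# Hypothetical minion inventory: "<minion> <os>" pairs.
printf '%s\n' 'vic-01 Windows' 'caasp-exp-02 CentOS' 'caasp-salt-master CentOS' > /tmp/minions.txt

# '( vic-* or caasp-exp* ) and G@os:CentOS' -- the OS test applies to BOTH name patterns:
awk '($1 ~ /^vic-/ || $1 ~ /^caasp-exp/) && $2 == "CentOS" { print $1 }' /tmp/minions.txt
# Without the parens, 'vic-* or ( caasp-exp* and CentOS )' would also match vic-01, a Windows box.
```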
@@ -41,7 +41,7 @@ salt -G 'os:Windows' chocolatey.upgrade all
 
 #### Step 2 (Day 1): Splunk, Kali, Bastion, etc. Instances
 
-NOTE: This may upgrade Salt!
+WARNING: This may upgrade Salt!
 
 NOTE: Upgrading Docker will stop or restart the Jenkins container.
 
@@ -50,7 +50,9 @@ salt -C 'not ( vic-* or caasp-exp* or VIC-* )' test.ping --out=txt
 salt -C 'not ( vic-* or caasp-exp* or VIC-* )' cmd.run 'df -h | egrep "[890][0-9]\%"'
 
 # Review packages that will be updated for CentOS.
-salt -C 'not ( vic-* or caasp-exp* or VIC-* ) and G@os:CentOS or G@os:Amazon' cmd.run 'yum check-update' 
+salt -C 'not ( vic-* or caasp-exp* or VIC-* ) and ( G@os:CentOS or G@os:Amazon )' cmd.run 'yum check-update' 
+
+# Review packages that will be upgraded for Ubuntu
 salt caasp-vault cmd.run 'apt-get update'
 
 # Upgrade packages
@@ -83,7 +85,7 @@ salt -C 'vic-* or caasp-exp* or VIC-*' system.reboot
 watch "salt -C 'vic-* or caasp-exp* or VIC-*' test.ping --out=txt"
 
 #### Check uptime. Look for values/seconds less than 1,000.
-salt 'vic-* or caasp-exp* or VIC-*' status.uptime --out=txt
+salt -C 'vic-* or caasp-exp* or VIC-*' status.uptime --out=txt
 ```
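The 1,000-second rule of thumb: anything under ~17 minutes of uptime just rebooted. A local sketch of the check, reading this machine's `/proc/uptime` as a stand-in for the salt `status.uptime` output:

```shell
# Hosts that rebooted recently report uptime well under 1000 seconds.
up=$(cut -d. -f1 /proc/uptime)
if [ "$up" -lt 1000 ]; then
  echo "uptime ${up}s: rebooted recently"
else
  echo "uptime ${up}s: did not reboot - investigate"
fi
```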
 
 ### Day 2
@@ -152,14 +154,14 @@ salt 'caasp-splunk-idx-i-*' status.uptime --out=txt
 salt 'caasp-splunk-*' system.uptime --out=txt
 ```
 
-#### Step 2 (Day 2): Reboot Kali, Jenkins, Vault, the Bastion, and Salt Master
+#### Step 2 (Day 2): Reboot Kali, Jenkins, Vault, the Bastion, OSCDNS, and Salt Master
 
 ```
-salt -L 'caasp-kali,caasp-build-01,caasp-vault,caasp-bastion,caasp-salt-master' test.ping --out=txt
-salt -L 'caasp-kali,caasp-build-01,caasp-vault,caasp-bastion,caasp-salt-master' system.reboot
+salt -L 'caasp-kali,caasp-build-01,caasp-vault,caasp-bastion,caasp-oscdns,caasp-salt-master' test.ping --out=txt
+salt -L 'caasp-kali,caasp-build-01,caasp-vault,caasp-bastion,caasp-oscdns,caasp-salt-master' system.reboot
 
 #### This will disconnect you from the Salt Master. Once you are able to ssh back in ...
-salt -L 'caasp-kali,caasp-build-01,caasp-vault,caasp-bastion,caasp-salt-master' status.uptime --out=txt
+salt -L 'caasp-kali,caasp-build-01,caasp-vault,caasp-bastion,caasp-oscdns,caasp-salt-master' status.uptime --out=txt
 ```
 
 ## Patching or Upgrading the Jenkins Container

+ 59 - 17
Patching Notes.md

@@ -28,6 +28,8 @@ Here is the proposed patching schedule:
 Wednesday Dec 11:
 * Moose and Internal infrastructure
   * Patching
+* CaaSP 
+  * Patching
  
 Thursday Dec 12:
 * Moose and Internal
@@ -35,6 +37,8 @@ Thursday Dec 12:
 * All Customer PoP/LCP
   * Patching (AM)
   * Reboots (PM)
+* CaaSP
+  * Reboots
 
 Monday Dec 16:
 * All Customer XDR Cloud
@@ -46,7 +50,7 @@ Tuesday Dec 17:
 * All Remaining XDR Cloud
   * Reboots (AM)
  
-The customer and user impact will be during the reboots so they will be done in batches to reduce our total downtime is less.
+The customer and user impact will be during the reboots so they will be done in batches to reduce our total downtime.
 ```
 
 
@@ -59,7 +63,7 @@ Patch TEST first! This helps find problems in TEST and potential problems in PRO
 Post to Slack:
 ```
 FYI, patching today. 
-* This morning, patches to all internal systems and moose. 
+* This morning, patches to all internal systems, moose, and CaaSP. 
 * No reboots, so impact should be minimal.
 ```
 
@@ -88,6 +92,12 @@ salt -C 'openvpn*' pkg.upgrade
 
 # Just to be sure, run it again to make sure nothing got missed. 
 salt -C '* not ( afs* or saf* or nga* or ma-* or mo-* or dc-c19* or la-c19* )' pkg.upgrade exclude='phantom_repo'
+
+# Patch GC ( from the GC salt master )
+salt -C '*accenturefederalcyber.com not nihor*' test.ping
+salt -C '*accenturefederalcyber.com not nihor*' cmd.run 'df -h | egrep "[890][0-9]\%"'
+salt -C '*accenturefederalcyber.com not nihor*' cmd.run 'yum check-update'
+salt -C '*accenturefederalcyber.com not nihor*' pkg.upgrade
 ```
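The `df -h | egrep "[890][0-9]\%"` checks above flag filesystems at 80% usage or more (100% matches too, via its trailing "00%"). A quick local sanity check of the pattern with made-up df lines:

```shell
# Sample df-style lines; only filesystems at 80% or more should match.
printf '%s\n' \
  '/dev/xvda1   50G   10G   40G  20% /' \
  '/dev/xvdb1  500G  425G   75G  85% /opt' \
  '/dev/xvdc1    2T  1.9T  100G  98% /data' \
  | egrep "[890][0-9]\%"
```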
 
 > :warning: After upgrades check on Portal to make sure it is still up. 
@@ -95,6 +105,11 @@ salt -C '* not ( afs* or saf* or nga* or ma-* or mo-* or dc-c19* or la-c19* )' p
 - Prod: https://portal.xdr.accenturefederalcyber.com/choose/login/
 - Test: https://portal.xdrtest.accenturefederalcyber.com/choose/login/
 
+#### Patch CaaSP
+See Patching Notes--CaaSP.md
+
+#### Troubleshooting
+
 Phantom error
 ```
 phantom.msoc.defpoint.local:
@@ -118,7 +133,7 @@ phantom.msoc.defpoint.local:
             error: %pre(phantom_repo-4.9.39220-1.x86_64) scriptlet failed, exit status 7
 ```
 
-#### Error: `error: unpacking of archive failed on file /usr/lib/python2.7/site-packages/urllib3/packages/ssl_match_hostname: cpio: rename failed`
+##### Error: `error: unpacking of archive failed on file /usr/lib/python2.7/site-packages/urllib3/packages/ssl_match_hostname: cpio: rename failed`
 
 `salt ma-* cmd.run 'pip uninstall urllib3 -y'`
 
@@ -133,7 +148,7 @@ Error: Package: salt-minion-2018.3.4-1.el7.noarch (@salt-2018.3)
                            salt = 2018.3.5-1.el7
 ```
 
-#### Error: installing package `kernel-3.10.0-1062.12.1.el7.x86_64` needs 7MB on the /boot filesystem
+##### Error: installing package `kernel-3.10.0-1062.12.1.el7.x86_64` needs 7MB on the /boot filesystem
 
 ```
 # Install yum utils 
@@ -143,23 +158,24 @@ yum install yum-utils
 package-cleanup --oldkernels --count=1 -y
 ```
 
-#### If VPN server stops working, 
+##### If the VPN server stops working
 Try a stop and start of the VPN server ([instructions](OpenVPN%20Notes.md)). The private IP will probably change. 
 
-#### ISSUE: salt-minion doesn't come back and has this error
+##### ISSUE: salt-minion doesn't come back and has this error
 
 `/usr/lib/dracut/modules.d/90kernel-modules/module-setup.sh: line 16: /lib/modules/3.10.0-957.21.3.el7.x86_64///lib/modules/3.10.0-957.21.3.el7.x86_64/kernel/sound/drivers/mpu401/snd-mpu401.ko.xz: No such file or directory`
 
 RESOLUTION: Manually reboot the OS; this is most likely due to a kernel upgrade.
 
 ### Day 2 (Thursday), step 1 of 4:  Reboot Internals 
+Long Day of Rebooting!
 
 Don't forget to reboot test. 
 
 Post to Slack:
 ```
 FYI, patching today. 
-* In about 15 minutes: Reboots of moose and internal systems, including the VPN.
+* In about 15 minutes: Reboots of moose, internal systems and CaaSP, including the VPN.
 * Following that, patching (but not rebooting) of all customer PoPs/LCPs.
 * Then this afternoon, reboots of those PoPs/LCPs.
 ```
@@ -173,11 +189,11 @@ Some silenced events will not unsilence and will need to be manually unsilenced.
 salt -L 'vault-3.msoc.defpoint.local,sensu.msoc.defpoint.local' test.ping
 date; salt -L 'vault-3.msoc.defpoint.local,sensu.msoc.defpoint.local' system.reboot
 watch "salt -L 'vault-3.msoc.defpoint.local,sensu.msoc.defpoint.local' test.ping"
-salt -C '* not ( moose-splunk-indexer* or afs* or nga* or ma-* or mo-* or la-* or dc-* or vault-3* or sensu* or interconnect* or resolver* )' test.ping --out=txt
-date; salt -C '* not ( moose-splunk-indexer* or afs* or nga* or ma-* or mo-* or la-* or dc-* or vault-3* or sensu* or interconnect* or resolver* )' system.reboot
+salt -C '* not ( moose-splunk-indexer* or afs* or nga* or ma-* or mo-* or la-* or dc-* or vault-3* or sensu* or interconnect* or resolver* or nihor* )' test.ping --out=txt
+date; salt -C '* not ( moose-splunk-indexer* or afs* or nga* or ma-* or mo-* or la-* or dc-* or vault-3* or sensu* or interconnect* or resolver* or nihor* )' system.reboot
 ### You will lose connectivity to openvpn and salt master
 ### Log back in and verify they are back up
-watch "salt -C '* not ( moose-splunk-indexer* or afs* or nga* or ma-* or mo-* or la-* or dc-* or vault-3* or sensu* or interconnect* or resolver* )' cmd.run 'uptime' --out=txt"
+watch "salt -C '* not ( moose-splunk-indexer* or afs* or nga* or ma-* or mo-* or la-* or dc-* or vault-3* or sensu* or interconnect* or resolver* or nihor* )' cmd.run 'uptime' --out=txt"
 
 # Take care of the interconnects/resolvers one at a time.
 
@@ -209,10 +225,14 @@ interconnects and resolvers. Interconnects and resolvers reboot one at a time.
 salt -C '* not ( afs* or saf* or nga* or ma-* or mo-* or dc-c19* or la-c19* or openvpn* or qcomp* or salt-master* or moose-splunk-indexer-* or interconnect* or resolver* )' cmd.run 'shutdown -r now'
 ```
 
+#### Reboot CaaSP
+See Patching Notes--CaaSP.md
 
 
 ### Day 2 (Thursday), Step 2 of 4: Reboot Moose
 
+Log in to https://moose-splunk-cm.msoc.defpoint.local:8000/ and go to `settings->indexer clustering`.
+
 ```
 salt -C 'moose-splunk-indexer*' test.ping --out=txt
 
@@ -222,13 +242,11 @@ date; salt -C 'moose-splunk-indexer-i-03ff4fb9915d5f7df.msoc.defpoint.local' sys
 
 # Indexers take a while to restart
 watch "salt -C 'moose-splunk-indexer-i-03ff4fb9915d5f7df.msoc.defpoint.local' cmd.run 'uptime' --out=txt"
-ping moose-splunk-indexer-1.msoc.defpoint.local
+ping moose-splunk-indexer-i-03ff4fb9915d5f7df.msoc.defpoint.local
 ```
 
 #### WAIT FOR SPLUNK CLUSTER TO HAVE 3 CHECKMARKS
 
-Log in to https://moose-splunk-cm.msoc.defpoint.local:8000/ and go to `settings->indexer clustering`.
-
 Repeat the above patching steps for the additional indexers, waiting for 3 green checks in between each one.
 
 ```
@@ -249,8 +267,9 @@ watch "salt -C 'moose-splunk-indexer-i-00ca1da87a2abcd56.msoc.defpoint.local' cm
 # Verify all indexers patched:
 salt 'moose-splunk-indexer*' cmd.run 'uptime' --out=txt
 ```
+#### Troubleshooting
 
-#### If the indexer/checkmarks don't come back
+##### If the indexer/checkmarks don't come back ( legacy information )
 
 If an indexer is not coming back up...look at screenshot in AWS... see this: `Probing EDD (edd=off to disable)... ok` then look at system log in AWS see this:  `Please enter passphrase for disk splunkhot!`:
 
@@ -301,7 +320,7 @@ salt -C 'moose-splunk-indexer-1.msoc.defpoint.local' cmd.run 'systemctl | egrep
 It is waiting for command prompt, when you restart the service it picks up the key from a file. Systemd sees the crypt setup service as a dependency for the splunk service. 
 
 
-#### Look for this. this is good, it is ready for restart of splunk
+Look for this; it means the system is ready for a restart of Splunk:
 Cryptography Setup for splunkhot
 
 ```
@@ -349,7 +368,9 @@ salt -C '* not *.local not *.pvt.xdr.accenturefederalcyber.com' pkg.upgrade
 # on 2020-07-23: salt -C 'nga-splunk-ds-1 or afs-splunk-ds-1 or afs-splunk-ds-2' pkg.upgrade disablerepo=splunk-7.0 # Optional for fix
 ```
 
-#### Error on afs-splunk-ds-3: error: cannot open Packages database in /var/lib/rpm
+#### Troubleshooting
+
+##### Error on afs-splunk-ds-3: error: cannot open Packages database in /var/lib/rpm
 Solution: 
 
 ```
@@ -361,7 +382,7 @@ rpm --rebuilddb
 yum clean all
 ```
 
-#### Error on `*-ds`: Could not resolve 'reposerver.msoc.defpoint.local/splunk/7.0/repodata/repomd.xml'
+##### Error on `*-ds`: Could not resolve 'reposerver.msoc.defpoint.local/splunk/7.0/repodata/repomd.xml'
 Reason:
 POP Nodes shouldn't be using the .local dns address.
 
@@ -509,6 +530,7 @@ salt -C ' * not *local not *.pvt.xdr.accenturefederalcyber.com' cmd.run 'uptime'
 
 
 ### Day 3 (Monday), Step 1 of 2, Customer Slices Patching
+Shorter day of Patching! :-)
 
 Post to Slack:
 ```
@@ -525,6 +547,16 @@ salt -C 'afs*local or ma-* or mo-*local or la-*local or nga*local or dc*local' c
 salt -C 'afs*local or ma-* or mo-*local or la-*local or nga*local or dc*local' cmd.run 'df -h | egrep "[890][0-9]\%"'
 salt -C 'afs*local or ma-* or mo-*local or la-*local or nga*local or dc*local' pkg.upgrade
 ```
+
+Don't forget to patch nihors* on gc-prod-salt-master!
+
+```
+salt -C 'nihor*com' cmd.run 'df -h | egrep "[890][0-9]\%"' 
+salt -C 'nihor*com' pkg.upgrade
+```
+
+#### Troubleshooting
+
 EPEL repo is enabled on afs-splunk-hf (I don't know why); had to run this to avoid an issue with the collectd package on msoc-repo:
 
 `yum update --disablerepo epel`
@@ -549,7 +581,17 @@ salt -C '*-sh* and not *moose* and not qcompliance* and not fm-shared-search*' s
 watch "salt -C '*-sh* and not *moose* and not qcompliance* and not fm-shared-search*' cmd.run 'uptime'"
 ```
 
+Don't forget to reboot nihors-splunk-sh* on gc-prod-salt-master!
+
+```
+salt -C 'nihor-splunk-sh*' cmd.run 'df -h | egrep "[890][0-9]\%"' 
+salt -C 'nihor-splunk-sh*' system.reboot
+```
+
+Don't forget to un-silence Sensu. 
+
 ### Day 4 (Tuesday), Step 1 of 1, Customer Slices CMs Reboots
+Long Day of Reboots!
 
 ```
 Today's patching is the indexing clusters for all XDR customer environments. Cluster masters and indexers will be rebooted this morning. Thank you for your cooperation.

+ 2 - 0
RedHat Full Drive Notes.md

@@ -12,6 +12,8 @@ du -sh /opt/syslog-ng/* |sort -h
 du -sh /opt/
 rm -rf /var/log/hubble
 
+du -sh /var/cache/yum
+yum clean all
 
 
 afs-splunk-hf.msoc.defpoint.local  /  prod-afs-splunk-hf   (websense runs once per day.)
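The `du -sh ... | sort -h` pattern above works because sort's human-numeric mode understands size suffixes; a quick illustration with made-up sizes:

```shell
# sort -h orders human-readable sizes correctly (K < M < G), unlike plain sort:
printf '%s\n' '2.0G /opt/syslog-ng/big' '512K /opt/syslog-ng/small' '100M /opt/syslog-ng/mid' | sort -h
```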

+ 1 - 0
Salt Upgrade Notes.md

@@ -32,6 +32,7 @@
 - `salt cmd.run 'pip3 install boto3'`
 - `salt cmd.run 'pip3 install pyinotify'`
 - `salt saltutil.sync_all`
+- `salt saltutil.refresh_modules`
 - `salt grains.get ec2:placement:availability_zone`
 - `salt grains.get environment`
 - RESTART to apply beacon inotify changes `service.restart salt-minion`

+ 350 - 0
Splunk AFS Thaw Request Notes.md

@@ -0,0 +1,350 @@
+# Splunk AFS Thaw Request Notes
+
+This documents the process for searching the frozen data that is stored in S3.
+
+Plan:
+- charge time to CIRT Ops Support     SPROJ.061
+- Don't use TF to manage the servers. 
+- stand up multiple ( minimum 3? ) centos7 servers with large EBS disks
+- stand up one SH for the indexers
+- add EC2 instance policies with access to the S3 buckets
+- use aws s3 cp to pull the data down
+    - Use zztop.sh ( see below ) script to pull down data faster
+    - Data going back to Jan 1, 2020 (1577836861)
+- Thaw data ( no license needed since it is not ingestion )
+    - no need to thaw the data if it has all the needed parts in the bucket ( tsidx files )
+- Install AFS splunk apps on SH
+    - Zip them up and upload them to S3
+    - Download them from S3 on the new SH
+- Hand over to SOC for searching
+
+Assumptions:
+- The AWS account will be deleted when we are done. 
+- No cluster master needed
+- Note: Data does not get replicated ( in the cluster ) from the thawed directory. So, if you thaw just a single copy of some bucket, instead of all the copies, only that single copy will reside in the cluster, in the thawed directory of the peer node where you placed it.
+
+Build the VPC in the same region as the data is located in S3!
+
+VPC info ( pick a CIDR that has not been used just in case you need to use transit gateway )
+afs-data-thaw
+10.199.0.0/22
+
+indexers: c5.4xlarge on-demand $7/day m5d.xlarge or larger
+1 TB EBS storage attached to instances
+search head: m5a.xlarge 
+centos7 AMI
+key: msoc-build
+instance role: default-instance-role
+naming scheme: afs-splunk-sh
+encrypt EBS with default key
+give AWS IAM user both Administrator and IAMfullaccess to be able to launch instances!
+
+## Indexes
+
+```
+Needed indexes:
+app_mscas           967.3 GB        done
+app_o365            1.3 TB          done    afs-splunk-idx-1
+av                  86.8 MB         done
+azure               149.4 GB        done
+ids                 17.4 GB         done
+network_firewall    8.2 TB          done
+network             99.4 GB         done
+threat_metrics      8.0 GB          done    afs-splunk-idx-1
+websense            771.1 GB        done
+wineventlog         5.0 TB          done
+zscaler             87.3 GB         done
+Total               16.6 TB
+```
+
+Use AWS console to calculate total size of folders in S3 bucket. This will help to see how many indexers are needed. 
+
+## Permissions for S3 bucket
+
+Steps in New AWS Account
+- add role and attach to EC2 instance
+- add policy to the role allowing KMS* and S3* permissions
+
+Steps in Old AWS Account
+- modify KMS role to allow for role in New Account
+- modify S3 bucket policy to allow for role in New Account
+
+New Account policy for role
+```
+{
+  "Version": "2012-10-17",
+  "Statement": [
+    {
+      "Sid": "Stmt1610588870140",
+      "Action": "kms:*",
+      "Effect": "Allow",
+      "Resource": "*"
+    },
+    {
+      "Sid": "Stmt1610588903413",
+      "Action": "s3:*",
+      "Effect": "Allow",
+      "Resource": "*"
+    }
+  ]
+}
+```
+
+Changes for the Old Account KMS key policy
+```
+        {
+            "Sid": "Allow use of the key",
+            "Effect": "Allow",
+            "Principal": {
+                "AWS": [
+                    "arn:aws:iam::948010823789:role/default-instance-role",
+                    "arn:aws:iam::477548533976:role/mdr_powerusers",
+                    "arn:aws:iam::477548533976:role/msoc-default-instance-role"
+                ]
+            },
+            "Action": [
+                "kms:ReEncrypt*",
+                "kms:GenerateDataKey*",
+                "kms:Encrypt",
+                "kms:DescribeKey",
+                "kms:Decrypt"
+            ],
+            "Resource": "*"
+        }
+```
+
+Old Account S3 Bucket Policy
+```
+{
+    "Version": "2012-10-17",
+    "Id": "Policy1584399307003",
+    "Statement": [
+        {
+            "Sid": "DownloadandUpload",
+            "Effect": "Allow",
+            "Principal": {
+                "AWS": "arn:aws:iam::948010823789:role/default-instance-role"
+            },
+            "Action": [
+                "s3:GetObject",
+                "s3:GetObjectVersion",
+                "s3:RestoreObject"
+            ],
+            "Resource": "arn:aws:s3:::mdr-afs-prod-splunk-frozen/*"
+        },
+        {
+            "Sid": "ListBucket",
+            "Effect": "Allow",
+            "Principal": {
+                "AWS": "arn:aws:iam::948010823789:role/default-instance-role"
+            },
+            "Action": "s3:ListBucket",
+            "Resource": "arn:aws:s3:::mdr-afs-prod-splunk-frozen"
+        }
+    ]
+}
+```
+
+Test permissions/access from the second account:
+`aws s3 ls s3://mdr-afs-prod-splunk-frozen`
+
+After the objects have been restored from Glacier, try to download some objects:
+`aws s3 cp s3://mdr-afs-prod-splunk-frozen/junk/frozendb/db_1598282957_1598264767_511_50F6EC26-9620-4CAA-802C-857CD78386CE/ . --recursive --force-glacier-transfer`
+
+
+
+## Restore Glacier objects
+
+https://infinityworks.com/insights/restoring-2-million-objects-from-glacier/
+https://github.com/s3tools/s3cmd
+
+***s3cmd was the best option for restoring because it can pull the list of files for you and restore an entire directory recursively***
+s3cmd, at this time, does not work with assumeRole STS credentials ( don't run the command from a laptop; run it from a new instance that has the permissions )
+
+Steps
+- ensure your awscli can access the S3 buckets
+- use s3cmd to restore objects
+
+Plan
+- Restore for 30 days
+- Standard restore ( cheaper than Expedited )
+
+Test access for file restore from Glacier:
+`aws s3 ls s3://mdr-afs-prod-splunk-frozen`
+
+warning: Skipping file s3://mdr-afs-prod-splunk-frozen/junk/frozendb/db_1598282957_1598264767_511_50F6EC26-9620-4CAA-802C-857CD78386CE/rawdata/slicesv2.dat. Object is of storage class GLACIER. Unable to perform download operations on GLACIER objects. You must restore the object to be able to perform the operation. See aws s3 download help for additional parameter options to ignore or force these transfers.
+
+Need to restore the data from glacier for a period of time.
+
+Restore tiers:
+Expedited   $$$     1-5 minutes, objects less than 250 MB
+Standard    $$      3-5 hrs (default if not given)
+Bulk        $       5-12 hrs
+
+
+See the s3cmd command further below! It is better than the s3api command.
+
+List objects
+```
+aws s3api list-objects-v2 --bucket mdr-afs-prod-splunk-frozen --prefix junk/frozendb/db_1598282957_1598264767_511_50F6EC26-9620-4CAA-802C-857CD78386CE --query "Contents[?StorageClass=='GLACIER']" --output text
+```
+
+Output the results to a file
+```
+aws s3api list-objects-v2 --bucket mdr-afs-prod-splunk-frozen --prefix junk/frozendb/db_1598282957_1598264767_511_50F6EC26-9620-4CAA-802C-857CD78386CE --query "Contents[?StorageClass=='GLACIER']" --output text | awk '{print $2}' > file.txt
+```
+
+Test access
+`aws s3api restore-object --restore-request Days=2 --bucket mdr-afs-prod-splunk-frozen --key junk/frozendb/db_1598282957_1598264767_511_50F6EC26-9620-4CAA-802C-857CD78386CE/splunk-autogen-params.dat`
+
+`aws s3api restore-object --restore-request Days=2 --bucket mdr-afs-prod-splunk-frozen --key junk/frozendb/db_1598282957_1598264767_511_50F6EC26-9620-4CAA-802C-857CD78386CE/rawdata/slicesv2.dat`
+
+All in one command
+```
+aws s3api list-objects-v2 --bucket mdr-afs-prod-splunk-frozen --prefix junk/frozendb/db_1594951175_1594844814_53_94F7BD8A-9043-487B-8BD5-41AA54D7A925 --query "Contents[?StorageClass=='GLACIER']" --output text | awk '{print $2}' | xargs -L 1 aws s3api restore-object --restore-request '{ "Days" : 2, "GlacierJobParameters" : { "Tier":"Expedited" } }' --bucket mdr-afs-prod-splunk-frozen --key
+```
+
+This just means AWS is busy right now; try again later. It is not an error in your code, but your code needs to retry the request to ensure it gets processed. This only happened with Expedited requests.
+```
+An error occurred (GlacierExpeditedRetrievalNotAvailable) when calling the RestoreObject operation (reached max retries: 2): Glacier expedited retrievals are currently not available, please try again later
+```
+
+```
+#!/bin/sh
+for x in $(cat file.txt); do
+  echo "Start restoring the file $x"
+  aws s3api restore-object --restore-request Days=2 --bucket mdr-afs-prod-splunk-frozen --key "$x"
+  echo "Completed restoring the file $x"
+done
+```
+
+Expedite that mother 
+```
+#!/bin/sh
+TIER=Expedited
+#TIER=Standard
+#TIER=Bulk
+DAYS=2
+for x in $(cat file.txt); do
+  echo "Start restoring the file $x"
+  aws s3api restore-object --restore-request "{ \"Days\" : $DAYS, \"GlacierJobParameters\" : { \"Tier\" : \"$TIER\" } }" --bucket mdr-afs-prod-splunk-frozen --key "$x"
+  echo "Completed restoring the file $x"
+done
+```
+
+With s3cmd! Be sure to use the rb_* exclude in the command; there is no need to restore the replicated buckets.
+
+Just a bucket
+`./s3cmd restore --restore-priority=expedited --restore-days=2 --recursive s3://mdr-afs-prod-splunk-frozen/junk/frozendb/db_1566830011_1562776263_316_BBE343D5-D0D2-4120-A307-8B35B5E48D95/`
+
+Whole index
+`./s3cmd restore --restore-priority=standard --restore-days=30 --recursive s3://mdr-afs-prod-splunk-frozen/av/`
+
+Exclude rb_* 
+
+`time ./s3cmd restore --restore-priority=standard --restore-days=30 --recursive --exclude="frozendb/rb_*" s3://mdr-afs-prod-splunk-frozen/av/`
+
+Distribute load to all servers via salt
+`salt afs-splunk-idx-8 cmd.run '/root/s3cmd-2.1.0/s3cmd restore --restore-priority=standard --restore-days=30 --recursive s3://mdr-afs-prod-splunk-frozen/zscaler/' --async`
+
+## Splunk
+
+### Server Prep
+- hostnamectl set-hostname afs-splunk-idx-2
+- install salt-master ( SH only), salt-minion (which includes python3)
+`rpm --import https://repo.saltstack.com/py3/redhat/7/x86_64/archive/3002.2/SALTSTACK-GPG-KEY.pub`
+`vi /etc/yum.repos.d/saltstack.repo`
+```
+[saltstack-repo]
+name=SaltStack repo for RHEL/CentOS 7 PY3
+baseurl=https://repo.saltstack.com/py3/redhat/7/$basearch/archive/3002.2
+enabled=1
+gpgcheck=1
+gpgkey=https://repo.saltstack.com/py3/redhat/7/$basearch/archive/3002.2/SALTSTACK-GPG-KEY.pub
+```
`yum clean expire-cache`
`yum install salt-minion -y`
`sed -i 's/#master: salt/master: 10.199.0.83/' /etc/salt/minion`
`systemctl start salt-minion`
`systemctl enable salt-minion`
+- run salt states
+
+Chrome hates the self-signed TLS certificate. Type this anywhere on the warning page to bypass the block:
+`thisisunsafe`
+
+### Installation
+
+run salt states
+
+### Data Pull with Salt
+
+Use Duane's zztop.sh!!!!
+
+`cmd.run '/root/s3cmd-2.1.0/s3cmd get --recursive s3://mdr-afs-prod-splunk-frozen/threat_metrics/frozendb/ /opt/splunk/var/lib/splunk/threat_metrics/thaweddb/'`
+
+
+/root/s3cmd-2.1.0/s3cmd get --recursive s3://mdr-afs-prod-splunk-frozen/app_o365/frozendb/ /opt/splunk/var/lib/splunk/app_o365/thaweddb/
+
+### Thaw it out!
+
+No need to thaw it out! The data was not fully frozen and the data does not need to be rebuilt. 
+
+## GetObject
+
+Pull the file after it has been restored
+`aws s3 cp s3://mdr-afs-prod-splunk-frozen/junk/frozendb/db_1598282957_1598264767_511_50F6EC26-9620-4CAA-802C-857CD78386CE/splunk-autogen-params.dat here.dat`
+
+With S3cmd
+`./s3cmd get --recursive s3://mdr-afs-prod-splunk-frozen/junk/frozendb/db_1598282957_1598264767_511_50F6EC26-9620-4CAA-802C-857CD78386CE/ /home/centos/test-dir-2/` 
+
+Best approach, step by step:
+- get file of all S3 objects
+- split file into 10 different files ( or number of indexers )
+- run file with zztop to download the files
+
+
+Make list of indexes
+`aws s3 ls s3://mdr-afs-prod-splunk-frozen | awk '{ print $2 }' > foo1`
+
+Make list of ALL buckets in each index
+`for i in $(cat foo1| egrep -v ^_); do aws s3 ls s3://mdr-afs-prod-splunk-frozen/${i}frozendb/ | egrep "db" | awk -v dir=$i '{ printf("s3://mdr-afs-prod-splunk-frozen/%sfrozendb/%s\n",dir,$2)}' ; done > bucketlist`
+
+Break up the list ( 10 indexers in this case )
+`cat bucketlist | awk '{ x=NR%10 }{print >> "indexerlist"x}'`
+
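A self-contained sketch of the round-robin split (bucket paths are made up; parentheses added around the awk filename expression for portability):

```shell
# Round-robin 25 example bucket paths into 10 per-indexer files.
cd "$(mktemp -d)"
seq 1 25 | sed 's|^|s3://example-bucket/frozendb/db_|' > bucketlist
awk '{ x = NR % 10 } { print >> ("indexerlist" x) }' bucketlist
wc -l indexerlist1   # lines 1, 11 and 21 land here
```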
+Create zztop.sh:
+```
+#!/bin/bash
+DEST=$( echo $1 | awk -F/ '{ print "/opt/splunk/var/lib/splunk/"$4"/thaweddb/"$6 }' )
+mkdir -p $DEST
+/usr/local/aws-cli/v2/current/bin/aws s3 cp $1 $DEST --recursive --force-glacier-transfer --no-progress
+```
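To see what zztop.sh computes, the DEST line can be exercised on its own (using the junk bucket path from earlier in these notes):

```shell
# zztop.sh derives the local thaweddb path from slash-separated fields 4
# (the index name) and 6 (the Splunk bucket directory) of the S3 URL:
echo 's3://mdr-afs-prod-splunk-frozen/junk/frozendb/db_1598282957_1598264767_511_50F6EC26-9620-4CAA-802C-857CD78386CE/' \
  | awk -F/ '{ print "/opt/splunk/var/lib/splunk/" $4 "/thaweddb/" $6 }'
# -> /opt/splunk/var/lib/splunk/junk/thaweddb/db_1598282957_1598264767_511_50F6EC26-9620-4CAA-802C-857CD78386CE
```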
+
+Distribute files using salt
+```
+salt '*idx-2' cmd.run 'mkdir /root/s3cp'
+salt '*idx-2' cp.get_file salt://s3cp/indexerlist0 /root/s3cp/indexerlist0
+salt '*idx-2' cp.get_file salt://s3cp/zztop.sh /root/s3cp/zztop.sh
+salt '*idx-2' cmd.run 'chmod +x /root/s3cp/zztop.sh'
+```
+
+idx-2   indexerlist2    needs restart
+idx-3   indexerlist1    needs restart
+idx-4   indexerlist2    running
+idx-5   indexerlist3    running
+idx-6   indexerlist4    running
+idx-7   indexerlist5    running
+idx-8   indexerlist6    running
+idx-9   indexerlist7    running
+idx-10   indexerlist8   running
+idx-11   indexerlist9   running
+...
+
+Distribute each list to an indexer and use the zztop script with egrep and xargs to download all the buckets.
+
+tmux ( xargs processes multiple lines at the same time with the -P flag )
+`egrep -h "*" indexerlist* | xargs -P 15 -n 1 ./zztop.sh`
+`egrep -h "*" indexerlist* | head -1 | xargs -P 10 -n 1 ./zztop.sh`
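A minimal stand-in for the fan-out pattern, with echo in place of zztop.sh (output order is nondeterministic under -P):

```shell
# Fan-out sketch: up to 3 parallel invocations, one argument each.
printf '%s\n' list-a list-b list-c list-d | xargs -P 3 -n 1 sh -c 'echo "fetched $0"'
```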