Rough process to migrate from Commercial to GovCloud. Work in progress.
Verify Splunk is operating correctly:
On the cm, run:
sudo -u splunk /opt/splunk/bin/splunk show cluster-status
Create "final" snapshots of the SH, CM, and HF in the AWS console.
:warning: Remember to check the 'No Reboot' box!
Name: moose-splunk-hf-FinalSnapshot-20200115
Description: Final snapshot before migration to GC
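If you prefer the CLI to the console, the name above can be generated; the `aws` call is commented out and hypothetical (the instance ID is a placeholder), but note that the 'No Reboot' box corresponds to `--no-reboot`:

```shell
# Build the snapshot name using the convention above (date format matches 20200115).
SPLUNK_PREFIX=moose
ROLE=hf
SNAP_NAME="${SPLUNK_PREFIX}-splunk-${ROLE}-FinalSnapshot-$(date +%Y%m%d)"
echo "$SNAP_NAME"
# Hypothetical CLI equivalent of the console step -- placeholder instance ID:
# aws ec2 create-image --instance-id i-0123456789abcdef0 --no-reboot \
#   --name "$SNAP_NAME" --description "Final snapshot before migration to GC"
```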
cd ~/msoc-infrastructure
git fetch --all
git checkout develop
git pull origin
git checkout -b feature/ftd_MSOCI-XXXX_MigrateXXXX
cd ~/msoc-infrastructure/tools/okta_app_maker
export OKTA_API_TOKEN=FILLMEIN
SPLUNK_PREFIX=moose
SPLUNK_PREFIX_CAP=Moose
ENVIRONMENT=prod
ENVIRONMENT_CAP=Prod
ENVIRONMENT_DNS=xdr # Alternative: xdrtest
./okta_app_maker.py "${SPLUNK_PREFIX_CAP} Splunk SH [${ENVIRONMENT_CAP}] [GC]" "https://${SPLUNK_PREFIX}-splunk.pvt.${ENVIRONMENT_DNS}.accenturefederalcyber.com"
./okta_app_maker.py "${SPLUNK_PREFIX_CAP} Splunk CM [${ENVIRONMENT_CAP}] [GC]" "https://${SPLUNK_PREFIX}-splunk-cm.pvt.${ENVIRONMENT_DNS}.accenturefederalcyber.com:8000"
./okta_app_maker.py "${SPLUNK_PREFIX_CAP} Splunk HF [${ENVIRONMENT_CAP}] [GC]" "https://${SPLUNK_PREFIX}-splunk-hf.pvt.${ENVIRONMENT_DNS}.accenturefederalcyber.com:8000"
# For moose only:
./okta_app_maker.py "Qcompliance [${ENVIRONMENT_CAP}] [GC]" "https://qcompliance-splunk.pvt.${ENVIRONMENT_DNS}.accenturefederalcyber.com:8000"
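Before creating the apps, it can help to sanity-check how the variables expand; this just echoes the arguments the first invocation above receives:

```shell
# Echo the name/URL the SH invocation above passes to okta_app_maker.py.
SPLUNK_PREFIX=moose
SPLUNK_PREFIX_CAP=Moose
ENVIRONMENT_CAP=Prod
ENVIRONMENT_DNS=xdr
APP_NAME="${SPLUNK_PREFIX_CAP} Splunk SH [${ENVIRONMENT_CAP}] [GC]"
APP_URL="https://${SPLUNK_PREFIX}-splunk.pvt.${ENVIRONMENT_DNS}.accenturefederalcyber.com"
echo "$APP_NAME"
echo "$APP_URL"
```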
Update ~/msoc-infrastructure/salt/pillar/xxxx_variables.sls with the values from the scripts.
cd ~/msoc-infrastructure/terraform/CUSTOMERDIRECTORY
tfswitch
# modify module "CUST_cluster" in "moose.tf" or "main.tf", and add:
# migration_cidr = [ "10.40.16.0/22" ] # Determine actual CIDR block for vpc-splunk in the new account
Commit to Git and open a PR
terraform init
terraform workspace select prod
terraform apply
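The module change described in the comments above might look like the sketch below; the module and file names vary per customer, and the CIDR shown is the example value, not the real block:

```hcl
# In moose.tf or main.tf -- module name varies by customer:
module "CUST_cluster" {
  # ... existing arguments stay as-is ...

  # Allows the legacy environment to reach the new cluster during migration.
  # Determine the actual CIDR block for vpc-splunk in the new account.
  migration_cidr = ["10.40.16.0/22"]
}
```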
DEPRECATED: I don't think this is right; the existing security group should be updated instead.
~It's possible this will update the security group, but due to the ASG, it will not recreate the indexers. Rather than rebuilding each one, use the AWS console to modify each indexer.~
If this is a brand new account, initialize following the usual processes, but do not apply the splunk modules yet.
If this is an existing account:
cd ~/xdr-terraform-live/
git checkout master
git fetch --all
git pull
git checkout -b feature/ftd_MSOCI-XXXX_MigrateCUSTToGC
cp -r 000-skeleton/{140-splunk-frozen-bucket,150-splunk-cluster-master,160-splunk-indexer-cluster,170-splunk-searchhead,180-splunk-heavy-forwarder} prod/aws-us-gov/CUSTOMERDIRECTORY/
git add prod/aws-us-gov/CUSTOMERDIRECTORY/{140-splunk-frozen-bucket,150-splunk-cluster-master,160-splunk-indexer-cluster,170-splunk-searchhead,180-splunk-heavy-forwarder}
Edit prod/aws-us-gov/CUSTOMERDIRECTORY/account.hcl
and verify the following variables:
vpc_info['vpc-splunk']: should be the correct subnet (something out of 10.42.0.0/16)
splunk_data_sources: should be an array of IPs that have access (maybe copied from ~/msoc-infrastructure/terraform/common/variables.tf)
splunk_legacy_cidr: should be the legacy subnet
splunk_asg_sizes: should be [1, 1, 1] unless additional instances are needed
splunk_volume_sizes: probably copied from elsewhere
instance_types
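For reference, a hypothetical account.hcl fragment covering those variables; the exact structure comes from the skeleton, and every value below is a placeholder to be verified against the old account:

```hcl
locals {
  # Placeholder values -- verify each one against the skeleton and the legacy account.
  vpc_info = {
    "vpc-splunk" = "10.42.8.0/22"             # something out of 10.42.0.0/16
  }
  splunk_data_sources = ["203.0.113.10/32"]   # IPs that have access
  splunk_legacy_cidr  = ["10.80.0.0/16"]      # the legacy subnet
  splunk_asg_sizes    = [1, 1, 1]             # unless more instances are needed
  splunk_volume_sizes = [900, 900, 900]       # probably copied from elsewhere
  instance_types      = ["i3en.2xlarge"]      # placeholder
}
```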
Commit and do a PR for approval
cd ~/xdr-terraform-live/test/aws-us-gov/mdr-test-c2/140-splunk-frozen-bucket
terragrunt apply
cd ~/msoc-infrastructure/salt/pillar
vim moose_variables.sls
# copy the saml settings, applying different settings in an {% if %} clause for 'pvt.xdr.accenturefederalcyber.com' (see moose_variables.sls)
# and update with the information obtained in the following steps.
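The {% if %} pattern referred to above might look like this sketch; the grain, condition, and key names are assumptions, so lift the real block from moose_variables.sls rather than copying this:

```jinja
{# Hypothetical sketch -- copy the real condition and keys from moose_variables.sls #}
{% if grains['domain'] == 'pvt.xdr.accenturefederalcyber.com' %}
  {# GovCloud SAML settings, from the new Okta apps created above #}
  saml_entity_id: 'https://moose-splunk.pvt.xdr.accenturefederalcyber.com'
{% else %}
  {# Legacy commercial settings #}
  saml_entity_id: 'https://moose-splunk.msoc.defpoint.local'
{% endif %}
```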
By now, your PR for salt should have been merged. Make sure it's on all the salt masters.
tshp salt-master
salt-run fileserver.update
salt '*' saltutil.sync_all # optional but why not?
salt '*' saltutil.refresh_pillar # optional but why not?
cd ~/xdr-terraform-live/prod/aws-us-gov/mdr-prod-c2/150-splunk-cluster-master
terragrunt apply
ssh gc-dev-salt-master
salt 'moose-splunk-cm.pvt.xdr.accenturefederalcyber.com' state.highstate --output-diff
# run it twice
salt 'moose-splunk-cm.pvt.xdr.accenturefederalcyber.com' state.highstate --output-diff
# reboot to ensure all changes are active
salt 'moose-splunk-cm.pvt.xdr.accenturefederalcyber.com' system.reboot
watch "salt 'moose-splunk-cm.pvt.xdr.accenturefederalcyber.com' test.ping"
Run an initial rsync of the remote host while it's running to do an initial staging. This will reduce the total time that we are running with reduced redundancy.
Prep for scp:
# generate key on new
tshp CUST-splunk-cm
sudo systemctl stop splunk
sudo systemctl disable splunk
sudo su - splunk
ssh-keygen
# press Enter three times (accept defaults, empty passphrase)
cat ~/.ssh/id_rsa.pub
exit
# authorize key on old
tshp CUST-splunk-cm.msoc.defpoint.local
mkdir -p .ssh && chmod 700 .ssh
cat >> .ssh/authorized_keys
# paste from above, then Ctrl-D; sshd refuses keys if .ssh or authorized_keys is group/world writable
chmod 600 .ssh/authorized_keys
exit
# Validate that it's working
tshp CUST-splunk-cm
sudo su - splunk
ssh frederick.t.damstra@CUST-splunk-cm.msoc.defpoint.local
rsync legacy to local:
tshp CUST-splunk-cm
sudo su - splunk
# this can be run multiple times without issue. You may wish to do
# it first before you've stopped splunk to minimize the interruption.
time rsync --rsync-path="sudo rsync" -avz --delete --progress \
frederick.t.damstra@CUST-splunk-cm.msoc.defpoint.local:/opt/splunk/ /opt/splunk/ \
--exclude="*.log" --exclude '*.log.*' --exclude '*.bundle' --exclude ".ssh"
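The exclude set can be rehearsed locally before the real transfer; this self-contained sketch syncs two throwaway directories with the same flags (all paths here are scratch, not /opt/splunk):

```shell
# Sanity-check the exclude patterns using throwaway directories.
SRC=$(mktemp -d); DST=$(mktemp -d)
mkdir -p "$SRC/etc" "$SRC/.ssh"
echo conf > "$SRC/etc/server.conf"
touch "$SRC/splunkd.log" "$SRC/foo.bundle" "$SRC/.ssh/id_rsa"
rsync -avz --delete \
  --exclude="*.log" --exclude '*.log.*' --exclude '*.bundle' --exclude ".ssh" \
  "$SRC/" "$DST/"
ls "$DST"   # only etc/ survives; logs, bundles, and .ssh are skipped
```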
tshp salt-master
# stop splunk on old and new
salt 'CUST-*-cm*' service.stop splunk
salt 'CUST-splunk-cm.msoc.defpoint.local' service.disable splunk
exit
rsync legacy to local:
tshp CUST-splunk-cm
sudo su - splunk
# final pass now that splunk is stopped on both sides; this picks up
# anything that changed since the staging rsync above.
time rsync --rsync-path="sudo rsync" -avz --delete --progress \
frederick.t.damstra@CUST-splunk-cm.msoc.defpoint.local:/opt/splunk/ /opt/splunk/ \
--exclude="*.log" --exclude '*.log.*' --exclude '*.bundle' --exclude ".ssh"
Fix references:
find /opt/splunk -type f -name "*.conf" -exec grep -Hi msoc.defpoint.local {} \;
# fix anything found
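One way to "fix anything found" is the same sed substitution used on the indexers later; this scratch-directory sketch shows the discovery-plus-fix loop (verify on a copy before touching /opt/splunk):

```shell
# Self-contained demo of the find-and-fix on a scratch .conf file.
WORK=$(mktemp -d)
printf 'master_uri = https://moose-splunk-cm.msoc.defpoint.local:8089\n' \
  > "$WORK/server.conf"
# Same discovery command as above, scoped to the scratch dir:
find "$WORK" -type f -name "*.conf" -exec grep -Hi msoc.defpoint.local {} \;
# Fix in place (mirrors the substitution run on the indexers later):
find "$WORK" -type f -name "*.conf" \
  -exec sed -i 's/msoc\.defpoint\.local/pvt.xdr.accenturefederalcyber.com/g' {} \;
cat "$WORK/server.conf"
```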
cd
cd tmp
git clone git@github.xdr.accenturefederalcyber.com:mdr-engineering/msoc-moose-cm.git
cd msoc-moose-cm/
grep msoc `find . -type f`
# fix anything found
# Fix the coldToFrozen script: NOT ALL CUSTOMERS HAVE THIS
vim master-apps/TA-Frozen-S3/bin/coldToFrozenS3.py
# make it match ~/msoc-infrastructure/salt/fileroots/splunk/files/coldToFrozenS3.py
These changes will be deployed later.
ssh gc-prod-moose-splunk-cm
sudo chown -R splunk:splunk /opt/splunk
sudo systemctl start splunk
# check for issues:
sudo tail -F /opt/splunk/var/log/splunk/splunkd.log
# Make sure the bundle is active
sudo -u splunk /opt/splunk/bin/splunk show cluster-status
sudo -u splunk /opt/splunk/bin/splunk validate cluster-bundle
sudo -u splunk /opt/splunk/bin/splunk show cluster-bundle-status
sudo -u splunk /opt/splunk/bin/splunk apply cluster-bundle
sudo -u splunk /opt/splunk/bin/splunk show cluster-bundle-status
# optional but recommended
sudo -u splunk /opt/splunk/bin/splunk enable maintenance-mode
Get list of indexers:
tshp ls | grep moose | grep indexer
For each indexer, as quickly as possible:
ssh dev-moose-splunk-indexer-i-00d5ea4121238bb1b
sudo sed -i 's/^master_uri.*$/master_uri = https:\/\/moose-splunk-cm.pvt.xdr.accenturefederalcyber.com:8089/g' /opt/splunk/etc/system/local/server.conf
sudo sed -i 's/msoc.defpoint.local/pvt.xdr.accenturefederalcyber.com/g' /opt/splunk/etc/apps/license_slave/local/server.conf
sudo systemctl stop splunk
sudo systemctl start splunk
# verify via 'show cluster-status' on cm that it joined
sudo tail -F /opt/splunk/var/log/splunk/splunkd.log
ssh gc-prod-moose-splunk-cm
sudo -u splunk /opt/splunk/bin/splunk show cluster-status
sudo -u splunk /opt/splunk/bin/splunk disable maintenance-mode
sudo -u splunk /opt/splunk/bin/splunk apply cluster-bundle
# this last one should error, but it's a failsafe
ssh prod-moose-splunk-sh
sudo sed -i 's/msoc.defpoint.local/pvt.xdr.accenturefederalcyber.com/g' /opt/splunk/etc/apps/license_slave/local/server.conf
sudo sed -i 's/msoc.defpoint.local/pvt.xdr.accenturefederalcyber.com/g' /opt/splunk/etc/apps/moose_sh_outputs/default/outputs.conf
sudo sed -i 's/moose-splunk-cm/moose-splunk-cm.pvt.xdr.accenturefederalcyber.com/g' /opt/splunk/etc/apps/connected_clusters/local/server.conf
sudo systemctl restart splunk
sudo tail -F /opt/splunk/var/log/splunk/splunkd.log
Edit salt and update any references to the CM:
cd ~/msoc-infrastructure/salt/pillar
grep "moose-splunk-cm" `find . -type f`
# vim vim vim
git commit
git push
Then update the salt master, and:
ssh gc-prod-salt-master
sudo salt-run fileserver.update
salt '*' saltutil.refresh_pillar
# Don't want to start splunk on the old cluster master, though it likely wouldn't do anything.
salt -C 'moose* and not moose-splunk-cm.msoc*' state.highstate --output-diff --force-color test=true 2>&1 | less -iSR
# validate you expect the changes, and then do it for real:
salt -C 'moose* and not moose-splunk-cm.msoc*' state.highstate --output-diff test=false
This should only be necessary for moose:
salt '*.msoc.defpoint.local' state.sls internal_splunk_forwarder --output-diff --force-color test=true
salt '*.msoc.defpoint.local' state.sls internal_splunk_forwarder --output-diff --force-color test=false
salt '*.pvt*' state.sls internal_splunk_forwarder --output-diff --force-color test=true
salt '*.pvt*' state.sls internal_splunk_forwarder --output-diff --force-color test=false
#########################################
Good idea to check current usage:
salt 'moose-splunk-i*' cmd.run 'df -h | grep opt'
Terraform the cluster:
cd ~/xdr-terraform-live/test/aws-us-gov/mdr-test-c2/160-splunk-indexer-cluster
terragrunt apply
Verify the indexers come online:
ssh gc-prod-salt-master
sudo salt-key -L | grep idx
exit
date
ssh gc-dev-moose-splunk-cm
# I believe it can take up to about a half hour to come online
sudo -u splunk /opt/splunk/bin/splunk show cluster-status
# you may need to highstate a second time? Not clear what causes this to work sometimes and not others
All converted clients need the legacy hec module. This is not included in the skeleton directory.
Update salt:
This may only be necessary for moose, but check for other hecs, as well:
vim ~/xdr-terraform-live/prod/env.hcl
# Update hec, hec_pub, and hec_pub_ack
cd ~/xdr-terraform-live/prod/aws/legacy-mdr-prod/045-kinesis-firehose-waf-logs
terragrunt apply
cd ~/xdr-terraform-live/prod/aws/legacy-mdr-prod/045-kinesis-portal-data-sync
terragrunt apply
Log into the console and check the old HEC to get a feel for what sort of traffic to expect.
cd ~/msoc-infrastructure/terraform/100-moose
vim moose.tf
# Change create_hec_lb to false
tfswitch
terraform init
terraform workspace select prod
terraform apply
Note: HEC communications will be down until the next step is completed
cd ~/xdr-terraform-live/prod/aws-us-gov/mdr-prod-c2/165-splunk-legacy-hec
tfswitch
terragrunt init
terragrunt apply
numerous salt things... this only applies to moose so notes not kept
Validate that traffic is hitting the new HEC in a similar fashion to the old (remember that the negative TTL of 1 hour may apply)
On each indexer:
sudo systemctl disable splunk
sudo -u splunk /opt/splunk/bin/splunk offline --enforce-counts
sudo tail -F /opt/splunk/var/log/splunk/splunkd.log
Watch the status via the cluster master before moving on to the next:
sudo -u splunk /opt/splunk/bin/splunk show cluster-status
If you get integrity warnings, it is most likely because the 'sed' above added a newline. To fix, for each file that has the warning:
vim {file}
:set binary
:set noeol
:w!
ZZ
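A non-interactive alternative to the vim dance, assuming GNU coreutils' `truncate` is available: check that the last byte really is a newline, then drop it. Sketch only; test on a copy of the file first.

```shell
# Strip a single trailing newline, same effect as :set binary / :set noeol / :w!
F=$(mktemp)
printf 'serverName = foo\n' > "$F"     # stand-in for a file with the unwanted newline
if [ "$(tail -c1 "$F")" = "" ]; then   # command substitution strips \n, so empty == newline
  truncate -s -1 "$F"                  # drop the final byte
fi
```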