
New Customer Setup Notes - GovCloud

How to set up a new customer in govcloud

Future TODO:

  • Find a way to seed the splunk secrets without putting them in the git history?

Assumptions

Assumes your GitHub repos are cloned directly under your ~ directory (adjust paths accordingly), that this is a fresh account, and that you're on macOS.

Prerequisites:

# There may be more that I just already had. If you find them, add them:
pip3 install passlib

Get your own OKTA API Key (Don't use Fred's)

If you don't have an OKTA API key then you should go get one.

Step x: Bootstrap the account

Follow the instructions in (AWS New Account Setup Notes.md) to bootstrap the account.

Step x: Gather information

You will need the following. Setting environment variables will help with some of the future steps, but manual substitution can be done, too.

IMPORTANT NOTE: Each time you run this, it will generate new passwords. So make sure you use the same window to perform all steps!

Commands were tested on macOS and may not (probably won't) work on Windows or Linux.

export OKTA_API_TOKEN=<YOUR OKTA API KEY>
INITIALS=ftd
TICKET=MSOCI-1513
# prefix should have hyphens
CUSTOMERPREFIX=modelclient
PASS4KEY=`uuidgen | tr '[:upper:]' '[:lower:]'`
DISCOVERYPASS4KEY=`uuidgen | tr '[:upper:]' '[:lower:]'`
ADMINPASS="`openssl rand -base64 24`"
MINIONPASS="`openssl rand -base64 24`"
ESSJOBSPASS="`openssl rand -base64 24`"
# If the below doesn't work for you, generate your SHA-512 hashes for splunk however you'd like
ADMINHASH="`echo $ADMINPASS | python3 -c "from passlib.hash import sha512_crypt; print(sha512_crypt.hash(input(), rounds=5000))"`"
MINIONHASH="`echo $MINIONPASS | python3 -c "from passlib.hash import sha512_crypt; print(sha512_crypt.hash(input(), rounds=5000))"`"
ESSJOBSHASH="`echo $ESSJOBSPASS | python3 -c "from passlib.hash import sha512_crypt; print(sha512_crypt.hash(input(), rounds=5000))"`"
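One quick, hypothetical sanity check on the output: passlib's sha512_crypt hashes always begin with $6$, so a truncated or malformed hash is easy to spot before it gets committed anywhere:

```shell
# Hypothetical helper: sha512_crypt hashes have the form $6$<salt>$<digest>,
# so anything not starting with "$6$" means the generation step failed.
looks_like_sha512_crypt() {
  case "$1" in
    '$6$'*) return 0 ;;
    *) return 1 ;;
  esac
}
# Example: looks_like_sha512_crypt "$ADMINHASH" || echo "regenerate ADMINHASH"
```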

Step x: Record Passwords in Vault

Connect to production VPN

Log into vault at https://vault.pvt.xdr.accenturefederalcyber.com (legacy: https://vault.mdr.defpoint.com)

Record the following into secrets/engineering/customer_slices/${CUSTOMERPREFIX}

echo $ADMINPASS  # record as `${CUSTOMERPREFIX}-splunk-cm admin`

At this time, we don't set the others on a per-account basis through salt, though it looks like the admin password has been changed for some clients.

Step x: Update and Branch Git

cd ~/msoc-infrastructure
git checkout develop
git fetch --all
git pull origin develop
git checkout -b feature/${INITIALS}_${TICKET}_CustomerSetup_${CUSTOMERPREFIX}

cd ~/xdr-terraform-live
git checkout master
git fetch --all
git pull origin master
git checkout -b feature/${INITIALS}_${TICKET}_CustomerSetup_${CUSTOMERPREFIX}

Step x: Set up Okta

cd ~/msoc-infrastructure/tools/okta_app_maker

./okta_app_maker.py ${CUSTOMERPREFIX}' Splunk SH [Prod] [GC]' "https://${CUSTOMERPREFIX}-splunk.pvt.xdr.accenturefederalcyber.com"
./okta_app_maker.py ${CUSTOMERPREFIX}' Splunk CM [Prod] [GC]' "https://${CUSTOMERPREFIX}-splunk-cm.pvt.xdr.accenturefederalcyber.com:8000"
./okta_app_maker.py ${CUSTOMERPREFIX}' Splunk HF [Prod] [GC]' "https://${CUSTOMERPREFIX}-splunk-hf.pvt.xdr.accenturefederalcyber.com:8000"

Each run of okta_app_maker.py will generate output similar to:

{% if grains['id'].startswith('<REPLACEME>') %}
auth_method: "saml"
okta:
  # This is the entityId / IssuerId
  uid: "http://www.okta.com/exk5kxd31hsbDuV7m297"
  # Login URL / Signon URL
  login: "https://mdr-multipass.okta.com/app/mdr-multipass_modelclientsplunkshtestgc_1/exk5kxd31hsbDuV7m297/sso/saml"
{% endif %}

Substitute REPLACEME with ${CUSTOMERPREFIX}-splunk-sh, -cm, or -hf as appropriate, and record the results. You will need all 3.
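For example, using the sample CUSTOMERPREFIX of modelclient, the recorded search-head block would begin:

```jinja
{% if grains['id'].startswith('modelclient-splunk-sh') %}
```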

Add permissions for the okta apps:

  1. Log into the okta webpage (https://mdr-multipass.okta.com/)
  2. Go to Admin->Applications
  3. For each ${CUSTOMERPREFIX} site, click 'Assign to Groups' and add the following groups:

  • For Search heads:
    • Analyst
    • mdr-admins
    • mdr-engineers
  • For CM:
    • mdr-admins
    • mdr-engineers

Step x: Add the license file to salt

mkdir ~/msoc-infrastructure/salt/fileroots/splunk/files/licenses/${CUSTOMERPREFIX}
cd ~/msoc-infrastructure/salt/fileroots/splunk/files/licenses/${CUSTOMERPREFIX}
# Copy license into this directory. 
# If license is not yet available, ... ? Not sure. For testing, I copied something in there but that's not a good practice.

Step x: Set up Scaleft

  • Add the "project"
  • Assign groups to the project
    • mdr-admins: admin / sync group yes
    • mdr-engineers: user / sync group no
  • Make an enrollment token
  • Put the enrollment token in ~/msoc-infrastructure/salt/pillar/os_settings.sls, under the jinja if/else

Step x: Set up the pillars

Each customer gets a pillars file for its own variables.

(n.b. DISCOVERYPASS4KEY must be done before PASS4KEY to replace correctly)
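The ordering matters because PASS4KEY is a literal substring of DISCOVERYPASS4KEY, so replacing PASS4KEY first mangles the longer placeholder. A quick illustration:

```shell
# Wrong order: the PASS4KEY substitution eats the tail of the DISCOVERYPASS4KEY placeholder.
echo "DISCOVERYPASS4KEY" | sed "s#PASS4KEY#aaa#g"
# -> DISCOVERYaaa

# Right order: the longer placeholder is replaced first, leaving nothing for the second sed to mangle.
echo "DISCOVERYPASS4KEY" | sed "s#DISCOVERYPASS4KEY#bbb#g" | sed "s#PASS4KEY#aaa#g"
# -> bbb
```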

cd ~/msoc-infrastructure/salt/pillar/

# Append the customer variables to a topfile
echo "  '${CUSTOMERPREFIX}*':" >> top.sls
echo "    - ${CUSTOMERPREFIX}_variables" >> top.sls

# Generate the password file
cat customer_variables.sls.skeleton \
  | sed s#PREFIX#${CUSTOMERPREFIX}#g \
  | sed s#DISCOVERYPASS4KEY#${DISCOVERYPASS4KEY}#g \
  | sed s#PASS4KEY#${PASS4KEY}#g \
  | sed s#MINIONPASS#${MINIONPASS}#g \
  | sed s#ESSJOBSPASS#${ESSJOBSPASS}#g \
  > ${CUSTOMERPREFIX}_variables.sls

# Append okta configuration
cat >> ${CUSTOMERPREFIX}_variables.sls
# Paste the 3 okta entries here, and finish with ctrl-d

Review the file to make sure everything looks good.

Add to gitfs:

vim salt_master.conf
# Duplicate one of the existing customer_repos entries for the new customer
vim ~/msoc-infrastructure/salt/fileroots/salt_master/files/etc/salt/master.d/default_acl.conf
git add salt_master.conf ~/msoc-infrastructure/salt/fileroots/salt_master/files/etc/salt/master.d/default_acl.conf

Migrate pillars through to master:

git add top.sls ${CUSTOMERPREFIX}_variables.sls
git commit -m "Adds ${CUSTOMERPREFIX} variables. Will promote to master immediately."
git push origin feature/${INITIALS}_${TICKET}_CustomerSetup_${CUSTOMERPREFIX}

Follow the link to create the PR, and then submit another PR to master.

Step x: Update the salt master

Once approved, update the salt master

ssh gc-prod-salt-master
sudo vim /etc/salt/master.d/default_acl.conf
# Grant users access to the new prefix
# save and exit
salt 'salt*' cmd.run 'salt-run fileserver.update'
sudo service salt-master restart
exit

Step x: Create customer repositories

For now, we only use a repository for the CM; eventually we will need repositories for the others.

Create a new repository using the cm template:

  1. Browse to https://github.mdr.defpoint.com/mdr-engineering/msoc-skeleton-cm
  2. Click "use this template"
    a. Name the new repository msoc-${CUSTOMERPREFIX}-cm
    b. Give it the description: Splunk Cluster Master Configuration for [CUSTOMER DESCRIPTION]
    c. Set permissions to 'Private'
    d. Click 'create repository from template'
  3. Click on 'Settings', then 'Collaborators and Teams', and add the following:
    • infrastructure - Admin
    • automation - Read
    • onboarding - Write

Clone and modify the password (TODO: Just take care of this in salt):

mkdir ~/tmp
cd ~/tmp
git clone git@github.mdr.defpoint.com:mdr-engineering/msoc-${CUSTOMERPREFIX}-cm.git
cd msoc-${CUSTOMERPREFIX}-cm
sed -i "" "s#ADMINHASH#${ADMINHASH}#" passwd
sed -i "" "s#MINIONHASH#${MINIONHASH}#" passwd
git add passwd
git commit -m "Stored hashed passwords"
git push origin master
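Since sed exits 0 even when a pattern never matches, here is a hypothetical check that the placeholders were really replaced before you commit:

```shell
# Hypothetical helper: fail if a hash placeholder survived the sed substitutions.
check_no_placeholders() {
  if grep -qE 'ADMINHASH|MINIONHASH' "$1"; then
    echo "unreplaced placeholder in $1"
    return 1
  fi
}
# Example: check_no_placeholders passwd
```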

Step x: Set up xdr-terraform-live account

During the bootstrap process, you copied the skeleton across. Review the variables.

cd ~/xdr-terraform-live/prod/aws-us-gov/${CUSTOMERPREFIX}
vim account.hcl # Fill in all "TODO" items. Leave the "LATER" variables for later steps.

Step x: Add account to global variables, and apply necessary prerequisites

  1. Add the account number to account_map["prod"] in:
    • ~/xdr-terraform-live/prod/aws-us-gov/partition.hcl
    • ~/xdr-terraform-live/common/aws-us-gov/partition.hcl
  2. cd ~/xdr-terraform-live/prod/aws-us-gov/mdr-prod-c2
  3. Apply the modules: (* draft: are there more external requirements? *)

    for module in 005-account-standards-c2 008-transit-gateway-hub
    do
    pushd $module
    terragrunt apply
    popd
    done
    
  4. cd ~/xdr-terraform-live/common/aws-us-gov/afs-mdr-common-services-gov/

  5. Apply the modules:

    for module in 008-xdr-binaries 010-shared-ami-key 
    do
    pushd $module
    terragrunt apply
    popd
    done
    

Step x: Apply the Terraform in order

The xdr-terraform-live/bin directory should be in your path. You will need it for this step:

(n.b., if you are certain everything is good to go, you can put yes yes | before the terragrunt-apply-all to bypass prompts. This leaves you no way out if you make a mistake, however, because it is difficult to break out of terragrunt/terraform without causing issues.)

cd ~/xdr-terraform-live/prod/aws-us-gov/${CUSTOMERPREFIX}
terragrunt-apply-all --skipqualys

You might run into an error when applying the VPC module 010-vpc-splunk. The error reads:

Error: Invalid for_each argument
  on tgw.tf line 26, in resource "aws_route" "route_to_10":
  26:   for_each = toset(concat(module.vpc.private_route_table_ids, module.vpc.public_route_table_ids))
The "for_each" value depends on resource attributes that cannot be determined
until apply, so Terraform cannot predict how many instances will be created.
To work around this, use the -target argument to first apply only the
resources that the for_each depends on.

Workaround is:

cd 010-vpc-splunk
terragrunt apply -target module.vpc
terragrunt apply

Step x: Connect to Qualys

For complete details, see https://github.mdr.defpoint.com/mdr-engineering/msoc-infrastructure/wiki/Qualys.

Short version:

  1. Browse to https://mdr-multipass.okta.com/, and pick qualys
  2. In Qualys Console click "Create EC2 Connector", this will pop a wizard.
  3. Name the connector and pick account type based on partition
  4. Copy the External ID and put it in account.hcl as qualys_connector_externalid (search for 'LATER')
  5. Apply the terraform, it will output qualys_role_arn
  6. Copy that into the Qualys console, hit "Continue"
  7. Pick the Regions that should be in scope (all of them), hit "Continue"
  8. Check all the "Automatically Activate" buttons and pick the tag(s)
  9. Should be done with the wizard now. Back in the main list view click the drop-down and pick "Run" to pull current Assets

It should come back with a number of assets, no errors, and an hourglass for a bit.

Step x: Finalize the Salt

Substitute environment variables here:

ssh gc-prod-salt-master
CUSTOMERPREFIX=<entercustomerprefix>
sudo salt-key -L | grep $CUSTOMERPREFIX # Wait for all 6 servers to be listed (cm, sh, hf, and 3 idxs)
sleep 300 # Wait 5 minutes
salt ${CUSTOMERPREFIX}\* test.ping
# Repeat until 100% successful
salt ${CUSTOMERPREFIX}\* state.highstate --output-diff
# Review the changes from above. I've seen indexers get hung; if they do, see the note below
# splunk_service may fail, this is expected (it's waiting for port 8000)
salt ${CUSTOMERPREFIX}\* system.reboot
# Wait 5 minutes
salt ${CUSTOMERPREFIX}\* test.ping
# Apply the cluster bundle
salt ${CUSTOMERPREFIX}\*-cm\* state.sls splunk.master.apply_bundle_master --output-diff
exit
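The "repeat until successful" steps above lend themselves to a small retry helper (a hypothetical sketch; the salt commands themselves are unchanged):

```shell
# Hypothetical helper: retry <delay-seconds> <max-tries> <command...>
retry() {
  delay=$1; max=$2; shift 2
  n=1
  until "$@"; do
    [ "$n" -lt "$max" ] || return 1
    n=$((n + 1))
    sleep "$delay"
  done
}
# Example: retry 30 10 salt "${CUSTOMERPREFIX}*" test.ping
```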

Note: If systems get hung on their bootup highstate

System hangs appear to be caused by a race condition between the startup of firewalld and its configuration. No known solution at this time. If a system hangs, reboot it and apply a highstate again.

Update the salt pillars with the encrypted forms

TODO: Document a step of updating the pillars/${CUSTOMERPREFIX}_variables.sls with encrypted forms of the passwords.

Step x: Test

Log into https://${CUSTOMERPREFIX}-splunk.pvt.xdr.accenturefederalcyber.com

It should "just work".

  1. Does the UI on the cluster master look sane?
  2. Can you SAML into everything?
  3. Looking at the indexers via the CLI, does everything look believable?

Some helpful sanity check searches

Should see 3 indexers:

index=_* | stats count by splunk_server

Should see all but the HF:

index=_* | stats count by host

Note from Fred: I'm leaving this next one here, copied from the legacy instructions, but I'm not sure where it's supposed to be run. My test on the search head didn't have any results.

Note from Duane: Should work anywhere. Main goal was to see that the cluster bundle got pushed correctly and all the indexes we were expecting to see were listed. I should probably improve this search at some point.

| rest /services/data/indexes splunk_server=*splunk-i*
| stats values(homePath_expanded) as home, values(coldPath_expanded) as cold, values(tstatsHomePath_expanded) as tstats by title 
| sort home

Additional tasks:

Splunk configuration

  • Install ES on the search head

Monitoring Console

  • Add cluster to monitoring console
  • Peer with CM, SH, and HF
  • Update MC topology

Create New Vault Engine for Customer for Feed Management

Naming scheme: onboarding-<customer>. Example: onboarding-la-covid

Keep George Happy and push out maxmind

salt -C '*splunk-indexer* or *splunk-sh* or *splunk-hf*' state.sls splunk.maxmind.pusher --state-verbose=False --state-output=terse

Got POP nodes? Ensure they are talking to Moose Splunk for Splunk UFs