
New Customer Setup Notes - GovCloud

How to set up a new customer in govcloud

Future TODO:

  • Find a way to seed the splunk secrets without putting them in the git history?

Assumptions

Assumes your GitHub repos are cloned directly under your ~ directory (adjust paths accordingly), that this is a fresh AWS account, and that you're on macOS.

Prerequisites:

# There may be more that I just already had. If you find them, add them:
pip3 install passlib
pip3 install requests
pip3 install dictdiffer

Get your own OKTA API Key (Don't use Fred's)

If you don't have an OKTA API key then you should go get one. See Okta API to Create token

Step x: Bootstrap the account

Follow the instructions in AWS New Account Setup Notes to bootstrap the account.

Step x: Gather information

You will need the following. Setting environment variables will help with some of the future steps, but manual substitution can be done, too.

:warning: IMPORTANT: Each time you run this, it will generate new passwords. So make sure you use the same window to perform all steps!

Do you have a Splunk license yet? No? Can you use a temp/dev license until the real one shows up? I hate doing that, but not much of a choice.

Do you know what size Splunk Indexers to use? Ask Wes to provide this information. Or look in the finance folder in OneDrive ( XDR Finance Private Group > Documents > General > Cost build up ). You might not have permissions to view it.

Commands were tested on macOS and may not (probably won't) work on Windows or Linux.

export OKTA_API_TOKEN=<YOUR OKTA API KEY>
INITIALS=bp
TICKET=MSOCI-1550
# multi-word prefixes should be separated with hyphens (e.g. la-covid)
CUSTOMERPREFIX=modelclient
PASS4KEY=`uuidgen | tr '[:upper:]' '[:lower:]'`
DISCOVERYPASS4KEY=`uuidgen | tr '[:upper:]' '[:lower:]'`
ADMINPASS="`openssl rand -base64 24`"
MINIONPASS="`openssl rand -base64 24`"
REPORTSPASS="`openssl rand -base64 24`"
ESSJOBSPASS="`openssl rand -base64 24`"
# If the below doesn't work for you, generate your SHA-512 hashes for splunk however you'd like
ADMINHASH="`echo $ADMINPASS | python3 -c "from passlib.hash import sha512_crypt; print(sha512_crypt.hash(input(), rounds=5000))"`"
MINIONHASH="`echo $MINIONPASS | python3 -c "from passlib.hash import sha512_crypt; print(sha512_crypt.hash(input(), rounds=5000))"`"
ESSJOBSHASH="`echo $ESSJOBSPASS | python3 -c "from passlib.hash import sha512_crypt; print(sha512_crypt.hash(input(), rounds=5000))"`"
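A quick optional sanity check (a sketch, assuming bash; the zsh default shell on newer macOS does not support the ${!v} indirection used here) to confirm nothing came out empty before you continue:

# Verify the generated secrets/hashes are set in this shell
for v in PASS4KEY DISCOVERYPASS4KEY ADMINPASS MINIONPASS REPORTSPASS ESSJOBSPASS ADMINHASH MINIONHASH ESSJOBSHASH; do
  [ -n "${!v}" ] && echo "$v is set" || echo "WARNING: $v is EMPTY"
done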

Step x: Record Passwords in Vault

Connect to Production VPN

Log into Vault at PROD Vault

Record the following into engineering/customer_slices/${CUSTOMERPREFIX}

echo $ADMINPASS  # record as `${CUSTOMERPREFIX}-splunk-cm admin`
echo "${CUSTOMERPREFIX}-splunk-cm admin"

At this time, we don't set the others on a per-account basis through Salt, though it looks like admin password has been changed for some clients.
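If you prefer the Vault CLI over the UI, a minimal sketch (assumes an authenticated vault CLI session and that a KV v2 engine is mounted at engineering/, as the UI path suggests):

# Writes a new secret version containing only this key; use 'vault kv patch' later to add keys
vault kv put engineering/customer_slices/${CUSTOMERPREFIX} \
  "${CUSTOMERPREFIX}-splunk-cm admin"="${ADMINPASS}"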

Step x: Update and Branch Git

You may have already created a new branch in xdr-terraform-live in a previous step.

cd ~/msoc-infrastructure
git checkout develop
git fetch --all
git pull origin develop
git checkout -b feature/${INITIALS}_${TICKET}_CustomerSetup_${CUSTOMERPREFIX}

#if needed...
cd ~/xdr-terraform-live
git checkout master
git fetch --all
git pull origin master
git checkout -b feature/${INITIALS}_${TICKET}_CustomerSetup_${CUSTOMERPREFIX}

cd ~/xdr-terraform-modules
git checkout master
git fetch --all
git pull origin master
git checkout -b feature/${INITIALS}_${TICKET}_CustomerSetup_${CUSTOMERPREFIX}

Step x: Set up Okta

Notice: okta_app_maker.py keeps whatever case CUSTOMERPREFIX was entered in. You may want to manually type an uppercase version of ${CUSTOMERPREFIX} for the app names instead of using the lowercase ${CUSTOMERPREFIX} here, or add code to force uppercase in okta_app_maker.py.
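If you want an uppercase version of the prefix for the app display names without editing okta_app_maker.py, a small sketch (CUSTOMERPREFIX_UC is just a local helper variable, not used elsewhere in these notes):

# Uppercase copy of the prefix for Okta display names
CUSTOMERPREFIX_UC=$(echo "${CUSTOMERPREFIX}" | tr '[:lower:]' '[:upper:]')
echo $CUSTOMERPREFIX_UC

You could then substitute ${CUSTOMERPREFIX_UC} for the display-name portion of the commands below.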

cd tools/okta_app_maker

./okta_app_maker.py ${CUSTOMERPREFIX}' Splunk SH [Prod] [GC]' "https://${CUSTOMERPREFIX}-splunk.pvt.xdr.accenturefederalcyber.com"
./okta_app_maker.py ${CUSTOMERPREFIX}' Splunk CM [Prod] [GC]' "https://${CUSTOMERPREFIX}-splunk-cm.pvt.xdr.accenturefederalcyber.com:8000"
./okta_app_maker.py ${CUSTOMERPREFIX}' Splunk HF [Prod] [GC]' "https://${CUSTOMERPREFIX}-splunk-hf.pvt.xdr.accenturefederalcyber.com:8000"

Each run of okta_app_maker.py will generate output similar to:

{% if grains['id'].startswith('<REPLACEME>') %}
auth_method: "saml"
okta:
  # This is the entityId / IssuerId
  uid: "http://www.okta.com/exk5kxd31hsbDuV7m297"
  # Login URL / Signon URL
  login: "https://mdr-multipass.okta.com/app/mdr-multipass_modelclientsplunkshtestgc_1/exk5kxd31hsbDuV7m297/sso/saml"
{% endif %}

Substitute <REPLACEME> with ${CUSTOMERPREFIX}-splunk-sh, -cm, or -hf and record the resulting blocks. You will need all 3.
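If you'd rather script the substitution than edit by hand, a sketch (assumes you saved each okta_app_maker.py output to its own temp file; okta_sh.txt, okta_cm.txt, and okta_hf.txt are example names only):

# Example only: fill in <REPLACEME> per role in the saved output files
for role in sh cm hf; do
  sed -i '' "s/<REPLACEME>/${CUSTOMERPREFIX}-splunk-${role}/" okta_${role}.txt
done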

Add permissions for the Okta apps:

  1. Log into the Okta webpage - MDR Multipass
  2. Go to Admin->Applications
  3. For each ${CUSTOMERPREFIX} application, click 'Assign to Groups' and add the following groups:

  • For Search heads:
    • Analyst
    • mdr-admins
    • mdr-engineers
  • For CM & HF:
    • mdr-admins
    • mdr-engineers

  4. While logged into Okta, add the Splunk logo to each app. The logo is located in msoc-infrastructure/tools/okta_app_maker

Step x: Add the license file to salt

cd ~/msoc-infrastructure
mkdir salt/fileroots/splunk/files/licenses/${CUSTOMERPREFIX}
cd salt/fileroots/splunk/files/licenses/${CUSTOMERPREFIX}
# Copy license into this directory. 
# Rename license to match this format trial-<license-size>-<expiration-date>.lic
# e.g. trial-15gb-20210305.lic
# If license is not a trial, match this format SO<sales order number>_PO<purchase order number>.lic
# e.g. SO180368_PO7500026902.lic
# If license is not yet available, ... ? Not sure. For testing, I copied something in there but that's not a good practice.

Step x: Set up the pillars

Each customer gets a pillars file for its own variables. If you are setting up the syslog servers with Splunk, you will need to replace the FIXME value in the deployment_server pillar. The correct value of the deployment_server pillar is a customer provided DNS address pointing to the IP of the LCP deployment server.

:warning: IMPORTANT - In your sed commands, DISCOVERYPASS4KEY must be done before PASS4KEY to replace correctly.

#cd ~/msoc-infrastructure/salt/pillar/
cd ../../../../../pillar/

# Append the customer variables to a topfile
echo "  '${CUSTOMERPREFIX}*':" >> top.sls
echo "    - ${CUSTOMERPREFIX}_variables" >> top.sls

# Generate the password file
cat customer_variables.sls.skeleton \
  | sed s#PREFIX#${CUSTOMERPREFIX}#g \
  | sed s#DISCOVERYPASS4KEY#${DISCOVERYPASS4KEY}#g \
  | sed s#PASS4KEY#${PASS4KEY}#g \
  | sed s#MINIONPASS#${MINIONPASS}#g \
  | sed s#REPORTSPASS#${REPORTSPASS}#g \
  | sed s#ESSJOBSPASS#${ESSJOBSPASS}#g \
  > ${CUSTOMERPREFIX}_variables.sls

# Append okta configuration
cat >> ${CUSTOMERPREFIX}_variables.sls
# Paste the 3 okta entries here, and finish with ctrl-d

Review the file to make sure everything looks good. If there will be multiple LCP clusters, you can set up the file with multiple serverclass.conf entries and multiple deployment servers: vim ${CUSTOMERPREFIX}_variables.sls
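A quick double-check of the generated pillar before committing (a sketch; the grep patterns are only illustrative):

# Any FIXME placeholders left behind (e.g. deployment_server)?
grep -n 'FIXME' ${CUSTOMERPREFIX}_variables.sls
# Should print 3 (one auth_method line per pasted Okta block)
grep -c 'auth_method: "saml"' ${CUSTOMERPREFIX}_variables.sls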

Add to gitfs pillars and allow salt access:

# In the salt/pillar/salt_master.sls file, copy one of the customer_repos and update with the new customer prefix. Update both the CM repo and the DS repo (deployment_servers), unless you know there will not be LCP/POP nodes. 
vim salt/pillar/salt_master.sls

# Add customer prefix to ACL (the example substitution below uses 'dgi' as the new prefix; use your ${CUSTOMERPREFIX})
vim ../fileroots/salt_master/files/etc/salt/master.d/default_acl.conf
:%s/ca-c19\*/ca-c19\* or dgi\*/g

# Add Account number to xdr_asset_inventory.sh under GOVCLOUDACCOUNTS
vim ../fileroots/salt_master/files/xdr_asset_inventory/xdr_asset_inventory.sh


Migrate changes through to master branch:

git add ../fileroots/splunk/files/licenses/${CUSTOMERPREFIX}/<your-license-file>
git add ../fileroots/salt_master/files/etc/salt/master.d/default_acl.conf
git add ../fileroots/salt_master/files/xdr_asset_inventory/xdr_asset_inventory.sh
git add salt_master.sls top.sls ${CUSTOMERPREFIX}_variables.sls os_settings.sls
git commit -m "Adds ${CUSTOMERPREFIX} variables. Will promote to master immediately."
git push origin feature/${INITIALS}_${TICKET}_CustomerSetup_${CUSTOMERPREFIX}

Follow the link to create the PR, then submit another PR to master and get the changes merged into the master branch.

Step x: Apply changes to salt master

Apply the changes to xdr_asset_inventory on the salt master:

salt 'salt*' state.sls salt_master --output-diff

Step x: Create customer repositories

For now, we only use a repository for the CM and POP. Clearly, we need one for the others.

Create a new repository using the cm template:

  1. Browse to msoc-skeleton-cm
  2. Click "use this template"

    a. Name the new repository msoc-${CUSTOMERPREFIX}-cm

    b. Give it the description: Splunk Cluster Master Configuration for [CUSTOMER DESCRIPTION]

    c. Set permissions to 'Private'

    d. Click 'create repository from template'

  3. Click on 'Settings', then 'Collaborators and Teams', and add the following:

    • infrastructure - Admin
    • automation - Read
    • onboarding - Write

Repeat for pop repo, unless customer will not have pop nodes.

  1. Browse to msoc-skeleton-pop
  2. Click "use this template"

    a. Name the new repository msoc-${CUSTOMERPREFIX}-pop

    b. Give it the description: Splunk POP Configuration for [CUSTOMER DESCRIPTION]

    c. Set permissions to 'Private'

    d. Click 'create repository from template'

  3. Click on 'Settings', then 'Collaborators and Teams', and add the following:

    • infrastructure - Admin
    • automation - Read
    • onboarding - Write

Clone and modify the CM repo:

mkdir ~/tmp
cd ~/tmp
# alternate cd ../../../
git clone git@github.xdr.accenturefederalcyber.com:mdr-engineering/msoc-${CUSTOMERPREFIX}-cm.git
cd msoc-${CUSTOMERPREFIX}-cm
sed -i "" "s#ADMINHASH#${ADMINHASH}#" passwd
sed -i "" "s#MINIONHASH#${MINIONHASH}#" passwd
sed -i "" "s#CHANGEME#${CUSTOMERPREFIX}-prod#" master-apps/indexer_volume_indexes/local/indexes.conf
git add passwd master-apps/indexer_volume_indexes/local/indexes.conf
git commit -m "Stored hashed passwords"
git push origin master

Step x: Update the salt master with new configs

Now that we have the git repos created, let's update the salt master.

ssh gc-prod-salt-master
salt 'salt*' cmd.run 'salt-run fileserver.update'
salt 'salt*' state.sls salt_master.salt_master_configs --output-diff test=true
sudo salt 'salt*' state.sls salt_master.salt_posix_acl --output-diff test=true
exit

Step x: Set up xdr-terraform-live account

During the bootstrap process, you copied the skeleton across. Review the variables.

cd ~/xdr-terraform-live/prod/aws-us-gov/mdr-prod-${CUSTOMERPREFIX}
# cd ../xdr-terraform-live/prod/aws-us-gov/mdr-prod-${CUSTOMERPREFIX}
vim account.hcl # Fill in all "TODO" items. Leave the "LATER" variables for later steps.

  1. Update the ref version in all the terragrunt.hcl files to match the latest tag on the modules repo (Git Repo Tag Modules). Replace v1.XX.XX with the current tag:

../../../bin/update_refs --newtag v1.XX.XX

Just use the script above; the manual find/sed commands below are kept only for reference:

  2. `find . -name "terragrunt.hcl" -not -path "*/.terragrunt-cache/*" -exec sed -i '' s/?ref=v1.21.0/?ref=v1.x.x/ {} \;`
  2. `find . -name "terragrunt.hcl" -not -path "*/.terragrunt-cache/*" -exec sed -i '' s/?ref=v1.0.0/?ref=v1.x.x/ {} \;`

Did you get them all? Don't forget about the subfolders in account_standards_regional.
`cat */terragrunt.hcl | grep ref | grep -v 1.xx.xx`
`cat */*/terragrunt.hcl | grep ref`

Step x: Update xdr-terraform-modules, add the account to global variables, and apply necessary prerequisites

  1. Add the account number to account_map["prod"] in :
    • ~/xdr-terraform-modules/variables/accounts.tf
  2. cd ~/xdr-terraform-live/prod/aws-us-gov/mdr-prod-c2 OR cd ../mdr-prod-c2/
  3. Create PR and get changes approved for both live and module repos. Commit message should be, "Adds ${CUSTOMERPREFIX} customer"
  4. Apply the modules:

Copy and paste these commands into cmd line and run them.

for module in 005-account-standards-c2 008-transit-gateway-hub
do
  pushd $module
  terragrunt apply
  popd
done

One-liner: `for module in 005-account-standards-c2 008-transit-gateway-hub; do pushd $module; terragrunt apply; popd; done`

  1. cd ~/xdr-terraform-live/common/aws-us-gov/afs-mdr-common-services-gov/
  2. cd ../../../common/aws-us-gov/afs-mdr-common-services-gov/
  3. Apply the modules:

    for module in 008-xdr-binaries 010-shared-ami-key 050-lcp-ami-sharing 300-s3-xdr-trumpet */300-s3-xdr-trumpet
    do
    pushd $module
    terragrunt apply
    popd
    done
    

If you get "A conflicting conditional operation is currently in progress against this resource.", run it again. (TODO: Maybe this could be solved with a depends_on block in the right place?)

If you get other errors try...

rm -rf .terragrunt-cache
terragrunt init --upgrade

in each of the folders.

Step x: Share the AMI with the new account

The new AWS account needs permissions to access the AMIs before trying to create EC2 instances. Replace the aws-account-id in the below command.

cd ~/xdr-terraform-live/bin/ 
# OR cd ../../../bin/
# Dump a list of AMIs matching the filter just to get a good looky-loo
AWS_PROFILE=mdr-common-services-gov update-ami-accounts 'MSOC*'

# Now do the actual sharing of the AMIs with your new account
AWS_PROFILE=mdr-common-services-gov update-ami-accounts 'MSOC*' <aws-account-id>

One common problem here. You may need to add region= to your $HOME/.aws/config for mdr-common-services-gov, like so:

[profile mdr-common-services-gov]
source_profile = govcloud
role_arn = arn:aws-us-gov:iam::701290387780:role/user/mdr_terraformer
region = us-gov-east-1
color = ff0000

Also add the new account number to the packer build so that when new AMIs get built they are shared automatically with this account.

cd ~/msoc-infrastructure/packer 
#or cd ../../msoc-infrastructure/packer
vi Makefile
# Add the account(s) to GOVCLOUD_ACCOUNTS / COMMERCIAL_ACCOUNTS
# as needed.  PR it and exit
cd ../../xdr-terraform-live/bin/

Step x: Apply the Terraform in order

The xdr-terraform-live/bin directory should be in your path. You will need it for this step:

:warning: IMPORTANT: if you are certain everything is good to go, you can put `yes yes |` in front of the terragrunt-apply-all to bypass prompts. This does not leave you an out if you make a mistake, however, because it is difficult to break out of terragrunt/terraform without causing issues.

cd ~/xdr-terraform-live/prod/aws-us-gov/mdr-prod-${CUSTOMERPREFIX} 
# OR cd ../prod/aws-us-gov/mdr-prod-${CUSTOMERPREFIX}
terragrunt-apply-all --skipqualys --notlocal

You might run into an error when applying the module 006-account-standards.

Error creating CloudTrail: InsufficientS3BucketPolicyException: Incorrect S3 bucket policy is detected for bucket: xdr-cloudtrail-logs-prod

Resolution: Did you run terragrunt apply in mdr-prod-c2/005-account-standards-c2 ???

You might run into an error when applying the VPC module 010-vpc-splunk. Error reads as:

Error: Invalid for_each argument
  on tgw.tf line 26, in resource "aws_route" "route_to_10":
  26:   for_each = toset(concat(module.vpc.private_route_table_ids, module.vpc.public_route_table_ids))
The "for_each" value depends on resource attributes that cannot be determined
until apply, so Terraform cannot predict how many instances will be created.
To work around this, use the -target argument to first apply only the
resources that the for_each depends on.

Workaround is:

cd 010-vpc-splunk
terragrunt apply -target module.vpc
terragrunt apply
cd ..

You might run into an error when applying the test instance module 025-test-instance. Error reads as:

Error: Your query returned no results. Please change your search criteria and try again.

Workaround is:

You forgot to share the AMI with the new account. See the instructions above and run this command in the appropriate folder and replace the aws-account-id.

cd ~/xdr-terraform-live/bin/
AWS_PROFILE=mdr-common-services-gov update-ami-accounts 'MSOC*' <aws-account-id>

Error: "x.x.x.x/32" is not a valid CIDR block: invalid CIDR address: x.x.x.x/32 Issue: the splunk_data_sources variable in account.hcl is not filled out correctly. Solution: If you don't have LCP IPs, yet, just comment this line out.

Error: No RAM Resource Share (arn:aws-us-gov:ram:us-gov-east-1:721817724804:resource-share/b116e73a-82c4-4f84-910f-2b3a53bf45) invitation found Issue: The resource share on the mdr-prod-c2 side is messed up.

Solution: Try manually applying /xdr-terraform-live/prod/aws-us-gov/mdr-prod-c2/008-transit-gateway-hub then run the customer apply again.

Error: GET https://github.xdr.accenturefederalcyber.com/api/v3/repos//fm_source: 401 Must authenticate to access this API. Solution: export GITHUB_TOKEN=<github_token>

Step x: Finalize the Salt

Substitute environment variables here:

tshp salt-master
CUSTOMERPREFIX=<enter customer prefix>
salt-key | grep $CUSTOMERPREFIX # Wait for all 6 servers to be listed (cm, sh, hf, and 3 idxs)
sleep 300 # Wait 5 minutes
salt ${CUSTOMERPREFIX}-\* test.ping
# Repeat until 100% successful

salt ${CUSTOMERPREFIX}-\* saltutil.sync_all
salt ${CUSTOMERPREFIX}-\* saltutil.refresh_pillar
salt ${CUSTOMERPREFIX}-\* saltutil.refresh_modules
salt ${CUSTOMERPREFIX}-\* grains.get environment
salt ${CUSTOMERPREFIX}-\* state.highstate --output-diff
# Review changes from above. Indexers have been known to hang; if they do, see the note below.

# splunk_service may fail, this is expected (it's waiting for port 8000)
# Sensu will not be happy until you apply the cluster bundle!
salt ${CUSTOMERPREFIX}-\* test.version
salt ${CUSTOMERPREFIX}-\* pkg.upgrade  # this may break connectivity if there is a salt minion upgrade!
salt ${CUSTOMERPREFIX}-\* system.reboot --async
# Wait 5+ minutes

salt ${CUSTOMERPREFIX}-\* test.ping

# Apply the cluster bundle
salt ${CUSTOMERPREFIX}-\*-cm\* state.sls splunk.master.apply_bundle_master --output-diff
# Test connectivity
salt ${CUSTOMERPREFIX}-\*-idx\* cmd.run 'sudo -u splunk /opt/splunk/bin/splunk cmd splunkd rfs ls volume:smartstore | head -10'
# Verify that there's output and not an empty bucket. Empty bucket could mean the s3 bucket name is misspelled.
exit

Note: If systems get hung on their boot-up highstate:

System hangs appear to be because of a race condition with startup of firewalld and its configuration. No known solution at this time. If it hangs, you can reboot the system and apply a highstate again.
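A sketch of the recovery for a hung minion (the minion ID here is a placeholder; use whichever host is stuck):

# <hung-minion-id> is a placeholder, e.g. one of the indexers
salt <hung-minion-id> system.reboot --async
# wait for it to come back
salt <hung-minion-id> test.ping
salt <hung-minion-id> state.highstate --output-diff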

Update the salt pillars with the encrypted forms

On salt-master, run (Do not repeat with test=false!):

salt ${CUSTOMERPREFIX}-splunk-sh\* state.sls splunk.search_head --output-diff test=true

Record the new value for pass4SymmKey. It should start with $7$ (if it doesn't, perhaps splunk hasn't started yet?)
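A sketch for pulling the encrypted value straight off the SH rather than hunting through the state output (assumes the clustering stanza lands in /opt/splunk/etc/system/local/server.conf; adjust the path if your config writes it elsewhere):

# Show the encrypted ($7$...) pass4SymmKey on the search head
salt ${CUSTOMERPREFIX}-splunk-sh\* cmd.run 'grep -A3 "\[clustering\]" /opt/splunk/etc/system/local/server.conf'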

Edit the pillar/${CUSTOMERPREFIX}_variables.sls file. Replace splunk:idxc:pass4SymmKey and cluster_secret with the hash. Do not replace splunk:idxc:discovery-pass4SymmKey. Then open a PR all the way to master.

Once merged to master, on the salt master, run:

sudo salt-run fileserver.update
salt ${CUSTOMERPREFIX}-splunk\* saltutil.refresh_pillar
salt ${CUSTOMERPREFIX}-splunk\* state.highstate --output-diff test=true
salt ${CUSTOMERPREFIX}-splunk\* state.highstate --output-diff --batch-size 1 test=false

(Note the batch size: since we are restarting indexers, we want to do one at a time.)

Step x: Test

Log into https://${CUSTOMERPREFIX}-splunk.pvt.xdr.accenturefederalcyber.com

echo "https://${CUSTOMERPREFIX}-splunk.pvt.xdr.accenturefederalcyber.com"

echo "https://${CUSTOMERPREFIX}-splunk-cm.pvt.xdr.accenturefederalcyber.com:8000"

It should "just work".

  1. Does the UI on the cluster master look sane?
  2. Can you SAML into everything?
  3. Look on the indexers via the CLI: does everything look believable?

Some helpful sanity check searches

Should see 3 indexers:

index=_* | stats count by splunk_server

Should see all but the HF ( you might see the HF ):

index=_* | stats count by host

Note from Fred: I'm leaving this next one here, copied from the legacy instructions, but I'm not sure where it's supposed to be run. My test on the search head didn't have any results.

Note from Duane: Should work anywhere. Main goal was to see that the cluster bundle got pushed correctly and all the indexes we were expecting to see were listed. I should probably improve this search at some point.

Note from Brad: Donkey! ( see Shrek 2 Dinner scene ). You should see non-default indexes such as app_* from the below search.

| rest /services/data/indexes splunk_server=*splunk-i*
| stats values(homePath_expanded) as home, values(coldPath_expanded) as cold, values(tstatsHomePath_expanded) as tstats by title 
| sort home

Additional tasks:

Add Tenant in Customer Portal

NOTE: This step requires you to be an Admin on the Customer Portal, Splunk Soar, and TQ, and to be on the whitelist.

Obtain the Splunk Soar tenant ID: log into Splunk Soar > Home > Administration > Product Settings > Multi-tenancy. The ID is in the ID column. Create a new tenant if needed. Get this from Greg if you are not able to create it; it might also be in the Jira ticket.

Log in to the Portal: https://portal.xdr.accenturefederalcyber.com/choose/login/ and the admin site: https://portal.xdr.accenturefederalcyber.com/admin/ (you might have to be off the VPN to access the admin site due to IP blocks).

# Add the new company
Companies > Add >

Name: Full name
Address: TBD
Splunk Soar tenant id: (from the step above)
Short name: (Upper case CustomerPrefix)
Endpoint min: 0
Endpoint max: 500
Gb max: Get this from the customer dossier; the link can be found in the Jira ticket for the Phantom tenant. Look for Splunk Info and size.
Contract start date: see customer dossier
Contract end date: see customer dossier
Idp name: OKTA

Portal Admin

Portal Lambda Env Var for Sync

In Prod Vault, update the Customer Portal Vars
Vault > portal > lambda_sync_env

Create new fields CUSTOMER{NAME, Splunk_URL, TOKEN, TQ_EXPORT_NAME_DETECT, TQ_EXPORT_NAME_HUNT} and input data into their values.

To get CUSTOMER__TOKEN: you need admin access to the Customer Portal. The token is auto-generated when the customer is created in the portal.
Portal Admin Login > Tokens > look for correct customer name.

To get the CUSTOMER__TQ_EXPORT* values: TQ login > Settings cogwheel > Exports > Add New Export.
!!! IMPORTANT !!! The TQ Connection Settings Token MUST match the other customers and the TQ_EXPORT_TOKEN in Vault! The Lambda code depends on it.
NOTICE: If no data has been tagged in TQ for this customer, the Lambda sync will have nothing to export.
NOTICE: Use these Splunk searches AFTER data has been tagged in TQ:

| inputlookup tip_intel
| inputlookup hunt_intel

TODO: Migrate this to Salt. In the customer Splunk SH, create the svc-portal-data-sync-lambda user.
Settings > user > new user
Name: svc-portal-data-sync-lambda
Full Name: Portal Data Sync
Set password: (get from Vault portal > lambda_sync_env > SPLUNK_PASSWORD)
Assign Roles: svc_essjobs (remove the "user" role)
Uncheck: Require password change on first login

Smoke Test

In the AWS console, go to mdr-prod-c2-gov, Services > Lambda > portal_customer_sync > Test > New event > Template "TQSyncTest" > Name "TQSyncTestDOED". Update customer_list accordingly, then save changes and hit Test. You should see statusCode 200. In the Log output, you should see Export Fetch Completed and Completed TQ-SPLUNK-SYNC.

Splunk configuration

  • Install ES on the Search Head, version 6.6.2 (unless George wants a different version)

  • Download ES app from Splunk using your AFS splunk creds

  • Check hash on your laptop shasum -a 256 splunk-enterprise-security_<version>.spl

  • Temporarily modify the etc/system/local/web.conf to allow large uploads

    [settings]
    max_upload_size = 1024
    

On the salt master...

#CUSTOMERPREFIX=modelclient
salt ${CUSTOMERPREFIX}-splunk-sh* test.ping
salt ${CUSTOMERPREFIX}-splunk-sh* cmd.run 'ls -larth /opt/splunk/etc/system/local/web.conf'
salt ${CUSTOMERPREFIX}-splunk-sh* cmd.run 'touch /opt/splunk/etc/system/local/web.conf'
salt ${CUSTOMERPREFIX}-splunk-sh* cmd.run 'chown splunk: /opt/splunk/etc/system/local/web.conf'
salt ${CUSTOMERPREFIX}-splunk-sh* cmd.run 'echo "[settings]" > /opt/splunk/etc/system/local/web.conf'
salt ${CUSTOMERPREFIX}-splunk-sh* cmd.run 'echo "max_upload_size = 1024" >> /opt/splunk/etc/system/local/web.conf'
salt ${CUSTOMERPREFIX}-splunk-sh* cmd.run 'cat /opt/splunk/etc/system/local/web.conf'
salt ${CUSTOMERPREFIX}-splunk-sh* cmd.run 'systemctl restart splunk'
  • Upload via the GUI ( takes a long time to upload )
  • Choose "Set up now" and "Start Configuration Process"
  • ES should complete app actions on its own, then prompt for a restart ( or need a manual restart; check messages)

Remove the web.conf file

salt ${CUSTOMERPREFIX}-splunk-sh* cmd.run 'cat /opt/splunk/etc/system/local/web.conf'
salt ${CUSTOMERPREFIX}-splunk-sh* cmd.run 'rm -rf /opt/splunk/etc/system/local/web.conf'
salt ${CUSTOMERPREFIX}-splunk-sh* cmd.run 'systemctl restart splunk'

Monitoring Console and FM Shared Search ( skip if demo cluster )

  • Add master_uri and pass4symmkey to salt/pillar/mc_variables.sls and salt/pillar/fm_shared_search.sls
  • echo $PASS4KEY
  • Commit to git with the message, "Adds variables for Monitoring Console and FM SH" and once approved, apply the changes.

    # SSH to the salt master
    sudo salt-run fileserver.update
    sudo salt-run git_pillar.update
    salt splunk-mc* state.sls splunk.monitoring_console --output-diff test=true
    salt fm-shared-search-0* state.sls splunk.fm_shared_search --output-diff test=true
    

In both MC and FM SH do the following

  • After applying the code, pull the encrypted values of the pass4symmkey out of the Splunk config file and replace them in the salt state. Then create a PR for git. Set the git commit message to, "Swaps pass4symmkey with encrypted value".
  • salt splunk-mc* cmd.run 'cat /opt/splunk/etc/apps/connected_clusters/local/server.conf'
  • Ensure new cluster is showing up in the Settings -> Indexer Clustering ( should see 4 check marks and at least 3 peers ). If not, verify firewall rules.

Make changes to just the Monitoring Console SH

  • Add CM as a search peer by going to Settings -> Distributed Search -> Search Peers -> New Search Peer
  • Input the Peer URI (echo https://${CUSTOMERPREFIX}-splunk-cm.pvt.xdr.accenturefederalcyber.com:8089) and remote admin credentials. For the CM, the remote admin credentials are in Vault at engineering -> customer_slices -> ${CUSTOMERPREFIX} or echo $ADMINPASS
  • Repeat for SH and HF and use the correct Splunk creds. salt ${CUSTOMERPREFIX}-splunk-sh* pillar.get secrets:splunk_admin_password
  • Verify all new customer instances are connected as search peers by searching for the customer prefix on the search peers page.
  • Update MC topology ( settings -> Monitoring Console -> Settings -> General Setup -> Apply Changes )
  • Search in the Remote Instances section to ensure all the instances are displayed.

Create New Vault KV Engine for Customer for Feed Management

  • Log into Vault
  • Enable new engine of type KV
  • Change the path and enable the engine. Naming scheme: onboarding-<customerprefix>. Example: onboarding-la-covid
  • Create the LCP Build Sheet if the customer needs LCP nodes

    Go to XDR Documentation Customer Onboarding

    Do you see a customer folder already created? Put the Build Sheet in there. If not, go to Documents > Onboarding > LCP Build Sheets

    Copy the Blank Template LCP Build Sheet and rename it with the customer prefix (find and replace).

    Email Other XDR Groups

    TO: "XDR-Feed-Management" xdr.feed.management@accenturefederal.com; XDR-Compliance xdr.compliance@accenturefederal.com; "Starcher, George" george.a.starcher@accenturefederal.com CC: "XDR-Engineering" xdr.eng@accenturefederal.com; XDR-Customer-Success xdr.customer.success@accenturefederal.com

    SUBJECT: CUSTOMERPREFIX AWS Customer Slice Servers Ready

    Hello,
    
    This is notification that the CUSTOMERPREFIX AWS servers are ready for configuration. 
    
    Successfully Completed Tasks
    - Salt highstate completed successfully
    - Splunk installed successfully
    - Splunk ES App installed successfully
    - Servers fully patched and rebooted successfully
    - Servers sending logs to Splunk customer slice successfully
    - Servers sending logs to Splunk Moose successfully
    - Servers connecting to Sensu successfully
    - Servers are showing up in Tenable
    - Login via Teleport is successful
    - TQ Lambda Sync is connecting successfully
    
    These are the new servers that will show up in the inventory:
    <list of servers>
    

    Got LCP nodes?

    Got customer public IPs after you were done standing up the Splunk cluster? This section is for you!

    Not sure on the Public IP? Check the VPC Flow logs. See any Cloudwatch REJECT logs?

    Ensure the eni is correct for PROD salt-master. Adjust src_ip for customer.

    index=app_aws_flowlogs sourcetype="aws:cloudwatchlogs:vpcflow" vpcflow_action=REJECT eni-017d2e433b9f821d8 4506 src_ip=52.*
    |  timechart span=1d count by src_ip
    
    index=app_aws_flowlogs eni-017d2e433b9f821d8 dest_port IN (4505,4506) |  timechart count by src_ip
    

    Got LCP Nodes in Customer AWS Account?

    Need to share the LCP AMIs with a customer-owned AWS account (NOTICE: NOT the XDR customer slice AWS account!)? Ask Duane for clarification, if needed.
    This section might need to be moved.

    #Share the AMI with the customer's AWS account
    AWS_PROFILE=mdr-common-services-gov update-ami-accounts '*LCP*' <customeraccountID>
    

    This step can be skipped if the LCP is already deployed in the AWS account. If the customer needs to regularly deploy new LCP images, then add them to the packer hcl files.

    #Update the packer files to share future AMIs with the customer's AWS account
    #update one or the other for commercial/govcloud
    ~/msoc-infrastructure/packer/lcp/aws/vars-commercial-prod.hcl
    ~/msoc-infrastructure/packer/lcp/aws/vars-govcloud-prod.hcl
    

    Got a Customer AWS Account?

    The new customer-owned ( NOTICE: NOT CUSTOMER SLICE! ) AWS account needs permissions to access the XDR Trumpet S3 bucket, which allows the CloudFormation template to run in the customer's AWS account. This only needs to be done if the customer has an AWS account. If the customer has LCPs in an AWS account, the account should be added in case they want to use this service in the future.

    In xdr-terraform-live, edit common/*/partition.hcl file based on where the Customer AWS account is.

    • commercial = aws
    • govcloud = aws-us-gov

    If the customer has both, then edit both files.

    vim ../common/aws/partition.hcl
    # OR/AND
    vim ../common/aws-us-gov/partition.hcl
    

    Update the customer_accounts variable. NOT the account_map!!!

    Run terragrunt-local apply to apply the changes to xdr-terraform-live/common/aws-us-gov/afs-mdr-common-services-gov/300-s3-xdr-trumpet and us-gov-west-1/300-s3-xdr-trumpet.

    Steps to allow LCP nodes through SG

    Splunk IPs

    *** NOTICE: add the IPs to TWO locations in splunk_data_sources.tf. One for Moose and one for customer account. ***

    Look for afs-mdr-prod-c2-gov AND the customer account. Add the customer account if not found.

    cd xdr-terraform-modules/variables
    vim splunk_data_sources.tf  
    

    Add IPs for Salt, etc

    The IPs also need to be allowed for the salt-master, sensu, etc.
    vim xdr-terraform-modules/variables/customer_ips.tf
    vim customer_ips.tf
    Edit the c2_services_external_ips map and be sure to add a description.

    Git Merge

    Open PR and get merged to master.

    terragrunt apply
    

    Apply IPs for Salt, Splunk, etc

    Apply in prod/aws-us-gov/mdr-prod-c2/160-splunk-indexer-cluster for Moose, mdr-prod-$CUSTOMERPREFIX/160-splunk-indexer-cluster for the customer Splunk, 095-instance-sensu, 080-instance-repo-server, 071-instance-salt-master, and 275-nessus-security-managers, or use terragrunt-apply-all.

    cd ../../mdr-prod-c2/095-instance-sensu/
    terragrunt apply
    

    Are there going to be LCP nodes?

    These commands will add the pop settings pillar. Qualys is legacy. Go to Qualys Dashboard -> Cloud Agent -> Activation Keys -> New Key
    Title name scheme: $CUSTOMERPREFIX-lcp-nodes
    Provision Key for Vuln Management and Policy compliance.
    Create and add a new tag to the activation key with a title called $CUSTOMERPREFIX with parent tag, CustomerPOP. Don't add any Tag Rules. ( Use the create link ). Then click on Generate.

    Copy an existing customer's _pop_settings.sls and rename it to ${CUSTOMERPREFIX}_pop_settings.sls. Put the activation key in pillar/$CUSTOMERPREFIX_pop_settings.sls. The qualys_customer_id is the same for all customers.
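    A sketch of the copy/rename (someother_pop_settings.sls is a placeholder for whichever existing customer file you copy from):

    cd ~/msoc-infrastructure/salt/pillar
    # 'someother' is a placeholder for an existing customer with a pop_settings file
    cp someother_pop_settings.sls ${CUSTOMERPREFIX}_pop_settings.sls
    vim ${CUSTOMERPREFIX}_pop_settings.sls  # paste in the new Qualys activation key; qualys_customer_id stays the same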

    CUSTOMERPREFIX=modelclient

    1. add LCP nodes to the pillar top file

      cd salt/pillar
      echo "  '${CUSTOMERPREFIX}* and G@msoc_pop:True':" >> top.sls
      echo "    - match: compound" >> top.sls
      echo "    - ${CUSTOMERPREFIX}_pop_settings" >> top.sls
      
    2. add LCP nodes to the salt top file

      cd ../fileroots/
      echo "  '${CUSTOMERPREFIX}-splunk-syslog*':" >> top.sls
      echo "    - splunk.heavy_forwarder" >> top.sls
      echo "    - splunk.pop_hf_license" >> top.sls
      echo "    - syslog" >> top.sls
      
    3. add Syslog-ng Skeleton config

      cd syslog/files/customers
      cp -r skeleton/ ${CUSTOMERPREFIX}
      

    Commit all the changes to git and open PR. Once the settings are in the master branch, come back and run these commands on the salt-master.

    CUSTOMERPREFIX=modelclient
    CUSTOMERLOC=<customer location grain>  # optional
    salt -C "${CUSTOMERPREFIX}* and G@msoc_pop:True" test.ping
    salt -C "${CUSTOMERPREFIX}* and G@location:${CUSTOMERLOC}" test.ping
    #are the LCP images up-to-date on the salt minion version? See Salt Upgrade Notes.md. and salt/fileroots/salt_minion/minion_upgrade.sls 
    salt -C "${CUSTOMERPREFIX}* and G@msoc_pop:True" test.version
    salt -C "${CUSTOMERPREFIX}* and G@msoc_pop:True" saltutil.sync_all
    salt -C "${CUSTOMERPREFIX}* and G@msoc_pop:True" saltutil.refresh_pillar
    salt -C "${CUSTOMERPREFIX}* and G@msoc_pop:True" saltutil.refresh_modules
    #did the customer set the roles correctly?
    salt -C "${CUSTOMERPREFIX}* and G@msoc_pop:True" grains.get roles
    salt -C "${CUSTOMERPREFIX}* and G@msoc_pop:True" grains.get customer
    # lifecycle is used for the RedHat subscription
    salt -C "${CUSTOMERPREFIX}* and G@msoc_pop:True" grains.get lifecycle
    salt -C "${CUSTOMERPREFIX}* and G@msoc_pop:True" grains.get environment
    
    #Issue with a grain? Look in the LCP troubleshooting section below. 
    #ensure the ec2:billing_products grain is EMPTY unless node is in AWS. ( Do we get the RH subscription from AWS? Not for Vmware/Hyper-v/OCI LCP nodes )
    salt -C "${CUSTOMERPREFIX}* and G@msoc_pop:True" grains.get ec2:billing_products
    #make sure the activation-key pillar is available. Lifecycle grain must be set. ( VMware/Hyper-V/OCI Only )
    salt -C "${CUSTOMERPREFIX}* and G@msoc_pop:True" pillar.get os_settings:rhel:rh_subscription:activation-key
    #wrong subscription? check the lifecycle grain, pillar.top, and salt/fileroots/os_modifications/rhel_registration.sls
     #OCI/VMware LCP nodes need manual RH Subscription enrollment. Before removing test=true, ensure the command is filled out with the pillar, unless they are in AWS.
    salt -C "${CUSTOMERPREFIX}* and G@msoc_pop:True" state.sls os_modifications.rhel_registration test=true
    # try out the os_modifications_redhat then try high state
    salt -C "${CUSTOMERPREFIX}* and G@msoc_pop:True" state.sls os_modifications_redhat --output-diff
    # Ensure the pillars are correct for the DS. These are needed for successful highstate. You should have added them in a previous step.   
    salt ${CUSTOMERPREFIX}-splunk-ds\* pillar.get serverclass_file
    salt ${CUSTOMERPREFIX}-splunk-* pillar.get deployment_server
    

    Run the highstate, unless you want to configure the syslog-ng drives first.
    Start with the DS: salt ${CUSTOMERPREFIX}-splunk-ds* state.highstate --output-diff

    Next with syslog servers ( You should see syslog-ng errors due to no syslog-ng partition )
    salt ${CUSTOMERPREFIX}-splunk-syslog-* state.highstate --output-diff

    Run the highstate twice to ensure all the needed changes were made.

    Add 100GB Drive to Syslog Servers

    Manually setup the 100 GB drives for the two syslog servers.

    # CUSTOMERLOC is for customers that have multiple LCP clusters and is optional. 
    CUSTOMERLOC=<customer location grain>
    #Alternate: salt -C "${CUSTOMERPREFIX}*syslog* and G@location:${CUSTOMERLOC}"
    #find the drive name
    salt ${CUSTOMERPREFIX}-splunk-syslog-\* cmd.run 'lsblk | grep 100G'
    salt ${CUSTOMERPREFIX}-splunk-syslog-\* cmd.run 'df -hT'
    CUSTOMERDRIVE=<drivename> # should look like nvme3n1 or sdb
    salt ${CUSTOMERPREFIX}-splunk-syslog-\* cmd.run "lsblk /dev/${CUSTOMERDRIVE}"
    salt ${CUSTOMERPREFIX}-splunk-syslog-\* lvm.pvcreate /dev/${CUSTOMERDRIVE}
    salt ${CUSTOMERPREFIX}-splunk-syslog-\* lvm.vgcreate vg_syslog /dev/${CUSTOMERDRIVE}
    salt ${CUSTOMERPREFIX}-splunk-syslog-\* lvm.lvcreate lv_syslog vg_syslog extents=100%FREE
    salt ${CUSTOMERPREFIX}-splunk-syslog-\* cmd.run 'mkfs -t xfs /dev/vg_syslog/lv_syslog'
    salt ${CUSTOMERPREFIX}-splunk-syslog-\* partition.list /dev/${CUSTOMERDRIVE}
    

    Add the syslog state to the highstate for the customer and apply the highstate. The syslog state will mount the drive and add it to fstab.

    If you do not run the highstate in full, be sure to begin monitoring the new partition by running:

    salt ${CUSTOMERPREFIX}-splunk-syslog-\* state.sls sensu_agent --output-diff
    

    Configure the Customer LCP Git Repository

    Add DS serverclass.conf and Apps. Be sure to name the server class to match the pillar value.

    1. Add the passwd to the Customer DS git repo.

      # cd to Customer git repo on laptop
      DSADMINPASS="`openssl rand -base64 24`"
      echo $DSADMINPASS
      DSADMINHASH="`echo $DSADMINPASS | python3 -c "from passlib.hash import sha512_crypt; print(sha512_crypt.hash(input(), rounds=5000))"`"
      echo $DSADMINHASH
      echo ":admin:${DSADMINHASH}::Administrator:admin:changeme@example.com:::50000" > passwd
      

    Store the DSADMINPASS in Vault in the engineering/customer_slices/$CUSTOMERPREFIX secret. Create a new version with a key named ${CUSTOMERPREFIX}-splunk-ds admin (run echo "${CUSTOMERPREFIX}-splunk-ds admin" to get the exact name).
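    A Vault CLI sketch for this write (assumes an authenticated vault session and a KV v2 engine at engineering/; kv patch adds the key without overwriting the existing ones):

    vault kv patch engineering/customer_slices/${CUSTOMERPREFIX} \
      "${CUSTOMERPREFIX}-splunk-ds admin"="${DSADMINPASS}"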

    Grab Salt Minion user password

     MINIONPASS=$(grep minion_pass ../msoc-infrastructure/salt/pillar/${CUSTOMERPREFIX}_variables.sls | cut -d '"' -f 2)
    echo $MINIONPASS
    MINIONHASH="`echo $MINIONPASS  | python3 -c "from passlib.hash import sha512_crypt; print(sha512_crypt.hash(input(), rounds=5000))"`"
    echo $MINIONHASH
    echo ":minion:${MINIONHASH}::Salt Minion:saltminion::::50000" >> passwd
    cat passwd
    

    Put these values in the passwd file in the Customer DS git repo (msoc-$CUSTOMERPREFIX-pop) in the root directory. Use the below command to help verify the password hashed correctly (OPTIONAL).

    echo $MINIONPASS  | python3 -c "from passlib.hash import sha512_crypt; print(sha512_crypt.hash(input(), salt='<YOUR-SALT-HERE>', rounds=5000))"
    
    1. Add the appropriate apps to the Customer DS git repo (msoc-CUSTOMERPREFIX-pop). The minimum apps are cust_hf_outputs, xdr_pop_minion_authorize, xdr_pop_ds_summaries.

    update the cust_hf_outputs app ( command specific for MAC OS )
    sed -i '' -e 's/SYSLOG_SERVER/'"${CUSTOMERPREFIX}-splunk-indexers.xdr.accenturefederalcyber.com"'/g' deployment-apps/cust_hf_outputs/local/outputs.conf

    Commit the changes to the git repo.

    git add passwd
    git add deployment-apps/cust_hf_outputs/
    git commit -m "Adds ${CUSTOMERPREFIX} LCP variables. Will promote to master immediately."
    git push origin master
    
    1. If the customer is going to have multiple LCP clusters, then rename serverclass.conf in the Customer DS git repo so it carries the correct name. The name should be serverclass_.conf. An alternate to the location grain is the value from the buildsheet.
    2. Move the files to the LCP. You can highstate the minions.

      sudo salt-run fileserver.update
      salt -C "${CUSTOMERPREFIX}* and G@msoc_pop:True" state.highstate --output-diff
      
      # then patch and reboot ( did you check the salt minion version? )
      salt -C "${CUSTOMERPREFIX}* and G@msoc_pop:True" pkg.upgrade
      salt -C "${CUSTOMERPREFIX}* and G@msoc_pop:True" system.reboot 
      

      Verify Splunk Connectivity

      Can you see the DS logs in the customer slice Splunk? If you don't have a DNS address for the deployment server you will not see the LCP syslog node logs.

      Find out the hostnames for the LCP nodes.
      salt -C "${CUSTOMERPREFIX}* and G@msoc_pop:True" cmd.run 'hostname'

      Customer Slice ( only DS will show up due to outputs app not on syslog servers ) Did you remember to add the LCP IPs to the Customer slice SG?
      index=_internal NOT host="*.pvt.xdr.accenturefederalcyber.com" source="/opt/splunk/var/log/splunk/splunkd.log" earliest=-1h

      Moose, Did you remember to add the LCP IPs to the Moose SG? ( should see all three LCP nodes )
      index=_internal earliest=-1h host=<host-from-previous-command>

      Verify HF to DS connectivity

      • Verify the customer HF can connect to the DS on port 8089, e.g. cmd.run 'nc -vz <ds-hostname> 8089' or network.connect <ds-hostname> port=8089 (see the sketch below)

        Got DNS?

        Check DNS resolution first (dig A <ds-hostname>), then re-test network.connect on port 8089.
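        A minimal sketch of the check from the salt master, assuming the LCP syslog/HF minions are the ones that need to reach the DS and that lcp-ds.customer.example.com is a placeholder for the customer-provided DS DNS name:

        # placeholder DS DNS name; use the value from the build sheet / pillar
        DSHOST=lcp-ds.customer.example.com
        # DNS: does the name resolve from the LCP nodes? (dig requires bind-utils on the minion)
        salt ${CUSTOMERPREFIX}-splunk-syslog-\* cmd.run "dig +short A ${DSHOST}"
        # TCP: can the LCP nodes reach the DS management port?
        salt ${CUSTOMERPREFIX}-splunk-syslog-\* network.connect ${DSHOST} port=8089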

        Email Feed Management

        TO: "XDR-Feed-Management" xdr.feed.management@accenturefederal.com; XDR-Compliance xdr.compliance@accenturefederal.com CC: "XDR-Engineering" xdr.eng@accenturefederal.com; XDR-Customer-Success xdr.customer.success@accenturefederal.com

        SUBJECT: CUSTOMERPREFIX LCP Servers Ready

        Hello,
        
        This is notification that the CUSTOMERPREFIX LCP servers are ready for Feed Management to configure for customer use. 
        
        Successfully Completed Tasks
        - Salt highstate completed successfully
        - Splunk and syslog-ng installed successfully
        - Servers fully patched and rebooted successfully
        - Customer DS sending logs to Splunk customer slice successfully
        - Servers sending logs to Splunk Moose successfully
        - Servers connecting to Sensu successfully
        - Customer HF can connect to DS on port 8089 successfully
        - Servers are showing up in Tenable
        - Login via Teleport is successful
        
        Next Step: 
        Request DNS entries from the customer for the DS and syslog servers then update the pillar/git repo with the correct DNS entries. 
        
        These are the new servers that will show up in the inventory:
        <list of servers>
        
        

        LCP Troubleshooting

        REMEMBER: Our customers are responsible for setting up the salt minion with grains and for allowing traffic through the outbound firewall. If they have not done that yet, you will get more errors.

        ISSUE: Help, the customer grain is not showing up! This means the customer did not do their job!
        SOLUTION: This command will add a static grain in /etc/salt/minion.d/minion_role_grains.conf. Update variables accordingly.
        salt 'target' state.sls salt_minion.salt_grains_lcp pillar='{"customer": "afs", "location": "Ashburn", "lifecycle": "production"}' --output-diff test=true

        ISSUE: Deployment Server is not running the reload_ds state file correctly and the error, "Client is not authorized to perform requested action" is showing up.
        SOLUTION: ensure the splunk minion user has the correct splunk role assigned in the passwd file.

        ISSUE: [ERROR ][2798] Failed to import grains ec2_tags, this is due most likely to a syntax error
        SOLUTION: Python 3 is needed; upgrade salt!

        ISSUE: http://pkg.scaleft.com/rpm/repodata/repomd.xml: [Errno 12] Timeout on http://pkg.scaleft.com/rpm/repodata/repomd.xml: (28, 'Operation too slow. Less than 1000 bytes/sec transferred the last 30 seconds')
        Trying other mirror.
        SOLUTION: Fix connectivity issues to scaleft
        TEMP FIX: run the upgrade with the Okta ASA repo disabled, e.g. cmd.run 'yum -y upgrade --disablerepo=okta_asa_repo_add'
        cmd.run 'yum install python-virtualenv -y --disablerepo=okta_asa_repo_add'

        ISSUE:

        2021-02-16 21:25:51,126 [salt.loaded.int.module.cmdmod:854 ][ERROR   ][26641] Command '['useradd', '-U', '-M', '-d', '/opt/splunk', 'splunk']' failed with return code: 9
            2021-02-16 21:25:51,127 [salt.loaded.int.module.cmdmod:858 ][ERROR   ][26641] stderr: useradd: group splunk exists - if you want to add this user to that group, use -g.
            2021-02-16 21:25:51,127 [salt.loaded.int.module.cmdmod:860 ][ERROR   ][26641] retcode: 9
            2021-02-16 21:25:51,127 [salt.state       :328 ][ERROR   ][26641] Failed to create new user splunk
        

        SOLUTION: Manually create user and add to splunk group OR delete group and create user+group in one command.
        cmd.run 'useradd -M -g splunk -d /opt/splunk splunk'

        ISSUE:

        splunk pkg.install 
        Public key for splunk-8.0.5-a1a6394cc5ae-linux-2.6-x86_64.rpm is not installed
            Retrieving key from https://docs.splunk.com/images/6/6b/SplunkPGPKey.pub
        
        
            GPG key retrieval failed: [Errno 14] curl#35 - "TCP connection reset by peer"
        

        TEMP FIX: cmd.run 'yum --disablerepo=okta_asa_repo_add -y --nogpgcheck install splunk'