# Migration to SmartStore

Splunk documentation: https://docs.splunk.com/Documentation/Splunk/latest/Indexer/AboutSmartStore

Items of note:

* SmartStore data retention is managed cluster-wide.
* Only `maxGlobalDataSizeMB`, `maxGlobalRawDataSizeMB`, and `frozenTimePeriodInSecs` control when to freeze data.
* The most restrictive rule applies.
* When buckets freeze, they are removed from both remote and local storage.

## Prerequisites

### Create the Target S3 Bucket

Commit [34d2254](https://github.xdr.accenturefederalcyber.com/mdr-engineering/xdr-terraform-modules/commit/34d2254fbde1553e05e4171f148fdc4159e469c6) enabled the creation of a SmartStore (S2) specific S3 bucket for every customer. To add it to existing customers/slices, copy the `145-splunk-smartstore-s3/` directory from `xdr-terraform-live/test/aws-us-gov/mdr-test-modelclient/` to the target customer's directory, update the referenced tag to v3.2.14 or higher if necessary, then run `terragrunt-local apply` to create the customer's SmartStore S3 bucket.

### Ensure the Index Cluster's Search Factor and Replication Factor Are Equal

SF must equal RF. This setting is usually found in the Cluster Manager's `/etc/system/local/server.conf`.

### Ensure the Value of `maxDataSize` Is Set to "auto"

This has been corrected in the [msoc-skeleton-cm](https://github.xdr.accenturefederalcyber.com/mdr-engineering/msoc-skeleton-cm) configuration for all future customers. See commit [052d212](https://github.xdr.accenturefederalcyber.com/mdr-engineering/msoc-skeleton-cm/commit/052d2120c4d01dd4c5f7e84f7bc13e0eb2bb732d). For existing customers, it may be necessary to remove a second `maxDataSize` entry (`maxDataSize = 5000`) in `master-apps/all_indexes/local/indexes.conf`.

### Add `minFreeSpace = 20%` to the `[diskUsage]` Stanza in `server.conf`

This can be placed in the CM's `master-apps/_cluster/local/server.conf` or in an app of its own.
```ini
[diskUsage]
minFreeSpace = 20%
```

### Create the SmartStore (Remote) Volume

Change the value of `path` to match the S3 bucket name created by Terraform.

```ini
[volume:smartstore]
storageType = remote
# Do we want a path or drop everything into '/'?
path = s3://xdr-CUSTOMER-ENVIRONMENT-splunk-smartstore/
remote.s3.endpoint = https://s3.us-gov-east-1.amazonaws.com
remote.s3.supports_versioning = true
remote.s3.encryption = sse-kms
remote.s3.kms.key_id = alias/SmartStore
remote.s3.kms.auth_region = us-gov-east-1

# SSL settings for S3 communications
remote.s3.sslVerifyServerCert = true
remote.s3.sslVersions = tls1.2
remote.s3.sslAltNameToCheck = s3.us-gov-east-1.amazonaws.com
# https://www.amazontrust.com/repository/SFSRootCAG2.pem
remote.s3.sslRootCAPath = $SPLUNK_HOME/etc/auth/SFSRootCAG2.pem
remote.s3.cipherSuite = ECDHE-ECDSA-AES128-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:AES128-SHA256:AES256-SHA256:AES256-SHA:DHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES256-SHA256
remote.s3.ecdhCurves = prime256v1, secp384r1, secp521r1

# SSL settings for KMS communication
remote.s3.kms.sslVerifyServerCert = true
remote.s3.kms.sslVersions = tls1.2
remote.s3.kms.sslAltNameToCheck = kms.us-gov-east-1.amazonaws.com
remote.s3.kms.sslRootCAPath = $SPLUNK_HOME/etc/auth/SFSRootCAG2.pem
remote.s3.kms.cipherSuite = ECDHE-ECDSA-AES128-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:AES128-SHA256:AES256-SHA256:AES256-SHA:DHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES256-SHA256
remote.s3.kms.ecdhCurves = prime256v1, secp384r1, secp521r1
```

**NOTE**: It may be possible to reduce the `cipherSuite` value to `TLSv1.2+HIGH:@STRENGTH` rather than specifying each cipher. The default value is `TLSv1+HIGH:TLSv1.2+HIGH:@STRENGTH`.
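The retention rules noted at the top of this page (the most restrictive of `frozenTimePeriodInSecs`, `maxGlobalDataSizeMB`, and `maxGlobalRawDataSizeMB` wins) can be sketched as follows. This is an illustrative model only, not Splunk's actual freezing code, and the function and parameter names are invented:

```python
# Illustrative model of SmartStore freeze eligibility -- NOT Splunk internals.
# A bucket becomes eligible to freeze when ANY configured limit is exceeded,
# which is what "the most restrictive rule applies" means in practice.

def freeze_reasons(bucket_age_secs, index_size_mb, index_raw_mb,
                   frozen_time_period_in_secs=0,
                   max_global_data_size_mb=0,
                   max_global_raw_data_size_mb=0):
    """Return the list of limits a bucket currently violates; 0 means 'unlimited'."""
    reasons = []
    if frozen_time_period_in_secs and bucket_age_secs > frozen_time_period_in_secs:
        reasons.append("frozenTimePeriodInSecs")
    if max_global_data_size_mb and index_size_mb > max_global_data_size_mb:
        reasons.append("maxGlobalDataSizeMB")
    if max_global_raw_data_size_mb and index_raw_mb > max_global_raw_data_size_mb:
        reasons.append("maxGlobalRawDataSizeMB")
    return reasons

# A bucket older than one year in an index over its global size cap trips both limits:
print(freeze_reasons(40_000_000, 6000, 2000,
                     frozen_time_period_in_secs=31_557_600,
                     max_global_data_size_mb=5000))
```

Remember that once a SmartStore bucket freezes, it is removed from both remote and local storage.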
The same commit referenced above for the msoc-skeleton-cm repository added `remotePath = volume:smartstore/$_index_name` to the `master-apps/all_indexes/local/indexes.conf` file.

**NOTE**: Before restarting the index cluster the first time, log in to the Cluster Manager, ensure there are no pending fix-up tasks, and remove all excess buckets.

### Add the `SFSRootCAG2.pem` to `$SPLUNK_HOME/etc/auth/`

This CA is used by Splunk to validate the certificates for the S3 bucket as well as the KMS queries against us-gov-east-1. See https://www.amazontrust.com/repository/

This file was added to the indexers' Salt state with commit [6d5dc54](https://github.xdr.accenturefederalcyber.com/mdr-engineering/msoc-infrastructure/commit/6d5dc54755830b2b14130aaedac99be4d39d0ec0#diff-63479ad69a090b258277ec8fba6f99419a2ffb248981510657c944ccd1148e97) and can be added to the indexers with `salt customer-splunk-idx-* state.sls splunk.indexer` (it may already be present).

### Update `indexes.conf` and `server.conf` Deployed by the Cluster Manager

From the Salt server, run `salt customer-splunk-cm* state.sls splunk.master.apply_bundle_master test=true` to ensure the indexers will receive the SmartStore volume definition and the updated `server.conf`. If the output looks correct, run the command again without `test=true`.

SSH to the Cluster Manager, become the splunk user, and run `splunk show cluster-bundle-status` (you may need to authenticate as the minion user) to observe the rolling cluster restart and correct any errors found in the bundle validation stage.

## Test the Indexers' Ability to Communicate with S3

SSH to one of the customer indexers and switch to the splunk user. You should be able to run `/usr/local/bin/aws s3 ls --region us-gov-east-1 s3://xdr-CUSTOMER-ENVIRONMENT-splunk-smartstore/`. There will be no output if this is successful (the bucket is still empty).
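Typos in the volume stanza are easy to miss before pushing the bundle. A quick pre-flight sketch using Python's `configparser` is shown below; the required-keys list is our own convention for this runbook, not an official Splunk schema:

```python
import configparser

# Minimal pre-flight check for the SmartStore volume stanza.
# The required-key list is this runbook's convention, not an official schema.
REQUIRED = [
    "storageType", "path", "remote.s3.endpoint",
    "remote.s3.encryption", "remote.s3.kms.key_id",
]

def check_volume(conf_text, stanza="volume:smartstore"):
    """Return the list of required keys missing from the stanza."""
    cp = configparser.ConfigParser(strict=False)
    cp.optionxform = str          # Splunk .conf keys are case-sensitive
    cp.read_string(conf_text)
    if stanza not in cp:
        return ["<missing stanza>"]
    return [k for k in REQUIRED if k not in cp[stanza]]

sample = """
[volume:smartstore]
storageType = remote
path = s3://xdr-CUSTOMER-ENVIRONMENT-splunk-smartstore/
remote.s3.endpoint = https://s3.us-gov-east-1.amazonaws.com
"""
print(check_volume(sample))  # the encryption/KMS keys are missing in this sample
```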
Additional tests:

```bash
# List all indexes and buckets
splunk cmd splunkd rfs -- ls --starts-with volume:smartstore

# List all buckets in a specific index
splunk cmd splunkd rfs -- ls --starts-with volume:smartstore/INDEX_NAME/
```

## Migrate Indices to SmartStore

### Confirm That the Cluster Is Healthy and in the _Complete_ State

Set `constrain_singlesite_buckets = false` in the Cluster Manager's `server.conf` under `[clustering]` and restart Splunk on the Cluster Manager. This can be applied via Salt with `salt customer-splunk-cm* state.sls splunk.master.init test=true`, then without `test=true` after validating that the INI change and the service restart (and perhaps a file permissions change for `etc/passwd`) are the only changes. Check the Bucket Status panel and resolve any pending fixup tasks.

### Test With One Index to Start

Add `remotePath = volume:smartstore/$_index_name` to an index such as `_introspection` in the Cluster Manager's copy of `master-apps/all_indexes/local/indexes.conf`, set `frozenTimePeriodInSecs = 0` and `maxGlobalDataSizeMB = 0`, then apply the change via Salt.

### Check Splunk's Log

```
index=_internal sourcetype=splunkd TERM(action=upload)
| rex field=cache_id "\w+\|(?<indice>[^~]+)"
| stats count(eval(status=="attempting")) AS Attempting count(eval(status=="succeeded")) AS Succeeded count(eval(status=="failed")) AS Failed BY indice
| addcoltotals labelfield=indice
```

The `_introspection` index should appear in the search results with values under "Attempting" and "Succeeded". If the value under "Failed" is greater than zero, check `splunkd.log` on one of the indexers to troubleshoot.
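The `rex` in the search above pulls the index name out of `cache_id` by capturing everything between the first `|` and the first `~`. A Python equivalent is shown below; the sample `cache_id` value is illustrative, not taken from a live indexer:

```python
import re

# Same pattern as the rex in the search above: named group "indice" captures
# everything between the first '|' and the first '~' of a cache_id value.
CACHE_ID_RE = re.compile(r"\w+\|(?P<indice>[^~]+)")

# Illustrative cache_id -- real values on the indexers may differ in shape.
sample = "bid|_introspection~42~0AAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE"
m = CACHE_ID_RE.search(sample)
print(m.group("indice"))  # -> _introspection
```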
Additional Splunk Searches:

```
| rest /services/admin/cacheman/_metrics splunk_server=*-splunk-idx-*
| fields splunk_server migration.*
| rename migration.* AS *
| sort start_epoch
| eval Duration = end_epoch - start_epoch, Duration = tostring(Duration, "duration")
| convert timeformat="%F %T %Z" ctime(start_epoch) AS Start ctime(end_epoch) AS End
| eval Completed = round(current_job/total_jobs,4)*100, End = if(isnull(End), "N/A", End), status = case(status=="running", "Running", status=="finished", "Finished", true(), status)
| eventstats sum(eval(Completed/3)) AS overall
| eval overall = round(overall,2)
| fields splunk_server Start End Duration status total_jobs current_job Completed overall
| rename splunk_server AS "Splunk Indexer" status AS Status current_job AS "Current Job" total_jobs AS "Total Jobs"
| appendpipe [ | stats count BY overall | eval "Current Job" = "Overall Completion" | rename overall AS Completed | fields Completed "Current Job"]
| fields - overall
| eval Completed = Completed . "%"
```

If Splunk restarts before the migration completes, the endpoint data may not be accurate. If that happens, run:

```
| rest /services/admin/cacheman splunk_server=*-splunk-idx-*
| search cm:bucket.stable=0
| stats count BY splunk_server
```

**IMPORTANT**: Do not forget to reconfigure the retention settings after the migration.

### Move Remaining Indices to SmartStore

Add `remotePath = volume:smartstore/$_index_name` under the `[default]` stanza as well as under all other index definitions. This is a good time to update `indexes.conf` entries where a stanza relies on values from `[default]` rather than having them defined per index.
For example:

```ini
[os]
homePath = volume:normal_primary/$_index_name/db
coldPath = volume:normal_primary/$_index_name/colddb
remotePath = volume:smartstore/$_index_name
thawedPath = $SPLUNK_DB/os/thaweddb
tstatsHomePath = volume:high_primary/$_index_name/datamodel_summary
coldToFrozenScript = "/usr/bin/python3" "/usr/local/bin/coldToFrozenS3.py"
frozenTimePeriodInSecs = 31557600
lastChanceIndex = lastchance
maxConcurrentOptimizes = 24
maxDataSize = auto
maxHotBuckets = 10
maxTotalDataSizeMB = 4294967295
quarantineFutureSecs = 172800
quarantinePastSecs = 604800
repFactor = auto
```

Once all the index definitions have `remotePath` defined, use Salt to apply the bundle change to the indexers. Observe the progress of the bundle application from the Cluster Manager as described above, and use the Splunk search to observe data moving to S3.

# Thawing Frozen Data

DO NOT thaw an archived (frozen) bucket into a SmartStore index! Create a separate, "classic" index that does not utilize SmartStore (no `remotePath`) and thaw the buckets into the `thawedPath` of that index.
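A minimal sketch of such a classic thaw-target index follows; the stanza name and paths are placeholders, not an existing index in our configuration:

```ini
# Classic (non-SmartStore) thaw target: note there is no remotePath here.
# The stanza name and paths are placeholders.
[thawed_restore]
homePath   = $SPLUNK_DB/thawed_restore/db
coldPath   = $SPLUNK_DB/thawed_restore/colddb
thawedPath = $SPLUNK_DB/thawed_restore/thaweddb
repFactor = auto
```

Copy the frozen buckets into that `thawedPath` and rebuild them (`splunk rebuild <bucket_path>`) so they become searchable.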