@@ -168,7 +168,7 @@ FIX: ebs optimize needs to be set to false for t2.small instance size.
Run the salt highstate twice
-first time it gets 'stuck' when run with salt-call in the cloud-init.
+first time it gets 'stuck' when run with salt-call in the cloud-init??
kill it with saltutil.kill_job 20200528224000719269
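If the JID isn't handy, the active-jobs runner will show it; a minimal sketch (the JID below is the one from this run):

```
# list running jobs on the master to find the stuck highstate's JID
salt-run jobs.active
# then kill that job across the minions
salt '*' saltutil.kill_job 20200528224000719269
```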
RHEL subscription failing (Error: Must specify an activation key). The pillar must be bad!
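To confirm whether the pillar is really the problem, something like the following helps; the `rhsm` pillar key and the key/org placeholders are assumptions, not the actual pillar layout here:

```
# dump the RHSM-related pillar on the failing minion (pillar key name is an assumption)
salt <minion> pillar.get rhsm
# test registration by hand on the node with the values the pillar should be supplying
subscription-manager register --activationkey=<ACTIVATION_KEY> --org=<ORG_ID>
```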
@@ -305,10 +305,10 @@ in outputs.conf for splunk nodes
# Steps for migration to PROD
1. Set up IDXC Discovery on CM
-1.1 prep CM for IDXC by rm -rf the outputs app
+1.1 prep the Cluster Master for IDXC by rm -rf'ing the current outputs app
1.1.2 rm /opt/splunk/etc/apps/{{ pillar['cluster_name'] }}_cm_outputs/
-1.5 Run salt state to enable IDXC discovery and enable IDXC outputs
-1.5.1 `salt moose*cm* state.sls splunk.master`
+1.5 Run the salt state on the CM to enable IDXC discovery and IDXC outputs. This will replace the deleted outputs app.
+1.5.1 `salt moose*cm* state.sls splunk.master test=true --state-output=changes`
1.6 adjust the SH and HF outputs.conf to point to IDXC Discovery (see the sketch after 1.8)
1.7 rm /opt/splunk/etc/apps/{{ salt['pillar.get']('cluster_name') }}_sh_outputs/local/outputs.conf
1.8 rm /opt/splunk/etc/apps/{{ salt['pillar.get']('splunk:cluster_name') }}_hf_ec2_outputs/local/outputs.conf
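For reference, a minimal sketch of the IDXC discovery wiring that 1.5-1.8 end up with; the pass4SymmKey and the group/stanza names are assumptions, the real values come from the salt states and pillar:

```
# CM, server.conf
[indexer_discovery]
pass4SymmKey = <KEY>

# SH/HF, outputs.conf
[indexer_discovery:moose]
pass4SymmKey = <KEY>
master_uri = https://moose-splunk-cm.msoc.defpoint.local:8089

[tcpout:moose_idxc]
indexerDiscovery = moose

[tcpout]
defaultGroup = moose_idxc
```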
@@ -316,23 +316,42 @@ in outputs.conf for splunk nodes
1.11 `salt moose*sh* state.sls splunk.search_head test=true --state-output=changes`
1.13 `salt moose*hf* state.sls splunk.heavy_forwarder test=true --state-output=changes`
1.14 Update all minions to IDXC discovery
-2. Update all minions to new collectd internal endpoint: iratemoses.mdr.defpoint.com
-2.1 ensure pillar is refreshed ` salt minion pillar.item `
+1.14.1 `salt mail* state.sls internal_splunk_forwarder test=true --state-output=changes`
+2. set up the new iratemoses endpoint in TF.
+2.0.1 TF apply in 05-customer_portal (open SGs) and 100-moose (create DNS)
+2.0 Update all minions to new collectd internal endpoint: iratemoses.msoc.defpoint.local
+2.1 ensure pillar is refreshed: `salt minion pillar.item collectd:hec_hostname`
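pillar.item only shows what the minion already has; to force a refresh first, a minimal sketch:

```
# push updated pillar data to the minion, then confirm the new value
salt minion saltutil.refresh_pillar
salt minion pillar.item collectd:hec_hostname
```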
2.2 `salt minion network.connect iratemoses.msoc.defpoint.local 8088`
2.3 `salt minion state.sls collectd test=true --state-output=changes`
+2.3.1 `salt-run survey.diff *.local cp.get_file_str file:///etc/collectd.conf`
+2.4 Ensure collectd metrics are in moose splunk.
+2.4.1 `| mstats count WHERE index=collectd metric_name=* by host, metric_name`
+2.5 Ensure Splunk UFs are in moose splunk
+2.5.1 `index="_internal" sourcetype=splunkd source="/opt/splunkforwarder/var/log/splunk/splunkd.log" | stats count by host`
3. stand up new templates and ASGs
3.1 launch new ASG instances
+3.1.1 run highstate on new indexers
+3.1.2 kill a defunct highstate with this:
+3.1.3 `salt '*' saltutil.kill_job <job_id>`
+3.1.4 run pkg.upgrade to patch the server
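A minimal sketch for 3.1.4; the indexer target glob is an assumption, adjust to the real minion IDs:

```
# patch the new indexers once the highstate is clean
salt 'moose*idx*' pkg.upgrade
# reboot only if the updates require it (e.g. a new kernel)
salt 'moose*idx*' system.reboot
```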
3.2 Ensure three green checkmarks in CM
4. change salt master to new outputs (make sure it is working)
4.1 ensure they can connect first
4.2 `salt '*.local' network.connect moose-splunk-cm.msoc.defpoint.local 8089`
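One way to confirm the salt master actually picked up the new outputs, assuming a UF install under /opt/splunkforwarder:

```
# show the resolved outputs.conf settings and which file each came from
/opt/splunkforwarder/bin/splunk btool outputs list --debug
```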
-5.
5. silence sensu
6. manually take the non-ASG indexers offline but don't stop the instances (yes, you can offline 2 indexers at a time)
6.1 `/opt/splunk/bin/splunk offline --enforce-counts`
6.1.1 no users? create etc/system/local/user-seed.conf then restart ( https://answers.splunk.com/answers/834/how-to-reset-the-admin-password.html )
+
+`mv /opt/splunk/etc/passwd /opt/splunk/etc/passwd.bak`
+
+```
+[user_info]
+PASSWORD = NEW_PASSWORD
+```
+
6.2 disable the service to prevent it from starting back up
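A sketch for 6.2, assuming boot-start was enabled via systemd; the unit name may be splunk or Splunkd depending on how boot-start was set up:

```
# stop and disable Splunk on the old indexer so it can't rejoin the cluster on reboot
sudo systemctl stop splunk
sudo systemctl disable splunk
```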
-7. use tf to destroy the instances then remove the code from TF. <- this is tricky
+7. use tf to destroy the instances, then remove the code from TF. <- this is tricky; create a new git branch first
7.1 `terraform destroy -target=module.moose_cluster.module.indexer_cluster.module.indexers.aws_instance.this[0] -target=module.moose_cluster.module.indexer_cluster.module.indexers.aws_instance.this[1] -target=module.moose_cluster.module.indexer_cluster.module.indexers.aws_instance.this[2]`
8. ensure all LBs are pointing to the new indexers
8.1 Run from laptop NOT on VPN `curl https://iratemoses.mdr-test.defpoint.com:8088 --insecure`
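Beyond the bare TCP check, HEC has a health endpoint that exercises the LB path to the new indexers:

```
# should return HEC health JSON if the LB reaches a healthy indexer
curl --insecure https://iratemoses.mdr-test.defpoint.com:8088/services/collector/health
```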
@@ -377,4 +396,7 @@ moose-splunk-indexer-i-055a31767d05fb053.msoc.defpoint.local - 50G - done termin
+ERROR:
+ERROR IndexerDiscoveryHeartbeatThread - failed to parse response payload for group=afs-cluster, err=failed to extract FwdTarget from json node={"hostport":"?","ssl":false,"indexing_disk_space":-1}http_response=OK
+SOLUTION: the indexers had no inputs.conf! They were not listening for incoming connections.
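For reference, a minimal sketch of the inputs.conf the indexers were missing; port 9997 is the usual receiving port and an assumption for this deployment:

```
# indexer inputs.conf - opens the splunktcp listener for forwarder traffic
[splunktcp://9997]
disabled = 0
```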