How to retry an Ansible task that may fail?


Ansible Problem Overview


In my Ansible play I restart a database and then try to perform some operations on it. The restart command returns as soon as the restart has started, not when the database is up. The next command tries to connect to the database, and it may fail if the database is not up yet.

I want to retry the second command a few times. If the last retry fails, I want the play to fail.

When I configure retries as follows:

retries: 3
delay: 5

the retries are not executed at all, because the first failed execution fails the whole play. I could add ignore_errors: yes, but then the play would pass even if all retries failed. Is there an easy way to retry a failing task until it succeeds, but fail the play if the last retry also fails?

Ansible Solutions


Solution 1 - Ansible

I don't understand your claim that the "first command execution fails whole play". It wouldn't make sense if Ansible behaved this way.

The following task:

- command: /usr/bin/false
  retries: 3
  delay: 3
  register: result
  until: result.rc == 0

produces:

TASK [command] ******************************************************************************************
FAILED - RETRYING: command (3 retries left).
FAILED - RETRYING: command (2 retries left).
FAILED - RETRYING: command (1 retries left).
fatal: [localhost]: FAILED! => {"attempts": 3, "changed": true, "cmd": ["/usr/bin/false"], "delta": "0:00:00.003883", "end": "2017-05-23 21:39:51.669623", "failed": true, "rc": 1, "start": "2017-05-23 21:39:51.665740", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

which seems to be exactly what you want.
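
Applied to the scenario in the question, the same pattern might look like the sketch below. The service name `mysql` and the `mysqladmin ping` health check are assumptions made for illustration, as are the retry count and delay; substitute whatever restarts and probes your database.

```yaml
# Sketch only: the service name and the health-check command are assumptions.
- name: Restart the database
  service:
    name: mysql
    state: restarted

- name: Wait until the database accepts connections
  command: mysqladmin ping
  register: db_ping
  retries: 5                # retry limit before giving up
  delay: 5                  # seconds to wait between attempts
  until: db_ping.rc == 0    # stop retrying as soon as the check succeeds
  changed_when: false       # a health check does not change anything
```

If the check still fails after the last retry, the task is reported as failed and the play stops, without any need for ignore_errors.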

Solution 2 - Ansible

Not sure if this is specific to Ansible Tower, but I am using:

- command: /usr/bin/false
  register: result
  retries: 3
  delay: 10
  until: result is not failed

Solution 3 - Ansible

Consider using the [wait_for][1] module. It waits for a condition before continuing, for example for a port to become open or closed, for a file to exist or not, or for some content to appear in a file.

Without seeing the rest of your playbook, consider the following example:

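- name: Wait for db server to restart
  wait_for:
    host: 192.168.50.4
    port: 3306
    delay: 1        # seconds to wait before the first check
    timeout: 300    # fail the task if the port is not open by then
  delegate_to: localhost   # run the check from the control machine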

You can also adapt it as a handler and obviously change this snippet to suit your use-case.
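
As a rough sketch of the handler variant, the play below restarts the database and then waits for its port whenever a configuration task reports a change. The group name `dbservers`, the template paths, and the service name `mysql` are hypothetical; only the wait_for parameters come from the example above.

```yaml
# Sketch only: host group, template paths, and service name are assumptions.
- hosts: dbservers
  tasks:
    - name: Update database configuration
      template:
        src: my.cnf.j2
        dest: /etc/mysql/my.cnf
      notify:
        - Restart db server
        - Wait for db server to restart

  handlers:
    # Handlers run in the order they are defined here, not in notify order.
    - name: Restart db server
      service:
        name: mysql
        state: restarted

    - name: Wait for db server to restart
      wait_for:
        host: 192.168.50.4
        port: 3306
        delay: 1
        timeout: 300
      delegate_to: localhost
```

Handlers normally run at the end of the play, so if later tasks in the same play need the database, a `meta: flush_handlers` task placed before them forces the restart and the wait to happen first.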

[1]: https://docs.ansible.com/ansible/wait_for_module.html "wait_for"

Solution 4 - Ansible

For the following task:

- hosts: all
  become: yes
  tasks:
    - name: create the 'myusername' user
      user: name=myusername append=yes state=present createhome=yes shell=/bin/bash

I was not sure whether the remote was ready yet (because this was a newly spun-up node). So I tried the retries and delay settings, unfortunately with no luck. For now I ended up creating a wrapper in my bash script to achieve the needed behavior.

#!/bin/bash
# Re-run the playbook until it succeeds, giving up after 5 attempts.
# HOSTS_FILE is expected to be set in the environment (inventory path).

STATUS_CODE=1
TRY=1
while [ "$STATUS_CODE" -ge 1 ]
do
  if [ "$TRY" -gt 5 ]
  then
    echo "Retried to connect to node 5 times and failed. Exiting"
    exit 1
  fi

  ansible-playbook -i "$HOSTS_FILE" user.yml
  STATUS_CODE=$?
  TRY=$(( TRY + 1 ))

  if [ "$STATUS_CODE" -ge 1 ]
  then
    echo "Retry to connect to node in 5 seconds"
    sleep 5
  fi
done

I am still hoping to find a cleaner way to do this in the ansible-playbook YAML itself. Does anyone have suggestions?
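
One possible way to keep this inside the playbook, offered as a sketch rather than a tested fix for this setup, is the wait_for_connection module, which blocks until the remote host is reachable and usable by Ansible. Fact gathering is turned off at the play level because it would itself fail on a node that is not up yet. The timing values below are assumptions.

```yaml
# Sketch only: the delay, sleep, and timeout values are assumptions.
- hosts: all
  become: yes
  gather_facts: no            # gathering facts would fail while the node is down
  tasks:
    - name: Wait for the node to become reachable
      wait_for_connection:
        delay: 5              # seconds to wait before the first check
        sleep: 5              # seconds between checks
        timeout: 300          # fail the play if still unreachable by then

    - name: Gather facts now that the node is up
      setup:

    - name: create the 'myusername' user
      user: name=myusername append=yes state=present createhome=yes shell=/bin/bash
```

If the node never comes up, the wait task fails after the timeout and the play fails with it, which is the same outcome the bash wrapper produces after its fifth attempt.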

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Bartosz Bilicki | View Question on Stackoverflow
Solution 1 - Ansible | techraf | View Answer on Stackoverflow
Solution 2 - Ansible | SerialEnabler | View Answer on Stackoverflow
Solution 3 - Ansible | Mxx | View Answer on Stackoverflow
Solution 4 - Ansible | Oleksii Zymovets | View Answer on Stackoverflow