How to continue execution on failed task after fixing error in playbook?

Ansible, Ansible Playbook

Ansible Problem Overview


When writing and debugging Ansible playbooks, the typical workflow is as follows:

  1. Run ansible-playbook ./main.yaml
  2. The playbook fails on some task
  3. Fix this task and repeat step 1, waiting for all previous tasks to execute again, which takes a lot of time

Ideally, I'd like to resume execution at the failed task, keeping the inventory and all facts collected by previous tasks. Is that even possible? How can I make playbook writing and debugging faster?

Ansible Solutions


Solution 1 - Ansible

Take a look at Executing playbooks for troubleshooting. If you want to start executing your playbook at a particular task, you can do so with the --start-at-task option:

ansible-playbook playbook.yml --start-at-task="install packages"

The above will start executing your playbook at a task named “install packages”.
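
Because --start-at-task needs the name of a task, it can help to list the task names first. The following is a minimal sketch, not a definitive recipe: --list-tasks and --start-at-task are existing ansible-playbook options, while the function names list_tasks and start_at and the ANSIBLE_PLAYBOOK variable are hypothetical and belong to this example only.

```shell
# Hypothetical helpers. ANSIBLE_PLAYBOOK is a variable of this sketch only
# (not something Ansible reads); setting it to `echo` previews the command
# line instead of running it.

# Print the tasks in a playbook without executing any of them.
list_tasks() {
    "${ANSIBLE_PLAYBOOK:-ansible-playbook}" "$1" --list-tasks
}

# Run a playbook, skipping every task before the named one.
start_at() {
    "${ANSIBLE_PLAYBOOK:-ansible-playbook}" "$1" --start-at-task="$2"
}
```

A typical session would be list_tasks playbook.yml to find the name, followed by start_at playbook.yml "install packages". Keep in mind that variables registered by the skipped tasks will not be available to the tasks that do run.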

Alternatively, take a look at this previous answer: How to run only one task in ansible playbook?

Finally, when a play fails, it usually gives you something along the lines of:

PLAY RECAP ******************************************************************** 
           to retry, use: --limit @/home/user/site.retry

Use that --limit option to rerun the playbook against only the hosts that failed.

Solution 2 - Ansible

Future readers:

The --limit @/home/user/site.retry option would not help in such a scenario. The .retry file only stores the names of the failed hosts and nothing more, so the playbook will just execute all tasks again against those hosts.

If you are using Ansible 2.x (the latest version at the time of writing), --start-at-task does not work for tasks defined inside roles.

You can achieve a similar effect with the --step flag, e.g. ansible-playbook playbook.yml --step. In step mode, Ansible prompts you before executing each task, and you can choose (N)o/(y)es/(c)ontinue.

With this approach you can selectively execute tasks when needed and, after fixing the error, continue from the point where the play failed.

Solution 3 - Ansible

Future Future readers:

As of Ansible 2.4.2.0, --start-at-task works for tasks defined in the roles I created.

The Ansible team is not willing to address this issue; they suggest you keep your roles idempotent and replay the entire play. I don't have time for this. Unlike @JeremyWhiting, I am not using a massive amount of facts in my roles, so I can use the --start-at-task feature.

However, this is still a manual step, so I wrote an ansible rpm and added a "Resume" feature that follows these basic steps:

  • Enable the ansible log via /etc/ansible/ansible.cfg (uncomment log_path)
  • Clear the log before each run
  • After a failure, the "Resume" feature greps this log for the last "TASK" line, and uses sed to get what is inside the "[]"
  • Then it simply calls the last run play, with --start-at-task="$start_at_task"
  • Ensure that you have "any_errors_fatal: true" in your roles to stop the play at the failing task you wish to resume from
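
The grep and sed steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's actual script: the function names last_task_name and resume_command are hypothetical, and the sketch assumes the log contains task header lines of the form TASK [task name] followed by asterisks, as in Ansible's default console output.

```shell
# Hypothetical sketch of the "Resume" idea: find the last task Ansible
# logged and build the command that would restart the play from there.

# Print the name of the last task recorded in an Ansible log file.
last_task_name() {
    local log_file="$1"
    # Keep only "TASK [...]" header lines, take the final one,
    # then strip everything outside the square brackets.
    grep 'TASK \[' "$log_file" | tail -n 1 | sed -E 's/.*TASK \[(.*)\].*/\1/'
}

# Print (but do not run) the ansible-playbook command that resumes the play.
resume_command() {
    local playbook="$1" log_file="$2" task
    task="$(last_task_name "$log_file")"
    if [ -z "$task" ]; then
        echo "no TASK line found in $log_file" >&2
        return 1
    fi
    printf 'ansible-playbook %s --start-at-task="%s"\n' "$playbook" "$task"
}
```

For example, resume_command main.yaml /var/log/ansible.log (both paths are placeholders) prints the command to run next. Printing the command instead of executing it leaves room to review the task name first. Task headers of tasks inside roles are logged with a role prefix, so it is worth checking that the extracted name is accepted by --start-at-task in your Ansible version.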

The Ansible team is unwilling to create this basic (and very useful) feature, so the only choice is to hack it together with some bash scripts.

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Sergey Alaev | View Question on Stack Overflow
Solution 1 - Ansible | Mxx | View Answer on Stack Overflow
Solution 2 - Ansible | Segmented | View Answer on Stack Overflow
Solution 3 - Ansible | Trent | View Answer on Stack Overflow