Re: [make-initrd] [PATCH v6 16/22] bootchain-core: new logic of the daemon main loop

From: Leonid Krivoshein <klark.devel@gmail.com>
To: make-initrd@lists.altlinux.org
Subject: Re: [make-initrd] [PATCH v6 16/22] bootchain-core: new logic of the daemon main loop
Date: Wed, 27 Oct 2021 00:08:36 +0300
Message-ID: <d8f371c1-a82d-6345-6e88-3a4ad532a82b@gmail.com>
In-Reply-To: <20211026192147.nswnxnuc6b7dzvel@example.org>

On 26.10.2021 22:21, Alexey Gladkov wrote:
> On Sun, Oct 24, 2021 at 08:22:31PM +0300, Leonid Krivoshein wrote:
>> - Adds the ability to overload the boot chain new steps;
>> - Limits the number of repeated runs of steps-scripts to five;
>> - Adds a switch for allow/disallow to retry steps-scripts;
>> - Introduces the difference between the modes "NATIVE"
>>    and "COMPATIBILITY" with the pipeline;
>> - Offers a new way of ending the daemon main loop;
>> - Saves the names of the steps taken.
>>
>> See README.md for more details.
>>
>> Signed-off-by: Leonid Krivoshein <klark.devel@gmail.com>
>> ---
>>   features/bootchain-core/README.md             | 62 +++++++++++++
>>   .../data/bin/bootchain-sh-functions           |  2 +
>>   features/bootchain-core/data/sbin/chaind      | 91 ++++++++++++++-----
>>   3 files changed, 131 insertions(+), 24 deletions(-)
>>
>> diff --git a/features/bootchain-core/README.md b/features/bootchain-core/README.md
>> index db73c0a..bde5c9b 100644
>> --- a/features/bootchain-core/README.md
>> +++ b/features/bootchain-core/README.md
>> @@ -54,6 +54,11 @@ us to optimize fill in `initramfs` only which we are need.
>>     Such pseudo-steps allow you to control, basically, the internal state of the
>>     daemon and should not be taken into account in the boot chain, as if they are
>>     hidden.
>> +- The `chaind` daemon allows you to overload the chain with a new set of steps,
>> +  thanks to this, you can change the logic of work "on the fly", support loops
>> +  and conditional jumps, in text dialogs it is an opportunity to go back.
>> +- Keeps records of the steps taken at least once and allows you to prevent their
>> +  re-launch.
>>   - `bootchain-sh-functions` extends the API of the original `pipeline-sh-functions`,
>>     see the details in the corresponding section.
>>   - Via resolve_target() supports not only forward, but also reverse addressing,
>> @@ -68,6 +73,11 @@ us to optimize fill in `initramfs` only which we are need.
>>     of the previous step through symbolic links to mount points inside initramfs,
>>     outside the tree the results of the steps, which provides, if necessary, the
>>     overlap mounting mechanism inherent in the program `propagator`.
>> +- Along with the NATIVE mode of operation, the `chaind` daemon can work in
>> +  COMPATIBILITY WITH `pipeline`. In the NATIVE mode of operation, the daemon
>> +  imposes another an approach to processing the status code of the completed
>> +  step and the method of premature completion of the boot chain, see the details
>> +  in the corresponding section.
>>   - The daemon can be configured when building initramfs via the included file
>>     configurations of `/etc/sysconfig/bootchain`, and not only through boot
>>     parameters, see the details in the corresponding section.
>> @@ -81,6 +91,50 @@ Despite the differences, `chaind` is backward compatible with previously
>>   written steps for the `pipelined` daemon and does not require changes for
>>   configurations with `root=pipeline`.
>>   
>> +## Features of the pipelined work
>> +
>> +If the step-script will be finished with code of status 2, the original daemon
>> +`pipelined` will understand it like a must to stop chains and finish work.
>> +(meaning that system is ready to go stage2). If the step-script does not
>> +process this code from an external command, and stage2 is not ready to work
>> +yet, a situation with premature termination of the daemon will arise.
>> +
>> +If the step-script will be finished with non-null code of status (different
>> +from 2), daemon `pipelined` will understand it like a fail and will repeat this
>> +failure-step with pause in one second in infinity cycle (until common timeout
>> +rootdelay=180). But, sometimes repeat steps are unnecessary because the
>> +situation is incorrigible and repeating will just waste of time and make a
>> +system log is filling up. But the daemon `pipelined` don't know how to work
>> +with this situations.
>> +
>> +## New approach in chaind daemon
>> +
>> +For steps-scripts are suggested before finish work with code of status 0 call
>> +break_bc_loop() for tell to the daemon about ready stage2 and needed finish
>> +work this daemon after the current step.In case of a failure in the step-by-step
>> +scenario, the daemon can repeat it, but no more than four times with a pause of
>> +two seconds. In order for a failure in the step-by-step scenario to lead to an
>> +immediate shutdown of the daemon, it is necessary to use the internal step
>> +`noretry`.
>> +
>> +## Daemon operation mode
>> +
>> +### NATIVE mode of operation
>> +
>> +NATIVE mode is activated by the `root=bootchain` parameter. In this mode, the
>> +daemon will perceive the status code 2 from the step script in the same way as
>> +any other non-zero code and then act according to the internal state: if
>> +repetitions are allowed, the step script will be called again with a pause
>> +of 2 seconds, but no more than four times. If repetitions are prohibited,
>> +the daemon itself will immediately terminate.
>> +
>> +### Pipeline COMPATIBILITY mode
>> +
>> +Compatibility mode is activated by the `root=pipeline` parameter. In this mode,
>> +the daemon behaves the same as the original `pipelined`, except that it limits
>> +the number of re-runs of the failed step. He perceives the status code 2 not as
>> +a failure, but as a command to end the main daemon cycle.
>> +
>>   ## Configuration
>>   
>>   The configuration is defined in the file `/etc/sysconfig/bootchain` when
>> @@ -109,6 +163,14 @@ addressing, as if they are hidden.
>>     on the <OUT> of the previous step from the <IN> of the next step, which can
>>     be useful, for example, when we don`t want the results of the `waitdev` step
>>     to be used in the next step, `localdev`, which primarily looks at them.
>> +- `noretry` - prohibits the following steps from ending with a non-zero return
>> +  code, what will lead to the immediate shutdown of the daemon in case of a
>> +  script failure any next step. By default, the steps are allowed to fail,
>> +  the daemon will try to restart them again four times with a pause of two
>> +  seconds.
>> +- `retry` - allows all subsequent steps to be completed with a non-zero return
>> +  code, which will lead to their starting five times, in total. This mode of
>> +  operation of the daemon operates by default.
>>   
>>   ## External elements of the bootchain (steps-scripts)
>>   
>> diff --git a/features/bootchain-core/data/bin/bootchain-sh-functions b/features/bootchain-core/data/bin/bootchain-sh-functions
>> index d1d0cef..8c5a2f2 100644
>> --- a/features/bootchain-core/data/bin/bootchain-sh-functions
>> +++ b/features/bootchain-core/data/bin/bootchain-sh-functions
>> @@ -17,9 +17,11 @@ message_time=1
>>   if [ "${ROOT-}" = pipeline ]; then
>>   	BC_LOGFILE="${BC_LOGFILE:-/var/log/pipelined.log}"
>>   	mntdir="${mntdir:-/dev/pipeline}"
>> +	pipeline_mode=1
>>   else
>>   	BC_LOGFILE="${BC_LOGFILE:-/var/log/chaind.log}"
>>   	mntdir="${mntdir:-/dev/bootchain}"
>> +	pipeline_mode=
>>   fi
>>   
>>   BC_NEXTCHAIN=/.initrd/bootchain/bootchain.next
>> diff --git a/features/bootchain-core/data/sbin/chaind b/features/bootchain-core/data/sbin/chaind
>> index 5623a37..4c9ebaa 100755
>> --- a/features/bootchain-core/data/sbin/chaind
>> +++ b/features/bootchain-core/data/sbin/chaind
>> @@ -2,7 +2,11 @@
>>   
>>   . bootchain-sh-functions
>>   
>> +bcretry=1
>>   pidfile="/var/run/$PROG.pid"
>> +chainsteps="$BOOTCHAIN"
>> +stepnum=0
>> +prevdir=
>>   
>>   
>>   exit_handler()
>> @@ -39,11 +43,7 @@ run mkdir -p -- "$mntdir" "$BC_PASSED"
>>   mountpoint -q -- "$mntdir" ||
>>   	run mount -t tmpfs tmpfs "$mntdir" ||:
>>   
>> -stepnum=0
>> -chainsteps="$BOOTCHAIN"
>> -datadir=
>> -destdir=
>> -
>> +rc=0
>>   while [ -n "$chainsteps" ]; do
>>   	name="${chainsteps%%,*}"
>>   	exe="$handlerdir/$name"
>> @@ -54,53 +54,96 @@ while [ -n "$chainsteps" ]; do
>>   		prevdir=
>>   		message "[0] Step '$name' has been passed"
>>   
>> +	elif [ "$name" = retry ]; then
>> +		chainsteps="${chainsteps#$name}"
>> +		chainsteps="${chainsteps#,}"
>> +		bcretry=1
>> +		message "subsequent steps will restart after failure"
>> +
>> +	elif [ "$name" = noretry ]; then
>> +		chainsteps="${chainsteps#$name}"
>> +		chainsteps="${chainsteps#,}"
>> +		bcretry=0
>> +		message "daemon will be stopped immediately after any step failure"
>> +
>>   	elif [ -x "$exe" ]; then
>>   		assign "callnum" "\${callnum_$name:-0}"
>>   		datadir="$mntdir/src/step$stepnum"
>>   		destdir="$mntdir/dst/step$stepnum"
>>   
>> -		[ "$stepnum" != 0 ] ||
>> -			prevdir=""
>> -
>>   		run mkdir -p -- "$datadir" "$destdir"
>>   
>> -		if ! mountpoint -q "$destdir"; then
>> +		if mountpoint -q -- "$destdir" ||
>> +			[ -s "$destdir/DEVNAME" ] ||
>> +			[ -b "$destdir/dev" ] ||
>> +			[ -c "$destdir/dev" ]
>> +		then
>> +			message "[$callnum] Handler: $exe skipped"
>> +		else
>>   			message "[$callnum] Handler: $exe"
>>   
>>   			export name callnum datadir destdir prevdir
>>   
>> +			for try in 1 2 3 4 5; do
>>   				[ -z "$BC_DEBUG" ] ||
>>   					run "$handlerdir/debug" ||:
>>   				rc=0
>>   				run "$exe" ||
>>   					rc=$?
>> -
>> -			if [ "$rc" != 0 ]; then
>> -				[ "$rc" != 2 ] ||
>> +				[ "$rc" != 0 ] ||
>>   					break
>> -				message "[$callnum] Handler failed (rc=$rc)"
>> -				sleep 1
>> -				continue
>> -			fi
>> -		else
>> -			message "[$callnum] Handler: $exe skipped"
>> +				[ "$rc" != 2 ] || [ -z "$pipeline_mode" ] ||
>> +					break 2
>> +				message "[$callnum] Handler failed (rc=$rc, try=$try)"
>> +				[ ! -f "$BC_PASSED/$PROG" ] ||
>> +					break 2
>> +				[ "$bcretry" != 0 ] ||
>> +					break
>> +				sleep 2
>> +			done
>> +
>> +			[ -r "$BC_NEXTCHAIN" ] ||
>> +				run touch "$BC_PASSED/$name"
>> +			[ ! -f "$BC_PASSED/$PROG" ] ||
>> +				break
>> +			[ "$rc" = 0 ] ||
>> +				break
>>   		fi
>>   
>> -		prevdir="$destdir"
>> +		if [ ! -r "$BC_NEXTCHAIN" ]; then
>> +			callnum=$((1 + $callnum))
>> +			assign "callnum_$name" "\$callnum"
>> +			eval "export callnum_$name"
>> +		fi
>>   
>> -		callnum=$(($callnum + 1))
>> -		eval "callnum_$name=\"\$callnum\""
>> +		stepnum=$((1 + $stepnum))
>> +		prevdir="$(readlink-e "$destdir" 2>/dev/null ||:)"
>>   	fi
>>   
>> -	chainsteps="${chainsteps#$name}"
>> -	chainsteps="${chainsteps#,}"
>> +	if [ ! -r "$BC_NEXTCHAIN" ]; then
>> +		chainsteps="${chainsteps#$name}"
>> +		chainsteps="${chainsteps#,}"
>> +	else
>> +		debug "chain will be reloaded by $BC_NEXTCHAIN:"
>> +		fdump "$BC_NEXTCHAIN"
>> +		. "$BC_NEXTCHAIN"
>> +		run rm -f -- "$BC_NEXTCHAIN"
> I deliberately read the patchset to the end, but I still don't
> understand who will create BC_NEXTCHAIN, and where?

Nobody overloads the chain in this patchset; that is altboot's job.
But the elephant is big, so we decided to eat it piece by piece, as
separate features. That is why patch 14 adds the missing functions while
leaving them almost unused; their use will show up in the later parts.

The propagator replacement (altboot) could have been written in scripts
as a separate, monolithic make-initrd feature, independent of pipeline.
Right now we spend the lion's share of our effort on the layer between
make-initrd and altboot. I chose this path, keeping compatibility with
pipeline, for two reasons. First, to finish what pipeline is missing
(in terms of dialogs). Second, pipeline is a conceptually simple future
that has no support in stage2 yet; it is also the present when used
together with altboot, whenever something unusual has to be achieved.
For example, the four joysticks of an aircraft control yoke must be
attached as character devices before the traditional boot starts.

> And how will this work with waitdev, given that BC_NEXTCHAIN is checked
> after every step?

I have already answered this, but in general waitdev and chain
overloading are not related at all. If a step needs a new route, then
before it exits it must tell the daemon what the next steps should
be:
http://git.altlinux.org/gears/m/make-initrd-bootchain.git?p=make-initrd-bootchain.git;a=blob;f=bootchain-altboot/data/lib/bootchain/altboot;h=c2ab9e8d16772cd9143f28527084fc7f49ec0b81;hb=d9135c3936ee28b0153746d690724c6f650b5a07#l157

> So it turns out that BC_NEXTCHAIN can only be created by the step that
> follows waitdev.

Yes. In general, any arbitrary step can not only overload the chain
but also change other internal variables of the daemon; that is how
step overloading was designed. It is just that nothing uses it yet.
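
To make the mechanism concrete, here is a rough sketch of both sides. It
is not the altboot code: the helper name request_next_chain and the step
names are made up for illustration, and a temporary file stands in for
the real /.initrd/bootchain/bootchain.next. The only thing taken from
the patch is that the daemon sources $BC_NEXTCHAIN and then removes it.

```shell
#!/bin/sh
# Sketch only: helper name and step names are hypothetical.
# A temporary file replaces the real BC_NEXTCHAIN path so this runs anywhere.
BC_NEXTCHAIN="${BC_NEXTCHAIN:-$(mktemp)}"

# Step side: before exiting, leave a shell fragment for the daemon.
request_next_chain()
{
	printf 'chainsteps="%s"\n' "$1" > "$BC_NEXTCHAIN"
}

# Daemon side, mirroring the quoted hunk: if the file is readable,
# source it instead of dropping the finished step from the chain.
chainsteps="altboot,waitdev,localdev"
request_next_chain "getimage,mountfs,rootfs"

if [ -r "$BC_NEXTCHAIN" ]; then
	. "$BC_NEXTCHAIN"
	rm -f -- "$BC_NEXTCHAIN"
fi

echo "remaining steps: $chainsteps"
```

Since the file is sourced, the same fragment could assign any other
daemon variable as well (bcretry, for instance), which is what I meant
by changing the internal state.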

>
>> +	fi
>>   
>> -	stepnum=$(($stepnum + 1))
>> +	debug "remaining steps: $chainsteps"
>>   done
>>   
>>   [ -z "$chainsteps" ] ||
>>   	message "remaining steps after breaking loop: $chainsteps"
>>   
>> +if [ "$rc" = 2 ] && [ -n "$pipeline_mode" ]; then
>> +	debug "finishing in pipeline mode"
>> +elif [ "$rc" = 0 ] && [ -f "$BC_PASSED/$PROG" ]; then
>> +	debug "finishing in bootchain mode"
>> +else
>> +	fatal "daemon terminated incorrectly (rc=$rc)"
>> +fi
>> +
>>   if [ -z "$BC_DEBUG" ]; then
>>   	grep -qs " $mntdir/" /proc/mounts ||
>>   		run umount -- "$mntdir" &&
>> -- 
>> 2.24.1
>>
>> _______________________________________________
>> Make-initrd mailing list
>> Make-initrd@lists.altlinux.org
>> https://lists.altlinux.org/mailman/listinfo/make-initrd
>>

-- 
Best regards,
Leonid Krivoshein.