@elsid elsid commented Jan 14, 2026

TextureObjectManager and GLBufferObjectManager store a set of objects per profile. When an object is deleted it is removed from its set, but the manager keeps the set itself. As more profiles are used over time, the number of sets grows and many of them end up empty. Flushing deleted objects still has overhead because every set is checked each frame. With this change, after deleted objects are flushed from a set, the set is also erased if it is empty. This improves performance because fewer sets have to be checked. Fixes https://gitlab.com/OpenMW/openmw/-/issues/8918.
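The idea can be illustrated with a minimal, self-contained sketch. This is not the actual OSG or OpenMW code: `ObjectSet`, `Manager`, the string profile key, and the integer object ids are all hypothetical stand-ins chosen to keep the example short. It only shows the pattern of erasing a per-profile set once flushing leaves it empty.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for a per-profile set of GL objects.
struct ObjectSet
{
    std::set<int> live;          // objects currently owned by the set
    std::set<int> pendingDelete; // objects scheduled for deletion

    // Remove every object that was scheduled for deletion.
    void flushDeleted()
    {
        for (int id : pendingDelete)
            live.erase(id);
        pendingDelete.clear();
    }

    bool empty() const { return live.empty() && pendingDelete.empty(); }
};

// Hypothetical manager keyed by profile (a string here for simplicity).
class Manager
{
public:
    void add(const std::string& profile, int id) { mSets[profile].live.insert(id); }

    void markDeleted(const std::string& profile, int id)
    {
        auto it = mSets.find(profile);
        if (it != mSets.end() && it->second.live.count(id) != 0)
            it->second.pendingDelete.insert(id);
    }

    // Intended to be called once per frame: flush each set, then drop the
    // set if nothing is left in it, so later frames iterate over fewer sets.
    void flushAllDeleted()
    {
        for (auto it = mSets.begin(); it != mSets.end();)
        {
            it->second.flushDeleted();
            if (it->second.empty())
                it = mSets.erase(it); // erase returns the next valid iterator
            else
                ++it;
        }
    }

    std::size_t setCount() const { return mSets.size(); }

private:
    std::map<std::string, ObjectSet> mSets;
};
```

Without the `erase` branch, `setCount()` would only ever grow as new profiles appear, which is the accumulation described above.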

The performance test is done with openmw using the cross_cell_border_with_pause.zip Lua mod. It moves the player across many cells along a rectangular spiral and then back to the original position and rotation, with pauses at the start and finish. The following scripts are used to start openmw and collect profiles:

Benchmark scripts

scripts/run/cross_cell_border_with_pause.sh

#!/bin/bash -ex

export SRC="$(pwd)"

if [[ "${1}" ]]; then
    cd "${1}"
fi

export BENCH_ID=$(date +%Y-%m-%dT%H-%M-%S)

if [[ "${3}" ]]; then
    BENCH_ID="${3}.${BENCH_ID:?}"
fi

export BENCH_DIR="/home/elsid/dev/openmw/generated/benchmarks/${BENCH_ID:?}"

mkdir -p "${BENCH_DIR:?}"

cp "/home/elsid/.config/openmw/settings.cfg" "${BENCH_DIR:?}/"
cp "/home/elsid/.config/openmw/openmw.cfg" "${BENCH_DIR:?}/"
cp -r "/home/elsid/dev/openmw/scripts/mods/cross_cell_border_with_pause" "${BENCH_DIR:?}/"

export ASAN_OPTIONS=halt_on_error=1:strict_string_checks=1:detect_stack_use_after_return=1:check_initialization_order=1:strict_init_order=1
export TSAN_OPTIONS=second_deadlock_stack=1
export UBSAN_OPTIONS=print_stacktrace=1
export OPENMW_OSG_STATS_FILE="/tmp/openmw.stats.${BENCH_ID:?}.log"
export OPENMW_OSG_STATS_LIST="times;resource"

rm -f '/home/elsid/.config/openmw/openmw.log'

"${SRC:?}/scripts/other/perf.sh" &

PERF_SCRIPT_PID=$!

trap "pkill -P ${PERF_SCRIPT_PID:?}; kill -SIGKILL ${PERF_SCRIPT_PID:?}" SIGHUP SIGINT SIGTERM

/usr/bin/time -v \
    ./openmw \
        --skip-menu \
        --no-grab \
        --start "${2:-balmora}" \
        --script-run /home/elsid/dev/openmw/scripts/mods/cross_cell_border_with_pause/setup.mwscript \
        --data /home/elsid/dev/openmw/scripts/mods/cross_cell_border_with_pause \
        --content cross_cell_border_with_pause.omwscripts

cp '/home/elsid/.config/openmw/openmw.log' "${BENCH_DIR:?}/openmw.log"
cp "${OPENMW_OSG_STATS_FILE:?}" "${BENCH_DIR:?}/openmw.stats.log"

wait

cd "${SRC:?}"

scripts/other/report.sh

scripts/other/perf.sh

#!/bin/bash -ex

while true; do
    grep -F first_pause /home/elsid/.config/openmw/openmw.log && break || true
    sleep 1
done

sleep 1

timeout 1s perf record -p $(pidof openmw) --call-graph dwarf -o /tmp/perf.data.first.${BENCH_ID:?} &

FIRST_PERF_PID=$!

trap "kill -SIGKILL ${FIRST_PERF_PID:?}" SIGHUP SIGINT SIGTERM

while true; do
    grep -F second_pause /home/elsid/.config/openmw/openmw.log && break || true
    sleep 1
done

sleep 1

timeout 1s perf record -p $(pidof openmw) --call-graph dwarf -o /tmp/perf.data.second.${BENCH_ID:?} &

SECOND_PERF_PID=$!

trap "kill -SIGKILL ${FIRST_PERF_PID:?}; kill -SIGKILL ${SECOND_PERF_PID:?}" SIGHUP SIGINT SIGTERM

wait

cp /tmp/perf.data.first.${BENCH_ID:?} "${BENCH_DIR:?}/"
cp /tmp/perf.data.second.${BENCH_ID:?} "${BENCH_DIR:?}/"

scripts/other/report.sh

#!/bin/bash -ex

hotspot /tmp/perf.data.first.${BENCH_ID:?} &> /dev/null &
hotspot /tmp/perf.data.second.${BENCH_ID:?} &> /dev/null &
scripts/osg_stats.py --regexp_match --timeseries 'Frame duration|.*time taken' --moving_average_window 60 "/tmp/openmw.stats.${BENCH_ID:?}.log" &
scripts/osg_stats.py --regexp_match --stats 'Frame duration|.*time taken' "/tmp/openmw.stats.${BENCH_ID:?}.log"

Using https://gitlab.com/elsid/openmw/-/commit/67480503285d4f5019e8cc5429dde82f3e8487d0, the number of sets or other entities stored by each manager was measured per frame:

managers_before_vs_after texture_manager_before_vs_after

Frame duration moving average with a 60-frame window:

frame_duration_before_vs_after

The collection that stores pending orphaned objects is changed from std::list to std::vector because std::list only adds overhead there. Also, the mutex lock used to check whether the container is empty before processing pending orphaned objects is replaced by an atomic counter of the number of items in it. Since this check is performed every frame, potential consistency issues do not matter: a stale count at worst delays processing until a later frame.
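A rough sketch of this pattern follows, assuming a hypothetical `PendingOrphans` class holding integer ids; the real managers store OSG object pointers and the names here are illustrative only. The atomic counter is used purely as a hint to skip the lock when there is nothing to do, while the mutex still protects the vector itself.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical queue of pending orphaned objects (ids stand in for objects).
class PendingOrphans
{
public:
    // May be called from any thread.
    void push(int id)
    {
        std::lock_guard<std::mutex> lock(mMutex);
        mItems.push_back(id);
        mCount.store(mItems.size(), std::memory_order_relaxed);
    }

    // Intended to be called every frame. Returns the items to process.
    std::vector<int> take()
    {
        // Cheap check without locking. A stale zero only means the items
        // are picked up on a later call.
        if (mCount.load(std::memory_order_relaxed) == 0)
            return {};

        std::vector<int> result;
        std::lock_guard<std::mutex> lock(mMutex);
        result.swap(mItems); // take everything, leave the container empty
        mCount.store(0, std::memory_order_relaxed);
        return result;
    }

private:
    std::mutex mMutex;
    std::vector<int> mItems;            // contiguous storage, no per-element node allocation
    std::atomic<std::size_t> mCount{0}; // number of items, readable without the lock
};
```

Relaxed ordering is enough for the counter in this sketch because all reads and writes of `mItems` happen under the mutex; the counter never guards the data on its own.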

Perf profiles:

Before:

perf_before perf_flush_before

With lock:

perf_after_with_lock perf_flush_after_with_lock

With atomic:

perf_after_with_atomic perf_flush_after_with_atomic

@elsid elsid force-pushed the erase_managers_sets branch from 44f08c4 to b4967d2 Compare January 15, 2026 20:43
elsid added 5 commits January 16, 2026 21:49
To avoid endless accumulation over time.
List has allocation overhead on adding every element. But there is no
reason to use it.
handlePendingOrphandedGLBufferObjects and
handlePendingOrphandedTextureObjects do it.
To avoid locking a mutex when the container is empty.
@elsid elsid force-pushed the erase_managers_sets branch from b4967d2 to d645a50 Compare January 16, 2026 20:50
@Capostrophic Capostrophic merged commit ea1d627 into OpenMW:3.6 Jan 17, 2026
1 check passed
@elsid elsid deleted the erase_managers_sets branch January 18, 2026 13:53