The replay stability tests needed adjusting because the hit windows were
materially changed in the previous commit. What matters in these tests
is covering the time instants near the hit window edges and ensuring
that re-encoding doesn't mutate the resulting judgements, not the
particular numbers used.
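
As a rough illustration of the property being tested (not the actual
test code), here is a minimal Python sketch. Everything in it is
assumed for the example: the window values, the names `judge`,
`reencode`, `edge_offsets` and `mutated_instants`, and the idea that a
re-encode stores whole milliseconds.

```python
# Hypothetical hit windows as (result, half-width in ms). Placeholder
# values only; the point of the tests is that these can change freely.
HIT_WINDOWS = [("great", 20.0), ("ok", 60.0), ("meh", 100.0)]


def judge(offset_ms):
    """Return the first result whose window contains the offset."""
    for name, half_width in HIT_WINDOWS:
        if abs(offset_ms) <= half_width:
            return name
    return "miss"


def reencode(time_ms):
    """Assumed stand-in for an encode/decode round trip that keeps
    only whole milliseconds."""
    return float(round(time_ms))


def edge_offsets():
    """Time instants just inside, on, and just outside every edge,
    derived from the windows rather than hard-coded."""
    offsets = []
    for _, half_width in HIT_WINDOWS:
        for sign in (-1.0, 1.0):
            for delta in (-1.0, 0.0, 1.0):
                offsets.append(sign * (half_width + delta))
    return offsets


def mutated_instants(offsets):
    """Offsets whose judgement differs after the round trip."""
    return [t for t in offsets if judge(t) != judge(reencode(t))]


if __name__ == "__main__":
    print(mutated_instants(edge_offsets()))
```

Because the instants are derived from the window table, changing the
numbers does not change what the check covers. Under the same
assumptions, a fractional instant such as 20.4 ms would be flagged,
since it rounds across the 20 ms edge.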