Recovering from Rampant Repetition

The Problem

Historically, open-weights models have struggled to recover from repetitive spirals, both at the immediate token level, i.e. repeating the same token endlessly, and more structurally in phrasing and overall writing flow.

The Solution

Easy: we train the model on sections of low-quality and/or repetitive text followed by high-quality text that maintains the same logical thread as the prior segment. The real key, however, is to mask the low-quality section out of the loss calculation, so we avoid teaching the model to deliberately be bad before being good.
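
Concretely, the masking can happen at the label level: every position inside the low-quality span gets a sentinel label that the loss function ignores. Here is a minimal sketch, assuming a PyTorch-style causal LM where the text has already been tokenized into id tensors; the helper names and the -100 sentinel are illustrative choices, not the exact training code.

```python
import torch
import torch.nn.functional as F

IGNORE_INDEX = -100  # positions with this label contribute nothing to the loss

def build_labels(low_quality_ids: torch.Tensor, high_quality_ids: torch.Tensor):
    """Concatenate the two segments and mask the low-quality span out of the loss."""
    input_ids = torch.cat([low_quality_ids, high_quality_ids])
    labels = input_ids.clone()
    labels[: low_quality_ids.numel()] = IGNORE_INDEX  # no gradient from the bad text
    return input_ids, labels

def masked_causal_lm_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Standard next-token loss; masked positions are skipped via ignore_index."""
    shift_logits = logits[:-1, :]   # position t predicts token t+1
    shift_labels = labels[1:]
    return F.cross_entropy(shift_logits, shift_labels, ignore_index=IGNORE_INDEX)
```

Note that the low-quality tokens still flow through the model as context, so it sees the mess and learns to continue past it; they just never act as prediction targets.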

Some Predicted Pitfalls

My concern right off the bat is harming style matching and the model's ability to mold itself to the user's desired style. To ward this off, my plan is to structure the data so that a portion of high-quality text, written in the same style as the later unmasked high-quality text, prefixes the low-quality segment. This should help the model learn to maintain the style of the text it is generating while also learning to recover from low-quality text. Hopefully it will teach the model to style match better as well, since it will be returning to the style established in the initial portion of the text.
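
For illustration, a hedged sketch of how such a three-part example might be assembled, extending the masking helper above. Whether the style prefix itself contributes to the loss is left as a switch, since only masking the low-quality span is specified here.

```python
import torch

IGNORE_INDEX = -100

def build_recovery_example(prefix_ids, low_quality_ids, recovery_ids,
                           train_on_prefix=True):
    """[style prefix] + [low-quality span, masked] + [high-quality recovery]."""
    input_ids = torch.cat([prefix_ids, low_quality_ids, recovery_ids])
    labels = input_ids.clone()
    start = prefix_ids.numel()
    end = start + low_quality_ids.numel()
    labels[start:end] = IGNORE_INDEX        # never imitate the bad middle
    if not train_on_prefix:                 # optionally treat the prefix as pure context
        labels[:start] = IGNORE_INDEX
    return input_ids, labels
```

The recovery segment stays fully supervised, so the gradient only ever says "write like the prefix and the recovery," never "write like the middle."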