A Progress Report on Dans Personality Engine V1.3.0
The PocketDoc Labs flagship model, Dans-PersonalityEngine-V1.2.0-24b, is good, to be sure. However, we are still very far from the maximum that can be achieved on a given base model. I want V1.3.0 to feel like a noticeable step towards that potential.
Identified Problems in the Prior Versions
The following significant problems have been encountered during testing:
- Repetition, both at the phrase level and at a broader structural level, e.g. the third paragraph opening with the same type of phrase as the previous two, perhaps with some thesaurus work.
- Insufficient diversity in word and phrase choice, and, worse, a bias towards phrasing that is generally low quality or simply undesirable by human preference.
- Trouble acting and speaking solely for the instructed character in narrative roleplay scenarios.
- Poor performance deeper into the trained context, especially beginning in the 8k-16k token range.
- Refusals of certain topics or tasks even when instructions explicitly permit them.
- Trouble staying in character when faced with an adversarial user.
How Do We Address These Problems?
A key component of the dataset is a set of user-submitted Claude 3 Opus logs, and, to be blunt, I screwed up the processing in the previous version. Multiple mistakes worked against the quality of the final product, throwing the phrase balancing askew and letting excess refusals leak in.
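For a concrete sense of what that cleanup involves, here is a minimal sketch of a refusal filter, assuming JSONL logs where each line holds a conversation with a `messages` list; the refusal patterns, field names, and file layout are illustrative assumptions, not the actual pipeline:

```python
import json
import re

# Illustrative refusal markers; a real cleaning pass would use a much
# broader, hand-tuned list (these patterns are assumptions).
REFUSAL_PATTERNS = [
    re.compile(r"I (?:can(?:no|')t|won't|am not able to)\b", re.IGNORECASE),
    re.compile(r"I (?:do not|don't) feel comfortable", re.IGNORECASE),
    re.compile(r"\bas an AI\b", re.IGNORECASE),
]

def is_refusal(text: str) -> bool:
    """Flag text that contains a known refusal phrase."""
    return any(p.search(text) for p in REFUSAL_PATTERNS)

def clean_logs(in_path: str, out_path: str) -> None:
    """Drop any conversation whose assistant turns contain a refusal."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            convo = json.loads(line)
            if any(m["role"] == "assistant" and is_refusal(m["content"])
                   for m in convo.get("messages", [])):
                continue  # discard the whole conversation
            dst.write(json.dumps(convo) + "\n")
```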
Beyond simply cleaning the existing data more thoroughly, there are several avenues for addressing the listed problems, none of them mutually exclusive. A popular and effective suggestion for a multifaceted quality boost is reinforcement learning and/or preference data (whether human- or AI-generated). While reinforcement learning is the long-term plan, it is neither simple nor easy to scale without negative side effects. Alternatively, we can engineer novel SFT datasets that directly target the problem areas, which is the plan for V1.3.0.
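For context, "preference data" here means paired examples where one response is marked as better than another. A minimal sketch of one DPO-style record follows; the field names and text are illustrative, not any actual schema:

```python
# One DPO-style preference record: the trainer pushes the model towards
# the "chosen" completion and away from the "rejected" one.
preference_example = {
    "prompt": "Continue the scene: the detective steps into the rain-soaked alley.",
    "chosen": (
        "She kept to the wall, counting the drips off the fire escape "
        "while the neon sign across the street stuttered."
    ),
    "rejected": (
        "A shiver ran down her spine as she stepped into the dark alley, "
        "her heart pounding in her chest."  # the clichéd phrasing we want trained out
    ),
}
```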
So Then What Now?
The Opus logs have already been cleaned and reprocessed, so that's one source of issues down. The engineered datasets are the next step and will be the focus of this version. I plan on a combination of story-writing and roleplay datasets as the mediums for transferring the desired qualities. Assuming all goes well, the result will be a dataset that directly reduces repetition and speaking for the user in RP, and in turn yields an increase in creativity.
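To make the second goal concrete, here is a minimal sketch of what one engineered roleplay example could look like, built so the target assistant turn never acts or speaks for the user's character; the schema, character names, and text are illustrative assumptions, not the actual dataset:

```python
# One engineered roleplay SFT example. The system prompt states the
# constraint and the assistant turn demonstrates it: Mira's dialogue and
# actions only, with Jonas left entirely to the user.
roleplay_example = {
    "messages": [
        {
            "role": "system",
            "content": "You are playing Mira only. Never act or speak for the user's character, Jonas.",
        },
        {
            "role": "user",
            "content": 'Jonas slides the ledger across the table. "Tell me what you see."',
        },
        {
            "role": "assistant",
            "content": 'Mira pulls the ledger closer, tracing the inked columns. "Someone has been skimming. Look at the third entry."',
        },
    ]
}
```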
Yeah, Okay, And Then What?
We will train PersonalityEngine V1.3.0 on 7B, 12B, and 24B parameter base models, with a 32K context length for the two smaller models and 128K for the 24B model. The timeline is largely dependent on sourcing funding through a sponsor or other means.
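For reference, the planned run matrix expressed as data; the dict layout is just an illustrative way to record it, and the exact token counts assume the usual power-of-two readings of "32K" and "128K":

```python
# Planned V1.3.0 runs: parameter count -> training context length.
planned_runs = {
    "7B": {"context_tokens": 32_768},
    "12B": {"context_tokens": 32_768},
    "24B": {"context_tokens": 131_072},
}
```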