Lessons Learned from Fine-tuning a Transformer Model on Forex Data

Abiodun Aremu
3 min read · Sep 11, 2024


In this article, I want to share the technical insights and lessons I’ve gained as I continue refining my model. If you haven’t read my previous post yet, you can find it here: https://abiodunaremung.medium.com/my-journey-into-training-transformer-models-lessons-and-insights-40f224273f2f. In that post, I achieved a best loss value of 0.6 with the initial model. While this was promising, the model’s predictions were inconsistent when tested across multiple iterations of the same data: it produced a mix of accurate and inaccurate predictions, highlighting the need for further fine-tuning. To improve the model’s learning, I decided to ground it on more relevant data and explore advanced strategies such as dynamic learning rate adjustment, Reinforcement Learning from Human Feedback (RLHF), and Direct Preference Optimization (DPO) to make its predictions more consistent.

Fine-tuning and Learning Rate Optimization
The first step in fine-tuning was to work with 20% of the original training data, allowing me to examine how the model’s performance could be optimized with a more selective dataset. A critical part of this process was exploring how learning rate adjustments affect training outcomes. Previously, I used a fixed learning rate such as 1e-3 and observed a parabolic training curve: the loss would drop significantly at first, then plateau, and eventually increase again as the model stopped learning effectively, a pattern commonly attributed to overfitting. I waited to see whether the curve would exhibit double descent, but it never did.

Fig 1 — Parabolic learning curve from constant learning rate

To break through this limitation, I explored dynamic learning rate strategies, including Warmup and Decay, Learning Rate Annealing, and Cyclic Learning Rate. The most effective approach was Warmup and Decay, which helped the model avoid local minima and generalize better. By increasing the learning rate while the model’s performance was improving and reducing it when the loss increased, I was able to guide the model through periods of stagnation. This approach produced lower loss values than continuing with a constant learning rate would have.

Fig 2 — Damped oscillation learning curve showing declining plateau

Here’s an example of dynamic learning rate adjustment code:

def adjust_learning_rate(optimizer, increase=False):
    """Manually scale the learning rate of every parameter group."""
    for param_group in optimizer.param_groups:
        lr = param_group['lr']
        if increase:
            # The loss is improving: nudge the learning rate up.
            param_group['lr'] = lr * 1.1
        else:
            # The loss is rising or stagnating: cut the learning rate in half.
            param_group['lr'] = lr * 0.5
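
For context, the Warmup and Decay strategy mentioned above can also be expressed as a step-based schedule rather than manual adjustments. Below is a minimal sketch using PyTorch’s LambdaLR; the warmup length, decay shape, and toy model are illustrative assumptions, not the exact configuration behind the results in this article.

import torch

# Warm up for a fixed number of steps, then decay with an inverse square root.
# The 500-step warmup and the decay shape are placeholder values.
def warmup_then_decay(step, warmup_steps=500):
    if step < warmup_steps:
        return (step + 1) / warmup_steps          # linear warmup to the base LR
    return (warmup_steps / (step + 1)) ** 0.5     # inverse-sqrt decay afterwards

model = torch.nn.Linear(16, 1)                    # stand-in for the real transformer
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_then_decay)

for step in range(2000):
    x, y = torch.randn(32, 16), torch.randn(32, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                              # advance the learning rate schedule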

Consistency and Next Steps: RLHF and DPO
After applying this fine-tuning strategy, the model’s inference became more consistent across multiple iterations and datasets, producing more reliable predictions. This is the kind of improvement I would expect from RLHF or DPO, where user preferences or feedback guide the model’s learning process. As a preliminary DPO setup, I also organized the model’s predictions into “chosen,” “rejected,” and “prompt” categories. My application of DPO is still ongoing, though, and performance has not yet improved since it was introduced.
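
For a rough picture of how those “prompt,” “chosen,” and “rejected” categories feed into DPO, here is a minimal sketch of the DPO objective over preference pairs. The example record, the log-probability values, and the beta setting are placeholder assumptions, not values from my actual setup.

import torch
import torch.nn.functional as F

# DPO loss over (prompt, chosen, rejected) triples. The log-probabilities
# would come from the fine-tuned (policy) model and a frozen reference copy.
def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    policy_margin = policy_chosen_logp - policy_rejected_logp   # policy's preference gap
    ref_margin = ref_chosen_logp - ref_rejected_logp            # reference's preference gap
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()

# Shape of a single preference record in the prompt/chosen/rejected format.
pair = {
    "prompt": "<recent price sequence>",
    "chosen": "<prediction closer to the actual move>",
    "rejected": "<prediction further from the actual move>",
}

# Placeholder sequence log-probabilities for a batch of one such pair.
loss = dpo_loss(torch.tensor([-12.3]), torch.tensor([-15.1]),
                torch.tensor([-12.8]), torch.tensor([-14.9]))
print(loss.item())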

Looking ahead, I plan to continue experimenting with RLHF and DPO to further enhance the model’s predictive power. Alternatively, I may refine the fine-tuning process using an adaptive Warmup and Decay strategy, as this has shown promise in improving performance without the need for extensive computational resources.

Deployment and Next Steps
In my previous article, I promised to share the model’s real-time performance on live Forex data via my social media accounts. While I couldn’t fully commit to that due to the ongoing fine-tuning process, I have now deployed the model on Replicate and provided a web application where you can observe its predictions in real time. You can access it at https://fxchartai.com and see how it performs on Forex predictions.

Fig 3 — Deployed model prediction

I’d love to hear your feedback and insights on its performance, so feel free to reach out to me on social media @abiodunaremung. Together, we can continue improving the model and refining its applications in the world of financial markets.
