Optimizing AI Model Training: How to Use RLHF and RLAIF with Annolive