If you want complete control, I recommend a Lua script, but there's a simpler alternative. The dev behind "Paradigm" opted for a solution that involved creating a scene for each of the close-up interactions. That way they could display the correct close-up character as an animation or static image over a relevant scene background, and control which animations were played, and so on. If you use narration text instead of the regular display text, you can swap the animations out as needed or force an animation to play between frame x and y. There are various approaches you could use for this.
For an upcoming game called "Minotaur" I created a system using a mixture of Lua tables and functions, called from action blocks in the editor, so that each new text automatically displays a specific animation and expression animation and plays the relevant voice recording based on the active language. Anyway, my point is that there are many ways you could go about this, so pick whichever method you think will be easiest for you.
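To give a rough idea of what a table-driven setup like that could look like, here is a minimal sketch in plain Lua. It is not the actual "Minotaur" code: the line ids, animation names, file paths, and the `hooks` callbacks (`playAnimation`, `setExpression`, `playVoice`) are all made up for illustration, and you would need to wire the callbacks to whatever your engine really provides.

```lua
-- Hypothetical lookup table: one entry per text line, keyed by an id of your choosing.
-- All names and paths below are placeholders, not real project data.
local lines = {
  greet = {
    animation  = "hero_talk_closeup",
    expression = "smile",
    voice      = { en = "vo/en/greet.ogg", de = "vo/de/greet.ogg" },
  },
  warn = {
    animation  = "hero_point",
    expression = "worried",
    voice      = { en = "vo/en/warn.ogg" }, -- no German recording yet
  },
}

-- Work out what should happen for a text line in the active language.
-- Falls back to another language (default "en") when no recording exists.
local function resolveLine(id, language, fallback)
  local entry = lines[id]
  if not entry then return nil end
  return {
    animation  = entry.animation,
    expression = entry.expression,
    voice      = entry.voice[language] or entry.voice[fallback or "en"],
  }
end

-- Call this whenever a new text starts. The hooks table holds the
-- engine-specific functions (assumed here, supply your own).
local function onTextStarted(id, language, hooks)
  local cue = resolveLine(id, language)
  if not cue then return false end
  hooks.playAnimation(cue.animation)
  hooks.setExpression(cue.expression)
  if cue.voice then hooks.playVoice(cue.voice) end
  return true
end
```

The idea is that the action block in the editor only has to call `onTextStarted` with the id of the current text. Everything else lives in the table, so adding a new line or a new language means editing data rather than logic.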