Visionaire Studio Bugfix Update 5.0.5

  • #10, by dionous, Saturday, 02. June 2018, 19:01
    Lipsync is a great little gift indeed, thanks guys!

    How do you link specific frames to A-X?



  • #11, by sebastian, Saturday, 02. June 2018, 19:07
    Frame 1 = A, 2 = B, ... 8 = X.
    Currently it's fixed.


  • #12, by afrlme, Saturday, 02. June 2018, 19:18
    You can also edit the exported tsv files in a text editor & replace the shape letters (A, B, C, X, etc.) with frame numbers if you don't want to keep the frames in the same order in VS.
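
    The manual edit described above could also be scripted. This is only a rough sketch: the frame order in `shapeToFrame` is a made-up example, and `shapesToFrames` is a hypothetical helper, not part of Visionaire or Rhubarb. It assumes each tsv line has the form "time<TAB>shape".

```lua
-- Hypothetical custom order: which frame each Rhubarb mouth shape should use.
local shapeToFrame = { A = 3, B = 1, C = 2, D = 4, E = 5, F = 6, G = 7, X = 8 }

-- Replace the shape letter in every "time<TAB>shape" line with its frame
-- number. Shapes that are missing from the map are left untouched.
local function shapesToFrames(tsvText, map)
  return (tsvText:gsub("([%d%.]+)\t(%a)", function(time, shape)
    local frame = map[shape]
    if frame then
      return time .. "\t" .. frame
    end
    return nil -- returning nil keeps the original text
  end))
end

print(shapesToFrames("0.00\tX\n0.05\tA\n", shapeToFrame))
```

    Keeping the order in a table means the exported files can be regenerated & converted again without editing them by hand.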

    Anyway, if you want to just use the exported Rhubarb tsv files without having to edit them, then check out the corresponding mouth shapes on the Rhubarb GitHub site & use them to select & order 8 frames so that they match the images. The 8th frame should be your character with their mouth in the closed/resting position - same as your idle character animation.

    @Sebastian, yes you can use ogg or whatever you want in the editor. I think the tsv file just has to include the same filename, format name & .tsv, like so... filename.ogg.tsv.

    The video I recorded earlier is using ogg files. I used wav to generate the tsv file though.
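
    To make the naming rule concrete, here is a tiny sketch. `tsvNameFor` & `isMatchingTsv` are hypothetical helpers written for illustration; they only encode the "filename.format.tsv" convention described above & assume the name has to match exactly.

```lua
-- Build the expected lip sync file name: the complete speech file name,
-- including its extension, with ".tsv" appended.
local function tsvNameFor(speechFile)
  return speechFile .. ".tsv"
end

-- Check whether a tsv file name matches the speech file linked in the editor.
local function isMatchingTsv(tsvFile, speechFile)
  return tsvFile == tsvNameFor(speechFile)
end

print(tsvNameFor("hello.ogg"))                 --> hello.ogg.tsv
print(isMatchingTsv("hello.tsv", "hello.ogg")) --> false
```

    So a tsv generated from hello.wav still has to be named hello.ogg.tsv if the ogg file is the one linked in the editor.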


  • #13, by esmeralda, Saturday, 02. June 2018, 19:31
    Thanks for the answers.
    Sadly I'm still getting no results.

    I named the frames of the talking animation A-X, named my output file exactly like the audio file, added .tsv to the name and put it in the same folder as the audio. The output file looks ok.
    I tried it with .ogg and .wav audio.
    The talking animation plays in the same order as the frames (A,B,C...X,A,...) - no lip syncing. In the properties it is set to "play forwards" - do I have to change something?



    Edit: ah, thank you AFRLme! Name.Format.tsv - that did the trick!


  • #14, by afrlme, Saturday, 02. June 2018, 19:39
    No, you don't need to name the animation files. Just need to import 8 relevant frames into the talk animation that match the 8 mouth shapes. As Sebastian said: A = 1, B = 2, C = 3, D = 4, E = 5, F = 6, G = 7, X = 8. The numbers represent the frame number, as in frames 1 through 8.

    Next you create a display text, add your text to it & link your ogg speech file.

    Now using Rhubarb you create the tsv file using a wav version of your speech file. Import the tsv file into the same folder as the ogg speech file & rename it so it's exactly the same name & format as your speech file, but with .tsv on the end.

    https://i.gyazo.com/ca492f5d218960c95d27e372e7513b33.png

    You might need to check your files/folder options on Windows to make sure it displays the format name (file extension) & not just the filename, otherwise the only way to edit the format name is via the properties for the file.

    Quick tip: when you create the tsv file with Rhubarb you should consider linking in a txt file that contains the dialog being spoken in the wav file. Apparently it helps Rhubarb analyze the audio & generate more accurate lip sync than letting it try to generate it from the wav file only.


    rhubarb -o filename.format.tsv -d dialog.txt --extendedShapes GX speech.wav

    I noticed that by default Rhubarb uses all of the extended shapes. H is not desired as Simon programmed it to only display A-G + X, so you need to declare that it only uses the G & X extended shapes.
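
    For anyone curious how a tsv file relates to the 8 frames, here is a rough sketch. It is not how the engine itself is implemented; `parseCues` & `frameAt` are hypothetical helpers, the "seconds<TAB>shape" line layout is assumed, & sending any unexpected shape (such as H) to the resting frame is purely an assumption of this sketch.

```lua
-- Fixed mapping described in this thread: A-G are frames 1-7, X is frame 8.
local SHAPE_FRAME = { A = 1, B = 2, C = 3, D = 4, E = 5, F = 6, G = 7, X = 8 }

-- Parse tsv text ("seconds<TAB>shape" per line) into a list of
-- { time = seconds, frame = frame number } cues.
-- Assumption: shapes outside A-G and X fall back to the resting frame 8.
local function parseCues(tsvText)
  local cues = {}
  for time, shape in tsvText:gmatch("([%d%.]+)\t(%a)") do
    cues[#cues + 1] = { time = tonumber(time), frame = SHAPE_FRAME[shape] or 8 }
  end
  return cues
end

-- Return the frame that should be visible at a given playback time.
local function frameAt(cues, seconds)
  local frame = 8 -- resting mouth before the first cue
  for _, cue in ipairs(cues) do
    if cue.time > seconds then break end
    frame = cue.frame
  end
  return frame
end

local cues = parseCues("0.00\tX\n0.05\tD\n0.27\tC\n")
print(frameAt(cues, 0.10)) --> 4
```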


  • #15, by esmeralda, Sunday, 03. June 2018, 00:22
    Thanks a lot for the detailed description! I didn't include the format into the filename, that was the problem. (naming the frames of the animation A-X was a deed of desperation to get it working :-) I'm glad to hear it's not necessary.)

    And good to know that the mouth shape H isn't desired. That explains why my talk animation is messed up.


  • #16, by shico, Sunday, 03. June 2018, 16:25
    Thanks a lot for the update Simon!


  • #17, by darren-beckett, Monday, 04. June 2018, 17:18
    I wrote my own LipSync using a Phoneme lookup in a table for the words being spoken.

    I like @Sebastian's approach though.

    I will definitely give this a go.


  • #18, by sebastian, Monday, 04. June 2018, 17:50
    @darren-beckett:
    You mean having lipsync based on the written text with no audio?
    Do you check for each character then?

    I was also trying to implement that as a fallback in my script before SimonS came around...
    (currently my fallback is just randomized)

    I still haven't had enough time to implement that and have no idea if I still should.
    The idea was to split up the text string, go through each character and then choose a corresponding phoneme. But I'd also like to add recognition for combined characters which may look different (to recognize phoneme differences for e.g. strings which include "ee" or "eh")...
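
    A minimal sketch of that idea, purely for illustration: walk the string & prefer a two-letter match ("ee", "eh") over a single letter. The `GROUPS` table & its shape names are placeholders made up for this sketch, not a real phoneme set, & `textToShapes` is a hypothetical helper.

```lua
-- Placeholder letter-group table (hypothetical entries, not a real phoneme set).
local GROUPS = { ee = "E", eh = "A", th = "TH", a = "A", e = "E", o = "O", m = "M" }

-- Walk the text left to right, preferring a two-letter group over a
-- single letter. Characters without an entry are skipped.
local function textToShapes(text)
  local shapes, i = {}, 1
  text = text:lower()
  while i <= #text do
    local pair = text:sub(i, i + 1)
    local single = text:sub(i, i)
    if #pair == 2 and GROUPS[pair] then
      shapes[#shapes + 1] = GROUPS[pair]
      i = i + 2
    else
      if GROUPS[single] then
        shapes[#shapes + 1] = GROUPS[single]
      end
      i = i + 1
    end
  end
  return shapes
end

print(table.concat(textToShapes("Meet them"), ",")) --> M,E,TH,E,M
```

    Like the randomized fallback, this gives no timing information; it only decides which mouth shapes appear & in which order.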


  • #19, by darren-beckett, Monday, 04. June 2018, 18:29
    @Seb:
    I used this website LINK to convert words into phonemes.
    I then have lookup tables to convert phonemes into visemes, and then display each corresponding talk animation frame in sequence (using actions after each frame to look up the next viseme).

    It works well, but it isn't timed to the speech.
    -- Translation of Phonemes into Visemes (Mouth Shapes)
    PhonemeViseme = {}
    PhonemeViseme["PAUSE"] = "PAUSE"
    PhonemeViseme["AA"] = "A"
    PhonemeViseme["AE"] = "A"
    PhonemeViseme["AH"] = "A"
    PhonemeViseme["AO"] = "W"
    PhonemeViseme["AW"] = "W"
    PhonemeViseme["AY"] = "A"
    PhonemeViseme["B"] = "M"
    PhonemeViseme["CH"] = "U"
    PhonemeViseme["D"] = "U"
    PhonemeViseme["DH"] = "TH"
    PhonemeViseme["EH"] = "A"
    PhonemeViseme["ER"] = "O"
    PhonemeViseme["EY"] = "A"
    PhonemeViseme["F"] = "F"
    PhonemeViseme["G"] = "U"
    PhonemeViseme["HH"] = "E"
    PhonemeViseme["IH"] = "E"
    PhonemeViseme["IY"] = "E"
    PhonemeViseme["JH"] = "U"
    PhonemeViseme["K"] = "U"
    PhonemeViseme["L"] = "L"
    PhonemeViseme["M"] = "M"
    PhonemeViseme["N"] = "TH"
    PhonemeViseme["NG"] = "M"
    PhonemeViseme["OW"] = "W"
    PhonemeViseme["OY"] = "TH"
    PhonemeViseme["P"] = "M"
    PhonemeViseme["R"] = "R"
    PhonemeViseme["S"] = "Y"
    PhonemeViseme["SH"] = "Y"
    PhonemeViseme["T"] = "Y"
    PhonemeViseme["TH"] = "TH"
    PhonemeViseme["UH"] = "W"
    PhonemeViseme["UW"] = "W"
    PhonemeViseme["V"] = "F"
    PhonemeViseme["W"] = "W"
    PhonemeViseme["Y"] = "U"
    PhonemeViseme["Z"] = "U"
    PhonemeViseme["ZH"] = "U"

    -- Visemes (Mouth Shapes) - Frame Numbers
    VisemeFrame = {}
    VisemeFrame["BLANK"] = 1
    VisemeFrame["PAUSE"] = 1
    VisemeFrame["A"] = 2
    VisemeFrame["E"] = 3
    VisemeFrame["F"] = 4
    VisemeFrame["L"] = 5
    VisemeFrame["M"] = 6
    VisemeFrame["O"] = 7
    VisemeFrame["R"] = 8
    VisemeFrame["TH"] = 9
    VisemeFrame["U"] = 10
    VisemeFrame["W"] = 11

    SpeechPhonemes = {}
    SpeechPhonemes["A"] = { "AH" }
    SpeechPhonemes["AGAIN"] = { "AH","G","EH","N" }
    SpeechPhonemes["AM"] = { "AE","M" }
    SpeechPhonemes["AN"] = { "AE","N" }
    SpeechPhonemes["ANOTHER"] = { "AH","N","AH","DH","ER" }
    SpeechPhonemes["ARE"] = { "AA","R" }
    SpeechPhonemes["AWAY"] = { "AH","W","EY" }
    SpeechPhonemes["BELIEVE"] = { "B","IH","L","IY","V" }
    SpeechPhonemes["BONES"] = { "B","OW","N","Z" }
    SpeechPhonemes["BOOKCASE"] = { "B","UH","K","K","EY","S" }
    SpeechPhonemes["BUT"] = { "B","AH","T" }
    SpeechPhonemes["BUTTONS"] = { "B","AH","T","AH","N","Z" }
    SpeechPhonemes["BYE"] = { "B","AY" }
    ...
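
    For readers following along, the lookup chain (word -> phonemes -> visemes -> frame numbers) might be walked roughly like this. It is a sketch only: it uses small excerpts of the tables, `sentenceToFrames` is a hypothetical helper rather than the original script, & falling back to the BLANK frame for anything unknown is an assumption. Note that in the tables as posted the viseme "Y" (used for S, SH & T) has no VisemeFrame entry, so it hits that fallback here.

```lua
-- Small excerpts of the three tables from the post above.
local PhonemeViseme = { AH = "A", AY = "A", B = "M", T = "Y" }
local VisemeFrame = { BLANK = 1, PAUSE = 1, A = 2, M = 6 }
local SpeechPhonemes = { BUT = { "B", "AH", "T" }, BYE = { "B", "AY" } }

-- Convert a sentence into a list of talk animation frame numbers.
-- Assumption: unknown words, phonemes or visemes use the BLANK frame.
local function sentenceToFrames(sentence)
  local frames = {}
  for word in sentence:upper():gmatch("%a+") do
    local phonemes = SpeechPhonemes[word]
    if not phonemes then
      frames[#frames + 1] = VisemeFrame["BLANK"]
    else
      for _, phoneme in ipairs(phonemes) do
        local viseme = PhonemeViseme[phoneme]
        local frame = viseme and VisemeFrame[viseme]
        frames[#frames + 1] = frame or VisemeFrame["BLANK"]
      end
    end
  end
  return frames
end

print(table.concat(sentenceToFrames("But bye"), ",")) --> 6,2,1,6,2
```

    The resulting list carries no timing, which matches the limitation mentioned above: the frames would simply be shown one after another.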


  • #20, by sebastian, Monday, 04. June 2018, 18:33
    So you basically have each written word inside a big table which includes its phonemes? Wow.
