Visionaire Studio Bugfix Update 5.0.5

#40, by afrlme Tuesday, 05. June 2018, 20:12 8 years ago

Synchro is interesting, but I don't understand from the English description how it works.
I need a frame for each phoneme, right? And I need a text file in the sound folder that maps the mouth shapes to the words?

For example, I have the text "Hallo, mein Name ist Gregor". I have a sound file with these words, but of course the text and the sound don't take the same amount of time. Often the player skips ahead.
I only need three to five phonemes (not one for every word): perhaps a, o, m, i. Mostly it's an open mouth with maybe three shapes, plus a closed mouth. How do I fit this together? Is there a German tutorial?

While the character is speaking, other animations are not possible (or do I have to create speech animations with a lot of gestures...)?

@machtnix you could create lots of animations for the speech if you wanted, or outfits that have the character talking with gestures, or you could layer the mouth as a separate character so that the character can gesture while talking.

As for the mouth shapes, did you check out the linked Rhubarb website? If you scroll down the page there is a section about mouth shapes. Technically you don't have to use Rhubarb to generate the lip sync frames or timestamps. You can create them manually, or edit the files Rhubarb generates, since you can replace the letters with frame numbers. That means you could insert a few additional timestamps if you wanted the character to keep the same mouth shape but animate a gesture or mood change.

Um, I believe the creator of Rhubarb Lip Sync is actually German. You could send him a message via Twitter & see if he has a German version of the instructions lying around?

https://twitter.com/RhubarbLipSync

#41, by sebastian Tuesday, 05. June 2018, 20:15 8 years ago

arghh, I should read until the end.

@machtnix I'll write later and describe the steps in German when I'm home. Not much to do.

EDIT

So:
Basically, don't let the many phoneme tables distract you for now. Those belonged to a completely independent script that drives the phonemes, and thus the animation, from a dictionary you fill in yourself, based on the written text rather than the audio file.

What SimonS has now built in is the option of placing a .tsv file next to your audio file that contains time and phoneme entries. These .tsv files are created with an _external_ program called Rhubarb, but they can also be written by hand. The program analyzes the audio file and outputs the corresponding .tsv file.

So if you use a sound file (e.g. hallo.ogg) in your character text action parts, Visionaire now automatically looks for a file named hallo.ogg.tsv in the same subfolder. If it finds one, the talk animation is driven by that file. The talk animation therefore has to contain the following frames:

Frame 1: Phoneme A

Frame 2: Phoneme B

Frame 3: Phoneme C

Frame 4: Phoneme D

Frame 5: Phoneme E

Frame 6: Phoneme F

Frame 7: Phoneme G

Frame 8: Phoneme X

Using the times in the .tsv file, Visionaire picks exactly one of these frames at each point to animate the mouth in sync with the spoken words.
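To make the naming rule and the frame order concrete, here is a minimal Python sketch. It is only an illustration of the convention described in this post, not engine code: the names `SHAPE_TO_FRAME`, `lipsync_path` and `has_lipsync` are made up for this example.

```python
from pathlib import Path

# Mapping taken from the frame list above: A-G are frames 1-7, X is frame 8.
SHAPE_TO_FRAME = {letter: i for i, letter in enumerate("ABCDEFGX", start=1)}


def lipsync_path(sound_file):
    """Return the .tsv path expected next to a sound file (hallo.ogg -> hallo.ogg.tsv)."""
    p = Path(sound_file)
    return p.with_name(p.name + ".tsv")


def has_lipsync(sound_file):
    """True if a lip sync file exists in the same folder as the sound file."""
    return lipsync_path(sound_file).is_file()
```

Running `has_lipsync` over your sound folder is a quick way to spot recordings that still lack a lip sync file.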

If no audio file is set, the talk animation simply plays frames 1-8 as usual. In that case it may make sense to play the frames in random order.

Regards

Sebastian

#42, by Machtnix Tuesday, 05. June 2018, 21:19 8 years ago

Thanks a lot for the mini tutorial!

Once more, slowly, so I can write it down:

I create a text file that may contain at most 8 different phonemes.

For each phoneme I of course need the exact time at which it should start, and possibly also the duration, which has to match the sound file.

That would look roughly like this (for "Abrakadabra")?

Phoneme A: 10 ms (A)

Phoneme B: 10 ms (for B, R)

Phoneme A: 10 ms (A)

Phoneme B: 10 ms (also used for K)

etc.

Both files start more or less at the same time. The text file triggers the corresponding animation in the talk animation folder.

For that I would create, say, 8 animations. I don't get along well with single frames; I prefer a separate animation for each phoneme.

The program Rhubarb takes care of the automatic timing for me, but I can also set the milliseconds myself.

This text file belongs in the same folder as the sound file and must have the same name.

Since I of course record my texts in German, Rhubarb may not recognize sounds like Ü or Z correctly; but apparently it will still enter the start point of each phoneme change correctly, right?

But that means it is not the sound file itself that triggers the lip sync (how would that even work? With a 1000 Hz tone maybe, as in traffic announcements?); instead the sound runs in parallel to the action, which of course can always drift out of sync (e.g. if the PC stutters or the player clicks something else).

Did I understand that roughly correctly?

Machtnix

#43, by sebastian Tuesday, 05. June 2018, 22:00 8 years ago

You have your normal talk animation as before (one per direction). This animation has 8 frames, which stand for the phonemes.

Frame 1 stands for A, 2 for B, and so on up to 8 for X (mouth closed/silent).

What "A", "B", etc. should look like (i.e. the mouth shapes) is also described on the Rhubarb site.

You don't have to create a text file yourself at first. Rhubarb does that for you when you run it on the audio file you want to use in the game.

So Rhubarb takes your audio file (e.g. a .wav) and creates a text file on its own.

Put this file in the same subfolder as the sound file you use in Visionaire. It should have the same name as the sound file plus the additional file extension .tsv (so audio.wav gets the lip sync file audio.wav.tsv).

That's all. Rhubarb thus takes care of the timing AND the analysis/assignment of the phonemes.
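As a rough sketch of that workflow, the Python snippet below builds a Rhubarb call that writes the result next to the audio file under the name audio.wav.tsv. It assumes a `rhubarb` executable is on your PATH and that your Rhubarb version accepts the `-f` (export format) and `-o` (output file) options from its README. The function names are made up for this example.

```python
import subprocess
from pathlib import Path


def rhubarb_command(audio_file, rhubarb_exe="rhubarb"):
    """Build the command line; output is <audio file name>.tsv in the same folder."""
    audio = Path(audio_file)
    output = audio.with_name(audio.name + ".tsv")
    # -f selects the export format, -o the output file (assumed from Rhubarb's README).
    return [rhubarb_exe, "-f", "tsv", "-o", str(output), str(audio)]


def generate_lipsync(audio_file):
    """Run Rhubarb on one audio file; raises CalledProcessError if Rhubarb fails."""
    subprocess.run(rhubarb_command(audio_file), check=True)
```

Calling `generate_lipsync("audio.wav")` would then leave audio.wav.tsv beside the recording. If the sound file used in the game has a different format than the one you feed to Rhubarb, rename the resulting .tsv so it matches the game's file name.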

Phonemes here are not really letters or sounds, but rather the mouth shape the character has while speaking. That is why, for example, B, M and P probably share the same phoneme (lips together): A (see the table under "Mouth shapes" on the Rhubarb site).

However, Rhubarb currently uses "English" rules for speech recognition. So the resulting tsv file may be slightly off for German pronunciation, but it should still be good enough and work well if the audio file is clear and distinct. After all, it uses the sounds for its analysis.

When the display text action starts and the audio file (if linked) is played, Visionaire itself internally triggers the lip sync (if a .tsv file exists) and animates the character in parallel based on the entries found in the tsv file.
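How the engine does this internally is not described here, but the idea of "pick a frame by time" can be sketched in a few lines of Python. This is an assumption-laden illustration: `parse_cues` and `frame_at` are made-up names, and the letter-to-frame mapping is the one given in this thread.

```python
from bisect import bisect_right

# Assumed mapping from this thread: A-G are frames 1-7, X is frame 8.
SHAPE_TO_FRAME = {letter: i for i, letter in enumerate("ABCDEFGX", start=1)}


def parse_cues(tsv_text):
    """Turn 'time<TAB>shape' lines into a sorted list of (seconds, frame) tuples."""
    cues = []
    for line in tsv_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        time_s, shape = parts
        # A column may hold a letter (A-G, X) or already a frame number.
        frame = int(shape) if shape.isdigit() else SHAPE_TO_FRAME[shape]
        cues.append((float(time_s), frame))
    return sorted(cues)


def frame_at(cues, t):
    """Frame to show at playback time t: the last cue that started at or before t."""
    times = [time_s for time_s, _ in cues]
    i = bisect_right(times, t) - 1
    return cues[max(i, 0)][1]  # assumes cues is not empty
```

With the cues `0.00 X`, `0.03 E`, `0.13 B`, a lookup at 0.10 seconds gives frame 5 (shape E). Because the lookup depends only on the current playback time, a short stutter does not shift all later mouth shapes, provided the time is taken from the sound that is actually playing.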

#44, by afrlme Tuesday, 05. June 2018, 22:38 8 years ago

I think there may be a way to make Rhubarb analyze non-English recordings better. You can include a txt file with the text that is being spoken. You could write it out so that it sounds out the words rather than using their correct German spelling. I can't say for sure whether it would work, but including a txt file with the text when generating the lip sync file via Rhubarb is supposed to make it much more accurate, & it also says that the text doesn't have to match 100% what is being spoken.

Rhubarb generates both timestamps in seconds & a letter A-G or X that represents which mouth shape should be shown. Simon coded the engine to automatically convert A to frame 1, B to frame 2 & so on, but as I said you can replace the letters with numbers. Simon has made the engine more flexible than what Rhubarb can generate on its own. Rhubarb is probably more than enough unless you need something more realistic; in that case you can generate your own tsv files & use as many animation frames as you like. The tsv data just has to look like this...

0.00    X
0.03    E
0.13    B
0.27    C
0.41    B

or this...
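For the numeric form, a small Python sketch like the following could replace the letters in a generated file with frame numbers. It assumes the A to 1, B to 2, ... X to 8 mapping described in this thread; `letters_to_frames` is a made-up helper name, not part of Rhubarb or Visionaire.

```python
# Assumed mapping from this thread: A-G are frames 1-7, X is frame 8.
SHAPE_TO_FRAME = {letter: i for i, letter in enumerate("ABCDEFGX", start=1)}


def letters_to_frames(tsv_text, mapping=SHAPE_TO_FRAME):
    """Replace the shape letter on each 'time shape' line with its frame number."""
    out = []
    for line in tsv_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] in mapping:
            out.append(f"{parts[0]}\t{mapping[parts[1]]}")
        else:
            out.append(line)  # leave anything unexpected untouched
    return "\n".join(out)
```

Passing your own `mapping` lets you point several letters at the same frame or at frames beyond 8.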

@Sebastian: did you ask if he has a version that can generate lip sync for German recordings? It would be really nice if his tool supported more languages & allowed you to specify the language as an option in the parameters.

#45, by sebastian Tuesday, 05. June 2018, 23:08 8 years ago

@Sebastian: did you ask if he has a version that can generate lip sync for German recordings? It would be really nice if his tool supported more languages & allowed you to specify the language as an option in the parameters.

Nope. I've read in the issue tracker that he currently has no time to do this, but at least he posted an idea of how to achieve it later in Rhubarb. Only time will tell...

#46, by Machtnix Wednesday, 06. June 2018, 00:13 8 years ago

You could write it out so that it sounds out the words rather than their correct spelling in German.

You mean something like eSpeak or Balabolka? Yes, I tried it some time ago, but it sounds horrible. There are a lot of good voice files (MBROLA), but I have no idea how to install them into eSpeak or something like that (each speaker variation - female, male, high or low - is more than 60 MB, so it should be good...). I think it's easier to do it myself...

8 phonemes only make sense for very realistic characters. I used at most 3 phonemes for an open mouth and 1 for a closed one. Because the synchronisation doesn't work, I play them randomly.

#47, by sebastian Wednesday, 06. June 2018, 00:27 8 years ago

8 phonemes only make sense for very realistic characters. I used at most 3 phonemes for an open mouth and 1 for a closed one. Because the synchronisation doesn't work, I play them randomly.

Rhubarb won't get you any further then. It is used to generate lip sync files from a pre-recorded audio track to imitate the mouth shapes, not the other way around.

Here you can see how it looks on a not very realistic character (one without even lips):

https://cl.ly/rtQQ

Of course it depends on the audio file whether all phonemes are used, because not every spoken text contains every sound.

PS: In English, the art of "speaking synchronized" (in films, games, etc.) is not called "synchronisation". It's "dubbing":

https://www.dict.cc/englisch-deutsch/to+dub.html

#48, by Machtnix Wednesday, 06. June 2018, 00:35 8 years ago

PS: In English, the art of "speaking synchronized" (in films, games, etc.) is not called "synchronisation". It's "dubbing": https://www.dict.cc/englisch-deutsch/to+dub.html

Thanks. I know why I don't like to use English...

#49, by afrlme Wednesday, 06. June 2018, 00:47 8 years ago

I wouldn't call 8 a lot of frames. Well, technically it's 7 plus 1 idle character animation frame. It's similar to Hanna-Barbera talk animations, which are really basic. A more complex lip sync for a more realistic looking 3D game would require a lot more mouth shapes, because there are lots of different sounds & tongue movements too. You can still have 3 frames if you want: just add the same frames multiple times, so frames 1 & 2 use x mouth shape & frames 3 & 4 use y mouth shape, etc. Or just add 3 or 4 frames, edit the generated tsv file & replace the letters with the frame numbers that most closely match what you want.
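The second option can be sketched in Python as below. The grouping is purely hypothetical (which Rhubarb shape lands on which of your 4 frames is an artistic choice), and `reduce_cues` is a made-up helper name. Merging consecutive cues that end up on the same frame is just a tidy-up added for this sketch.

```python
# Hypothetical grouping of Rhubarb's shapes onto a 4-frame talk animation.
# Frame 1 = closed, 2 = slightly open, 3 = open/rounded, 4 = wide open.
GROUPING = {"X": 1, "A": 1, "B": 2, "G": 2, "C": 3, "E": 3, "F": 3, "D": 4}


def reduce_cues(cues, grouping=GROUPING):
    """Map (seconds, letter) cues to (seconds, frame), merging consecutive repeats."""
    reduced = []
    for time_s, shape in cues:
        frame = grouping[shape]
        if reduced and reduced[-1][1] == frame:
            continue  # previous cue already shows this frame
        reduced.append((time_s, frame))
    return reduced
```

For example, the cues X, A, B, C, E collapse to three entries, because X and A share frame 1 and C and E share frame 3.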

#50, by Machtnix Wednesday, 06. June 2018, 00:58 8 years ago

Yeah, if you want to make a Pixar movie you are right. I think in Deponia or Edna there were only 2 to 4, as far as I remember. Because my characters are very small, it doesn't matter how well it fits. I rarely look at the mouth while a character is speaking...

I used some tools for character voice animation a while ago (e.g. Magpie, but the registration fails...), but now I'm getting lazy...