
Adding csv with Tempo, Ornamentation, Rhythm, creating txt-based "dialogue" #2


Open: wants to merge 9 commits into base: main


morganrivers

Hi,

I wanted to use the data from the paper, but I saw that much of the useful categorization had not been applied to all of the data. I didn't change any of your code; I only added code and datasets that build on what you have.

First, I created an augmented CSV that includes the Rhythm, Ornamentation, and Tempo for each coda, so future researchers can read those values directly.

I also like the book of whale PDFs, but I thought it would be nice to condense the existing plots you generate into a text-based format.
A text format is of course more useful for training LLMs, but in some ways it is also easier for humans to read.
So I made a text-based dialogue that looks like this (I have also included the part of the book that this text corresponds to):
[Image: ScreenShot_2024-09-09_at_12:34:27-AM]

File: sw061b
Whale 1:  r3  r5  c3 \c3 -c3 -c3 \c3.
In chorus, whales 1, 2: -c3  a3.
In chorus, whales 1, 2: -c3  C4.
In chorus, whales 1, 2: /c3  a4.
Whale 2: /a4.
In chorus, whales 1, 2:  c4  a5.
In chorus, whales 1, 2: \c4  a4.
In chorus, whales 1, 2: -c4 \a4.
In chorus, whales 1, 2: \c4  E5.
Whale 2:  a4 /a4.
Whale 1: /c4.
Whale 2: -a4.
Whale 1: -c4.
Whale 2: /a4.
Whale 1: -c4.
In chorus, whales 1, 2: \c4 \a4.
In chorus, whales 1, 2: \c4 \a4.
Whale 2: /a4 \a4 -a4.
Whale 1:  r5.
In chorus, whales 1, 2: \R5 -a4.

(No vocalizations, 25 seconds)

Whale 2:  a3.
In chorus, whales 1, 2:  r5  a4.
In chorus, whales 1, 2:  r4  a3.
In chorus, whales 1, 2:  R5 -a3.
In chorus, whales 1, 2:  r4 -a3.
Whale 1: /r4  r5.
In chorus, whales 1, 2:  c3  r5.
Whale 1: -c3  Q4.

In each token, the leading /, -, or \ indicates Rubato, the letter identifies the Rhythm (a->0, ..., r->17, so 18 letters in total), capitalization of the letter indicates an ornament, and the number gives the tempo, 1 through 5.
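
To make the token format concrete, here is a minimal Python sketch of a decoder for one line of the dialogue. It is not code from this PR: the names `Coda`, `decode_token`, and `parse_line` are hypothetical, and it assumes that a token with no leading symbol simply carries no Rubato marker. The Rubato symbol is kept as a raw character because its direction is not spelled out above.

```python
import re
from dataclasses import dataclass

# Optional Rubato symbol, one rhythm letter a..r (either case), one tempo digit 1..5.
TOKEN_RE = re.compile(r"([/\\-]?)([A-Ra-r])([1-5])")


@dataclass(frozen=True)
class Coda:
    rubato: str     # "/", "-", "\\", or "" when no symbol is present (assumption)
    rhythm: int     # 0..17, from the letters a..r
    ornament: bool  # True when the rhythm letter is capitalized
    tempo: int      # 1..5


def decode_token(token: str) -> Coda:
    """Decode a single token such as '-c3' or 'E5'."""
    match = TOKEN_RE.fullmatch(token.strip())
    if match is None:
        raise ValueError(f"unrecognized token: {token!r}")
    marker, letter, tempo = match.groups()
    return Coda(
        rubato=marker,
        rhythm=ord(letter.lower()) - ord("a"),
        ornament=letter.isupper(),
        tempo=int(tempo),
    )


def parse_line(line: str) -> tuple[list[int], list[Coda]]:
    """Split 'Whale 1: -c3  Q4.' into whale numbers and decoded codas.

    Lines without a colon (file headers, silence notes) yield two empty lists.
    """
    speaker, sep, body = line.partition(":")
    if not sep:
        return [], []
    whales = [int(n) for n in re.findall(r"\d+", speaker)]
    tokens = body.strip().rstrip(".").split()
    return whales, [decode_token(t) for t in tokens]
```

For example, `parse_line("Whale 1: -c3  Q4.")` gives whale `[1]` and two codas: rhythm 2 at tempo 3 with a `-` Rubato symbol, then an ornamented rhythm 16 at tempo 4 with no symbol.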

I converted the whole dataset into this format.

For more specifics, see the two Python files, the CSV, and the txt file I added.

morganrivers (Author)

Having looked at this in more detail, the order of the codas in the pickle files does not seem to match the chronological order of the timestamped whale data exactly. So while the "script" generally matches the book of whale PDFs, it does have some subtle issues. A colleague and I have been working on re-interpreting the raw ICIs (inter-click intervals) into rhythm and ornamentation categories, and we have trained a very small transformer to predict ICIs. That separate repository should soon produce its own, more accurate script:
whale-gpt

Incidentally, if you could provide any more data with timestamps, that would be really amazing! LLMs are of course very data hungry. We would love to have more click data (critically, tagged with the originating whale and the timestamp of each click or coda).
