-
Notifications
You must be signed in to change notification settings - Fork 284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: Add initial proposal for V2 recording & playback API. (WIP) #791
base: v2-docs
Are you sure you want to change the base?
docs: Add initial proposal for V2 recording & playback API. (WIP) #791
Conversation
52f230c
to
e635d92
Compare
e635d92
to
d430151
Compare
Based on the latest discussion we have agreed that we'd prefer to avoid introducing a new data type for his feature. Use a byte array and change sampling via function in the audio module
One previous suggestion was to create a function in the lines of To be able to implement something like
We'd have to check how it'd affect sound quality, but we could consider decreasing the sampling rate for While this would be the cleanest way to do this for the user API, I'm not seeing a way in which it can be achieved with the current requirements? Does anybody have any ideas to overcomes this issues? Expand AudioFrame to include sampling rateThis option is similar to the original proposals about creating a new By default they should still behave as they do in micro:bit V1 (and older MicroPython versions for V2), which is 32 samples at 8K (?) sampling rate. But this could be changed via constructor parameters to have any size buffer at any sample rate. Unfortunately, that still leaves us with the awkward case of changing playback sampling rate by changing a variable from the samples class, instead of a method in the audio module. my_recording = audio.AudioFrames(length=11000, rate=5500)
my_recording = microphone.record_into(my_recording)
# Or
my_recording = microphone.record(duration=2000, rate=5500)
audio.play(my_recording, wait=False)
while audio.is_playing():
x = accelerometer.get_x()
my_recording.rate = scale(x, (-1000, 1000), (2250, 11000))
sleep(50) |
e57605c
to
e5c1b73
Compare
@dpgeorge the docs have been updated, let me know if something doesn't match our previous conversation. |
@dpgeorge there are couple of issues related to setting the sampling rate, but shouldn't affect the MicroPython implementation and should be fixed (without changes in the API) in the next CODAL release: https://github.com/lancaster-university/codal-microbit-v2/issues?q=is%3Aopen+milestone%3Av0.2.60+label%3Ap0 |
docs/audio.rst
Outdated
@@ -69,6 +72,13 @@ Functions | |||
|
|||
Stops all audio playback. | |||
|
|||
.. py:function:: set_rate(sample_rate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this could be made into a static-method of the AudioFrame
class? That way the scope of it is clear, it only acts on these objects. And it would then be possible to write:
AudioFrame.set_rate(8000)
...
my_frame = AudioFrame(...)
my_frame.set_rate(8000)
``
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, as a static method that would affect all AudioFrame playback instead of individual instances?
I agree that having it as part of AudioFrame
makes it a lot more clear that it only affects playback of this type, but might be confusing when accessing the function via instances:
frame_one = AudioFrame(...)
frame_two = AudioFrame(...)
frame_one.set_rate(8000)
frame_two.set_rate(8000)
...
while audio.is_playing():
frame_one.set_rate(scale(accelerometer values...))
# At this point, I'd expect playback for frame_two to still be 8000?
audio.play(frame_two)
Would it'd be more intuitive if each instance hold its playback rate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, as a static method that would affect all AudioFrame playback instead of individual instances?
Yes, as a static method calling it once affects all AudioFrame instances.
If we went this way, probably best to document it as AudioFrame.set_rate(...)
and never refer to it as frame.set_rate(...)
. Then it's clear it sets the global rate for all audio frames.
Would it'd be more intuitive if each instance hold its playback rate?
I'm not sure... maybe? But then it would be an instance method and you would always call frame.set_rate(...)
to set the playback rate of that instance.
I think that's possible to implement.
docs/audio.rst
Outdated
AudioFrame | ||
========== | ||
|
||
.. py:class:: | ||
AudioFrame | ||
AudioFrame(size=32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this could be extended to take optional keyword arguments duration
and rate
(with a default), to make it easy to create an AudioFrame
of a given duration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in this case, similar to the previous comment about having an AudioFrame.set_rate()
, if rate
was added to the constructor then we'd need to have set_rate()
as a method changing each instance value, instead of a class static function.
I do like the idea to be able to make the arguments of the AudioFrame
constructor simpler to work out if the user just wants a specific "size in time".
What would happen if the constructor is provided an argument for size
and for duration
? Would that throw an exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that there's a difference between record and play rates. The rate here is related to the recording rate, although the actual recording rate is set when you call microphone.record_into()
. And that's different again to the playback rate set by .set_rate()
.
What would happen if the constructor is provided an argument for
size
and forduration
? Would that throw an exception?
Yes it could throw an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussion about specifying the size of an AudioFrame
:
docs/audio.rst
Outdated
The ``audio.play()`` function can consume an instance or iterable | ||
(sequence, like list or tuple, or generator) of ``AudioFrame`` instances. | ||
Its default playback rate is 7812 Hz, and uses linear interpolation to output | ||
a PWM signal at 32.5 kHz. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should get rid of this linear interpolation, and just output the 7812 Hz signal directly. I did some tests, and the audio quality is better without the interpolation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, in that case let's do that and remove this from the docs 👍
docs/microphone.rst
Outdated
@@ -70,11 +119,61 @@ Functions | |||
* **return**: a representation of the sound pressure level in the range 0 to | |||
255. | |||
|
|||
.. py:function:: record(duration=3000, rate=7812, wait=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can remove this function altogether, and just have record_into()
. That way memory management is explicit, it avoids situations where there's a lot of heap fragmentation, and also avoids any difficulties with this function returning an AudioFrame
object that is growing over time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I forgot the context in which AudioFrame would be growing. I take it that was the case when wait=False
, but if we have to have a known/default value for duration
and rate
, wouldn't the AudioFrame be allocated at the start before the recording occurs?
I still quite like the idea of being able to do audio.play(microphone.record(duration=3000))
.
Would it be simpler if wait
was removed from this version and only available in record_into()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take it that was the case when
wait=False
, but if we have to have a known/default value forduration
andrate
, wouldn't the AudioFrame be allocated at the start before the recording occurs?
When wait=False
the returned-AudioFrame will be gradually filling up. If we go with the current implementation where the whole buffer is preallocated at the start of the recording, then I guess it's not too bad, the user just sees blank data that's gradually filling up.
So, we can keep this function. And even support wait=False
. Under the hood it will simply allocate an AudioFrame of the desired size, then pass it to record_into()
.
docs/microphone.rst
Outdated
|
||
Example | ||
======= | ||
.. py:function:: record_into(buffer, rate=7812, wait=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this function can set some new internal state in an AudioFrame
object which indicates the length of the recording (as opposed to the total allocated length of the buffer). Then audio.play()
would use this state to only play the amount that was recorded.
Could then add a method to get/set this length, eg AudioFrame.get_recording_length()
, and AudioFrame.set_recording_length()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I completely agree here, playing with the current branch and recording a couple of seconds into a 5 second "buffer" results in a few seconds of "silent playback".
Would record_into()
then also use this marker to continue recording where it left off?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would
record_into()
then also use this marker to continue recording where it left off?
I think that would be confusing. It would require the user to explicitly reset the marker to the beginning to reuse the buffer. Maybe instead record_into()
could have an additional argument which lets the user specify the starting location in the buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1c3fc31
to
acee09a
Compare
102cd8c
to
24bd949
Compare
docs/microphone.rst
Outdated
|
||
.. py:function:: stop_recording() | ||
|
||
Stops an a recording running in the background. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stops an a recording running in the background. | |
Stops a recording running in the background. |
Fixing typo but that also got me wondering what happens if you record more than once with wait=False (before the first has finished).
Do they all work? If so does this function stop them all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only one can be recorded at once, so a previously running recording is aborted at the point you call record()
or record_into()
again (as though you called stop_recording()
first).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This issue to discuss some implementation enhancements would cover this as well:
24bd949
to
513ddf8
Compare
26a1a8f
to
874d32c
Compare
72212bc
to
d37bfa0
Compare
@dpgeorge the docs have been updated with the conclusion from microbit-foundation/micropython-microbit-v2#205 (comment). One thing I've changed that we haven't discussed before was to remove the |
-------------- | ||
|
||
.. py:class:: | ||
AudioRecording(duration, rate=7812) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we try with a default rate of 11000?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, will update the docs.
|
||
:return: The configured sample rate. | ||
|
||
.. py:function:: copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this method? The Python way would be to use the constructor:
new_recording = AudioRecording(old_recording)
(I can't remember if we discussed this or not.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we looked at picking between a copy constructor and a copy()
method, in the end we decided to go with the method because the micro:bit API hasn't used copy constructors before and the copy()
method was already a pattern used in classes like Image
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I've implemented AudioRecording.copy()
.
(There's actually also the existing SoundEffect.copy()
, so that's nice and consistent.)
|
||
:returns: a copy of the ``AudioRecording``. | ||
|
||
.. py:function:: track(start_ms=0, end_ms=-1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these keyword-only arguments, or positional?
Ie do we allow recording.track(123)
or require recording.track(start_ms=123)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to leave them as positional, users can still use keywords for clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think they are positional at the moment:
>>> audio.AudioRecording(100).track(0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: extra positional arguments given
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now made them positional.
|
||
:param buffer: The buffer containing the audio data. | ||
:param rate: The sampling rate at which data will be stored | ||
via the microphone, or played via the ``audio.play()`` function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you pass in an AudioRecording as the buffer and don't specify the rate, should it take the rate of the AudioRecording?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the answer to that is "yes" then the signature would need to be:
AudioTrack(buffer, rate=None)
and None
means:
- if buffer is an AudioRecording (or AudioTrack?) then take the rate from that
- otherwise rate is default 7812 (or 11000?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, originally I was thinking to force the default rate, as it's also a simpler implementation and explanation. But after some thought I think most users would expected the rate to be copied. I'll update the docs to the rate=None
method.
``AudioTrack``, ``AudioRecording`` or buffer-like object like | ||
a ``bytes`` or ``bytearray`` instance. | ||
|
||
:param other: Buffer-like instance from which to copy the data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the other is smaller than self, should the remaining bytes be left untouched? Probably.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'll add to the docs as well.
a ``bytes`` or ``bytearray`` instance. | ||
|
||
:param other: Buffer-like instance from which to copy the data. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will need to document access via subscript and slicing, and len(audio_track)
operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that commonly documented via the dunder methods?
That might be a bit hard to understand for some of our audience, maybe we should add an explanation before or after the class documentation, or the in the class description?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can see an example of how CPython documents dict
: https://docs.python.org/3/library/stdtypes.html#mapping-types-dict
Eg it uses:
len(d)
for lengthd[key]
for subscriptd | other
for binary opsd |= other
for inplace ops
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be great to document these somehow as otherwise it's trial and error/reading the C.
I think for AudioTrack we have:
@overload
def __getitem__(self, i: int) -> int: ...
@overload
def __getitem__(self, s: slice) -> AudioTrack: ...
def __setitem__(self, i: int, x: int) -> None: ...
def __add__(self, v: AudioTrack) -> AudioTrack: ...
def __iadd__(self, v: AudioTrack) -> AudioTrack: ...
def __sub__(self, v: AudioTrack) -> AudioTrack: ...
def __isub__(self, v: AudioTrack) -> AudioTrack: ...
def __mul__(self, v: float) -> AudioTrack: ...
def __imul__(self, v: float) -> AudioTrack: ...
So slice access but not slice assignment.
62ea073
to
905577a
Compare
14aa028
to
b129d16
Compare
Docs preview:
This initial proposal has been discussed in:
But we have some open question that will likely result and a rework of some of this.
Initial proposal
The initial proposal in this PR was to create a new
AudioBuffer
class to contain the audio data and sampling rate.The
AudioBuffer.rate
property could then be used bymicrophone.record()
andaudio.play()
to configure recording and playback rates.This was done to avoid introducing a new parameter to
audio.play()
to configure the sampling rate, when it could only work with a single type of sound input (as it might not be possible to change the rate of the SoundExpressions or AudioFrames).Disadvantages
However, changing the rate in a buffer type to change the playback rate in real-time is a bit awkward:
An alternative we considered was to have the playback sampling rate modified via the audio module itself:
However, this would have to set the same rate to everything played via the audio module, and Sound Expression have a different default rate (44K) than recordings (11K). So
audio.set_rate(22000)
should slow down Sound Expression and speed up recordings.Alternatively, if we wanted to change the playback rate via the audio module, we could set a ratio instead. Something equivalent to
audio.set_speed(100%)
(with different semantics). But a disadvantage would be that it's removing some of math/physics learning opportunity to directly relate the sampling rate value with the effects that it has in playback speed.Alternative proposal: bytearray as the buffer type
In this case a byte array would be returned by
microphone.record()
and used withmicrophone.record_into()
.As this data type does not include info about the rate, we depend on the
audio.play()
adding an extra argument that might not work with other sound types like Sound Expressions and Audio Frames.However, we still have the issue of updating the playback rate in real time during playback, which means we might would have to use use a similar approach to the previously mentioned
audio.set_speed(100%)
:Alternative proposal: AudioFrames as the buffer type
This would be the same as the bytearray proposal, but using the existing AudioFrames instead.
We might need to tweak the AudioFrame class to let us user larger buffers, as the default is 32 samples. As
audio.play()
can consume an iterable as well, we would need to figure out a good balance between AudioFrame size and number of AudioFrames in a recording buffer.