audio_levels.compute returning -inf values #5
Hi, thank you, and sorry for the delay. Meanwhile, here is my current code:

# -*- coding: utf-8 -*-
"""
Parts of the code are inspired from the lightshowpi project:
https://bitbucket.org/togiles/lightshowpi/

Third party dependencies:

alsaaudio: for audio input/output
    http://pyalsaaudio.sourceforge.net/

decoder.py: decoding mp3, ogg, wma, ...
    https://pypi.python.org/pypi/decoder.py/1.5XB

numpy: for FFT processing
    http://www.numpy.org/

GPU FFT: for GPU FFT processing
    http://www.aholme.co.uk/GPU_FFT/Main.htm
"""
from __future__ import absolute_import

import wave

import alsaaudio as aa
import numpy as np

rfft = np.fft.rfft
log10 = np.log10
frombuffer = np.frombuffer
hanning = np.hanning
np_sum = np.sum
np_multiply = np.multiply
np_abs = np.abs
np_delete = np.delete
int16 = np.int16
float32 = np.float32

# Use a multiple of 8
# 4096 uses less cpu than 2048, but light beats are less accurate
CHUNK_SIZE = 2048
CHANNEL_LENGTH = 6
USE_GPU = True  # optimize computing using the GPU FFT library and cython/c


def calculate_channel_frequency(min_frequency, max_frequency):
    """Calculate frequency values for each channel"""
    print("Calculating frequencies for %d channels." % CHANNEL_LENGTH)
    octaves = (np.log(max_frequency / min_frequency)) / np.log(2)
    print("octaves in selected frequency range ... %s" % octaves)
    octaves_per_channel = octaves / CHANNEL_LENGTH
    frequency_limits = []
    frequency_store = []
    frequency_limits.append(min_frequency)
    for i in xrange(1, CHANNEL_LENGTH + 1):
        frequency_limits.append(frequency_limits[-1]
                                * 10 ** (3 / (10 * (1 / octaves_per_channel))))
    for i in xrange(CHANNEL_LENGTH):
        frequency_store.append((frequency_limits[i], frequency_limits[i + 1]))
        print("channel %d is %6.2f to %6.2f " % (i, frequency_limits[i],
                                                 frequency_limits[i + 1]))
    return frequency_store


def piff(val, sample_rate):
    """Return the power array index corresponding to a particular frequency."""
    return int(CHUNK_SIZE * val / sample_rate)


range_channels = range(CHANNEL_LENGTH)
min_frequency = 20
max_frequency = 19500
frequency_limits = calculate_channel_frequency(min_frequency, max_frequency)
freqs_left = [CHUNK_SIZE * frequency_limits[i][0] for i in range_channels]
freqs_right = [CHUNK_SIZE * frequency_limits[i][1] for i in range_channels]

# will store the frequency bands indexes
bands_indexes_cache = {}

hanning_cache = np.array(hanning(CHUNK_SIZE), dtype=float32)

if USE_GPU:
    # Use the GPU FFT lib, with cython/c
    gpu_audio_levels = None

    def prepare():
        global gpu_audio_levels
        if gpu_audio_levels is not None:
            import wake_pi_up
            wake_pi_up.log.error("gpu_audio_levels already initialized!")
        from rpi_audio_levels import AudioLevels
        size = 11
        assert 2 ** size == CHUNK_SIZE
        gpu_audio_levels = AudioLevels(size, CHANNEL_LENGTH)

    def release():
        global gpu_audio_levels
        if gpu_audio_levels is None:
            import wake_pi_up
            wake_pi_up.log.error("gpu_audio_levels not initialized!")
        # deallocating the object must release the underlying resources
        gpu_audio_levels = None

    prepare()
else:
    # else we use only Numpy
    def prepare():
        pass

    def release():
        pass

data_float = np.empty(CHUNK_SIZE, dtype=float32)


# @profile
def calculate_levels(data, buffer_data, sample_rate, bands=None):
    '''Calculate frequency response for each channel

    Initial FFT code inspired from the code posted here:
    http://www.raspberrypi.org/phpBB3/viewtopic.php?t=35838&p=454041
    Optimizations from work by Scott Driscoll:
    http://www.instructables.com/id/Raspberry-Pi-Spectrum-Analyzer-with-RGB-LED-Strip-/

    :param bands: list allowing to choose which bands to process
    :type bands: `list` of `bool`
    '''
    if len(data) != 2 * CHUNK_SIZE:
        print("len(data) != 2 * CHUNK_SIZE : %d != 2 * %d" % (len(data),
                                                              CHUNK_SIZE))
        # can be the case at the last audio chunk, let's ignore it
        levels = [0 for i in range_channels]
        return levels, levels, levels

    # create a numpy array from the data buffer
    # buffer_data = frombuffer(data, dtype=int16)

    # data has one channel and 2 bytes per channel
    # np.empty(len(data) / 2, dtype=float32)
    # data_float[:] = buffer_data[:]
    # data = buffer_data

    # if you take an FFT of a chunk of audio, the edges will look like
    # super high frequency cutoffs. Applying a window tapers the edges
    # of each end of the chunk down to zero.
    np_multiply(buffer_data, hanning_cache, out=data_float)

    try:
        bands_indexes = bands_indexes_cache[sample_rate]
    except KeyError:
        bands_indexes = bands_indexes_cache[sample_rate] = \
            [(int(freqs_left[i] / sample_rate),
              int(freqs_right[i] / sample_rate)) for i in range_channels]

    # Apply FFT - real data
    if USE_GPU:
        # all is done in C using the GPU_FFT lib, it's 7 times faster
        levels, means, stds = gpu_audio_levels.compute(data_float, bands_indexes)
        # TODO: use optional bands to avoid computing some levels for nothing
        return levels, means, stds
    else:
        fourier = rfft(data_float)
        # Remove last element in array to make it the same size as CHUNK_SIZE
        # np_delete(fourier, len(fourier) - 1)
        fourier = fourier[:-1]

        # Calculate the power spectrum
        power = np_abs(fourier) ** 2

        # take the log10 of the resulting sum to approximate how human
        # ears perceive sound levels
        if bands is None:
            # calculate for all frequency bands
            levels = [log10(np_sum(power[bands_indexes[i][0]:bands_indexes[i][1]]))
                      for i in range_channels]
        else:
            # some frequency band indexes are specified, we don't need all bands
            levels = [log10(np_sum(power[bands_indexes[i][0]:bands_indexes[i][1]]))
                      if needed else None
                      for i, needed in enumerate(bands)]
        return levels


if __name__ == "__main__":
    # @profile
    def test():
        import sys
        path = sys.argv[1]
        if path.endswith('.wav'):
            musicfile = wave.open(path, 'r')
        else:
            import decoder
            musicfile = decoder.open(path, force_mono=True)

        sample_rate = musicfile.getframerate()
        print("params: %s" % (musicfile.getparams(),))
        total_seconds = musicfile.getnframes() / musicfile.getframerate()
        total_minutes = total_seconds // 60
        print("duration: %s:%s" % (total_minutes, total_seconds % 60))

        output = aa.PCM(aa.PCM_PLAYBACK, aa.PCM_NORMAL)
        output.setchannels(1)  # mono
        output.setrate(sample_rate)
        output.setformat(aa.PCM_FORMAT_S16_LE)
        output.setperiodsize(CHUNK_SIZE)

        # Output a bit about what we're about to play
        print("Playing: " + path + " ("
              + str(musicfile.getnframes() / sample_rate) + " sec)")

        # read the first chunk of audio data
        data = musicfile.readframes(CHUNK_SIZE)
        while data != '':
            # play the chunk of music
            output.write(data)
            # read the next chunk of audio data
            data = musicfile.readframes(CHUNK_SIZE)
            # get a numpy array from the raw audio buffer data
            buffer_data = frombuffer(data, dtype=int16)
            # Compute FFT in this chunk
            levels = calculate_levels(data, buffer_data, sample_rate,
                                      bands=None)

    test()
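For reference, the logarithmic band-limit computation in `calculate_channel_frequency` above can be reproduced standalone (the `10 ** (3 / (10 * (1 / octaves_per_channel)))` factor simplifies to `10 ** (0.3 * octaves_per_channel)`). This is only an illustrative sketch, not part of the project code:

```python
import numpy as np

# Same constants as in the snippet above
CHANNEL_LENGTH = 6
min_frequency, max_frequency = 20.0, 19500.0

# Octaves spanned by the full range, split evenly across channels
octaves = np.log2(max_frequency / min_frequency)
octaves_per_channel = octaves / CHANNEL_LENGTH

# Each successive limit is the previous one times a constant factor,
# so the channel edges are logarithmically spaced
limits = [min_frequency]
for _ in range(CHANNEL_LENGTH):
    limits.append(limits[-1] * 10 ** (0.3 * octaves_per_channel))

ratios = [limits[i + 1] / limits[i] for i in range(CHANNEL_LENGTH)]
for i in range(CHANNEL_LENGTH):
    print("channel %d is %8.2f to %8.2f Hz" % (i, limits[i], limits[i + 1]))
```

Note that because `10 ** 0.3` is only approximately 2, the last limit lands slightly below `max_frequency` rather than exactly on it.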
(also please have a look at #4 if not already done, as it may be useful too)
Thank you for the response. Yes, I took a lot of inspiration from the other issue, and I think what I want to do should be possible (since he got 4096 bands and I use only 250). I changed my plot command to use bars and retried with the same file and only 10 bands. This is the result: [plot not captured] So I think this is not the issue. I'm setting up a fresh installation on my Raspberry Pi Zero W to rule out any external issues.
Edit: I set up a fresh installation of Raspbian and still get exactly the same issue. At least the code is consistent...
Hello, I love the idea of being able to program data analysis etc. in Python while using GPU_FFT in C. Great work implementing the Cython bridge! It makes showcases with the Raspberry Pi much easier!
My problem:
I am currently trying to FFT some single-channel .wav data (48000 Hz sample rate, 0.2 s length) with rpi-audio-levels, only analyzing frequencies between 500 Hz and 3 kHz. When I do a spectrum analysis of the file with Audacity, I get the following spectrum:
However, when I try to analyse the same file with audio_levels.compute, 250 bands of 10 Hz each will return:
This for a sample size of 1024:
This for a sample size of 2048:
This for a sample size of 4096:
It is easy to see that these spectra do not resemble one another. Also, there should not be any empty fields. However, the levels vector contains random values, or sometimes the value -inf, across its spectrum. Am I misunderstanding something?
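On the -inf values: in the numpy path of the code posted above, each level is `log10(sum(power[left:right]))`. If a band's index range is empty or its power sums to zero, `log10(0.0)` yields -inf (numpy emits a warning rather than raising). A minimal illustration of that mechanism:

```python
import numpy as np

# Power spectrum of an all-zero signal: every bin is 0.0
power = np.abs(np.fft.rfft(np.zeros(8))) ** 2

# Summing a zero-power band and taking log10 gives -inf
with np.errstate(divide='ignore'):
    level = np.log10(np.sum(power[0:4]))
print(level)  # -inf

# An empty index range sums to 0.0 as well, so it also gives -inf
with np.errstate(divide='ignore'):
    empty_level = np.log10(np.sum(power[3:3]))
print(empty_level)  # -inf
```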
The code snippets:
Main function:
sf.calculate_bands:
sf.get_c0_data:
Further information:
So I varied every parameter that audio_levels.compute allows for: different frequencies, different audio files, and I always get levels and means arrays with rubbish as content. Has anyone got an idea what may cause this? I have now spent over 15 hours iterating and retrying with no luck at all. Did I critically misunderstand what kind of data has to be fed into the FFT?
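One plausible cause, judging from the frequency-to-bin mapping used in the code posted earlier in this thread (`int(CHUNK_SIZE * freq / sample_rate)`): with a 1024-sample chunk at 48000 Hz, FFT bins are 48000 / 1024 ≈ 46.9 Hz wide, so most 10 Hz bands collapse to an empty index range, sum zero power, and come out as -inf after the log. This sketch (the band layout is my assumption from the description above, not your actual code) checks that arithmetic:

```python
CHUNK_SIZE = 1024
SAMPLE_RATE = 48000

def piff(freq):
    # Same frequency -> FFT bin index mapping as in the earlier snippet
    return int(CHUNK_SIZE * freq / SAMPLE_RATE)

# Hypothetical band layout: 250 bands of 10 Hz between 500 Hz and 3 kHz
bands = [(500 + 10 * i, 500 + 10 * (i + 1)) for i in range(250)]
indexes = [(piff(lo), piff(hi)) for lo, hi in bands]

# Bands whose left and right limits map to the same bin select no data at all
empty = sum(1 for lo, hi in indexes if hi <= lo)
print(empty, "of", len(bands), "bands map to an empty bin range")
```

If that is the issue, using a band width of at least one bin (or a larger FFT size, so the bins become narrower than the bands) should make the -inf values disappear.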