
Nominal vs. effective sampling rate #80

Open

cbrnr opened this issue Nov 15, 2021 · 25 comments
@cbrnr (Contributor) commented Nov 15, 2021

I recently worked with an XDF file containing two streams, an EEG stream and a marker stream. I noticed that when using the nominal sampling rate (e.g. 1000 Hz), the two streams drift apart over time when the effective sampling rate differs (even slightly). In my data, the effective sampling rate was 1000.01218... Hz. At the end of the recording, this difference adds up to several milliseconds.

This difference can be problematic if I want to match marker stream events to events (e.g. spikes) in a channel of the EEG stream, because their offset will increase with time. Therefore, I was wondering if I should just use the effective sampling rate for the EEG stream. Is this a good idea? Which is more precise: the nominal sampling rate claimed by the amp, or LSL time stamps (which in most cases come from standard computer clocks)?

@mgrivich

I've never seen a situation where the nominal sampling rate is precise enough. Generally, if you have a drift problem with the effective sampling rate, there is some problem with your underlying data. One simple check is to measure the delta between nominal (not effective) timestamps coming from your EEG amplifier to check for dropped samples. If that is not it, I'd be suspicious of the quality of your spike event times. I discuss much more along these lines here: https://sccn.ucsd.edu/~mgrivich/Synchronization.html
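The dropped-sample check described above can be sketched as follows (a minimal illustration; `find_dropped_samples` is a made-up helper, not a pyxdf function, and the tolerance is an arbitrary choice):

```python
import numpy as np

def find_dropped_samples(time_stamps, nominal_srate, tol=0.5):
    """Flag inter-sample gaps larger than (1 + tol) nominal intervals.

    time_stamps : 1D array of timestamps of a regularly sampled stream
    nominal_srate : nominal sampling rate in Hz
    tol : fractional tolerance (0.5 -> gaps > 1.5 sample intervals)
    """
    expected = 1.0 / nominal_srate
    deltas = np.diff(time_stamps)
    gaps = np.where(deltas > (1 + tol) * expected)[0]
    # Estimate how many samples were lost at each gap
    missing = np.round(deltas[gaps] / expected).astype(int) - 1
    return gaps, missing

# Example: 1000 Hz timestamps with 3 samples dropped after index 4
ts = np.arange(10) * 0.001
ts[5:] += 3 * 0.001
gaps, missing = find_dropped_samples(ts, 1000.0)
```

Applied to `streams[0]["time_stamps"]` from load_xdf, this would distinguish gaps from dropped samples from a merely inaccurate nominal rate.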

@mgrivich

I reread what you wrote. Yes, you should use the effective sampling rate. Only if you have a problem with the effective sampling rate should you pay attention to my previous comment.

@cboulay (Contributor) commented Nov 15, 2021

If you mean

> Which is more accurate?

then I don't think there's an answer to that question. The best you can do is ask "which clock do you trust more?".

By default, EEG timestamps are dejittered, which basically fits the recorded timestamps to a line, assuming the inter-sample interval is identical throughout the continuous segment; any jitter in the timestamps (assumed to be caused by software clocks) is removed. However, this does shift timestamps around, especially if one part of the recording had a faster sampling rate than another part, which could happen with e.g. temperature changes in the hardware. You might want to plot the delta between the original timestamps and the dejittered timestamps to see if it is non-uniform.
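The suggested residual plot can be sketched like this (a hypothetical helper mirroring the linear dejittering described above, not pyxdf's actual implementation):

```python
import numpy as np

def dejitter_residuals(time_stamps):
    """Fit a line (constant inter-sample interval) to the timestamps and
    return the residuals: original minus dejittered timestamps."""
    idx = np.arange(len(time_stamps), dtype=float)
    X = np.column_stack((np.ones_like(idx), idx))
    coef, *_ = np.linalg.lstsq(X, time_stamps, rcond=None)
    return time_stamps - X @ coef

# Perfectly regular 1000 Hz timestamps -> residuals ~0; a slow bend in
# the residuals would indicate a sampling rate change mid-recording.
ts = np.cumsum(np.full(1000, 0.001))
res = dejitter_residuals(ts)
```

Plotting `res` (e.g. with `plt.plot(res)`) shows whether the timestamp jitter is uniform or drifts within the segment.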

If you are experiencing clock drift, you can choose not to dejitter your timestamps; then I expect the delay between the event timestamp and the evoked event in the EEG will be consistent.

BTW, it's not impossible that there's an off-by-one error in this block of code, which would give slightly off effective rates:

pyxdf/pyxdf/pyxdf.py, lines 655 to 673 at commit 5384490:

```python
# Process each segment separately
for start_ix, stop_ix in zip(seg_starts, seg_stops):
    # Calculate time stamps assuming constant intervals within each
    # segment (stop_ix + 1 because we want inclusive closing range)
    idx = np.arange(start_ix, stop_ix + 1, 1)[:, None]
    X = np.concatenate((np.ones_like(idx), idx), axis=1)
    y = stream.time_stamps[idx]
    mapping = np.linalg.lstsq(X, y, rcond=-1)[0]
    stream.time_stamps[idx] = mapping[0] + mapping[1] * idx
# Recalculate effective_srate if possible
counts = (seg_stops + 1) - seg_starts
if np.any(counts):
    # Calculate range segment duration (assuming last sample
    # duration was exactly 1 * stream.tdiff)
    durations = (
        stream.time_stamps[seg_stops] + stream.tdiff
    ) - stream.time_stamps[seg_starts]
    stream.effective_srate = np.sum(counts) / np.sum(durations)
```

It could use a double or triple check.
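One way to start that double check is to run the same arithmetic on synthetic timestamps with a known rate (a minimal sketch for a single segment, mirroring the variable names in the block above):

```python
import numpy as np

srate = 1000.0
n = 5000
time_stamps = np.arange(n) / srate  # perfectly regular, known rate
tdiff = 1.0 / srate

seg_starts = np.array([0])
seg_stops = np.array([n - 1])

# Same arithmetic as in the quoted effective_srate calculation
counts = (seg_stops + 1) - seg_starts
durations = (time_stamps[seg_stops] + tdiff) - time_stamps[seg_starts]
effective_srate = np.sum(counts) / np.sum(durations)
# If counts and durations are consistent (no off-by-one), this
# recovers the true rate of 1000 Hz.
```

For this construction the +1 in `counts` and the extra `tdiff` in `durations` cancel exactly, which at least rules out the simplest off-by-one scenario.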

@agricolab (Member)

It seems this boils down to whether, in general, the effective sampling rate is more accurate than the nominal sampling rate?

Some EEG devices are medical grade. I find it hard to believe that a medical-grade EEG amp drops samples or has a higher drift than an off-the-shelf (OTS) computer clock.

@mgrivich

> However, this does shift timestamps around, especially if one part of the recording had a faster sampling rate than another part, which could happen with e.g. temperature changes in the hardware.

Generally this effect is small enough to be neglected for EEG-style experiments. However, if you would like to eliminate it, the most effective solution I've found is a rolling linear fit to linearize the timestamps: each timestamp is corrected using only the most recent few minutes of data. This is the technique I used in the LabStreamer (https://www.neurobs.com/menu_presentation/menu_hardware/labstreamer) to get precision down to 0.1 ms. Unfortunately, the publicly available XDF readers have not implemented this (as far as I know).
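A rolling linear fit of the kind described could look roughly like this (a naive O(n·window) sketch with made-up names; not the LabStreamer implementation):

```python
import numpy as np

def rolling_linearize(time_stamps, window=120_000):
    """Replace each timestamp by the prediction of a linear fit over
    the `window` most recent samples (e.g. ~2 minutes at 1000 Hz)."""
    out = np.empty(len(time_stamps))
    for i in range(len(time_stamps)):
        lo = max(0, i - window + 1)
        idx = np.arange(lo, i + 1, dtype=float)
        X = np.column_stack((np.ones_like(idx), idx))
        coef, *_ = np.linalg.lstsq(X, time_stamps[lo:i + 1], rcond=None)
        out[i] = coef[0] + coef[1] * i
    return out
```

In practice one would update the fit incrementally (or refit only every few seconds) instead of solving a least-squares problem at every sample.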

> Some EEG devices are medical grade. I find it hard to believe that a medical-grade EEG amp drops samples or has a higher drift than an OTS computer clock.

When doing science, it is best not to "find it hard to believe" but to verify with data and analysis.

@cbrnr (Contributor, Author) commented Nov 15, 2021

Thanks for your responses. Let me try to clarify with an example. My goal is to measure the delay from showing a stimulus (a rectangle changing from white to black every 0.5 s) in PsychoPy to the actual stimulus appearing on the screen. I'm recording the following two streams:

  1. EEG stream (stream ID 1) consisting of three channels (only two of which are relevant in this context):
    • Channel AUX_1 contains the photo sensor signal
    • Channel STETriggerIn contains an analog signal sent out over the serial port on stimulus onset.
  2. Marker stream (stream ID 2) consisting of LSL markers for stimulus onset (0) and offset (1).

Apparently, LSL markers are always first, followed by the signal coming in over the serial port (which includes the amplifier delay), and finally the photo sensor signal change on the screen (which includes the screen delay).

Among other things, I'm interested in the difference between the serial port triggers and LSL markers over time. For the analog serial port signal, I can either use the locations of the peaks assuming the nominal sampling rate (1000 Hz) or the effective sampling rate (1000.0120858111602 Hz):

[Figure_1: difference between triggers and markers over time, using the nominal (left) vs. effective (right) sampling rate]

Notice the drift in the left panel, which occurs because the clock of the LSL marker stream and the nominal sampling rate of the EEG amp differ slightly. This is just a 5-minute recording, resulting in a 3 ms difference at the end, but in longer recordings this difference becomes significant.

Here's the script that produces these figures:

```python
import matplotlib.pyplot as plt
import numpy as np
from pyxdf import load_xdf


streams, header = load_xdf("BCI_Event2021_TimingTest_LA.xdf")

fs = float(streams[0]["info"]["nominal_srate"][0])
first_timestamp = streams[0]["time_stamps"][0]

markers = np.round((streams[1]["time_stamps"] - first_timestamp) * fs)[::2]

data = streams[0]["time_series"][:, 2]
triggers = np.where(data > 0)[0]
triggers2 = np.round((streams[0]["time_stamps"][data > 0] - first_timestamp) * fs)

fig, ax = plt.subplots(1, 2)
ax[0].plot((triggers - markers) / fs * 1000)
ax[0].set_ylabel("Difference (ms)")
ax[0].set_title("Nominal sfreq")

ax[1].plot((triggers2 - markers) / fs * 1000)
ax[1].set_ylabel("Difference (ms)")
ax[1].set_title("Effective sfreq")
```
I can also share the data file, but I don't know the best way to do that.

Also, here's a plot of the first two stimulus repetitions. The orange line corresponds to an LSL marker, the black peak in STETriggerIn is the serial port signal, and AUX_1 the photo sensor signal.

[Screenshot 2021-11-15: the first two stimulus repetitions]

@cbrnr (Contributor, Author) commented Nov 15, 2021

@cboulay dejittering or not doesn't make a difference here. I agree it would be a good idea to double-check that code segment, but I'm pretty sure it's not relevant for this issue.

@agricolab (Member) commented Nov 15, 2021

If I read this correctly, the marker stream has no effective sampling rate, right? That is, you only send markers when an event occurs, not at fixed intervals?

And what EEG system did you use?

@agricolab (Member)

> When doing science, it is best not to "find it hard to believe" but to verify with data and analysis.

I fully agree. My argument is based on the fact that medical devices are required to validate and verify their behavior, while an OTS PC clock doesn't follow such strict regulations. That makes it more likely that an OTS PC clock drifts than an amp clock, but certainly, nothing beats measurements.

@cbrnr (Contributor, Author) commented Nov 15, 2021

> If I read this correctly, the marker stream has no effective sampling rate, right? That is, you only send markers when an event occurs, not at fixed intervals?

Yes.

@cboulay (Contributor) commented Nov 15, 2021

@cbrnr
I don't know if it's directly relevant to your issue, but you might enjoy https://link.springer.com/article/10.3758/s13428-021-01571-z

@cbrnr (Contributor, Author) commented Nov 15, 2021

Thanks for the link, that's a really good paper (I've seen it before). We actually still use a parallel port in our own recordings, and this example data set used a serial port, so no USB was involved. @sappelhoff maybe you can chime in?

@agricolab (Member)

I remember a similar experience with clock drift between a USB serial trigger and an event-based LSL marker stream, so at least you are not alone. But there, the marker time stamps had a lot of jitter in addition to clock drift... You might want to try a continuous marker stream at a fixed sampling rate? That allows you to estimate the effective srate of the marker stream post hoc, and could be a way to measure the reliability of the clocks.

@mgrivich

Your effective sfreq plot shows what things look like when they are working. Use that.

@mgrivich

The whole purpose of the effective sampling rate is to get rid of the error seen in the nominal sampling rate plot, as the nominal rate does not accurately represent what the devices do.

@cbrnr (Contributor, Author) commented Nov 16, 2021

It is still not clear to me why the effective sampling rate is more accurate than the nominal one. This implies that the computer clock used to timestamp markers is more accurate than the oscillator in the EEG amp. However, I could also correct the drift the other way round by adapting the markers to the nominal sampling rate (which I did in my code example).

These two options get rid of the drift, but they are still different because they affect the duration of the signals. What is the correct approach?

@mgrivich

You have two clocks. One runs 10% faster than the other. You want them to be synchronized, but don't care about true seconds. Do you make the fast one slower, or the slow one faster? It doesn't matter. This is what LSL and load_xdf do by default.

Now, let's say that you really do need to know true seconds (you are measuring the speed of light, or are building a GPS network). First you need a clock you absolutely trust, and then you can synchronize everything to that. You'd need a custom version of load_xdf to handle this, with the "most trusted clock" identified. I don't really buy that EEG amplifier clocks are necessarily more trustworthy than computer clocks. Measurements would have to be done, using a trusted clock.

Going back to biology though: the clocks disagree by 1 part in 100,000. When you are measuring latencies in milliseconds, this is not something that is going to affect your publishable results.
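The magnitude is easy to check with back-of-envelope arithmetic (using the effective rate reported earlier in this thread):

```python
# Clock disagreement between the nominal (1000 Hz) and effective
# (~1000.012 Hz) sampling rate: about 1.2 parts in 100,000.
ratio = 1000.012 / 1000.0

# Accumulated drift in milliseconds over 5 minutes and over 1 hour:
drift_5min = (ratio - 1) * 5 * 60 * 1000  # ~3.6 ms
drift_1h = (ratio - 1) * 60 * 60 * 1000   # ~43 ms
```

This matches the roughly 3 ms drift observed at the end of the 5-minute recording above.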

@cbrnr (Contributor, Author) commented Nov 16, 2021

The difference is negligible for most neuroscientific studies, I agree. Maybe it is just wrong to use the time series indices (like I did in `triggers`), and the correct approach is to use time stamps (like I did in `triggers2`). Then I can continue to use the nominal sampling rate (not knowing if that is the "correct" clock, but at least it will not surprise people if they see 1000 Hz as compared to 1000.01232 Hz).

@cbrnr (Contributor, Author) commented Nov 17, 2021

OK, forget my last comment, I'm confused. There are two options to correct the drift between marker time stamps and regularly sampled streams (that claim to be sampled at their nominal sampling rate):

  1. Assume the nominal sampling rate is inaccurate and use the effective sampling rate, which is based on timestamps. Marker streams already use time stamps with the same clock, so this gets rid of any drift.
  2. Assume the nominal sampling rate is accurate and the LSL clock is inaccurate, use the nominal sampling rate and correct time stamps in all marker streams accordingly.

Is this an accurate summary of what's going on?
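The two options can be sketched side by side (a minimal illustration with synthetic numbers; all names are made up):

```python
import numpy as np

nominal = 1000.0       # rate claimed by the amp
effective = 1000.012   # rate estimated from LSL timestamps

t0 = 0.0  # shared start of the recording
marker_times = np.array([10.0, 20.0, 30.0])  # LSL marker timestamps (s)

# Option 1: trust the LSL clock and use the effective sampling rate to
# convert marker timestamps to sample indices.
idx_option1 = np.round((marker_times - t0) * effective).astype(int)

# Option 2: trust the amp clock, keep the nominal rate, and rescale the
# marker timestamps into the amp's time base first.
rescaled = t0 + (marker_times - t0) * effective / nominal
idx_option2 = np.round((rescaled - t0) * nominal).astype(int)
```

Both options map each marker to the same sample; they differ only in which clock defines the length of a second, i.e. in the total duration assigned to the recording.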

@sappelhoff (Contributor)

> My goal is to measure the delay from showing a stimulus (a rectangle changing from white to black every 0.5 s) in PsychoPy to the actual stimulus appearing on the screen.

I would measure this via the serial port trigger and the photodiode, both signals recorded through the EEG system, which synchronizes streams. That also includes trusting the EEG device clock/hardware/software, because you gotta trust something. Why do you want to measure this using LSL markers in addition?

And is this question of yours: "Among other things, I'm interested in the difference between the serial port triggers and LSL markers over time. " just a result of noticing an (unexpected?) difference/drift? Or was there some consideration you had before measuring?

> Generally this effect is small enough to be neglected for EEG-style experiments.

agreed

@cbrnr (Contributor, Author) commented Nov 17, 2021

@sappelhoff we did include signals from the photo sensor and serial port trigger, but we were also interested in how LSL markers behave. It is much simpler not to have to record the serial port signal; specifically with mobile amps you don't want to connect a whole bunch of cables, and relying on LSL markers works over the network. The drift we noticed in the results led to these questions, and I think it's interesting to be aware of these properties. So the bottom line for me is to always use the effective sampling rate when working with LSL markers and some regularly sampled stream(s).

@behinger

We also stumbled upon this issue. One thing not really discussed here is that the difference between clocks varies between recording sessions (in our setup), which leads to EEG data at e.g. 1000.1 Hz in one session and 999.9 Hz in another.

In other words, we would have different sampling rates in our EEG analysis for different subjects. Super annoying!

While neither clock is better, I'd just like to use the EEG one because it (typically) has an even sampling rate. In my Julia setup I fix this, but as far as I can see this is not possible right now in pyxdf, correct?

@cboulay (Contributor) commented Jul 21, 2022

@behinger that is correct, it is not in the code right now.

I can imagine that we could specify one of the streams with a nominal rate as being "ground truth". Off the top of my head, there are a couple of steps to get that to work.

LabRecorder records the clock offsets between the source computers and itself. During import, everything gets transformed into the recording computer's time base. Step 1: Convert all streams into the EEG computer's time base. Step 2: Untransform ALL streams through the previous linear fit (or rolling linear fit) from dejittering the EEG.

This way everything stays synchronized, and your time base would be such that your EEG effective rate will be the same as its nominal rate.
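Step 2 could be sketched roughly like this (a hypothetical helper; assumes all timestamps are already in the EEG computer's time base after Step 1):

```python
import numpy as np

def rebase_to_nominal(eeg_ts, other_ts_list, nominal_srate):
    """Warp all timestamps so that the EEG stream's effective rate
    equals its nominal rate, keeping every stream synchronized.

    eeg_ts : EEG timestamps, already in the EEG computer's time base
    other_ts_list : timestamp arrays of the other streams (same base)
    """
    # Linear fit of EEG timestamps against sample index: t ~= a + b*k
    k = np.arange(len(eeg_ts), dtype=float)
    X = np.column_stack((np.ones_like(k), k))
    (a, b), *_ = np.linalg.lstsq(X, eeg_ts, rcond=None)

    # Rescale time so the fitted slope b becomes exactly 1/nominal
    scale = 1.0 / (nominal_srate * b)

    def warp(t):
        return a + (t - a) * scale

    return warp(eeg_ts), [warp(t) for t in other_ts_list]

# EEG recorded at an effective 1000.012 Hz, nominal 1000 Hz:
ts = np.arange(1000) / 1000.012
new_eeg, (new_markers,) = rebase_to_nominal(ts, [np.array([0.5])], 1000.0)
```

After warping, the EEG timestamps are evenly spaced at exactly 1/nominal seconds, and all other streams remain aligned to them because they pass through the same transform.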

I think this is a pretty nice feature to add to pyxdf if anyone wants to take it on. I wish I had the time.

@cbrnr (Contributor, Author) commented Jul 21, 2022

I already have this implemented in MNELAB. You can choose to resample to an arbitrary sampling rate, including the nominal one. The function can also be used directly in Python without the GUI.

@agricolab (Member)

There is also #1 (sccn/xdf#28), which aims in a similar direction and discusses a few issues with resampling.
