------------------------------------------------------------------------------
2020-01-30 Technical info
DAT096 2020, by Christian Križan
------------------------------------------------------------------------------

This text file comprises annotations to the 2020-01-30 technical info
lecture, mainly meant for those who cannot open and view annotations in .ppts.
Also, I was somewhat unsatisfied with the .pdf container export.

------------------------------------------------------------------------------

SLIDE 1
Opening slide.


SLIDE 2
General welcome slide. This lecture will, along with other content, cover
upcoming specifications for your development. The lecture will continue by
outlining details of the Ethernet communications module, as this is the module
about which I have received by far the most questions since last time.
Finally, as suggested, this lecture will run through the questions that
I've gotten and answer them. Hopefully, these answers will bring forth further
questions - something which was not possible in an all-classroom forum with
the Canvas mass reply.


SLIDE 3
You may expect the following documents. Testing hints were teased last time
and are still being worked on. Given the typical layout of DAT096, it is
expected (and should be IMO) that you will face many new topics of EESD. It
could thus be handy to have a cheat-sheet of some typical topics that you
will have to research. Papers on the more alien modules may also be of use to
you.

Given the size of the project, you might run into unforeseen issues that in
turn prompt prioritisation efforts. There will be a list of what constitutes
the bare minimum system, and in what order you may forfeit implementing modules
when time is short.

Given the questions from last time, I have also considered writing a document
on tips and ideas regarding the handling of dropped packets.


SLIDE 4
Clarification on the DAC: the final output of 4 GSa/s is achieved by spamming
a set of eight registers. These registers in turn are clocked at 500 MHz,
thus you will have to treat eight consecutive samples on every strike of the
main system clock. Note that the main system clock chimes at 100 MHz on the
Nexys 4.
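
As a quick sanity check of the numbers above, here is a minimal Python sketch
of the arithmetic. The function names are made up for illustration. The
100 MHz case is only the arithmetic consequence of the figures quoted above,
not a statement on how the DAC interface has to be clocked.

```python
def output_rate(registers: int, register_clock_hz: int) -> int:
    """Aggregate sample rate when every register emits one sample per tick."""
    return registers * register_clock_hz


def samples_per_tick(sample_rate_hz: int, clock_hz: int) -> int:
    """Samples that must be ready on each tick of a clock to sustain the rate."""
    if sample_rate_hz % clock_hz != 0:
        raise ValueError("sample rate is not an integer multiple of the clock")
    return sample_rate_hz // clock_hz


rate = output_rate(8, 500_000_000)
print(rate)                                 # 4000000000, ie. 4 GSa/s
print(samples_per_tick(rate, 500_000_000))  # 8
print(samples_per_tick(rate, 100_000_000))  # 40
```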


SLIDE 5
See text in slide.


SLIDE 6
On the topic of a control word host, I have assumed that generating a graphic
of such a host would be of lesser use, although it might clarify that
there will be a module in your system with the sole purpose of listening to
the other_data_tx / other_data_rx buses. Should it spot a control word,
then it will parse said word and for instance set the NCO to 1 GHz or similar.

The ensuing slides will detail the ARP functionality in greater focus.


SLIDE 7
See text in slide.


SLIDE 8
If I myself were given this project, these are likely the first steps I would
take. In turn, this implies that I would begin by getting the Ethernet
communications module up and running. That way, I have some way of
communicating with the board in case I wish to run on-hardware development
and functional verification.

Also, it would allow for not focusing on the AXI bus handling at this early
stage of the project, until at least one module is verified as working.

SLIDE 9
The ensuing slides will answer questions as given to me since last time.


SLIDE 10
This slide contains the questions as answered to the class over Canvas. I have
included the answers here as reference, although, given the response in class,
I got the impression that you read your Canvas messages in a detailed
manner (very good).

> < Could you expand on the stream ID bus between the Ethernet communications and the Central stream controller modules? >
> This field states which FIFO is targeted for the DAC upstream / ADC downstream, aka. it's intended for the central stream controller's management. The inner workings of the controller are in turn fully up to you to decide.
> 
> The stream ID field in the received payload should be kept super simple. Index 16b0 could for instance point to ADC / DAC FIFO 1 (0...). I realise that whatever you decide here will have to be reproduced in the generated test sequences, thus to be frank the question remains to be asked whether this field should be hard-specified. For now, assume that this field contains the address to the specific FIFO counting from 16b0 onwards.
> 
> < Where does the received data become RLE encoded? >
> It will with 95%+ certainty already be RLE'd from the PC. Because, if you are to receive a stream of 100 identical packets, you will typically receive a control word (in the frame packet) stating "play 'this' payload 100 times." I acknowledge that there should probably be a specification on this rather than going with the original intent that the groups decide on how the RLE is done themselves, as it allows me to append control words in some defined format to the test strings that I will generate for your testing. I believe this will be useful for all groups in the long run.
> 
> < What is the packet sequence number? >
> Ah, so packet sequence numbering is related to the handling of dropped frames. It is simply a running number, which provides an implementer with a relatively easy way to detect that an entire packet was lost in transmission. I believe the module description document specified that you should preferably be able to handle dropped frames, in order to reduce corrupted data.
> 
> My follow-up question is:
> What are you really supposed to do if a packet goes missing? Because, the entire Ethernet communications module has been laid out to lessen FPGA->PC communication as much as possible (which is why it is not using TCP/IP etc.). Slowing (halting, even?) the stream in order to ask the PC for re-sending the lost frame is not particularly feasible when the QPU is already spinning its qubits. More and more quantum coherence is lost every picosecond, aka. time is a crucial constraint.
> 
> Perhaps interpolating the last held sample at the targeted FIFO to the next sample received? Or just holding the current sample where it's at? I wildly speculate that there are even papers on how to do this the 'best' way to minimise data loss in the quantum regime.
> 
> If it proves too hard to handle dropped frames, Mats and I have said that we'll consider allowing the assumption that the data transmission from the PC is absolutely perfect.
> 
> < How do we target a DAC channel? >
> The DAC in question, more precisely which of all DAC channels is targeted, is given by the port number in the received Ethernet frame. A data packet received on port 30000 will for instance target the first channel of all DAC channels available.
> 
> If this comes as news to you, I dearly apologise as I may or may not have announced after the first technical lecture that there is a module description document in the Project specifications \ Qubit control.
> 
> Regarding its structure, the packet sequence number will simply be a running 16-bit number starting at 16b0 (0000 0000 0000 0000). I notice now that the documentation on this number is slim; I'll get to updating this asap.
> 
> < Which parts of the Ethernet data structure is under our control? Can we decide on packet structure? >
> Pretty much none to be frank.
> 
> Mats and I originally wanted you to be able to decide pretty much whatever in this regard, but this puts a lot of work on generating test sequences for every single group for the final verification. Thus, the amount of free rein in the host-PC<->FPGA interchange will likely be cut down so that every group conforms to the same host-PC format. Our goal is that as much of the Ethernet documentation as possible should come from standard docs available online.
> 
> Your general approach should be, that if a certain bit sequence is specified to for instance contain the destination MAC, then you may rest assured that the destination MAC will be contained within that bit sequence.
> 
> < Who decides on what samples to be thrown out during windowing? >
> The operator decides, typically you will receive some control word stating to drop the first 4000 samples in some received signal. And, some control word stating to keep 5000 samples in total (aka. samples 4000 to 8999). These two together should be sufficient to format some running pointer in the skip/store block.
> 
> As you see, there will have to be supplementary information on control word syntax, depending on how detailed test sequences the Q-groups want as a whole.
> 
> More information will ensue.
> 
> < Could you describe the sum block in further detail? >
> Let's say that sample A is stored in the upconversion stage FIFO 1 (0), and sample B in FIFO 2 (1). The operator specifies via control word that A and B are to be summed.
> 
> The destination output of this operator will then be stage 1 (0) in the original, meaning that stage 2 (1) will go dormant.
> 
> Note: the number of readout channels M is not necessarily related to the number of DAC control channels N. Let's say that the operator has hooked up the physical QPU readout line on ADC channel 1 (0), ie. this is where you'll receive the calculated result from the QPU after running operations dictated on control line 1 (0) aka. upconversion line 1 (0).
> 
> The specification is 'element-wise summation,' which would mean that if A = [8 5 4] and B = [2 6 2], the received data in the downconversion lane FIFO 1 will be the result of a computation done on string [10 11 6]. I will ask Mats to comment so that this information is correct; I'm almost certain that it is.
> 
> < Can you suggest research papers for the modules? More particularly the NCO? >
> Sure thing, I'll send these to all groups once I dig them up. Off the top of my head, VHDL surf has good starter tutorials on the NCO.
> 
> Both Mats and I however do believe that a simple LUT-readout generator will not be sufficient on its own to handle every scenario. You will absolutely certainly not be able to store every single waveform for every conceivable conversion frequency. If you take a look into the datasheets for Xilinx's IP block NCOs, you'll see that things can get really complicated. I should put a link to this datasheet in Canvas.
> 
> Element 14 has been running a series on CORDICs for a while now, maybe those may be of interest?
> 
> < How much time do we approximately need to invest in the respective modules? >
> Super hard to say because it depends on the group and your previous experience. I can give rather hand-wavy values assuming a system ready for final testing. Aka. you have decided not to cut anything down to prioritise some other module. A doc on what (not) to cut is on its way.
> 
> For the Ethernet communications module, say < 15%. This should be the easiest module to get working right and definitely the easiest one to verify.
> 
> For the Central stream controller, < 40%. This portion will contain close to no DSP work etc. which is expected to be a big time sink in DAT096 unless you have previous experience from a non-stock MPEES-course. The central stream controller module is also rather verifiable in simulation without having to write scripts to export data from Modelsim to Matlab for frequency content analysis etc. Although, Mats strongly believes that this module will be the hardest one to implement. And it is frankly more deadline-constrained than all other blocks.
> 
> For the upconversion, say ~30%, of which the IQ mixing will take up a very large portion. Tricky verification, as in a lot of steps: exporting from Modelsim, Vivado or some tool of preference, importing to Matlab, and performing FFT analysis or whatever is determined necessary to verify that the frequency upconversion was successful. There is a document in the works already giving hints on how to do testing in this project.
> 
> For the downconversion, say < 15%, it's pretty much a copy-paste of the upconversion *but* this carries its own hills and valleys. Do not underestimate its difficulty just because it's upconversion in reverse, I have seen this mistake many times.
> 
> Now, what I don't want you to do now is to sit down and split your time in chunks according to what I just said. The amount of hands flailing in this estimate was beyond reasonable. Your mileage will vary.
> 
> Now, there are paragraphs in this message which may end with "Yeah, but where is the documentation on that?" - The answer is that it will be available sooner rather than later. Your questions help influence what documentation is needed and should be prioritised.
> 
> Cheers
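
Three of the answers in the Canvas message above carry small pieces of logic:
the running 16-bit packet sequence number, the skip/store windowing and the
element-wise summation. Below is a minimal Python sketch of these, meant as a
behavioural reference only and not as the specified implementation. All
function names are made up. It is an assumption on my part that the sequence
counter wraps from 65535 back to 0, and that packets arrive in order without
duplicates.

```python
SEQ_MODULUS = 1 << 16  # 16-bit running sequence number, starting at 0


def missing_packets(previous_seq: int, received_seq: int) -> int:
    """Packets lost between two consecutively received packets.

    Assumes in-order arrival, no duplicates, and wrap-around at 2**16.
    """
    return (received_seq - previous_seq - 1) % SEQ_MODULUS


def window(samples, skip, keep):
    """Drop the first `skip` samples, then keep the next `keep` samples."""
    return samples[skip:skip + keep]


def elementwise_sum(a, b):
    """Element-wise summation of two equally long sample strings."""
    if len(a) != len(b):
        raise ValueError("streams must have the same length")
    return [x + y for x, y in zip(a, b)]


print(missing_packets(41, 42))     # 0, nothing lost
print(missing_packets(41, 44))     # 2, packets 42 and 43 are missing
print(missing_packets(65535, 0))   # 0, the counter wrapped around

kept = window(list(range(10000)), skip=4000, keep=5000)
print(kept[0], kept[-1], len(kept))            # 4000 8999 5000

print(elementwise_sum([8, 5, 4], [2, 6, 2]))   # [10, 11, 6]
```

What to do once missing_packets() returns something other than 0 (hold the
last sample, interpolate, or something else) is left open here, just as in
the message above.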

SLIDE 11
Lena has received a question regarding the minimum viable system which may
execute the final device testing stages.

This slide outlines what would be expected in such a minimal system.

In fact, such a bare-bones system was recently used to verify a set of
3D-cavity qubits at QTL. One key difference is that data was not streamed
real-time from the PC.

Lena also received a question regarding what makes a module usable in the end.
(Forgive me if I mis-interpreted this question)

The short answer is that completing any of the four major modules yields
something which is usable by someone after this project has ended.


SLIDE 12
The upcoming slides will outline how qubits are controlled at QTL. More
specifically, which waveforms are used and how they are generated.

The purpose is to illustrate what waveforms are contained within the data
in your test signals. Also, what waveforms are expected from the QPU.


SLIDE 13
Illustrated: screenshot from the very program which we use to generate the
qubit control and readout signals. This configuration is for two qubits.

The yellow and red cosine/Gaussian-esque pulses are envelopes of control
pulses. These would typically rotate qubit 1 (and 2 respectively) by pi radians
about its X-axis. The envelope part is important: I have chosen not to
add the carrier wave into these envelopes. See slide 16.

The cyan pulse is a so-called DRAG pulse, as is the sinusoidal white pulse
under the red envelope (qubit 2's DRAG pulse).

For reference, I have included a readout pulse. This is the white pulse
which touches the time-axis. For this pulse, I have included the carrier wave.
See slide 16.

The bluish-pulse is the Q-component of the readout pulse, meaning in turn
that the white readout pulse is the so-called I-component. I have included
an artificial readout trigger in the beginning of this pulse, to show
that the operator has a large degree of control over these pulses.


SLIDE 14
Illustrated: focus on the I-component of qubit 1's control pulse, a simple
cosine envelope. Yet again, the carrier wave is missing. The actual signal
as would be sent onto the QPU would contain a sine wave underneath
this envelope. A quick search online would easily illustrate to you
what a carrier vs. pulse envelope looks like. Or, see slide 16.


SLIDE 15
I have also included a slide on the readout pulse.

To initiate QPU readout, ie. read the analogue qubit data and decide whether
the qubit is in state |0> or |1>, the operator sends a so-called readout
pulse on the readout transmission line. Often, this line is a separate
wire in the periphery of the QPU core chip. It is in turn connected to
the qubits one might want to read from using some waveguide resonator,
see literature for further details.


SLIDE 16
The readout pulse typically looks like this: a square pulse with some carrier
wave.
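
The envelope-versus-carrier distinction from slides 13 to 16 can be sketched
in a few lines of Python. This is an illustration only: the raised-cosine
shape, the 250 MHz carrier and the pulse length of 64 samples are assumptions
of mine and not values from the program shown in the slides. The 4 GSa/s
sample rate is the DAC figure from slide 4.

```python
import math


def cosine_envelope(n: int) -> list:
    """Raised-cosine envelope: 0 at both ends, 1 in the middle (assumed shape)."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1))) for i in range(n)]


def square_envelope(n: int) -> list:
    """Square envelope: constant amplitude for the whole pulse."""
    return [1.0] * n


def modulate(envelope, carrier_hz, sample_rate_hz):
    """Multiply an envelope by a sine carrier, sample by sample."""
    return [
        amp * math.sin(2.0 * math.pi * carrier_hz * i / sample_rate_hz)
        for i, amp in enumerate(envelope)
    ]


SAMPLE_RATE_HZ = 4e9   # DAC output rate from slide 4
CARRIER_HZ = 250e6     # hypothetical carrier frequency

control_pulse = modulate(cosine_envelope(64), CARRIER_HZ, SAMPLE_RATE_HZ)
readout_pulse = modulate(square_envelope(64), CARRIER_HZ, SAMPLE_RATE_HZ)
```

Plotting control_pulse next to cosine_envelope(64) shows the carrier
oscillating underneath the envelope, whereas readout_pulse is simply the
carrier switched on for the duration of the pulse.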


SLIDE 17
In this slide, I illustrate the kind of data you receive from the QPU readout.
Do note that in no instance here do we receive a simple 0 or 1. The output
from the QPU is an analogue RF signal.

To the left, top and bottom, we have two VNA sweeps showing that the two qubits
in this particular DUT are alive, as given by the shift in the blue line as
I increase the power of the VNA sweep. This shift is known as a dispersive
chi-shift, and stems from purely quantum-mechanical effects. This test
gives the operator a frequency at which one may listen to the qubit.

The top-centre image is known as a qubit punch-out. The qubit control line
is swept in frequency to determine at what frequency one may control the qubit.
Different qubits will all have different frequencies in this manner.
More particularly, this frequency is the "0-1 transition frequency."
Other spikes will be seen at other transitions between other states, such as
|1> -> |2>.

Top right is a Rabi-oscillation. The qubit is stimulated at the 0-1-transition
frequency with increasing amounts of signal power. The qubit (see bottom-right)
will spin its so-called Bloch-vector |psi> about the X (or Y) axis to different
positions depending on how much energy the operator put on it. This will
yield the energy with which the operator may steer it pi radians, ergo
switching it from |0> to |1>.

Once this is done, we may set it to |1> and see for how long it stays there.
Noise from the environment will decohere the qubit exponentially.
The rate of the energy decay is known as the energy relaxation time,
illustrated in the bottom-centre image. When normalised to 1, the rate of
exponential decay yields a value known as T1. This value is used as a benchmark
between different architectures. There are more values than this,
however we are straying off-topic for this project.

The take-home-message is that the operator will receive a lot of useless signal
content (see top-centre) and always receives an analogue waveform.

Digitisation, determining whether the received wave was a |0> or |1>, is
typically something one would include in a QPU interface unit. Although,
we have not included it in this project.


SLIDE 18
I'm working on getting permission to show you the labs :)
Of course, if you're interested.


SLIDE 19
A contract is being worked on, it will be available to you asap.
The wrap-up meeting between me and Mats which will finalise said contract
takes place Wednesday morning the 5th of Feb.


SLIDE 20
Time for open-class questions, if you have any.

Or send these to me at any time,
krizan@chalmers.se


Question: is it OK to swap places of the windowing- and downconversion blocks?
Ans: Yes, if you think this is a better approach. In the digital domain,
you will not be as limited (as compared to the analogue domain) in what goes
where. This is the whole premise of so-called translation filters.

Question: may we use IP-blocks?
Ans: Mats has said yes. But, these often come with a lot of bloat. You may
find that these do not fit on the Nexys 4 in the end. Also, it is OK to
write VHDL in such a way that you do 'inferred design' -
ie. selectively using on-silicon FIFOs via the way the synthesiser parses
your VHDL. Mats has said that he will have to modify this later if you do
inferred design, "but doing it this way is not wrong."


SLIDE 21
Final slide.