Massive Data Streams Course

Prof. Daniel Keren, dkeren@cs.haifa.ac.il

You can bring ONE DOUBLE-SIDED A4 page to the exam.
Also, the exam will offer a choice of 2 out of 4 questions, not 3 out of 5.

Notes on the exam.

Summary (intro, useful inequalities, reservoir sampling,
sketches -- count-min, AMS)

Brief summary of Bloom filter

Bounding the incremental change in a model, e.g. classifier

Stanford course slides. Sampling: general, random, Reservoir
Sampling (p. 19-21), exponential histograms (p. 22-end).

More Stanford slides: Bloom filter, distinct count (Flajolet-Martin),
AMS sketches (very basic)

A long, excellent summary by Prof. Minos Garofalakis: basics,
tail inequalities, sampling, AMS sketches, count-min sketch, Flajolet-Martin,
Sliding Window, Exponential Histograms, Distributed Data Streaming.

Monitoring distributed streams: Arnon Lazerson's PhD lecture. Especially relevant
for us is the material in slides 25-32 and 36-47. If you're interested, a recently accepted
paper on "Lightweight Monitoring", with many details, can be found here. I expect
that you understand the ideas of convexity and how they can be used for dynamic,
distributed monitoring.

Monitoring distributed graphs: Gal Yehuda's lecture from IPDPS 2017. I expect that you
understand the basic ideas in the presentation (no need to know the proof of the theorem on
p. 14), up to slide 21 (no need to know the theorem on p. 22 or the material on monitoring the
number of triangles).