|
|
Instructor |
Sepehr Assadi |
Credits |
3 units |
Schedule |
Tuesdays 12:00 PM - 3:00 PM in
TIL-116 (Livingston campus) |
Prerequisites |
Undergraduate courses on algorithms, complexity theory, discrete mathematics, and probability; mathematical maturity. |
Syllabus |
The full course syllabus is available here. This webpage contains the highlights of course syllabus that are potentially updated as the semester progresses. |
With the emergence of massive datasets across different application domains, there is a rapidly growing
interest in solving various problems over immense amounts of data. However, even most basic algorithms can become computationally prohibitive when processing massive datasets as the inputs are often too large
to be stored in one place or read even once. As a result, a new set of algorithmic tools and ideas are needed for computing with exteremly constrained resources. This is the focus of
sublinear algorithms, namely, algorithms whose resource requirements (e.g. time or space) are substantially smaller than the size of the input that they operate on.
We will study various advanced algorithmic ideas through the lens of sublinear algorithms in this course. In particular, we consider two most canonical models of sublinear algorithms, namely, sublinear time algorithms and streaming algorithms,
and cover several key algorithmic techniques in these (and related) models, as well as discuss limitations inherent to computing with constrained resources.
Important Update: Following
Rutgers's response to COVID-19 situation, all in-person instructions for this course are suspended
and we will continue our course through online lectures and online meetings for office hours. Please see the
Canvas page for the
course for more information (send me an email if you do not have access to Canvas).
- Instructor: Sepehr Assadi
- Instructor Email: sepehr.assadi@rutgers.edu
- Lecture Schedule: Tuesdays 12:00 PM - 3:00 PM
(
TIL-116 -- online lectures)
- Office hours: Mondays 3:00 PM to 4:00 PM, or by appointment (
CoRE 310 -- online meetings)
This course has no recitation sections.
The following is a tentative list of topics that will be covered in this course.
- Sublinear Time Algorithms: Which problems can be solved in time faster than even reading the entire input once?
We will cover sublinear time algorithms for property testing, distribution testing, and graph problems. We will also examine query complexity as a main tool for proving lower bound on
the performance of sublinear time algorithms.
- Streaming Algorithms: Which problems can be solved in space smaller than what is needed to store the entire input? We will cover streaming algorithms for statistical estimation, numerical linear algebra, and graph problems. We will also examine communication complexity as a main tool for proving lower bound on the performance of streaming algorithms.
Along the way, we will learn about various key ideas such as probabilistic analysis of algorithms, compressed sensing, dimensionality reduction, sparsification, sketching, coresets, etc. that are used extensively in algorithm design as a whole and sublinear algorithms in particular.
The final grade for the course will be based on the following weights:
- 40% Problem sets
- 40% Project
- 20% Scribe notes and participation
Students are expected to follow
Rutgers academic integrity policy for all their work in this course.
See the
course syllabus for more information.
Problem sets: There will be
three two problem sets in the course and a tentative schedule of release and due dates are available on the
course calendar.
Solutions must be typeset in LaTeX and submitted via Canvas by 11:59pm EST on
Tuesday the problem set is due. Problem sets can (and probably should) be done in teams of up to three students. However, (1) the students should write their solutions completely independently (in particular, you should understand and be able to explain everything that is written in your solution); (2) you should include the name of your collaborators in your solutions.
Update: Due to the recent changes in the course to account for the COVID-19 situation, we are going to only have two problem sets. Instead, each lecture will also contain
a single practice problem on the topics of the lecture.
You do not need to turn in a solution for these practice problems.
Project: There is a final project that will consist of exploring a topic of interest related to this course. This particularly involves reading one or two recent research papers in complete details to get a sense of the background on a research problem and then exploring ideas for addressing this problem. More details on the project will be released later in the semester
in the
Project section.
Scribe notes and participation:
For each lecture, there will be one team (of one or two students) in charge of taking detailed notes, typing them in LaTeX, preparing any needed figures, and sending them to the Instructor
by 11:59pm on Friday after the lecture to be posted on the course website. When preparing scribe notes, please use this LaTeX
template -- just edit it to
include your notes. Further instruction on scribing the notes is available in the template.
The
course syllabus has further information about each of these assignments.
The schedule below the
red line is
tentative and subject to change.
# |
Date |
Topics |
References |
Lecture notes and Remarks |
1 |
Tue 01/21 |
Introduction, Course Policy, Probabilistic Analysis |
-- |
Lecture Notes 1 |
2 |
Tue 01/28 |
Sublinear Time Algorithms: Connected Components, Average Degree |
CRT05, F06, GR08, S15 |
Lecture Notes 2 |
3 |
Tue 02/04 |
Query Complexity: OR Function and Connectivity |
BW02 |
Lecture Notes 3 -- Pset 1 release: [PDF] |
4 |
Tue 02/11 |
Property Testing: Testing Sortedness, Uniform Distribution Testing |
EKKRV98, BFRSW00 |
Lecture Notes 4 |
5 |
Tue 02/18 |
Local Computation Algorithms (LCA): Maximal Independent Set |
PR07, RTVX11 |
Lecture Notes 5 |
6 |
Tue 02/25 |
Compressed Sensing and Sparse Recovery |
BHRRS18, RSW18 |
Lecture Notes 6 -- Pset 1 due
|
7 |
Tue 03/03 |
Streaming Algorithms: Frequency Moments Estimation |
AMS96, BJKST02 |
Lecture Notes 7 --
Pset 2 release: [PDF] |
8 |
Tue 03/10 |
Communication Complexity: Equality, Index |
A96, T16 |
Lecture Notes 8
|
|
Tue 03/17 |
No Class: Spring Recess |
|
|
9 |
Tue 03/24 |
Streaming Algorithms: Regression via Dimensionality Reduction |
CW09 |
Lecture Notes 9 --
Pset 2 due
|
10 |
Tue 03/31 |
Streaming Algorithms: Clustering via Coresets |
GMMMO03, G09 |
Lecture Notes 10,
[Practice Problem] --
Pset 2 due |
11 |
Tue 04/07 |
Graph Streaming Algorithms: Connectivity, Shortest Paths, Coloring |
FKMSZ04, ACK19 |
[Practice Problem]
|
12 |
Tue 04/14 |
Detour: The Multiplicative Weight Update Method (MWU) |
AHK12 |
|
13 |
Tue 04/21 |
Multi-Pass Streaming Algorithms for Bipartite Matching via MWU |
AG11 |
|
14 |
Tue 04/28 |
Graph Sketching: AGM Sketch for Connectivity |
AGM12 |
|
The project can take one of the following forms:
- Solve an open theory problem, formulate a new problem, or make some other contribution to the study of sublinear algorithms and/or lower bounds.
- Write a survey on a few related papers. A good approach is to first try to solve an open problem, which generally requires reading several background papers first, then switching to a survey
if the problem evades solution. However, even in this case, recording any partial progress is very important.
A list of project ideas (including open theory problems and some directions to explore) will be posted sometime in March. However, you are strongly encouraged to approach the Instructor with any project idea you have on sublinear algorithms before this date to pick as your own project -- note that your project does not need to be limited to the topics discussed in class as long as it is loosly related to sublinear algorithms.
Project policies:
- The projects are done in teams of three students. Each team needs to submit a single write-up for the project and the presentation time will be split evenly between the team members.
The team members receive the same grade for their project. You are encouraged to discuss your progress on your project with the Instructor and get feedback.
- The write-up for the project is due by the end of the semester
and the presentation is done in the last lecture.
Updated Project Expectations and Requirements
- You should learn a significant amount about your chosen project topic. This will involve closely and carefully reading literature on your specific project topic (likely to be a paper or two).
You will demonstrate this aspect of your project in the "Background" section of your project report, which should be a clear exposition of the topic in your own words. You can format this according to the scribe notes of the class -- basically, for this portion of the project you should turn in a polished and high-quality set of scribe notes, as if you had been the scribe for a lecture on your chosen topic.
- You should gain research experience in this area; i.e. make a serious effort to contribute to the state of knowledge on your project topic by (i) identifying an interesting open question or direction for future research related to your project topic; (ii) coming up with an approach to make progress; and (iii) working to carry out your approach. You will demonstrate this aspect of your project by explaining in detail what you did for (i), (ii) and (iii) in the rest of your project report
and during your project presentation.
- New: Instead of the problem set three and the project presentation, you are now required to provide weekly reports on your project following the
updated timetable below. Exactly one report per group is required and it should be emailed to the Instructor by 11:59PM on the date specified in the timetable.
Updated Timetable for Projects:
- Tuesday, March 17: For those of you who are planning to work on the suggested project ideas, send me an email listing your top three favorite projects.
- Tuesday, March 24: For those of you who have sent their favorite projects from the list of suggested project ideas, I will send out an assignment of the project.
For the remainder of you, turn in a brief (1-2 paragraph) project proposal to me by email, describing your chosen topic, the sources you will use, and the portions of those sources that you will cover.
Update: The following parts are updated to account for the new changes in the course structure.
- Tuesday, April 7: A one to two page document on the following: (1) The formal description of the problem you are going to work on during your project; (2) a quick literature review including the main results known for your problem and the main paper(s) in the literature you are going to read for this project; (3) a summary of concrete goals you would like to pursue during the project.
- Tuesday, April 14: A five to ten page document on the following: A technical summary of the main prior work in the literature, including (1) the formal statement of their result, (2) a high level overview of their proof idea, (3) a lower level description of their proof including the main technical lemmas and claims, and (4) either complete proofs or proof sketches of these technical parts in your own words -- this document should be at a level that one could get the full picture of the result you are describing solely based on your write-up.
- Tuesday, April 21: A two to ten page document on the following: (1) A high level description of a concrete plan for approaching your problem and the type of result you hope to obtain, (2) a lower level set of lemmas and claims that if proven would imply your desired result.
-
Tuesday, April 28: Presentations in the class.
- Tuesday, April 28: A two to ten page document on the following: (1) The proofs of the technnical lemmas and claims from the previous part, or (2) concrete and technical reasons for why your attempts failed to prove the desired technical results and any possible update that you came along the way that may allow you to bypass these barriers. This document should be at a level that one (including yourself in near future!) could pickup the project from this part and continue making progress on the main problem.
- Wednesday, May 6: Email the instructor your final project report by combining the reports above and make any final changes (including any new results since the last report). Your report would be held to the standard of a formal publication.
There is no official textbook for this course and all required materials will be posted on this webpage.
The following is a list of some helpful supplementary materials (this list is by no means comprehensive):
- Background on Randomized Algorithms:
- Randomized Algorithms by Motwani and Raghavan;
- The Probabilistic Method by Alon and Spencer;
- Concentration of Measure for the Analysis of Randomised Algorithms by Dubhashi and Panconesi.
- Useful Books and Surveys:
- Related Courses:
And last but not the least, you should definitely check the
List of Open Problems in Sublinear Algorithms as one of the best places to get recent pointers on sublinear algorithms.
This is a (rather incomprehensive) list of the papers related to the topics discussed in the lectures. The list will be updated after each lecture to add the new relevant papers.
A96 |
Farid M. Ablayev,
Lower Bounds for One-Way Probabilistic Communication Complexity and Their Application to Space Complexity. Theor. Comput. Sci. 1996, ICALP 1993. |
AG11 |
Kook Jin Ahn, Sudipto Guha,
Linear Programming in the Semi-streaming Model with Application to the Maximum Matching Problem. ICALP 2011. |
AGM12 |
Kook Jin Ahn, Sudipto Guha, Andrew McGregor,
Analyzing Graph Structure via Linear Measurements. SODA 2012. |
AMS96 |
Noga Alon, Yossi Matias, and Mario Szegedy,
The space complexity of approximating the frequency moments. STOC 1996. |
AHK12 |
Sanjeev Arora, Elad Hazan, Satyen Kale,
The Multiplicative Weights Update Method: a Meta-Algorithm and Applications. Theory of Computing 2012. |
ACK19 |
Sepehr Assadi, Yu Chen, Sanjeev Khanna,
Sublinear Algorithms for (Δ+1) Vertex Coloring. SODA 2019. |
BFRSW00 |
Tugkan Batu, Lance Fortnow, Ronitt Rubinfeld, Warren D. Smith, Patrick White,
Testing that distributions are close. FOCS 2000. |
BJKST02 |
Ziv Bar-Yossef, T. S. Jayram, Ravi Kumar, D. Sivakumar, Luca Trevisan,
Counting Distinct Elements in a Data Stream.
RANDOM 2002. |
BHRRS18 |
Paul Beame, Sariel Har-Peled, Sivaramakrishnan Natarajan Ramamoorthy, Cyrus Rashtchian, Makrand Sinha,
Edge Estimation with Independent Set Oracles.
ITCS 2018. |
BW02 |
Harry Buhrman, Ronald de Wolf, Complexity Measures and Decision Tree Complexity: A Survey.
Theor. Comput. Sci., 2002. |
CW09 |
Kenneth L. Clarkson, David P. Woodruff,
Numerical Linear Algebra in the Streaming Model. STOC 2009. |
CRT05 |
Bernard Chazelle, Ronitt Rubinfeld, Luca Trevisan,
Approximating the Minimum Spanning Tree Weight in Sublinear Time. SIAM Journal of Computing 2005, ICALP 2001. |
EKKRV98 |
Funda Ergün, Sampath Kannan, Ravi Kumar, Ronitt Rubinfeld, Mahesh Viswanathan,
Spot-Checkers. STOC 1998. |
F06 |
Uriel Feige, On Sums of Independent Random Variables with Unbounded Variance and Estimating the Average Degree in a Graph. SIAM Journal of Computing 2006, STOC 2004. |
FKMSZ04 |
Joan Feigenbaum, Sampath Kannan, Andrew McGregor, Siddharth Suri, Jian Zhang On Graph Problems in a Semi-streaming Model. ICALP 2004. |
G09 |
Sudipto Guha, Tight Results for Clustering and Summarizing Data Streams. ICDT 2009. |
GMMMO03 |
Sudipto Guha, Adam Meyerson, Nina Mishra, Rajeev Motwani, Liadan O'Callaghan, Clustering Data Streams: Theory and Practice. IEEE Trans. Knowl. Data Eng. 2003, FOCS 2000. |
GR08 |
Oded Goldreich, Dana Ron, Approximating average parameters of graphs. Random Structures and Algorithms 2006, APPROX-RANDOM 2006. |
PR07 |
Michal Parnas, Dana Ron, Approximating the Minimum Vertex Cover in Sublinear Time and a Connection
to Distributed Algorithms. Theor. Comput. Sci., 2007. |
RSW18 |
Aviad Rubinstein, Tselil Schramm, S. Matthew Weinberg, Computing Exact Minimum Cuts Without Knowing the Graph. ITCS 2018. |
RTVX11 |
Ronitt Rubinfeld, Gil Tamir, Shai Vardi, Ning Xie, Fast Local Computation Algorithms. I(T)CS 2011. |
S15 |
C. Seshadhri, A simpler sublinear algorithm for approximating the triangle count. available on arXiv. |
T16 |
Tim Roughgarden, Communication Complexity (for Algorithm Designers). Foundations and Trends in Theoretical Computer Science 2016. |
You can download LaTeX for free
here. For the purpose of this course, you do not even need to install LaTeX and
can instead use an online LaTeX editor such as
Overleaf.
Two great introductory resources for LaTeX are
A Short Introduction to LaTeX by
Allin Cottrell
(for general purpose LaTeX)
and
LaTeX for Undergraduates by
Jim Hefferson (for undergraduates mathematics)
accompanied by the following
cheatsheet (note that this document use "\( MATH \)" notation compared to the perhaps more
widely used "$ MATH $" -- both are completely fine in LaTeX).
You can also use this wonderful tool
Detexify by
Daniel Kirsch for finding the
LaTeX commands of a symbol (just draw the symbol!).
If you are interested in learning more about LaTeX (beyond what is needed for this course), check the
Wikibook on LaTeX and
the
Wikibook on LaTeX for Mathematics.