Posts by Collection

publications

Toward Collaboration Sensing: Applying Network Analysis Techniques to Collaborative Eye-tracking Data

Published in International Conference on Learning Analytics and Knowledge, 2013

We can predict from a learner's eye gaze whether they will score above or below the median on an assessment test that follows a diagram-studying session.

Recommended citation: Schneider, B., Abu-El-Haija, S., Reesman, J., Pea, R. (2013). "Toward Collaboration Sensing: Applying Network Analysis Techniques to Collaborative Eye-tracking Data." International Conference on Learning Analytics and Knowledge. 2013. https://dl.acm.org/doi/10.1145/2460296.2460317

YouTube-8M: A Large-Scale Video Classification Benchmark

Published in ArXiv, 2016

Dataset for video classification. Each video is encoded as frame features, extracted using pre-trained image and audio networks.

Recommended citation: Abu-El-Haija, S., Kothari, N., Lee, J., Natsev, P., Toderici, G., Varadarajan, B., Vijayanarasimhan, S. (2016). "YouTube-8M: A Large-Scale Video Classification Benchmark." ArXiv. 2016. https://arxiv.org/abs/1609.08675

Detecting events and key actors in multi-person videos

Published in Computer Vision and Pattern Recognition, 2016

Bi-LSTM with attention to detect major events in videos of basketball games.

Recommended citation: Ramanathan, V., Huang, J., Abu-El-Haija, S., Gorban, A., Murphy, K., Fei-Fei, L. (2016). "Detecting events and key actors in multi-person videos." Computer Vision and Pattern Recognition. 2016. https://arxiv.org/abs/1511.02917

Proportionate gradient updates with PercentDelta

Published in ArXiv, 2017

Slight modification of SGD: divides the loss gradient with respect to a layer's parameters by the norm of the parameter matrix.

Recommended citation: Abu-El-Haija, S. (2017). "Proportionate gradient updates with PercentDelta." ArXiv. 2017. https://arxiv.org/abs/1708.07227
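As a rough illustration of the PercentDelta summary above, the sketch below applies one SGD step in which the gradient is divided by the Frobenius norm of the parameter matrix. It follows only that one-line summary: the function names, the choice of Frobenius norm, and the `eps` term are assumptions made for this example, and the update rule in the paper may differ in its details.

```python
import math


def frobenius_norm(matrix):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


def norm_scaled_sgd_step(params, grad, learning_rate=0.1, eps=1e-12):
    """One SGD step with the gradient divided by the parameter-matrix norm.

    Assumption: illustrative only; names and the eps guard are hypothetical.
    """
    scale = learning_rate / (frobenius_norm(params) + eps)
    return [[p - scale * g for p, g in zip(p_row, g_row)]
            for p_row, g_row in zip(params, grad)]


weights = [[3.0, 0.0], [0.0, 4.0]]    # Frobenius norm is 5
gradient = [[1.0, 0.0], [0.0, -2.0]]
updated = norm_scaled_sgd_step(weights, gradient, learning_rate=0.5)
```

With this scaling, the size of each step is expressed relative to the current magnitude of the layer's parameters rather than in absolute terms.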

Learning Edge Representations via Low-Rank Asymmetric Projections

Published in ACM Conference on Information and Knowledge Management, 2017

Trains DeepWalk-style embeddings jointly with an edge neural network.

Recommended citation: Abu-El-Haija, S., Perozzi, B., Al-Rfou, R. (2017). "Learning Edge Representations via Low-Rank Asymmetric Projections." ACM Conference on Information and Knowledge Management. 2017. https://arxiv.org/abs/1705.05615

Collaborative deep metric learning for video understanding

Published in SIGKDD Knowledge Discovery and Data Mining, 2018

Learns a neural network that can map a video, from its audio-visual content, onto a metric space that is useful for a number of tasks in video understanding, including classification and recommendation.

Recommended citation: Lee, J., Abu-El-Haija, S., Varadarajan, B., Natsev, P. (2018). "Collaborative deep metric learning for video understanding." SIGKDD Knowledge Discovery and Data Mining. 2018. http://www.joonseok.net/papers/cdml.pdf

A Higher-Order Graph Convolutional Layer

Published in NeurIPS 2018 Workshop, 2018

Extends the GCN layer: in addition to utilizing features of immediate neighbors, it also includes information from further neighbors.

Recommended citation: Abu-El-Haija, S., Alipourfard, N., Harutyunyan, H., Kapoor, A., Perozzi, B. (2018). "A Higher-Order Graph Convolutional Layer." NeurIPS 2018 Workshop. 2018. http://sami.haija.org/papers/high-order-gc-layer.pdf

Watch Your Step: Learning Node Embeddings via Graph Attention

Published in Advances in Neural Information Processing Systems, 2018

Combines the two steps of DeepWalk's node embedding process, random walk simulation and then word-embedding learning, into one step. This allows us to push gradients that update the context distribution, which corresponds to the probability mass that each node assigns to its neighbors during walk sampling.

Recommended citation: Abu-El-Haija, S., Perozzi, B., Al-Rfou, R., Alemi, A. A. (2018). "Watch Your Step: Learning Node Embeddings via Graph Attention." Advances in Neural Information Processing Systems. 2018. https://papers.nips.cc/paper/2018/hash/8a94ecfa54dcb88a2fa993bfa6388f9e-Abstract.html

N-GCN: Multi-scale Graph Convolution for Semi-supervised Node Classification

Published in Uncertainty in Artificial Intelligence, 2019

Runs various GNNs in parallel, each on the normalized adjacency raised to a different power. Then, combines the outputs of the GNNs into a final node-classification layer.

Recommended citation: Abu-El-Haija, S., Kapoor, A., Perozzi, B., Lee, J. (2019). "N-GCN: Multi-scale Graph Convolution for Semi-supervised Node Classification." Uncertainty in Artificial Intelligence. 2019. http://auai.org/uai2019/proceedings/papers/310.pdf

MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing

Published in International Conference on Machine Learning, 2019

Extends the GCN layer: in addition to utilizing features of immediate neighbors, it also includes information from further neighbors. Provably learns a class of functions that are not realizable by vanilla GCN.

Recommended citation: Abu-El-Haija, S., Perozzi, B., Kapoor, A., Harutyunyan, H., Alipourfard, N., Lerman, K., Ver Steeg, G., Galstyan, A. (2019). "MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing." International Conference on Machine Learning. 2019. http://proceedings.mlr.press/v97/abu-el-haija19a/abu-el-haija19a.pdf
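The idea in the MixHop summary above, mixing features from immediate and further neighbors, can be sketched as a layer that multiplies the features by several powers of the adjacency matrix and concatenates the results. This is a minimal pure-Python sketch under stated assumptions: the helper names are hypothetical, a ReLU activation is assumed, and the toy adjacency is left unnormalized for readability. It is not the authors' implementation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def mixed_hop_layer(adj, features, weights_per_power):
    """Concatenate relu(adj^j @ features @ W_j) over the powers j provided.

    weights_per_power maps an integer power j to its weight matrix W_j
    (hypothetical layout chosen for this example).
    """
    blocks = []
    propagated = features  # equals adj^0 @ features
    for power in range(max(weights_per_power) + 1):
        if power in weights_per_power:
            hidden = matmul(propagated, weights_per_power[power])
            blocks.append([[max(0.0, v) for v in row] for row in hidden])
        propagated = matmul(adj, propagated)  # advance to the next power
    # Column-wise concatenation of the per-power blocks.
    return [sum((block[i] for block in blocks), [])
            for i in range(len(features))]


# Path graph 0 - 1 - 2, with a single feature that is nonzero only on node 0.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
features = [[1.0], [0.0], [0.0]]
identity = [[1.0]]
output = mixed_hop_layer(adj, features, {0: identity, 1: identity, 2: identity})
```

Each output row holds one column per power, so node 2 only sees node 0's feature in the column for power 2, which is the two-hop information a single vanilla GCN layer would not provide.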

Human Languages in Source Code: Auto-Translation for Localized Instruction

Published in Learning at Scale, 2020

Translates source code from one human language to another.

Recommended citation: Piech, C., Abu-El-Haija, S. (2020). "Human Languages in Source Code: Auto-Translation for Localized Instruction." Learning at Scale. 2020. https://dl.acm.org/doi/10.1145/3386527.3405916

Graph embedding with personalized context distribution

Published in Companion Proceedings of the Web Conference, 2020

Learns a context distribution per node while learning node embeddings.

Recommended citation: Huang, D., He, Z., Huang, Y., Sun, K., Abu-El-Haija, S., Perozzi, B., Lerman, K., Morstatter, F., Galstyan, A. (2020). "Graph embedding with personalized context distribution." Companion Proceedings of the Web Conference. 2020. https://dl.acm.org/doi/fullHtml/10.1145/3366424.3391263

End-to-end learning of compressible features

Published in IEEE International Conference on Image Processing, 2020

Encodes videos in a compact binary representation that preserves discriminative label information.

Recommended citation: Singh, S., Abu-El-Haija, S., Johnston, N., Balle, J., Shrivastava, A., Toderici, G. (2020). "End-to-end learning of compressible features." IEEE International Conference on Image Processing. 2020. https://arxiv.org/abs/2007.11797

Machine Learning on Graphs: A Model and Comprehensive Taxonomy

Published in ArXiv, 2020

Unifies many models for machine learning on graphs under one taxonomy, providing a broad summary of graph embedding methods and a tool to reason about their similarities and differences.

Recommended citation: Chami, I., Abu-El-Haija, S., Perozzi, B., Re, C., Murphy, K. (2020). "Machine Learning on Graphs: A Model and Comprehensive Taxonomy." ArXiv. 2020. https://arxiv.org/abs/2005.03675

Identifying and Analyzing Cryptocurrency Manipulations in Social Media

Published in IEEE Transactions on Computational Social Systems, 2021

Mines and analyzes financial time-series and social network data (Twitter and Telegram) to predict whether a spike in the price of a cryptocurrency is due to a pump-and-dump scheme.

Recommended citation: Mirtaheri, M., Abu-El-Haija, S., Morstatter, F., Ver Steeg, G., Galstyan, A. (2021). "Identifying and Analyzing Cryptocurrency Manipulations in Social Media." IEEE Transactions on Computational Social Systems. 2021. https://arxiv.org/abs/1902.03110

Identifying botnet IP address clusters using natural language processing techniques on honeypot command logs

Published in SIAM Workshop on Data Mining for AI/ML for Cybersecurity 2021, 2021

Clusters honeypot sessions so that SSH sessions with similar logic fall in the same cluster.

Recommended citation: Crespi, V., Hardaker, W., Abu-El-Haija, S., Galstyan, A. (2021). "Identifying botnet IP address clusters using natural language processing techniques on honeypot command logs." SIAM Workshop on Data Mining for AI/ML for Cybersecurity. 2021. https://arxiv.org/abs/2104.10232

Fast Graph Learning with Unique Optimal Solutions

Published in ICLR 2021 Workshop on Geometrical and Topological Representation Learning, 2021

Fast Graph Learning with Unique Optimal Solutions.

Recommended citation: Abu-El-Haija, S., Crespi, V., Ver Steeg, G., Galstyan, A. (2021). "Fast Graph Learning with Unique Optimal Solutions." ICLR 2021 Workshop on Geometrical and Topological Representation Learning. 2021. https://openreview.net/forum?id=YIloSPZFeGe

Zero-shot Synthesis with Group-Supervised Learning

Published in International Conference on Learning Representations, 2021

Can synthesize simple images with novel attribute combinations. E.g. having seen “red trucks” and “blue boats” our method could synthesize “blue truck” (or “red boat”) even if these combinations are not presented during training. The latent space is disentangled among attributes by designing an auto-encoder that “swaps” latent subspaces during training.

Recommended citation: Ge, Y., Abu-El-Haija, S., Xin, G., Itti, L. (2021). "Zero-shot Synthesis with Group-Supervised Learning." International Conference on Learning Representations. 2021. https://openreview.net/forum?id=8wqCDnBmnrT
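The latent "swap" mentioned in the zero-shot synthesis summary above can be pictured as exchanging one attribute's slice between two latent vectors. The sketch below shows only that exchange, on plain lists. The slice layout, attribute names, and function name are hypothetical, and the encoder, decoder, and training losses are omitted.

```python
def swap_attribute(latent_a, latent_b, attribute_slices, attribute):
    """Exchange the sub-vector belonging to one attribute between two latents.

    attribute_slices maps an attribute name to a (start, stop) index pair
    (an assumed layout, used here only for illustration).
    """
    start, stop = attribute_slices[attribute]
    new_a = latent_a[:start] + latent_b[start:stop] + latent_a[stop:]
    new_b = latent_b[:start] + latent_a[start:stop] + latent_b[stop:]
    return new_a, new_b


attribute_slices = {"color": (0, 2), "shape": (2, 4)}
red_truck = [0.9, 0.1, 5.0, 6.0]   # made-up latent: color part, then shape part
blue_boat = [0.1, 0.9, 7.0, 8.0]
blue_truck, red_boat = swap_attribute(red_truck, blue_boat,
                                      attribute_slices, "color")
```

If the subspaces are disentangled, decoding `blue_truck` would be expected to yield the unseen combination, which is the intuition behind the "blue truck" example above.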

Graph Traversal with Tensor Functionals: A Meta-Algorithm for Scalable Learning

Published in International Conference on Learning Representations, 2021

Meta-algorithm that can be used to re-implement a variety of machine learning algorithms on graphs. Once re-implemented in GTTF, algorithms automatically scale to large graphs. The meta-algorithm accepts two functions (BiasFn and AccumulateFn) and repeatedly samples walk forests from the graph, invoking BiasFn and AccumulateFn along the walks. Certain choices of these two functions recover unbiased learning for a variety of machine learning algorithms on graphs, including many message passing (graph convolution) methods as well as node embedding methods.

Recommended citation: Markowitz, E. S., Balasubramanian, K., Mirtaheri, M., Abu-El-Haija, S., Perozzi, B., Ver Steeg, G., Galstyan, A. (2021). "Graph Traversal with Tensor Functionals: A Meta-Algorithm for Scalable Learning." International Conference on Learning Representations. 2021. https://openreview.net/forum?id=6DOZ8XNNfGN
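The traversal described in the GTTF summary above can be sketched as a recursive sampler that calls the two user-supplied functions at every step. This is a simplified illustration, not the published algorithm: the adjacency-list graph, the `fanouts` parameter, and the signatures of `bias_fn` and `accumulate_fn` are assumptions made for this example.

```python
import random


def traverse(graph, roots, fanouts, accumulate_fn, bias_fn=None, rng=None):
    """Sample a walk forest from each root, calling the two functions en route.

    graph: dict mapping a node to a list of neighbor nodes.
    fanouts: number of neighbors to sample at each depth, e.g. [2, 1].
    """
    rng = rng or random.Random(0)
    for root in roots:
        _walk(graph, [root], fanouts, accumulate_fn, bias_fn, rng)


def _walk(graph, trail, fanouts, accumulate_fn, bias_fn, rng):
    if not fanouts:
        return
    neighbors = graph.get(trail[-1], [])
    if not neighbors:
        return
    # bias_fn returns one sampling weight per neighbor; uniform when omitted.
    weights = bias_fn(trail, neighbors) if bias_fn else [1.0] * len(neighbors)
    for _ in range(fanouts[0]):
        nxt = rng.choices(neighbors, weights=weights, k=1)[0]
        accumulate_fn(trail, nxt)  # e.g. record an edge or update a statistic
        _walk(graph, trail + [nxt], fanouts[1:], accumulate_fn, bias_fn, rng)


graph = {0: [1, 2], 1: [0], 2: [0]}
visited_edges = []
traverse(graph, roots=[0], fanouts=[2, 1],
         accumulate_fn=lambda trail, nxt: visited_edges.append((trail[-1], nxt)))
```

Here `accumulate_fn` merely records the sampled edges. Swapping in a function that builds a sampled adjacency, or one that updates co-occurrence statistics, is the kind of specialization the summary refers to.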

Implicit SVD for Graph Representation Learning

Published in Advances in Neural Information Processing Systems, 2021

Finds closed-form solutions to linearized GNN models, then uses these solutions to initialize and fine-tune deeper GNNs.

Recommended citation: Abu-El-Haija, S., Mostafa, H., Nassar, M., Crespi, V., Ver Steeg, G., Galstyan, A. (2021). "Implicit SVD for Graph Representation Learning." Advances in Neural Information Processing Systems. 2021. http://sami.haija.org/papers/isvd.pdf

teaching

Fall 2009: Practical Programming @ UofT

Miscellaneous, University of Toronto, 2009

Designed and taught a supplementary course on practical programming, covering a number of applied skills, such as databases and accessing them through code, web programming, network programming, and the C++ Standard Template Library. Course page

Fall 2013: Intro to Machine Learning @ UMich

Undergraduate, University of Michigan, 2013

Taught the tutorial sessions for an undergraduate course in Machine Learning (main instructor was Honglak Lee). I covered the mathematical foundations required for the course, created the assignments, and co-created the exams and supplementary handouts.

Spring 2019: [CSCI 544] Applied Natural Language Processing @ USC

Master, University of Southern California, 2019

Served as a co-instructor while TA-ing CS544, teaching weekly 2-hour lectures for 5 weeks on Deep Learning (DL) for NLP. Slides are on 1, 2, 3 and 4. Created assignment 1 and assignment 2 with auto-grader code. The ranking of assignment 2 is online.

Fall 2019: [CSCI 699] Representation Learning @ USC

Doctorate, University of Southern California, 2019

Co-designed and co-taught a broad course on Machine Learning. Course webpage