I am interested in applied Machine Learning and Artificial Intelligence. Specifically:
- Improving how data is represented internally by models.
Understanding the data-to-signal transformation as data enters the representation space (what we gain/lose).
Improving the types of operations we can perform on representations.
- Improving inference (across modalities). This spans pretraining and finetuning through to zero/few-shot and prompt-based learning.
How is the inference capacity of a model affected by variations in data, task, tuning, and prompting?
What is the role of context, e.g. retrieval-augmented generation (RAG), in inference?
- Evaluation at large: benchmarks, evaluation metrics and practices.
How do we grapple with data leakage, overfitting, subpar metrics, or ground truth challenges?
How do we reduce the gap between benchmarking distributions and real-life "in the wild" distributions?
How do we reduce bias and incorporate human ethics & morality in evaluation at scale?
- Efforts to better interpret model representations and output (explainability).
How do we generate and measure explanations in reliable ways?
How do we feed explanations back into inference?
How do we communicate explanations to different audiences?
- Responsible AI and ML, in terms of regulation and codes of practice.
How do we reduce misalignment of model inference and output with respect to human instruction?
How do we encode human values in representation and inference?
How do we communicate risks and safeguards to stakeholders and the public at large?
- Sustainable AI and ML, in terms of resource efficiency and alternative forms of energy.
Efficiency of models, data structures and hardware at scale.
Computation on edge devices.
Hybrid forms of energy and compute configurations.
Reduction of architectural redundancy and brute-force GPU scaling.
Measurement of environmental sustainability.
- Practical applications of the above, for instance to Natural Language Processing, Recommendation, Information Retrieval and Neurophysiological Multimodal Learning.
I am also a strong proponent of open source and reproducible research.