Papers

Peer-reviewed and workshop publications. Full list on Google Scholar.

Conference · 2026

Scaling Few-Shot Spoken Word Classification With Generative Meta-Continual Learning

Louise Beyers, Batsirayi Mupamhi Ziki, Ruan van der Merwe
Interspeech 2026

Asks whether a spoken word classifier can sequentially learn to distinguish 1000 classes from only five examples per class. We train a model with the Generative Meta-Continual Learning (GeMCL) algorithm and compare it against strong baselines for this large-scale, few-shot continual setting.

GeMCL delivers exceptionally stable performance and adapts roughly 2000× faster than a frozen HuBERT encoder with repeatedly retrained classifiers, though it does not always surpass fully fine-tuned baselines. The result points toward practical, rapidly adaptable keyword systems that keep accumulating new classes without retraining from scratch.

arXiv

@inproceedings{beyers2026gemcl,
  title     = {Scaling Few-Shot Spoken Word Classification With
               Generative Meta-Continual Learning},
  author    = {Beyers, Louise and Ziki, Batsirayi Mupamhi and
               van der Merwe, Ruan},
  booktitle = {Interspeech},
  year      = {2026},
}

Preprint · 2026

Does Language Matter for Spoken Word Classification? A Multilingual Generative Meta-Learning Approach

Batsirayi Mupamhi Ziki, Louise Beyers, Ruan van der Merwe
arXiv preprint arXiv:2605.13084

Meta-learning outperforms supervised learning for few-shot monolingual spoken word classification, yet remains under-explored in the multilingual setting. We apply the Generative Meta-Continual Learning (GeMCL) algorithm to spoken word classification: its generative nature makes it viable in practice, while the meta-learning component promotes the generalisation that multilingual use demands.

We train monolingual models on English, German, French, and Catalan, a bilingual model on English and German, and a multilingual model on all four languages. Although the multilingual model performs best, the differences between models are unexpectedly small. We also find that the number of hours of unique data seen during training is a stronger indicator of performance than the number of languages in the training data.

arXiv

@article{ziki2026multilingual,
  title         = {Does Language Matter for Spoken Word Classification? A
                   Multilingual Generative Meta-Learning Approach},
  author        = {Ziki, Batsirayi Mupamhi and Beyers, Louise and
                   van der Merwe, Ruan},
  journal       = {arXiv preprint arXiv:2605.13084},
  year          = {2026},
  eprint        = {2605.13084},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
}

Blog Track · 2026

The Effect of Feature Resolution on Embedding Dimension

Louise Beyers, Ruan van der Merwe
ICLR 2026 Blog Track

Investigates how feature resolution — the constraint that each data point exposes only a subset of the available features — affects the embedding dimension needed to identify individual features. Using feature composition and poset theory, we derive upper bounds on the dimensionality required for reliable feature identification.

We show that even a slight uniform dependence between features can be exploited to reduce the required embedding dimension by at least a third while still keeping individual features identifiable. Published as a peer-reviewed post in the ICLR 2026 Blog Track, with an accompanying poster.

Read

Conference · 2023

Mitigating Catastrophic Forgetting for Few-Shot Spoken Word Classification Through Meta-Learning

Ruan van der Merwe, Herman Kamper
Interspeech 2023, pp. 441–445

Proposes MAMLCon, an extension of model-agnostic meta-learning (MAML) for continual few-shot spoken word classification. Each inner learning loop concludes with a consolidation gradient step that uses stored templates (one per class) drawn from all previously seen classes, letting the model rehearse without a large replay buffer.

MAMLCon consistently outperforms OML (Online-aware Meta-Learning) across varying numbers of shots and final class counts on Google Speech Commands and the FACC dataset. It targets the practical setting of user-defined keyword systems where new words are added incrementally, showing that meta-learning can mitigate catastrophic forgetting in genuinely on-device regimes.
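The inner loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the linear softmax model, the learning rate, the step count, and the choice of the first example as each class's template are all assumptions made for the sketch, and the outer meta-learning update that MAML applies across tasks is omitted.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def sgd_step(W, batch, lr):
    """One cross-entropy gradient step for a toy linear softmax classifier W[c][d]."""
    n_cls, dim = len(W), len(W[0])
    grad = [[0.0] * dim for _ in range(n_cls)]
    for x, y in batch:
        p = softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])
        for c in range(n_cls):
            err = p[c] - (1.0 if c == y else 0.0)
            for d in range(dim):
                grad[c][d] += err * x[d] / len(batch)
    return [[W[c][d] - lr * grad[c][d] for d in range(dim)] for c in range(n_cls)]

def inner_loop(W, class_batches, lr=0.5, steps=5):
    """Learn classes one at a time, then consolidate on one stored template per class."""
    templates = []
    for batch in class_batches:        # each batch holds the few shots of one new class
        for _ in range(steps):
            W = sgd_step(W, batch, lr)
        templates.append(batch[0])     # assumption: keep the first shot as the template
    # Final consolidation step over the templates of all classes seen so far.
    return sgd_step(W, templates, lr), templates
```

Because only one template per class is kept, the memory cost of the rehearsal step grows with the number of classes rather than with the number of training examples.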

arXiv

@inproceedings{vandermerwe2023mamlcon,
  title     = {Mitigating Catastrophic Forgetting for Few-Shot Spoken Word
               Classification Through Meta-Learning},
  author    = {van der Merwe, Ruan and Kamper, Herman},
  booktitle = {Interspeech},
  pages     = {441--445},
  year      = {2023},
}

Workshop · 2022

Manifold Characteristics That Predict Downstream Task Performance

Ruan H. van der Merwe, Gregory Newman, Etienne Barnard
First Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward at ICML 2022

Introduces the Representation Manifold Quality Metric (RMQM), a principled geometric measure of learned representation quality that correlates positively with downstream task performance.

We characterise representation manifolds by tracking how data points move under sequentially larger perturbations — white-noise injection and PGD adversarial attacks — revealing that self-supervised methods learn smoother manifolds with larger but more consistent step sizes. RMQM gives a framework for comparing pretraining methods beyond the usual linear-probe accuracy, enabling more detailed structural analysis of embedding spaces.
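As a rough illustration of the perturbation tracking described above, the sketch below measures how far an embedding moves between consecutive noise levels. It does not compute RMQM itself, and the PGD branch is left out. The names `step_sizes` and `toy_encoder`, the fixed noise direction, and the noise scales are hypothetical choices made for the example, not details taken from the paper.

```python
import math
import random

def step_sizes(encode, x, noise_scales, seed=0):
    """Distances between consecutive embeddings as white noise on x grows."""
    rng = random.Random(seed)
    # Assumption: one fixed white-noise draw, scaled up at each step.
    direction = [rng.gauss(0.0, 1.0) for _ in x]
    prev = encode(x)
    sizes = []
    for s in noise_scales:
        cur = encode([xi + s * di for xi, di in zip(x, direction)])
        sizes.append(math.dist(prev, cur))
        prev = cur
    return sizes

def toy_encoder(x):
    # Hypothetical stand-in for a pretrained network.
    return [math.tanh(x[0] + x[1]), math.tanh(x[0] - x[1])]

sizes = step_sizes(toy_encoder, [0.2, -0.1], [0.1, 0.2, 0.3, 0.4])
mean = sum(sizes) / len(sizes)
# A small spread relative to the mean indicates consistent step sizes.
spread = math.sqrt(sum((v - mean) ** 2 for v in sizes) / len(sizes))
```

Comparing the mean and spread of these step sizes across encoders gives a simple view of how evenly each representation responds to growing perturbations.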

arXiv

@inproceedings{vandermerwe2022rmqm,
  title     = {Manifold Characteristics That Predict Downstream Task Performance},
  author    = {van der Merwe, Ruan H. and Newman, Gregory and Barnard, Etienne},
  booktitle = {First Workshop on Pre-training: Perspectives, Pitfalls,
               and Paths Forward at ICML},
  year      = {2022},
}