Recommendations for practical introductions to PEG parsing?

15

Recommendations for practical introductions to PEG parsing? ☶ ask education programming
authored by telemachus 11 months ago (hidden by 1 user) | 23 comments

I’m hoping that people can point me to practical introductions to PEG parsing. Article length (i.e., 25-50 pages) would be ideal, but book suggestions are welcome too. This may be obvious, but by “practical” I mean that I’m interested in material with a lot of examples and usable tips rather than theoretical accounts. A good example of the kind of thing I’m interested in is Roberto Ierusalimschy’s Mastering LPeg (PDF link).

Brief background on why: I’m a self-taught, amateur programmer, and I’m interested in using LPeg (especially within neovim) for text manipulation and scanning. I don’t have the kind of math or CS background to deal with the theory, but I find LPeg appealing as an alternative to vim regexes or Lua patterns.

1. 9
  
  ianthehenry 11 months ago
  
  Hey! I know a little about this. There’s a programming language called Janet that is very heavily inspired by Lua, and it has a PEG library built-in that was very heavily inspired by LPeg. It uses a different notation, but the ideas all translate, and it uses a lot of the same names/operators, just with a prefix flair.
  
  I wrote a book with a chapter on PEGs that covers, well, kinda just the basics, but has an example of writing a bare-bones HTML parser.
  
  I did Advent of Code last year in Janet because almost every problem starts out with a little parsing puzzle, and I thought it would be a good way to practice writing a bunch of diverse PEGs. Recently I’ve been recording myself solving some of them again, mostly to highlight the PEG-parsing parts. Pretty low-effort videos, but it might be interesting to watch the beginnings of them to see a few examples of PEGs. None of them are very complicated – AoC inputs are pretty easy to parse – but there’s some variety.
  
  I’ve contributed a little bit to Janet’s PEG engine, and I think you might be delightfully surprised by how simple the execution model is – you’re basically just moving a cursor through a buffer and keeping track of a couple stacks. It kinda, I dunno, kinda demystified it a lot and made it easier to both write and debug PEGs.
  1. 2
    
    telemachus 11 months ago
    
    Thanks so much. I will take a look at your book—it also gives me a good excuse to learn at least a little about Julia.
2. 5
  
  kevinmehall 11 months ago
  
  Guido van Rossum wrote a series of blog posts about PEG before using it for the CPython parser.
  1. 5
    
    telemachus 11 months ago
    
    Thanks: if parsing Python isn’t practical, then I don’t know what is.
    
    For anyone who wants the posts but doesn’t feel like fighting with Medium:
    
    Overview
    
    Building a PEG Parser
    
    Generating a PEG Parser
    
    Visualizing PEG Parsing
    
    Left-recursive PEG Grammars
    
    Adding Actions to a PEG Grammar
    
    A Meta-Grammar for PEG Parsers
    
    Implementing PEG Features
    
    PEG at the Core Developer Sprint
3. 4
  
  rseymour 11 months ago
  
  One outside the box option is to play live with a parser, the pest parser for rust has a live page that lets you write grammars and see what they parse out. There’s a drop down to the right of the test input box that lets you select which rule is being ‘looked for’. I find reading about parsers is a lot like reading about running or baseball or chess, it can be very helpful, but the real learning comes ‘on the field’
  
  https://pest.rs/?g=N4Ig5gTghgtjURALhFANgBwBZQAQF5dhcByKEgOgpIC8TcAfUgQUuoC16BfAHQDsAJgEswQgC4EipAAxsSATm79%2BQgQFM%2BEwsQAU6bHibDRYgJQBqXFxAAaEEL4YArmOSoARgHYAzCC5A#editor
  1. 3
    
    ngp 11 months ago
    
    I’ll second Pest. It’s really a great library. I’ve used a number of times and had a pleasant experience every time!
  2. 2
    
    schneems edited 11 months ago
    
    Why do you like pest more than nom? (I’ve used nom but not pest).
    1
    
    rseymour edited 11 months ago
    
    To me the biggest issue with nom was the versions jumping. Using it on purpose once left the code un-update-able because one of the used methods disappeared in a later version. I think the jumping in major versions has also made the tertiary docs (ie blog posts, comments, etc) less useful because they may not refer to the latest version. So it’s not that nom is bad, it’s that nom is a bunch of parsers under one name. https://crates.io/crates/nom/versions
    
    I’ve only used pest for a personal project and it was pretty straightforward and fun. The docs make sense, the code just works, and it saved me a lot of pain doing more manual work.
    
    For serious serious stuff (read working at a major security vendor) I used untrusted https://github.com/briansmith/untrusted which is the best for parsing untrusted binary input.
  3. 2
    
    telemachus 11 months ago
    
    I find reading about parsers is a lot like reading about running or baseball or chess, it can be very helpful, but the real learning comes ‘on the field’
    
    Agreed. I use Lua’s REPL with LPeg to experiment as I read, but I will try out the pest web interface as well.
    2
    
    rseymour 11 months ago
    
    Pest has the slight advantage of being less characters to define a grammar, but it’s not exactly the original PEG characters, I think | instead of / which is fine by me. Given that you are targeting neovim + Lua (w/ ~25 years of vi/vim experience I still find neovim to be … a bit of a hassle to get compiled and running) your method of using lpeg in the lua repl is probably ideal. I’d only try the pest fiddle if you wanted to quickly figure out why something wasn’t working.
    
    1
    
    telemachus 11 months ago
    
    Thanks for the follow-up. I’ll keep the fiddle in mind for debugging.
    
    Re building neovim, I’ve had good lucky building nightly for a few years now with this (relatively minimal) script. Every now and again, I need to kickstart ninja by running make distclean in neovim’s source directory, but otherwise this works well for me on both macOS and Debian. (If you try it, remove the nbundleup section. That invokes Paq with a local script to update plugins.) Then again, after 25 years, you may be perfectly happy with just vim. (I switched primarily because I like Lua a lot and Vimscript not at all.)
4. 2
  
  schneems 11 months ago
  
  Not sure if helpful or not but I wrote this article on using a PEG parser library to implement a hash-like (dictionary like) https://www.schneems.com/2018/04/30/how-to-implement-a-ruby-hash-like-grammar-in-parslet/
  1. 1
    
    telemachus 11 months ago
    
    https://www.schneems.com/2018/04/30/how-to-implement-a-ruby-hash-like-grammar-in-parslet/
    
    Thanks: I will take a look.
5. 2
  catwell 11 months ago
  For LPEG specifically, I think Roberto’s new tutorial that you linked is the best, but here are a few older resources:
  
  Dirk Laurie’s Brief Introduction to LPEG
  
  An introduction to Parsing Expression Grammars with LPeg
  
  Daurnimator’s patterns
  1. 3
    
    spc476 11 months ago
    
    I don’t have an introduction, but I do have a repo of many LPEG based parsers that you can look through.
  2. 1
    
    telemachus edited 11 months ago
    
    Agreed about Roberto’s tutorial being very good. Thanks for the other tips. Those names remind me of the Lua mailing list. (Speaking of which: long time, no talk. I hope you are good.)
    1
    
    catwell 11 months ago
    
    Yes! I had not noticed it was you when I replied :)
6. 1
  
  safinaskar 11 months ago
  
  Please, don’t. Use context free grammars (CFG) instead. I wrote a long answer to you. But decided to publish it as a blog post: https://lobste.rs/s/nybhsl/this_is_why_you_should_never_use_parser . (The post also lists situations, when PEG may be a good thing.)
7. 1
  
  manuel 11 months ago
  
  jsparse is a simple implementation that’s useful to learn about PEGs. Two posts about it:
  
  https://bluishcoder.co.nz/2007/10/31/javascript-packrat-parser.html https://bluishcoder.co.nz/2007/10/03/javascript-parser-combinators.html
  1. 1
    
    telemachus 11 months ago
    
    Thanks for the tips. It will be fun to look the same basic concepts in libraries written in different languages.
8. 1
  
  snej 11 months ago
  
  Do you want to learn how to use (write a grammar using) a PEG parser generator, or do you want to learn how PEG parsing works? Those are different things :) I used a PEG tool to create a parser for a SQL dialect, but I only vaguely know the theory behind PEG.
  
  Mostly I learned by reading the tool’s docs, then examining several existing grammar files, then a bunch of trial and error…
  1. 3
    
    fanf 11 months ago
    
    There are actually (at least) two answers to how PEG parsing works!
    
    PEGs were invented by Bryan Ford who also invented packrat parsing, so the first PEG parsers were packrat parsers. But packrat parsers are (as their name suggests) memory-hungry.
    
    LPeg works more like a backtracking regex engine, so it’s more frugal with memory but can’t always parse in linear time. However, the recognizer described by a PEG has less nondeterminism than a regex, so a PEG parser is less likely to get into an exponential backtracking search than a regex engine. Another way of thinking about it is that the kind of recognizer described by a PEG is very close to a recursive descent parser, so as the author of a grammar you have a lot of control over how much backtracking can occur, more than in most other grammar formalisms.
  2. 1
    
    telemachus 11 months ago
    
    Do you want to learn how to use (write a grammar using) a PEG parser generator, or do you want to learn how PEG parsing works?
    
    I want to learn how to write grammars (and scanners) using a PEG generator like LPeg.