

35.6 Parsing Balanced Expressions

Here are several functions for parsing and scanning balanced expressions, also known as sexps. Basically, a sexp is either a balanced parenthetical grouping, or a symbol name (a sequence of characters whose syntax is either word constituent or symbol constituent). However, characters whose syntax is expression prefix are treated as part of the sexp if they appear next to it.

The syntax table controls the interpretation of characters, so these functions can be used for Lisp expressions when in Lisp mode and for C expressions when in C mode. See List Motion, for convenient higher-level functions for moving over balanced expressions.

A syntax table only describes how each character changes the state of the parser, rather than describing the state itself. For example, a string delimiter character toggles the parser state between “in-string” and “in-code,” but the characters inside the string do not have any particular syntax to identify them as such. For instance (note that 15 is the syntax code for generic string delimiters),

     (put-text-property 1 9 'syntax-table '(15 . nil))

does not tell Emacs that the first eight chars of the current buffer are a string, but rather that they are all string delimiters. As a result, Emacs treats them as four consecutive empty string constants.

Every time you use the parser, you specify a starting state as well as a starting position. If you omit the starting state, the default is “top level in parenthesis structure,” as it would be at the beginning of a function definition. (This is the case for forward-sexp, which blindly assumes that the starting point is in such a state.)

— Function: parse-partial-sexp start limit &optional target-depth stop-before state stop-comment

This function parses a sexp in the current buffer starting at start, not scanning past limit. It stops at position limit or when certain criteria described below are met, and sets point to the location where parsing stops. It returns a value describing the status of the parse at the point where it stops.

If state is nil, start is assumed to be at the top level of parenthesis structure, such as the beginning of a function definition. Alternatively, you might wish to resume parsing in the middle of the structure. To do this, you must provide a state argument that describes the initial status of parsing.

If the third argument target-depth is non-nil, parsing stops if the depth in parentheses becomes equal to target-depth. The depth starts at 0, or at whatever is given in state.

If the fourth argument stop-before is non-nil, parsing stops when it comes to any character that starts a sexp. If stop-comment is non-nil, parsing stops when it comes to the start of a comment. If stop-comment is the symbol syntax-table, parsing stops after the start of a comment or a string, or the end of a comment or a string, whichever comes first.

The fifth argument state is a ten-element list of the same form as the value of this function, described below. The return value of one call may be used to initialize the state of the parse on another call to parse-partial-sexp.

The result is a list of ten elements describing the final state of the parse:

  0. The depth in parentheses, counting from 0. Warning: this can be negative if there are more close parens than open parens between the start of the defun and point.
  1. The character position of the start of the innermost parenthetical grouping containing the stopping point; nil if none.
  2. The character position of the start of the last complete subexpression terminated; nil if none.
  3. Non-nil if inside a string. More precisely, this is the character that will terminate the string, or t if a generic string delimiter character should terminate it.
  4. t if inside a comment (of either style), or the comment nesting level if inside a kind of comment that can be nested.
  5. t if point is just after a quote character.
  6. The minimum parenthesis depth encountered during this scan.
  7. What kind of comment is active: nil for a comment of style “a” or when not inside a comment, t for a comment of style “b,” and syntax-table for a comment that should be ended by a generic comment delimiter character.
  8. The string or comment start position. While inside a comment, this is the position where the comment began; while inside a string, this is the position where the string began. When outside of strings and comments, this element is nil.
  9. Internal data for continuing the parsing. The meaning of this data is subject to change; it is used if you pass this list as the state argument to another call.

Element numbers here count from 0, as nth does. Elements 1, 2, and 6 are ignored in the argument state. Element 8 is used only to set the corresponding element of the return value, in certain simple cases. Element 9 is used only to set element 1 of the return value, in trivial cases where parsing starts and stops within the same pair of parentheses.

This function is most often used to compute indentation for languages that have nested parentheses.
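As a minimal sketch of how the return value might be inspected, the following defines two helpers. Both names, my-parse-summary and my-parse-depth-in-two-steps, are hypothetical and not part of Emacs, and the choice of the Emacs Lisp syntax table is an assumption made only so that the example is self-contained. Elements are fetched with nth, which counts from 0.

```elisp
;; Hypothetical helper (not part of Emacs): parse TEXT in a scratch
;; buffer and summarize the parser state reached at its end.
(defun my-parse-summary (text)
  "Return (DEPTH INNERMOST-START STRING-TERMINATOR STRING-START) for TEXT."
  (with-temp-buffer
    ;; Assumption: Emacs Lisp syntax; any mode's syntax table would do.
    (set-syntax-table emacs-lisp-mode-syntax-table)
    (insert text)
    (let ((state (parse-partial-sexp (point-min) (point-max))))
      (list (nth 0 state)      ; depth in parentheses
            (nth 1 state)      ; start of innermost grouping
            (nth 3 state)      ; string terminator, if inside a string
            (nth 8 state)))))  ; where that string (or comment) began

;; Hypothetical helper: parse TEXT in two steps, handing the state
;; returned by the first call to the second call, and return the
;; final parenthesis depth.
(defun my-parse-depth-in-two-steps (text split)
  "Parse TEXT up to position SPLIT, then resume from SPLIT to the end."
  (with-temp-buffer
    (set-syntax-table emacs-lisp-mode-syntax-table)
    (insert text)
    (let* ((first (parse-partial-sexp (point-min) split))
           (final (parse-partial-sexp split (point-max) nil nil first)))
      (nth 0 final))))
```

With these definitions, (my-parse-summary "(foo (bar \"baz") returns (2 6 34 11): the parse ends two levels deep, inside a string that began at position 11 and will be terminated by character 34, the double quote. Resuming works as described above: (my-parse-depth-in-two-steps "(a (b c) \"d" 5) returns 1, the same depth a single pass over the whole text would report.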

— Function: syntax-ppss &optional pos

This function returns the state that the parser would have at position pos, if it were started with a default start state at the beginning of the buffer. Thus, it is equivalent to (parse-partial-sexp (point-min) pos), except that syntax-ppss uses a cache to speed up the computation. Also, element 2 (previous complete subexpression) and element 6 (minimum parenthesis depth) of the returned state, counting from 0, are not meaningful.
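A common use of the returned state is to ask whether a position lies inside a string or a comment. The predicate below is a sketch; its name, my-in-string-or-comment-p, is hypothetical. It wraps the call in save-excursion on the assumption that syntax-ppss may leave point at pos.

```elisp
;; Hypothetical predicate (not part of Emacs).
(defun my-in-string-or-comment-p (&optional pos)
  "Return non-nil if POS (default: point) is inside a string or comment."
  (let ((state (save-excursion (syntax-ppss pos))))
    ;; Element 3 is non-nil inside a string, element 4 inside a comment
    ;; (both counted from 0, as `nth' does).
    (or (nth 3 state) (nth 4 state))))
```

In a buffer that uses Emacs Lisp syntax and contains (foo "bar") ; note, the predicate returns nil at position 3, which is in code, and non-nil at position 8 (inside the string) and at position 16 (inside the comment).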

— Function: syntax-ppss-flush-cache beg

This function flushes the cache used by syntax-ppss, starting at position beg.

When syntax-ppss is called, it automatically hooks itself to before-change-functions to keep its cache consistent. But this can fail if syntax-ppss is called while before-change-functions is temporarily let-bound, or if the buffer is modified without obeying the hook, such as when using inhibit-modification-hooks. For this reason, it is sometimes necessary to flush the cache manually.
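The situation described above can be sketched as follows. The helper name my-insert-quietly is hypothetical; it inserts text with the change hooks suppressed and then flushes the cache by hand from the place where the buffer changed.

```elisp
;; Hypothetical helper (not part of Emacs).
(defun my-insert-quietly (string)
  "Insert STRING at point without running change hooks.
Because the hooks that keep the `syntax-ppss' cache consistent do not
run, flush the cache manually from the insertion position."
  (let ((start (point))
        (inhibit-modification-hooks t))
    (insert string)
    (syntax-ppss-flush-cache start)))
```

After such an insertion, a later call to syntax-ppss recomputes the state for positions at or after the insertion point instead of relying on cached data from before the change.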

— Variable: syntax-begin-function

If this is non-nil, it should be a function that moves to an earlier buffer position where the parser state is equivalent to nil—in other words, a position outside of any comment, string, or parenthesis. syntax-ppss uses it to supplement its cache.

— Function: scan-lists from count depth

This function scans forward count balanced parenthetical groupings from position from. It returns the position where the scan stops. If count is negative, the scan moves backwards.

If depth is nonzero, parenthesis depth counting begins from that value. The only candidates for stopping are places where the depth in parentheses becomes zero; scan-lists counts count such places and then stops. Thus, a positive value for depth means go out depth levels of parenthesis.

Scanning ignores comments if parse-sexp-ignore-comments is non-nil.

If the scan reaches the beginning or end of the buffer (or its accessible portion), and the depth is not zero, an error is signaled. If the depth is zero but the count is not used up, nil is returned.
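The following sketch exercises the count and depth arguments on a small piece of text. The function name my-scan-lists-demo is hypothetical, and the Emacs Lisp syntax table is assumed only to make the example self-contained.

```elisp
;; Hypothetical demonstration function (not part of Emacs).
(defun my-scan-lists-demo ()
  "Return a list of `scan-lists' results for the text \"(a (b c) d) (e)\"."
  (with-temp-buffer
    (set-syntax-table emacs-lisp-mode-syntax-table)
    (insert "(a (b c) d) (e)")
    (list (scan-lists 1 1 0)     ; forward over the first grouping
          (scan-lists 1 2 0)     ; forward over both groupings
          (scan-lists 5 1 1)     ; from inside (b c), out one level
          (scan-lists 12 -1 0)   ; backward over the first grouping
          (scan-lists 1 3 0))))  ; count not used up at end of buffer
```

Calling (my-scan-lists-demo) returns (12 16 9 1 nil). The third result shows a positive depth moving out of the enclosing grouping, and the last shows nil being returned when the buffer ends at depth zero before count is used up.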

— Function: scan-sexps from count

This function scans forward count sexps from position from. It returns the position where the scan stops. If count is negative, the scan moves backwards.

Scanning ignores comments if parse-sexp-ignore-comments is non-nil.

If the scan reaches the beginning or end of (the accessible part of) the buffer while in the middle of a parenthetical grouping, an error is signaled. If it reaches the beginning or end between groupings but before count is used up, nil is returned.
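A short sketch of these cases follows. The function name my-scan-sexps-demo is hypothetical; the error is caught with condition-case on the scan-error condition so that the result can be shown as a value.

```elisp
;; Hypothetical demonstration function (not part of Emacs).
(defun my-scan-sexps-demo ()
  "Return a list of `scan-sexps' results for the text \"foo (bar baz) 'qux\"."
  (with-temp-buffer
    (set-syntax-table emacs-lisp-mode-syntax-table)
    (insert "foo (bar baz) 'qux")
    (list (scan-sexps 1 1)    ; over the symbol foo
          (scan-sexps 1 2)    ; over foo and the grouping
          (scan-sexps 14 1)   ; over 'qux, expression prefix included
          (scan-sexps 1 4)    ; only three sexps exist
          ;; Scanning two sexps from just before baz runs into the
          ;; closing parenthesis, which signals an error.
          (condition-case nil
              (scan-sexps 10 2)
            (scan-error 'unbalanced)))))
```

Calling (my-scan-sexps-demo) returns (4 14 19 nil unbalanced).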

— Variable: multibyte-syntax-as-symbol

If this variable is non-nil, scan-sexps treats all non-ASCII characters as symbol constituents regardless of what the syntax table says about them. (However, text properties can still override the syntax.)

— User Option: parse-sexp-ignore-comments

If the value is non-nil, then comments are treated as whitespace by the functions in this section and by forward-sexp.

The behavior of parse-partial-sexp is also affected by parse-sexp-lookup-properties (see Syntax Properties).
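To illustrate the effect of parse-sexp-ignore-comments, the sketch below scans two sexps over the same text with the option off and then on. The function name my-ignore-comments-demo is hypothetical, and the variable is let-bound here only for demonstration; major modes normally set it buffer-locally.

```elisp
;; Hypothetical demonstration function (not part of Emacs).
(defun my-ignore-comments-demo ()
  "Scan two sexps in \"foo ; zap\\nbar\" with comments seen, then ignored."
  (with-temp-buffer
    (set-syntax-table emacs-lisp-mode-syntax-table)
    (insert "foo ; zap\nbar")
    (list
     ;; Comment text is scanned like code: the second sexp is zap.
     (let ((parse-sexp-ignore-comments nil))
       (scan-sexps 1 2))
     ;; The comment is treated as whitespace: the second sexp is bar.
     (let ((parse-sexp-ignore-comments t))
       (scan-sexps 1 2)))))
```

Calling (my-ignore-comments-demo) returns (10 14): with the option nil the scan stops after zap, inside the comment, and with the option non-nil it stops after bar.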

You can use forward-comment to move forward or backward over one comment or several comments.

— Function: forward-comment count

This function moves point forward across count complete comments (that is, including the starting delimiter and the terminating delimiter if any), plus any whitespace encountered on the way. It moves backward if count is negative. If it encounters anything other than a comment or whitespace, it stops, leaving point at the place where it stopped. This includes (for instance) finding the end of a comment when moving forward and expecting the beginning of one. The function also stops immediately after moving over the specified number of complete comments. If count comments are found as expected, with nothing except whitespace between them, it returns t; otherwise it returns nil.

This function cannot tell whether the “comments” it traverses are embedded within a string. If they look like comments, it treats them as comments.

To move forward over all comments and whitespace following point, use (forward-comment (buffer-size)). (buffer-size) is a good argument to use, because the number of comments in the buffer cannot exceed that many.
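Both ways of calling forward-comment can be sketched with one helper. Its name, my-skip-comments, is hypothetical, and the Emacs Lisp syntax table is an assumption made to keep the example self-contained.

```elisp
;; Hypothetical helper (not part of Emacs).
(defun my-skip-comments (text &optional count)
  "In a scratch buffer holding TEXT, skip COUNT comments from the start.
With COUNT nil, skip as many comments as possible by passing
\(buffer-size) as the count.  Return (RESULT POINT)."
  (with-temp-buffer
    (set-syntax-table emacs-lisp-mode-syntax-table)
    (insert text)
    (goto-char (point-min))
    (list (forward-comment (or count (buffer-size)))
          (point))))
```

For the text "  ; one\n  ; two\n(code)", (my-skip-comments text 2) returns (t 17), since exactly two comments were crossed. Without a count the same call returns (nil 17): point ends up in the same place, just before the open parenthesis, but the value is nil because the requested number of comments was not found.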

