LaTeX’s Unicode Set Minus Woes

ljrk

2022-12-21

You may want to use the unicode-math package from CTAN to get Unicode input for math, so that you can write ∑ instead of \sum. Perhaps more importantly, copying text from a PDF created with unicode-math gives you proper Unicode math symbols, regardless of whether you wrote \sum or ∑ in the input.

However, if you try to use this neat package in a short sample document such as:

\documentclass{article}

\usepackage{unicode-math}

\begin{document}
\[ \mathbb{N}\setminus\{0\} \]
\end{document}

The output will be rendered as:

ℕ {0}

This comes with a silent error in the log, saying that the requested character isn’t found in Latin Modern Math:

Missing character: There is no ⧵ in font [latinmodern-math.otf]/OT:script=math; language=dflt;!
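Since the message only ends up in the log, it is easy to overlook. As a small sketch, raising \tracinglostchars makes the problem harder to miss. This assumes an engine from TeX Live 2021 or newer, where a value of 3 or more turns a missing character into an error:

```latex
\documentclass{article}

\usepackage{unicode-math}

% Assumption: XeLaTeX or LuaLaTeX from TeX Live 2021 or newer.
% A value of 3 or more reports a missing character as an error
% instead of only noting it in the log file.
\tracinglostchars=3

\begin{document}
\[ \mathbb{N}\setminus\{0\} \]
\end{document}
```

With this setting the sample document should stop at the \setminus instead of silently dropping the glyph.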

The missing character is partly down to history and partly down to a perhaps “ideological” choice about how fonts and encodings should behave.

The \setminus vs. \smallsetminus problem.

Historically, the \setminus command produced the backslash and \smallsetminus a differently sloped “smaller” slash [1]:

\setminus is in plain.tex, Knuth’s original, and reuses the backslash (space considerations). When the smaller, differently sloped form was requested, it was included in the amsfonts under the name \smallsetminus in amssymb.

There are two different “ideologies” about the correct approach to fonts and Unicode:

  1. Whether the set minus symbol should be small or large is a font decision. This implies that a font contains only one “variant” of the symbol and that users choose between the variants by choosing their font. This way, only \setminus would exist or make any meaningful sense. The argument is reinforced by the view that, semantically, there can be only one set minus, and that regardless of what the actual glyph looks like, any set minus glyph should be encoded as one and only one symbol in the output. However, a user who wants both symbols has to use two different fonts.
  2. The decision should be made at the user level, which requires fonts to encode both characters and lets the user choose. This way, both commands exist, but switching fonts can no longer be used to switch between the variants.

While the GUST team, which provides Latin Modern Math and other LaTeX fonts, seems to hold opinion (1), the Unicode Consortium standardised (2). They chose the name “REVERSE SOLIDUS OPERATOR” for the “traditional” TeX style, since it uses the reverse solidus, and the name “SET MINUS” for the symbol that LaTeX users were used to calling \smallsetminus. This is perhaps confusing to LaTeX users.

To make matters worse, since the GUST team holds opinion (1), they not only implemented just one “variant” but also mapped it to just one of the two code points, leaving the other one unmapped. As mentioned in (1), this is possibly because they see “SET MINUS” as a semantic description, whereas “REVERSE SOLIDUS OPERATOR” is not. Arguably, independent of the visual result, any glyph representing a subtraction of two sets should be encoded as “SET MINUS” only.

The result is that \setminus with unicode-math leads to a lookup of “REVERSE SOLIDUS OPERATOR”, which doesn’t exist in LM Math (the default font).

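Unicode  Name                      HTML           TeX/amssymb     LM Math  unicode-math
-------  ------------------------  -------------  --------------  -------  --------------
U+2216   SET MINUS                 smallsetminus  \smallsetminus  ∖        \smallsetminus
U+29F5   REVERSE SOLIDUS OPERATOR  setminus       \setminus       N/A      \setminus
-------  ------------------------  -------------  --------------  -------  --------------
U+005C   REVERSE SOLIDUS           bsol           \backslash      \        \backslash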
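To check which of these code points a given font actually provides, a small test document can help. The following is only a sketch: the \lmmath and \checkchar names are made up for illustration, and it assumes XeLaTeX or LuaLaTeX with fontspec and that the font file latinmodern-math.otf (the name shown in the log message) can be found by the engine:

```latex
\documentclass{article}

\usepackage{fontspec}

% Hypothetical helper font command: load LM Math by file name so that
% \iffontchar can query it as the current font.
\newfontface\lmmath{latinmodern-math.otf}

% Hypothetical helper: prints whether the code point given as a
% hexadecimal number exists in the font.
\newcommand{\checkchar}[1]{%
  U+#1: {\lmmath \iffontchar\font"#1\relax present\else missing\fi}\par}

\begin{document}
\checkchar{2216}% SET MINUS
\checkchar{29F5}% REVERSE SOLIDUS OPERATOR
\checkchar{005C}% REVERSE SOLIDUS
\end{document}
```

Swapping in a different font file allows the same check for any other math font under consideration.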

Fix

Since the GUST team does not seem to want to provide both symbols, the easiest approach for them would be to simply map both code points to the same glyph. This would not break any existing code. It would only forestall any future plans to give “REVERSE SOLIDUS OPERATOR” a different glyph. Since the “SET MINUS” glyph provided by GUST does look different from the “REVERSE SOLIDUS”, this may indeed pose an issue. Finally, the “missing” character could be seen as a warning to users that their setup isn’t complete or correct yet and that they need to choose how the set minus is supposed to look and how it is entered in the source code. The more appropriate fix could be to build an actual “REVERSE SOLIDUS OPERATOR” based on the existing “REVERSE SOLIDUS” glyph.

Since this issue isn’t fixed at the time of writing (Dec 2022), there are three solutions to the problem, with slight differences between them.

In any case, we first provide the missing character for when it appears directly in the source. Here we assume that a REVERSE SOLIDUS OPERATOR is indeed requested, and we construct it from the REVERSE SOLIDUS by adding binary math operator spacing:
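A minimal sketch of such a definition follows. It assumes the newunicodechar package (not named in the text above, so treat it as one possible way to do this) and XeLaTeX or LuaLaTeX as the engine:

```latex
% Assumption: newunicodechar is used to give U+29F5 a definition.
\usepackage{newunicodechar}

% Whenever ⧵ (U+29F5) appears in the source, typeset the ordinary
% backslash glyph with binary-operator spacing instead of looking up
% the code point that the font does not provide.
\newunicodechar{⧵}{\mathbin{\backslash}}
```

These lines belong in the preamble, before any of the \AtBeginDocument redefinitions that refer to the character.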

Now we have three choices.

  1. Route \setminus to this newly created glyph:

    \AtBeginDocument{\renewcommand{\setminus}{^^^^29f5}}

    This yields virtually the same visual output that LaTeX users would expect. However, this way, \setminus is semantically typeset using the Unicode code point REVERSE SOLIDUS.

  2. Always use SET MINUS. This forces \setminus to emit SET MINUS and not REVERSE SOLIDUS OPERATOR as before, effectively making it behave exactly like \smallsetminus.

    \AtBeginDocument{\renewcommand{\setminus}{^^^^2216}}

    This way, both commands yield the same output and there’s no way to enter REVERSE SOLIDUS OPERATOR other than by direct Unicode input.

  3. Always use REVERSE SOLIDUS. This is the same as (1) but additionally redefines \smallsetminus as \setminus, thus making them behave the same, with both emitting REVERSE SOLIDUS.

    \AtBeginDocument{\renewcommand{\setminus}{^^^^29f5}}
    \AtBeginDocument{\renewcommand{\smallsetminus}{\setminus}}

I just wanted to provide this for completeness’ sake, but I have no idea why you would want to do that.
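Putting the pieces together for choice (1), a complete minimal document might look like the following sketch. It assumes the newunicodechar package for providing the missing character (an assumption; any equivalent definition would do) and XeLaTeX or LuaLaTeX as the engine:

```latex
\documentclass{article}

\usepackage{unicode-math}
\usepackage{newunicodechar}% assumption: used to define U+29F5

% Provide the missing character first. This has to come before the
% redefinition below so that ^^^^29f5 is already read as the newly
% defined character.
\newunicodechar{⧵}{\mathbin{\backslash}}

% Choice (1): route \setminus to the constructed character.
% \AtBeginDocument is used as in the snippets above, so that the
% redefinition is applied after unicode-math has set up its symbols.
\AtBeginDocument{\renewcommand{\setminus}{^^^^29f5}}

\begin{document}
% Command, amssymb-style small variant, and direct Unicode input.
\[ A \setminus B \qquad A \smallsetminus B \qquad A ⧵ B \]
\end{document}
```

For choice (2) or (3), only the \AtBeginDocument lines need to be swapped for the ones listed above.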

Other solutions could include simply redefining the \setminus command as \smallsetminus, but this would have the disadvantage of being “wrong” any time the font provides a SET MINUS symbol that is not equivalent to a REVERSE SOLIDUS (as, in fact, LM Math does).

\AtBeginDocument{\renewcommand{\setminus}{\smallsetminus}}

Similarly, one could simply map \setminus directly to the custom-built backslash operator. The only difference from (1) would be that entering the Unicode code point directly would still emit REVERSE SOLIDUS OPERATOR and thus result in a failing lookup.

\AtBeginDocument{\renewcommand{\setminus}{\mathbin{\backslash}}}

Recap

Sometimes, finding the “correct” choice is quite difficult. The GUST team mostly follows the “semantic” reading that, regardless of how the output “looks”, a set minus operator should always be encoded as the set minus code point.

The unicode-math team is simply consistent with the naming that HTML and others chose for the two characters and provides both commands, each accessing a different code point. Whether these code points differ only in looks or also in semantics is the big debate.

Unfortunately, knowing this doesn’t help the troubled user. In my case, at least, it forced me to go down this rabbit hole for far too long, and I did learn something. So, thanks, I guess?


  1. https://tex.stackexchange.com/questions/140279/which-unicode-math-fonts-support-setminus/140343#comment1383759_523798