Mathgen

What?

Mathgen is a program to randomly generate professional-looking mathematics papers, including theorems, proofs, equations, discussion, and references. Try Mathgen for yourself! It’s a fork of SCIgen, a program which generates random papers in computer science.

Why?

Mostly because it’s funny! But there are some other possible uses:

1. Impress your friends, colleagues and/or tenure committee with your prolific research output.
2. There are a lot of shady journals out there. I bet one of them would accept a randomly generated paper. Try it, and let me know what happens!
3. Cheat on your Erdős number.
4. As a way of producing something possibly worthwhile from this project, I am offering randomly generated books for sale via lulu.com, and will donate $5.00 from each copy sold to the American Mathematical Society, in support of (actual, non-random) mathematical research. This would make a great gag gift for a mathematically inclined friend!
5. A great way to come up with thesis topics for your grad students!
6. More seriously, I think this project says something about the very small and stylized subset of English used in mathematical writing. This program only knows a handful of sentence templates, and yet I think its writing style is not far off from many published papers. You could argue this is bad (shows a lack of creativity) or good (makes papers more accessible to those with a limited knowledge of English), but I think we could stand to pay more attention to our writing styles, instead of unthinkingly relying on stock phrases.

How?

Mathgen uses a handwritten context-free grammar, essentially starting from a basic template and filling in blanks with textual elements of various types. Those elements could in turn contain other blanks, so the process continues recursively. The generator itself is written in Perl. The text is then processed by $\LaTeX$ and BibTeX to produce the final output file.
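To give a rough feel for the recursive fill-in-the-blanks idea, here is a minimal sketch in Python. It is not Mathgen's actual code (which is Perl) and does not use its grammar files: the `GRAMMAR` table, the `<BLANK>` notation, and the `expand` function are all made up for illustration.

```python
import random
import re

# Hypothetical toy grammar, invented for this sketch. Each key is a type of
# blank; each value lists templates that may contain further <BLANKS>.
GRAMMAR = {
    "PAPER": ["<THEOREM> <PROOF>"],
    "THEOREM": ["Theorem. Every <ADJ> <NOUN> is <ADJ>."],
    "PROOF": [
        "Proof. This is <CLAIM>.",
        "Proof. See <NAME>'s work on <NOUN> theory.",
    ],
    "CLAIM": ["trivial", "left to the reader", "clear since every <NOUN> is <ADJ>"],
    "ADJ": ["compact", "Noetherian", "pseudo-<ADJ>"],
    "NOUN": ["manifold", "functor", "ring"],
    "NAME": ["Euler", "Noether"],
}

# A blank is an upper-case name in angle brackets, e.g. <NOUN>.
BLANK = re.compile(r"<([A-Z]+)>")


def expand(symbol, rng):
    """Pick a random template for `symbol` and recursively fill its blanks."""
    template = rng.choice(GRAMMAR[symbol])
    # Each blank is expanded independently of every other blank; the grammar
    # keeps no memory of earlier choices.
    return BLANK.sub(lambda match: expand(match.group(1), rng), template)


if __name__ == "__main__":
    # Seed the generator so the same seed always yields the same "paper".
    print(expand("PAPER", random.Random(2012)))
```

Starting from `PAPER`, each call picks one template at random and expands whatever blanks it contains, so a small table of templates can already produce many different sentences. In the real program the expanded text would be LaTeX source rather than plain English.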

The source code is available through GitHub at:

https://github.com/neldredge/mathgen

If you don’t want to mess with Git, you can just get a zip file containing the code.

Mathgen is free software and released under the terms of the GNU General Public License, version 2.0.

Who?

Mathgen was written by Nate Eldredge, incorporating code from SCIgen, by Jeremy Stribling, Max Krohn, and Dan Aguayo, without whom this project would not exist.
Jordan Eldredge wrote most of the web interface (the parts that are slick and work well; the ugly awkward parts are mine).

A list of names of famous mathematicians, used in the program, was extracted from the web site The Greatest Mathematicians of All Time by James Dow Allen, and is used by permission. A list of countries and other place names was taken from Wikipedia.

33 thoughts on “Mathgen”

1. Hello,
I am a first-year PhD student at LSE, and I just found out about this awesome idea. I think there is a very interesting opportunity in Economics to test a couple of hypotheses. For instance, it seems to me that the “network” in Economics is incredibly important (more so than in other sciences, I would say). By this I mean that a paper has a better chance of being accepted at a conference if it has famous co-authors than if it does not, holding the quality of the paper constant (the same random paper-generating process).
What do you think? How difficult would it be to adapt this code for Economics?

2. The output looks impressive. I wonder if it would really pass muster with real mathematicians. Of course, there is a history to these efforts, in particular the Australian story of Mark V. Shaney

http://en.wikipedia.org/wiki/Mark_V_Shaney

which used Markov chains (hence the name)

Cheers
MichaelW

• In 1990 I was looking at using Markov chains to generate post-modern management papers, which I would try submitting. But I was too lazy to collect a large enough corpus. The great “advantage” of Markov chains is that most of your sentences won’t even be grammatical sentences of English, so getting those past reviewers is a stronger demonstration.

Anyway, once Alan Sokal had his hand-crafted paper published in Social Text, reviewers probably got a bit more skeptical.

3. I am no expert mathematician and don’t have experience in Perl; however, I think it might lend realism if the summation (or integration, product, coproduct, union, intersection, n-fold tensor product, …) dummy variables occasionally appeared in the summand (or integrand, etc.). Not all the time, of course; then it would make too much sense. But might it be possible to heighten the probability of them appearing?

• It’s a context-free grammar, so when it goes to select the variables that appear in the summand, it has no memory of what dummy variable went in the subscript.

• When I go beyond three variables, or when the subscript variable swings negative on me

4. I just tested it on Mac OS X 10.8.2 in the terminal app, and it worked fine without modification.

Also, update the list of mathematicians to include more contemporaries (within the last 50 years) so it can include names like Tao, Woodin, Ribet, and Wiles.

Just a suggestion.

6. I thought I might create a context-free grammar to generate website comments to comment more appropriately here, but …
(1) I am too lazy.
(2) I think it’s already been done, and has been deployed on YouTube.

7. The first obvious problem I see is that the variables used in the equations aren’t defined beforehand. Other than that, it looks great!

8. Plz I can haz Biogen?

9. This looks impressive 🙂
Is there a possibility that the PHP files and source for the website could be made open source in the future?

• Maybe. The main reason I haven’t done this is paranoia that I missed some input validation somewhere, and if so I’d rather not make it too easy to exploit. It’s quite simple code, anyway. But I’ll keep this in mind.

10. Amazing and humorous Math generator!

11. I have a lot of fun with it. It might be better if you reduced the alphabet used in each section: sometimes two consecutive lines have no letters in common even though they are presented as equal. On the other hand, matrices sometimes appear together with sets or set symbols in operations.
Thanks for your work (I really like it)

12. If you have access to a large database of real papers it might be possible to lift sentences as referenced quotes from them, thus producing a randomly generated paper with correct references. I suppose it would make it more believable but reduce the randomness of it somewhat. Not sure it would fit in with the current script though.