Wordo Word Generator

A place to generate words to spec.

This requires some programming

If you are programming-impaired, check out awkwords (which is secretly also a mini programming language). That said, Wordo uses a domain-specific language. It lacks the parts needed to be a general-purpose programming language, so it may be easy enough for non-programmers to use, and a typical program is less than 50 lines of code.

What is phonotactics

Typically, a conlang description will start with a phoneme inventory, usually in IPA, possibly a practical alphabet (the non-IPA one), and either sample words or a rule such as "all syllables follow the format (C)CCV(C)(C)", where C means any consonant, V means any vowel, parentheses mean optional, and no parentheses mean obligatory.
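
To make the (C)CCV(C)(C) notation concrete, here is a minimal Python sketch (not part of Wordo) that expands such a template into every syllable shape it allows. The function name expand_shapes is hypothetical, and the sketch assumes each slot is a single letter such as C or V.

```python
import itertools
import re

def expand_shapes(pattern: str) -> list[str]:
    """Expand a template such as "(C)CCV(C)(C)" into every concrete shape."""
    # Each slot is either optional, written "(X)", or obligatory, written "X".
    slots = re.findall(r"\((\w)\)|(\w)", pattern)
    choices = []
    for optional, required in slots:
        if optional:
            choices.append(["", optional])  # slot may be absent or present
        else:
            choices.append([required])      # slot is always present
    # A set removes duplicates such as the two ways of filling one of (C)(C).
    shapes = {"".join(combo) for combo in itertools.product(*choices)}
    return sorted(shapes, key=lambda s: (len(s), s))

print(expand_shapes("(C)CCV(C)(C)"))
# ['CCV', 'CCCV', 'CCVC', 'CCCVC', 'CCVCC', 'CCCVCC']
```

So this one rule allows six shapes, from CCV up to CCCVCC.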

If you don't have a phoneme inventory, you can generate one. You will need to pick from the IPA possibilities, because just "s" or "b" gives little guidance on exactly how to pronounce it. Once you've picked an inventory and created a practical orthography, you need to pick a syllable structure. They come in several varieties (ref. WALS):

  • Simple: C + V + (C)
  • Moderately Complex: C + Glides or Liquids + V + (C)
  • Complex: Consonant Cluster + V + Consonant Cluster
  • Really Complex: (C) + Obstruents + (C)
  • All vowels: V

The first part is the onset, the middle is the nucleus (i.e. the vowel or vowels), and the last part is the coda. The nucleus and the coda together make up the rhyme. If there is a glide or liquid after the initial consonant and before the nucleus, it is sometimes called a medial, especially in the "moderately complex" syllable type. Otherwise it is just another consonant in an initial cluster.
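
The terms above can be illustrated with a small Python sketch (not part of Wordo). It is a simplification under stated assumptions: the function name split_syllable is hypothetical, each letter stands for one sound, the vowel set is supplied by the caller, and everything from the first vowel to the last vowel counts as the nucleus.

```python
def split_syllable(syllable: str, vowels: str = "aei") -> dict[str, str]:
    """Split one syllable into onset, nucleus, coda, and rhyme."""
    positions = [i for i, ch in enumerate(syllable) if ch in vowels]
    if not positions:
        raise ValueError(f"no vowel found in {syllable!r}")
    first, last = positions[0], positions[-1]
    onset = syllable[:first]            # consonants before the vowel(s)
    nucleus = syllable[first:last + 1]  # the vowel(s)
    coda = syllable[last + 1:]          # consonants after the vowel(s)
    return {"onset": onset, "nucleus": nucleus, "coda": coda,
            "rhyme": nucleus + coda}

print(split_syllable("ban"))
# {'onset': 'b', 'nucleus': 'a', 'coda': 'n', 'rhyme': 'an'}
```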

Typically, only certain categories of sounds can go in a given slot. The categories can be determined by the row or column the sound occupies on the IPA chart.

The Wordo syntax supports a lot of commands. Let's start with just enough commands to let you generate words from a typical conlang description. The "Tokens" command is followed by a name, then a list of letters. If a line starts with // then it is a comment and is ignored. Rules are the things that are executed; in general, a rule either prints something to the screen or executes another rule. The first rule executed is "StartingRule". Identifiers are not case sensitive.

Here is the smallest word generator program:

//Rule for CVC
Tokens Consonants b l t
Tokens Vowels a e i
Tokens Codas n m h

StartingRule Syllable 50
Rule Syllable {
  Tokens Consonants
  Tokens Vowels
  Tokens Codas
}

This generates 50 words in a strict CVC pattern. As an aside, the resulting logotome (the set of all possible words) is going to be full of minimal pairs. Minimal pairs are words that differ by a single sound.
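
To see how many minimal pairs a strict CVC pattern produces, here is a minimal Python sketch (not Wordo itself) that enumerates every possible word for the three token lists above and counts the pairs that differ in exactly one position. The function names are hypothetical, and the sketch assumes each letter stands for one sound and that a token is picked uniformly at random, which may not match Wordo's actual behavior.

```python
import itertools
import random

CONSONANTS = ["b", "l", "t"]
VOWELS = ["a", "e", "i"]
CODAS = ["n", "m", "h"]

def random_cvc(rng: random.Random) -> str:
    """Pick one token from each list (assumed uniform) and join them."""
    return rng.choice(CONSONANTS) + rng.choice(VOWELS) + rng.choice(CODAS)

def all_cvc() -> list[str]:
    """Every word the strict CVC pattern can produce."""
    return ["".join(p) for p in itertools.product(CONSONANTS, VOWELS, CODAS)]

def is_minimal_pair(a: str, b: str) -> bool:
    """True when two equal-length words differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

words = all_cvc()
pairs = [(a, b) for a, b in itertools.combinations(words, 2)
         if is_minimal_pair(a, b)]
print(len(words), len(pairs))
# 27 81
```

With only 27 possible words, 81 of the 351 possible word pairs are minimal pairs, and every word has six neighbors that differ from it by one sound.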

Phonotactics with Random Variation

When things are optional, you will need to assign weighted probabilities to them.

//Rule for (C)V(C), with 50% probabilities of presence of consonants.
Tokens Consonants b l t
Tokens Vowels a e i
Tokens Codas n m h

StartingRule word
Rule syllable {
  Loop 1[0] 1[1] {
     Tokens Consonants
  }
  Tokens Vowels
  Loop 1[0] 1[1] {
     Tokens Codas
  }
}

The Loop command causes its body to be executed a number of times: each count is given in brackets, and the number in front of the brackets is that count's weight. So 1[0] 1[1] means execute 0 times with probability 1/(1+1) = 50%, and execute 1 time with probability 1/(1+1) = 50%.
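
The weighted choice behind Loop can be modeled in a few lines of Python. This is a sketch of the behavior described above, not Wordo's implementation: the names pick_count and syllable are hypothetical, and tokens are assumed to be picked uniformly.

```python
import random

def pick_count(weighted_counts: list[tuple[int, int]], rng: random.Random) -> int:
    """Choose a repeat count from (weight, count) pairs, e.g. 1[0] 1[1]."""
    weights = [weight for weight, _ in weighted_counts]
    counts = [count for _, count in weighted_counts]
    return rng.choices(counts, weights=weights, k=1)[0]

def syllable(rng: random.Random) -> str:
    """Model of the (C)V(C) rule: each consonant slot appears half the time."""
    out = ""
    for _ in range(pick_count([(1, 0), (1, 1)], rng)):  # Loop 1[0] 1[1]
        out += rng.choice("blt")                        # Tokens Consonants
    out += rng.choice("aei")                            # Tokens Vowels
    for _ in range(pick_count([(1, 0), (1, 1)], rng)):  # Loop 1[0] 1[1]
        out += rng.choice("nmh")                        # Tokens Codas
    return out

rng = random.Random()
print([syllable(rng) for _ in range(10)])
```

Each run prints a different mix of V, CV, VC, and CVC syllables.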

Words are then built up out of syllables.
The number before Rule, when inside a loop, is a weighting.
As a sanity check, make sure the numbers in brackets increase from left to right:
Rule word {
  Loop 5[1] 2[2] 1[3] 1[4]
  {
   1 Rule syllable
  }
}

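The probabilities represented here are:
5/(5+2+1+1) = 5/9, about a 56% chance of a 1-syllable word
2/(5+2+1+1) = 2/9, about a 22% chance of a 2-syllable word
1/(5+2+1+1) = 1/9, about an 11% chance each of a 3-syllable or a 4-syllable word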
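
The same arithmetic can be checked with a short Python sketch (not part of Wordo). The function name loop_probabilities is hypothetical; it takes the weights and counts from Loop 5[1] 2[2] 1[3] 1[4] as (weight, count) pairs and uses exact fractions.

```python
from fractions import Fraction

def loop_probabilities(weighted_counts: list[tuple[int, int]]) -> dict[int, Fraction]:
    """Map each repeat count to weight / (sum of all weights)."""
    total = sum(weight for weight, _ in weighted_counts)
    return {count: Fraction(weight, total) for weight, count in weighted_counts}

probs = loop_probabilities([(5, 1), (2, 2), (1, 3), (1, 4)])
for count, p in probs.items():
    print(f"{count} syllable(s): {p} = {float(p):.1%}")

# Average word length in syllables under these weights.
expected = sum(count * p for count, p in probs.items())
print("average syllables per word:", expected)
```

Under these weights the average word comes out at 16/9, a little under two syllables.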

Tips:

Sometimes it is useful to create a Tokens list that has no elements.
StartingRule word 50 means create 50 words.
StartingRule can occur multiple times, e.g.
StartingRule noun 75
StartingRule verb 25
means create 75 nouns and 25 verbs.

Next lesson, morphology and dictionaries.