Gurmukhi Unicode . . .
It is important to have a font that can display the characters you need, and display them in the right way.
We've seen how to input Unicode characters directly onto a page, but there are alternatives that give you more control if you are prepared to make the effort.
If you are writing your own HTML code, you can write the codes into the page directly. Start with an ampersand (&), then a hash (#), then the character's decimal Unicode value (its code point, not its UTF-8 bytes), and finish with a semicolon. So, the code for 'ਪ' is `&#2602;`.
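The article itself works by hand, but the same conversion can be sketched in Python to check codes before pasting them into a page. This is a minimal illustration, not the author's method: `to_ncr` and `from_ncr` are hypothetical helper names, while `html.unescape` is the standard-library decoder. HTML also accepts a hexadecimal form of the same reference, so `&#x0A2A;` is another way to write 'ਪ'.

```python
import html


def to_ncr(text: str) -> str:
    """Hypothetical helper: write every character as a decimal reference, e.g. 'ਪ' -> '&#2602;'."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def from_ncr(markup: str) -> str:
    """Hypothetical helper: decode decimal (&#2602;) or hex (&#x0A2A;) references back to text."""
    return html.unescape(markup)


print(to_ncr("ਪ"))            # &#2602;
print(to_ncr("ਪਰ"))           # &#2602;&#2608;
print(from_ncr("&#2602;"))    # ਪ
print(from_ncr("&#x0A2A;"))   # ਪ
```

`ord` gives the code point as an integer, which is exactly the number that goes between `&#` and `;`.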
Gurmukhi characters lie between 2560 and 2679, but two other important characters are encoded in the Devanagari range rather than the Gurmukhi one. They are for:
- end-of-sentence (danda `&#2404;` ।); and,
- end-of-paragraph (double-danda `&#2405;` ॥).
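A quick way to confirm that these two marks really sit outside the Gurmukhi range is to ask Python's `unicodedata` module for their official names. The `describe` function and the `GURMUKHI_FIRST`/`GURMUKHI_LAST` constants below are illustrative assumptions; the range limits are simply the ones quoted in this article.

```python
import unicodedata

GURMUKHI_FIRST, GURMUKHI_LAST = 2560, 2679  # the range quoted in this article


def describe(cp: int) -> str:
    """Show the reference, the character, its Unicode name and whether it is in the Gurmukhi range."""
    ch = chr(cp)
    inside = GURMUKHI_FIRST <= cp <= GURMUKHI_LAST
    return f"&#{cp}; {ch} {unicodedata.name(ch)} (Gurmukhi range: {inside})"


for cp in (2404, 2405):
    print(describe(cp))
# &#2404; । DEVANAGARI DANDA (Gurmukhi range: False)
# &#2405; ॥ DEVANAGARI DOUBLE DANDA (Gurmukhi range: False)
```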
Below is a table of the decimal Unicode values in the Gurmukhi range . . .
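| | xxx0 | xxx1 | xxx2 | xxx3 | xxx4 | xxx5 | xxx6 | xxx7 | xxx8 | xxx9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 256x | | ਁ | ਂ | ਃ | | ਅ | ਆ | ਇ | ਈ | ਉ |
| 257x | ਊ | | | | | ਏ | ਐ | | | ਓ |
| 258x | ਔ | ਕ | ਖ | ਗ | ਘ | ਙ | ਚ | ਛ | ਜ | ਝ |
| 259x | ਞ | ਟ | ਠ | ਡ | ਢ | ਣ | ਤ | ਥ | ਦ | ਧ |
| 260x | ਨ | | ਪ | ਫ | ਬ | ਭ | ਮ | ਯ | ਰ | |
| 261x | ਲ | ਲ਼ | | ਵ | ਸ਼ | | ਸ | ਹ | | |
| 262x | ਼ | | ਾ | ਿ | ੀ | ੁ | ੂ | | | |
| 263x | | ੇ | ੈ | | | ੋ | ੌ | ੍ | | |
| 264x | | ੑ | | | | | | | | ਖ਼ |
| 265x | ਗ਼ | ਜ਼ | ੜ | | ਫ਼ | | | | | |
| 266x | | | ੦ | ੧ | ੨ | ੩ | ੪ | ੫ | ੬ | ੭ |
| 267x | ੮ | ੯ | ੰ | ੱ | ੲ | ੳ | ੴ | ੵ | ੶ | |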
Above, you can see the codes for the letters of the alphabet, the vowel signs, the vowels written on their carriers and the other diacritical marks. To get a 'ਪ', find it in the table and note its row ('260x') and column ('xxx2'); replace the 'x's with those digits and you get the number you need, 2602.
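The row-and-column lookup is just "row prefix followed by column digit", or equivalently `row × 10 + column`. A small sketch of that arithmetic in both directions; `code_from_cell` and `cell_for_code` are hypothetical names, and the labels are assumed to follow the table's '260x' / 'xxx2' style.

```python
from typing import Tuple


def code_from_cell(row: str, column: str) -> int:
    """Combine a row label such as '260x' and a column label such as 'xxx2' into 2602."""
    tens = row.rstrip("x")       # '260x' -> '260'
    units = column.lstrip("x")   # 'xxx2' -> '2'
    return int(tens + units)


def cell_for_code(cp: int) -> Tuple[str, str]:
    """The reverse lookup: 2602 -> ('260x', 'xxx2')."""
    return f"{cp // 10}x", f"xxx{cp % 10}"


cp = code_from_cell("260x", "xxx2")
print(cp, chr(cp), f"&#{cp};")   # 2602 ਪ &#2602;
print(cell_for_code(2581))       # ('258x', 'xxx1')
```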
The table starts with a few diacritical marks at the beginning of the 256x row (2561 to 2563) and then moves on to the vowel carriers with their respective vowels (2565 to 2580).
Then, at 2581, it starts on line 2 of the alphabet with ਕ and proceeds through the alphabet until it reaches 2600 (ਨ). It skips a code and continues with the sixth line, from 2602 (ਪ) to 2606 (ਮ). Next comes the following line: 2607 ਯ and 2608 ਰ, a skipped code, 2610 ਲ followed by its paer-bindi form ਲ਼ (2611), another skipped code, and then 2613 ਵ. After that we get the forms of sassaa (ਸ਼ at 2614, ਸ at 2616) and haha (ਹ at 2617), and then we go into some other marks.
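The skipped codes can be checked against Python's own Unicode database, which also lets you regenerate the whole table layout. This is a sketch under one assumption worth stating: the result depends on the Unicode version bundled with your Python (for example, 2678 was only added in Unicode 11, so very old Pythons will show it as a gap). `is_assigned` and `table_rows` are hypothetical helper names.

```python
import unicodedata


def is_assigned(cp: int) -> bool:
    """True if Python's Unicode database has a name for this code point."""
    # unicodedata.name() raises ValueError for unassigned code points unless a default is supplied.
    return unicodedata.name(chr(cp), "") != ""


def table_rows(first: int = 2560, last: int = 2679):
    """Yield (row label, ten cells) in the same layout as the table; gaps become ''."""
    for row_start in range(first, last + 1, 10):
        cells = [chr(cp) if is_assigned(cp) else "" for cp in range(row_start, row_start + 10)]
        yield f"{row_start // 10}x", cells


skipped = [cp for cp in range(2600, 2618) if not is_assigned(cp)]
print(skipped)  # [2601, 2609, 2612, 2615]

for label, cells in table_rows():
    print(label, " ".join(c or "." for c in cells))
```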
When we get down to 2649 (ਖ਼), we start on some more paer-bindi forms, then reach ੜ at 2652 and ਫ਼ at 2654. Finally, we get the digits (੦ to ੯, 2662 to 2671) and some other characters: ੴ (2676) is 'ek onkar', a Sikh symbol meaning 'one god'.
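The paer-bindi letters such as ਖ਼ (2649) have their own code points, but Unicode also defines each one as equivalent to the plain letter followed by the paer-bindi (which Unicode calls the nukta, 2620). A practical consequence, sketched below with the standard `unicodedata` module, is that two words that look identical can be stored differently, so it is worth normalizing text before comparing or searching it. The variable names are illustrative only.

```python
import unicodedata

khhha = chr(2649)                 # ਖ਼ as a single code point
spelled = chr(2582) + chr(2620)   # ਖ (2582) followed by the paer-bindi (2620)

print(unicodedata.name(khhha))           # GURMUKHI LETTER KHHA
print(unicodedata.decomposition(khhha))  # 0A16 0A3C  (hex for 2582 and 2620)

# Both spellings normalize to the same two-code-point sequence, because Unicode
# does not recompose these paer-bindi letters under NFC.
print(unicodedata.normalize("NFC", khhha) == unicodedata.normalize("NFC", spelled))  # True
print(len(unicodedata.normalize("NFC", khhha)))  # 2

print(unicodedata.name(chr(2676)))       # GURMUKHI EK ONKAR
```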
The other marks between 2620 and 2637 cannot stand alone: each needs a letter to attach itself to. You can see that they come in their usual pairs (sihari and bihari, aunkar and dulainkar, lavan and dulavan, hora and kanaura) and in the same order as the vowels with carriers at the top of the table. So, here they are...
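The "same order" can be made concrete. Reading the code chart, each vowel sign sits exactly 56 places after its vowel-with-carrier (ਆ 2566 → ਾ 2622, ਇ 2567 → ਿ 2623, and so on). That offset is an observation from the chart, not a rule the article states; `SIGN_OFFSET`, `WITH_CARRIER` and `sign_for` are hypothetical names used to illustrate it, and ਪ is attached to each sign so it has a letter to sit on.

```python
import unicodedata

# Observation from the code chart, not a rule stated in the article: each dependent
# vowel sign sits exactly 56 code points after the matching vowel-with-carrier.
SIGN_OFFSET = 56

# ਆ ਇ ਈ ਉ ਊ ਏ ਐ ਓ ਔ (the vowels with carriers at the top of the table)
WITH_CARRIER = [2566, 2567, 2568, 2569, 2570, 2575, 2576, 2579, 2580]


def sign_for(cp: int) -> str:
    """Return the vowel sign that corresponds to a vowel-with-carrier code point."""
    return chr(cp + SIGN_OFFSET)


for cp in WITH_CARRIER:
    sign = sign_for(cp)
    # A sign is a combining mark (category 'Mn' or 'Mc'), so attach it to ਪ to see it.
    print(cp, chr(cp), cp + SIGN_OFFSET, chr(2602) + sign, unicodedata.category(sign))

# ਅ (2565) has no sign: its 'a' is the implicit vowel, and 2565 + 56 = 2621 is unassigned.
print(repr(unicodedata.name(chr(2565 + SIGN_OFFSET), "")))  # ''
```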
The bottom two lines of the table above, the paer-bindi (2620) and the virama (2637), are interesting because:
- the paer-bindi allows us to generate characters that aren't in the set above. You might see a paer-bindi form of ਵ (ਵ਼), which some books use to distinguish 'w' from 'v', and there are others too: in dictionaries, you might see ਕ਼ and some others in words of Arabic origin. Essentially, the paer-bindi is used to accommodate sounds from other languages; and,
- the virama is borrowed from another script (Devanagari ्). In Punjabi, there is an implicit 'a' sound after each letter, so ਪਰ sounds like 'para'. The virama cancels out the implicit 'a' of the letter it is joined to. However, the only time this happens explicitly in Punjabi is when you have a paer letter such as a paer-rarra. So the clever people who designed the Unicode standard decided that you can form a paer-rarra by putting a virama between the two letters: where ਪਰ sounds like 'para', ਪ੍ਰ sounds like 'pra'.
This leads us on to the question of how we use these to form a word. So, let's form the word 'Drink'.
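| Step | Code | Result |
|---|---|---|
| Start with the 'English' 'd' | `&#2593;` | ਡ |
| Add the paer-rarra, so now we have 'dr' | `&#2593;&#2637;&#2608;` | ਡ੍ਰ |
| Add a sihari like so | `&#2593;&#2637;&#2608;&#2623;` | ਡ੍ਰਿ |
| Now a tippee | `&#2593;&#2637;&#2608;&#2623;&#2672;` | ਡ੍ਰਿੰ |
| Finally a kakkaa | `&#2593;&#2637;&#2608;&#2623;&#2672;&#2581;` | ਡ੍ਰਿੰਕ |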
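The same 'Drink' build-up can be sketched in Python, printing the HTML codes alongside the decoded text at each step. One detail worth noticing: the sihari is stored after the ਡ੍ਰ cluster even though it is drawn to its left, because Unicode stores characters in the order they are spoken, not the order they appear. The constant names (`DDA`, `VIRAMA`, `RA`, `SIHARI`, `TIPPI`, `KA`) are illustrative labels for the code points used in this example.

```python
import html

# Code points for the letters and marks in the 'Drink' example
DDA, VIRAMA, RA, SIHARI, TIPPI, KA = 2593, 2637, 2608, 2623, 2672, 2581

# Each step adds one or more code points, in the order they are typed (stored).
steps = [
    ("d", [DDA]),
    ("dr", [VIRAMA, RA]),   # the virama joins ਡ and ਰ into the paer-rarra cluster
    ("dri", [SIHARI]),      # typed after the cluster, although it is drawn to its left
    ("drin", [TIPPI]),
    ("drink", [KA]),
]

code_points = []
for label, added in steps:
    code_points.extend(added)
    markup = "".join(f"&#{cp};" for cp in code_points)
    print(f"{label:<6} {markup:<45} {html.unescape(markup)}")

word = "".join(chr(cp) for cp in code_points)
print(len(word))  # 6
```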