Unicode + PHP

Unicode, UTF-8, ASCII, BOM, ISO 10646, multibyte, collation, charsets, and more: there is a lot of technical jargon when it comes to characters.

With beautiful slides, animated GIFs, and most importantly, in plain English, we will discover character encodings that every programmer must know, and how we can handle Unicode characters in PHP.

There are many languages in the world, and each has its own scripts and glyphs of various lengths, heights, and rules. With emojis getting popular (with their own movies, no less!), it is important to accommodate all these unusual-looking characters and to understand how they are represented, what gotchas to expect, and how to process them.

During the talk, we will take a look at some flawed snippets and how they can be fixed to process Unicode characters properly.
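
As an illustration of the kind of flaw meant here (this is a sketch, not a snippet taken from the slides), byte-oriented string functions can miscount or split multibyte characters. It assumes PHP 7 or later with the mbstring extension enabled; the variable names are made up for the example.

```php
<?php
// "é" (U+00E9) is one character but two bytes in UTF-8.
$word = "h\u{00E9}llo";

// Flawed: strlen() counts bytes, not characters.
$byteCount = strlen($word);              // 6

// Fixed: mb_strlen() counts characters (code points) in the given encoding.
$charCount = mb_strlen($word, 'UTF-8');  // 5

// Flawed: substr() cuts at byte offsets and can split a character in half,
// leaving a string that is no longer valid UTF-8.
$broken = substr($word, 0, 2);
$brokenIsValid = mb_check_encoding($broken, 'UTF-8'); // false

// Fixed: mb_substr() cuts at character offsets.
$fixed = mb_substr($word, 0, 2, 'UTF-8'); // "hé"
```

Passing the encoding explicitly avoids depending on the configured default encoding.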

We will also take a look at other I/O operations, such as file reads and writes and database connections, where we must pay attention to make sure everything works well with Unicode characters.

What you'll learn from this talk:

  • How characters are stored in computers
  • How characters are represented
  • Different character sets
  • Character encoding
  • Unicode, Unicode planes, and different flavors of Unicode character encodings
  • Multi-byte characters in programming: PHP and database systems
  • How to properly accept, sanitize, store, and present all sorts of characters.
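
To make the last point concrete, here is a minimal sketch of reading text from a file, stripping a UTF-8 BOM, validating the encoding, and escaping the text for HTML output. The helper names (stripUtf8Bom, isValidUtf8, escapeForHtml) are hypothetical and not from the talk. The sketch assumes PHP 7 or later with the mbstring extension, and it leaves out database storage entirely.

```php
<?php
// Hypothetical helpers for illustration only.

// Remove a leading UTF-8 byte order mark (EF BB BF), if present.
function stripUtf8Bom(string $text): string
{
    $bom = "\xEF\xBB\xBF";
    return strncmp($text, $bom, 3) === 0 ? substr($text, 3) : $text;
}

// Check that the bytes form well-formed UTF-8 before storing or processing them.
function isValidUtf8(string $text): bool
{
    return mb_check_encoding($text, 'UTF-8');
}

// Escape for HTML output, stating the encoding explicitly.
function escapeForHtml(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Write a BOM-prefixed UTF-8 file to a temporary location, then read it back.
$path = tempnam(sys_get_temp_dir(), 'utf8');
file_put_contents($path, "\xEF\xBB\xBF" . "caf\u{00E9} <b>\u{1F600}</b>");
$raw = file_get_contents($path);
unlink($path);

$clean = stripUtf8Bom($raw);   // BOM removed
$valid = isValidUtf8($clean);  // true for this input
$html  = escapeForHtml($clean); // markup escaped, emoji preserved
```

The file and byte-level functions work on raw bytes and do not convert encodings, so validation is a separate, explicit step.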

You can download the slides from the links below. Note that the slides contain several animations, which won't appear in the PDF; if possible, use the PPTX version. Alternatively, you can view them online from the link below.

PPTX | PDF | View online