Documentation


Introduction

The IdnaConvert library converts internationalized domain names (see RFC 3492, RFC 5890, RFC 5891, RFC 5892, RFC 5893, RFC 5894 and RFC 6452 for details), as used by various registries worldwide, between their original (localized) form and their encoded form as used in the DNS (Domain Name System).

The library provides two classes, ToIdn and ToUnicode, each of which exposes three public methods to convert between the respective forms. See the Examples section below. This allows you to convert host names (simple labels like localhost or FQHNs like some-host.domain.example), email addresses and complete URLs.

Errors such as incorrectly encoded or otherwise invalid strings lead to various exceptions, which should help you find out what went wrong.
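
Since the concrete exception classes are not listed here, a cautious way to call any of the conversion methods is to catch the generic \Exception. The following is a minimal sketch and not part of the library: tryConvert() and $demoConverter are hypothetical names, and the stand-in converter only exists so that the snippet runs without the library installed. In real code you would pass something like [new ToIdn(), 'convert'] instead.

```php
<?php
// Hypothetical helper (not part of the library): runs any converter callable
// and reports a failure instead of letting the exception bubble up.
function tryConvert(callable $converter, string $input): array
{
    try {
        return ['ok' => true, 'value' => $converter($input), 'error' => null];
    } catch (\Exception $e) {
        // The exception class name usually hints at what went wrong
        return [
            'ok'    => false,
            'value' => null,
            'error' => get_class($e) . ': ' . $e->getMessage(),
        ];
    }
}

// Stand-in converter for illustration only (assumption: it merely mimics
// a conversion method that throws on invalid input).
$demoConverter = function (string $host): string {
    if ($host === '') {
        throw new InvalidArgumentException('Empty host name');
    }
    return strtolower($host);
};

$result = tryConvert($demoConverter, '');
if (!$result['ok']) {
    // Log or display $result['error'] as appropriate for your application
}
```

Returning a small result array keeps the calling code free of try/catch blocks; rethrowing a domain-specific exception would work just as well.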

Unicode strings are expected to be UTF-8 strings. ACE strings (the Punycode form) are always 7-bit ASCII strings.
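
If you are unsure which kind of string you are holding, both properties can be checked with plain PHP before calling the converter. This is a minimal sketch using only PCRE, which ships with PHP; the function names isValidUtf8() and isSevenBitAscii() are hypothetical and not part of the library.

```php
<?php
// Hypothetical helpers, not part of the library.

// True if $value is a well-formed UTF-8 byte sequence.
function isValidUtf8(string $value): bool
{
    // With the /u modifier PCRE refuses subjects that are not valid UTF-8,
    // in which case preg_match() returns false instead of 1.
    return preg_match('//u', $value) === 1;
}

// True if $value contains only 7-bit ASCII bytes (0x00 to 0x7F).
function isSevenBitAscii(string $value): bool
{
    // Without /u the pattern is matched byte by byte
    return preg_match('/[\x80-\xFF]/', $value) === 0;
}

// "\u{00F6}" is the letter ö written as a Unicode escape (PHP 7.0+),
// so the snippet does not depend on how this file is saved.
$unicodeForm = "n\u{00F6}rgler.com";
$aceForm     = 'xn--nrgler-wxa.com';

var_dump(isValidUtf8($unicodeForm));     // bool(true)
var_dump(isSevenBitAscii($unicodeForm)); // bool(false)
var_dump(isSevenBitAscii($aceForm));     // bool(true)
```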


Installation

Via Composer

composer require algo26-matthias/idna-convert

Official ZIP Package

The official ZIP packages are discontinued. Please use Composer or GitHub to acquire your copy.


Upgrading to a newer version


3.0

The library has been broken down into various specific classes, thus more closely following SOLID principles.

As such, the single class IdnaConvert has been broken down into ToIdn and ToUnicode. Their names reflect the format of the result, which makes it easier to pick the one you need and should be easier to grasp than the old method names encode() and decode(). Usually you will only need one conversion direction per script run, so why bother loading and parsing all the other unused code?

Also the handling of host names (simple labels like my-hostname or FQHNs like some-host.my-domain.example) is now separated from that of email addresses and URLs.
Both classes offer the same set of public methods:

convert() To convert host names
convertEmailAddress() To convert email addresses
convertUrl() To convert the host name of a URL

There is no "strict mode" anymore; its purpose is served by the separate methods above. The IDN version is selected when instantiating the class and can no longer be changed at runtime. Also, the encoding (for the Unicode side of things) is now always UTF-8. Use TranscodeUnicode or the EncodingHelper classes to convert between UTF-8 and other encodings.

All sub-classes, such as those for NamePrep and the actual Punycode transformation, are put in their own namespaces under Algo26\IdnaConvert, e.g. Algo26\IdnaConvert\NamePrep. Interfaces and exceptions also have their own namespaces to declutter the class structure even more.

The class EncodingHelper is now separated into the two classes ToUtf8 and FromUtf8, which live under the namespace Algo26\IdnaConvert\EncodingHelper. The class UnicodeTranscoder is now called TranscodeUnicode and lives under the namespace Algo26\IdnaConvert\TranscodeUnicode.

All examples are updated to reflect the new usage. See the ReadMe for more details.

Also the minimum PHP version is now 7.2.


2.0

The library has been handed over to actively maintained GitHub and Packagist accounts. This led to a change in the namespace.
Replace all occurrences of
Mso\IdnaConvert or PhlyLabs\IdnaConvert with Algo26\IdnaConvert.
There are no further changes to the class signatures.


1.0

BC break

As of version 1.0.0 the class closely follows PSR-1, PSR-2 and PSR-4 of the PHP-FIG. As such, the classes have been renamed, a namespace has been introduced, the default IDN version has changed from 2003 to 2008, and the minimum PHP version has been raised to 5.6.0.


0.8.0

As of version 0.8.0 the class fully supports IDNA 2008.
Thus the aforementioned parameter is deprecated and replaced by a parameter to switch between the standards. See the updated example 5 in the ReadMe.


0.6.4

BC break

As of version 0.6.4 the class by default allows the German ligature ß to be encoded, since DENIC, the registry for .de, allows domains containing ß.


0.6.0

ATTENTION: As of version 0.6.0 this class is written in the OOP style of PHP 5.
Since PHP 4 is no longer actively maintained, you should switch to PHP 5 as fast as possible.
We expect to see no compatibility issues with the upcoming PHP 6, too.


Examples


Example 1.

Say we wish to encode the domain name nörgler.com:

                
<?php
// Include the class
use Algo26\IdnaConvert\ToIdn;
// Instantiate it
$IDN = new ToIdn();
// The input string must be UTF-8. utf8_encode() converts from ISO-8859-1,
// so drop the call if this file is already saved as UTF-8
$input = utf8_encode('nörgler.com');
// Encode it to its punycode presentation
$output = $IDN->convert($input);
// Output, what we got now
echo $output; // This will read: xn--nrgler-wxa.com
                
            

Example 2.

We received an email from an internationalized domain and want to decode the address to its Unicode form.

                
<?php
// Include the class
use Algo26\IdnaConvert\ToUnicode;
// Instantiate it
$IDN = new ToUnicode();
// The input string
$input = 'andre@xn--brse-5qa.xn--knrz-1ra.info';
// Decode it to its Unicode representation
$output = $IDN->convertEmailAddress($input);
// Output the result. It is UTF-8; utf8_decode() converts it to ISO-8859-1,
// so drop the call if your output should stay UTF-8
echo utf8_decode($output); // This will read: andre@börse.knörz.info
                
            

Example 3.

The input is read from a UCS-4 encoded file and converted line by line. Since ToIdn expects UTF-8 input, each line is first transcoded from UCS-4 to UTF-8 using TranscodeUnicode.

                
<?php
// Include the class
use Algo26\IdnaConvert\ToIdn;
use Algo26\IdnaConvert\TranscodeUnicode\TranscodeUnicode;
// Instantiate
$IDN = new ToIdn();
$UCTC = new TranscodeUnicode();
// Iterate through the input file line by line
foreach (file('ucs4-domains.txt') as $line) {
    $utf8String = $UCTC->convert(trim($line), 'ucs4', 'utf8');
    echo $IDN->convert($utf8String);
    echo "\n";
}
                
            

Example 4.

We wish to convert a whole URI into the IDNA form, but leave the path and query string components alone. Just using convert() would lead to mangled paths or query strings. Here the public method convertUrl() comes into play:

                
<?php
// Include the class
use Algo26\IdnaConvert\ToIdn;
// Instantiate it
$IDN = new ToIdn();
// The input string, a whole URI in UTF-8 (!)
$input = 'http://nörgler:secret@nörgler.com/my_päth_is_not_ÄSCII/';
// Encode it to its punycode presentation
$output = $IDN->convertUrl($input);
// Output, what we got now
echo $output; // http://nörgler:secret@xn--nrgler-wxa.com/my_päth_is_not_ÄSCII/
                
            

Example 5.

Per default, the class converts strings according to IDNA version 2008. To support IDNA 2003, the class needs to be invoked with an additional parameter.

                
<?php
// Include the class
use Algo26\IdnaConvert\ToIdn;
// Instantiate it for IDNA 2008, the default, passed explicitly here
$IDN = new ToIdn(2008);
// Sth. containing the German letter ß
$input = 'meine-straße.example';
// Encode it to its punycode presentation
$output = $IDN->convert($input);
// Output, what we got now
echo $output; // xn--meine-strae-46a.example

// Switch to IDNA 2003, the original, now outdated standard
$IDN = new ToIdn(2003);
// Sth. containing the German letter ß
$input = 'meine-straße.example';
// Encode it to its punycode presentation
$output = $IDN->convert($input);
// Output, what we got now
echo $output; // meine-strasse.example
                
            

Encoding helper

In case you have strings in encodings other than ISO-8859-1 and UTF-8, you might need to translate these strings to UTF-8 before feeding them to the IDNA converter. PHP's built-in functions utf8_encode() and utf8_decode() can only deal with ISO-8859-1.
Use the encoding helper classes supplied with this package for the conversion. They require iconv, libiconv or mbstring to be installed together with the relevant PHP extension. The classes you will find useful are ToUtf8 as a replacement for utf8_encode() and FromUtf8 as a replacement for utf8_decode().

Example usage:

                
<?php
use Algo26\IdnaConvert\ToIdn;
use Algo26\IdnaConvert\EncodingHelper\ToUtf8;

$IDN = new ToIdn();
$encodingHelper = new ToUtf8();

$mystring = $encodingHelper->convert('<something in e.g. ISO-8859-15>', 'ISO-8859-15');
echo $IDN->convert($mystring);
                
            

Transcode Unicode

Another class you might find useful when dealing with one or more of the Unicode encoding flavours. It can transcode the following formats into each other:
- UCS-4 string / array
- UTF-8
- UTF-7
- UTF-7 IMAP (modified UTF-7)
All encodings expect / return a string in the given format, with one major exception: UCS-4 array is just an array, where each value represents one code point in the string, i.e. every value is a 32-bit integer value.

Example usage:

                
<?php
use Algo26\IdnaConvert\TranscodeUnicode\TranscodeUnicode;
$transcodeUnicode = new TranscodeUnicode();

$mystring = 'nörgler.com';
echo $transcodeUnicode->convert($mystring, 'utf8', 'utf7imap');
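
To illustrate what such a "UCS-4 array" contains, here is a minimal sketch in plain PHP that does not use the library at all. The function utf8ToCodePoints() is a hypothetical name chosen for this illustration; it decodes a UTF-8 string by hand into one integer per code point, which is the shape described above.

```php
<?php
// Hypothetical helper, not part of the library: turns a UTF-8 string into
// an array holding one integer code point per character.
function utf8ToCodePoints(string $utf8): array
{
    // Split into single characters; the /u modifier rejects invalid UTF-8
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    $codePoints = [];
    foreach ($chars as $char) {
        $bytes = array_values(unpack('C*', $char));
        $lead = $bytes[0];
        // Strip the length marker bits from the lead byte
        if ($lead < 0x80) {
            $codePoint = $lead;            // 1-byte sequence (ASCII)
        } elseif ($lead < 0xE0) {
            $codePoint = $lead & 0x1F;     // 2-byte sequence
        } elseif ($lead < 0xF0) {
            $codePoint = $lead & 0x0F;     // 3-byte sequence
        } else {
            $codePoint = $lead & 0x07;     // 4-byte sequence
        }
        // Each continuation byte contributes its lower 6 bits
        for ($i = 1, $n = count($bytes); $i < $n; $i++) {
            $codePoint = ($codePoint << 6) | ($bytes[$i] & 0x3F);
        }
        $codePoints[] = $codePoint;
    }

    return $codePoints;
}

// "\u{00F6}" is the letter ö; the result is [110, 246, 114, 103, 108, 101, 114]
$codePoints = utf8ToCodePoints("n\u{00F6}rgler");
```

Note that ö becomes the single value 246 (U+00F6), although it occupies two bytes in UTF-8.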
                
            

Run PHPUnit tests

The library is supplied with a docker-compose.yml that allows you to run the supplied tests. This assumes you have Docker installed and docker-compose available as a command.

Just issue docker-compose up in your local command line and see the output of PHPUnit.


Reporting bugs

Please use the issues tab on GitHub to report any bugs or feature requests.