Protocol Buffers, for data exchange with a server beyond JSON

Protocol Buffers is a data definition language created by Google. It is comparable to an IDL (interface definition language) but much simpler. Its C-like syntax is reminiscent of JSON, except that the fields are typed.
Google designed this language for its own servers, which store and exchange large quantities of structured data, and released it as open source in 2008. It is used in Android to speed up exchanges with the server (in Marketplace, for example).

The format is dual: a human-readable source, and a binary form that the machine can process quickly.
It may be used for several reasons, among others:

It is an alternative to XML, much more compact and considerably faster to process.
It is a means of storing structured data and exchanging it between programs, possibly written in different programming languages, and between a server and a client.
A library of functions is included to assist in the use of Web services.
It allows cross-language serialization of classes. Serialization produces compact binary data that is easy to process.
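To give an idea of how compact the binary form is, here is a minimal sketch that encodes one small record by hand, following the published Protocol Buffers wire format, and compares it with the same record in JSON. The record and its field names ("id" as field 1, "name" as field 2) are hypothetical, and the helper functions are written for this illustration; they are not part of the Protocol Buffers library.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    """A field key is the field number and the wire type packed in one varint."""
    return encode_varint((field_number << 3) | wire_type)


# Hypothetical record: field 1 is an int32 "id", field 2 is a string "name".
record = {"id": 150, "name": "Ann"}
name_bytes = record["name"].encode("utf-8")

binary = (
    encode_key(1, 0) + encode_varint(record["id"])  # wire type 0: varint
    + encode_key(2, 2) + encode_varint(len(name_bytes)) + name_bytes  # wire type 2: length-delimited
)
text = json.dumps(record, separators=(",", ":")).encode("utf-8")

print(binary.hex(), len(binary))
print(text.decode("utf-8"), len(text))
```

With this record the binary form takes 8 bytes and the compact JSON form 23 bytes. The field names never appear in the binary form, which is where most of the saving comes from.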

A simple format with advanced tools

First, some definitions to make things clearer:

Protocol Buffers: name of the language, and of the units of data defined in a proto file.
Proto: a data definition file in the PB language, with the .proto extension.
Protoc: name of the compiler that produces classes or binaries.

Features of the language

Object-oriented: each generated message class inherits from the Message class.
Typed data language.
Textual and binary formats.
The Protoc compiler generates, from the data definition, a class in the chosen language.
The compiler generates C++ or Java classes, among others, and is intended to be usable with any language.
An instance of the class is serialized into binary data. Protoc can also encode a message written in text form into binary.
A unit is called a "message". A .proto file can contain several messages.
Supports namespaces.
The structure of a message is recursive: a message may have fields that are themselves messages.
Repeated fields: as with elements in XML, a field can occur several times in the same message.
Dynamically extensible.

Syntax

Each source has the form:

message name {
  ...list of data fields...
}

The main scalar types are string, int32, int64, float, double, bool.
Fields are declared with a modifier: required, optional or repeated.

A unique number is assigned to each field. It identifies the field in the binary encoding and is not a value for the field.

required string x = 1;     // 1 is the field number, not a value
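The role of this number becomes clearer when looking at the encoded bytes. In the wire format, each field starts with a key that combines the field number with a "wire type" describing how the value is stored. The sketch below is a hand-written illustration (the function names are invented for this example, and it only handles small fields), showing that changing the number changes the key byte while the name x never appears in the output.

```python
WIRE_TYPE_LENGTH_DELIMITED = 2  # used for strings, bytes and embedded messages


def make_key(field_number: int, wire_type: int) -> int:
    """Pack the field number and the wire type into one key value."""
    return (field_number << 3) | wire_type


def split_key(key: int) -> tuple:
    """Recover (field_number, wire_type) from a key value."""
    return key >> 3, key & 0x07


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode a string field as key, length, UTF-8 bytes.

    Simplification for this sketch: field numbers 1 to 15 and payloads
    under 128 bytes, so that the key and the length each fit in one byte.
    """
    payload = text.encode("utf-8")
    if not 1 <= field_number <= 15 or len(payload) > 127:
        raise ValueError("this sketch only handles single-byte keys and lengths")
    key = make_key(field_number, WIRE_TYPE_LENGTH_DELIMITED)
    return bytes([key, len(payload)]) + payload


as_field_1 = encode_string_field(1, "hi")  # as in: required string x = 1;
as_field_2 = encode_string_field(2, "hi")  # same value, declared with = 2

print(as_field_1.hex(), split_key(as_field_1[0]))
print(as_field_2.hex(), split_key(as_field_2[0]))
```

Because the number is what identifies the field on the wire, it should not be changed once data encoded with it exists.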

A default value may be assigned with the default option:

required string x = 1 [default="Some text"];

In addition to scalar types, nested types are defined by embedding a message in another message:

message container
{
  required int32 number = 1;
  message contained
  {
    repeated string x = 1;
  }
}

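Outside the parent message, the nested type is referred to by its qualified name, container.contained, and its field as container.contained.x. Nesting a definition only declares a type: the parent holds a value of that type only if it also declares a field of that type.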
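As a rough illustration of how nesting and repetition look once encoded, the sketch below builds the bytes by hand. It assumes that container also declares a field of the nested type, written here as "optional contained item = 2;", which is hypothetical and not part of the definition above. The helper functions are invented for this example and only handle small values.

```python
def length_delimited(field_number: int, payload: bytes) -> bytes:
    """Key with wire type 2, then the length, then the payload.

    Used for strings and for embedded messages alike. Sketch limits:
    field numbers 1 to 15 and payloads under 128 bytes.
    """
    if not 1 <= field_number <= 15 or len(payload) > 127:
        raise ValueError("this sketch only handles single-byte keys and lengths")
    return bytes([(field_number << 3) | 2, len(payload)]) + payload


def small_varint_field(field_number: int, value: int) -> bytes:
    """Key with wire type 0, then a value small enough to fit in one byte."""
    if not 1 <= field_number <= 15 or not 0 <= value <= 127:
        raise ValueError("this sketch only handles single-byte keys and values")
    return bytes([(field_number << 3) | 0, value])


# contained: repeated string x = 1, holding two values.
# A repeated string is simply written once per value, with the same field number.
contained_bytes = b"".join(
    length_delimited(1, value.encode("utf-8")) for value in ["a", "b"]
)

# container: number = 18, plus the hypothetical field "item = 2" of type contained.
# The embedded message is stored like a string: key, length, then its own bytes.
container_bytes = small_varint_field(1, 18) + length_delimited(2, contained_bytes)

print(contained_bytes.hex())
print(container_bytes.hex())
```

The recursion mentioned earlier shows up directly: an embedded message is just a length-prefixed run of bytes that follows the same rules as its parent.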

Enumerations, declared with the enum type, can be included in messages.

Once the structure of a message is defined, it is used in a program by creating an instance. The instance has methods specific to its class, which the compiler produces in the generated C++ or Java files.

container myinstance;
myinstance.set_number(18);
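To make the idea of generated accessors more concrete, here is a hand-written stand-in for such a class. It is not the code that Protoc generates, which is much larger; the class and method names below simply mimic the style of the C++ accessors for the number field of container and are assumptions made for this illustration.

```python
class Container:
    """Hand-written stand-in for a class generated from the message container."""

    def __init__(self):
        self._number = None  # None means the field has not been set

    def set_number(self, value: int) -> None:
        """Setter for the int32 field; rejects values outside the int32 range."""
        if not -2**31 <= value < 2**31:
            raise ValueError("number must fit in an int32")
        self._number = value

    def has_number(self) -> bool:
        """Tell whether the field has been explicitly set."""
        return self._number is not None

    def number(self) -> int:
        """Getter; falls back to 0 when the field is unset."""
        return 0 if self._number is None else self._number

    def is_initialized(self) -> bool:
        """A message is complete only when all its required fields are set."""
        return self.has_number()


myinstance = Container()
myinstance.set_number(18)
print(myinstance.has_number(), myinstance.number(), myinstance.is_initialized())
```

The point of generating such classes is that type and range checks live in one place, derived from the .proto definition, instead of being rewritten by hand in each program.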

Sample code

Simple message.

message hello
{
    required string text = 1 [default="the message"];
    optional int32 number = 2;
}
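For the reverse direction, the sketch below reads an encoded message back into a list of fields. It is a minimal decoder written for this illustration, not the Protocol Buffers library: it only understands the two wire types that a message like hello needs (a string as field 1 and an int32 as field 2), and the sample bytes are made up for the example.

```python
def read_varint(data: bytes, pos: int):
    """Return (value, next_position) for the varint starting at pos."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def decode_fields(data: bytes):
    """Return a list of (field_number, wire_type, value) tuples.

    Handles wire type 0 (varint) and wire type 2 (length-delimited) only.
    """
    fields = []
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:
            value, pos = read_varint(data, pos)
        elif wire_type == 2:
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        fields.append((field_number, wire_type, value))
    return fields


# Made-up sample: field 1 holds the string "hi", field 2 holds the integer 42.
encoded = bytes.fromhex("0a026869102a")
decoded = decode_fields(encoded)
print(decoded)
```

Note that the decoder recovers field numbers and raw values only. Mapping field 1 back to a name and a string type requires the .proto definition, which is why both sides of an exchange need it.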

See the definition of the PB language for more details.

Download the Protoc compiler and get the full documentation included in the archive.

See also...

Protocol Buffers tutorial. A short manual for a first use.
JSON. A simpler format for JavaScript and server-side languages.
FlatBuffers. Also by Google, a very fast serialization framework that allows stored data to be accessed without first parsing or unpacking it. A typical use is game data such as maps, sprites, etc.