Ć is a programming language which can be automatically translated to C, Java, C#, JavaScript, ActionScript, Perl and D. The translator is called cito.

Ć is useful for writing very portable programming libraries. The idea is that you should be able to write code once and run it anywhere: on a desktop computer, on a smartphone, in your web browser. Yet the situation is as follows:

  • Desktop applications are usually written in C, C++, Java, C# or Objective-C and the language choice is largely biased by the target operating system.

  • Mobile applications are usually written in Java, Objective-C or C#, sometimes C and C++. The operating system vendor limits the available languages by exposing the OS API in one language or another.

  • Web (client-side) applications are usually written in JavaScript, with ActionScript (Flash Player), Java (applets) and C# (Silverlight) losing ground.

Now suppose you have invented a new, revolutionary compression algorithm. How would you implement it so it is widely available? If you answer "in C", here are the reasons why Ć might be a better choice:

  1. Web browsers will not execute C code downloaded from the internet. You may either use a browser plugin (non-portable and sometimes unsafe) or some kind of emulation such as NestedVM (inefficient). In contrast, Ć can be translated to pure JavaScript, ActionScript, Java and C#. Java lets you target Android without resorting to JNI.

  2. Libraries which aim to be very portable contain bindings (wrappers) for languages other than C. This requires some additional work to keep the bindings synchronized as the library is updated. With Ć there’s no need for bindings, because an implementation in Ć is translated entirely to the target language. cito even ensures that naming conventions match the target language — for example in Java a method gets called doFoo and a constant FOO_BAR whereas in C# they will be called DoFoo and FooBar respectively. Documentation comments are translated as well.

  3. C is old. Moreover, Microsoft’s 2012 compiler doesn’t even support C99, so there’s no standard bool type and all local variables must be defined at the beginnings of blocks. Ć resembles C# or Java, and cito can translate it to C99 as well as C89.

Of course, Ć has its limitations:

  1. Ć has no library of its own except for a few simple built-in methods. There is no standard I/O or GUI. Ć is meant for low-level tasks, operating directly on bits and bytes. It has been demonstrated that compression algorithms, emulation of microprocessors and other chips, and sound and bitmap synthesis can all be programmed in Ć.

  2. At this moment there isn’t a single program written entirely in Ć. Owing to the above point, at least the user interface must be programmed in a different language.

Source files

Even in the twenty-first century programmers tend to avoid Unicode in filenames. So you may use the .ci filename extension instead of .ć.

Note
In Polish "ci" is pronounced identically to "ć".

Source file contents must be UTF-8 encoded with an optional (but recommended) BOM.

Most of the time whitespace is insignificant in Ć source code. Let’s continue indentation style flame wars!

There are single-line comments from // until the end of the line. Documentation comments are described below. There are no /* multiline comments */. If you have a good reason to use them, speak up and they may appear in the next release.

Data types

Integer types

int is a 32-bit signed integer (what else would you expect?). Literals may be written in decimal or hexadecimal (0x12ab). I don’t want octal numbers (0123) in Ć. I may change my mind later.

Character literals (such as 'a') represent the Unicode codepoint of the character, as an int (not char because there’s no such type in Ć). You may also use the following escape sequences:

  • '\t' — horizontal tab

  • '\r' — CR

  • '\n' — LF

  • '\\' — backslash

  • '\'' — apostrophe

  • '\"' — double quote

You should already be familiar with the binary operators for int: + - * / % & | ^ << >>, compound assignments (such as *=), increments and decrements (x++ ++x x-- --x), negation (-), bitwise complement (~) and comparisons (== != < <= > >=).

The only other integer type in the current version of Ć is byte.

Note
Unlike in Java, byte in Ć is unsigned.

There’s an implicit conversion from byte to int so you can use the above operators on byte values.

x.LowByte is an explicit conversion of x of type int to byte. Casting by specifying a type name in parentheses is not supported.

x.MulDiv(y, z) means (x * y / z) with a reduced risk of overflow. It translates to Java and C# as (int) ((long) x * y / z).

x.SByte for x of type byte returns a signed byte. That is, for x in range 128..255 it returns a corresponding value in range -128..-1.
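
To make the three built-in conversions concrete, here is a minimal C sketch of their arithmetic. The function names ci_low_byte, ci_mul_div and ci_sbyte are made up for this illustration and are not what cito emits. The sketch also assumes that LowByte keeps the lowest 8 bits of the value, which the name suggests but the text above doesn’t state.

```c
#include <stdint.h>

/* Hypothetical helper: models x.LowByte, assuming it keeps the lowest 8 bits. */
uint8_t ci_low_byte(int32_t x)
{
    return (uint8_t) (x & 0xff);
}

/* Hypothetical helper: models x.MulDiv(y, z) by widening to 64 bits
   before multiplying, so x * y cannot overflow a 32-bit int. */
int32_t ci_mul_div(int32_t x, int32_t y, int32_t z)
{
    return (int32_t) ((int64_t) x * y / z);
}

/* Hypothetical helper: models x.SByte, mapping 128..255 to -128..-1. */
int32_t ci_sbyte(uint8_t x)
{
    return x < 128 ? x : x - 256;
}
```

For instance, ci_mul_div(1000000, 3000, 1000) gives 3000000 even though the intermediate product 3000000000 does not fit in a 32-bit int, and ci_sbyte(200) gives -56.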

Boolean type

The boolean type is called bool and its literals are true and false.

Boolean operators are ! (not), && (and), || (or) and the ternary operator x ? y : z (if x then y else z).

Translation of the boolean type to C depends on the selected dialect of C. Every translation of Ć to C99 starts with:

#include <stdbool.h>

then refers to bool, true and false.

A translation to C89 includes the more verbose:

typedef int cibool;
#ifndef TRUE
#define TRUE 1
#endif
#ifndef FALSE
#define FALSE 0
#endif

with later uses of cibool, TRUE and FALSE.

Note
These were only the translations to C. In Ć code true is not a synonym for 1 nor false for 0.

Enumerations

This is a simple definition of an enumerated type:

enum DayOfWeek
{
    Monday,
    Tuesday,
    Wednesday,
    Thursday,
    Friday,
    Saturday,
    Sunday
}

Elements of enumerations cannot be assigned integer values. There are no conversions between enumerated and integer types.

enum may be preceded with the keyword public to extend the visibility of the type outside Ć, that is, make the enumerated type part of the public interface of the library implemented in Ć.

When referencing a value of an enumerated type you need to include the type name, for example DayOfWeek.Friday. Note to Java programmers: this includes the case clauses.

Strings

For text processing please choose Perl and not Ć. Really.

In Ć there are two string data types:

  • Pointer to a string, written simply as string. It’s translated to C as const char *.

  • String storage, written as string(n), where n is an integer constant. The constant specifies maximum string length. "string(15) s" in Ć translates to C as "char s[16]" because of the trailing NUL character in C.

The above distinction is only visible in the translation to C. In other languages the same String type will be used for both Ć types.

String literals are written in double quotes: "Hello world". You may use \n and the other escape sequences as in character literals. String literals may be concatenated with the + operator. However, in this version of Ć you cannot use this operator for string variables.

Speaking of operators, compare strings with == and !=. cito will translate them to strcmp in C and x.equals(y) in Java.
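
Why strcmp? In C, == applied to two char pointers compares addresses, not text. The following sketch shows the difference. Both function names are invented for the illustration, and the exact expression cito generates may look different.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: the C meaning of the Ć expression (a == b) for strings. */
bool ci_string_equals(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* A plain pointer comparison, shown only for contrast: it tests whether
   both pointers refer to the same memory, not whether the text matches. */
bool same_pointer(const char *a, const char *b)
{
    return a == b;
}
```

Two separate char arrays that both hold "abc" are equal according to ci_string_equals but not according to same_pointer.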

A pointer (to a string, or to an array or object as described below) may have the value null.

Conversions of storage (of string, array, object) to pointers are implicit. There’s no & operator known from C.

If you feel any fear hearing "pointer", you really shouldn’t. A Ć pointer is the same as a Java reference. There is no pointer arithmetic. You cannot take a pointer to an int. Do you feel safer now?

s.Length returns string length. s[i] returns the code of the character at position i. s.Substring(startIndex, length) returns a substring, which must be assigned to a string storage. Beware that the current translation to C is not Unicode-aware. Yes, cito is at version 0.4.0.

When assigning a string to a string storage it is the programmer’s responsibility to ensure that the string length doesn’t exceed the storage size. String assignment translates to C’s strcpy. Similarly, the += operator is translated to strcat.
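
Since strcpy and strcat don’t check anything, here is a sketch of the kind of length check the programmer is responsible for, written in C. The helpers checked_assign and checked_append are hypothetical and are not generated by cito. max_length plays the role of n in string(n), so the C buffer is assumed to hold max_length + 1 characters.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into a storage declared as string(max_length)
   in Ć, that is char dst[max_length + 1] in C. Returns false and leaves dst
   untouched if src is too long. */
bool checked_assign(char *dst, size_t max_length, const char *src)
{
    if (strlen(src) > max_length)
        return false;
    strcpy(dst, src);
    return true;
}

/* Hypothetical helper: the same check for +=, which maps to strcat. */
bool checked_append(char *dst, size_t max_length, const char *src)
{
    if (strlen(dst) + strlen(src) > max_length)
        return false;
    strcat(dst, src);
    return true;
}
```

With char s[16] standing in for string(15) s, assigning "Hello" and appending ", world" both succeed, while appending four more characters would be refused.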

Arrays

There are two kinds of array types:

  • Pointer to an array: T[], where T is the type of array elements.

  • Array storage: T[n], where n is array size.

array[i] represents an array element that can be read and written to.

byte arrays can be copied (sourceArray.CopyTo(sourceIndex, destinationArray, destinationIndex, length)) and converted to strings (sourceArray.ToString(startIndex, length)). The latter will understand Unicode some day.

Array size is only available for array storage, via array.Length. You can zero all elements of byte and int array storage with array.Clear().
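
For readers who think in C, the byte array operations above behave roughly like the following sketch. The helper names are invented, and whether cito really emits memcpy and memset is an assumption on my part. The sketch also assumes the source and destination ranges don’t overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper modelling
   source.CopyTo(sourceIndex, destination, destinationIndex, length).
   Assumes the two ranges do not overlap, as memcpy requires. */
void ci_copy_bytes(const uint8_t *source, size_t source_index,
                   uint8_t *destination, size_t destination_index,
                   size_t length)
{
    memcpy(destination + destination_index, source + source_index, length);
}

/* Hypothetical helper modelling array.Clear() for a byte array storage
   whose size is known at compile time. */
void ci_clear_bytes(uint8_t *array, size_t size)
{
    memset(array, 0, size);
}
```

Note that ci_clear_bytes needs the size passed in explicitly, which mirrors why Clear() and Length are only available for array storage and not for array pointers.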

You may declare constant byte arrays, which contain compile-time file contents. For example BinaryResource("foo.bar") is an array consisting of bytes of the file foo.bar read while running cito.

Arrays whose size is not known at compile time should be allocated using the syntax new T[n] and assigned to T[] variables. Since there is no garbage collection in C, you must free the dynamically allocated array with delete. delete translates to free in C and to nothing in other (garbage-collecting) languages. Example:

int[] a = new int[1000000];
// do something with a...
delete a;
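
A hand-written C counterpart of this pattern might look as follows. That delete becomes free is stated above; that new becomes malloc is my assumption, and the function itself is purely illustrative.

```c
#include <stddef.h>
#include <stdlib.h>

/* Sums 0 + 1 + ... + (n - 1) using a heap-allocated array.
   Returns -1 if the allocation fails. */
long sum_with_dynamic_array(size_t n)
{
    int *a = malloc(n * sizeof(int));   /* Ć: int[] a = new int[n]; */
    long sum = 0;
    size_t i;
    if (a == NULL)
        return -1;
    for (i = 0; i < n; i++)
        a[i] = (int) i;
    for (i = 0; i < n; i++)
        sum += a[i];
    free(a);                            /* Ć: delete a; */
    return sum;
}
```

Forgetting the free call is harmless in the garbage-collected targets but leaks memory in C, so it pays to write delete even if C is not your first target.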

Classes

You shouldn’t be surprised by the syntax of class definition:

class HelloCi
{
    // class contents (members) go here
}

Do not place a semicolon after the closing brace. But do so in C++ (would you expect advice on Ć in a C++ manual?).

Ć now supports single inheritance. Put base class name after a colon:

class Derived : Base
{
}

You may define many classes in one .ci source file. There’s no restriction on class vs source file names.

A class defined as above is only visible to Ć code. To make it visible from the outside, add the public modifier.

By default, members are visible to all Ć code. If they aren’t accessed from other classes, cito will translate their visibility to private. That is, cito infers the minimum needed visibility from the usage. On the other hand, public defines the interface of your library and must be written by you. The public modifier can be used on methods and constants but not fields.

Class members can be:

  • constants

  • fields

  • methods

  • a constructor

  • macros

You cannot define properties known from C#. However, we already saw some built-in properties, such as Length. Let’s face it: user-defined types are not equal to built-in types.

For each class C there are the following types:

  • Pointer to an object: written simply as the class name: C.

  • Object storage: the syntax is class name followed by a pair of parentheses: C(). The semantics is one new object of the given class, which cannot be replaced with another one.

For example:

MyObject() foo;

translates to Java as follows:

final MyObject foo = new MyObject();

new C() allocates a new instance of class C and returns a pointer to it. If you target the C programming language, delete the allocated object some time later.

Delegates

A delegate combines a pointer to a method with a pointer to an object, just like in C#:

public delegate void ByteWriter(int data);
...
public class MyWriter
{
    public void WriteTo(ByteWriter bw)
    {
        ...
        bw(x);
        ...
    }
}

Support for delegates is incomplete in this version of Ć.
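
If "a pointer to a method plus a pointer to an object" sounds abstract, this is how the same idea can be modelled by hand in C. Everything here (the struct layout, ByteCounter, write_all) is an illustration and says nothing about how cito actually translates delegates.

```c
#include <stddef.h>

/* A delegate modelled by hand: a function pointer plus the object it acts on.
   All names here are illustrative. */
typedef struct {
    void *target;                            /* the object */
    void (*method)(void *target, int data);  /* the method */
} ByteWriter;

/* An example object that remembers how many bytes it received and their sum. */
typedef struct {
    int count;
    int sum;
} ByteCounter;

/* The method bound to the delegate. */
void ByteCounter_Write(void *target, int data)
{
    ByteCounter *self = target;
    self->count++;
    self->sum += data;
}

/* Plays the role of MyWriter.WriteTo: calls bw(x) for every byte. */
void write_all(ByteWriter bw, const unsigned char *bytes, size_t length)
{
    size_t i;
    for (i = 0; i < length; i++)
        bw.method(bw.target, bytes[i]);
}
```

The caller fills a ByteWriter with the address of a ByteCounter and ByteCounter_Write, then passes it to write_all. The writer never needs to know what kind of object receives the bytes.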

Constants

Here are some examples of constant definitions:

public class ConstDemo
{
    const int Foo = 5;
    public const int Bar = Foo + 3;
    public const string Hello = "Hello, " + "world!";
    const byte[] Rainbow = { 0xff, 0xff, 0x00, 0x06, 0x08, 0x06, 0xad, 0x0b, 0xd4, 0x8d, 0x1a, 0xd0, 0x4c, 0x00, 0x06 };
}

The translated code only contains public constants and constant arrays. Uses of other constants are expanded to their values.

cito performs constant folding, that is, operations on constants are computed during translation. For example, 2*3 gets translated to 6.

Fields

Fields are defined as follows:

public class FieldDemo
{
    int Field1;
    SomeObject AnObjectPointer;
    SomeObject() AnObjectStorage;
}

Fields cannot be public — instead, define public methods to get and set values of fields.

Methods

By default, methods have access to an instance of the class they are defined in, via the this pointer. The exception is static methods (defined with the static modifier).

In a method definition the method name must be preceded by the return type. void means that the method returns no value.

Ć doesn’t support method overloading. This means that method names must be unique within the class.

class MethodDemo
{
    int Counter;
    static int Sum(int x, int y)
    {
        return x + y;
    }
    public void Increment()
    {
        Counter++;
    }
    public int Test(MethodDemo md, int x)
    {
        md.Increment();
        return MethodDemo.Sum(md.Counter, 3);
    }
}

By default, methods are not virtual. Use abstract, virtual and override modifiers as in C#:

abstract class Base
{
    abstract void Method1();
    virtual void Method2()
    {
        // implementation
    }
}

class Derived : Base
{
    override void Method1()
    {
        // implementation
    }
    override void Method2()
    {
        // implementation
    }
}

Constructor

A class may contain one parameterless constructor. The constructor initializes a new instance of the class.

class ConstructorDemo
{
    int Counter;
    ConstructorDemo()
    {
        Counter = 0;
    }
}

Expected destructors? Nope, there aren’t any.

Statements

Statements are used in methods and constructors. Their syntax in Ć is nearly identical to the languages you already know.

In Ć there’s no empty statement consisting of a lone semicolon. Such a statement could be a programmer’s typo. You may use an empty block instead: { }.

Blocks

A block is a sequence of statements wrapped in braces.

Variable definitions

Each variable must be defined separately, like this:

int x;
int y;

and not like this:

int x, y; // ERROR

Variable definition may include an initial value:

int x = 5;
int[4] array = 0; // this array is initialized to contain zeros

Expressions

Not all expressions may be used as statements. These are valid statements:

DoFoo(4, 2); // method call
i++;

These are not valid statements:

4 + 2; // ERROR, useless expression
++i; // ERROR, use i++ instead

There is no comma operator in Ć.

Assignments

An assignment is a statement and not an expression. Therefore expressions cannot include assignments.

x = y;
y = 5;
x = y = 3; // you may combine assignments
x += 4; // adds 4 to x
if ((x = Foo()) != null) { Bar(); } // ERROR

Constants

Constant definitions in classes were already described. Constant definitions are also allowed at the level of statements. Such a definition has a scope limited to the containing block. For this reason statement-level constant definitions cannot be public.

Returning method result

A method can end its execution with a return statement. return must be followed by the returned value, except for void methods of course.

Conditional statement

That’s the familiar if with an optional else clause.

if (x == 7)
    DoFoo();
else
    DoBar();

Loops

There are three kinds of loops:

  • while checking the condition at the beginning of each run.

  • do/while checking the condition at the end of each run, so the body runs at least once.

  • for which contains an initial statement and a statement executed after each run.

Inside loops you may use:

  • break to leave the loop (the innermost one, as there are no loop labels).

  • continue to skip to the next run.
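
The loops are shown here in C, on the assumption (suggested but not spelled out above) that Ć writes them the same way. The function names are made up. The point of the sketch is the difference between while and do/while, plus break and continue in a for loop.

```c
/* Counts how many times the body of a while loop runs. */
int runs_while(int start, int limit)
{
    int runs = 0;
    int i = start;
    while (i < limit) {   /* condition checked before each run */
        runs++;
        i++;
    }
    return runs;
}

/* Counts how many times the body of a do/while loop runs. */
int runs_do_while(int start, int limit)
{
    int runs = 0;
    int i = start;
    do {                  /* body runs once before the first check */
        runs++;
        i++;
    } while (i < limit);
    return runs;
}

/* Sums the odd numbers below limit, stopping at the first number above cap. */
int sum_odd_until(int limit, int cap)
{
    int sum = 0;
    int i;
    for (i = 0; i < limit; i++) {
        if (i > cap)
            break;        /* leave the loop */
        if (i % 2 == 0)
            continue;     /* skip to the next run */
        sum += i;
    }
    return sum;
}
```

With start equal to limit, runs_while returns 0 while runs_do_while returns 1. That is the whole difference between the two forms.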

Switch statement

case clauses must be correctly terminated, for example:

switch (x) {
case 1:
    DoFoo();
    // ERROR: something's missing here
case 2:
    DoBar();
    break;
}

Correct termination means a statement that doesn’t fall through to the next statement: break, continue, return, throw or a C#-style goto case / goto default, which jumps to the named clause.

switch (x) {
case 1:
    DoFoo();
    goto case 2; // now it's clear what the programmer meant
case 2:
    DoBar();
    break;
}
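
C has no goto case, so it is fair to ask what such a jump could turn into. One possibility, sketched below, is an ordinary label placed next to the case. This is only my illustration of the control flow. I make no claim that cito produces this, and for adjacent cases a plain fall-through would do. The trace buffer exists just to make the order of calls observable.

```c
#include <string.h>

/* Appends a tag to a small trace buffer so the control flow can be observed. */
static void trace(char *log, const char *tag)
{
    strcat(log, tag);
}

/* The caller must pass a zero-terminated buffer with room for the tags. */
void dispatch(int x, char *log)
{
    switch (x) {
    case 1:
        trace(log, "foo;");
        goto case_2;          /* stands in for Ć's "goto case 2" */
    case 3:
        trace(log, "baz;");
        break;
    case 2:
    case_2:
        trace(log, "bar;");
        break;
    default:
        break;
    }
}
```

Calling dispatch(1, log) on an empty log leaves "foo;bar;" in it, even though case 3 sits between the two clauses.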

The default clause, if present, must be specified last.

Exceptions

Ć can throw exceptions, but cannot handle them at the moment. The idea is that the exceptions will be handled by the code using the library written in Ć.

An exception can be thrown with the throw statement with a string argument. You cannot specify the class of the exception; it’s hardcoded in cito (for example java.lang.Exception).

Translation of exceptions to C needs an explanation. The string argument is lost in the translation and the throw statement is replaced with return with a magic value representing an error:

  • -1 in a method returning int.

  • NULL in a method returning a pointer.

  • false in a void method. The method will be translated to bool and true will be returned if the method succeeds.
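
The three rules above can be pictured with hand-written C functions like these. The functions and the messages in the comments are invented for the illustration. Only the returned error values (-1, NULL, false) come from the description above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Ć: an int method containing "throw" returns -1 on error in C. */
int hex_digit_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;   /* was: throw "Not a hex digit"; the message is lost */
}

/* Ć: a pointer-returning method containing "throw" returns NULL on error. */
const char *day_name(int day)
{
    static const char *const names[7] = {
        "Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"
    };
    if (day < 0 || day >= 7)
        return NULL;   /* was: throw "Invalid day"; */
    return names[day];
}

/* Ć: a void method containing "throw" returns bool, true meaning success. */
bool check_volume(int volume)
{
    if (volume < 0 || volume > 15)
        return false;  /* was: throw "Volume out of range"; */
    return true;
}
```

A practical consequence, as far as I can tell: an int method that may throw shouldn’t use -1 as a legitimate result, because the C caller couldn’t tell it from an error.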

Native blocks

Code which cannot be expressed in Ć can be written in the target language using the following syntax:

native {
    printf("Hello, world!\n");
}

Generally, native blocks should be used inside #if (see below).

Macros

Macros in Ć are similar to the ones known from C, yet the syntax is different. There are two kinds of macros:

Expression macro

Defined as:

macro MY_EXPR_MACRO(arg1, arg2) (expr) // all parentheses are required

An expression macro is always expanded with its outer parentheses.

Statement macro

Defined as:

macro MY_STATEMENT_MACRO(arg1, arg2) { statement; }

A statement macro is expanded without the outer braces or the trailing semicolon. This way the expansion of MY_STATEMENT_MACRO(1, 2); doesn’t end with two semicolons.

Unlike in C and C++, multi-line macro definitions don’t include trailing backslashes. The end of the macro is determined by matching the parentheses or braces.

Macros can have any (but constant) number of arguments. The parentheses surrounding the parameters in the definition and the arguments in the reference are required. You should not use macros for constants — use const instead.

Macros can be defined as class members and at the statement level. This limits their scope to classes and blocks respectively.

The token pasting operator known from C is available: ##.
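
For comparison, here are the corresponding C preprocessor idioms. This is C, not Ć syntax, and the macro names are invented. It shows why keeping the outer parentheses of an expression macro matters, and what ## does.

```c
/* Expression macro: the outer parentheses keep the expansion intact
   when it is used inside a larger expression. */
#define ADD(a, b) ((a) + (b))

/* The same macro without outer parentheses, shown only for contrast. */
#define ADD_UNSAFE(a, b) (a) + (b)

/* Token pasting: builds the identifier get_<name> from its argument. */
#define DEFINE_GETTER(name, value) int get_##name(void) { return (value); }

DEFINE_GETTER(width, 320)
DEFINE_GETTER(height, 200)

int doubled_sum(void)
{
    return 2 * ADD(1, 2);          /* expands to 2 * ((1) + (2)) */
}

int doubled_sum_unsafe(void)
{
    return 2 * ADD_UNSAFE(1, 2);   /* expands to 2 * (1) + (2) */
}
```

doubled_sum returns 6 while doubled_sum_unsafe returns 4. Because Ć always expands an expression macro together with its outer parentheses, the second kind of surprise can’t happen there.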

Conditional compilation

Conditional compilation in Ć is modeled after C#. Conditional compilation symbols can only be given on the cito command line and are unrelated to the macros described above. Conditional compilation symbols have no assigned value; they are only present or not.

Example:

#if MY_SYMBOL
    MyOptionalFunction();
#endif

A more complicated one:

#if WINDOWS
    DeleteFile(filename);
#elif LINUX || UNIX
    unlink(filename);
#else
    UNKNOWN OPERATING SYSTEM!
#endif

The operators allowed in #if and #elif are !, && and ||. You may reference true, which is a symbol that is always defined. false should never be defined.

Documentation comments

Documentation comments can describe classes, enumerated types, constants, methods and their parameters. They start with three slashes followed by a space and always immediately precede the documented thing, including the method parameters:

/// Returns the extension of the original module format.
/// For native modules it simply returns their extension.
/// For the SAP format it attempts to detect the original module format.
public string GetOriginalModuleExt(
    /// Contents of the file.
    byte[] module,
    /// Length of the file.
    int moduleLen)
{
    ...
}

Documentation comments should be full sentences. The first sentence, terminated with a period at the end of a line, becomes the summary. Subsequent sentences (if any) add more details.

There are limited formatting options: fixed-width font, paragraphs and bullets.

Fixed-width font text (typically code) is delimited with backquotes:

/// Returns `true` for NTSC song and `false` for PAL song.

In long comments, paragraphs are introduced by blank documentation comment lines:

/// First paragraph.
///
/// Second paragraph.

Bullets are introduced with an asterisk followed by space:

/// Sets music creation date.
/// Some of the possible formats are:
/// * YYYY
/// * MM/YYYY
/// * DD/MM/YYYY
/// * YYYY-YYYY
///
/// An empty string means the date is unknown.
public void SetDate(string value)
{
    CheckValidText(value);
    Date = value;
}

The syntax and implementation of documentation comments are still a work in progress.

Naming conventions

It is advised to use the following naming conventions in Ć code:

  • Local variables and parameters start with a lowercase letter and capitalize the first letter of each following word (that is, camelCase).

  • All other identifiers should start with an uppercase letter — that is, PascalCase.

Generators will translate the above convention to the one native to the output language, for example constants written as UPPERCASE_WITH_UNDERSCORES.
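
To give a feeling for what such a conversion involves, here is a naive C sketch that turns a PascalCase name into UPPERCASE_WITH_UNDERSCORES or camelCase. It is not cito’s algorithm, and the function names are invented. It also assumes plain ASCII identifiers and treats every uppercase letter as the start of a new word, so an acronym such as URL would be split letter by letter.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: "FooBar" -> "FOO_BAR".
   Writes at most capacity - 1 characters plus the terminating NUL. */
void pascal_to_upper_snake(const char *name, char *out, size_t capacity)
{
    size_t n = 0;
    size_t i;
    if (capacity == 0)
        return;
    for (i = 0; name[i] != '\0'; i++) {
        unsigned char c = (unsigned char) name[i];
        /* insert an underscore before every uppercase letter except the first */
        if (i > 0 && isupper(c) && n + 1 < capacity)
            out[n++] = '_';
        if (n + 1 < capacity)
            out[n++] = (char) toupper(c);
    }
    out[n] = '\0';
}

/* Hypothetical helper: "DoFoo" -> "doFoo". */
void pascal_to_camel(const char *name, char *out, size_t capacity)
{
    size_t n = 0;
    size_t i;
    if (capacity == 0)
        return;
    for (i = 0; name[i] != '\0' && n + 1 < capacity; i++) {
        unsigned char c = (unsigned char) name[i];
        out[n++] = (char) (i == 0 ? tolower(c) : c);
    }
    out[n] = '\0';
}
```

So a Ć constant named FooBar would come out as FOO_BAR, and a method named DoFoo as doFoo.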