CYAML Internals
utf8.c File Reference

CYAML functions for handling UTF-8 text.

#include <assert.h>
#include <stdint.h>
#include <stdbool.h>
#include "utf8.h"

Functions

static unsigned cyaml_utf8_char_len (uint8_t b)
 
unsigned cyaml_utf8_get_codepoint (const uint8_t *s, unsigned *len)
 
static unsigned cyaml_utf8_to_lower (unsigned c)
 
static int cyaml_utf8_difference (unsigned a, unsigned b)
 
int cyaml_utf8_casecmp (const void *const str1, const void *const str2)
 

Detailed Description

CYAML functions for handling UTF-8 text.

Function Documentation

◆ cyaml_utf8_casecmp()

int cyaml_utf8_casecmp ( const void *const  str1,
const void *const  str2 
)

Case-insensitive comparison.

Note
This has some limitations: case-insensitive comparison is only performed over some sections of Unicode.
Parameters
[in] str1  First string to be compared.
[in] str2  Second string to be compared.
Returns
0 if and only if the strings are equal.

◆ cyaml_utf8_char_len()

static unsigned cyaml_utf8_char_len ( uint8_t  b)
inline static

Get expected byte-length of UTF-8 character.

Finds the number of bytes expected for the UTF-8 sequence starting with the given byte.

Parameters
[in] b  First byte of UTF-8 sequence.
Returns
the byte width of the character, or 0 if the byte is not a valid first byte.
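The lead-byte classification described above can be sketched as follows. This is a minimal illustration based on the standard UTF-8 bit patterns, not the CYAML source; the name `sketch_utf8_char_len` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-in for the documented helper; not the CYAML source. */
static unsigned sketch_utf8_char_len(uint8_t b)
{
	if ((b & 0x80) == 0x00) return 1; /* 0xxxxxxx: single-byte (ASCII) */
	if ((b & 0xE0) == 0xC0) return 2; /* 110xxxxx: two-byte sequence */
	if ((b & 0xF0) == 0xE0) return 3; /* 1110xxxx: three-byte sequence */
	if ((b & 0xF8) == 0xF0) return 4; /* 11110xxx: four-byte sequence */
	return 0; /* Continuation byte (10xxxxxx) or 0xF8..0xFF: invalid */
}
```

Only the first byte is examined here, so a non-zero result says how many bytes to expect, not that the following bytes are well formed.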

◆ cyaml_utf8_difference()

static int cyaml_utf8_difference ( unsigned  a,
unsigned  b 
)
inline static

Find the difference between two codepoints.

Parameters
a  First codepoint.
b  Second codepoint.
Returns
the difference.

◆ cyaml_utf8_get_codepoint()

unsigned cyaml_utf8_get_codepoint ( const uint8_t *  s,
unsigned *  len 
)

Get a codepoint from the input string.

Caller must provide the expected length given the first input byte.

If a multi-byte character contains an invalid continuation byte, the character length will be updated on exit to the number of bytes consumed, and the replacement character, U+FFFD, will be returned.

Parameters
[in] s  String to read first codepoint from.
[in,out] len  Expected length of first character, updated on exit.
Returns
The codepoint or 0xfffd if character is invalid.
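The behaviour described above can be sketched as follows. This is a hedged illustration of standard UTF-8 decoding, not the CYAML source: the name `sketch_utf8_get_codepoint` is hypothetical, and returning U+FFFD for an expected length outside 1..4 is an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical sketch of the documented contract; not the CYAML source. */
static unsigned sketch_utf8_get_codepoint(const uint8_t *s, unsigned *len)
{
	/* Masks selecting the payload bits of the lead byte, indexed by length. */
	static const uint8_t lead_mask[5] = { 0x00, 0x7F, 0x1F, 0x0F, 0x07 };
	unsigned c;

	if (*len < 1 || *len > 4) {
		return 0xFFFD; /* Assumption: treat a bad expected length as invalid. */
	}

	c = s[0] & lead_mask[*len];
	for (unsigned i = 1; i < *len; i++) {
		if ((s[i] & 0xC0) != 0x80) {
			*len = i; /* Bytes consumed before the bad continuation byte. */
			return 0xFFFD;
		}
		c = (c << 6) | (s[i] & 0x3F); /* Append six payload bits. */
	}
	return c;
}
```

For the three-byte sequence E2 82 AC with `*len` set to 3, this sketch yields 0x20AC and leaves `*len` unchanged. If the second byte is replaced with 0x41, it yields 0xFFFD and sets `*len` to 1.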

◆ cyaml_utf8_to_lower()

static unsigned cyaml_utf8_to_lower ( unsigned  c)
static

Convert a Unicode codepoint to lower case.

Note
This only handles some of the Unicode blocks. (Currently the Latin ones.)
Parameters
[in] c  Codepoint to convert to lower-case, if applicable.
Returns
the lower-cased codepoint.
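A partial lower-casing routine of this kind might look like the sketch below. It covers only Basic Latin and the Latin-1 Supplement, which is an assumption made for brevity; the documented function may handle further Latin blocks. The name `sketch_utf8_to_lower` is hypothetical.

```c
/* Hypothetical sketch covering two Latin blocks only; not the CYAML source. */
static unsigned sketch_utf8_to_lower(unsigned c)
{
	/* Basic Latin: 'A'..'Z' map to 'a'..'z'. */
	if (c >= 0x41 && c <= 0x5A) {
		return c + 0x20;
	}
	/* Latin-1 Supplement: U+00C0..U+00DE map to U+00E0..U+00FE,
	 * except U+00D7 (multiplication sign), which has no case. */
	if (c >= 0xC0 && c <= 0xDE && c != 0xD7) {
		return c + 0x20;
	}
	return c; /* Outside the handled ranges: returned unchanged. */
}
```

Codepoints outside the handled ranges pass through untouched, so strings containing them compare case-sensitively under a routine like this.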