Re: jsonb and nested hstore - Архив списков рассылки pgsql-hackers (2024)

От	Andres Freund
Тема	Re: jsonb and nested hstore
Дата	11 февраля2014г. 05:11:25
Msg-id	20140211021113.GF15246@awork2.anarazel.de обсуждение исходный текст
Ответна	Re: jsonb and nested hstore (Andrew Dunstan <andrew@dunslane.net>)
Ответы	Re: jsonb and nested hstore Re: jsonb and nested hstore
Список	pgsql-hackers
Дерево обсуждения
Hi,Is it just me or is jsonapi.h not very well documented?On 2014-02-06 18:47:31 -0500, Andrew Dunstan wrote:> +/*> + * for jsonb we always want the de-escaped value - that's what's in token> + */> +static void> +jsonb_in_scalar(void *state, char *token, JsonTokenType tokentype)> +{> + JsonbInState *_state = (JsonbInState *) state;> + JsonbValue v;> +> + v.size = sizeof(JEntry);> +> + switch (tokentype)> + {> +> + case JSON_TOKEN_STRING:> + v.type = jbvString;> + v.string.len = token ? checkStringLen(strlen(token)) : 0;> + v.string.val = token ? pnstrdup(token, v.string.len) : NULL;> + v.size += v.string.len;> + break;> + case JSON_TOKEN_NUMBER:> + v.type = jbvNumeric;> + v.numeric = DatumGetNumeric(DirectFunctionCall3(numeric_in, CStringGetDatum(token), 0, -1));> +> + v.size += VARSIZE_ANY(v.numeric) +sizeof(JEntry) /* alignment */ ;missing space.Why does + sizeof(JEntry) change anything about alignment? If it wasaligned before, adding a statically sized value doesn't give any newguarantees about alignment?> +/*> + * jsonb type recv function> + *> + * the type is sent as text in binary mode, so this is almost the same> + * as the input function.> + */> +Datum> +jsonb_recv(PG_FUNCTION_ARGS)> +{> + StringInfo buf = (StringInfo) PG_GETARG_POINTER(0);> + text *result = cstring_to_text_with_len(buf->data, buf->len);> +> + return deserialize_json_text(result);> +}This is a bit absurd, we're receiving a string in a StringInfo buffer,just to copy it into text, and then in makeJsonLexContext() access theraw chars again.> +static void> +putEscapedValue(StringInfo out, JsonbValue *v)> +{> + switch (v->type)> + {> + case jbvNull:> + appendBinaryStringInfo(out, "null", 4);> + break;> + case jbvString:> + escape_json(out, pnstrdup(v->string.val, v->string.len));> + break;> + case jbvBool:> + if (v->boolean)> + appendBinaryStringInfo(out, "true", 4);> + else> + appendBinaryStringInfo(out, "false", 5);> + break;> + case jbvNumeric:> + appendStringInfoString(out, DatumGetCString(DirectFunctionCall1(numeric_out,PointerGetDatum(v->numeric))));> + break;> + default:> + elog(ERROR, "unknown jsonb scalar type");> + }> +}Hm, will the jbvNumeric always result in correct correct quoting?datum_to_json() does extra hangups for that case, any reason we don'tneed that here?> +char *> +JsonbToCString(StringInfo out, char *in, int estimated_len)> +{...> + while (redo_switch || ((type = JsonbIteratorGet(&it, &v, false)) != 0))> + {> + redo_switch = false;Not sure if I see the advantage over the goto here. A comment explainingwhat the reason for the goto is wouldhave sufficed.> + case WJB_KEY:> + if (first == false)> + appendBinaryStringInfo(out, ", ", 2);> + first = true;> +> + putEscapedValue(out, &v);> + appendBinaryStringInfo(out, ": ", 2);putEscapedValue doesn't gurantee only strings are output, butdatum_to_json does extra hangups for that case.> + type = JsonbIteratorGet(&it, &v, false);> + if (type == WJB_VALUE)> + {> + first = false;> + putEscapedValue(out, &v);> + }> + else> + {> + Assert(type == WJB_BEGIN_OBJECT || type == WJB_BEGIN_ARRAY);> + /*> + * We need to rerun current switch() due to put> + * in current place object which we just got> + * from iterator.> + */"due to put"?> +/*> + * jsonb type send function> + *> + * Just send jsonb as a string of text> + */> +Datum> +jsonb_send(PG_FUNCTION_ARGS)> +{> + Jsonb *jb = PG_GETARG_JSONB(0);> + StringInfoData buf;> + char *out;> +> + out = JsonbToCString(NULL, (JB_ISEMPTY(jb)) ? NULL : VARDATA(jb), VARSIZE(jb));> +> + pq_begintypsend(&buf);> + pq_sendtext(&buf, out, strlen(out));> + PG_RETURN_BYTEA_P(pq_endtypsend(&buf));> +}Why aren't you using using the stringbuf passing JsonbToCStringconvention here to avoid the strlen()?> +/*> + * Compare two jbvString JsonbValue values, third argument> + * 'arg', if it's not null, should be a pointer to bool> + * value which will be set to true if strings are equal and> + * untouched otherwise.> + */> +int> +compareJsonbStringValue(const void *a, const void *b, void *arg)> +{> + const JsonbValue *va = a;> + const JsonbValue *vb = b;> + int res;> +> + Assert(va->type == jbvString);> + Assert(vb->type == jbvString);> +> + if (va->string.len == vb->string.len)> + {> + res = memcmp(va->string.val, vb->string.val, va->string.len);> + if (res == 0 && arg)> + *(bool *) arg = true;Should be NULL, not 0.> +/*> + * qsort helper to compare JsonbPair values, third argument> + * arg will be trasferred as is to subsequent*transferred.> +/*> + * some constant order of JsonbValue> + */> +int> +compareJsonbValue(JsonbValue *a, JsonbValue *b)> +{Called recursively, needs to check for stack depth.> +JsonbValue *> +findUncompressedJsonbValueByValue(char *buffer, uint32 flags,> + uint32 *lowbound, JsonbValue *key)> +{Functions like this *REALLY* need documentation for theirparameters. And of their actual purpose.What's actually the uncompressed bit here? Isn't it actually thecontrary? This is navigating the compressed, non-tree form, no?> + if (flags & JB_FLAG_ARRAY & header)> + {> + JEntry *array = (JEntry *) (buffer + sizeof(header));> + char *data = (char *) (array + (header & JB_COUNT_MASK));> + int i;> + for (i = (lowbound) ? *lowbound : 0; i < (header & JB_COUNT_MASK); i++)> + {> + JEntry *e = array + i;> + else if (JBE_ISSTRING(*e) && key->type == jbvString)> + {> + if (key->string.len == JBE_LEN(*e) &&> + memcmp(key->string.val, data + JBE_OFF(*e),> + key->string.len) == 0)> + {So, here we have our own undocumented! indexing system. Grand.> + else if (flags & JB_FLAG_OBJECT & header)> + {> + JEntry *array = (JEntry *) (buffer + sizeof(header));> + char *data = (char *) (array + (header & JB_COUNT_MASK) * 2);> + uint32 stopLow = lowbound ? *lowbound : 0,> + stopHigh = (header & JB_COUNT_MASK),> + stopMiddle;I don't understand what the point of the lowbound logic could be here?If a key hasn't been found, it hasn't been found? Maybe the idea is touse it when testing containedness or somesuch? Wouldn't iterating overthe keyspace be a better idea for that case?> + if (key->type != jbvString)> + return NULL;That's not allowed, right?> +/*> + * Get i-th value of array or hash. if i < 0 then it counts from> + * the end of array/hash. Note: returns pointer to statically> + * allocated JsonbValue.> + */> +JsonbValue *> +getJsonbValue(char *buffer, uint32 flags, int32 i)> +{> + uint32 header = *(uint32 *) buffer;> + static JsonbValue r;Really? And why on earth would static allocation be a good idea? Specifyit on the caller's stack if need be. Or even return by value, today'scalling convention will just allocate that on the caller's stack withoutcopying.Accessing static data isn't even faster.> + if (JBE_ISSTRING(*e))> + {> + r.type = jbvString;> + r.string.val = data + JBE_OFF(*e);> + r.string.len = JBE_LEN(*e);> + r.size = sizeof(JEntry) + r.string.len;> + }> + else if (JBE_ISBOOL(*e))> + {> + r.type = jbvBool;> + r.boolean = (JBE_ISBOOL_TRUE(*e)) ? true : false;> + r.size = sizeof(JEntry);> + }> + else if (JBE_ISNUMERIC(*e))> + {> + r.type = jbvNumeric;> + r.numeric = (Numeric) (data + INTALIGN(JBE_OFF(*e)));> +> + r.size = 2 * sizeof(JEntry) + VARSIZE_ANY(r.numeric);> + }> + else if (JBE_ISNULL(*e))> + {> + r.type = jbvNull;> + r.size = sizeof(JEntry);> + }> + else> + {> + r.type = jbvBinary;> + r.binary.data = data + INTALIGN(JBE_OFF(*e));> + r.binary.len = JBE_LEN(*e) - (INTALIGN(JBE_OFF(*e)) - JBE_OFF(*e));> + r.size = r.binary.len + 2 * sizeof(JEntry);> + }This bit of code exists pretty similarly in several places, maybe consolitate?> +/****************************************************************************> + * Walk on tree representation of jsonb *> + ****************************************************************************/> +static void> +walkUncompressedJsonbDo(JsonbValue *v, walk_jsonb_cb cb, void *cb_arg, uint32 level)> +{> + int i;check stack limit.> +void> +walkUncompressedJsonb(JsonbValue *v, walk_jsonb_cb cb, void *cb_arg)> +{> + if (v)> + walkUncompressedJsonbDo(v, cb, cb_arg, 0);> +}> +> +/****************************************************************************> + * Iteration over binary jsonb *> + ****************************************************************************/This needs docs.> +static void> +parseBuffer(JsonbIterator *it, char *buffer)> +{Why invent completely independent naming conventions to the previousfunctions here?> +static bool> +formAnswer(JsonbIterator **it, JsonbValue *v, JEntry * e, bool skipNested)> +{Imaginatively undescriptive name. But if it were slightly more moreabstracted away from JsonbIterator it could be the answer to my prayersabove about removing redundant code.> +static JsonbIterator *> +up(JsonbIterator *it)> +{Not a good name.> +int> +JsonbIteratorGet(JsonbIterator **it, JsonbValue *v, bool skipNested)> +{> + int res;recursive, stack depth check.> + switch ((*it)->type | (*it)->state)> + {> + case JB_FLAG_ARRAY | jbi_start:I don't know, but I don't see the point in avoid if (), else if()... constructs if it requires such dirty tricks.> +/****************************************************************************> + * Transformation from tree to binary representation of jsonb *> + ****************************************************************************/> +typedef struct CompressState> +{> + char *begin;> + char *ptr;> +> + struct> + {> + uint32 i;> + uint32 *header;> + JEntry *array;> + char *begin;> + } *levelstate, *lptr, *pptr;> +> + uint32 maxlevel;> +> +} CompressState;> +> +#define curLevelState state->lptr> +#define prevLevelState state->pptrbrrr.I stopped looking at code at this point.> diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c> index e1d8aae..50ddf50 100644> --- a/src/backend/utils/adt/jsonfuncs.c> +++ b/src/backend/utils/adt/jsonfuncs.cthere's lots of whitespace/tab damage in this file. Check git log/diff--check or such.This is still a mess, sorry:* Large and important part continue to be undocumented. Especially in jsonb_support.c* Lots of naming inconsistencies.* There's no documentation about what compressed/uncompressed jsonbs are. The former is the ondisk representation, thelatterthe in-memory tree representation. * There's no non-code documentation about the on-disk format.Unfortunately I can't see how this patch could get ready in time forthis CF. There's *lots* of work to be done. The code as is isn't goingto be maintainable. Much of it obvious by simply scanning through thecode, without even looking for higher level issues. And much of it haspreviously been pointed out, without getting real attention.That's not to speak of the nested hstore patch, which I didn't evenstart to look at. That's twice this patches size.Greetings,Andres Freund-- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &Services
Предыдущее

От:Merlin Moncure
Дата:11 февраля2014г., 05:03:23
Сообщение:Re: jsonb and nested hstore
Следующее

От:Peter Eisentraut
Дата:11 февраля2014г., 05:51:34
Сообщение:Re: newlines at end of generated SQL
Re: jsonb and nested hstore - Архив списков рассылки pgsql-hackers (2024)

Предыдущее

Следующее

References