Your Location is: Home > Mysql

Most efficient way to remove Duplicate JSON entries from Array?

From: Amsterdam View: 3275 Floobinator 

Question

I am curious what technique developers use to remove duplicate JSON entries in a JSON Array. For example:

[{"name":"philip"},{"name":"steve"},{"name":"philip"},{"name":"seth"}]

The goal is to remove the extra "philip".

Now I'm doing this on an array of objects that can be quite large (hundreds). Granted I can write code to iterate them manually, but speed is a factor.

I did try the technique of converting the JSON to a table and then using DISTINCT to scrub the data:

-- Clean the data by removing duplicate entries. We do this by using the new JSON_TABLE function.
-- We convert the JSON array to a table, and then re-select it as DISTINCT on the id value.
SELECT JSON_ARRAYAGG(JSON_OBJECT('id',sit_distinct.id,'name',sit_distinct.sit_name,'type',sit_distinct.sit_type_c)), 
    COUNT(*) INTO json_data_total, var_count_total 
    FROM
    (
    SELECT DISTINCT(id), sit_name, sit_type_c 
        FROM JSON_TABLE(json_data_total, '$[*]' COLUMNS(rowid FOR ORDINALITY, id int PATH '$.id', sit_name varchar(128) PATH '$.name', sit_type_c tinyint PATH '$.type')) AS JT ORDER BY id
    ) AS sit_distinct;

While it works perfectly on my local machine, MySQL on RDS crashes with the above. Definitely some sort of internal error I may not be able to work around.

For reference, here's the internal error the above code causes:

18:53:39 UTC - mysqld got signal 11 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x2b9d3eda6000
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 2b9d4627fcc0 thread_stack 0x80000
/rdsdbbin/mysql/bin/mysqld(my_print_stacktrace(unsigned char*, unsigned long)+0x2e) [0x1db533e]
/rdsdbbin/mysql/bin/mysqld(handle_fatal_signal+0x2f3) [0xf79383]
/lib64/libpthread.so.0(+0xf100) [0x2b9c90cc1100]
/rdsdbbin/mysql/bin/mysqld(temptable::Table::insert(unsigned char const*)+0x17) [0x2336ca7]
/rdsdbbin/mysql/bin/mysqld(handler::ha_write_row(unsigned char*)+0x184) [0x10848e4]
/rdsdbbin/mysql/bin/mysqld(Table_function::write_row()+0x1b) [0x12534db]
/rdsdbbin/mysql/bin/mysqld(Table_function_json::fill_json_table()+0x2e8) [0x1255af8]
/rdsdbbin/mysql/bin/mysqld(Table_function_json::fill_result_table()+0xc4) [0x1255bf4]
/rdsdbbin/mysql/bin/mysqld(MaterializedTableFunctionIterator::Init()+0x16) [0x10304a6]
/rdsdbbin/mysql/bin/mysqld(MaterializeIterator::Init()+0xf2) [0x1032752]
/rdsdbbin/mysql/bin/mysqld(filesort(THD*, Filesort*, bool, RowIterator*, Sort_result*, unsigned long long*, unsigned long long*)+0x409) [0x1069bb9]
/rdsdbbin/mysql/bin/mysqld(SortingIterator::DoSort(QEP_TAB*)+0xa9) [0xdd8529]
/rdsdbbin/mysql/bin/mysqld(SortingIterator::Init()+0x47) [0xdd8657]
/rdsdbbin/mysql/bin/mysqld(MaterializeIterator::Init()+0xf2) [0x1032752]
/rdsdbbin/mysql/bin/mysqld(AggregateIterator::Init()+0x16) [0x1030426]
/rdsdbbin/mysql/bin/mysqld(JOIN::exec()+0x425) [0xe25155]
/rdsdbbin/mysql/bin/mysqld(Sql_cmd_dml::execute_inner(THD*)+0x2ee) [0xea54ce]
/rdsdbbin/mysql/bin/mysqld(Sql_cmd_dml::execute(THD*)+0x438) [0xead1e8]
/rdsdbbin/mysql/bin/mysqld(mysql_execute_command(THD*, bool)+0x1c64) [0xe5c0b4]
/rdsdbbin/mysql/bin/mysqld(sp_instr_stmt::exec_core(THD*, unsigned int*)+0x50) [0xde0cf0]
/rdsdbbin/mysql/bin/mysqld(sp_lex_instr::reset_lex_and_exec_core(THD*, unsigned int*, bool)+0x19b) [0xde2b0b]
/rdsdbbin/mysql/bin/mysqld(sp_lex_instr::validate_lex_and_execute_core(THD*, unsigned int*, bool)+0xab) [0xde373b]
/rdsdbbin/mysql/bin/mysqld(sp_instr_stmt::execute(THD*, unsigned int*)+0xb1) [0xde50e1]
/rdsdbbin/mysql/bin/mysqld(sp_head::execute(THD*, bool)+0x5a8) [0xdda728]
/rdsdbbin/mysql/bin/mysqld(sp_head::execute_procedure(THD*, List<Item>*)+0x714) [0xddd674]
/rdsdbbin/mysql/bin/mysqld(Sql_cmd_call::execute_inner(THD*)+0x14a) [0x122704a]
/rdsdbbin/mysql/bin/mysqld(Sql_cmd_dml::execute(THD*)+0x438) [0xead1e8]
/rdsdbbin/mysql/bin/mysqld(mysql_execute_command(THD*, bool)+0x1c64) [0xe5c0b4]
/rdsdbbin/mysql/bin/mysqld(mysql_parse(THD*, Parser_state*)+0x30f) [0xe5f00f]
/rdsdbbin/mysql/bin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x28a9) [0xe61da9]
/rdsdbbin/mysql/bin/mysqld(do_command(THD*)+0x179) [0xe62d39]
/rdsdbbin/mysql/bin/mysqld() [0xf6b370]
/rdsdbbin/mysql/bin/mysqld() [0x22b01e7]
/lib64/libpthread.so.0(+0x7dc5) [0x2b9c90cb9dc5]
/lib64/libc.so.6(clone+0x6d) [0x2b9c914dcc9d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (2b9d481f5200): SELECT JSON_ARRAYAGG(JSON_OBJECT('id',sit_distinct.id,'name',sit_distinct.sit_name,'type',sit_distinct.sit_type_c)), COUNT(*) INTO json_data_total, var_count_total FROM ( SELECT DISTINCT(id), sit_name, sit_type_c FROM JSON_TABLE( NAME_CONST('json_data_total',_utf8mb4'[{\"id\": 1023, \"name\": \"Automotive\", \"type\": 1}, {\"id\": 1072, \"name\": \"Hospitality\", \"type\": 1}, {\"id\": 1076, \"name\": \"Industrial Automation\", \"type\": 1}, {\"id\": 1082, \"name\": \"Internet\", \"type\": 1}, {\"id\": 1101, \"name\": \"Medical Devices\", \"type\": 1}, {\"id\": 1138, \"name\": \"Research\", \"type\": 1}, {\"id\": 1160, \"name\": \"Wireless\", \"type\": 1}]' COLLATE 'utf8mb4_bin'), '$[*]' COLUMNS(rowid FOR ORDINALITY, id int PATH '$.id', sit_name varchar(128) PATH '$.name', sit_type_c tinyint PATH '$.type')) AS JT ORDER BY id ) AS sit_distinct
Connection ID (thread ID): 21
Status: NOT_KILLED

So, I need to either figure out how to prevent this error (which ONLY occurs on RDS and not on my local machine) or come up with a new technique that is quick and efficient - as mentioned, I know I can write my own code to do it, but I was hoping to use a quicker more efficient internal approach.

Any thoughts are appreciated!

Best answer