[Java][C] sliced RecordBatch offset info is lost when imported from c-data #88

@hellishfire

Describe the bug, including details regarding any error messages, version, and platform.

Reproduced on the latest Arrow release (16.0).

The problem occurs when a sliced RecordBatch is exported from C++ through the C Data Interface and imported into Java.

On the C++ side:

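auto sliced_record_batch = original_record_batch->Slice(/*offset=*/8, /*length=*/2);
// ExportRecordBatch takes a const RecordBatch&, so the shared_ptr is dereferenced.
arrow::Status status = arrow::ExportRecordBatch(*sliced_record_batch, arrow_array_ptr);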

On the Java side:

ArrowArray arrowArray = ArrowArray.allocateNew(allocator);
// (omitted here: the native call that exports the sliced batch into arrowArray.memoryAddress())
Data.importIntoVectorSchemaRoot(allocator, arrowArray, vectorSchemaRoot, null);

The imported vectorSchemaRoot has the correct length (2), but the offset (8) is not respected. As a result, the imported vectorSchemaRoot contains the first 2 rows of original_record_batch, whereas the expected content is the 2 rows of sliced_record_batch.
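To make the symptom concrete, here is a minimal toy model in plain Java. It does not use Arrow at all: `ExportedArray` and both import methods are hypothetical stand-ins, and the 10-row batch with values 0..9 is an assumption made only for illustration. It shows what an importer returns when it honors the length but reads from the start of the shared buffer, compared with one that also applies the offset.

```java
import java.util.Arrays;

/** Toy model of a sliced array: a shared buffer plus a logical offset and length. */
public class SliceOffsetDemo {
    /** Hypothetical stand-in for the exported struct; only the fields relevant here. */
    static final class ExportedArray {
        final long[] values; // the full, unsliced buffer shared with the producer
        final int offset;    // logical index of the first row of the slice
        final int length;    // number of rows in the slice

        ExportedArray(long[] values, int offset, int length) {
            this.values = values;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Models the reported behavior: length is honored, offset is ignored. */
    static long[] importIgnoringOffset(ExportedArray a) {
        return Arrays.copyOfRange(a.values, 0, a.length);
    }

    /** Models the expected behavior: rows start at the slice offset. */
    static long[] importRespectingOffset(ExportedArray a) {
        return Arrays.copyOfRange(a.values, a.offset, a.offset + a.length);
    }

    public static void main(String[] args) {
        long[] original = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        // Equivalent of Slice(/*offset=*/8, /*length=*/2) on a 10-row batch.
        ExportedArray sliced = new ExportedArray(original, 8, 2);

        System.out.println(Arrays.toString(importIgnoringOffset(sliced)));   // [0, 1]
        System.out.println(Arrays.toString(importRespectingOffset(sliced))); // [8, 9]
    }
}
```

The first output corresponds to what is observed after the import, the second to what sliced_record_batch actually contains.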

I'm not familiar with the Arrow code, but the offset appears to be present in org.apache.arrow.c.ArrowArray.Snapshot; org.apache.arrow.c.ArrayImporter.doImport(ArrowArray.Snapshot) seems to ignore it.
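For reference, the C Data Interface defines the offset as a logical one: the number of items from the physical start of the buffers. A consumer therefore has to apply it to every buffer it reads. The sketch below is not the ArrayImporter code and uses no Arrow classes; `OffsetAwareReader` and its methods are hypothetical names, and it only covers a 32-bit integer column with a validity bitmap. It shows the index arithmetic under those assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Sketch of offset-aware reads from the unsliced buffers of an int32 column. */
public class OffsetAwareReader {
    /** True if logical row `index` of a slice starting at `offset` is non-null. */
    static boolean isValid(byte[] validity, int offset, int index) {
        int bit = offset + index; // position in the unsliced bitmap
        // Arrow validity bitmaps use least-significant-bit numbering.
        return (validity[bit >> 3] & (1 << (bit & 7))) != 0;
    }

    /** Reads logical row `index` of the slice, or returns null for a null row. */
    static Integer readInt32(ByteBuffer data, byte[] validity, int offset, int index) {
        if (!isValid(validity, offset, index)) {
            return null;
        }
        // Fixed-width values: byte position = (offset + index) * element width.
        return data.getInt((offset + index) * Integer.BYTES);
    }

    public static void main(String[] args) {
        // Ten rows holding 0, 10, 20, ..., 90.
        ByteBuffer data = ByteBuffer.allocate(10 * Integer.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < 10; i++) {
            data.putInt(i * Integer.BYTES, i * 10);
        }
        byte[] validity = {(byte) 0xFF, (byte) 0x01}; // rows 0..8 valid, row 9 null

        // Slice with offset 8 and length 2.
        System.out.println(readInt32(data, validity, 8, 0)); // 80
        System.out.println(readInt32(data, validity, 8, 1)); // null
    }
}
```

Variable-width and nested types would need the same adjustment on their offsets buffers and children, which this sketch leaves out.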

Component(s)

Java
