AZURE DATA FACTORY INTERVIEW QUESTIONS: FILE FORMATS

21. What are the different types of file formats?

1. Delimited text files

CSV files and TSV files

CSV files:-

* Comma-separated values.

* A comma separates the fields in each row. Variants of the format use another delimiter, such as a pipe (|) or a semicolon.

TSV files:-

* Tab-separated values.

* A tab character separates the fields in each row.

* Fixed-width data is a related plain-text layout in which each field is allocated a fixed number of characters instead of being separated by a delimiter.
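
The differences between these text layouts can be sketched with Python's standard csv module. This is only an illustration: the sample rows, the column names, and the fixed-width field sizes (3, 6, 6) are assumptions made up for the example.

```python
import csv
import io

# Hypothetical sample data: the same two rows in three delimited layouts.
csv_text = "id,name,city\n1,Asha,Pune\n2,Ravi,Delhi\n"
tsv_text = "id\tname\tcity\n1\tAsha\tPune\n2\tRavi\tDelhi\n"
pipe_text = "id|name|city\n1|Asha|Pune\n2|Ravi|Delhi\n"


def read_delimited(text, delimiter):
    """Parse delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


csv_rows = read_delimited(csv_text, ",")
tsv_rows = read_delimited(tsv_text, "\t")
pipe_rows = read_delimited(pipe_text, "|")

# Fixed-width data: every field occupies a fixed number of characters.
# The widths below are assumed for this example.
widths = [3, 6, 6]
fixed_line = "1".ljust(3) + "Asha".ljust(6) + "Pune".ljust(6)


def read_fixed_width(line, widths):
    """Slice a line into fields by character position and trim the padding."""
    fields, start = [], 0
    for width in widths:
        fields.append(line[start:start + width].strip())
        start += width
    return fields


fixed_fields = read_fixed_width(fixed_line, widths)
print(csv_rows[0])
print(fixed_fields)
```

Only the delimiter argument changes between the CSV, TSV, and pipe-delimited cases, while fixed-width data has to be sliced by position.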

2. JSON

* JavaScript Object Notation (JSON).

* JSON is a standard, text-based data interchange format.

* JSON files are human-readable and can be edited easily.

Example:-

{
  "metadata": {
    "Origin": "Application",
    "Alt": {
      "CorrelationId": "1234",
      "EventTimeStamp": "67325282482",
      "Event Name": "Update"
    }
  },
  "PayLoad": {
    "Application": {
      "Application Id": "6274764",
      "customer Id": "65124851",
      "Product": "Finance",
      "Previous State": "Submitted"
    }
  }
}
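
As a rough illustration of why JSON is convenient for data interchange, the sketch below parses a trimmed, syntactically valid event document of the same shape with Python's standard json module. The variable names are assumptions chosen for the example.

```python
import json

# A shortened event document; keys follow the example, values are sample data.
document = """
{
  "metadata": {
    "Origin": "Application",
    "Alt": {"CorrelationId": "1234", "Event Name": "Update"}
  },
  "PayLoad": {
    "Application": {"Application Id": "6274764", "Product": "Finance"}
  }
}
"""

# json.loads turns the text into nested Python dicts.
event = json.loads(document)

# Nested values are reached by chaining keys.
event_name = event["metadata"]["Alt"]["Event Name"]
product = event["PayLoad"]["Application"]["Product"]

# json.dumps writes the structure back out as text.
round_tripped = json.dumps(event, indent=2)
print(event_name, product)
```

A missing comma or an unbalanced brace makes json.loads raise json.JSONDecodeError, so running a document through a parser is a quick validity check.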

3. XML:-

* Extensible Markup Language.

* It is a human-readable data format that was popular in the 1990s and 2000s.

* It serves a similar purpose to JSON but is more verbose, because it uses markup tags rather than key-value pairs.

* XML uses tags enclosed in angle brackets (<...>) to define elements and attributes (columns).

Example:-

<customers>
  <customer_details>
    ...
  </customer_details>
</customers>
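
To show how elements and attributes map to rows and columns, here is a small sketch using Python's standard xml.etree.ElementTree. The document, including the customer element, its id attribute, and the name and city child elements, is hypothetical sample data and not taken from the original example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document: one <customer> element per record.
xml_text = """
<customers>
  <customer id="101">
    <name>Asha</name>
    <city>Pune</city>
  </customer>
  <customer id="102">
    <name>Ravi</name>
    <city>Delhi</city>
  </customer>
</customers>
""".strip()

root = ET.fromstring(xml_text)

# Attributes are read with .get(), child element text with .findtext().
customers = [
    {
        "id": customer.get("id"),
        "name": customer.findtext("name"),
        "city": customer.findtext("city"),
    }
    for customer in root.findall("customer")
]
print(customers)
```

Every opening tag needs a matching closing tag. If one is missing, ET.fromstring raises a ParseError.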

4. BLOB:-

* Binary Large Object (BLOB).

* Blobs typically hold unstructured data such as images, audio, or other multimedia objects, stored in binary form.

5. AVRO:-

* It is a row-based format.

* The schema that describes the structure of the records is stored as JSON in the file header, and the records themselves are stored as binary data.

* The binary data compresses well, which minimizes storage.

Example:-

{
  "type": "record",
  "name": "thecodebuzz_schema",
  "namespace": "thecodebuzz.avro",
  "fields": [
    {
      "name": "username",
      "type": "string",
      "doc": "Name of the user account on Thecodebuzz.com"
    },
    {
      "name": "email",
      "type": "string",
      "doc": "The email of the user logging message on the blog"
    },
    {
      "name": "timestamp",
      "type": "long",
      "doc": "time in seconds"
    }
  ],
  "doc": "A basic schema for storing the code buzz blogs messages"
}
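
Because an Avro schema is plain JSON, it can be inspected with the standard json module. The sketch below does not use an Avro library and does not write Avro binary data. It only checks that a record has the fields and primitive types the schema declares. The conforms helper and the sample records are hypothetical.

```python
import json

# The schema from the example, shortened by dropping the "doc" entries.
schema = json.loads("""
{
  "type": "record",
  "name": "thecodebuzz_schema",
  "namespace": "thecodebuzz.avro",
  "fields": [
    {"name": "username", "type": "string"},
    {"name": "email", "type": "string"},
    {"name": "timestamp", "type": "long"}
  ]
}
""")

# Avro primitive type names mapped to Python types (only those used here).
PYTHON_TYPES = {"string": str, "long": int}


def conforms(record, schema):
    """Return True if the record has exactly the schema's fields and types."""
    fields = schema["fields"]
    if set(record) != {field["name"] for field in fields}:
        return False
    return all(
        isinstance(record[field["name"]], PYTHON_TYPES[field["type"]])
        for field in fields
    )


good = {"username": "asha", "email": "asha@example.com", "timestamp": 1700000000}
bad = {"username": "ravi", "email": "ravi@example.com", "timestamp": "yesterday"}
print(conforms(good, schema), conforms(bad, schema))
```

In practice a dedicated Avro library would handle both validation and the binary encoding. This sketch only shows how the JSON schema describes each record.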

6. ORC:-

* Optimized Row Columnar (ORC) format.

* It organizes data into columns rather than rows.

* An ORC file contains stripes of data.

* Each stripe holds a group of rows, and within the stripe the values are stored column by column.

* Because each column is stored separately, ORC enables efficient compression and selective reading of only the columns a query needs.
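
The benefit of storing each column separately can be sketched in plain Python. This is not the ORC file layout or its real encoders. The sample rows are made up, and simple run-length encoding stands in for ORC's more elaborate compression.

```python
from itertools import groupby

# Hypothetical row-oriented records.
rows = [
    {"id": 1, "country": "IN", "amount": 100},
    {"id": 2, "country": "IN", "amount": 250},
    {"id": 3, "country": "IN", "amount": 75},
    {"id": 4, "country": "US", "amount": 300},
]

# Pivot to a columnar layout: one list of values per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Collapse runs of repeated values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Similar values sit next to each other in a column, so they compress well.
encoded_country = run_length_encode(columns["country"])

# Selective reading: an aggregate touches only the column it needs.
total_amount = sum(columns["amount"])
print(encoded_country, total_amount)
```

Here the four country values collapse into two runs, and the sum reads the amount column without touching id or country.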

7. PARQUET:-

* Columnar data format.

* A Parquet file contains row groups.

* Within a row group, the data for each column is stored together.

* Each row group contains one or more chunks of data (one column chunk per column).

* A Parquet file includes metadata that describes the set of rows found in each chunk.

* It is a binary format, so it is not human-readable.

* It is widely used because its columnar layout compresses well and lets queries read only the columns they need.
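
The row group, column chunk, and metadata ideas above can be modelled with a few lines of plain Python. This is a conceptual sketch, not the Parquet file format. The build_row_groups and read_column helpers, the group size, and the choice of min/max/count as chunk metadata are assumptions for illustration.

```python
def build_row_groups(rows, rows_per_group):
    """Split rows into groups and store each group column by column."""
    groups = []
    for start in range(0, len(rows), rows_per_group):
        group_rows = rows[start:start + rows_per_group]
        # One chunk of values per column inside this row group.
        chunks = {name: [r[name] for r in group_rows] for name in group_rows[0]}
        # Per-chunk metadata describing the rows the chunk contains.
        metadata = {
            name: {"min": min(values), "max": max(values), "count": len(values)}
            for name, values in chunks.items()
        }
        groups.append({"columns": chunks, "metadata": metadata})
    return groups


def read_column(groups, column, min_value):
    """Return values >= min_value, skipping groups the metadata rules out."""
    result = []
    for group in groups:
        if group["metadata"][column]["max"] < min_value:
            continue  # nothing in this chunk can match, so do not read it
        result.extend(v for v in group["columns"][column] if v >= min_value)
    return result


# Hypothetical data: six rows with amounts 10, 20, ..., 60.
rows = [{"id": i, "amount": i * 10} for i in range(1, 7)]
groups = build_row_groups(rows, rows_per_group=2)
large = read_column(groups, "amount", 45)
print(len(groups), large)
```

In this sketch the metadata lets the reader skip the first two row groups entirely when looking for amounts of 45 or more.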
