AZURE DATA FACTORY INTERVIEW QUESTIONS: FILE FORMATS

 21. What are the different types of file formats?

1.Delimited text files 

     CSV files, TSV files and fixed-width files

 CSV Files:- 

          * Comma Separated Values.

          * A comma is the usual delimiter between fields. Other delimiters such as a pipe (|) or semicolon are also common in delimited text files.

 TSV Files:-

          * Tab Separated Values.

          * A tab character separates the fields in each row.

 Fixed-width Files:-

          * Each field is allocated a fixed number of characters, so no delimiter is needed.
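
A minimal sketch of how these layouts differ when they are read, using Python's standard csv module. The sample data, the helper names parse_delimited and parse_fixed_width, and the column widths are made up for illustration.

```python
import csv
import io

def parse_delimited(text, delimiter):
    """Parse delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

def parse_fixed_width(line, widths):
    """Slice one fixed-width line into fields of the given widths."""
    fields, start = [], 0
    for width in widths:
        fields.append(line[start:start + width].strip())
        start += width
    return fields

# Hypothetical sample data, not taken from any real dataset.
csv_text = "id,name,city\n1,Joe,Pune\n2,Asha,Delhi\n"
psv_text = "id|name|city\n1|Joe|Pune\n"
tsv_text = "id\tname\tcity\n1\tJoe\tPune\n"
# Build a fixed-width line: 5 characters for id, 10 for name, 10 for city.
fixed_line = "1".ljust(5) + "Joe".ljust(10) + "Pune".ljust(10)

csv_rows = parse_delimited(csv_text, ",")
psv_rows = parse_delimited(psv_text, "|")
tsv_rows = parse_delimited(tsv_text, "\t")
fixed_fields = parse_fixed_width(fixed_line, [5, 10, 10])
```

Only the delimiter changes between the CSV, pipe and TSV cases. The fixed-width case needs the column widths to be known in advance.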

2.JSON

          * JavaScript Object Notation (JSON).

          * JSON is a standard data interchange format.

          * JSON files are text-based, human-readable and easy to edit.

   Example:-

         {
           "metadata" : {
                   "Origin" : "Application",
                   "Alt" : {
                             "CorrelationId" : "1234"
                       },
                   "EventTimeStamp" : "67325282482",
                   "Event Name" : "Update"
                 },
           "PayLoad" : {
                 "Application" : {
                         "Application Id" : "6274764",
                         "customer Id" : "65124851",
                         "Product" : "Finance",
                         "Previous State" : "Submitted"
                   }
             }
         }
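
A minimal sketch of reading a nested JSON document of this shape with Python's standard json module. The flatten helper and its dotted column names are an illustrative assumption, showing one way nested objects can be turned into flat columns before loading into a table.

```python
import json

# A small document shaped like the example above (values are illustrative).
raw = """
{
  "metadata": {
    "Origin": "Application",
    "Alt": {"CorrelationId": "1234"},
    "Event Name": "Update"
  },
  "PayLoad": {
    "Application": {"Application Id": "6274764", "Product": "Finance"}
  }
}
"""

event = json.loads(raw)

# Nested values are reached by walking the keys level by level.
correlation_id = event["metadata"]["Alt"]["CorrelationId"]
product = event["PayLoad"]["Application"]["Product"]

def flatten(obj, prefix=""):
    """Hypothetical helper: flatten nested objects into dotted column names."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat

flat_event = flatten(event)
```

A parser rejects JSON with a missing comma or an unbalanced brace, so a document must be valid before any tool can read it.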

 3.XML :-

           * Extensible Markup Language.

           * It is a human-readable data format that was popular in the 1990s and 2000s.

           * Like JSON, it is text-based and represents hierarchical data, but it uses markup tags instead of key-value pairs.

           * XML uses tags enclosed in angle brackets (<...>) to define elements, and name="value" pairs inside a tag to define attributes (which often map to columns).

      Example:-

        <customers>
                <customer name="Joe" lastname="Yash">
                       <customerdetails>
                              <contact type="Home" number="784651"/>
                              <contact type="Email" number="926312"/>
                       </customerdetails>
                </customer>
        </customers>
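
An XML parser only accepts well-formed input: element and attribute names cannot contain spaces, and every opening tag needs a matching close. Below is a minimal sketch of reading such a document with Python's standard xml.etree.ElementTree module. The element and attribute names (customer, contact, lastname) are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Well-formed sample with hypothetical element and attribute names.
raw = """
<customers>
  <customer name="Joe" lastname="Yash">
    <contact type="Home" number="784651"/>
    <contact type="Email" number="926312"/>
  </customer>
</customers>
"""

root = ET.fromstring(raw.strip())

# Turn each <contact> element into one flat row, using attributes as columns.
rows = []
for customer in root.findall("customer"):
    for contact in customer.findall("contact"):
        rows.append({
            "name": customer.get("name"),
            "type": contact.get("type"),
            "number": contact.get("number"),
        })
```

Each attribute becomes a column and each repeated child element becomes a row, which is the usual way hierarchical XML is mapped to tabular data.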


4.BLOB :-

         * Binary Large Object (BLOB).

         * Blobs are typically images, audio or other multimedia objects, stored in binary form as unstructured data.

5.AVRO :-

         * It is a row-based format.

         * Each file contains a header that describes the structure (schema) of the records. The schema is stored as JSON and the data is stored as binary information.

         * The data is easy to compress, which minimizes storage.

  Example:-

{

  "type": "record",
  "name": "thecodebuzz_schema",
  "namespace": "thecodebuzz.avro",
  "fields": [
    {
      "name": "username",
      "type": "string",
      "doc": "Name of the user account on Thecodebuzz.com"
    },
    {
      "name": "email",
      "type": "string",
      "doc": "The email of the user logging message on the blog"
    },
    {
      "name": "timestamp",
      "type": "long",
      "doc": "time in seconds"
    }
  ],
  "doc": "A basic schema for storing the code buzz blogs messages"
}
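
Because an Avro schema is plain JSON, it can be inspected with Python's standard json module. The sketch below only looks at the schema and checks records against it; it does not read or write real Avro files, which requires a dedicated Avro library. The matches_schema helper and the type mapping are hypothetical and cover only a few primitive types.

```python
import json

schema_json = """
{
  "type": "record",
  "name": "thecodebuzz_schema",
  "namespace": "thecodebuzz.avro",
  "fields": [
    {"name": "username", "type": "string"},
    {"name": "email", "type": "string"},
    {"name": "timestamp", "type": "long"}
  ]
}
"""

# Mapping from a few Avro primitive types to Python types (illustrative subset).
PYTHON_TYPES = {"string": str, "long": int, "int": int, "boolean": bool}

schema = json.loads(schema_json)
field_types = {f["name"]: f["type"] for f in schema["fields"]}

def matches_schema(record, schema):
    """Hypothetical helper: True if the record has exactly the schema's fields with matching types."""
    fields = schema["fields"]
    if set(record) != {f["name"] for f in fields}:
        return False
    return all(isinstance(record[f["name"]], PYTHON_TYPES[f["type"]]) for f in fields)

# Made-up records: the second one has a string where the schema expects a long.
good = {"username": "joe", "email": "joe@example.com", "timestamp": 1700000000}
bad = {"username": "joe", "email": "joe@example.com", "timestamp": "yesterday"}
```

Here matches_schema(good, schema) is True and matches_schema(bad, schema) is False, which shows how the schema acts as a contract for every record in the file.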

6.ORC:-

        * Optimized Row Columnar (ORC) format.

        * It organizes data into columns rather than rows.

        * An ORC file contains stripes of data.

        * Each stripe holds a group of rows, and within the stripe the data is stored column by column.

        * Because each column is stored separately, ORC allows efficient compression and selective reading of only the columns a query needs.
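
The idea of stripes and column-wise storage can be sketched with a toy in-memory model. This is not the real ORC binary layout (real stripes are far larger and also carry indexes and a footer). The function names and sample rows are assumptions for illustration.

```python
def to_stripes(rows, stripe_size):
    """Split rows into stripes; inside each stripe store one list per column."""
    stripes = []
    for start in range(0, len(rows), stripe_size):
        chunk = rows[start:start + stripe_size]
        stripes.append({col: [row[col] for row in chunk] for col in chunk[0]})
    return stripes

def read_column(stripes, column):
    """Read one column across all stripes without touching the other columns."""
    values = []
    for stripe in stripes:
        values.extend(stripe[column])
    return values

# Hypothetical sample rows.
rows = [
    {"id": 1, "city": "Pune", "amount": 120},
    {"id": 2, "city": "Delhi", "amount": 80},
    {"id": 3, "city": "Pune", "amount": 45},
]
stripes = to_stripes(rows, stripe_size=2)
amounts = read_column(stripes, "amount")
```

A query that only needs the amount column reads just that list from each stripe. Values of the same type sitting next to each other are also what makes columnar data compress well.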

7.PARQUET:-

        * Columnar data format.

        * A Parquet file contains row groups.

        * Within a row group, the values of each column are stored together.

        * Each row group contains one column chunk per column.

        * A Parquet file includes metadata (in the file footer) that describes each row group and chunk, such as row counts and per-column statistics.

       * It is a binary format, so it is not human-readable.

       * It is a widely used file format in analytics workloads, largely because of its efficient compression and fast column-level reads.
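
A toy sketch of how row-group metadata can help a reader: if the minimum and maximum of a column are stored per row group, a filter can skip whole groups without reading their rows. This is a simplified in-memory model, not the real Parquet layout (rows are kept as dicts here rather than column chunks), and all names and values are assumptions.

```python
def build_row_groups(rows, group_size, column):
    """Split rows into row groups and record min/max metadata for one column."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        values = [row[column] for row in chunk]
        groups.append({
            "rows": chunk,
            "meta": {"num_rows": len(chunk), "min": min(values), "max": max(values)},
        })
    return groups

def scan_greater_than(groups, column, threshold):
    """Return matching rows, skipping groups whose metadata rules them out."""
    matches, groups_read = [], 0
    for group in groups:
        if group["meta"]["max"] <= threshold:
            continue  # no value in this group can exceed the threshold
        groups_read += 1
        matches.extend(r for r in group["rows"] if r[column] > threshold)
    return matches, groups_read

# Hypothetical data: eight rows with amounts 10, 20, ..., 80.
rows = [{"id": i, "amount": i * 10} for i in range(1, 9)]
groups = build_row_groups(rows, group_size=4, column="amount")
matches, groups_read = scan_greater_than(groups, "amount", 55)
```

With a threshold of 55, the first row group (amounts 10 to 40) is skipped using its metadata alone, and only the second group is actually scanned.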


       




   

