Data is
at the heart of Power BI. Whenever you’re exploring data, creating charts and
dashboards, asking questions all thee visualizations and answers you see are
getting their underlying data from a dataset. That dataset must come from a
data source.
Below
are listed the different types of data sources you can connect to right from
your Power BI service site. You can get data from any of these data sources in
Power BI by clicking My Workspace > Get Data.
Files
Excel
(.xlsx, xlxm) – Excel is unique in that a workbook can have both data you’ve
entered into worksheets yourself, and you can query and load data from external
data sources by using Power Query or Power Pivot.
You can
import data that is in tables in worksheets (the data must be in a table), or
import data that is loaded into a data model. To learn more, see Get data from
Excel.
Power
BI Desktop (.pbix) - You can use Power BI Desktop to query and load data from
external data sources, extend your data model with measures and relationships,
and create reports. You can import your Power BI Desktop file into your Power
BI site.
Note: Power BI Desktop is best for more advanced users who have a good understanding of their data sources, data query and transformation, and data modeling concepts.
Note: Power BI Desktop is best for more advanced users who have a good understanding of their data sources, data query and transformation, and data modeling concepts.
Comma
Separated Value (.csv) - Files are simple text files with rows of data. Each
row can contain one or more values, each separated by a comma. For example, a
.csv containing name and address data can have a number of rows where each row
has values for first name, last name, street address, city. You cannot import
data into a .csv file, but many applications, like Excel, can save simple table
data as a .csv file.
For
other file types like XML Table (.xml) or text (.txt) files, you can use Get
& Transform to query, transform, and load that data into an Excel or Power
BI Desktop file first. You can then import the Excel or Power BI Desktop file
into Power BI.
Where
you store your files makes a big difference, too. OneDrive for Business
provides the greatest amount of flexibility and integration with Power BI. If
you keep your files on your local drive, that’s ok, but if you need to refresh
your data, a few extra steps are involved. More details are provided in the linked
articles.
Content packs
Content
packs contain all of the data and reports you need already prepared for you. In
Power BI, there are two types of content packs; those from services like Google
Analytics, Marketo, or Salesforce, and those created and shared by other users
in your organization.
Services – There are literally dozens of
services with content packs for Power BI, and more are being added all the
time. Most services require you to have an account. To learn more, see Connect
to services.
Organizational – If you and other users in your
organization have a Power BI Pro account, you can create, share, and use
content packs. To learn more, see Organizational content packs.
Databases
Databases in the Cloud – From
the Power BI service, you can connect live to Azure SQL Database, Azure SQL
Data Warehouse, Spark on Azure HD Insight, and SQL Server Analysis Services
using DirectQuery. Connections from Power BI to these databases are live, that
is, when you’ve connected to say an Azure SQL Database, and you begin exploring
its data by creating reports in Power BI, anytime you slice your data or add
another field to a visualization, a query is made right to the database.
Databases on-premises – From
the Power BI service, you can connect directly to SQL Server Analysis Services
Tabular model databases. A Power BI Enterprise Gateway is required. If you’re
unsure how to connect to your organization’s tabular model database, check with
your administrator or IT department.
For
other types of databases, you’ll need to first use Power BI Desktop or Excel to
connect to, query, and load data into a data model. You can then import your
file into Power BI where a dataset is created. If you set up scheduled refresh,
Power BI will use connection information from the file along with refresh
settings you configure to connect directly to the data source and query for
updates. Those updates are then loaded into the dataset in Power BI.
What if my data comes from a
different source?
There
are literally hundreds of different data sources you can use with Power BI. But
regardless of where you get your data from, that data has to be in a format the
Power BI service can use to create reports and dashboards, answer questions and
so on.
Some
data sources already have their data in a format ready for the Power BI
service, like content packs from service providers such as Google Analytics,
and Twilio. SQL Server Analysis Services Tabular model databases are ready,
too. Also you can connect live to databases in the cloud like Azure SQL
Database and Spark on HDInsight.
In
other cases, it might be necessary to query and load the data you want into a
file. For example, you have logistics data in a data warehouse database on a
server in your organization. In the Power BI service, you cannot connect
directly to that database and begin exploring its data (unless it is a tabular
model database). You can, however, use Power BI Desktop or Excel to query and
load that logistics data into a data model you then save as a file. You can
then import that file into Power BI where a dataset is created.
You’re
probably thinking “But that logistics data on that database changes every day.
How do I make sure my dataset in Power BI is refreshed?” Connection information
from the Power BI Desktop or Excel file is imported into the dataset along with
the data. If you set up scheduled refresh or do a manual refresh on the
dataset, Power BI will use the connection information from the dataset, along
with a couple other settings, to connect directly to the database, query for
updates, and load those updates into the dataset. A Power BI Gateway will
likely be required to secure any data transfer between your on-premises server
and Power BI. Any visualizations in reports and dashboards are refreshed
automatically.
Dataset and Data Source
You’ll
see the terms dataset and data source used a lot in Power BI. They’re often use
synonymously, but they really are two different things, albeit related.
A dataset is automatically created in
Power BI when you use Get Data to connect to and import data from a content
pack, file, or you connect to a live data source. A dataset contains
information about the data source, data source credentials, and in many cases,
a sub-set of data copied from the data source. In most cases, when you create
visualizations in reports and dashboards, you’re looking at data in the
dataset.
A data source is where the data in a
dataset really comes from. For example, an online service like Google Analytics
or QuickBooks, a database in the cloud like Azure SQL Database, or a database
or file on a local computer or server in your own organization.
Data refresh
If you
save your files on your local drive, or a drive somewhere in your organization,
a Power BI Gateway might be required in-order to refresh the dataset in Power
BI. Also, the computer where the file is saved must be on when a refresh
happens. You can also re-import your file, or use Publish from Excel or Power
BI Desktop, but those are not automated processes.
If you
save your files on OneDrive for Business or SharePoint – Team Sites, and then
connect to or import them into Power BI, your dataset, reports, and dashboard
will always be up-to-date. Because both OneDrive and Power BI are in the Cloud,
Power BI can connect directly to your saved file, about once every hour, and
check for updates. If any are found, the dataset and any visualizations are
refreshed automatically.
Content
packs from services are automatically updated. In most cases, once a day. You
can manually refresh but, whether or not you’ll see any updated data, will
depend on the service provider. Content packs from others in your organization
will depend on the data sources used and how the person who created the content
pack setup refresh.
Azure
SQL Database, Azure SQL Data Warehouse, and Spark on Azure HDInsight are unique
in that they are data sources in the Cloud. Because the Power BI service is
also in the cloud, Power BI can connect to them live, using DirectQuery. What
you see in Power BI is always in-sync and there’s no need to setup refresh.
SQL
Server Analysis Services is unique in that when you connect to it from Power
BI, it’s a live connection just like an Azure database in the cloud, but the
database itself is on a server in your organization. This type of connection
requires a Power BI Gateway, which is usually configured by an IT department.
Considerations and Limitations
For all
data sources used in the Power BI service, the following considerations and
limitations apply. There are other limitations that apply to specific features,
but the following list apply to the Power BI service overall:
Dataset
size limit - there is a 1 GB limit for each dataset in the Power BI service.
Row
limit - the maximum number of rows in your dataset (when not using DirectQuery)
is 2 billion, with three of those rows reserved (resulting in a usable maximum
of 1,999,999,997 rows); the maximum number of rows when using DirectQuery is 1
million rows.
Column
limit - the maximum number of columns allowed in a dataset, across all tables
in the dataset, is 16,000 columns. This applies to the Power BI service and to
datasets used in Power BI Desktop. Power BI uses an internal row number column
per table included in the dataset, which means the maximum number of columns is
16,000 minus one for each table used in the dataset.
No comments:
Post a Comment