lakeapi package

Submodules

lakeapi.main module

lakeapi.main.available_symbols(table: Optional[Literal['book', 'book_delta', 'trades', 'trades_mpid', 'candles', 'level_1', 'funding', 'open_interest', 'liquidations', 'book_1m']], exchanges: Optional[List[str]] = None, *, bucket: Optional[str] = None, boto3_session: Optional[Session] = None)

Return a pd.Series containing the number of days of data available for each exchange-symbol combination.

The Series is indexed by a multi-index of (exchange, symbol) pairs.
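Because the result is keyed by (exchange, symbol), iterating over it yields one pair and its day count at a time. The sketch below filters such counts by a minimum number of days. symbols_with_min_days is a hypothetical helper, not part of lakeapi; it relies only on .items(), which both a pd.Series and a plain dict provide. The counts shown are made-up values standing in for a real result.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (exchange, symbol), matching the multi-index order


def symbols_with_min_days(counts, min_days: int) -> List[Pair]:
    """Return the (exchange, symbol) pairs with at least `min_days` days of data.

    `counts` may be the Series returned by available_symbols or any mapping
    of (exchange, symbol) -> day count.
    """
    return sorted(pair for pair, days in counts.items() if days >= min_days)


# Made-up counts for illustration only; real values come from available_symbols.
example_counts = {
    ("BINANCE", "BTC-USDT"): 400,
    ("BINANCE", "ETH-USDT"): 90,
    ("KUCOIN", "BTC-USDT"): 365,
}

well_covered = symbols_with_min_days(example_counts, min_days=365)
print(well_covered)
```

With a real result, the same helper could be called as symbols_with_min_days(available_symbols(table='trades'), 365).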

lakeapi.main.list_data(table: Optional[Literal['book', 'book_delta', 'trades', 'trades_mpid', 'candles', 'level_1', 'funding', 'open_interest', 'liquidations', 'book_1m']], start: Optional[datetime] = None, end: Optional[datetime] = None, symbols: Optional[List[str]] = None, exchanges: Optional[List[str]] = None, *, bucket: Optional[str] = None, boto3_session: Optional[Session] = None, last_modified_begin: Optional[datetime] = None, last_modified_end: Optional[datetime] = None) -> List[Dict[str, str]]

Return a list of all S3 data objects matching the given conditions.

Each element is a dict describing one S3 object, with the keys table, exchange, symbol, dt and filename.
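Since every element carries the same five keys, a listing can be summarized with ordinary dict handling. The following is a minimal sketch that counts listed objects per (exchange, symbol) pair. files_per_pair is a hypothetical helper, and the records are invented placeholders: the dt format and the filenames in particular are assumptions, not real object names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def files_per_pair(objects: List[Dict[str, str]]) -> Dict[Tuple[str, str], int]:
    """Count listed S3 objects per (exchange, symbol) pair."""
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for obj in objects:
        counts[(obj["exchange"], obj["symbol"])] += 1
    return dict(counts)


# Hypothetical records shaped like list_data output (placeholder values).
example_objects = [
    {"table": "trades", "exchange": "BINANCE", "symbol": "BTC-USDT",
     "dt": "2022-10-01", "filename": "a.parquet"},
    {"table": "trades", "exchange": "BINANCE", "symbol": "BTC-USDT",
     "dt": "2022-10-02", "filename": "b.parquet"},
    {"table": "trades", "exchange": "KUCOIN", "symbol": "ETH-USDT",
     "dt": "2022-10-01", "filename": "c.parquet"},
]

summary = files_per_pair(example_objects)
print(summary)
```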

lakeapi.main.load_data(table: Literal['book', 'book_delta', 'trades', 'trades_mpid', 'candles', 'level_1', 'funding', 'open_interest', 'liquidations', 'book_1m'], start: Optional[datetime] = None, end: Optional[datetime] = None, symbols: Optional[List[str]] = None, exchanges: Optional[List[str]] = None, *, bucket: Optional[str] = None, boto3_session: Optional[Session] = None, use_threads: bool = True, columns: Optional[List[str]] = None, row_slice: Optional[slice] = None, drop_partition_cols: bool = False, cached: bool = True) -> DataFrame

Load data from Lake into a pandas DataFrame.

Fetches data for a range of exchanges, symbols and dates and returns it as a pandas DataFrame. All network access can be cached in a .lake_cache directory, which is created in the working directory.

Parameters:
  • table – Data type to load, e.g. book (2x20 level order book snapshots), trades, candles, level_1, funding, open_interest and more.

  • start – Start datetime of data to load. If None, loads all data until end. Will be rounded to midnight.

  • end – End datetime of data to load. If None, loads all data from start. Will be rounded to midnight.

  • symbols – List of symbols to load. If None, loads all symbols available. E.g. ['BTC-USDT', 'ETH-USDT']

  • exchanges – List of exchanges to load. If None, loads all exchanges available. E.g. ['BINANCE', 'BINANCE_FUTURES', 'KUCOIN']

  • bucket – S3 bucket to load data from. Reserved for internal usage.

  • boto3_session – Boto3 session to use for loading data. Usually left as None, in which case a new session is created.

  • use_threads – Whether to use multiple threads for loading data for better performance.

  • columns – List of columns to load. If None, loads all columns available.

  • row_slice – DEPRECATED

  • drop_partition_cols – Whether to drop the partition columns (dt, symbol and exchange) from the DataFrame. Useful when loading just one symbol and exchange.

  • cached – Whether to use the file system cache for data downloads. The file listing is always cached, regardless of this setting.
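The parameters above can be combined as in the following sketch, which requests one day of trades for a single symbol on a single exchange. The dates, symbol and exchange are illustrative assumptions, and whether the sample data covers them is not guaranteed. sample_request and load_sample_trades are hypothetical helper names. lakeapi is imported inside the function so that the request can be built and inspected without the package installed or any network access.

```python
from datetime import datetime
from typing import Any, Dict


def sample_request() -> Dict[str, Any]:
    """Keyword arguments for load_data: one day, one symbol, one exchange."""
    return {
        "table": "trades",
        "start": datetime(2022, 10, 1),   # assumed date, for illustration
        "end": datetime(2022, 10, 2),
        "symbols": ["BTC-USDT"],
        "exchanges": ["BINANCE"],
        "columns": None,                  # None loads all columns
        "drop_partition_cols": True,      # one symbol/exchange, so drop dt/symbol/exchange
        "cached": True,                   # reuse the file system cache for downloads
    }


def load_sample_trades():
    """Load the request above from the sample data lake (needs lakeapi and network)."""
    # Imported lazily so that importing this module does not require lakeapi.
    from lakeapi.main import load_data, use_sample_data

    use_sample_data(anonymous_access=True)  # no AWS credentials needed
    return load_data(**sample_request())


request = sample_request()
print(request["table"], request["start"], request["end"])
```

Calling load_sample_trades() performs the actual download and returns the DataFrame.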

lakeapi.main.set_cache_size_limit(limit_bytes: int) -> None

Set cache size limit in bytes.

Parameters:

limit_bytes – Cache size limit in bytes.
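Since the limit is given in bytes, it can help to compute it from a more readable unit. A minimal sketch, assuming binary gibibytes (1 GiB = 1024**3 bytes); gib_to_bytes and limit_cache_gib are hypothetical helpers, not part of lakeapi.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def gib_to_bytes(gib: float) -> int:
    """Convert a size in GiB to a whole number of bytes."""
    return int(gib * GIB)


def limit_cache_gib(gib: float) -> None:
    """Set the Lake cache size limit from a value in GiB (requires lakeapi)."""
    # Imported lazily so the conversion helper works without lakeapi installed.
    from lakeapi.main import set_cache_size_limit

    set_cache_size_limit(gib_to_bytes(gib))


print(gib_to_bytes(2))
```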

lakeapi.main.set_default_bucket(bucket: str) -> None

lakeapi.main.use_sample_data(anonymous_access: bool) -> None

Use the sample data lake configuration, which is free to use for testing Lake.

Parameters:

anonymous_access – Whether to enable anonymous AWS access, which can be used without AWS credentials.

Module contents

Top-level package for Lake API.